
SLOGuard

Manoj
ML Engineer @ 7-Eleven

Pitch

SLOGuard is a replacement for the request scheduler in multi-tenant LLM serving.

It keeps long, low-priority generations from blocking high-priority enterprise requests.
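To make the blocking problem concrete, here is a toy simulation, not SLOGuard's implementation. It assumes a single decode slot that emits one token per tick, a 2,000-token batch job arriving at tick 0, and a 50-token premium request arriving at tick 10. The `Job` type, the tick model, and the priority numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity equality, so list.remove() drops the exact object
class Job:
    name: str
    priority: int   # lower number = more important (assumed convention)
    arrival: int    # tick at which the request arrives
    tokens: int     # tokens still to generate
    finish: int = -1


def simulate(jobs, preemptive):
    """Run jobs on one decode slot that emits one token per tick.

    Returns {job name: completion latency in ticks}, in completion order.
    """
    pending = sorted(jobs, key=lambda j: j.arrival)
    ready, done = [], []
    running, t = None, 0
    while pending or ready or running is not None:
        # Move newly arrived requests into the ready set.
        while pending and pending[0].arrival <= t:
            ready.append(pending.pop(0))
        # Preemption: pause the running job if a more important one is ready.
        if preemptive and running is not None and ready:
            best = min(ready, key=lambda j: j.priority)
            if best.priority < running.priority:
                ready.remove(best)
                ready.append(running)
                running = best
        # Fill an idle slot: by priority if preemptive, else first come first served.
        if running is None and ready:
            key = (lambda j: j.priority) if preemptive else (lambda j: j.arrival)
            running = min(ready, key=key)
            ready.remove(running)
        # Decode one token for whichever job holds the slot.
        if running is not None:
            running.tokens -= 1
            if running.tokens == 0:
                running.finish = t + 1
                done.append(running)
                running = None
        t += 1
    return {j.name: j.finish - j.arrival for j in done}


def scenario():
    return [Job("batch", priority=2, arrival=0, tokens=2000),
            Job("premium", priority=0, arrival=10, tokens=50)]


print(simulate(scenario(), preemptive=False))  # {'batch': 2000, 'premium': 2040}
print(simulate(scenario(), preemptive=True))   # {'premium': 50, 'batch': 2050}
```

Under first-come-first-served the premium request waits behind the whole batch generation; with preemption it finishes in 50 ticks, and the batch job pays only the 50 ticks it spent paused.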

Features

  • Priority preemption of running requests.
  • KV-cache migration to host DRAM for paused requests.
  • HBM reservation for premium tiers.
  • SLA-aware admission control.
  • Per-customer latency accounting.
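Two of these features, per-customer latency accounting and SLA-aware admission control, can be sketched together. This is a minimal sketch, assuming each tier has a single latency target and that queue drain time can be estimated from a steady aggregate decode rate; `TIER_SLO_MS`, `LatencyLedger`, and `admit` are hypothetical names, and the target values are placeholders rather than SLOGuard's.

```python
import math
from collections import defaultdict

# Hypothetical per-tier latency targets in milliseconds (illustrative only).
TIER_SLO_MS = {"premium": 500.0, "standard": 2000.0, "batch": 30000.0}


class LatencyLedger:
    """Per-customer record of observed request latencies."""

    def __init__(self):
        self._samples = defaultdict(list)  # customer -> [latency_ms, ...]

    def record(self, customer, latency_ms):
        self._samples[customer].append(latency_ms)

    def percentile(self, customer, p):
        """Nearest-rank percentile, 0 < p <= 100, of one customer's latencies."""
        xs = sorted(self._samples.get(customer, []))
        if not xs:
            raise KeyError(f"no samples for {customer!r}")
        rank = math.ceil(p * len(xs) / 100)
        return xs[rank - 1]

    def violations(self, customer, tier):
        """Count of this customer's requests that exceeded the tier's target."""
        slo = TIER_SLO_MS[tier]
        return sum(1 for x in self._samples.get(customer, []) if x > slo)


def admit(tier, queued_tokens_ahead, expected_output_tokens, tokens_per_sec):
    """Admit only if the estimated completion time fits the tier's target.

    Assumes the queue drains at a steady aggregate decode rate.
    """
    est_ms = 1000.0 * (queued_tokens_ahead + expected_output_tokens) / tokens_per_sec
    return est_ms <= TIER_SLO_MS[tier]
```

For example, at 100 tokens/s a 40-token premium request is admitted onto an empty queue (about 400 ms) but rejected behind 100 queued tokens (about 1,400 ms), while a standard request in the same position still fits its 2,000 ms target. A production ledger would more likely keep a bounded histogram than every sample; the plain list here just keeps the nearest-rank percentile easy to check.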

Diagram

```mermaid
flowchart TD
  Requests[Incoming requests] --> Classifier[Tier classifier]
  Classifier --> Premium[Premium queue]
  Classifier --> Standard[Standard queue]
  Classifier --> Batch[Batch queue]
  Premium --> Scheduler[SLO scheduler]
  Standard --> Scheduler
  Batch --> Scheduler
  Scheduler --> HBM[HBM slots]
  Scheduler --> Spill[Spill / pause low priority]
```
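The scheduler's two outputs in the diagram, HBM slots and spill/pause, can be sketched as bookkeeping over a block pool. This is a minimal sketch, assuming KV memory is counted in fixed-size blocks, that premium traffic gets a fixed reserve, and that spilling means pausing a request and parking its KV cache in DRAM. `SlotPool`, `place`, `resume`, and the tier ranks are hypothetical names, not SLOGuard's API, and no KV data is actually copied.

```python
from dataclasses import dataclass

TIER_RANK = {"premium": 0, "standard": 1, "batch": 2}  # lower = more important


@dataclass
class Request:
    rid: str
    tier: str
    kv_blocks: int  # KV-cache blocks the request occupies in HBM


class SlotPool:
    """Toy HBM block pool with a premium-only reserve and a DRAM spill area.

    Bookkeeping only: "migrating" a request moves it between two dicts.
    """

    def __init__(self, total_blocks, premium_reserve):
        self.total = total_blocks
        self.reserve = premium_reserve  # blocks only premium requests may use
        self.in_hbm = {}                # rid -> Request resident in HBM
        self.spilled = {}               # rid -> Request paused, KV parked in DRAM

    def used(self):
        return sum(r.kv_blocks for r in self.in_hbm.values())

    def _limit(self, tier):
        # Non-premium tiers may not grow usage into the reserved blocks.
        return self.total if tier == "premium" else self.total - self.reserve

    def place(self, req):
        """Put req in HBM, spilling lower-tier requests if that makes room.

        Returns False, spilling nothing, when req cannot fit even after
        spilling every lower-tier request; it then stays in its queue.
        """
        limit = self._limit(req.tier)
        victims = sorted(
            (r for r in self.in_hbm.values() if TIER_RANK[r.tier] > TIER_RANK[req.tier]),
            key=lambda r: (TIER_RANK[r.tier], r.kv_blocks),
            reverse=True,  # least important tier first, larger requests first
        )
        if self.used() - sum(r.kv_blocks for r in victims) + req.kv_blocks > limit:
            return False
        for victim in victims:
            if self.used() + req.kv_blocks <= limit:
                break
            self.spilled[victim.rid] = self.in_hbm.pop(victim.rid)
        self.in_hbm[req.rid] = req
        return True

    def resume(self):
        """Move spilled requests back to HBM, most important first, while they fit."""
        for rid, req in sorted(self.spilled.items(), key=lambda kv: TIER_RANK[kv[1].tier]):
            if self.used() + req.kv_blocks <= self._limit(req.tier):
                self.in_hbm[rid] = self.spilled.pop(rid)
```

Evicting the lowest tier first keeps premium placement cheap, and evicting larger requests first frees room with fewer migrations at the cost of pausing bigger jobs. The up-front feasibility check avoids pausing anyone when the incoming request would not fit anyway.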

Customer

LLM API companies with enterprise SLAs, internal platforms serving multiple teams, and SaaS products with paid and free tiers.

Risks

  • Needs deep integration with the serving runtime's batching and KV-cache management.
  • Customers must trust it not to starve lower-tier users.
  • SLA contracts vary widely between customers.
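One common mitigation for the starvation risk is priority aging; the pitch does not say whether SLOGuard uses it. The idea is that a waiting request's effective rank improves the longer it sits in the queue. A minimal sketch, where `AGING_MS`, `Waiting`, `effective_rank`, and `pick_next` are hypothetical names and the 5 s aging step is a placeholder:

```python
from dataclasses import dataclass

TIER_RANK = {"premium": 0, "standard": 1, "batch": 2}  # lower = served first
AGING_MS = 5000.0  # hypothetical: one rank of credit per 5 s spent waiting


@dataclass
class Waiting:
    rid: str
    tier: str
    enqueued_ms: float


def effective_rank(w, now_ms, aging_ms=AGING_MS):
    """Tier rank minus a credit that grows linearly with queueing time."""
    return TIER_RANK[w.tier] - (now_ms - w.enqueued_ms) / aging_ms


def pick_next(queue, now_ms):
    """Return the waiting request with the best effective rank (older wins ties)."""
    return min(queue, key=lambda w: (effective_rank(w, now_ms), w.enqueued_ms))
```

Because every queued request ages at the same rate, a batch request outranks any premium request that arrived more than 10 s (two ranks times `AGING_MS`) after it. It can therefore only be overtaken by premium traffic that arrived within that window, not by an unbounded stream of later arrivals.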