Pitch
InferGrid is Datadog plus autopilot for LLM inference.
It tells teams whether they are memory-bound, compute-bound, cache-missing, over-prefilling, under-batching, or violating SLOs. Then it suggests or applies serving changes.
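As an illustration of the memory-bound versus compute-bound call, here is a minimal roofline sketch. It is not InferGrid's implementation: `KernelSample`, `classify_bound`, and the peak hardware numbers are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class KernelSample:
    """Hypothetical per-step measurement; field names are assumptions."""
    flops: float        # floating-point operations executed
    bytes_moved: float  # bytes read from and written to GPU memory


def classify_bound(sample: KernelSample, peak_flops: float, peak_bandwidth: float) -> str:
    """Return "memory-bound" or "compute-bound" using the roofline model."""
    intensity = sample.flops / sample.bytes_moved  # arithmetic intensity, FLOPs per byte
    ridge = peak_flops / peak_bandwidth            # intensity where the two roofs meet
    return "memory-bound" if intensity < ridge else "compute-bound"


# Illustrative hardware numbers only, not taken from any GPU datasheet.
PEAK_FLOPS = 300e12    # 300 TFLOP/s
PEAK_BANDWIDTH = 2e12  # 2 TB/s, giving a ridge point of 150 FLOPs/byte

decode_step = KernelSample(flops=2e9, bytes_moved=1e9)     # 2 FLOPs/byte
prefill_step = KernelSample(flops=4e12, bytes_moved=1e10)  # 400 FLOPs/byte

print(classify_bound(decode_step, PEAK_FLOPS, PEAK_BANDWIDTH))
print(classify_bound(prefill_step, PEAK_FLOPS, PEAK_BANDWIDTH))
```

A step whose arithmetic intensity falls below the ridge point is limited by memory bandwidth, so adding compute will not help it. That is the kind of signal a batching or quantization recommendation would key off.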
Product Surface
- GPU roofline dashboard.
- Dollars per million tokens.
- TTFT and inter-token latency attribution.
- KV cache hit/miss analysis.
- Batch-size and speculation recommendations.
- Quantization and offload what-if planner.
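Two of the metrics above reduce to simple arithmetic. The sketch below shows one plausible way to compute them. The function names, parameters, and the assumption that cost is just GPU-hours times an hourly rate are illustrative, not a description of the product.

```python
from statistics import mean
from typing import List, Tuple


def dollars_per_million_tokens(gpu_hourly_cost: float, num_gpus: int,
                               tokens_per_second: float) -> float:
    """Cost of generating one million tokens at a sustained throughput.

    Assumes GPU rental is the only cost; a fuller model would add
    networking, storage, and idle time.
    """
    cost_per_second = gpu_hourly_cost * num_gpus / 3600.0
    return cost_per_second / tokens_per_second * 1_000_000


def ttft_and_itl(request_start: float, token_times: List[float]) -> Tuple[float, float]:
    """Time to first token and mean inter-token latency, both in seconds.

    token_times holds the arrival timestamp of each generated token.
    """
    ttft = token_times[0] - request_start
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    itl = mean(gaps) if gaps else 0.0  # a single-token response has no gaps
    return ttft, itl


# Made-up inputs for illustration.
cost = dollars_per_million_tokens(gpu_hourly_cost=3.6, num_gpus=1, tokens_per_second=1000.0)
ttft, itl = ttft_and_itl(request_start=0.0, token_times=[0.5, 0.75, 1.0])
print(cost, ttft, itl)
```

Attribution would then split `ttft` and `itl` across queueing, prefill, and decode phases, which requires per-phase timestamps from the serving engine.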
Control Loop
flowchart LR
  Traces[Serving traces] --> Diagnose[Cost diagnosis]
  Counters[GPU counters] --> Diagnose
  Diagnose --> Plan[Optimization plan]
  Plan --> Apply[Apply knobs]
  Apply --> vLLM[vLLM / SGLang]
  vLLM --> Traces
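The diagnose, plan, and apply stages of the loop could be sketched roughly as follows. Everything here is an assumption made for illustration: the `Diagnosis` fields, the 0.5 cache hit-rate threshold, and the action names, which are placeholder labels rather than real vLLM or SGLang flags.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Diagnosis:
    """Hypothetical output of the cost-diagnosis stage."""
    memory_bound: bool
    kv_cache_hit_rate: float  # fraction in [0, 1]
    p95_ttft_s: float         # observed p95 time to first token
    ttft_slo_s: float         # target the team has committed to


def plan(d: Diagnosis) -> List[str]:
    """Map a diagnosis to suggested knob changes using toy rules."""
    actions: List[str] = []
    if d.p95_ttft_s > d.ttft_slo_s:
        actions.append("shrink_prefill_chunk")   # protect the latency SLO first
    elif d.memory_bound:
        actions.append("raise_max_batch_size")   # amortize weight reads over more requests
    if d.kv_cache_hit_rate < 0.5:                # threshold is an arbitrary assumption
        actions.append("turn_on_prefix_cache")
    return actions


def apply(actions: List[str], dry_run: bool = True) -> Dict[str, str]:
    """Suggest-only by default; a real apply step would call the engine's config interface."""
    status = "suggested" if dry_run else "applied"
    return {action: status for action in actions}


diagnosis = Diagnosis(memory_bound=True, kv_cache_hit_rate=0.3, p95_ttft_s=0.2, ttft_slo_s=0.5)
print(apply(plan(diagnosis)))
```

Keeping `dry_run=True` as the default mirrors the "suggests or applies" split in the pitch: recommendations are surfaced first, and automatic application is opt-in.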
Customer
Teams self-hosting vLLM, SGLang, TensorRT-LLM, or custom inference stacks and spending enough on GPUs that a 10-20 percent saving is meaningful.
Moat
The early product is observability. The long-term moat is the policy engine: a growing dataset of workload shapes and which interventions actually saved money.
Risks
- Deep integration with inference engines is painful.
- Some companies already have internal dashboards.
- GPU counter collection must be low overhead.

