## Pitch
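SpecDraft Cloud wraps an LLM deployment with managed speculative decoding. Its draft heads improve over time by training on the customer's own accept/reject stream: the record of which drafted tokens the target model accepted or rejected during verification.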
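To make the accept/reject stream concrete, here is a minimal sketch of greedy speculative verification that also emits one label per checked draft token. It assumes greedy (argmax) decoding and that the target model's argmax at every drafted position is already available from a single verification pass. `Label` and `verify_greedy` are hypothetical names, and the label schema is an illustration, not SpecDraft Cloud's actual logging format.

```python
from dataclasses import dataclass


@dataclass
class Label:
    """Hypothetical label record: context, drafted token, and the target's verdict."""
    prefix: tuple[int, ...]
    draft_token: int
    accepted: bool


def verify_greedy(prefix, draft_tokens, target_argmax):
    """Greedy speculative verification that also yields accept/reject labels.

    target_argmax[i] is the target model's argmax token after
    prefix + draft_tokens[:i], for i = 0..len(draft_tokens), so it has one
    more entry than draft_tokens (the bonus position).
    Returns (emitted_tokens, labels).
    """
    emitted = []
    labels = []
    context = tuple(prefix)
    for i, tok in enumerate(draft_tokens):
        ok = tok == target_argmax[i]
        labels.append(Label(context, tok, ok))
        if not ok:
            # First mismatch: emit the target's own token and stop checking.
            emitted.append(target_argmax[i])
            return emitted, labels
        emitted.append(tok)
        context = context + (tok,)
    # Every draft was accepted: the same target pass yields one bonus token.
    emitted.append(target_argmax[len(draft_tokens)])
    return emitted, labels
```

For example, `verify_greedy([1, 2], [5, 6, 7], [5, 6, 9, 4])` emits `[5, 6, 9]` and labels 5 and 6 as accepted and 7 as rejected. Draft tokens after the first rejection are not labeled, because they were conditioned on a prefix the target would not have produced.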
## Flywheel
More traffic produces more accept/reject labels. More labels improve the draft head. A better draft head raises the acceptance rate. A higher acceptance rate means more tokens emitted per target-model forward pass, which lowers decode latency and cost.
```mermaid
flowchart LR
    Traffic --> Labels[Accept/reject labels]
    Labels --> Train[Draft-head tuning]
    Train --> Acceptance[Higher acceptance rate]
    Acceptance --> Savings[Lower latency and cost]
    Savings --> Traffic
```
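The cost side of the flywheel can be estimated with a simplified model: if the draft head proposes `gamma` tokens per step and each is accepted independently with probability `alpha`, one target forward pass emits on average `(1 - alpha^(gamma+1)) / (1 - alpha)` tokens. The sketch below turns that into a cost per token relative to plain decoding. The independence assumption, the draft-cost ratio `c`, and the function names are illustrative assumptions, not measured SpecDraft Cloud figures.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target forward pass.

    Assumes each of the gamma drafted tokens is accepted independently with
    probability alpha; the target pass always contributes one extra token
    (the correction on rejection, or a bonus token if all drafts pass).
    """
    if alpha >= 1.0:
        return gamma + 1.0
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def relative_decode_cost(alpha: float, gamma: int, c: float) -> float:
    """Cost per emitted token relative to decoding with the target model alone.

    c is an assumed cost of one draft-head step as a fraction of one target
    forward pass; verifying all gamma drafts is assumed to cost one target pass.
    """
    return (gamma * c + 1) / expected_tokens_per_pass(alpha, gamma)


for alpha in (0.5, 0.7, 0.9):
    tokens = expected_tokens_per_pass(alpha, gamma=4)
    cost = relative_decode_cost(alpha, gamma=4, c=0.05)
    print(f"alpha={alpha}: {tokens:.2f} tokens/pass, relative cost {cost:.2f}")
```

With `gamma = 4` and `c = 0.05`, raising `alpha` from 0.5 to 0.9 increases tokens per target pass from about 1.94 to about 4.10 and cuts relative cost per token from about 0.62 to about 0.29. That is the "Higher acceptance rate" to "Lower latency and cost" step of the loop, under this model's assumptions.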
## Customer
API companies and LLM SaaS products with domain-specific traffic patterns.
## Risks
- Needs enough per-customer traffic to personalize the draft head.
- Customers may not want a managed service in their inference path.
- Maintaining output equivalence is non-negotiable.
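Output equivalence is a property of the verification rule. With sampling (temperature above 0), one standard rule is speculative sampling: accept the drafted token `x` with probability `min(1, p(x) / q(x))`, where `p` is the target model's distribution and `q` the draft head's, and on rejection resample from the renormalized residual `max(0, p - q)`. The minimal sketch below implements that rule for a single position over a toy vocabulary. `verify_one_token` is a hypothetical name, and a production verifier would operate on batched logits rather than Python lists.

```python
import random


def verify_one_token(p, q, x, rng=random):
    """Speculative-sampling check for one drafted token.

    p: target-model probabilities over the vocabulary (floats summing to 1)
    q: draft-head probabilities over the same vocabulary
    x: token index the draft head sampled from q (so q[x] > 0)
    Returns (token, accepted). If x was drawn from q, token is distributed as p.
    """
    # Accept with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # Rejected: resample from max(0, p - q); random.choices renormalizes weights.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    token = rng.choices(range(len(p)), weights=residual, k=1)[0]
    return token, False
```

Whatever the draft head proposes, the emitted token follows the target distribution exactly, so a weak draft head costs acceptance rate (and therefore savings), never output quality.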

