SpecDraft Cloud

Pitch

SpecDraft Cloud wraps an LLM deployment with managed speculative decoding. Its draft heads improve over time using the customer’s own accept/reject stream.

Flywheel

More traffic produces more accept/reject labels. More labels improve the draft head. A better draft head raises the acceptance rate, and a higher acceptance rate lowers decode latency and cost, which in turn supports more traffic.

flowchart LR
  Traffic --> Labels[Accept/reject labels]
  Labels --> Train[Draft-head tuning]
  Train --> Acceptance[Higher acceptance rate]
  Acceptance --> Savings[Lower latency and cost]
  Savings --> Traffic
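
To make the acceptance-to-savings link concrete, here is a minimal sketch of the standard speculative decoding estimate. It assumes each drafted token is accepted independently with probability `alpha` and that the draft head proposes `k` tokens per step. Both names, and the `acceptance_rate` helper, are illustrative and not part of any SpecDraft Cloud API.

```python
from typing import Iterable


def acceptance_rate(labels: Iterable[bool]) -> float:
    """Fraction of drafted tokens the target model accepted.

    `labels` is a hypothetical accept/reject stream: True means accepted.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("need at least one accept/reject label")
    return sum(labels) / len(labels)


def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes i.i.d. acceptance with probability `alpha` and `k` drafted
    tokens per step. The geometric sum 1 + alpha + ... + alpha**k has the
    closed form (1 - alpha**(k + 1)) / (1 - alpha).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus one bonus token from the target.
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)
```

Under these assumptions, moving `alpha` from 0.6 to 0.8 with `k = 4` raises the expected tokens per target pass from roughly 2.3 to roughly 3.4. Real traffic will not be i.i.d., so this is a rough planning figure rather than a measured result.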

Customer

API companies and LLM SaaS products with domain-specific traffic patterns.

Risks

Each customer needs enough traffic to personalize a draft head.
Customers may not want a managed service in their inference path.
Maintaining output equivalence is non-negotiable.
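
As an illustration of what output equivalence means in the greedy-decoding case, the sketch below accepts drafted tokens only while they match the target model's own greedy choice. `target_next` is a hypothetical stand-in for the target model; a real system would score all draft positions in one batched forward pass rather than calling it token by token. Sampling-based decoding needs a rejection-sampling scheme instead, which is not shown here.

```python
from typing import Callable, List, Sequence


def verify_greedy(
    prefix: Sequence[int],
    draft: Sequence[int],
    target_next: Callable[[List[int]], int],
) -> List[int]:
    """Return the tokens to emit for one speculative step under greedy decoding.

    `target_next(context)` is a hypothetical callable that returns the target
    model's greedy next token for `context`. The emitted tokens always equal
    what the target model alone would have produced, whatever the draft says.
    """
    context = list(prefix)
    emitted: List[int] = []
    for token in draft:
        expected = target_next(context)
        if token != expected:
            # First mismatch: discard the rest of the draft and emit the
            # target model's own token instead.
            emitted.append(expected)
            return emitted
        emitted.append(token)
        context.append(token)
    # Every drafted token was accepted, so the target contributes one more.
    emitted.append(target_next(context))
    return emitted
```

Because a rejected draft token is always replaced by the target's token, a poor draft head costs speed but never changes the output.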
