Online EAGLE Draft Learning

Core Idea

Speculative decoding produces a clean feedback signal:

accepted draft tokens agreed with the target model,
rejected draft tokens did not.

Instead of discarding that signal, use it to continuously improve a draft head for the actual production query distribution.

Background

EAGLE accelerates autoregressive decoding by having a draft head predict future features or tokens, which the target model then verifies. The typical training loop for that draft head is offline, whereas the production accept/reject stream is a natural online dataset.

Training Loop

flowchart TD
  Traffic[Production traffic] --> Draft[Draft head proposes tokens]
  Draft --> Verify[Target model verifies]
  Verify --> Accepted[Accepted tokens]
  Verify --> Rejected[Rejected tokens]
  Accepted --> Buffer[Online training buffer]
  Rejected --> Buffer
  Buffer --> Update[Small draft-head update]
  Update --> Draft
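
The loop in the flowchart can be sketched end to end with a toy stand-in. This is only an illustration under loud assumptions: `ToyDraftHead` is a bigram logits table, not an EAGLE head; `target_next` is a fixed rule standing in for the target model; verification is greedy (accept while the draft token equals the target's token); and the bonus token a real verifier emits after a fully accepted draft is omitted. All names, the buffer size, and the learning rate are hypothetical.

```python
import math
from collections import deque

VOCAB = 4  # toy vocabulary size (assumption)


def target_next(prev: int) -> int:
    """Stand-in for the target model: a fixed deterministic next-token rule."""
    return (prev + 1) % VOCAB


class ToyDraftHead:
    """Bigram logits table standing in for a real draft head (assumption)."""

    def __init__(self) -> None:
        self.logits = [[0.0] * VOCAB for _ in range(VOCAB)]

    def propose(self, prev: int) -> int:
        # Greedy proposal: argmax over the row (ties resolve to the lowest id).
        row = self.logits[prev]
        return max(range(VOCAB), key=row.__getitem__)

    def update(self, prev: int, target_tok: int, lr: float = 0.5) -> None:
        # One SGD step on cross-entropy toward the token the target chose.
        row = self.logits[prev]
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        for tok in range(VOCAB):
            grad = exps[tok] / total - (1.0 if tok == target_tok else 0.0)
            row[tok] -= lr * grad


def speculate(draft: ToyDraftHead, prev: int, k: int):
    """Draft up to k tokens, verify greedily, stop at the first rejection."""
    records, emitted = [], []
    cur = prev
    for _ in range(k):
        guess = draft.propose(cur)
        truth = target_next(cur)
        accepted = guess == truth
        records.append((cur, guess, truth, accepted))
        emitted.append(truth)  # on rejection the target's token is emitted
        cur = truth
        if not accepted:
            break
    return records, emitted


def run(steps: int = 200, k: int = 4, batch: int = 8):
    """Serve, log accept/reject records, and apply small batched updates."""
    draft = ToyDraftHead()
    buffer = deque(maxlen=256)  # online training buffer
    prev = 0
    rates = []  # per-step acceptance rate, to watch the head improve
    for _ in range(steps):
        records, emitted = speculate(draft, prev, k)
        buffer.extend(records)
        prev = emitted[-1]
        rates.append(sum(r[3] for r in records) / len(records))
        if len(buffer) >= batch:
            for _ in range(batch):
                ctx, _guess, truth, _accepted = buffer.popleft()
                draft.update(ctx, truth)
    return rates
```

Both accepted and rejected positions enter the buffer here, matching the two arrows into the buffer in the flowchart: each record carries the token the target actually chose, so a rejection is as usable a training example as an acceptance. In this toy setting the per-step acceptance rate starts at zero and rises once the first batch of updates lands.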

Research Questions

How much online updating is safe before distribution drift hurts?
Can updates be batched nightly instead of truly online?
Does domain-specific traffic improve acceptance rate enough to matter?
Can the system prevent catastrophic degradation with rollback tests?
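
One way the rollback question could be made concrete is a promotion gate: score the updated draft head on a held-out replay set and keep the previous head if the acceptance rate drops by more than a tolerance. The sketch below assumes such a replay set exists behind a hypothetical `eval_acceptance` callable; `promote_if_safe` and the `max_drop` threshold are likewise illustrative names and values, not part of any existing system.

```python
from typing import Callable, Tuple, TypeVar

M = TypeVar("M")  # any draft-head checkpoint handle


def promote_if_safe(
    current: M,
    candidate: M,
    eval_acceptance: Callable[[M], float],
    max_drop: float = 0.02,
) -> Tuple[M, bool]:
    """Return the head to serve and whether the candidate was promoted.

    eval_acceptance is assumed to return the acceptance rate of a head
    on a fixed held-out replay set (hypothetical helper).
    """
    baseline = eval_acceptance(current)
    score = eval_acceptance(candidate)
    if score < baseline - max_drop:
        return current, False  # roll back: keep serving the old head
    return candidate, True
```

The same gate would work for nightly batched updates as for truly online ones, since it only compares two checkpoints.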

Metrics

acceptance rate,
tokens per target forward pass,
latency improvement,
quality equivalence to vanilla decoding,
stability over time.
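
The first two metrics can be computed directly from per-step verification logs. The sketch below assumes a hypothetical `VerifyStep` record with the number of drafted and accepted tokens per target forward pass, and assumes the common convention that each pass emits the accepted prefix plus one token from the target itself (a correction or a bonus token).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VerifyStep:
    drafted: int   # draft tokens proposed in this step
    accepted: int  # draft tokens the target accepted


def acceptance_rate(steps: Sequence[VerifyStep]) -> float:
    """Fraction of all drafted tokens that the target accepted."""
    drafted = sum(s.drafted for s in steps)
    return sum(s.accepted for s in steps) / drafted


def tokens_per_target_pass(steps: Sequence[VerifyStep]) -> float:
    """Mean tokens emitted per target forward pass.

    Assumes each pass yields the accepted prefix plus one target token.
    """
    return sum(s.accepted + 1 for s in steps) / len(steps)


log = [VerifyStep(4, 4), VerifyStep(4, 1), VerifyStep(4, 0)]
print(acceptance_rate(log), tokens_per_target_pass(log))
```

Tracking both numbers over a sliding window of recent steps would also give a simple view of stability over time.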

Novelty Opinion

Novelty is high and the setup is very clean. The signal is already present, the model being updated is small, and the outcome is easy to measure.

Tenure And Complexity

Prototype: 4-6 weeks.
Paper-grade: 2-4 months.
Complexity: Medium.
Main risk: production traffic may be too sparse or non-stationary for reliable online updates.
