<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Speculative-Decoding on jonam'Log</title><link>https://www.jonam.io/tags/speculative-decoding/</link><description>Recent content in Speculative-Decoding on jonam'Log</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>© 2026 Manoj. All Rights Reserved.</copyright><lastBuildDate>Mon, 18 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://www.jonam.io/tags/speculative-decoding/index.xml" rel="self" type="application/rss+xml"/><item><title>DraftOS</title><link>https://www.jonam.io/journal/inference-engineering/product-ideas/draftos/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://www.jonam.io/journal/inference-engineering/product-ideas/draftos/</guid><description>Use idle CPU cores on GPU instances to draft tokens while the GPU verifies.</description></item><item><title>Speculative Prefill</title><link>https://www.jonam.io/journal/inference-engineering/research-topics/speculative-prefill/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://www.jonam.io/journal/inference-engineering/research-topics/speculative-prefill/</guid><description>Speculative decoding is common; this asks whether speculation can reduce long-prompt prefill latency.</description></item><item><title>Online EAGLE Draft Learning</title><link>https://www.jonam.io/journal/inference-engineering/research-topics/online-eagle-draft-learning/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://www.jonam.io/journal/inference-engineering/research-topics/online-eagle-draft-learning/</guid><description>Speculative decoding throws away a useful supervision signal: which draft tokens were accepted.</description></item><item><title>SpecDraft Cloud</title><link>https://www.jonam.io/journal/inference-engineering/product-ideas/specdraft-cloud/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://www.jonam.io/journal/inference-engineering/product-ideas/specdraft-cloud/</guid><description>A draft model service that learns from accepted and rejected tokens.</description></item></channel></rss>