<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Inference on jonam'Log</title><link>https://www.jonam.io/tags/inference/</link><description>Recent content in Inference on jonam'Log</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>&#169; 2026 Manoj. All Rights Reserved.</copyright><lastBuildDate>Mon, 18 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://www.jonam.io/tags/inference/index.xml" rel="self" type="application/rss+xml"/><item><title>Attention Head Similarity Pruning</title><link>https://www.jonam.io/journal/inference-engineering/research-topics/attention-head-similarity-pruning/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://www.jonam.io/journal/inference-engineering/research-topics/attention-head-similarity-pruning/</guid><description>Measure cross-head similarity on a prompt and skip heads that are redundant for that input.</description></item><item><title>SpecDraft Cloud</title><link>https://www.jonam.io/journal/inference-engineering/product-ideas/specdraft-cloud/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://www.jonam.io/journal/inference-engineering/product-ideas/specdraft-cloud/</guid><description>A draft model service that learns from accepted and rejected tokens.</description></item><item><title>Research Topics</title><link>https://www.jonam.io/journal/inference-engineering/research-topics/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://www.jonam.io/journal/inference-engineering/research-topics/</guid><description>Novel and practical research directions around KV cache compression, scheduling, speculation, quantization, and hardware-aware serving.</description></item><item><title>Inference Engineering</title><link>https://www.jonam.io/journal/inference-engineering/</link><pubDate>Mon, 18 May 2026 00:00:00 
+0000</pubDate><guid>https://www.jonam.io/journal/inference-engineering/</guid><description>A living notebook for inference engineering research topics and product ideas.</description></item><item><title>Product Ideas</title><link>https://www.jonam.io/journal/inference-engineering/product-ideas/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://www.jonam.io/journal/inference-engineering/product-ideas/</guid><description>Ten product directions built from KV cache reuse, roofline scheduling, speculative decoding, and inference observability.</description></item><item><title>Inference Engineering Lecture 3</title><link>https://www.jonam.io/files/inference-engineering-lecture-3/</link><pubDate>Thu, 30 Apr 2026 00:00:00 +0000</pubDate><guid>https://www.jonam.io/files/inference-engineering-lecture-3/</guid><description>Practical applications and case studies in inference systems.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://www.jonam.io/files/inference-engineering-lecture-3/feature.png"/></item><item><title>Inference Engineering Lecture 2</title><link>https://www.jonam.io/files/inference-engineering-lecture-2/</link><pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate><guid>https://www.jonam.io/files/inference-engineering-lecture-2/</guid><description>Advanced concepts in inference engineering and optimization techniques.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://www.jonam.io/files/inference-engineering-lecture-2/feature.png"/></item></channel></rss>