Gpu on jonam'Log

https://www.jonam.io/tags/gpu/
Recent content in Gpu on jonam'Log
© 2026 Manoj. All Rights Reserved.
Mon, 18 May 2026 00:00:00 +0000

InferGrid
https://www.jonam.io/journal/inference-engineering/product-ideas/infergrid/
Mon, 18 May 2026 00:00:00 +0000
Measure why your GPU bill is high, then tune batching, speculation, and quantization automatically.

Roofline-Adaptive Inference Scheduler
https://www.jonam.io/journal/inference-engineering/research-topics/roofline-adaptive-inference-scheduler/
Mon, 18 May 2026 00:00:00 +0000
Move from static max_num_seqs to a feedback loop that chases the hardware ridge point.

DraftOS
https://www.jonam.io/journal/inference-engineering/product-ideas/draftos/
Mon, 18 May 2026 00:00:00 +0000
Use idle CPU cores on GPU instances to draft tokens while the GPU verifies.