Notes

Technical notes, experiment plans, and reading records for inference systems.

Jun 21, 2026 1 min read

Prefill vs Decode: The Two Phases of LLM Inference

Understand how prefill and decode differ in LLM inference, and why prefill is better suited to batching.

Jun 20, 2026 1 min read

A walkthrough of the KV Cache data structure, how to estimate its GPU memory footprint, and why long contexts amplify the problem.

Jun 19, 2026 1 min read

An analysis of how Prefix Cache hits and misses affect first-token latency, with notes on the follow-up benchmark plan.