Hu et al. (2025)
EPIC reuses KV caches across any prefix by recomputing only a handful of tokens per chunk
Position-Independent Caching lets language models reuse document KV vectors regardless of what comes before them. EPIC's LegoLink algorithm fixes the resulting attention sink with O(kN) work instead of O(N²).
- 8× lower Time-To-First-Token vs CacheBlend-15 under multi-request workloads
- 7× higher throughput than CacheBlend-15 on Llama 3.1 8B at matched context cache ratios
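
The O(kN) versus O(N²) contrast in the summary can be made concrete with a toy cost count. The sketch below is not EPIC's implementation: it assumes that the recomputed tokens are the first `k` tokens of every chunk after the first (the summary only says "a handful of tokens per chunk"), and it counts causal query-key pairs as a stand-in for attention work. The function names `recompute_positions`, `attention_pairs_selective`, and `attention_pairs_full` are hypothetical.

```python
from typing import List


def recompute_positions(chunk_lengths: List[int], k: int) -> List[int]:
    """Global indices of the tokens to recompute.

    Assumption (not stated in the summary): the first k tokens of each
    chunk except the first are the ones recomputed.
    """
    positions: List[int] = []
    offset = 0
    for i, length in enumerate(chunk_lengths):
        if i > 0:
            # A chunk shorter than k contributes all of its tokens.
            positions.extend(range(offset, offset + min(k, length)))
        offset += length
    return positions


def attention_pairs_selective(chunk_lengths: List[int], k: int) -> int:
    """Query-key pairs when only the selected tokens are recomputed.

    Each recomputed token at global index p attends causally to the
    p + 1 tokens at indices 0..p.
    """
    return sum(p + 1 for p in recompute_positions(chunk_lengths, k))


def attention_pairs_full(chunk_lengths: List[int]) -> int:
    """Query-key pairs when all N tokens are recomputed causally."""
    n = sum(chunk_lengths)
    return n * (n + 1) // 2


if __name__ == "__main__":
    chunks = [4, 4, 4]  # three cached chunks, N = 12 tokens in total
    print(recompute_positions(chunks, k=2))        # [4, 5, 8, 9]
    print(attention_pairs_selective(chunks, k=2))  # 30
    print(attention_pairs_full(chunks))            # 78
```

Under these assumptions, each recomputed token costs at most N pairs, so for a fixed `k` and a fixed number of chunks the selective count grows linearly with N, while the full count grows quadratically.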
