Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling
In arXiv preprint, 2025
SRAM and frequency tradeoffs for the compute-bound prefill and memory-bound decode phases of LLM inference.
In arXiv preprint, 2025
An architectural guide to gem5 CPU models and the Ruby memory system.