Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling

arXiv preprint, 2025

This work studies SRAM size and operating-frequency tradeoffs for LLM inference, distinguishing the compute-bound prefill phase from the memory-bandwidth-bound decode phase.
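The compute-bound versus memory-bandwidth-bound distinction can be illustrated with a simple roofline-style estimate. The sketch below is not taken from the paper: the function names, the layer dimensions, and the peak-throughput and bandwidth figures are hypothetical, and the model counts only weight traffic for a single dense layer (activation and KV-cache traffic are ignored).

```python
def arithmetic_intensity(tokens: int, d_in: int, d_out: int,
                         bytes_per_param: int = 2) -> float:
    """FLOPs per byte of weight traffic for one dense layer.

    Assumes the weights are read from memory once and reused across
    all `tokens` processed together (a simplification).
    """
    flops = 2 * tokens * d_in * d_out            # one multiply-add per weight per token
    weight_bytes = d_in * d_out * bytes_per_param
    return flops / weight_bytes


def bottleneck(intensity: float, peak_flops: float, mem_bw: float) -> str:
    """Classify a workload against the roofline ridge point."""
    ridge = peak_flops / mem_bw                  # FLOPs per byte where the two limits meet
    return "compute-bound" if intensity >= ridge else "memory-bandwidth-bound"


# Hypothetical accelerator: 100 TFLOP/s peak, 1 TB/s memory bandwidth.
PEAK_FLOPS = 100e12
MEM_BW = 1e12

# Prefill processes the whole prompt at once; decode processes one token per step.
prefill = arithmetic_intensity(tokens=2048, d_in=4096, d_out=4096)
decode = arithmetic_intensity(tokens=1, d_in=4096, d_out=4096)

print("prefill:", prefill, bottleneck(prefill, PEAK_FLOPS, MEM_BW))
print("decode: ", decode, bottleneck(decode, PEAK_FLOPS, MEM_BW))
```

Under these assumptions the intensity grows linearly with the number of tokens processed together, which is why a long prompt can saturate the compute units while single-token decode steps are limited by how fast the weights can be streamed from memory.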

Recommended citation: H. Atmer, Y. Yao, T. Voigt, and S. Kaxiras, "Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling," arXiv:2512.22066, 2025.