RXT: RefleXive address Translation for Pointer-Chasing Workloads
Published in the 39th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2025
With increasingly irregular memory access patterns in Indirect Memory Access (IMA) and Graph Processing (GP) applications, Virtual Address Translation (VAT) has become not only a major performance bottleneck but also a key contributor to high power consumption in out-of-order (OoO) multi-core processor pipelines.
We find that this problem can be attributed, to a large extent, to inefficient utilization of TLBs by pointer-chasing instructions (i.e., load-to-load), which constitute a significant portion of IMA/GP workloads. Conventionally, VATs for pointer-chasing are performed via a series of TLB look-ups: one for each load in the pointer chain. However, this approach is sub-optimal because it overlooks the correlations between pointers, missing opportunities to perform VATs through cheaper calculations instead of more expensive TLB look-ups.
This work draws on the insight that, in pointer-chasing workloads, the physical page holding an upstream pointer is often at a fixed distance from the physical page where a downstream pointer resides. Building on this insight, we introduce RefleXive address Translation (RXT), which encodes the physical page distance (termed PageDist) between an upstream and a downstream pointer into the unused upper 16 bits of the upstream pointer’s virtual address. Consequently, RXT can compute the translation of a pointer by directly adding the PageDist to its residing address (the physical address where the pointer is stored). This process can be applied recursively throughout the pointer-chasing sequence, transforming address translation from a sequence of TLB accesses into a sequence of computations.
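The translation-by-addition idea can be illustrated with a minimal software sketch. It is not the hardware design from the paper: it assumes 4 KiB pages, a 48-bit virtual address, and a PageDist stored as a signed 16-bit page count in bits 48 to 63. The function names (`encode_pointer`, `decode_page_dist`, `rxt_translate`, `chase`) and the dictionary standing in for physical memory are hypothetical and exist only for illustration.

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1
VA_BITS = 48                         # assumption: 48-bit virtual addresses
VA_MASK = (1 << VA_BITS) - 1
DIST_MASK = 0xFFFF                   # the 16 otherwise unused upper bits


def encode_pointer(va, page_dist):
    """Pack a signed page distance into the upper 16 bits of a pointer."""
    if not -(1 << 15) <= page_dist < (1 << 15):
        raise ValueError("PageDist does not fit in 16 bits")
    return ((page_dist & DIST_MASK) << VA_BITS) | (va & VA_MASK)


def decode_page_dist(tagged):
    """Recover the signed page distance (assumed two's-complement encoding)."""
    raw = (tagged >> VA_BITS) & DIST_MASK
    return raw - (1 << 16) if raw & 0x8000 else raw


def rxt_translate(tagged, residing_pa):
    """Translate a tagged pointer without a TLB look-up.

    Target physical page = physical page where the pointer is stored
    plus PageDist; the page offset comes from the pointer's low bits.
    """
    target_ppn = (residing_pa >> PAGE_SHIFT) + decode_page_dist(tagged)
    return (target_ppn << PAGE_SHIFT) | (tagged & PAGE_MASK)


def chase(memory, head_pa, steps):
    """Follow a pointer chain, translating each hop by addition only.

    `memory` maps a physical address to the tagged pointer stored there.
    """
    pa = head_pa
    visited = [pa]
    for _ in range(steps):
        pa = rxt_translate(memory[pa], pa)
        visited.append(pa)
    return visited
```

For example, a pointer stored at physical address `0x10008` whose target lies three physical pages further on, at page offset `0x040`, translates to `0x13040`; the result then serves as the residing address for the next hop in the chain. A real design would also need a fallback to the TLB when the distance is unknown or does not fit in 16 bits, which this sketch omits.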
In gem5 full-system simulation, RXT reduces TLB energy consumption for pointer-chasing applications by an average of 41.48% (up to 62.50%) and decreases overall core power density by 4.65% (up to 13.09%), without compromising application execution time. In fact, RXT even improves program runtime by an average of 0.71% (up to 2.09%).