Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

[gem5 Q&A] Page Walker: Where the PTE hits in the memory hierarchy

1 minute read

Published:

Hi, I am working on the x86 page walker in gem5. I understand that the page walker accesses the page walker cache (PWC) first and, in case of a miss, it accesses the memory hierarchy (L1, then L2, then L3 caches and lastly the memory). This happens through the packetpointer read, which reads the physical address of the entry at each level (PML4, PDP, etc.).

[gem5 tech mark] Why are stores in the SQ assumed to have valid addresses?

2 minute read

Published:

Hi, I’m doing some gem5 hacking for research and have been confused about when loads search the store queue (SQ) and when stores have valid addresses that can be compared against. gem5 includes an assert in the read() method in the LSQ unit that the addresses of all stores before the executing load are valid, but I don’t understand how this can be guaranteed in OoO execution.

[gem5 Q&A] Fixed I/O Address Range in x86

2 minute read

Published:

Hi all, I’m trying to model the SPEC HPC benchmark suite in gem5 with an x86 ISA using KVM. As a result, I am trying to link the “_addr” version of the m5ops against the binaries in order to model the region of interest. Unfortunately, I get the following error when trying to build the sample hello world example:

[gem5 Q&A] Microcode_ROM Instruction and fetchRomMicroop() Function

1 minute read

Published:

Hello, I am looking at the AtomicSimpleCPU code in src/cpu/simple for x86 ISA. I am trying to understand the following code snippet. Whenever this condition is true for a given PC, it does NOT follow the regular fetch from the instruction cache and then decode. This results in a macroop called Microcode_ROM, which is not an x86 macroop that has a sequence of uops (can be seen in the O3 CPU). Example: Instruction is: Microcode_ROM : ldst t0, HS:[t0 + t6 + 0x20] (This is taken from the O3 logs running the same workload by checking the same PC in the Debug logs).

[gem5 Q&A] Squashing Instructions after Page Table Fault

10 minute read

Published:

Hello, I am currently trying to locate the code that is used to squash instructions if a Page Table Fault is triggered in the O3 CPU. After using the PageTableWalker Debug Flags, my current guess would be gem5/src/arch/x86/pagetable_walker.cc at line 199. Furthermore, I inspected the files in the src/cpu/o3 directory, but couldn’t find anything specific to squashing instructions after a fault.

Is my assumption correct that the O3 CPU implementation does not handle these things on its own, but the architectural part of the implementation does? If I am missing something, feel free to point it out.

portfolio

publications

Fuzzy Flow Regulation for Network-on-Chip based Chip Multiprocessors Systems

Published in 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), 2014

Delays;Regulators;Fuzzy logic;Network-on-chip;Throughput;Pragmatics;Multiprocessor interconnection;Network-on-Chip;Chip Multiprocessor;Flow Regulation;Fuzzy Logic

Recommended citation: Y. Yao and Z. Lu, "Fuzzy Flow Regulation for Network-on-Chip based Chip Multiprocessors systems," 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), Singapore, 2014, pp. 343-348, doi: 10.1109/ASPDAC.2014.6742913.
Download Paper

Towards Stochastic Delay Bound Analysis for Network-on-Chip

Published in 2014 Eighth IEEE/ACM International Symposium on Networks-on-Chip (NoCS), 2014

Stochastic processes;Delays;Interference;Calculus;Analytical models;Servers;System-on-chip

Recommended citation: Z. Lu, Y. Yao and Y. Jiang, "Towards stochastic delay bound analysis for Network-on-Chip," 2014 Eighth IEEE/ACM International Symposium on Networks-on-Chip (NoCS), Ferrara, Italy, 2014, pp. 64-71, doi: 10.1109/NOCS.2014.7008763.
Download Paper

DVFS for NoCs in CMPs: A Thread Voting Approach

Published in 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2016

DVFS, Multi-core
Top conference publication-HPCA

Recommended citation: Y. Yao and Z. Lu, "DVFS for NoCs in CMPs: A thread voting approach," 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), Barcelona, Spain, 2016, pp. 309-320, doi: 10.1109/HPCA.2016.7446074.
Download Paper

Memory-Access Aware DVFS for Network-on-Chip in CMPs

Published in 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016

Switches;Resource management;Delays;Load modeling;Nickel;Tuning;Benchmark testing

Recommended citation: Y. Yao and Z. Lu, "Memory-access aware DVFS for network-on-chip in CMPs," 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, 2016, pp. 1433-1436.
Download Paper

Opportunistic Competition Overhead Reduction for Expediting Critical Section in NoC Based CMPs

Published in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), 2016

Critical Section; CMP; NoC; OS
Top conference publication-ISCA

Recommended citation: Y. Yao and Z. Lu, "Opportunistic Competition Overhead Reduction for Expediting Critical Section in NoC Based CMPs," 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Seoul, Korea (South), 2016, pp. 279-290, doi: 10.1109/ISCA.2016.33.
Download Paper

Aggregate Flow-Based Performance Fairness in CMPs

Published in ACM Transactions on Architecture and Code Optimization (TACO), Volume 13, Issue 4 Article No.: 53, Pages 1 - 27, 2016

computer architecture, performance fairness, quality of service

Recommended citation: Zhonghai Lu and Yuan Yao. 2016. Aggregate Flow-Based Performance Fairness in CMPs. ACM Trans. Archit. Code Optim. 13, 4, Article 53 (December 2016), 27 pages. https://doi.org/10.1145/3014429
Download Paper

Dynamic Traffic Regulation in NoC-Based Systems

Published in IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Volume: 25, Issue: 2, February 2017), 2017

Delays;System performance;IP networks;Nickel;Regulators;Calculus;Network-on-chip;Chip multi/many-core processor (CMP);fuzzy control;multi/many-processor systems-on-chip (MPSoC);network-on-chip (NoC);traffic engineering

Recommended citation: Z. Lu and Y. Yao, "Dynamic Traffic Regulation in NoC-Based Systems," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 2, pp. 556-569, Feb. 2017, doi: 10.1109/
Download Paper

Marginal Performance: Formalizing and Quantifying Power Over/Under Provisioning in NoC DVFS

Published in IEEE Transactions on Computers (Volume: 66, Issue: 11, 01 November 2017), 2017

Energy efficiency;Power demand;Measurement;Benchmark testing;Network-on-chip;Program processors;Performance evaluation;power efficiency;DVFS;network-on-chip (NoC);CMP
Top transaction publication-TC

Recommended citation: Z. Lu and Y. Yao, "Marginal Performance: Formalizing and Quantifying Power Over/Under Provisioning in NoC DVFS," in IEEE Transactions on Computers, vol. 66, no. 11, pp. 1903-1917, 1 Nov. 2017, doi: 10.1109/TC.2017.2715018.
Download Paper

iNPG: Accelerating Critical Section Access with In-network Packet Generation for NoC Based Many-Cores

Published in 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2018

Instruction sets;Spinning;Liquid crystal on silicon;Coherence;Acceleration;Routing protocols;In Network Packet Generation;Critical Section;Synchronisation Primitive;Cache Coherency;Network on Chip;CMP
Top conference publication-HPCA
Best-paper candidate

Recommended citation: Y. Yao and Z. Lu, "iNPG: Accelerating Critical Section Access with In-network Packet Generation for NoC Based Many-Cores," 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), Vienna, 2018, pp. 15-26, doi: 10.1109/HPCA.2018.00012.
Download Paper

Thread Voting DVFS for Manycore NoCs

Published in IEEE Transactions on Computers (Volume: 67, Issue: 10, 01 October 2018), 2018

Measurement;Message systems;System-on-chip;Instruction sets;Voltage control;Load modeling;Power system management;Chip manycore processor (CMP);DVFS;network on chip (NoC);power/energy efficiency
Top transaction publication-TC

Recommended citation: Z. Lu and Y. Yao, "Thread Voting DVFS for Manycore NoCs," in IEEE Transactions on Computers, vol. 67, no. 10, pp. 1506-1524, 1 Oct. 2018, doi: 10.1109/TC.2018.2827039.
Download Paper

Pursuing Extreme Power Efficiency With PPCC Guided NoC DVFS

Published in IEEE Transactions on Computers (Volume: 69, Issue: 3, 01 March 2020), 2020

Power demand;Message systems;Tuning;Thermal management;Monitoring;Energy consumption;Power system management;Manycore processor;DVFS;NoC;power efficiency;CMP
Top transaction publication-TC

Recommended citation: Y. Yao and Z. Lu, "Pursuing Extreme Power Efficiency With PPCC Guided NoC DVFS," in IEEE Transactions on Computers, vol. 69, no. 3, pp. 410-426, 1 March 2020, doi: 10.1109/TC.2019.2949807.
Download Paper

TSOPER: Efficient Coherence-Based Strict Persistency

Published in 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021

Protocols;Program processors;Nonvolatile memory;Computational modeling;Semantics;Coherence;Computer architecture;non-volatile memory;persistent memory;persistency;total store order;coherence
Top conference publication-HPCA
I am co-first author

Recommended citation: P. Ekemark, Y. Yao, A. Ros, K. Sagonas and S. Kaxiras, "TSOPER: Efficient Coherence-Based Strict Persistency," 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Seoul, Korea (South), 2021, pp. 125-138, doi: 10.1109/HPCA51647.2021.00021.
Download Paper

Game-of-Life Temperature-Aware DVFS Strategy for Tile-Based Chip Many-Core Processors

Published in IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS, Volume: 13, Issue: 1, March 2023), 2023

Dynamic voltage scaling, multiprocessor interconnection, automata.

Recommended citation: Y. Yao, "Game-of-Life Temperature-Aware DVFS Strategy for Tile-Based Chip Many-Core Processors," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 13, no. 1, pp. 58-72, March 2023, doi: 10.1109/JETCAS.2023.3244763
Download Paper

SE-CNN: Convolution Neural Network Acceleration via Symbolic Value Prediction

Published in IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS, Volume: 13, Issue: 1, March 2023), 2023

Artificial intelligence, artificial neural networks, AI accelerators.

Recommended citation: Y. Yao, "SE-CNN: Convolution Neural Network Acceleration via Symbolic Value Prediction," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 13, no. 1, pp. 73-85, March 2023, doi: 10.1109/JETCAS.2023.3244767.
Download Paper

Silent Stores in the Battery-less Internet of Things: A Good Idea?

Published in Proceedings of the 2023 International Conference on Embedded Wireless Systems and Networks (EWSN), 2023

sensor network;store buffer;silent store;low power devices

Recommended citation: Weining Song, Stefanos Kaxiras, Luca Mottola, Thiemo Voigt, and Yuan Yao, "Silent Stores in the Battery-less Internet of Things: A Good Idea?" in Proceedings of the 2023 International Conference on Embedded Wireless Systems and Networks (EWSN), Association for Computing Machinery, New York, NY, USA, 40-45.
Download Paper

TaDA: Task Decoupling Architecture for the Battery-less Internet of Things

Published in Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems (SenSys), 2024

Task decoupling, Internet of Things (IoT), energy harvesting, intermittent computing
SenSys is a top conference in wireless sensors

Recommended citation: Weining Song, Stefanos Kaxiras, Thiemo Voigt, Yuan Yao, and Luca Mottola. 2024. TaDA: Task Decoupling Architecture for the Battery-less Internet of Things. In Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems (SenSys). Association for Computing Machinery, New York, NY, USA, 409–421. https://doi.org/10.1145/3666025.3699347
Download Paper

TangramFP: Energy-Efficient, Bit-Parallel, Multiply-Accumulate for Deep Neural Networks

Published in 2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2024

Bit-Parallel, Energy-Efficient Multiply Accumulate, Deep Neural Networks
Best paper candidate

Recommended citation: Y. Yao, X. Chen, H. Atmer and S. Kaxiras, "TangramFP: Energy-Efficient, Bit-Parallel, Multiply-Accumulate for Deep Neural Networks," 2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), Hilo, HI, USA, 2024, pp. 1-12, doi: 10.1109/SBAC-PAD63648.2024.00009.
Download Paper

RXT: RefleXive address Translation for Pointer-Chasing Workloads

Published in 39th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2025

Early acceptance paper [To appear]

Recommended citation: R. Aligholipour, P. Aimoniotis, S. Kaxiras and Y. Yao, "RXT: RefleXive address Translation for Pointer-Chasing Workloads," 39th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Milan, Italy, 2025.
Download Paper

talks

teaching

tools

TangramFP

Published:

TangramFP: Energy-Efficient, Bit-Parallel Multiply-Accumulate for Deep Neural Networks