[gem5 Q&A] Squashing Instructions after Page Table Fault
Hello, I am currently trying to locate the code that squashes instructions when a page table fault is triggered in the O3 CPU. After using the PageTableWalker debug flag, my current guess is gem5/src/arch/x86/pagetable_walker.cc at line 199. I also inspected the files in the src/cpu/o3 directory, but couldn’t find anything specific to squashing instructions after a fault.
Is my assumption correct that the O3 CPU implementation does not handle this on its own, and that the architecture-specific part of the implementation does? If I am missing something, feel free to point it out.
A short addition: I also couldn’t find a specific check for the user/supervisor page table attribute anywhere. Are there places in the code where specific bits are checked, or does gem5 use some other kind of implementation here?
My answer
If I understand it correctly, an instruction that triggers a page fault is not squashed; it is simply not executed. The faulting instruction is marked ready to commit, and the fault it generated is then handled during the commit phase.
To explain this in more detail, let me take the example of how a page fault on a load is handled in gem5:
1, DefaultIEW
2, Later, after the translation is done, the page fault is recorded and the faulting instruction is marked by translation->finish(…) in pagetable_walker.cc (via Walker::recvTimingResp, assuming that there is a page walk). The ‘finish()’ function is defined in the O3 pipeline components. In this case: LSQ
3, Because the faulty instruction is not yet committed, DefaultIEW
4, As the instruction moves to the head of the ROB, the commitInsts() function of the commit unit calls commitHead(), which in turn calls cpu->trap() and then fault->invoke() to handle the fault. Different faults have different invoke functions. For your question, please take a look at PageFault::invoke() in arch/x86/faults.cc. The CPU then sets up the CR2 register etc. and reads the ROM to launch the procedure that transfers control to the OS fault handler. (The microrom is defined in romutil.py)
5, After the page fault handler has finished, the faulting instruction (still at the head of the ROB) is re-executed.
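The steps above can be sketched with a toy model. This is not gem5 code: ToyInst, ToyRob and all of their members are hypothetical names made up for illustration. It only shows the idea that a fault is recorded on the instruction when translation finishes and is acted on later, once that instruction reaches the head of the ROB.

```cpp
#include <cassert>
#include <deque>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model only: ToyInst and ToyRob are hypothetical, not gem5 classes.
struct ToyInst {
    int seqNum;
    bool readyToCommit;
    std::string fault;  // empty string means "no fault recorded"
};

struct ToyRob {
    std::deque<ToyInst> insts;  // front() is the ROB head

    void dispatch(int seqNum) { insts.push_back(ToyInst{seqNum, false, ""}); }

    ToyInst &find(int seqNum) {
        for (auto &inst : insts)
            if (inst.seqNum == seqNum) return inst;
        throw std::out_of_range("unknown sequence number");
    }

    // Normal completion: the instruction executed and may commit.
    void markExecuted(int seqNum) { find(seqNum).readyToCommit = true; }

    // Translation ended in a fault: record it and mark the instruction
    // ready to commit without performing the memory access.
    void finishTranslation(int seqNum, const std::string &fault) {
        assert(!fault.empty());  // an empty string would mean "no fault"
        ToyInst &inst = find(seqNum);
        inst.fault = fault;
        inst.readyToCommit = true;
    }

    // Commit in program order. A recorded fault is acted on only when the
    // faulting instruction reaches the head; younger instructions are then
    // dropped. What happens to the head afterwards is not modeled here.
    std::vector<std::string> commit() {
        std::vector<std::string> log;
        while (!insts.empty() && insts.front().readyToCommit) {
            const ToyInst head = insts.front();
            if (!head.fault.empty()) {
                log.push_back("trap sn:" + std::to_string(head.seqNum) +
                              " " + head.fault);
                insts.erase(insts.begin() + 1, insts.end());
                break;
            }
            log.push_back("commit sn:" + std::to_string(head.seqNum));
            insts.pop_front();
        }
        return log;
    }
};
```

In this sketch, an older instruction still commits normally even though a younger one has already recorded a fault, and a younger instruction that already executed is thrown away once the fault is taken.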
The above is based on gem5 21.0.0.0, but I don’t think the code relevant to this discussion differs much in other versions.
Hope this helps.
PS. Page access rights are checked in the translate function in tlb.cc.
Following up question
Hi Yuan,
thank you very much for your detailed response. My understanding of the fault handling in gem5 is getting better and better. Using debug flags, I can trace the control flow during the execution of my code. I am currently inspecting tlb.cc in further detail, but I am still searching for the exact check for my problem. To further specify my question:
During an attempt to access kernel memory, the “user/supervisor” (U/S) page table attribute is used to check whether the page belongs to kernel memory or not. If I try to access such memory from user mode, a page fault should be raised. I am looking for this specific check. My goal is to experiment with gem5 and to customize it. Currently, the instruction is not executed when a page fault is raised. As a first step, I want to change the check so that the instruction is executed even though it accesses kernel memory. So I am explicitly searching for this check in the call chain during page fault handling.
Answer 2
Assuming we’re talking about the x86 architecture, line 471 in tlb.cc is where the check in question happens:
https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/x86/tlb.cc#L471
Note that the raw bits of the PTE have been abstracted out in the gem5 TLB entry data structure, hence properties such as entry->user.
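For orientation, here is a simplified sketch of what such a protection check on an abstracted TLB entry can look like. The names ToyTlbEntry, AccessMode, badWrite and protectionFault are hypothetical stand-ins, loosely following the inUser / entry_user / badWrite values that appear in the debug output later in this thread. The exact gem5 code may differ, so please read the linked line for the real thing. The rule encoded here is the architectural x86 one: a user-mode access to a supervisor page faults, and a write to a read-only page faults if it comes from user mode or if CR0.WP is set.

```cpp
// Hypothetical, simplified stand-ins; these are not the gem5 definitions.
enum class AccessMode { Read, Write, Execute };

struct ToyTlbEntry {
    bool user;      // U/S attribute: true if user-mode accesses are allowed
    bool writable;  // R/W attribute: true if writes are allowed
};

// A write is "bad" if the page is read-only and the access comes from user
// mode, or from supervisor mode while CR0.WP is set.
constexpr bool badWrite(const ToyTlbEntry &entry, bool inUser, bool cr0Wp)
{
    return !entry.writable && (inUser || cr0Wp);
}

// Returns true if the access should raise a protection page fault.
constexpr bool protectionFault(const ToyTlbEntry &entry, AccessMode mode,
                               bool inUser, bool cr0Wp)
{
    return (inUser && !entry.user) ||
           (mode == AccessMode::Write && badWrite(entry, inUser, cr0Wp));
}
```

Note that a check of this kind can only run for a translation that already has a TLB entry. An access that misses in the TLB goes through the page table walker first.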
Following up 2
thank you for your help. I experimented with the checks and was a bit surprised that the page fault does not seem to be raised after an unsuccessful user/supervisor check. After enabling the necessary debug flags and adding more debug statements to the code, I observed that the page fault is raised not after entering the if-statement, but before it. Here is a short snippet of my output:
14442496349500: system.repeat_switch_cpus5.mmu.dtb: inUser = 1 | entry_user = 1 | badWrite = 0            (Line 470)
14442496349500: system.repeat_switch_cpus5.mmu.dtb: Checks done!                                                      (Line 485)
14442496350000: system.repeat_switch_cpus5.mmu.dtb: inUser = 1 | entry_user = 1 | badWrite = 0
14442496350000: system.repeat_switch_cpus5.mmu.dtb: Checks done!
14442496361000: Page-Fault: RIP 0x402da9: vector 14: #PF(0x4) at 0xffff880019688110
14442496387000: system.repeat_switch_cpus5.mmu.itb: inUser = 1 | entry_user = 0 | badWrite = 1
14442496387000: system.repeat_switch_cpus5.mmu.itb: ***************************** If [Line 471]. *****************************************
14442496424000: system.repeat_switch_cpus5.mmu.dtb: inUser = 0 | entry_user = 0 | badWrite = 1
14442496424000: system.repeat_switch_cpus5.mmu.dtb: Checks done!
14442496464000: system.repeat_switch_cpus5.mmu.dtb: inUser = 0 | entry_user = 0 | badWrite = 1
14442496464000: system.repeat_switch_cpus5.mmu.dtb: Checks done!
I expected the page fault to be raised at line 476, but that does not seem to be the case.
For further context, my goal is to get this code (https://github.com/IAIK/meltdown/blob/master/reliability.c) working in gem5. Currently, “libkdump_read” (https://github.com/IAIK/meltdown/blob/master/libkdump/libkdump.c#L528) only returns 0 in gem5.
My guess is that I need to change much more than I initially thought. With reference to Yuan’s answer, I guess that I also need to change things in the function chain that handles a fault. Can anyone confirm this?
Answer 3
The “Page-Fault” message is printed in the constructor of the fault, so setting a gdb breakpoint on that line and moving up the stack frames can help.
By the way, a page fault can also be generated during page walks (see here: https://github.com/gem5/gem5/blob/48a40cf2f5182a82de360b7efa497d82e06b1631/src/arch/x86/pagetable_walker.cc#L491C22-L491C22). The faulty PTE is not inserted into the TLB. The PageTableWalker debug flag tracks all of these events.
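One more hint for telling the two cases apart: the number in “#PF(0x4)” looks like the x86 page fault error code. Assuming gem5 prints the architectural encoding there, the small decoder below shows how to read it. PageFaultCode and decodePageFaultCode are hypothetical helper names, not gem5 functions; the bit positions are the ones defined by the x86 architecture.

```cpp
#include <cstdint>

// Hypothetical helper type; the bit layout is the architectural x86
// page fault error code.
struct PageFaultCode {
    bool present;   // bit 0: 1 = protection violation, 0 = page not present
    bool write;     // bit 1: 1 = access was a write, 0 = read
    bool user;      // bit 2: 1 = access came from user mode
    bool reserved;  // bit 3: 1 = reserved bit set in a paging structure
    bool fetch;     // bit 4: 1 = access was an instruction fetch
};

constexpr PageFaultCode decodePageFaultCode(std::uint32_t code)
{
    return PageFaultCode{
        (code & 0x01u) != 0,
        (code & 0x02u) != 0,
        (code & 0x04u) != 0,
        (code & 0x08u) != 0,
        (code & 0x10u) != 0,
    };
}
```

With this encoding, 0x4 is a user-mode read of a page that is not present. A user-mode read that fails the U/S check on a present page would be 0x5 instead, so a 0x4 fault points at the page walk rather than at the protection check in tlb.cc.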
Following up 3
I have used more debug flags, which increased the execution time by a lot, but I got some new information out of it:
Addresses : var = 39b765b0, start = 198325b0, phys = 198325b0 (output in meltdown "reliability.c" code, after line 39)
 
O3CPU: Ticking main, O3CPU.
15059411234500: system.repeat_switch_cpus1.mmu.dtb: Translating vaddr 0x7ffe39b765b0.
15059411234500: system.repeat_switch_cpus1.mmu.dtb: In protected mode.
15059411234500: system.repeat_switch_cpus1.mmu.dtb: Paging enabled.
15059411234500: system.repeat_switch_cpus1.mmu.dtb: pageAlignedVaddr for lookup: 0x7ffe39b76000
15059411234500: system.repeat_switch_cpus1.mmu.dtb: Handling a TLB miss for address 0x7ffe39b765b0 at pc 0x401b34.                    <--- First a TLB miss
15059411234500: system.repeat_switch_cpus1: Scheduling next tick!
[...]
O3CPU: Ticking main, O3CPU.
15059411262000: system.repeat_switch_cpus1: Scheduling next tick!
15059411262500: system.repeat_switch_cpus1.mmu.dtb.walker: Got long mode PTE entry 0x00000019832067.
15059411262500: system.repeat_switch_cpus1.mmu.dtb: Translating vaddr 0x7ffe39b765b0.
15059411262500: system.repeat_switch_cpus1.mmu.dtb: In protected mode.
15059411262500: system.repeat_switch_cpus1.mmu.dtb: Paging enabled.
15059411262500: system.repeat_switch_cpus1.mmu.dtb: pageAlignedVaddr for lookup: 0x7ffe39b76000
15059411262500: system.repeat_switch_cpus1.mmu.dtb: Entry found with paddr 0x19832000, doing protection checks.
15059411262500: system.repeat_switch_cpus1.mmu.dtb: inUser = 1 | entry_user = 1 | badWrite = 0
15059411262500: system.repeat_switch_cpus1.mmu.dtb: Translated 0x7ffe39b765b0 -> 0x198325b0.                                                 <--- Translated virt to phys
[...]
O3CPU: Ticking main, O3CPU.
15059514670500: system.repeat_switch_cpus1.mmu.dtb: Translating vaddr 0xffff8800198325b0.
15059514670500: system.repeat_switch_cpus1.mmu.dtb: In protected mode.
15059514670500: system.repeat_switch_cpus1.mmu.dtb: Paging enabled.
15059514670500: system.repeat_switch_cpus1.mmu.dtb: pageAlignedVaddr for lookup: 0xffff880019832000
15059514670500: system.repeat_switch_cpus1.mmu.dtb: Handling a TLB miss for address 0xffff8800198325b0 at pc 0x402e09.
15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e09=>0x402e10).(1=>2) [sn:251369]
15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e10=>0x402e13).(0=>1) [sn:251370]
15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e13=>0x402e15).(0=>1) [sn:251371]
15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e15=>0x402e17).(0=>1) [sn:251372]
15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e15=>0x402e17).(1=>2) [sn:251373]
15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e15=>0x402e17).(2=>3) [sn:251374]
15059514670500: system.repeat_switch_cpus1: Removing committed instruction [tid:0] PC (0x402e17=>0x402e1e).(0=>1) [sn:251375]
15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251369] PC (0x402e09=>0x402e10).(1=>2)
15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251370] PC (0x402e10=>0x402e13).(0=>1)
15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251371] PC (0x402e13=>0x402e15).(0=>1)
15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251372] PC (0x402e15=>0x402e17).(0=>1)
15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251373] PC (0x402e15=>0x402e17).(1=>2)
15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251374] PC (0x402e15=>0x402e17).(2=>3)
15059514670500: system.repeat_switch_cpus1: Removing instruction, [tid:0] [sn:251375] PC (0x402e17=>0x402e1e).(0=>1)
15059514670500: system.repeat_switch_cpus1: Scheduling next tick!
[...]
O3CPU: Ticking main, O3CPU.
15059514683000: system.repeat_switch_cpus1: Scheduling next tick!
15059514683500: system.repeat_switch_cpus1.mmu.dtb.walker: Got long mode PML4 entry 0x00000000000000.
15059514683500: system.repeat_switch_cpus1.mmu.dtb.walker: Raising page fault.
[...]
O3CPU: Ticking main, O3CPU.
15059514688500: Page-Fault: RIP 0x402e1e: vector 14: #PF(0x4) at 0xffff8800198325b0
15059514688500: system.repeat_switch_cpus1: Scheduling next tick!
This is a snippet of the debugging output.
For more context: https://github.com/IAIK/meltdown/blob/master/reliability.c (kaslr disabled in gem5 full-system simulation kernel command line)
- First, the address is translated from virt to phys without a problem (line 30)
- Next, the code wants to access the translated kernel address (line 49). This seems to be where the problem is. It gets a TLB miss for the address, but after that the PageTableWalker gets the PML4 entry 0x00000000000000 and raises a page fault.
- My expectation (and goal) is that during the read of the kernel address, the page table walk succeeds all the way down to the page table entry.
Now I have a few questions:
- After the TLB miss at tick 15059514670500, the CPU removes many committed instructions around the PC where the miss occurred. Why are these instructions committed although the page fault is being raised?
- Does anyone have an idea why the page fault already occurs at the PML4 entry level? And why is this entry only 0x0?
Answer 4
You observed that the check on line 471 in tlb.cc did not seem to be the one causing the fault in the case you were looking at. It occurs to me that the line 471 check is for a resident page. If the page is not resident, some other check would apply, and the fault might be raised when the OS examines the PTE to determine what to do with a disallowed access to a non-resident page.
Could that be the scenario you were looking at? That would indeed seem to be more involved, though at the point gem5 does the interrupt for a non-resident page (one not in the TLB) you might be able to more directly do a check of the PTE. To do that you would need to emulate walking the page tables (hoping that all the relevant page table pages are themselves resident).
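If you do go down the route of emulating the walk, the first step is splitting the virtual address into the per-level table indices. Below is a minimal sketch for x86-64 4-level paging with 4 KiB pages. WalkIndices, index9 and splitVaddr are hypothetical names, and the sketch ignores large pages and 5-level paging.

```cpp
#include <cstdint>

// Hypothetical helper type: table indices for 4-level paging, 4 KiB pages.
struct WalkIndices {
    unsigned pml4;    // bits 47:39
    unsigned pdpt;    // bits 38:30
    unsigned pd;      // bits 29:21
    unsigned pt;      // bits 20:12
    unsigned offset;  // bits 11:0, byte offset within the page
};

// Extracts one 9-bit table index starting at bit position `shift`.
constexpr unsigned index9(std::uint64_t vaddr, unsigned shift)
{
    return static_cast<unsigned>((vaddr >> shift) & 0x1ffu);
}

constexpr WalkIndices splitVaddr(std::uint64_t vaddr)
{
    return WalkIndices{
        index9(vaddr, 39),
        index9(vaddr, 30),
        index9(vaddr, 21),
        index9(vaddr, 12),
        static_cast<unsigned>(vaddr & 0xfffu),
    };
}
```

For the address from the trace, splitVaddr(0xffff8800198325b0) gives PML4 index 272, PDPT index 0, PD index 204, PT index 50 and page offset 0x5b0. So the walker message about a zero PML4 entry would refer to slot 272 of whichever top-level table CR3 points to at that moment.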
Yes, possibly a bit of a mess …
