Discussion:
[gem5-dev] Review Request: O3 LSQ: Implement TSO
(too old to reply)
Nilay Vaish
2011-11-15 15:00:32 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------

Review request for Default.


Summary
-------

O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.


Diffs
-----

src/cpu/o3/lsq_unit.hh e66a566f2cfa
src/cpu/o3/lsq_unit_impl.hh e66a566f2cfa

Diff: http://reviews.m5sim.org/r/908/diff


Testing
-------


Thanks,

Nilay
Brad Beckmann
2011-11-15 23:05:23 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1656
-----------------------------------------------------------


Does this really implement a post-retirement store buffer, or is this just a pre-retirement store queue? I don't really know anything about this code, but based on my brief scan the store queue looks like it tracks pre-retirement stores. If so, that is very different than a post-retirement store buffer. Furthermore, you need to be careful whether the head of the pre-retirement store queue is speculative or non-speculative.

- Brad
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2011-11-15 07:00:31)
Review request for Default.
Summary
-------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/cpu/o3/lsq_unit.hh e66a566f2cfa
src/cpu/o3/lsq_unit_impl.hh e66a566f2cfa
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Nilay Vaish
2011-11-15 23:25:58 UTC
Permalink
Post by Nilay Vaish
Post by Brad Beckmann
Does this really implement a post-retirement store buffer, or is this just a pre-retirement store queue? I don't really know anything about this code, but based on my brief scan the store queue looks like it tracks pre-retirement stores. If so, that is very different than a post-retirement store buffer. Furthermore, you need to be careful whether the head of the pre-retirement store queue is speculative or non-speculative.
Brad, the store queue maintains both speculative and committed stores.
My understanding is that once the flag canWB is set to true, the instruction
is now free to issue the store to the memory (that is the instruction is now
committed).

The loop in which I added the condition for checking whether there is a
store in flight, is the one used for issuing stores to the memory. Since the
loop condition checks whether canWB is true, therefore only non-speculative
stores would be issued to the memory system.


- Nilay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1656
-----------------------------------------------------------
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2011-11-15 07:00:31)
Review request for Default.
Summary
-------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/cpu/o3/lsq_unit.hh e66a566f2cfa
src/cpu/o3/lsq_unit_impl.hh e66a566f2cfa
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Beckmann, Brad
2011-11-15 23:47:12 UTC
Permalink
Great. It sounds like you understand the difference. So just to be sure, I assume that the load queue has no split between committed loads and speculative loads, correct? The load queue should only be tracking speculative memory operations.

Brad


From: Nilay Vaish [mailto:nilay-***@public.gmane.org]
Sent: Tuesday, November 15, 2011 3:26 PM
To: Nilay Vaish; Beckmann, Brad; Default
Subject: Re: Review Request: O3 LSQ: Implement TSO

This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/908/



On November 15th, 2011, 3:05 p.m., Brad Beckmann wrote:

Does this really implement a post-retirement store buffer, or is this just a pre-retirement store queue? I don't really know anything about this code, but based on my brief scan the store queue looks like it tracks pre-retirement stores. If so, that is very different than a post-retirement store buffer. Furthermore, you need to be careful whether the head of the pre-retirement store queue is speculative or non-speculative.

Brad, the store queue maintains both speculative and committed stores.

My understanding is that once the flag canWB is set to true, the instruction

is now free to issue the store to the memory (that is the instruction is now

committed).



The loop in which I added the condition for checking whether there is a

store in flight, is the one used for issuing stores to the memory. Since the

loop condition checks whether canWB is true, therefore only non-speculative

stores would be issued to the memory system.


- Nilay


On November 15th, 2011, 7 a.m., Nilay Vaish wrote:
Review request for Default.
By Nilay Vaish.

Updated 2011-11-15 07:00:31

Description

O3 LSQ: Implement TSO

This patch makes O3's LSQ maintain total order between stores. Essentially

only the store at the head of the store buffer is allowed to be in flight.

Only after that store completes, the next store is issued to the memory

system.


Diffs

* src/cpu/o3/lsq_unit.hh (e66a566f2cfa)
* src/cpu/o3/lsq_unit_impl.hh (e66a566f2cfa)

View Diff<http://reviews.m5sim.org/r/908/diff/>
Ali Saidi
2011-11-16 00:05:29 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1659
-----------------------------------------------------------



src/cpu/o3/lsq_unit_impl.hh
<http://reviews.m5sim.org/r/908/#comment2148>

How about we have a parameter to the CPU and have a check that it's set for x86.



src/cpu/o3/lsq_unit_impl.hh
<http://reviews.m5sim.org/r/908/#comment2149>

Similarly.




src/cpu/o3/lsq_unit_impl.hh
<http://reviews.m5sim.org/r/908/#comment2150>

Again.



- Ali
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2011-11-15 07:00:31)
Review request for Default.
Summary
-------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/cpu/o3/lsq_unit.hh e66a566f2cfa
src/cpu/o3/lsq_unit_impl.hh e66a566f2cfa
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Nilay Vaish
2011-11-16 00:23:25 UTC
Permalink
Post by Nilay Vaish
src/cpu/o3/lsq_unit_impl.hh, line 773
<http://reviews.m5sim.org/r/908/diff/1/?file=15488#file15488line773>
How about we have a parameter to the CPU and have a check that it's set for x86.
Or we can have it as a trait of an isa just like we have one flag for
memory access alignment.


- Nilay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1659
-----------------------------------------------------------
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2011-11-15 07:00:31)
Review request for Default.
Summary
-------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/cpu/o3/lsq_unit.hh e66a566f2cfa
src/cpu/o3/lsq_unit_impl.hh e66a566f2cfa
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Ali Saidi
2011-11-16 05:51:50 UTC
Permalink
Post by Nilay Vaish
src/cpu/o3/lsq_unit_impl.hh, line 773
<http://reviews.m5sim.org/r/908/diff/1/?file=15488#file15488line773>
How about we have a parameter to the CPU and have a check that it's set for x86.
Or we can have it as a trait of an isa just like we have one flag for
memory access alignment.
But you might want to see how performance changes if you enable it on an isa with a weaker memory model.


- Ali


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1659
-----------------------------------------------------------
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2011-11-15 07:00:31)
Review request for Default.
Summary
-------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/cpu/o3/lsq_unit.hh e66a566f2cfa
src/cpu/o3/lsq_unit_impl.hh e66a566f2cfa
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Gabe Black
2011-11-16 13:13:25 UTC
Permalink
Post by Nilay Vaish
src/cpu/o3/lsq_unit_impl.hh, line 773
<http://reviews.m5sim.org/r/908/diff/1/?file=15488#file15488line773>
How about we have a parameter to the CPU and have a check that it's set for x86.
Or we can have it as a trait of an isa just like we have one flag for
memory access alignment.
But you might want to see how performance changes if you enable it on an isa with a weaker memory model.
Having #ifdefs is usually bad, and making this depend on x86 is probably not right. It's really whether we have or don't have a certain restriction on how memory accesses are handled, and x86 just happens to be where it's needed, right? Moving this out of the preprocessor would be great. Sometimes it's really annoying to do that, but in the long run it's a good idea.


- Gabe


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1659
-----------------------------------------------------------
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2011-11-15 07:00:31)
Review request for Default.
Summary
-------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/cpu/o3/lsq_unit.hh e66a566f2cfa
src/cpu/o3/lsq_unit_impl.hh e66a566f2cfa
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Nilay Vaish
2011-11-28 16:49:21 UTC
Permalink
Hi Gabe,

Is the work on x86 O3 FS simulation complete? I am thinking of testing it
with Ruby.

Thanks
Nilay
Gabe Black
2011-11-28 21:13:23 UTC
Permalink
More or less. It's working enough to run the regression, but Ali's
branch predictor patch exposes a bug in how O3 handles squashes in the
middle of particular macroops. It will probably work for you at least up
to the point where user mode stuff starts running, and for our boot
regression test is most of the way.

Gabe
Post by Nilay Vaish
Hi Gabe,
Is the work on x86 O3 FS simulation complete? I am thinking of testing
it with Ruby.
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Nilay Vaish
2011-11-30 04:37:46 UTC
Permalink
Post by Gabe Black
More or less. It's working enough to run the regression, but Ali's
branch predictor patch exposes a bug in how O3 handles squashes in the
middle of particular macroops. It will probably work for you at least up
to the point where user mode stuff starts running, and for our boot
regression test is most of the way.
Gabe
Post by Nilay Vaish
Hi Gabe,
Is the work on x86 O3 FS simulation complete? I am thinking of testing
it with Ruby.
Thanks
Nilay
Gabe, when I boot FS with O3 CPU and Ruby, I get the following output on
the terminal of the simulated system.

EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 232k freed
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15

The segfault message keeps appearing. Do you know why this might be
happening?

Thanks
Nilay
Gabriel Michael Black
2011-11-30 09:26:58 UTC
Permalink
That may be the same thing that's happening with Ali's branch
predictor patch. With Ruby execution changes enough to hit one of the
broken squashing cases. The Ruby integration is probably working.

Gabe
Post by Nilay Vaish
Post by Gabe Black
More or less. It's working enough to run the regression, but Ali's
branch predictor patch exposes a bug in how O3 handles squashes in the
middle of particular macroops. It will probably work for you at least up
to the point where user mode stuff starts running, and for our boot
regression test is most of the way.
Gabe
Post by Nilay Vaish
Hi Gabe,
Is the work on x86 O3 FS simulation complete? I am thinking of testing
it with Ruby.
Thanks
Nilay
Gabe, when I boot FS with O3 CPU and Ruby, I get the following
output on the terminal of the simulated system.
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 232k freed
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
The segfault message keeps appearing. Do you know why this might be
happening?
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Nilay Vaish
2011-12-03 18:02:30 UTC
Permalink
That may be the same thing that's happening with Ali's branch predictor
patch. With Ruby execution changes enough to hit one of the broken squashing
cases. The Ruby integration is probably working.
Gabe
Post by Nilay Vaish
Gabe, when I boot FS with O3 CPU and Ruby, I get the following output on
the terminal of the simulated system.
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 232k freed
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
The segfault message keeps appearing. Do you know why this might be
happening?
Gabe, how can I confirm this? Is there something that I can do to resolve
the problem with branch prediction?

Thanks
Nilay
Gabe Black
2011-12-05 10:24:42 UTC
Permalink
Post by Nilay Vaish
Post by Gabriel Michael Black
That may be the same thing that's happening with Ali's branch
predictor patch. With Ruby execution changes enough to hit one of the
broken squashing cases. The Ruby integration is probably working.
Gabe
Post by Nilay Vaish
Gabe, when I boot FS with O3 CPU and Ruby, I get the following
output on the terminal of the simulated system.
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 232k freed
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
The segfault message keeps appearing. Do you know why this might be
happening?
Gabe, how can I confirm this? Is there something that I can do to
resolve the problem with branch prediction?
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
I'm fairly confident that's what's going on. The stack address is user
space and the instruction pointer is in kernel space. The page fault is
from near the ip, and the error code is 15 which means, if I'm not
mistaken, a permission problem on fetch. You can't easily fix the
problem, but if you want to get started the first step would be to clean
up the squashing mechanisms in O3 like I brought up in that email a
while ago. The real problem is that squashing doesn't always preserve
enough state (the macroop instance specifically) in all situations, and
that the squashing stuff is too ad-hoc and all over the place to really
fix it correctly and know that it's correct. I'd thought I fixed it
before when I fixed one particular squash path, but obviously I didn't
get it all.

Gabe
Nilay Vaish
2011-12-22 04:30:43 UTC
Permalink
Post by Gabe Black
Post by Nilay Vaish
Post by Gabriel Michael Black
That may be the same thing that's happening with Ali's branch
predictor patch. With Ruby execution changes enough to hit one of the
broken squashing cases. The Ruby integration is probably working.
Gabe
Post by Nilay Vaish
Gabe, when I boot FS with O3 CPU and Ruby, I get the following
output on the terminal of the simulated system.
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 232k freed
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
The segfault message keeps appearing. Do you know why this might be
happening?
Gabe, how can I confirm this? Is there something that I can do to
resolve the problem with branch prediction?
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
I'm fairly confident that's what's going on. The stack address is user
space and the instruction pointer is in kernel space. The page fault is
from near the ip, and the error code is 15 which means, if I'm not
mistaken, a permission problem on fetch. You can't easily fix the
problem, but if you want to get started the first step would be to clean
up the squashing mechanisms in O3 like I brought up in that email a
while ago. The real problem is that squashing doesn't always preserve
enough state (the macroop instance specifically) in all situations, and
that the squashing stuff is too ad-hoc and all over the place to really
fix it correctly and know that it's correct. I'd thought I fixed it
before when I fixed one particular squash path, but obviously I didn't
get it all.
Gabe, the instruction pointed to by the rip is SYSRET_TO_64. The micro
code for this instrution does not contain any branch. How is it possible
for the O3 bug to show up in this case? In your example using iret, the
problem was that fetch had to re-fetch the macroop from a kernel page
after the mode has been changed. For SYSRET as well the mode will be
change. But since there is no branch prediction, how will this particular
bug manifest it self?

Thanks
Nilay
Gabe Black
2011-12-22 19:59:05 UTC
Permalink
Post by Nilay Vaish
Post by Gabe Black
Post by Nilay Vaish
Post by Gabriel Michael Black
That may be the same thing that's happening with Ali's branch
predictor patch. With Ruby execution changes enough to hit one of the
broken squashing cases. The Ruby integration is probably working.
Gabe
Post by Nilay Vaish
Gabe, when I boot FS with O3 CPU and Ruby, I get the following
output on the terminal of the simulated system.
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 232k freed
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
The segfault message keeps appearing. Do you know why this might be
happening?
Gabe, how can I confirm this? Is there something that I can do to
resolve the problem with branch prediction?
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
I'm fairly confident that's what's going on. The stack address is user
space and the instruction pointer is in kernel space. The page fault is
from near the ip, and the error code is 15 which means, if I'm not
mistaken, a permission problem on fetch. You can't easily fix the
problem, but if you want to get started the first step would be to clean
up the squashing mechanisms in O3 like I brought up in that email a
while ago. The real problem is that squashing doesn't always preserve
enough state (the macroop instance specifically) in all situations, and
that the squashing stuff is too ad-hoc and all over the place to really
fix it correctly and know that it's correct. I'd thought I fixed it
before when I fixed one particular squash path, but obviously I didn't
get it all.
Gabe, the instruction pointed to by the rip is SYSRET_TO_64. The micro
code for this instrution does not contain any branch. How is it
possible for the O3 bug to show up in this case? In your example using
iret, the problem was that fetch had to re-fetch the macroop from a
kernel page after the mode has been changed. For SYSRET as well the
mode will be change. But since there is no branch prediction, how will
this particular bug manifest it self?
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
I haven't gone to look at the microcode again, but how would a system
call return instruction work without branches?

Gabe
Nilay Vaish
2011-12-23 17:15:17 UTC
Permalink
Post by Gabe Black
Post by Nilay Vaish
Post by Gabe Black
I'm fairly confident that's what's going on. The stack address is user
space and the instruction pointer is in kernel space. The page fault is
from near the ip, and the error code is 15 which means, if I'm not
mistaken, a permission problem on fetch. You can't easily fix the
problem, but if you want to get started the first step would be to clean
up the squashing mechanisms in O3 like I brought up in that email a
while ago. The real problem is that squashing doesn't always preserve
enough state (the macroop instance specifically) in all situations, and
that the squashing stuff is too ad-hoc and all over the place to really
fix it correctly and know that it's correct. I'd thought I fixed it
before when I fixed one particular squash path, but obviously I didn't
get it all.
Gabe, the instruction pointed to by the rip is SYSRET_TO_64. The micro
code for this instrution does not contain any branch. How is it
possible for the O3 bug to show up in this case? In your example using
iret, the problem was that fetch had to re-fetch the macroop from a
kernel page after the mode has been changed. For SYSRET as well the
mode will be change. But since there is no branch prediction, how will
this particular bug manifest it self?
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
I haven't gone to look at the microcode again, but how would a system
call return instruction work without branches?
Gabe, neither the microcode in gem5, nor the pseudo code given in Intel's
instruction manual use branches. EFLAGS, CPL, CS, SS and RIP registers are
set. There are no checks as such on privilege level or any thing else.

--
Nilay
Nilay Vaish
2011-12-26 17:49:22 UTC
Permalink
I think I have figured out where the problem is. In revision 8500, Gabe
added a check that if a certain microop affects state so that fetch stage
also gets affected, then this microop should be marked isSquashAfter.
isSquashAfter, as per my reading of the code, results in all microops
after this microop getting squashed and refetched. If the current microop
changes the CPL, then fetching the instruction again should result in a
fault.

Take a look at the trace below.


5145127448500: system.cpu.commit: Trying to commit head instruction,
[sn:1122692274] [tid:0]
5145127448500: system.cpu.commit: Committing instruction with
[sn:1122692274] PC (0xffffffff80209ae8=>0xffffffff80209aea).(52=>53)
5145127421500: system.cpu + A0 T0 : @iret_label.52 : IRET_PROT : wrdl
%ctrl140, t8, t2 : IntAlu : D=0x000000000000abd3 FetchSeq=1122692274
CPSeq=992061075
5145127448500: system.cpu.rob: [tid:0]: Retiring head instruction,
instruction PC (0xffffffff80209ae8=>0xffffffff80209aea).(52=>53),
[sn:1122692274]
5145127448500: system.cpu: Removing committed instruction [tid:0] PC
(0xffffffff80209ae8=>0xffffffff80209aea).(52=>53) [sn:1122692274]
5145127448500: system.cpu.rob: Starting to squash within the ROB.
5145127448500: system.cpu.rob: [tid:0]: Squashing instructions until
[sn:1122692274].
5145127448500: system.cpu.rob: [tid:0]: Squashing instruction PC
(0xffffffff80209ae8=>0xffffffff80209aea).(53=>54), seq num 1122692275.
5145127448500: system.cpu.rob: Reached head of instruction list while
squashing.


All of a sudden the CPU starts squashing all the microops, even though
there was no branch misprediction, or memory dependence misprediction.
Reading the code in commit_impl.hh, it seems that the head microop (wrdl)
is marked isSquashAfter, and hence all the microops will get squashed.

This is what happens some time later.


5145127449500: system.cpu.fetch: [tid:0]: Issuing a pipelined I-cache
access, starting at PC (0xffffffff80209aea=>0xffffffff80209af2).(0=>1).
5145127449500: system.cpu.fetch: [tid:0] Fetching cache line
0xffffffff80209ac0 for addr 0xffffffff80209ae8
5145127449500: system.cpu.itb: Translating vaddr 0xffffffff80209ac0.
5145127449500: system.cpu.itb: In protected mode.
5145127449500: system.cpu.itb: Paging enabled.
5145127449500: system.cpu.itb: Matched vaddr 0xffffffff80209ac0 to entry
starting at 0xffffffff80200000 with size 0x200000.
5145127449500: system.cpu.itb: Entry found with paddr 0x200000, doing
protection checks.
5145127449500: system.cpu.itb: Trying to access kernel mode page from user
mode.
5145127449500: system.cpu: CPU already running.
5145127449500: system.cpu.fetch: [tid:0] Got back req with addr
0xffffffff80209ac0 but expected 0xffffffff80209ac0
5145127449500: system.cpu.fetch: [tid:0]: Translation faulted, building
noop.
5145127449500: global: DynInst: [sn:1122692293] Instruction created.
Instcount for system.cpu = 156
5145127449500: system.cpu.fetch: [tid:0]: Instruction PC
0xffffffff80209aea (0) created [sn:1122692293].
5145127449500: system.cpu.fetch: [tid:0]: Instruction is: NOP
5145127449500: system.cpu.fetch: Activity this cycle.
5145127449500: system.cpu.fetch: [tid:0]: Blocked, need to handle the
trap.
5145127449500: system.cpu.fetch: [tid:0]: fault (Page-Fault) detected @ PC
(0xffffffff80209aea=>0xffffffff80209af2).(0=>1).


I think we need to do two things --

* Do not squash all the microops. Only the ones that do not belong to this
instruction should be squashed.

* Do not fetch while executing a serializing instruction. This is stated
in Intel's manual as well.

--
Nilay
Nilay Vaish
2011-12-27 15:00:01 UTC
Permalink
I think I have figured out where the problem is. In revision 8500, Gabe added
a check that if a certain microop affects state so that fetch stage also gets
affected, then this microop should be marked isSquashAfter. isSquashAfter, as
per my reading of the code, results in all microops after this microop
getting squashed and refetched. If the current microop changes the CPL, then
fetching the instruction again should result in a fault.
Take a look at the trace below.
5145127448500: system.cpu.commit: Trying to commit head instruction,
[sn:1122692274] [tid:0]
5145127448500: system.cpu.commit: Committing instruction with [sn:1122692274]
PC (0xffffffff80209ae8=>0xffffffff80209aea).(52=>53)
%ctrl140, t8, t2 : IntAlu : D=0x000000000000abd3 FetchSeq=1122692274
CPSeq=992061075
5145127448500: system.cpu.rob: [tid:0]: Retiring head instruction,
instruction PC (0xffffffff80209ae8=>0xffffffff80209aea).(52=>53),
[sn:1122692274]
5145127448500: system.cpu: Removing committed instruction [tid:0] PC
(0xffffffff80209ae8=>0xffffffff80209aea).(52=>53) [sn:1122692274]
5145127448500: system.cpu.rob: Starting to squash within the ROB.
5145127448500: system.cpu.rob: [tid:0]: Squashing instructions until
[sn:1122692274].
5145127448500: system.cpu.rob: [tid:0]: Squashing instruction PC
(0xffffffff80209ae8=>0xffffffff80209aea).(53=>54), seq num 1122692275.
5145127448500: system.cpu.rob: Reached head of instruction list while
squashing.
All of a sudden the CPU starts squashing all the microops, even though there
was no branch misprediction, or memory dependence misprediction. Reading the
code in commit_impl.hh, it seems that the head microop (wrdl) is marked
isSquashAfter, and hence all the microops will get squashed.
This is what happens some time later.
5145127449500: system.cpu.fetch: [tid:0]: Issuing a pipelined I-cache access,
starting at PC (0xffffffff80209aea=>0xffffffff80209af2).(0=>1).
5145127449500: system.cpu.fetch: [tid:0] Fetching cache line
0xffffffff80209ac0 for addr 0xffffffff80209ae8
5145127449500: system.cpu.itb: Translating vaddr 0xffffffff80209ac0.
5145127449500: system.cpu.itb: In protected mode.
5145127449500: system.cpu.itb: Paging enabled.
5145127449500: system.cpu.itb: Matched vaddr 0xffffffff80209ac0 to entry
starting at 0xffffffff80200000 with size 0x200000.
5145127449500: system.cpu.itb: Entry found with paddr 0x200000, doing
protection checks.
5145127449500: system.cpu.itb: Trying to access kernel mode page from user
mode.
5145127449500: system.cpu: CPU already running.
5145127449500: system.cpu.fetch: [tid:0] Got back req with addr
0xffffffff80209ac0 but expected 0xffffffff80209ac0
5145127449500: system.cpu.fetch: [tid:0]: Translation faulted, building noop.
5145127449500: global: DynInst: [sn:1122692293] Instruction created.
Instcount for system.cpu = 156
5145127449500: system.cpu.fetch: [tid:0]: Instruction PC 0xffffffff80209aea
(0) created [sn:1122692293].
5145127449500: system.cpu.fetch: [tid:0]: Instruction is: NOP
5145127449500: system.cpu.fetch: Activity this cycle.
5145127449500: system.cpu.fetch: [tid:0]: Blocked, need to handle the trap.
(0xffffffff80209aea=>0xffffffff80209af2).(0=>1).
I think we need to do two things --
* Do not squash all the microops. Only the ones that do not belong to this
instruction should be squashed.
* Do not fetch while executing a serializing instruction. This is stated in
Intel's manual as well.
--
Nilay
I think I spoke too soon. I do not see any of those faults generated due
to failed protection checks making it to the commit stage of the O3 CPU.
So, what ever I wrote in the previous email may not hold at all.

--
Nilay
Nilay Vaish
2011-12-29 18:49:27 UTC
Permalink
Ok, here is another shot at explaining what might be going on wrong. I
think this is what happens --

1. A sysret instruction is in execution. It commits some of its microops,
including the one that write the CS selector.

2. There is a pending interrupt, but it is not being handled since the
cpu->instList() is non-empty.

3. One of the microops is marked isSquashAfter. So after this microop
commmits, all the microops after it are squashed. This empties the
cpu->instList().

4. The interrupt is handled, and a trap is scheduled.

5. The trap squashes sysret's microops (which were added again by fetch),
changes the PC to Microcode_ROM:<something>.

6. It seems this microcode saves the CS that was written by the sysret,
but the instruction pointer saved is the pointer to the sysret
instruction since this instruction is yet to complete.

7. When the iret instruction tries to restore the context to the
previously executing context (which is the sysret instruction), it
restores the CS as the one that the sysret instruction wrote. This CS has
a user mode privilege level. But rip points to the sysret instruction,
which should be on a kernel mode page since we are returning from a system
call.

8. When the TLB lookup is performed for fetching the instruction pointed
to by the rip, the TLB generates a page fault, in which a user mode
process is trying to access a kernel mode page. This instruction reaches
the commit stage of the processor at which point (I think) the operating
system realizes that some thing wrong has happened and it outputs the
segfault message to the terminal.


I think the fix remains the same, isSquashAfter should not squash all the
microops, but only those that do not belong to this particular
instruction. Or else, fetch should simply not fetch any microop once it
sees a microop that is marked isSquashAfter.

Any opinions on this?

--
Nilay
Korey Sewell
2011-12-30 03:06:08 UTC
Permalink
Hi Nilay,
are you saying that the the sysret instruction is in the middle of some
microops which includes a CS instruction. Then, in the middle of a microop
sequence, a interrupt happens. When it comes back from the interrupt, it
then resumes the microcode but since the CS microop is marked user mode it
faults from looking up a kernel page.

A few questions:
1. If no interrupt would occur in the middle of the sysret, would the CS
microop cause a fault?
2. In the segfault message, can you match the address of the "rip" pointer
with whatever instructions set the rip pointer in the M5 trace?
3. I'm also unsure how squashing solves this (still trying to grasp the
whole problem). It seems like the CS instruction needs to be marked kernel
if it was microcode from a originally kernel instruction.

I'm admittedly a little behind in this convo, but would like to help if I
can. Please clarify any misunderstandings if you can. If I can grasp the
problem then I can better help/suggest how the squashing mechanisms can be
updated to support this.
Post by Nilay Vaish
Ok, here is another shot at explaining what might be going on wrong. I
think this is what happens --
1. A sysret instruction is in execution. It commits some of its microops,
including the one that write the CS selector.
2. There is a pending interrupt, but it is not being handled since the
cpu->instList() is non-empty.
3. One of the microops is marked isSquashAfter. So after this microop
commmits, all the microops after it are squashed. This empties the
cpu->instList().
4. The interrupt is handled, and a trap is scheduled.
5. The trap squashes sysret's microops (which were added again by fetch),
changes the PC to Microcode_ROM:<something>.
6. It seems this microcode saves the CS that was written by the sysret,
but the instruction pointer saved is the pointer to the sysret instruction
since this instruction is yet to complete.
7. When the iret instruction tries to restore the context to the
previously executing context (which is the sysret instruction), it restores
the CS as the one that the sysret instruction wrote. This CS has a user
mode privilege level. But rip points to the sysret instruction, which
should be on a kernel mode page since we are returning from a system call.
8. When the TLB lookup is performed for fetching the instruction pointed
to by the rip, the TLB generates a page fault, in which a user mode process
is trying to access a kernel mode page. This instruction reaches the commit
stage of the processor at which point (I think) the operating system
realizes that some thing wrong has happened and it outputs the segfault
message to the terminal.
I think the fix remains the same, isSquashAfter should not squash all the
microops, but only those that do not belong to this particular instruction.
Or else, fetch should simply not fetch any microop once it sees a microop
that is marked isSquashAfter.
Any opinions on this?
--
Nilay
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
--
- Korey
Nilay
2011-12-30 04:59:29 UTC
Permalink
Post by Korey Sewell
Hi Nilay,
are you saying that the the sysret instruction is in the middle of some
microops which includes a CS instruction. Then, in the middle of a microop
sequence, a interrupt happens. When it comes back from the interrupt, it
then resumes the microcode but since the CS microop is marked user mode it
faults from looking up a kernel page.
I think your understanding is pretty close to mine.
Post by Korey Sewell
1. If no interrupt would occur in the middle of the sysret, would the CS
microop cause a fault?
I don't expect so. It only touches upon some of the registers, does not
even perform any loads / stores.
Post by Korey Sewell
2. In the segfault message, can you match the address of the "rip" pointer
with whatever instructions set the rip pointer in the M5 trace?
As per the trace, the rip in the message is the address of the sysret
instruction.
Post by Korey Sewell
3. I'm also unsure how squashing solves this (still trying to grasp the
whole problem). It seems like the CS instruction needs to be marked kernel
if it was microcode from a originally kernel instruction.
My understanding is that problem is with the way we implement the flag
isSquashAfter. I think the idea behind this flag is that certain
instructions right some control registers, which can change the way
instructions are fetched. For example, the CS register is over written by
the sysret instruction. So next instruction (the one after sysret), should
be fetched from a user page, instead of it being from the kernel page that
contains sysret. But since fetch module starts fetching the next
instruction almost immediately, it would almost certainly fetch the next
instruction on the kernel page after the sysret instruction. Hence the
need for the flag isSquashAfter. If a microop with this flag is committed,
then all the microops following that microop are squashed.
This should work fine, but the problem is that this also empties the
instList maintained by the CPU which triggers the interrup handler
function in commit_impl.hh, which in turn schedules the trap event. If the
flag isSquashAfter only squashes microops that belong to the next
instruction and not this instruction, then, I think, the instList would
not be empty in the middle of the instruction and the interrupt would be
handled only at the end of the instruction.
Post by Korey Sewell
I'm admittedly a little behind in this convo, but would like to help if I
can. Please clarify any misunderstandings if you can. If I can grasp the
problem then I can better help/suggest how the squashing mechanisms can be
updated to support this.
Post by Nilay Vaish
Ok, here is another shot at explaining what might be going on wrong. I
think this is what happens --
1. A sysret instruction is in execution. It commits some of its microops,
including the one that write the CS selector.
2. There is a pending interrupt, but it is not being handled since the
cpu->instList() is non-empty.
3. One of the microops is marked isSquashAfter. So after this microop
commmits, all the microops after it are squashed. This empties the
cpu->instList().
4. The interrupt is handled, and a trap is scheduled.
5. The trap squashes sysret's microops (which were added again by fetch),
changes the PC to Microcode_ROM:<something>.
6. It seems this microcode saves the CS that was written by the sysret,
but the instruction pointer saved is the pointer to the sysret instruction
since this instruction is yet to complete.
7. When the iret instruction tries to restore the context to the
previously executing context (which is the sysret instruction), it restores
the CS as the one that the sysret instruction wrote. This CS has a user
mode privilege level. But rip points to the sysret instruction, which
should be on a kernel mode page since we are returning from a system call.
8. When the TLB lookup is performed for fetching the instruction pointed
to by the rip, the TLB generates a page fault, in which a user mode process
is trying to access a kernel mode page. This instruction reaches the commit
stage of the processor at which point (I think) the operating system
realizes that some thing wrong has happened and it outputs the segfault
message to the terminal.
I think the fix remains the same, isSquashAfter should not squash all the
microops, but only those that do not belong to this particular instruction.
Or else, fetch should simply not fetch any microop once it sees a microop
that is marked isSquashAfter.
Any opinions on this?
--
Nilay
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
--
- Korey
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
Nilay
Gabe Black
2011-12-30 07:30:04 UTC
Permalink
Post by Gabe Black
Post by Nilay Vaish
Post by Gabriel Michael Black
That may be the same thing that's happening with Ali's branch
predictor patch. With Ruby execution changes enough to hit one of the
broken squashing cases. The Ruby integration is probably working.
Gabe
Post by Nilay Vaish
Gabe, when I boot FS with O3 CPU and Ruby, I get the following
output on the terminal of the simulated system.
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 232k freed
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
The segfault message keeps appearing. Do you know why this might be
happening?
Gabe, how can I confirm this? Is there something that I can do to
resolve the problem with branch prediction?
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
I'm fairly confident that's what's going on. The stack address is user
space and the instruction pointer is in kernel space. The page fault is
from near the ip, and the error code is 15 which means, if I'm not
mistaken, a permission problem on fetch. You can't easily fix the
problem, but if you want to get started the first step would be to clean
up the squashing mechanisms in O3 like I brought up in that email a
while ago. The real problem is that squashing doesn't always preserve
enough state (the macroop instance specifically) in all situations, and
that the squashing stuff is too ad-hoc and all over the place to really
fix it correctly and know that it's correct. I'd thought I fixed it
before when I fixed one particular squash path, but obviously I didn't
get it all.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
What was unclear about this email and the ones before it? Did you not
believe me for some reason? You've spent about a month partially
rediscovering what I explained in them. I've already said how this needs
to be fixed.

Gabe
Korey Sewell
2011-12-30 07:50:10 UTC
Permalink
I agree with you Gabe that the squashing mechanism could be cleaned up.

But I'd also suggest that Nilay should understand/identify the problem
first and then implement a first-pass fix without a big squashing revamp
(if possible).

That way, when we (nilay, you, me, whoever) gets to revamping the squash
code, there is at least a set test case and trace we can use to debug with..
Post by Gabe Black
Post by Gabe Black
Post by Nilay Vaish
Post by Gabriel Michael Black
That may be the same thing that's happening with Ali's branch
predictor patch. With Ruby execution changes enough to hit one of the
broken squashing cases. The Ruby integration is probably working.
Gabe
Post by Nilay Vaish
Gabe, when I boot FS with O3 CPU and Ruby, I get the following
output on the terminal of the simulated system.
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 232k freed
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
The segfault message keeps appearing. Do you know why this might be
happening?
Gabe, how can I confirm this? Is there something that I can do to
resolve the problem with branch prediction?
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
I'm fairly confident that's what's going on. The stack address is user
space and the instruction pointer is in kernel space. The page fault is
from near the ip, and the error code is 15 which means, if I'm not
mistaken, a permission problem on fetch. You can't easily fix the
problem, but if you want to get started the first step would be to clean
up the squashing mechanisms in O3 like I brought up in that email a
while ago. The real problem is that squashing doesn't always preserve
enough state (the macroop instance specifically) in all situations, and
that the squashing stuff is too ad-hoc and all over the place to really
fix it correctly and know that it's correct. I'd thought I fixed it
before when I fixed one particular squash path, but obviously I didn't
get it all.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
What was unclear about this email and the ones before it? Did you not
believe me for some reason? You've spent about a month partially
rediscovering what I explained in them. I've already said how this needs
to be fixed.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey
Gabe Black
2011-12-30 08:48:34 UTC
Permalink
If you read my emails the problem would already be identified and
understood, because I did that weeks or even months ago and explained it
multiple times. A hack fix is not ok. A hack fix is why this is still
broken in the first place. That's also something I explained in my emails.

Gabe
Post by Korey Sewell
I agree with you Gabe that the squashing mechanism could be cleaned up.
But I'd also suggest that Nilay should understand/identify the problem
first and then implement a first-pass fix without a big squashing revamp
(if possible).
That way, when we (nilay, you, me, whoever) gets to revamping the squash
code, there is at least a set test case and trace we can use to debug with..
Post by Gabe Black
Post by Gabe Black
Post by Nilay Vaish
Post by Gabriel Michael Black
That may be the same thing that's happening with Ali's branch
predictor patch. With Ruby execution changes enough to hit one of the
broken squashing cases. The Ruby integration is probably working.
Gabe
Post by Nilay Vaish
Gabe, when I boot FS with O3 CPU and Ruby, I get the following
output on the terminal of the simulated system.
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 232k freed
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
The segfault message keeps appearing. Do you know why this might be
happening?
Gabe, how can I confirm this? Is there something that I can do to
resolve the problem with branch prediction?
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
I'm fairly confident that's what's going on. The stack address is user
space and the instruction pointer is in kernel space. The page fault is
from near the ip, and the error code is 15 which means, if I'm not
mistaken, a permission problem on fetch. You can't easily fix the
problem, but if you want to get started the first step would be to clean
up the squashing mechanisms in O3 like I brought up in that email a
while ago. The real problem is that squashing doesn't always preserve
enough state (the macroop instance specifically) in all situations, and
that the squashing stuff is too ad-hoc and all over the place to really
fix it correctly and know that it's correct. I'd thought I fixed it
before when I fixed one particular squash path, but obviously I didn't
get it all.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
What was unclear about this email and the ones before it? Did you not
believe me for some reason? You've spent about a month partially
rediscovering what I explained in them. I've already said how this needs
to be fixed.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Korey Sewell
2011-12-30 09:03:17 UTC
Permalink
I can't vouch for reading all the emails but I have gone through this whole
thread (which dates back to Nov. 29th).

Also, I'm not all the way familiar with x86 so maybe this excludes me from
understanding the problem at the detailed level, but I think I am starting
to get a good grasp of the general squashing problem here (basically
maintaining squash state through exception events).

My concern is that if you don't literally "fix" the problem first, you can
get caught up in the minutia of making this big grand sweeping change and
then have no good way to say if "the fix" fixes anything in the first place.

If Nilay or anyone could get something to the reviewboard that worked, hack
or not, then that would be a good step toward making the "clean" change
that I think you're referring to Gabe. We dont have to commit the code, but
on a 1st pass working is better then "not working", right? :)

(Gabe, I do understand it can be frustrating explaining the same things
over/over again.)
Post by Gabe Black
If you read my emails the problem would already be identified and
understood, because I did that weeks or even months ago and explained it
multiple times. A hack fix is not ok. A hack fix is why this is still
broken in the first place. That's also something I explained in my emails.
Gabe
Post by Korey Sewell
I agree with you Gabe that the squashing mechanism could be cleaned up.
But I'd also suggest that Nilay should understand/identify the problem
first and then implement a first-pass fix without a big squashing revamp
(if possible).
That way, when we (nilay, you, me, whoever) gets to revamping the squash
code, there is at least a set test case and trace we can use to debug
with..
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
Post by Nilay Vaish
Post by Gabriel Michael Black
That may be the same thing that's happening with Ali's branch
predictor patch. With Ruby execution changes enough to hit one of the
broken squashing cases. The Ruby integration is probably working.
Gabe
Post by Nilay Vaish
Gabe, when I boot FS with O3 CPU and Ruby, I get the following
output on the terminal of the simulated system.
EXT2-fs warning: mounting unchecked fs, running e2fsck is
recommended
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
Post by Nilay Vaish
Post by Gabriel Michael Black
Post by Nilay Vaish
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 232k freed
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
The segfault message keeps appearing. Do you know why this might be
happening?
Gabe, how can I confirm this? Is there something that I can do to
resolve the problem with branch prediction?
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
I'm fairly confident that's what's going on. The stack address is user
space and the instruction pointer is in kernel space. The page fault is
from near the ip, and the error code is 15 which means, if I'm not
mistaken, a permission problem on fetch. You can't easily fix the
problem, but if you want to get started the first step would be to
clean
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
up the squashing mechanisms in O3 like I brought up in that email a
while ago. The real problem is that squashing doesn't always preserve
enough state (the macroop instance specifically) in all situations, and
that the squashing stuff is too ad-hoc and all over the place to really
fix it correctly and know that it's correct. I'd thought I fixed it
before when I fixed one particular squash path, but obviously I didn't
get it all.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
What was unclear about this email and the ones before it? Did you not
believe me for some reason? You've spent about a month partially
rediscovering what I explained in them. I've already said how this needs
to be fixed.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey
Nilay Vaish
2011-12-30 14:54:51 UTC
Permalink
Gabe, from all the traces that I have been through, I never found your
explanation to be true. Here is an excerpt from your first email in that
thread --

----
It turns out the problem is that an instruction is started which
returns from kernel to user level and is microcoded. The instruction is
fetched from the kernel's address space successfully and starts to
execute, along the way dropping down to user mode. Some microops later,
there's some microop control flow which O3 mispredicts. When it squashes
the mispredict and tries to restart, it first tries to refetch the
instruction involved. Since it's now at user level and the instruction is
on a kernel level only page, there's a page fault and things go downhill
from there.
----

You claim that because of branch misprediction, the instruction needs to
be refetched. I never saw that happening. In all the cases your fix, in
which the fetch stage picks the current macroop from the just now squashed
instruction, worked as expected.

What I saw was that an isSquashAfter microop squashes everything and this
makes the O3 CPU start on handling interrupt. Returning from the
interrupt, it faults since the CS register had been overwritten by the
sysret instruction, but not the instruction pointer.

Again, I think we need to change the behavior of isSquashAfter.

--
Nilay
Post by Korey Sewell
I can't vouch for reading all the emails but I have gone through this whole
thread (which dates back to Nov. 29th).
Also, I'm not all the way familiar with x86 so maybe this excludes me from
understanding the problem at the detailed level, but I think I am starting
to get a good grasp of the general squashing problem here (basically
maintaining squash state through exception events).
My concern is that if you don't literally "fix" the problem first, you can
get caught up in the minutia of making this big grand sweeping change and
then have no good way to say if "the fix" fixes anything in the first place.
If Nilay or anyone could get something to the reviewboard that worked, hack
or not, then that would be a good step toward making the "clean" change
that I think you're referring to Gabe. We dont have to commit the code, but
on a 1st pass working is better then "not working", right? :)
(Gabe, I do understand it can be frustrating explaining the same things
over/over again.)
Post by Gabe Black
If you read my emails the problem would already be identified and
understood, because I did that weeks or even months ago and explained it
multiple times. A hack fix is not ok. A hack fix is why this is still
broken in the first place. That's also something I explained in my emails.
Gabe
Post by Korey Sewell
I agree with you Gabe that the squashing mechanism could be cleaned up.
But I'd also suggest that Nilay should understand/identify the problem
first and then implement a first-pass fix without a big squashing revamp
(if possible).
That way, when we (nilay, you, me, whoever) gets to revamping the squash
code, there is at least a set test case and trace we can use to debug
with..
Post by Korey Sewell
Post by Gabe Black
What was unclear about this email and the ones before it? Did you not
believe me for some reason? You've spent about a month partially
rediscovering what I explained in them. I've already said how this needs
to be fixed.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Nilay Vaish
2011-12-30 21:16:53 UTC
Permalink
The following patch takes care of the current problem. I am able to
replicate the O3 full system regression test with Ruby in place of the
classic memory system.

diff --git a/src/cpu/o3/commit.hh b/src/cpu/o3/commit.hh
--- a/src/cpu/o3/commit.hh
+++ b/src/cpu/o3/commit.hh
@@ -443,6 +443,9 @@
/** Rename map interface. */
RenameMap *renameMap[Impl::MaxThreads];

+ /* Variable for keeping track if an instruction is in middle of
execution */
+ bool instruction_in_flight;
+
/** Updates commit stats based on this instruction. */
void updateComInstStats(DynInstPtr &inst);

diff --git a/src/cpu/o3/commit_impl.hh b/src/cpu/o3/commit_impl.hh
--- a/src/cpu/o3/commit_impl.hh
+++ b/src/cpu/o3/commit_impl.hh
@@ -105,7 +105,8 @@
numThreads(params->numThreads),
drainPending(false),
switchedOut(false),
- trapLatency(params->trapLatency)
+ trapLatency(params->trapLatency),
+ instruction_in_flight(false)
{
_status = Active;
_nextStatus = Inactive;
@@ -717,7 +718,7 @@

// Wait until all in flight instructions are finished before
enterring
// the interrupt.
- if (cpu->instList.empty()) {
+ if (cpu->instList.empty() && (!instruction_in_flight)) {
// Squash or record that I need to squash this cycle if
// an interrupt needed to be handled.
DPRINTF(Commit, "Interrupt detected.\n");
@@ -990,6 +991,8 @@
cpu->instDone(tid);
}

+ instruction_in_flight = !head_inst->isLastMicroop();
+
// Updates misc. registers.
head_inst->updateMiscRegs();

--
Nilay
Post by Nilay Vaish
Gabe, from all the traces that I have been through, I never found your
explanation to be true. Here is an excerpt from your first email in that
thread --
----
It turns out the problem is that an instruction is started which returns from
kernel to user level and is microcoded. The instruction is fetched from the
kernel's address space successfully and starts to execute, along the way
dropping down to user mode. Some microops later, there's some microop control
flow which O3 mispredicts. When it squashes the mispredict and tries to
restart, it first tries to refetch the instruction involved. Since it's now
at user level and the instruction is on a kernel level only page, there's a
page fault and things go downhill from there.
----
You claim that because of branch misprediction, the instruction needs to be
refetched. I never saw that happening. In all the cases your fix, in which
the fetch stage picks the current macroop from the just now squashed
instruction, worked as expected.
What I saw was that an isSquashAfter microop squashes everything and this
makes the O3 CPU start on handling interrupt. Returning from the interrupt,
it faults since the CS register had been overwritten by the sysret
instruction, but not the instruction pointer.
Again, I think we need to change the behavior of isSquashAfter.
--
Nilay
Post by Korey Sewell
I can't vouch for reading all the emails but I have gone through this whole
thread (which dates back to Nov. 29th).
Also, I'm not all the way familiar with x86 so maybe this excludes me from
understanding the problem at the detailed level, but I think I am starting
to get a good grasp of the general squashing problem here (basically
maintaining squash state through exception events).
My concern is that if you don't literally "fix" the problem first, you can
get caught up in the minutia of making this big grand sweeping change and
then have no good way to say if "the fix" fixes anything in the first place.
If Nilay or anyone could get something to the reviewboard that worked, hack
or not, then that would be a good step toward making the "clean" change
that I think you're referring to Gabe. We dont have to commit the code, but
on a 1st pass working is better then "not working", right? :)
(Gabe, I do understand it can be frustrating explaining the same things
over/over again.)
Post by Gabe Black
If you read my emails the problem would already be identified and
understood, because I did that weeks or even months ago and explained it
multiple times. A hack fix is not ok. A hack fix is why this is still
broken in the first place. That's also something I explained in my emails.
Gabe
Post by Korey Sewell
I agree with you Gabe that the squashing mechanism could be cleaned up.
But I'd also suggest that Nilay should understand/identify the problem
first and then implement a first-pass fix without a big squashing revamp
(if possible).
That way, when we (nilay, you, me, whoever) gets to revamping the squash
code, there is at least a set test case and trace we can use to debug
with..
Post by Korey Sewell
Post by Gabe Black
What was unclear about this email and the ones before it? Did you not
believe me for some reason? You've spent about a month partially
rediscovering what I explained in them. I've already said how this needs
to be fixed.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Korey Sewell
2011-12-30 21:27:34 UTC
Permalink
So your fix is basically to not allow a interrupt to happen in between a
macroop since that leads to imprecise state and the current problem? If so,
I think that's reasonable and definitely it's good work on your part!
Persistence pays off.

Although, I think there is probably a more descriptive name than
"instruction in flight" and also someone may have a better way to
encapsulate this behavior.

I'd suggest posting this to the reviewboard for further comments on this
patch.
Post by Nilay Vaish
The following patch takes care of the current problem. I am able to
replicate the O3 full system regression test with Ruby in place of the
classic memory system.
diff --git a/src/cpu/o3/commit.hh b/src/cpu/o3/commit.hh
--- a/src/cpu/o3/commit.hh
+++ b/src/cpu/o3/commit.hh
@@ -443,6 +443,9 @@
/** Rename map interface. */
RenameMap *renameMap[Impl::MaxThreads];
+ /* Variable for keeping track if an instruction is in middle of
execution */
+ bool instruction_in_flight;
+
/** Updates commit stats based on this instruction. */
void updateComInstStats(DynInstPtr &inst);
diff --git a/src/cpu/o3/commit_impl.hh b/src/cpu/o3/commit_impl.hh
--- a/src/cpu/o3/commit_impl.hh
+++ b/src/cpu/o3/commit_impl.hh
@@ -105,7 +105,8 @@
numThreads(params->numThreads)**,
drainPending(false),
switchedOut(false),
- trapLatency(params->**trapLatency)
+ trapLatency(params->**trapLatency),
+ instruction_in_flight(false)
{
_status = Active;
_nextStatus = Inactive;
@@ -717,7 +718,7 @@
// Wait until all in flight instructions are finished before enterring
// the interrupt.
- if (cpu->instList.empty()) {
+ if (cpu->instList.empty() && (!instruction_in_flight)) {
// Squash or record that I need to squash this cycle if
// an interrupt needed to be handled.
DPRINTF(Commit, "Interrupt detected.\n");
@@ -990,6 +991,8 @@
cpu->instDone(tid);
}
+ instruction_in_flight = !head_inst->isLastMicroop();
+
// Updates misc. registers.
head_inst->updateMiscRegs();
--
Nilay
Gabe, from all the traces that I have been through, I never found your
Post by Nilay Vaish
explanation to be true. Here is an excerpt from your first email in that
thread --
----
It turns out the problem is that an instruction is started which returns
from kernel to user level and is microcoded. The instruction is fetched
from the kernel's address space successfully and starts to execute, along
the way dropping down to user mode. Some microops later, there's some
microop control flow which O3 mispredicts. When it squashes the mispredict
and tries to restart, it first tries to refetch the instruction involved.
Since it's now at user level and the instruction is on a kernel level only
page, there's a page fault and things go downhill from there.
----
You claim that because of branch misprediction, the instruction needs to
be refetched. I never saw that happening. In all the cases your fix, in
which the fetch stage picks the current macroop from the just now squashed
instruction, worked as expected.
What I saw was that an isSquashAfter microop squashes everything and this
makes the O3 CPU start on handling interrupt. Returning from the interrupt,
it faults since the CS register had been overwritten by the sysret
instruction, but not the instruction pointer.
Again, I think we need to change the behavior of isSquashAfter.
--
Nilay
I can't vouch for reading all the emails but I have gone through this
Post by Korey Sewell
whole
thread (which dates back to Nov. 29th).
Also, I'm not all the way familiar with x86 so maybe this excludes me from
understanding the problem at the detailed level, but I think I am starting
to get a good grasp of the general squashing problem here (basically
maintaining squash state through exception events).
My concern is that if you don't literally "fix" the problem first, you can
get caught up in the minutia of making this big grand sweeping change and
then have no good way to say if "the fix" fixes anything in the first place.
If Nilay or anyone could get something to the reviewboard that worked, hack
or not, then that would be a good step toward making the "clean" change
that I think you're referring to Gabe. We dont have to commit the code, but
on a 1st pass working is better then "not working", right? :)
(Gabe, I do understand it can be frustrating explaining the same things
over/over again.)
If you read my emails the problem would already be identified and
Post by Gabe Black
understood, because I did that weeks or even months ago and explained it
multiple times. A hack fix is not ok. A hack fix is why this is still
broken in the first place. That's also something I explained in my emails.
Gabe
Post by Korey Sewell
I agree with you Gabe that the squashing mechanism could be cleaned up.
But I'd also suggest that Nilay should understand/identify the problem
first and then implement a first-pass fix without a big squashing revamp
(if possible).
That way, when we (nilay, you, me, whoever) gets to revamping the squash
code, there is at least a set test case and trace we can use to debug
with..
Post by Korey Sewell
What was unclear about this email and the ones before it? Did you not
Post by Gabe Black
believe me for some reason? You've spent about a month partially
rediscovering what I explained in them. I've already said how this needs
to be fixed.
Gabe
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
--
- Korey
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
--
- Korey
Steve Reinhardt
2012-01-02 18:40:38 UTC
Permalink
Hi Nilay,

I think Gabe's point is that if you take the problem he is describing and
replace "misprediction" with "interrupt" then what you're seeing is just a
different manifestation of the same underlying problem. In a sense an
interrupt is just a control-flow misprediction, only much rarer and mostly
unavoidable... the pipeline predicts that control will always go to the
next sequential instruction for non-control-flow-instructions (and to
either the next instruction or the target for control-flow instructions)
and an interrupt is a situation where that prediction is wrong.

One difference is that in the branch misprediction case, the pipeline
immediately refetches along the correct path, while in the interrupt case,
the interrupt handler gets executed first, and the problem doesn't show up
until you return from the interrupt.

I think Gabe is also saying that if we did a larger restructuring of how
squashes are handled we could solve both problems at once (and perhaps
other related cases we haven't encountered yet). I agree that that sounds
good in theory, but in practice if there's a straightforward fix for the
current issue, I'm not opposed to putting that in.

Steve
Post by Nilay Vaish
Gabe, from all the traces that I have been through, I never found your
explanation to be true. Here is an excerpt from your first email in that
thread --
----
It turns out the problem is that an instruction is started which returns
from kernel to user level and is microcoded. The instruction is fetched
from the kernel's address space successfully and starts to execute, along
the way dropping down to user mode. Some microops later, there's some
microop control flow which O3 mispredicts. When it squashes the mispredict
and tries to restart, it first tries to refetch the instruction involved.
Since it's now at user level and the instruction is on a kernel level only
page, there's a page fault and things go downhill from there.
----
You claim that because of branch misprediction, the instruction needs to
be refetched. I never saw that happening. In all the cases your fix, in
which the fetch stage picks the current macroop from the just now squashed
instruction, worked as expected.
What I saw was that an isSquashAfter microop squashes everything and this
makes the O3 CPU start on handling interrupt. Returning from the interrupt,
it faults since the CS register had been overwritten by the sysret
instruction, but not the instruction pointer.
Again, I think we need to change the behavior of isSquashAfter.
--
Nilay
I can't vouch for reading all the emails but I have gone through this
Post by Korey Sewell
whole
thread (which dates back to Nov. 29th).
Also, I'm not all the way familiar with x86 so maybe this excludes me from
understanding the problem at the detailed level, but I think I am starting
to get a good grasp of the general squashing problem here (basically
maintaining squash state through exception events).
My concern is that if you don't literally "fix" the problem first, you can
get caught up in the minutia of making this big grand sweeping change and
then have no good way to say if "the fix" fixes anything in the first place.
If Nilay or anyone could get something to the reviewboard that worked, hack
or not, then that would be a good step toward making the "clean" change
that I think you're referring to Gabe. We dont have to commit the code, but
on a 1st pass working is better then "not working", right? :)
(Gabe, I do understand it can be frustrating explaining the same things
over/over again.)
If you read my emails the problem would already be identified and
Post by Gabe Black
understood, because I did that weeks or even months ago and explained it
multiple times. A hack fix is not ok. A hack fix is why this is still
broken in the first place. That's also something I explained in my emails.
Gabe
Post by Korey Sewell
I agree with you Gabe that the squashing mechanism could be cleaned up.
But I'd also suggest that Nilay should understand/identify the problem
first and then implement a first-pass fix without a big squashing revamp
(if possible).
That way, when we (nilay, you, me, whoever) gets to revamping the squash
code, there is at least a set test case and trace we can use to debug
with..
Post by Korey Sewell
What was unclear about this email and the ones before it? Did you not
Post by Gabe Black
believe me for some reason? You've spent about a month partially
rediscovering what I explained in them. I've already said how this needs
to be fixed.
Gabe
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
--
- Korey
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
Nilay Vaish
2012-01-02 19:20:55 UTC
Permalink
Post by Korey Sewell
Hi Nilay,
I think Gabe's point is that if you take the problem he is describing and
replace "misprediction" with "interrupt" then what you're seeing is just a
different manifestation of the same underlying problem. In a sense an
interrupt is just a control-flow misprediction, only much rarer and mostly
unavoidable... the pipeline predicts that control will always go to the
next sequential instruction for non-control-flow-instructions (and to
either the next instruction or the target for control-flow instructions)
and an interrupt is a situation where that prediction is wrong.
One difference is that in the branch misprediction case, the pipeline
immediately refetches along the correct path, while in the interrupt case,
the interrupt handler gets executed first, and the problem doesn't show up
until you return from the interrupt.
I think Gabe is also saying that if we did a larger restructuring of how
squashes are handled we could solve both problems at once (and perhaps
other related cases we haven't encountered yet). I agree that that sounds
good in theory, but in practice if there's a straightforward fix for the
current issue, I'm not opposed to putting that in.
Steve, from the way I have been able to fix the problem, it seems that the
issue is with the way interrupts are handled. As such I never saw
squashing being the problem, though it might be.

--
Nilay
Steve Reinhardt
2012-01-02 19:33:10 UTC
Permalink
Post by Korey Sewell
Hi Nilay,
Post by Steve Reinhardt
I think Gabe's point is that if you take the problem he is describing and
replace "misprediction" with "interrupt" then what you're seeing is just a
different manifestation of the same underlying problem. In a sense an
interrupt is just a control-flow misprediction, only much rarer and mostly
unavoidable... the pipeline predicts that control will always go to the
next sequential instruction for non-control-flow-instructions (and to
either the next instruction or the target for control-flow instructions)
and an interrupt is a situation where that prediction is wrong.
One difference is that in the branch misprediction case, the pipeline
immediately refetches along the correct path, while in the interrupt case,
the interrupt handler gets executed first, and the problem doesn't show up
until you return from the interrupt.
I think Gabe is also saying that if we did a larger restructuring of how
squashes are handled we could solve both problems at once (and perhaps
other related cases we haven't encountered yet). I agree that that sounds
good in theory, but in practice if there's a straightforward fix for the
current issue, I'm not opposed to putting that in.
Steve, from the way I have been able to fix the problem, it seems that the
issue is with the way interrupts are handled. As such I never saw squashing
being the problem, though it might be.
Step back a little... the problem you're having is in the way that
interrupts and squashing interact. You have fixed it by changing how
interrupts are handled, but it probably could also have been solved by
changing how squashing works (and I think that that was the direction you
were headed initially, talking about changes to the isSquashAfter flag).

Gabe is pointing out that there was an earlier, similar bug in the way that
branch mispredictions and squashing interacted, that probably also could
have been solved with changing how squashing works, in a way that probably
would have addressed your problem too.

I'm not saying anything about which solution is preferable, just trying to
get you and Gabe to see more eye-to-eye (and maybe get everyone else on the
same page too).

Steve
Nilay Vaish
2012-01-02 20:19:57 UTC
Permalink
Post by Steve Reinhardt
Post by Korey Sewell
Hi Nilay,
Post by Steve Reinhardt
I think Gabe's point is that if you take the problem he is describing and
replace "misprediction" with "interrupt" then what you're seeing is just a
different manifestation of the same underlying problem. In a sense an
interrupt is just a control-flow misprediction, only much rarer and mostly
unavoidable... the pipeline predicts that control will always go to the
next sequential instruction for non-control-flow-instructions (and to
either the next instruction or the target for control-flow instructions)
and an interrupt is a situation where that prediction is wrong.
One difference is that in the branch misprediction case, the pipeline
immediately refetches along the correct path, while in the interrupt case,
the interrupt handler gets executed first, and the problem doesn't show up
until you return from the interrupt.
I think Gabe is also saying that if we did a larger restructuring of how
squashes are handled we could solve both problems at once (and perhaps
other related cases we haven't encountered yet). I agree that that sounds
good in theory, but in practice if there's a straightforward fix for the
current issue, I'm not opposed to putting that in.
Steve, from the way I have been able to fix the problem, it seems that the
issue is with the way interrupts are handled. As such I never saw squashing
being the problem, though it might be.
Step back a little... the problem you're having is in the way that
interrupts and squashing interact. You have fixed it by changing how
interrupts are handled, but it probably could also have been solved by
changing how squashing works (and I think that that was the direction you
were headed initially, talking about changes to the isSquashAfter flag).
I am not too confident that squashing working correctly can imply
interrupts are handled correctly. It might just work out. But as Gabe said
in one of his emails today, the condition for interrupt handling (instList
being empty) is broken. And this is independent of squashing.
Post by Steve Reinhardt
Gabe is pointing out that there was an earlier, similar bug in the way that
branch mispredictions and squashing interacted, that probably also could
have been solved with changing how squashing works, in a way that probably
would have addressed your problem too.
I'm not saying anything about which solution is preferable, just trying to
get you and Gabe to see more eye-to-eye (and maybe get everyone else on the
same page too).
Steve, I don't think there is much to gain from this argument. We should
rather focus our energies on getting the O3 CPU to work correctly. In
fact, I need yours and everyone elses opinion on those other threads
(check pointing, interrupt handling) that are open. Let's just agree that
both squashing and interrupt handling mechanisms are broken and that they
need to be fixed.

--
Nilay
Steve Reinhardt
2012-01-02 21:06:02 UTC
Permalink
Post by Steve Reinhardt
Step back a little... the problem you're having is in the way that
Post by Steve Reinhardt
interrupts and squashing interact. You have fixed it by changing how
interrupts are handled, but it probably could also have been solved by
changing how squashing works (and I think that that was the direction you
were headed initially, talking about changes to the isSquashAfter flag).
I am not too confident that squashing working correctly can imply
interrupts are handled correctly. It might just work out. But as Gabe said
in one of his emails today, the condition for interrupt handling (instList
being empty) is broken. And this is independent of squashing.
I agree that it seems that the interrupt handling code is a little more to
blame, and your fix is probably the right thing to do (or at least it's
trying to do the right thing). But nevertheless, the problem it was
causing for you was in how it interacted with squashing.
Post by Steve Reinhardt
Gabe is pointing out that there was an earlier, similar bug in the way that
Post by Steve Reinhardt
branch mispredictions and squashing interacted, that probably also could
have been solved with changing how squashing works, in a way that probably
would have addressed your problem too.
I'm not saying anything about which solution is preferable, just trying to
get you and Gabe to see more eye-to-eye (and maybe get everyone else on the
same page too).
Steve, I don't think there is much to gain from this argument. We should
rather focus our energies on getting the O3 CPU to work correctly. In fact,
I need yours and everyone elses opinion on those other threads (check
pointing, interrupt handling) that are open. Let's just agree that both
squashing and interrupt handling mechanisms are broken and that they need
to be fixed.
This was not intended to be an argument! It was intended to try and
explain to you why Gabe saw the problem you were running into as being
related to one he had already dealt with. It was not intended to come to
any other conclusion than, as I said, to get you and Gabe to see more
eye-to-eye and get everyone on the same page, so that future discussions
can benefit from a common understanding and not be hindered by lingering
beliefs that the person on the other side of the discussion doesn't know
what he is talking about. I hope you don't feel that that goal is unworthy
or impossible to achieve.

Steve
Gabriel Michael Black
2012-01-03 17:06:54 UTC
Permalink
Post by Steve Reinhardt
Post by Steve Reinhardt
Step back a little... the problem you're having is in the way that
Post by Steve Reinhardt
interrupts and squashing interact. You have fixed it by changing how
interrupts are handled, but it probably could also have been solved by
changing how squashing works (and I think that that was the direction you
were headed initially, talking about changes to the isSquashAfter flag).
I am not too confident that squashing working correctly can imply
interrupts are handled correctly. It might just work out. But as Gabe said
in one of his emails today, the condition for interrupt handling (instList
being empty) is broken. And this is independent of squashing.
I agree that it seems that the interrupt handling code is a little more to
blame, and your fix is probably the right thing to do (or at least it's
trying to do the right thing). But nevertheless, the problem it was
causing for you was in how it interacted with squashing.
Post by Steve Reinhardt
Gabe is pointing out that there was an earlier, similar bug in the way that
Post by Steve Reinhardt
branch mispredictions and squashing interacted, that probably also could
have been solved with changing how squashing works, in a way that probably
would have addressed your problem too.
I'm not saying anything about which solution is preferable, just trying to
get you and Gabe to see more eye-to-eye (and maybe get everyone else on the
same page too).
Steve, I don't think there is much to gain from this argument. We should
rather focus our energies on getting the O3 CPU to work correctly. In fact,
I need yours and everyone elses opinion on those other threads (check
pointing, interrupt handling) that are open. Let's just agree that both
squashing and interrupt handling mechanisms are broken and that they need
to be fixed.
This was not intended to be an argument! It was intended to try and
explain to you why Gabe saw the problem you were running into as being
related to one he had already dealt with. It was not intended to come to
any other conclusion than, as I said, to get you and Gabe to see more
eye-to-eye and get everyone on the same page, so that future discussions
can benefit from a common understanding and not be hindered by lingering
beliefs that the person on the other side of the discussion doesn't know
what he is talking about. I hope you don't feel that that goal is unworthy
or impossible to achieve.
Steve
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
I see what Nilay was talking about. These were two really similar
problems which were still not the same. If only the interrupt related
problems get fixed for now, that would be ok. It wouldn't just be
hacking around the other problem like I thought at first.

Gabe
Gabe Black
2012-01-02 06:36:30 UTC
Permalink
Yes, many of the things I do for gem5 can be very frustrating.

A hack that works is not necessarily any better than code that doesn't
work because it will have to be maintained and worked with/around for a
long time. It can cause more damage than what it's supposed to be
fixing, and probably will.

Gabe
Post by Korey Sewell
I can't vouch for reading all the emails but I have gone through this whole
thread (which dates back to Nov. 29th).
Also, I'm not all the way familiar with x86 so maybe this excludes me from
understanding the problem at the detailed level, but I think I am starting
to get a good grasp of the general squashing problem here (basically
maintaining squash state through exception events).
My concern is that if you don't literally "fix" the problem first, you can
get caught up in the minutia of making this big grand sweeping change and
then have no good way to say if "the fix" fixes anything in the first place.
If Nilay or anyone could get something to the reviewboard that worked, hack
or not, then that would be a good step toward making the "clean" change
that I think you're referring to Gabe. We dont have to commit the code, but
on a 1st pass working is better then "not working", right? :)
(Gabe, I do understand it can be frustrating explaining the same things
over/over again.)
Post by Gabe Black
If you read my emails the problem would already be identified and
understood, because I did that weeks or even months ago and explained it
multiple times. A hack fix is not ok. A hack fix is why this is still
broken in the first place. That's also something I explained in my emails.
Gabe
Post by Korey Sewell
I agree with you Gabe that the squashing mechanism could be cleaned up.
But I'd also suggest that Nilay should understand/identify the problem
first and then implement a first-pass fix without a big squashing revamp
(if possible).
That way, when we (nilay, you, me, whoever) gets to revamping the squash
code, there is at least a set test case and trace we can use to debug
with..
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
Post by Nilay Vaish
Post by Gabriel Michael Black
That may be the same thing that's happening with Ali's branch
predictor patch. With Ruby execution changes enough to hit one of the
broken squashing cases. The Ruby integration is probably working.
Gabe
Post by Nilay Vaish
Gabe, when I boot FS with O3 CPU and Ruby, I get the following
output on the terminal of the simulated system.
EXT2-fs warning: mounting unchecked fs, running e2fsck is
recommended
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
Post by Nilay Vaish
Post by Gabriel Michael Black
Post by Nilay Vaish
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 232k freed
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
The segfault message keeps appearing. Do you know why this might be
happening?
Gabe, how can I confirm this? Is there something that I can do to
resolve the problem with branch prediction?
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
I'm fairly confident that's what's going on. The stack address is user
space and the instruction pointer is in kernel space. The page fault is
from near the ip, and the error code is 15 which means, if I'm not
mistaken, a permission problem on fetch. You can't easily fix the
problem, but if you want to get started the first step would be to
clean
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
up the squashing mechanisms in O3 like I brought up in that email a
while ago. The real problem is that squashing doesn't always preserve
enough state (the macroop instance specifically) in all situations, and
that the squashing stuff is too ad-hoc and all over the place to really
fix it correctly and know that it's correct. I'd thought I fixed it
before when I fixed one particular squash path, but obviously I didn't
get it all.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
What was unclear about this email and the ones before it? Did you not
believe me for some reason? You've spent about a month partially
rediscovering what I explained in them. I've already said how this needs
to be fixed.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Steve Reinhardt
2012-01-02 18:50:55 UTC
Permalink
Post by Gabe Black
A hack that works is not necessarily any better than code that doesn't
work because it will have to be maintained and worked with/around for a
long time.
I strongly disagree. There's a huge difference between "works" and
"doesn't work", because those map directly to "useful" and "not useful".
Obviously "maintainable" is also hugely important, but if we think that
"maintainable" is more important than "works", we should just give up now.
A piece of code that does nothing is supremely maintainable, and its only
flaw is that it's not useful, so that's the destination you're headed for
if you prize maintainability over utility.

Obviously we really want both, but let's keep our priorities straight.

Steve
Gabriel Michael Black
2012-01-03 17:17:47 UTC
Permalink
Post by Steve Reinhardt
Post by Gabe Black
A hack that works is not necessarily any better than code that doesn't
work because it will have to be maintained and worked with/around for a
long time.
I strongly disagree. There's a huge difference between "works" and
"doesn't work", because those map directly to "useful" and "not useful".
Obviously "maintainable" is also hugely important, but if we think that
"maintainable" is more important than "works", we should just give up now.
A piece of code that does nothing is supremely maintainable, and its only
flaw is that it's not useful, so that's the destination you're headed for
if you prize maintainability over utility.
Obviously we really want both, but let's keep our priorities straight.
Steve
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
I'm keeping my priorities where they are. It's much better to take the
extra time to do something the right way than to waste little bits (or
lots) of time again and again over the long hall. That's especially
true when the first pass becomes the only pass, and some poor soul
gets stuck with fixing it after everyone including the original author
has forgotten all about it.

And of course I'm not talking about extremes. Never writing the code
at all is not the same as taking a bit longer to write quality code
with a good design. Quality code is not the same as perfect code. But
I would much rather somebody describe what should be done in a comment
and put in a panic or warning than write something that tricks you
into thinking it's handled and has to be untangled later, possibly
even forcing a rewrite of dependent code. This is not a hypothetical
problem, although extreme examples of it are thankful not that common.
It's not fun to have to go three of four steps away from what you're
trying to do and reverse engineer some random code so that everything
doesn't break when you make your own change.

Gabe
Steve Reinhardt
2012-01-03 18:10:56 UTC
Permalink
Post by Gabriel Michael Black
Post by Gabe Black
A hack that works is not necessarily any better than code that doesn't
Post by Gabe Black
work because it will have to be maintained and worked with/around for a
long time.
I strongly disagree. There's a huge difference between "works" and
"doesn't work", because those map directly to "useful" and "not useful".
Obviously "maintainable" is also hugely important, but if we think that
"maintainable" is more important than "works", we should just give up now.
A piece of code that does nothing is supremely maintainable, and its only
flaw is that it's not useful, so that's the destination you're headed for
if you prize maintainability over utility.
Obviously we really want both, but let's keep our priorities straight.
Steve
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
I'm keeping my priorities where they are. It's much better to take the
extra time to do something the right way than to waste little bits (or
lots) of time again and again over the long hall. That's especially true
when the first pass becomes the only pass, and some poor soul gets stuck
with fixing it after everyone including the original author has forgotten
all about it.
And of course I'm not talking about extremes. Never writing the code at
all is not the same as taking a bit longer to write quality code with a
good design. Quality code is not the same as perfect code.
No one is arguing against taking "a bit longer to write quality code". We
consistently spend a lot of time tweaking patches on reviewboard to improve
things even in small ways before they go in.

The issue is what we do when doing something the "right way" is a huge
project involving a significant complex code rewrite that no one is
currently working on or even imminently planning to work on, versus an
admittedly inferior but simple change that does fix the immediate problem
at hand. Your original statement very clearly says that you would prefer
to leave the code broken indefinitely into the future than to settle for
the simple change. That's what I disagree with.

Steve
Korey Sewell
2012-01-02 18:55:38 UTC
Permalink
Philosophically, I agree with you Gabe. A clean "squashing" solution in O3
would be ideal. Defining "clean" for O3 is probably a subjective process in
itself. And yes, whenever I do run into problems with the CPU models it's
usually some weird squashing condition (branch mispredicts, syscalls in SE
mode, interrupts in FS, etc.)

With that said, I'd never straightforward say "commit a hack". But if the
code in the first place is confusing and someone does a hack to get a
problem to work, then that gives us some confidence that a cleaner solution
implementing the same thing will work and also provides a reference point
for that cleaner solution.

The fact that the gem5 has so many developers "guards" the repo in my
opinion, so the diligence we have with code review keeps gem5 maintainable
(Steve's "necessarily complex" quote probably applies here somewhere in
this email!).

So in this case, since Nilay posted the fix to the reviewboard, we (the
gem5 developers) can try to make a decision about the tradeoffs and figure
out whether it's worth it to replicate the change in a "clean" way or push
it to the repo as is.

Well, that's my view anyway...Now off my soapbox!

-Korey
Post by Gabe Black
Yes, many of the things I do for gem5 can be very frustrating.
A hack that works is not necessarily any better than code that doesn't
work because it will have to be maintained and worked with/around for a
long time. It can cause more damage than what it's supposed to be
fixing, and probably will.
Gabe
Post by Korey Sewell
I can't vouch for reading all the emails but I have gone through this
whole
Post by Korey Sewell
thread (which dates back to Nov. 29th).
Also, I'm not all the way familiar with x86 so maybe this excludes me
from
Post by Korey Sewell
understanding the problem at the detailed level, but I think I am
starting
Post by Korey Sewell
to get a good grasp of the general squashing problem here (basically
maintaining squash state through exception events).
My concern is that if you don't literally "fix" the problem first, you
can
Post by Korey Sewell
get caught up in the minutia of making this big grand sweeping change and
then have no good way to say if "the fix" fixes anything in the first
place.
Post by Korey Sewell
If Nilay or anyone could get something to the reviewboard that worked,
hack
Post by Korey Sewell
or not, then that would be a good step toward making the "clean" change
that I think you're referring to Gabe. We dont have to commit the code,
but
Post by Korey Sewell
on a 1st pass working is better then "not working", right? :)
(Gabe, I do understand it can be frustrating explaining the same things
over/over again.)
Post by Gabe Black
If you read my emails the problem would already be identified and
understood, because I did that weeks or even months ago and explained it
multiple times. A hack fix is not ok. A hack fix is why this is still
broken in the first place. That's also something I explained in my
emails.
Post by Korey Sewell
Post by Gabe Black
Gabe
Post by Korey Sewell
I agree with you Gabe that the squashing mechanism could be cleaned up.
But I'd also suggest that Nilay should understand/identify the problem
first and then implement a first-pass fix without a big squashing
revamp
Post by Korey Sewell
Post by Gabe Black
Post by Korey Sewell
(if possible).
That way, when we (nilay, you, me, whoever) gets to revamping the
squash
Post by Korey Sewell
Post by Gabe Black
Post by Korey Sewell
code, there is at least a set test case and trace we can use to debug
with..
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
Post by Nilay Vaish
Post by Gabriel Michael Black
That may be the same thing that's happening with Ali's branch
predictor patch. With Ruby execution changes enough to hit one of
the
Post by Korey Sewell
Post by Gabe Black
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
Post by Nilay Vaish
Post by Gabriel Michael Black
broken squashing cases. The Ruby integration is probably working.
Gabe
Post by Nilay Vaish
Gabe, when I boot FS with O3 CPU and Ruby, I get the following
output on the terminal of the simulated system.
EXT2-fs warning: mounting unchecked fs, running e2fsck is
recommended
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
Post by Nilay Vaish
Post by Gabriel Michael Black
Post by Nilay Vaish
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 232k freed
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
init[1]: segfault at ffffffff802095c0 rip ffffffff802095c8 rsp
00007fff38fa81b8 error 15
The segfault message keeps appearing. Do you know why this might
be
Post by Korey Sewell
Post by Gabe Black
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
Post by Nilay Vaish
Post by Gabriel Michael Black
Post by Nilay Vaish
happening?
Gabe, how can I confirm this? Is there something that I can do to
resolve the problem with branch prediction?
Thanks
Nilay
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
I'm fairly confident that's what's going on. The stack address is
user
Post by Korey Sewell
Post by Gabe Black
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
space and the instruction pointer is in kernel space. The page fault
is
Post by Korey Sewell
Post by Gabe Black
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
from near the ip, and the error code is 15 which means, if I'm not
mistaken, a permission problem on fetch. You can't easily fix the
problem, but if you want to get started the first step would be to
clean
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
up the squashing mechanisms in O3 like I brought up in that email a
while ago. The real problem is that squashing doesn't always preserve
enough state (the macroop instance specifically) in all situations,
and
Post by Korey Sewell
Post by Gabe Black
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
that the squashing stuff is too ad-hoc and all over the place to
really
Post by Korey Sewell
Post by Gabe Black
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
fix it correctly and know that it's correct. I'd thought I fixed it
before when I fixed one particular squash path, but obviously I
didn't
Post by Korey Sewell
Post by Gabe Black
Post by Korey Sewell
Post by Gabe Black
Post by Gabe Black
get it all.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
What was unclear about this email and the ones before it? Did you not
believe me for some reason? You've spent about a month partially
rediscovering what I explained in them. I've already said how this
needs
Post by Korey Sewell
Post by Gabe Black
Post by Korey Sewell
Post by Gabe Black
to be fixed.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey
Nilay Vaish
2011-11-18 23:35:05 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------

(Updated 2011-11-18 15:35:05.830794)


Review request for Default.


Summary
-------

O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.


Diffs (updated)
-----

src/arch/alpha/isa_traits.hh 330f8109b199
src/arch/arm/isa_traits.hh 330f8109b199
src/arch/mips/isa_traits.hh 330f8109b199
src/arch/power/isa_traits.hh 330f8109b199
src/arch/sparc/isa_traits.hh 330f8109b199
src/arch/x86/isa_traits.hh 330f8109b199
src/cpu/o3/lsq_unit.hh 330f8109b199
src/cpu/o3/lsq_unit_impl.hh 330f8109b199

Diff: http://reviews.m5sim.org/r/908/diff


Testing
-------


Thanks,

Nilay
Nilay Vaish
2011-11-18 23:37:35 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1667
-----------------------------------------------------------


I have added a trait hasTSO with each of the ISA. This is set to true only
for x86 and false for the rest.

- Nilay
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2011-11-18 15:35:05)
Review request for Default.
Summary
-------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/arch/alpha/isa_traits.hh 330f8109b199
src/arch/arm/isa_traits.hh 330f8109b199
src/arch/mips/isa_traits.hh 330f8109b199
src/arch/power/isa_traits.hh 330f8109b199
src/arch/sparc/isa_traits.hh 330f8109b199
src/arch/x86/isa_traits.hh 330f8109b199
src/cpu/o3/lsq_unit.hh 330f8109b199
src/cpu/o3/lsq_unit_impl.hh 330f8109b199
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Nilay Vaish
2012-01-07 16:11:56 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------

(Updated 2012-01-07 08:11:56.410920)


Review request for Default.


Summary
-------

O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.


Diffs (updated)
-----

src/arch/alpha/isa_traits.hh 93c6317af258
src/arch/arm/isa_traits.hh 93c6317af258
src/arch/mips/isa_traits.hh 93c6317af258
src/arch/power/isa_traits.hh 93c6317af258
src/arch/sparc/isa_traits.hh 93c6317af258
src/arch/x86/isa_traits.hh 93c6317af258
src/cpu/o3/lsq_unit.hh 93c6317af258
src/cpu/o3/lsq_unit_impl.hh 93c6317af258

Diff: http://reviews.m5sim.org/r/908/diff


Testing
-------


Thanks,

Nilay
Korey Sewell
2012-01-07 16:43:59 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1867
-----------------------------------------------------------



src/arch/alpha/isa_traits.hh
<http://reviews.m5sim.org/r/908/#comment2377>

Guys, I dont think this should be enabled/disabled through the ISA TRAITS.

Then, if you are doing any architectural exploration on the impact of memory models you are going to have to recompile the code or generate different binaries.

Instead, this should be a parameter to the CPU model. The CPU model can then choose to implement relaxed ordering, TSO, or whatever flavor someone wants.

I dont think there is anything that explicitly ties an ISA to a memory model in all cases, so we would be creating a dependency we would regret later (for instance, SPARC can have a relaxed model or TSO right?)

And IMO, it will be cleaner to just give the CPU a "memory_model" parameter that is a Enum, then the LSQ can by if statement choose what to do and then when someone reads the code it will be obvious what's taking place and why.

What do people think about that?


- Korey
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2012-01-07 08:11:56)
Review request for Default.
Summary
-------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/arch/alpha/isa_traits.hh 93c6317af258
src/arch/arm/isa_traits.hh 93c6317af258
src/arch/mips/isa_traits.hh 93c6317af258
src/arch/power/isa_traits.hh 93c6317af258
src/arch/sparc/isa_traits.hh 93c6317af258
src/arch/x86/isa_traits.hh 93c6317af258
src/cpu/o3/lsq_unit.hh 93c6317af258
src/cpu/o3/lsq_unit_impl.hh 93c6317af258
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Gabe Black
2012-01-07 22:14:44 UTC
Permalink
Post by Nilay Vaish
src/arch/alpha/isa_traits.hh, line 133
<http://reviews.m5sim.org/r/908/diff/3/?file=17419#file17419line133>
Guys, I dont think this should be enabled/disabled through the ISA TRAITS.
Then, if you are doing any architectural exploration on the impact of memory models you are going to have to recompile the code or generate different binaries.
Instead, this should be a parameter to the CPU model. The CPU model can then choose to implement relaxed ordering, TSO, or whatever flavor someone wants.
I dont think there is anything that explicitly ties an ISA to a memory model in all cases, so we would be creating a dependency we would regret later (for instance, SPARC can have a relaxed model or TSO right?)
And IMO, it will be cleaner to just give the CPU a "memory_model" parameter that is a Enum, then the LSQ can by if statement choose what to do and then when someone reads the code it will be obvious what's taking place and why.
What do people think about that?
I agree with your conclusion and your reasoning. By extension it could be implemented either be an enum or a collection of bools depending on whether the models can cleanly be broken down into a set of selectable behaviors like doing stores one at a time. Then we could avoid things like having if (a || b || f || g) do_foo(); where a, b, f, and g all have some well defined property requiring foo.


- Gabe


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1867
-----------------------------------------------------------
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2012-01-07 08:11:56)
Review request for Default.
Summary
-------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/arch/alpha/isa_traits.hh 93c6317af258
src/arch/arm/isa_traits.hh 93c6317af258
src/arch/mips/isa_traits.hh 93c6317af258
src/arch/power/isa_traits.hh 93c6317af258
src/arch/sparc/isa_traits.hh 93c6317af258
src/arch/x86/isa_traits.hh 93c6317af258
src/cpu/o3/lsq_unit.hh 93c6317af258
src/cpu/o3/lsq_unit_impl.hh 93c6317af258
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Steve Reinhardt
2012-01-09 20:21:30 UTC
Permalink
Post by Gabe Black
src/arch/alpha/isa_traits.hh, line 133
<http://reviews.m5sim.org/r/908/diff/3/?file=17419#file17419line133>
Guys, I dont think this should be enabled/disabled through the ISA TRAITS.
Then, if you are doing any architectural exploration on the impact of memory models you are going to have to recompile the code or generate different binaries.
Instead, this should be a parameter to the CPU model. The CPU model can then choose to implement relaxed ordering, TSO, or whatever flavor someone wants.
I dont think there is anything that explicitly ties an ISA to a memory model in all cases, so we would be creating a dependency we would regret later (for instance, SPARC can have a relaxed model or TSO right?)
And IMO, it will be cleaner to just give the CPU a "memory_model" parameter that is a Enum, then the LSQ can by if statement choose what to do and then when someone reads the code it will be obvious what's taking place and why.
What do people think about that?
I agree with your conclusion and your reasoning. By extension it could be implemented either be an enum or a collection of bools depending on whether the models can cleanly be broken down into a set of selectable behaviors like doing stores one at a time. Then we could avoid things like having if (a || b || f || g) do_foo(); where a, b, f, and g all have some well defined property requiring foo.
I agree that someone might want to enable TSO on ARM, or enable SC on x86, or might even come up with a weakly consistent variant of x86. But anyone that's serious about this will probably have a set of specific models they want to compare with, and will be considering the impact of the cache coherence protocol as well, so it's not a trivial thing like increasing the cache size. And how many people are really going to do this?

I think what's going on here is clear enough, such that if someone really wanted to do this study they could add a CPU model parameter to override TheISA::HasTSO if they wanted to (or if they weren't that ambitious at coding, they could just change the constant and deal with having two binaries).

So sure, ideally it would be great if the CPU model supported all possible consistency models in a cleanly factored way such that each independent behavioral aspect has its own flag, with the default for each ISA being the most aggressive model that that ISA supports, but with parameters exposed to override it with a stronger model or even a weaker model, but making sure to print out adequate warnings in the latter case. If someone actually does this I hope they contribute it back. But I think it's pretty serious overkill to make Nilay do this just to get x86 TSO support in so that x86 O3 MP works correctly.


- Steve


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1867
-----------------------------------------------------------
Post by Gabe Black
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2012-01-07 08:11:56)
Review request for Default.
Summary
-------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/arch/alpha/isa_traits.hh 93c6317af258
src/arch/arm/isa_traits.hh 93c6317af258
src/arch/mips/isa_traits.hh 93c6317af258
src/arch/power/isa_traits.hh 93c6317af258
src/arch/sparc/isa_traits.hh 93c6317af258
src/arch/x86/isa_traits.hh 93c6317af258
src/cpu/o3/lsq_unit.hh 93c6317af258
src/cpu/o3/lsq_unit_impl.hh 93c6317af258
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Korey Sewell
2012-01-09 20:51:19 UTC
Permalink
Post by Gabe Black
src/arch/alpha/isa_traits.hh, line 133
<http://reviews.m5sim.org/r/908/diff/3/?file=17419#file17419line133>
Guys, I dont think this should be enabled/disabled through the ISA TRAITS.
Then, if you are doing any architectural exploration on the impact of memory models you are going to have to recompile the code or generate different binaries.
Instead, this should be a parameter to the CPU model. The CPU model can then choose to implement relaxed ordering, TSO, or whatever flavor someone wants.
I dont think there is anything that explicitly ties an ISA to a memory model in all cases, so we would be creating a dependency we would regret later (for instance, SPARC can have a relaxed model or TSO right?)
And IMO, it will be cleaner to just give the CPU a "memory_model" parameter that is a Enum, then the LSQ can by if statement choose what to do and then when someone reads the code it will be obvious what's taking place and why.
What do people think about that?
I agree with your conclusion and your reasoning. By extension it could be implemented either be an enum or a collection of bools depending on whether the models can cleanly be broken down into a set of selectable behaviors like doing stores one at a time. Then we could avoid things like having if (a || b || f || g) do_foo(); where a, b, f, and g all have some well defined property requiring foo.
I agree that someone might want to enable TSO on ARM, or enable SC on x86, or might even come up with a weakly consistent variant of x86. But anyone that's serious about this will probably have a set of specific models they want to compare with, and will be considering the impact of the cache coherence protocol as well, so it's not a trivial thing like increasing the cache size. And how many people are really going to do this?
I think what's going on here is clear enough, such that if someone really wanted to do this study they could add a CPU model parameter to override TheISA::HasTSO if they wanted to (or if they weren't that ambitious at coding, they could just change the constant and deal with having two binaries).
So sure, ideally it would be great if the CPU model supported all possible consistency models in a cleanly factored way such that each independent behavioral aspect has its own flag, with the default for each ISA being the most aggressive model that that ISA supports, but with parameters exposed to override it with a stronger model or even a weaker model, but making sure to print out adequate warnings in the latter case. If someone actually does this I hope they contribute it back. But I think it's pretty serious overkill to make Nilay do this just to get x86 TSO support in so that x86 O3 MP works correctly.
Another "are we naively generalizing for the sake of generalizing?" question that we are so adept at debating :)

Although I think it's pretty easy to add the memory model option to the CPU, the point about the "dependencies" between a proposed memory model and the coherence protocol study is a good one. That probably makes the usefulness of a general parameter minimal if its just used in isolation. The "clean" way you (Steve) describe of parameter overrides and warnings would truly be overkill for something that no one would immediately find useful.

For the case where there is clearly only two options and a 3rd is far away (e.g. branch delay slots or not, TSO or not, etc.), then I could see how this would be viable. If you get to the point where you have more than two options you are setting/unsetting, then appropriate parameters should be in place.

Nilay, I won't stand in the way of your change, it seems reasonable that as long as gem5 is currently "relaxed" or "TSO" that it's not that big of a deal to do this the way you propose.


- Korey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1867
-----------------------------------------------------------
Post by Gabe Black
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2012-01-07 08:11:56)
Review request for Default.
Summary
-------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/arch/alpha/isa_traits.hh 93c6317af258
src/arch/arm/isa_traits.hh 93c6317af258
src/arch/mips/isa_traits.hh 93c6317af258
src/arch/power/isa_traits.hh 93c6317af258
src/arch/sparc/isa_traits.hh 93c6317af258
src/arch/x86/isa_traits.hh 93c6317af258
src/cpu/o3/lsq_unit.hh 93c6317af258
src/cpu/o3/lsq_unit_impl.hh 93c6317af258
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Brad Beckmann
2012-01-09 19:38:49 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1870
-----------------------------------------------------------

Ship it!


- Brad
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2012-01-07 08:11:56)
Review request for Default.
Summary
-------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/arch/alpha/isa_traits.hh 93c6317af258
src/arch/arm/isa_traits.hh 93c6317af258
src/arch/mips/isa_traits.hh 93c6317af258
src/arch/power/isa_traits.hh 93c6317af258
src/arch/sparc/isa_traits.hh 93c6317af258
src/arch/x86/isa_traits.hh 93c6317af258
src/cpu/o3/lsq_unit.hh 93c6317af258
src/cpu/o3/lsq_unit_impl.hh 93c6317af258
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Korey Sewell
2012-01-09 19:41:02 UTC
Permalink
Brad,
would you be against enabling TSO through a CPU parameter option rather then an ISA characteristic?


- Korey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1870
-----------------------------------------------------------
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2012-01-07 08:11:56)
Review request for Default.
Summary
-------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/arch/alpha/isa_traits.hh 93c6317af258
src/arch/arm/isa_traits.hh 93c6317af258
src/arch/mips/isa_traits.hh 93c6317af258
src/arch/power/isa_traits.hh 93c6317af258
src/arch/sparc/isa_traits.hh 93c6317af258
src/arch/x86/isa_traits.hh 93c6317af258
src/cpu/o3/lsq_unit.hh 93c6317af258
src/cpu/o3/lsq_unit_impl.hh 93c6317af258
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Beckmann, Brad
2012-01-09 20:01:23 UTC
Permalink
If there is a way to enable TSO as a combination of a CPU parameter and ISA characteristic, that would probably be best. However, it doesn’t make sense to me to have it be exclusively a CPU parameter, unless the ARM folks aren’t planning to take advantage of their weaker model when simulating with the O3 model.

Brad


From: Korey Sewell [mailto:***@umich.edu]
Sent: Monday, January 09, 2012 11:41 AM
To: Nilay Vaish; Beckmann, Brad; Korey Sewell; Default
Subject: Re: Review Request: O3 LSQ: Implement TSO

This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/908/




Brad,

would you be against enabling TSO through a CPU parameter option rather then an ISA characteristic?


- Korey


On January 7th, 2012, 8:11 a.m., Nilay Vaish wrote:
Review request for Default.
By Nilay Vaish.

Updated 2012-01-07 08:11:56

Description

O3 LSQ: Implement TSO

This patch makes O3's LSQ maintain total order between stores. Essentially

only the store at the head of the store buffer is allowed to be in flight.

Only after that store completes, the next store is issued to the memory

system.


Diffs

* src/arch/alpha/isa_traits.hh (93c6317af258)
* src/arch/arm/isa_traits.hh (93c6317af258)
* src/arch/mips/isa_traits.hh (93c6317af258)
* src/arch/power/isa_traits.hh (93c6317af258)
* src/arch/sparc/isa_traits.hh (93c6317af258)
* src/arch/x86/isa_traits.hh (93c6317af258)
* src/cpu/o3/lsq_unit.hh (93c6317af258)
* src/cpu/o3/lsq_unit_impl.hh (93c6317af258)

View Diff<http://reviews.m5sim.org/r/908/diff/>
Ali Saidi
2012-01-10 00:20:40 UTC
Permalink
Post by Korey Sewell
Brad,
would you be against enabling TSO through a CPU parameter option rather then an ISA characteristic?
I'm neutral on this, either way is fine, but I think the name HasTSO isn't ideal. An ISA is TSO, but I don't know that it has TSO.

Ali


- Ali


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1870
-----------------------------------------------------------
Post by Korey Sewell
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2012-01-07 08:11:56)
Review request for Default.
Summary
-------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/arch/alpha/isa_traits.hh 93c6317af258
src/arch/arm/isa_traits.hh 93c6317af258
src/arch/mips/isa_traits.hh 93c6317af258
src/arch/power/isa_traits.hh 93c6317af258
src/arch/sparc/isa_traits.hh 93c6317af258
src/arch/x86/isa_traits.hh 93c6317af258
src/cpu/o3/lsq_unit.hh 93c6317af258
src/cpu/o3/lsq_unit_impl.hh 93c6317af258
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Nilay Vaish
2012-01-10 00:30:23 UTC
Permalink
Ali, are you sure that 'an ISA is total store order' is grammatically
correct? I chose hasTSO on purpose, since I feel 'an ISA has total store
order' is correct.

--
Nilay
Post by Ali Saidi
I'm neutral on this, either way is fine, but I think the name HasTSO isn't ideal. An ISA is TSO, but I don't know that it has TSO.
Ali
Ali Saidi
2012-01-10 00:34:47 UTC
Permalink
That is fine. I can see arguments both ways, but go for it.

Ali
Post by Nilay Vaish
Ali, are you sure that 'an
ISA is total store order' is grammatically
Post by Nilay Vaish
correct? I chose hasTSO on
purpose, since I feel 'an ISA has total store
Post by Nilay Vaish
order' is correct.
--
Post by Nilay Vaish
Nilay
Post by Ali Saidi
I'm neutral
on this, either way is fine, but I think the name HasTSO isn't ideal. An
ISA is TSO, but I don't know that it has TSO. Ali
Steve Reinhardt
2012-01-10 01:01:46 UTC
Permalink
HasTSO is acceptable to me, but if we're looking for alternatives, we could
do NeedsTSO instead (since in a sense the consistency model is a
requirement the ISA puts on the memory system).

Steve
Post by Ali Saidi
That is fine. I can see arguments both ways, but go for it.
Ali
Post by Nilay Vaish
Ali, are you sure that 'an
ISA is total store order' is grammatically
Post by Nilay Vaish
correct? I chose hasTSO on
purpose, since I feel 'an ISA has total store
Post by Nilay Vaish
order' is correct.
--
Post by Nilay Vaish
Nilay
Post by Ali Saidi
I'm neutral
on this, either way is fine, but I think the name HasTSO isn't ideal. An
ISA is TSO, but I don't know that it has TSO. Ali
_______________________________________________
Post by Nilay Vaish
gem5-dev mailing
list
Post by Nilay Vaish
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Nilay Vaish
2012-01-10 01:09:23 UTC
Permalink
I declare Steve to be the winner!

--
Nilay
Post by Steve Reinhardt
HasTSO is acceptable to me, but if we're looking for alternatives, we could
do NeedsTSO instead (since in a sense the consistency model is a
requirement the ISA puts on the memory system).
Steve
Post by Ali Saidi
That is fine. I can see arguments both ways, but go for it.
Ali
Post by Nilay Vaish
Ali, are you sure that 'an
ISA is total store order' is grammatically
Post by Nilay Vaish
correct? I chose hasTSO on
purpose, since I feel 'an ISA has total store
Post by Nilay Vaish
order' is correct.
--
Post by Nilay Vaish
Nilay
Post by Ali Saidi
I'm neutral
on this, either way is fine, but I think the name HasTSO isn't ideal. An
ISA is TSO, but I don't know that it has TSO. Ali
_______________________________________________
Post by Nilay Vaish
gem5-dev mailing
list
Post by Nilay Vaish
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Gabriel Michael Black
2012-01-10 01:31:13 UTC
Permalink
I think HasTSO, NeedsTSO, IsTSO, TSOsBFF, whatever, is fine, since all
of them indicate TSO is being turned on.

The idea that when I want generalization it's for generalization's
sake implies that I have no justification other than it's just what
trips my trigger is nonsense and extremely dismissive. You can
disagree, you may be right, but my arguments should be taken just as
seriously as anyone else's.

Branch delay slots are actually a good example where turning things
into an on/off switch is an oversimplification that does more harm
than good. There are delay slots, annulled delay slots under different
circumstances, micropcs, delayed micropcs, speculative decode state,
etc. which were either not handled at all, handled incorrectly, or
handled poorly even with dedicated machinery in place. There are
10,000 shades of gray, and you can't just pick 3 and expect everybody
to be happy.

I think putting constants or bools into isa_traits.hh is generally
bad. Some of the time they don't make any sense in most of the ISAs
and have to be set to whatever somebody guesses is the most inert
setting. Sometimes we end up with multiple copies of the same thing
because nobody remembers what's already in there. You have to go
spelunking to figure out how a constant is used, and even then it
might not be consistent. There's no good way to know a constant isn't
needed any more and to get rid of it. There's no structure, just a big
wad of gunk doing 100 different things nobody remembers. There are a
few justified cases for things to go in isa_traits.hh, but I think
most of the time things are put there because it wasn't obvious where
they should go, even though there was probably a better place if
somebody looked hard enough.

The reason I don't like putting this in isa_traits.hh is because for
most of the ISAs it's a NOP and just causes clutter. I don't want to
beat Nilay over the head about it since it's not the end of the world,
but it's contributing to the clutter in that file.

If the a, b, c, d, thing I mentioned would be too much of a pain
because of dependencies (I can see that happening), weird checks that
need to happen, etc., then it makes sense not to do that. I would
still prefer this to be a parameter if nothing else for the far off
day when all the ISAs are compiled into one binary (hopefully without
templates that would quintuple the binary size), but I won't have a
heart attack if it isn't. Just remember when you go to put a new
constant in isa_traits.hh that it's probably the wrong thing to do and
you should reconsider your design.

Gabe
Post by Nilay Vaish
I declare Steve to be the winner!
--
Nilay
Post by Steve Reinhardt
HasTSO is acceptable to me, but if we're looking for alternatives, we could
do NeedsTSO instead (since in a sense the consistency model is a
requirement the ISA puts on the memory system).
Steve
Post by Ali Saidi
That is fine. I can see arguments both ways, but go for it.
Ali
Post by Nilay Vaish
Ali, are you sure that 'an
ISA is total store order' is grammatically
Post by Nilay Vaish
correct? I chose hasTSO on
purpose, since I feel 'an ISA has total store
Post by Nilay Vaish
order' is correct.
--
Post by Nilay Vaish
Nilay
Post by Ali Saidi
I'm neutral
on this, either way is fine, but I think the name HasTSO isn't ideal. An
ISA is TSO, but I don't know that it has TSO. Ali
_______________________________________________
Post by Nilay Vaish
gem5-dev mailing
list
Post by Nilay Vaish
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Korey Sewell
2012-01-10 01:49:48 UTC
Permalink
Hi Gabe,
that comment about generalization was not directed at you personally, I was
just trying to be funny that the conversation seems to always be a
balancing act between practicality and generality... bad joke if you took
that comment to be for you (I was the one suggesting not to put it in
isa_traits in the first place)!

As long as there is a 1:1 correlation between ISA and memory model in gem5,
then I've been converted to the side that it's clear what's going on with
the "NeedsTSO" variable and especially better than "#include
THE_ISA==X86"...
I think HasTSO, NeedsTSO, IsTSO, TSOsBFF, whatever, is fine, since all of
them indicate TSO is being turned on.
The idea that when I want generalization it's for generalization's sake
implies that I have no justification other than it's just what trips my
trigger is nonsense and extremely dismissive. You can disagree, you may be
right, but my arguments should be taken just as seriously as anyone else's.
Branch delay slots are actually a good example where turning things into
an on/off switch is an oversimplification that does more harm than good.
There are delay slots, annulled delay slots under different circumstances,
micropcs, delayed micropcs, speculative decode state, etc. which were
either not handled at all, handled incorrectly, or handled poorly even with
dedicated machinery in place. There are 10,000 shades of gray, and you
can't just pick 3 and expect everybody to be happy.
I think putting constants or bools into isa_traits.hh is generally bad.
Some of the time they don't make any sense in most of the ISAs and have to
be set to whatever somebody guesses is the most inert setting. Sometimes we
end up with multiple copies of the same thing because nobody remembers
what's already in there. You have to go spelunking to figure out how a
constant is used, and even then it might not be consistent. There's no good
way to know a constant isn't needed any more and to get rid of it. There's
no structure, just a big wad of gunk doing 100 different things nobody
remembers. There are a few justified cases for things to go in
isa_traits.hh, but I think most of the time things are put there because it
wasn't obvious where they should go, even though there was probably a
better place if somebody looked hard enough.
The reason I don't like putting this in isa_traits.hh is because for most
of the ISAs it's a NOP and just causes clutter. I don't want to beat Nilay
over the head about it since it's not the end of the world, but it's
contributing to the clutter in that file.
If the a, b, c, d, thing I mentioned would be too much of a pain because
of dependencies (I can see that happening), weird checks that need to
happen, etc., then it makes sense not to do that. I would still prefer this
to be a parameter if nothing else for the far off day when all the ISAs are
compiled into one binary (hopefully without templates that would quintuple
the binary size), but I won't have a heart attack if it isn't. Just
remember when you go to put a new constant in isa_traits.hh that it's
probably the wrong thing to do and you should reconsider your design.
Gabe
I declare Steve to be the winner!
Post by Steve Reinhardt
--
Nilay
HasTSO is acceptable to me, but if we're looking for alternatives, we
Post by Steve Reinhardt
could
do NeedsTSO instead (since in a sense the consistency model is a
requirement the ISA puts on the memory system).
Steve
Post by Ali Saidi
That is fine. I can see arguments both ways, but go for it.
Ali
Ali, are you sure that 'an
ISA is total store order' is grammatically
Post by Nilay Vaish
correct? I chose hasTSO on
purpose, since I feel 'an ISA has total store
Post by Nilay Vaish
order' is correct.
--
Nilay
I'm neutral
on this, either way is fine, but I think the name HasTSO isn't ideal.
An
ISA is TSO, but I don't know that it has TSO. Ali
Post by Nilay Vaish
______________________________**_________________
gem5-dev mailing
list
Post by Nilay Vaish
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
______________________________**_________________
gem5-dev mailing list
http://m5sim.org/mailman/**listinfo/gem5-dev<http://m5sim.org/mailman/listinfo/gem5-dev>
--
- Korey
Nilay Vaish
2012-01-11 00:22:46 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------

(Updated 2012-01-10 16:22:46.269309)


Review request for Default.


Summary
-------

O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.


Diffs (updated)
-----

src/cpu/o3/O3CPU.py d0ce111b34ef
src/cpu/o3/cpu.hh d0ce111b34ef
src/cpu/o3/cpu.cc d0ce111b34ef
src/cpu/o3/lsq_unit.hh d0ce111b34ef
src/cpu/o3/lsq_unit_impl.hh d0ce111b34ef

Diff: http://reviews.m5sim.org/r/908/diff


Testing
-------


Thanks,

Nilay
Nilay Vaish
2012-01-13 12:57:18 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------

(Updated 2012-01-13 04:57:18.871103)


Review request for Default.


Summary (updated)
-------

Changeset 8702:9a5651e7bd5b
---------------------------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.


Diffs (updated)
-----

src/cpu/o3/O3CPU.py f348cf78072c
src/cpu/o3/cpu.hh f348cf78072c
src/cpu/o3/cpu.cc f348cf78072c
src/cpu/o3/lsq_unit.hh f348cf78072c
src/cpu/o3/lsq_unit_impl.hh f348cf78072c

Diff: http://reviews.m5sim.org/r/908/diff


Testing
-------


Thanks,

Nilay
Steve Reinhardt
2012-01-14 03:06:40 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1945
-----------------------------------------------------------


Looks good... thanks, Nilay.


src/cpu/o3/O3CPU.py
<http://reviews.m5sim.org/r/908/#comment2449>

Seems like we should do something like:

needsTSO = Param.Bool(buildEnv['TARGET_ISA'] == 'x86', "Memory model")



- Steve
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2012-01-13 04:57:18)
Review request for Default.
Summary
-------
Changeset 8702:9a5651e7bd5b
---------------------------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/cpu/o3/O3CPU.py f348cf78072c
src/cpu/o3/cpu.hh f348cf78072c
src/cpu/o3/cpu.cc f348cf78072c
src/cpu/o3/lsq_unit.hh f348cf78072c
src/cpu/o3/lsq_unit_impl.hh f348cf78072c
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Korey Sewell
2012-01-14 03:31:38 UTC
Permalink
Post by Nilay Vaish
src/cpu/o3/O3CPU.py, line 149
<http://reviews.m5sim.org/r/908/diff/5/?file=21032#file21032line149>
needsTSO = Param.Bool(buildEnv['TARGET_ISA'] == 'x86', "Memory model")
Also, this description should be changed from memory model to "Enforce Total Store Ordering (TSO)".

And thanks Nilay for taking the time to get this done. Your work is appreciated!


- Korey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1945
-----------------------------------------------------------
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2012-01-13 04:57:18)
Review request for Default.
Summary
-------
Changeset 8702:9a5651e7bd5b
---------------------------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/cpu/o3/O3CPU.py f348cf78072c
src/cpu/o3/cpu.hh f348cf78072c
src/cpu/o3/cpu.cc f348cf78072c
src/cpu/o3/lsq_unit.hh f348cf78072c
src/cpu/o3/lsq_unit_impl.hh f348cf78072c
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Korey Sewell
2012-01-14 03:29:21 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1947
-----------------------------------------------------------



src/cpu/o3/lsq_unit_impl.hh
<http://reviews.m5sim.org/r/908/#comment2453>

Here,
we should use a member variable inside the LSQUnit instead of the pointer reference to the CPU's member variable.

That way we get:
if (needsTSO)
storeInFlight=false

vs.

if (cpu->needsTSO)
storeInFlight=false



src/cpu/o3/lsq_unit_impl.hh
<http://reviews.m5sim.org/r/908/#comment2454>

Here,
we should use a member variable inside the LSQUnit instead of the pointer reference to the CPU's member variable.

That way we get:
if (needsTSO)
storeInFlight=false

vs.

if (cpu->needsTSO)
storeInFlight=false

(same for line 773)



src/cpu/o3/lsq_unit_impl.hh
<http://reviews.m5sim.org/r/908/#comment2452>

same as above.


- Korey
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2012-01-13 04:57:18)
Review request for Default.
Summary
-------
Changeset 8702:9a5651e7bd5b
---------------------------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/cpu/o3/O3CPU.py f348cf78072c
src/cpu/o3/cpu.hh f348cf78072c
src/cpu/o3/cpu.cc f348cf78072c
src/cpu/o3/lsq_unit.hh f348cf78072c
src/cpu/o3/lsq_unit_impl.hh f348cf78072c
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Nilay Vaish
2012-01-14 05:22:10 UTC
Permalink
Post by Nilay Vaish
src/cpu/o3/lsq_unit_impl.hh, line 773
<http://reviews.m5sim.org/r/908/diff/5/?file=21036#file21036line773>
Here,
we should use a member variable inside the LSQUnit instead of the pointer reference to the CPU's member variable.
if (needsTSO)
storeInFlight=false
vs.
if (cpu->needsTSO)
storeInFlight=false
Why does this matter?


- Nilay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1947
-----------------------------------------------------------
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2012-01-13 04:57:18)
Review request for Default.
Summary
-------
Changeset 8702:9a5651e7bd5b
---------------------------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/cpu/o3/O3CPU.py f348cf78072c
src/cpu/o3/cpu.hh f348cf78072c
src/cpu/o3/cpu.cc f348cf78072c
src/cpu/o3/lsq_unit.hh f348cf78072c
src/cpu/o3/lsq_unit_impl.hh f348cf78072c
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Korey Sewell
2012-01-14 05:49:33 UTC
Permalink
Post by Nilay Vaish
src/cpu/o3/lsq_unit_impl.hh, line 773
<http://reviews.m5sim.org/r/908/diff/5/?file=21036#file21036line773>
Here,
we should use a member variable inside the LSQUnit instead of the pointer reference to the CPU's member variable.
if (needsTSO)
storeInFlight=false
vs.
if (cpu->needsTSO)
storeInFlight=false
Why does this matter?
In terms of overall runtime of the simulator, it probably wont matter considering all the other things with larger overheads (e.g. event scheduling, memory allocation, etc.)

But isolating just this code snippet, the "cpu->" part of that condition is going to cause unnecessary overhead of looking up that pointer to get to the value of "needsTSO".

It caught my eye because you are doing that 3 times per store:
- once to check if a store's in flight
- once to mark a store in flight
- once to unmark a store in flight

Ideally, things that are actually changing values are worth the pointer reference (e.g. the interruptPending flag) whereas things that are generally constant parameters (e.g. needTSO) aren't worth it.

Again, it's not going to be the biggest deal in terms of overall impact and I'm definitely nitpicking. Just something to consider as you go forward with these O3 patches...


- Korey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/908/#review1947
-----------------------------------------------------------
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.m5sim.org/r/908/
-----------------------------------------------------------
(Updated 2012-01-13 04:57:18)
Review request for Default.
Summary
-------
Changeset 8702:9a5651e7bd5b
---------------------------
O3 LSQ: Implement TSO
This patch makes O3's LSQ maintain total order between stores. Essentially
only the store at the head of the store buffer is allowed to be in flight.
Only after that store completes, the next store is issued to the memory
system.
Diffs
-----
src/cpu/o3/O3CPU.py f348cf78072c
src/cpu/o3/cpu.hh f348cf78072c
src/cpu/o3/cpu.cc f348cf78072c
src/cpu/o3/lsq_unit.hh f348cf78072c
src/cpu/o3/lsq_unit_impl.hh f348cf78072c
Diff: http://reviews.m5sim.org/r/908/diff
Testing
-------
Thanks,
Nilay
Continue reading on narkive:
Loading...