Discussion:
[gem5-dev] Review Request 3149: ruby: Fix block_on behavior
(too old to reply)
Joel Hestness
2015-10-09 15:15:59 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3149/
-----------------------------------------------------------

Review request for Default.


Repository: gem5


Description
-------

Changeset 11157:65d87638830f
---------------------------
ruby: Fix block_on behavior

Ruby's controller block_on behavior aimed to block MessageBuffer requests into
SLICC controllers when a Locked_RMW was in flight. Unfortunately, this
functionality only partially works: When non-Locked_RMW memory accesses are
issued to the sequencer to an address with an in-flight Locked_RMW, the
sequencer may pass those accesses through to the controller. At the controller,
a number of incorrect activities can occur depending on the protocol. In
MOESI_hammer, for example, an intermediate IFETCH will cause an L1D to L2
transfer, which cannot be serviced, because the block_on functionality blocks
the trigger queue, resulting in a deadlock. Further, if an intermediate store
arrives (e.g. from a separate SMT thread), the sequencer allows the request
through to the controller, and the atomicity of the Locked_RMW may be broken.

To avoid these problems, disallow the Sequencer from passing any memory
accesses to the controller besides Locked_RMW_Write when a Locked_RMW is in-
flight.


Diffs
-----

src/mem/ruby/slicc_interface/AbstractController.hh bd894d2bdd7c
src/mem/ruby/slicc_interface/AbstractController.cc bd894d2bdd7c
src/mem/ruby/system/Sequencer.cc bd894d2bdd7c

Diff: http://reviews.gem5.org/r/3149/diff/


Testing
-------

Considered many other potential solutions on gem5-gpu email list, which seem
unlikely to function as desired:
https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7g

Found reproducible version of the IFETCH bug with gem5 11139:bd894d2bdd7c (using the xsave disable patch in this email thread: http://comments.gmane.org/gmane.comp.emulators.m5.devel/24558 )

Reproducible command line: ../build/X86_MOESI_hammer/gem5.opt --outdir=$outdir ../configs/example/fs.py --ruby --cpu-type=detailed --caches --num-cpus=4 --script ../configs/boot/hack_back_ckpt.rcS --kernel ../../full_system_files/binaries/x86_64-vmlinux-2.6.28.4-smp

Verified that the patch fixes the reproducible bug and tested that booting
Linux works with O3CPU and numerous other system configurations.


Thanks,

Joel Hestness
Nilay Vaish
2015-10-10 20:30:17 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3149/#review7347
-----------------------------------------------------------



src/mem/ruby/system/Sequencer.cc (lines 180 - 184)
<http://reviews.gem5.org/r/3149/#comment6205>

I prefer this check being done just before insertRequest() is called.



src/mem/ruby/system/Sequencer.cc (line 585)
<http://reviews.gem5.org/r/3149/#comment6206>

This is the place I am asking the check be moved to.


- Nilay Vaish
Post by Joel Hestness
-----------------------------------------------------------
http://reviews.gem5.org/r/3149/
-----------------------------------------------------------
(Updated Oct. 9, 2015, 3:15 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11157:65d87638830f
---------------------------
ruby: Fix block_on behavior
Ruby's controller block_on behavior aimed to block MessageBuffer requests into
SLICC controllers when a Locked_RMW was in flight. Unfortunately, this
functionality only partially works: When non-Locked_RMW memory accesses are
issued to the sequencer to an address with an in-flight Locked_RMW, the
sequencer may pass those accesses through to the controller. At the controller,
a number of incorrect activities can occur depending on the protocol. In
MOESI_hammer, for example, an intermediate IFETCH will cause an L1D to L2
transfer, which cannot be serviced, because the block_on functionality blocks
the trigger queue, resulting in a deadlock. Further, if an intermediate store
arrives (e.g. from a separate SMT thread), the sequencer allows the request
through to the controller, and the atomicity of the Locked_RMW may be broken.
To avoid these problems, disallow the Sequencer from passing any memory
accesses to the controller besides Locked_RMW_Write when a Locked_RMW is in-
flight.
Diffs
-----
src/mem/ruby/slicc_interface/AbstractController.hh bd894d2bdd7c
src/mem/ruby/slicc_interface/AbstractController.cc bd894d2bdd7c
src/mem/ruby/system/Sequencer.cc bd894d2bdd7c
Diff: http://reviews.gem5.org/r/3149/diff/
Testing
-------
Considered many other potential solutions on gem5-gpu email list, which seem
https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7g
Found reproducible version of the IFETCH bug with gem5 11139:bd894d2bdd7c (using the xsave disable patch in this email thread: http://comments.gmane.org/gmane.comp.emulators.m5.devel/24558 )
Reproducible command line: ../build/X86_MOESI_hammer/gem5.opt --outdir=$outdir ../configs/example/fs.py --ruby --cpu-type=detailed --caches --num-cpus=4 --script ../configs/boot/hack_back_ckpt.rcS --kernel ../../full_system_files/binaries/x86_64-vmlinux-2.6.28.4-smp
Verified that the patch fixes the reproducible bug and tested that booting
Linux works with O3CPU and numerous other system configurations.
Thanks,
Joel Hestness
Joel Hestness
2015-10-10 21:58:39 UTC
Permalink
Post by Joel Hestness
src/mem/ruby/system/Sequencer.cc, lines 180-184
<http://reviews.gem5.org/r/3149/diff/1/?file=50270#file50270line180>
I prefer this check being done just before insertRequest() is called.
I considered your request, but I don't understand why this check should be moved. To check whether the line is blocked requires making a line address out of the request address, which happens just before this, and this check is consistent with other checks in insertRequest. Namely, this checks if there is an outstanding operation for the line. Sequencer::makeRequest() just does the translation of the packet to a RubyRequest, so returning from there with RequestStatus_Aliased is inconsistent with the functionionality of that method. Can you clarify why you think it should be moved?


- Joel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3149/#review7347
-----------------------------------------------------------
Post by Joel Hestness
-----------------------------------------------------------
http://reviews.gem5.org/r/3149/
-----------------------------------------------------------
(Updated Oct. 9, 2015, 3:15 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11157:65d87638830f
---------------------------
ruby: Fix block_on behavior
Ruby's controller block_on behavior aimed to block MessageBuffer requests into
SLICC controllers when a Locked_RMW was in flight. Unfortunately, this
functionality only partially works: When non-Locked_RMW memory accesses are
issued to the sequencer to an address with an in-flight Locked_RMW, the
sequencer may pass those accesses through to the controller. At the controller,
a number of incorrect activities can occur depending on the protocol. In
MOESI_hammer, for example, an intermediate IFETCH will cause an L1D to L2
transfer, which cannot be serviced, because the block_on functionality blocks
the trigger queue, resulting in a deadlock. Further, if an intermediate store
arrives (e.g. from a separate SMT thread), the sequencer allows the request
through to the controller, and the atomicity of the Locked_RMW may be broken.
To avoid these problems, disallow the Sequencer from passing any memory
accesses to the controller besides Locked_RMW_Write when a Locked_RMW is in-
flight.
Diffs
-----
src/mem/ruby/slicc_interface/AbstractController.hh bd894d2bdd7c
src/mem/ruby/slicc_interface/AbstractController.cc bd894d2bdd7c
src/mem/ruby/system/Sequencer.cc bd894d2bdd7c
Diff: http://reviews.gem5.org/r/3149/diff/
Testing
-------
Considered many other potential solutions on gem5-gpu email list, which seem
https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7g
Found reproducible version of the IFETCH bug with gem5 11139:bd894d2bdd7c (using the xsave disable patch in this email thread: http://comments.gmane.org/gmane.comp.emulators.m5.devel/24558 )
Reproducible command line: ../build/X86_MOESI_hammer/gem5.opt --outdir=$outdir ../configs/example/fs.py --ruby --cpu-type=detailed --caches --num-cpus=4 --script ../configs/boot/hack_back_ckpt.rcS --kernel ../../full_system_files/binaries/x86_64-vmlinux-2.6.28.4-smp
Verified that the patch fixes the reproducible bug and tested that booting
Linux works with O3CPU and numerous other system configurations.
Thanks,
Joel Hestness
Nilay Vaish
2015-10-11 17:34:23 UTC
Permalink
Post by Joel Hestness
src/mem/ruby/system/Sequencer.cc, lines 180-184
<http://reviews.gem5.org/r/3149/diff/1/?file=50270#file50270line180>
I prefer this check being done just before insertRequest() is called.
I considered your request, but I don't understand why this check should be moved. To check whether the line is blocked requires making a line address out of the request address, which happens just before this, and this check is consistent with other checks in insertRequest. Namely, this checks if there is an outstanding operation for the line. Sequencer::makeRequest() just does the translation of the packet to a RubyRequest, so returning from there with RequestStatus_Aliased is inconsistent with the functionionality of that method. Can you clarify why you think it should be moved?
insertRequest() is meant for deciding whether to put this request into
either of the read / write tables. And I think that's what it should be
doing and nothing more. It should not be the responsibility of the insertRequest()
to check if there is some RMW operation going on the address.
Post by Joel Hestness
I considered your request, but I don't understand why this check should be moved. To check whether the line is blocked requires making a > line address out of the request address, which happens just before this,
Making line address is just one operation, other than the fact that we use a global pointer to get the block size.
Post by Joel Hestness
and this check is consistent with other checks in insertRequest.
The only check in insertRequest() is which table to put the request into.
Post by Joel Hestness
Namely, this checks if there is an outstanding operation for the line. Sequencer::makeRequest() just does the translation
of the packet to a RubyRequest, so returning from there with RequestStatus_Aliased is inconsistent with the functionionality of that
method. Can you clarify why you think it should be moved?
If you are worried about the return type, add a new one.


- Nilay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3149/#review7347
-----------------------------------------------------------
Post by Joel Hestness
-----------------------------------------------------------
http://reviews.gem5.org/r/3149/
-----------------------------------------------------------
(Updated Oct. 9, 2015, 3:15 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11157:65d87638830f
---------------------------
ruby: Fix block_on behavior
Ruby's controller block_on behavior aimed to block MessageBuffer requests into
SLICC controllers when a Locked_RMW was in flight. Unfortunately, this
functionality only partially works: When non-Locked_RMW memory accesses are
issued to the sequencer to an address with an in-flight Locked_RMW, the
sequencer may pass those accesses through to the controller. At the controller,
a number of incorrect activities can occur depending on the protocol. In
MOESI_hammer, for example, an intermediate IFETCH will cause an L1D to L2
transfer, which cannot be serviced, because the block_on functionality blocks
the trigger queue, resulting in a deadlock. Further, if an intermediate store
arrives (e.g. from a separate SMT thread), the sequencer allows the request
through to the controller, and the atomicity of the Locked_RMW may be broken.
To avoid these problems, disallow the Sequencer from passing any memory
accesses to the controller besides Locked_RMW_Write when a Locked_RMW is in-
flight.
Diffs
-----
src/mem/ruby/slicc_interface/AbstractController.hh bd894d2bdd7c
src/mem/ruby/slicc_interface/AbstractController.cc bd894d2bdd7c
src/mem/ruby/system/Sequencer.cc bd894d2bdd7c
Diff: http://reviews.gem5.org/r/3149/diff/
Testing
-------
Considered many other potential solutions on gem5-gpu email list, which seem
https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7g
Found reproducible version of the IFETCH bug with gem5 11139:bd894d2bdd7c (using the xsave disable patch in this email thread: http://comments.gmane.org/gmane.comp.emulators.m5.devel/24558 )
Reproducible command line: ../build/X86_MOESI_hammer/gem5.opt --outdir=$outdir ../configs/example/fs.py --ruby --cpu-type=detailed --caches --num-cpus=4 --script ../configs/boot/hack_back_ckpt.rcS --kernel ../../full_system_files/binaries/x86_64-vmlinux-2.6.28.4-smp
Verified that the patch fixes the reproducible bug and tested that booting
Linux works with O3CPU and numerous other system configurations.
Thanks,
Joel Hestness
Joel Hestness
2015-10-11 22:44:01 UTC
Permalink
Post by Joel Hestness
src/mem/ruby/system/Sequencer.cc, lines 180-184
<http://reviews.gem5.org/r/3149/diff/1/?file=50270#file50270line180>
I prefer this check being done just before insertRequest() is called.
I considered your request, but I don't understand why this check should be moved. To check whether the line is blocked requires making a line address out of the request address, which happens just before this, and this check is consistent with other checks in insertRequest. Namely, this checks if there is an outstanding operation for the line. Sequencer::makeRequest() just does the translation of the packet to a RubyRequest, so returning from there with RequestStatus_Aliased is inconsistent with the functionionality of that method. Can you clarify why you think it should be moved?
insertRequest() is meant for deciding whether to put this request into
either of the read / write tables. And I think that's what it should be
doing and nothing more. It should not be the responsibility of the insertRequest()
to check if there is some RMW operation going on the address.
I considered your request, but I don't understand why this check should be moved. To check whether the line is blocked requires making a > line address out of the request address, which happens just before this,
Making line address is just one operation, other than the fact that we use a global pointer to get the block size.
and this check is consistent with other checks in insertRequest.
The only check in insertRequest() is which table to put the request into.
Namely, this checks if there is an outstanding operation for the line. Sequencer::makeRequest() just does the translation
of the packet to a RubyRequest, so returning from there with RequestStatus_Aliased is inconsistent with the functionionality of that
method. Can you clarify why you think it should be moved?
If you are worried about the return type, add a new one.
insertRequest() is meant for deciding whether to put this request into either of the read / write tables. And I think that's what it should be doing and nothing more. It should not be the responsibility of the insertRequest() to check if there is some RMW operation going on the address.
Sorry, I'm still not clear about what you're arguing. This new isBlocked check is also checking whether to put the request into the read or write tables. It does so based on what accesses are currently in progress (i.e. an in-flight Locked_RMW), so it is consistent with the checks already in insertRequest. As I said, makeRequest only does translation from Packets to RubyRequests, so it has very different functionality than this new isBlocked check. To me, it doesn't make sense to move the isBlocked check into makeRequest.

Also of note, the isBlocked condition should only evaluate to true very infrequently, so there isn't any need to optimize the check's location for performance.


- Joel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3149/#review7347
-----------------------------------------------------------
Post by Joel Hestness
-----------------------------------------------------------
http://reviews.gem5.org/r/3149/
-----------------------------------------------------------
(Updated Oct. 9, 2015, 3:15 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11157:65d87638830f
---------------------------
ruby: Fix block_on behavior
Ruby's controller block_on behavior aimed to block MessageBuffer requests into
SLICC controllers when a Locked_RMW was in flight. Unfortunately, this
functionality only partially works: When non-Locked_RMW memory accesses are
issued to the sequencer to an address with an in-flight Locked_RMW, the
sequencer may pass those accesses through to the controller. At the controller,
a number of incorrect activities can occur depending on the protocol. In
MOESI_hammer, for example, an intermediate IFETCH will cause an L1D to L2
transfer, which cannot be serviced, because the block_on functionality blocks
the trigger queue, resulting in a deadlock. Further, if an intermediate store
arrives (e.g. from a separate SMT thread), the sequencer allows the request
through to the controller, and the atomicity of the Locked_RMW may be broken.
To avoid these problems, disallow the Sequencer from passing any memory
accesses to the controller besides Locked_RMW_Write when a Locked_RMW is in-
flight.
Diffs
-----
src/mem/ruby/slicc_interface/AbstractController.hh bd894d2bdd7c
src/mem/ruby/slicc_interface/AbstractController.cc bd894d2bdd7c
src/mem/ruby/system/Sequencer.cc bd894d2bdd7c
Diff: http://reviews.gem5.org/r/3149/diff/
Testing
-------
Considered many other potential solutions on gem5-gpu email list, which seem
https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7g
Found reproducible version of the IFETCH bug with gem5 11139:bd894d2bdd7c (using the xsave disable patch in this email thread: http://comments.gmane.org/gmane.comp.emulators.m5.devel/24558 )
Reproducible command line: ../build/X86_MOESI_hammer/gem5.opt --outdir=$outdir ../configs/example/fs.py --ruby --cpu-type=detailed --caches --num-cpus=4 --script ../configs/boot/hack_back_ckpt.rcS --kernel ../../full_system_files/binaries/x86_64-vmlinux-2.6.28.4-smp
Verified that the patch fixes the reproducible bug and tested that booting
Linux works with O3CPU and numerous other system configurations.
Thanks,
Joel Hestness
Brandon Potter
2015-10-14 17:15:02 UTC
Permalink
Post by Joel Hestness
src/mem/ruby/system/Sequencer.cc, lines 180-184
<http://reviews.gem5.org/r/3149/diff/1/?file=50270#file50270line180>
I prefer this check being done just before insertRequest() is called.
I considered your request, but I don't understand why this check should be moved. To check whether the line is blocked requires making a line address out of the request address, which happens just before this, and this check is consistent with other checks in insertRequest. Namely, this checks if there is an outstanding operation for the line. Sequencer::makeRequest() just does the translation of the packet to a RubyRequest, so returning from there with RequestStatus_Aliased is inconsistent with the functionionality of that method. Can you clarify why you think it should be moved?
insertRequest() is meant for deciding whether to put this request into
either of the read / write tables. And I think that's what it should be
doing and nothing more. It should not be the responsibility of the insertRequest()
to check if there is some RMW operation going on the address.
I considered your request, but I don't understand why this check should be moved. To check whether the line is blocked requires making a > line address out of the request address, which happens just before this,
Making line address is just one operation, other than the fact that we use a global pointer to get the block size.
and this check is consistent with other checks in insertRequest.
The only check in insertRequest() is which table to put the request into.
Namely, this checks if there is an outstanding operation for the line. Sequencer::makeRequest() just does the translation
of the packet to a RubyRequest, so returning from there with RequestStatus_Aliased is inconsistent with the functionionality of that
method. Can you clarify why you think it should be moved?
If you are worried about the return type, add a new one.
insertRequest() is meant for deciding whether to put this request into either of the read / write tables. And I think that's what it should be doing and nothing more. It should not be the responsibility of the insertRequest() to check if there is some RMW operation going on the address.
Sorry, I'm still not clear about what you're arguing. This new isBlocked check is also checking whether to put the request into the read or write tables. It does so based on what accesses are currently in progress (i.e. an in-flight Locked_RMW), so it is consistent with the checks already in insertRequest. As I said, makeRequest only does translation from Packets to RubyRequests, so it has very different functionality than this new isBlocked check. To me, it doesn't make sense to move the isBlocked check into makeRequest.
Also of note, the isBlocked condition should only evaluate to true very infrequently, so there isn't any need to optimize the check's location for performance.
Joel, if you decide to keep this inside insertRequest() then you might move it up two lines so that it is called before the default_entry is created. It's really minor, but it doesn't make sense to create the default_entry before the check.


- Brandon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3149/#review7347
-----------------------------------------------------------
Post by Joel Hestness
-----------------------------------------------------------
http://reviews.gem5.org/r/3149/
-----------------------------------------------------------
(Updated Oct. 9, 2015, 3:15 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11157:65d87638830f
---------------------------
ruby: Fix block_on behavior
Ruby's controller block_on behavior aimed to block MessageBuffer requests into
SLICC controllers when a Locked_RMW was in flight. Unfortunately, this
functionality only partially works: When non-Locked_RMW memory accesses are
issued to the sequencer to an address with an in-flight Locked_RMW, the
sequencer may pass those accesses through to the controller. At the controller,
a number of incorrect activities can occur depending on the protocol. In
MOESI_hammer, for example, an intermediate IFETCH will cause an L1D to L2
transfer, which cannot be serviced, because the block_on functionality blocks
the trigger queue, resulting in a deadlock. Further, if an intermediate store
arrives (e.g. from a separate SMT thread), the sequencer allows the request
through to the controller, and the atomicity of the Locked_RMW may be broken.
To avoid these problems, disallow the Sequencer from passing any memory
accesses to the controller besides Locked_RMW_Write when a Locked_RMW is in-
flight.
Diffs
-----
src/mem/ruby/slicc_interface/AbstractController.hh bd894d2bdd7c
src/mem/ruby/slicc_interface/AbstractController.cc bd894d2bdd7c
src/mem/ruby/system/Sequencer.cc bd894d2bdd7c
Diff: http://reviews.gem5.org/r/3149/diff/
Testing
-------
Considered many other potential solutions on gem5-gpu email list, which seem
https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7g
Found reproducible version of the IFETCH bug with gem5 11139:bd894d2bdd7c (using the xsave disable patch in this email thread: http://comments.gmane.org/gmane.comp.emulators.m5.devel/24558 )
Reproducible command line: ../build/X86_MOESI_hammer/gem5.opt --outdir=$outdir ../configs/example/fs.py --ruby --cpu-type=detailed --caches --num-cpus=4 --script ../configs/boot/hack_back_ckpt.rcS --kernel ../../full_system_files/binaries/x86_64-vmlinux-2.6.28.4-smp
Verified that the patch fixes the reproducible bug and tested that booting
Linux works with O3CPU and numerous other system configurations.
Thanks,
Joel Hestness
Brandon Potter
2015-10-15 18:01:28 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3149/#review7377
-----------------------------------------------------------



src/mem/ruby/system/Sequencer.cc (line 393)
<http://reviews.gem5.org/r/3149/#comment6226>

There's no reason that I can see why this should be a std::map. The mandatory_q_ptr that is passed in is not used by anything at all. (The only checks now are on the address and there is no action taken on the mandatory_q. Maybe I am missing something?)

Can we change the tracking structure to reflect that this is really a vector of addresses for the memory controller that we should be checking instead of a map? Actually, there's only a single address with this new modification since the request will be nack'd back to the requester.


- Brandon Potter
Post by Joel Hestness
-----------------------------------------------------------
http://reviews.gem5.org/r/3149/
-----------------------------------------------------------
(Updated Oct. 9, 2015, 3:15 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11157:65d87638830f
---------------------------
ruby: Fix block_on behavior
Ruby's controller block_on behavior aimed to block MessageBuffer requests into
SLICC controllers when a Locked_RMW was in flight. Unfortunately, this
functionality only partially works: When non-Locked_RMW memory accesses are
issued to the sequencer to an address with an in-flight Locked_RMW, the
sequencer may pass those accesses through to the controller. At the controller,
a number of incorrect activities can occur depending on the protocol. In
MOESI_hammer, for example, an intermediate IFETCH will cause an L1D to L2
transfer, which cannot be serviced, because the block_on functionality blocks
the trigger queue, resulting in a deadlock. Further, if an intermediate store
arrives (e.g. from a separate SMT thread), the sequencer allows the request
through to the controller, and the atomicity of the Locked_RMW may be broken.
To avoid these problems, disallow the Sequencer from passing any memory
accesses to the controller besides Locked_RMW_Write when a Locked_RMW is in-
flight.
Diffs
-----
src/mem/ruby/slicc_interface/AbstractController.hh bd894d2bdd7c
src/mem/ruby/slicc_interface/AbstractController.cc bd894d2bdd7c
src/mem/ruby/system/Sequencer.cc bd894d2bdd7c
Diff: http://reviews.gem5.org/r/3149/diff/
Testing
-------
Considered many other potential solutions on gem5-gpu email list, which seem
https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7g
Found reproducible version of the IFETCH bug with gem5 11139:bd894d2bdd7c (using the xsave disable patch in this email thread: http://comments.gmane.org/gmane.comp.emulators.m5.devel/24558 )
Reproducible command line: ../build/X86_MOESI_hammer/gem5.opt --outdir=$outdir ../configs/example/fs.py --ruby --cpu-type=detailed --caches --num-cpus=4 --script ../configs/boot/hack_back_ckpt.rcS --kernel ../../full_system_files/binaries/x86_64-vmlinux-2.6.28.4-smp
Verified that the patch fixes the reproducible bug and tested that booting
Linux works with O3CPU and numerous other system configurations.
Thanks,
Joel Hestness
Brandon Potter
2015-10-15 18:45:34 UTC
Permalink
Post by Joel Hestness
src/mem/ruby/system/Sequencer.cc, line 393
<http://reviews.gem5.org/r/3149/diff/1/?file=50270#file50270line393>
There's no reason that I can see why this should be a std::map. The mandatory_q_ptr that is passed in is not used by anything at all. (The only checks now are on the address and there is no action taken on the mandatory_q. Maybe I am missing something?)
Can we change the tracking structure to reflect that this is really a vector of addresses for the memory controller that we should be checking instead of a map? Actually, there's only a single address with this new modification since the request will be nack'd back to the requester.
Sorry, disregard that last part of the issue where I was suggesting that there only needs to be a single address which can be blocked on. We'd still need a vector or something equivalent to track multiple addresses to block on.


- Brandon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3149/#review7377
-----------------------------------------------------------
Post by Joel Hestness
-----------------------------------------------------------
http://reviews.gem5.org/r/3149/
-----------------------------------------------------------
(Updated Oct. 9, 2015, 3:15 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11157:65d87638830f
---------------------------
ruby: Fix block_on behavior
Ruby's controller block_on behavior aimed to block MessageBuffer requests into
SLICC controllers when a Locked_RMW was in flight. Unfortunately, this
functionality only partially works: When non-Locked_RMW memory accesses are
issued to the sequencer to an address with an in-flight Locked_RMW, the
sequencer may pass those accesses through to the controller. At the controller,
a number of incorrect activities can occur depending on the protocol. In
MOESI_hammer, for example, an intermediate IFETCH will cause an L1D to L2
transfer, which cannot be serviced, because the block_on functionality blocks
the trigger queue, resulting in a deadlock. Further, if an intermediate store
arrives (e.g. from a separate SMT thread), the sequencer allows the request
through to the controller, and the atomicity of the Locked_RMW may be broken.
To avoid these problems, disallow the Sequencer from passing any memory
accesses to the controller besides Locked_RMW_Write when a Locked_RMW is in-
flight.
Diffs
-----
src/mem/ruby/slicc_interface/AbstractController.hh bd894d2bdd7c
src/mem/ruby/slicc_interface/AbstractController.cc bd894d2bdd7c
src/mem/ruby/system/Sequencer.cc bd894d2bdd7c
Diff: http://reviews.gem5.org/r/3149/diff/
Testing
-------
Considered many other potential solutions on gem5-gpu email list, which seem
https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7g
Found reproducible version of the IFETCH bug with gem5 11139:bd894d2bdd7c (using the xsave disable patch in this email thread: http://comments.gmane.org/gmane.comp.emulators.m5.devel/24558 )
Reproducible command line: ../build/X86_MOESI_hammer/gem5.opt --outdir=$outdir ../configs/example/fs.py --ruby --cpu-type=detailed --caches --num-cpus=4 --script ../configs/boot/hack_back_ckpt.rcS --kernel ../../full_system_files/binaries/x86_64-vmlinux-2.6.28.4-smp
Verified that the patch fixes the reproducible bug and tested that booting
Linux works with O3CPU and numerous other system configurations.
Thanks,
Joel Hestness
Joel Hestness
2015-10-16 13:55:37 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3149/
-----------------------------------------------------------

(Updated Oct. 16, 2015, 1:55 p.m.)


Review request for Default.


Changes
-------

Move default_entry creation after check per Brandon's suggestion.


Repository: gem5


Description (updated)
-------

Changeset 11169:0a83a8d08c9e
---------------------------
ruby: Fix block_on behavior

Ruby's controller block_on behavior aimed to block MessageBuffer requests into
SLICC controllers when a Locked_RMW was in flight. Unfortunately, this
functionality only partially works: When non-Locked_RMW memory accesses are
issued to the sequencer to an address with an in-flight Locked_RMW, the
sequencer may pass those accesses through to the controller. At the controller,
a number of incorrect activities can occur depending on the protocol. In
MOESI_hammer, for example, an intermediate IFETCH will cause an L1D to L2
transfer, which cannot be serviced, because the block_on functionality blocks
the trigger queue, resulting in a deadlock. Further, if an intermediate store
arrives (e.g. from a separate SMT thread), the sequencer allows the request
through to the controller, and the atomicity of the Locked_RMW may be broken.

To avoid these problems, disallow the Sequencer from passing any memory
accesses to the controller besides Locked_RMW_Write when a Locked_RMW is in-
flight.


Diffs (updated)
-----

src/mem/ruby/slicc_interface/AbstractController.cc 207d6f2f1d53
src/mem/ruby/system/Sequencer.cc 207d6f2f1d53
src/mem/ruby/slicc_interface/AbstractController.hh 207d6f2f1d53

Diff: http://reviews.gem5.org/r/3149/diff/


Testing
-------

Considered many other potential solutions on gem5-gpu email list, which seem
unlikely to function as desired:
https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7g

Found reproducible version of the IFETCH bug with gem5 11139:bd894d2bdd7c (using the xsave disable patch in this email thread: http://comments.gmane.org/gmane.comp.emulators.m5.devel/24558 )

Reproducible command line: ../build/X86_MOESI_hammer/gem5.opt --outdir=$outdir ../configs/example/fs.py --ruby --cpu-type=detailed --caches --num-cpus=4 --script ../configs/boot/hack_back_ckpt.rcS --kernel ../../full_system_files/binaries/x86_64-vmlinux-2.6.28.4-smp

Verified that the patch fixes the reproducible bug and tested that booting
Linux works with O3CPU and numerous other system configurations.


Thanks,

Joel Hestness
Brandon Potter
2016-04-11 19:22:24 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3149/#review8167
-----------------------------------------------------------

Ship it!


Ship It!


src/mem/ruby/system/Sequencer.cc (line 179)
<http://reviews.gem5.org/r/3149/#comment7022>

Can you add a comment to the code here (or somewhere in this function) that describes how the RequestStatus_Aliased actually fixes the problem.

I looked through the code and eventually figured it out, but it took some time and was a bit convoluted.

This return goes back up into makeRequest with Aliased which eventually becomes visible in RubyPort::MemSlavePort::recvTimingResp and it re-enqueues the port for a retry and returns false to the owner of the port (CPU) which denotes that it failed. The (owner) CPU is expected to wait for the RubyPort to trigger the retry (which happens when the port because available again when RubyPort executes RubyPort::ruby_hit_callback).

The confusing thing was the the O3CPU model has seperate ports for instruction and data which is what allows this to work with the IFETCH. The MemSlavePort corresponding to the i-cache is what gets stalled, but the MemSlavePort associated with the d-cache is still open and allows the RMW_Write to proceed.

I don't know what a good summary would look like, but it would be nice if there was something here to hint at what's happening.



src/mem/ruby/system/Sequencer.cc (line 392)
<http://reviews.gem5.org/r/3149/#comment7019>

Can you add a comment here that the 1st level controller is blocked for all message buffers except the mandatory queue?

The name of the function, blockOnQueue, can be misleading depending on how the reader interprets it; one might just glance at this code and think that the mandatory queue gets blocked when in reality it's all of the other message buffers.


- Brandon Potter
Post by Joel Hestness
-----------------------------------------------------------
http://reviews.gem5.org/r/3149/
-----------------------------------------------------------
(Updated Oct. 16, 2015, 1:55 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11169:0a83a8d08c9e
---------------------------
ruby: Fix block_on behavior
Ruby's controller block_on behavior aimed to block MessageBuffer requests into
SLICC controllers when a Locked_RMW was in flight. Unfortunately, this
functionality only partially works: When non-Locked_RMW memory accesses are
issued to the sequencer to an address with an in-flight Locked_RMW, the
sequencer may pass those accesses through to the controller. At the controller,
a number of incorrect activities can occur depending on the protocol. In
MOESI_hammer, for example, an intermediate IFETCH will cause an L1D to L2
transfer, which cannot be serviced, because the block_on functionality blocks
the trigger queue, resulting in a deadlock. Further, if an intermediate store
arrives (e.g. from a separate SMT thread), the sequencer allows the request
through to the controller, and the atomicity of the Locked_RMW may be broken.
To avoid these problems, disallow the Sequencer from passing any memory
accesses to the controller besides Locked_RMW_Write when a Locked_RMW is in-
flight.
Diffs
-----
src/mem/ruby/slicc_interface/AbstractController.cc 207d6f2f1d53
src/mem/ruby/system/Sequencer.cc 207d6f2f1d53
src/mem/ruby/slicc_interface/AbstractController.hh 207d6f2f1d53
Diff: http://reviews.gem5.org/r/3149/diff/
Testing
-------
Considered many other potential solutions on gem5-gpu email list, which seem
https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7g
Found reproducible version of the IFETCH bug with gem5 11139:bd894d2bdd7c (using the xsave disable patch in this email thread: http://comments.gmane.org/gmane.comp.emulators.m5.devel/24558 )
Reproducible command line: ../build/X86_MOESI_hammer/gem5.opt --outdir=$outdir ../configs/example/fs.py --ruby --cpu-type=detailed --caches --num-cpus=4 --script ../configs/boot/hack_back_ckpt.rcS --kernel ../../full_system_files/binaries/x86_64-vmlinux-2.6.28.4-smp
Verified that the patch fixes the reproducible bug and tested that booting
Linux works with O3CPU and numerous other system configurations.
Thanks,
Joel Hestness
Brad Beckmann
2016-04-13 19:02:51 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3149/#review8213
-----------------------------------------------------------

Ship it!


Thank you Joel for fixing this bug and posting this patch. Those of us at AMD have encountered this bug as well and some of us are already using a version of this patch. It would be great if you could check this in soon so that we do not have to maintain your fix locally. Please check in this patch as soon as you can.

Thanks!

- Brad Beckmann
Post by Joel Hestness
-----------------------------------------------------------
http://reviews.gem5.org/r/3149/
-----------------------------------------------------------
(Updated Oct. 16, 2015, 1:55 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11169:0a83a8d08c9e
---------------------------
ruby: Fix block_on behavior
Ruby's controller block_on behavior aimed to block MessageBuffer requests into
SLICC controllers when a Locked_RMW was in flight. Unfortunately, this
functionality only partially works: When non-Locked_RMW memory accesses are
issued to the sequencer to an address with an in-flight Locked_RMW, the
sequencer may pass those accesses through to the controller. At the controller,
a number of incorrect activities can occur depending on the protocol. In
MOESI_hammer, for example, an intermediate IFETCH will cause an L1D to L2
transfer, which cannot be serviced, because the block_on functionality blocks
the trigger queue, resulting in a deadlock. Further, if an intermediate store
arrives (e.g. from a separate SMT thread), the sequencer allows the request
through to the controller, and the atomicity of the Locked_RMW may be broken.
To avoid these problems, disallow the Sequencer from passing any memory
accesses to the controller besides Locked_RMW_Write when a Locked_RMW is in-
flight.
Diffs
-----
src/mem/ruby/slicc_interface/AbstractController.cc 207d6f2f1d53
src/mem/ruby/system/Sequencer.cc 207d6f2f1d53
src/mem/ruby/slicc_interface/AbstractController.hh 207d6f2f1d53
Diff: http://reviews.gem5.org/r/3149/diff/
Testing
-------
Considered many other potential solutions on gem5-gpu email list, which seem
https://groups.google.com/forum/#!topic/gem5-gpu-dev/RQv4SxIKv7g
Found reproducible version of the IFETCH bug with gem5 11139:bd894d2bdd7c (using the xsave disable patch in this email thread: http://comments.gmane.org/gmane.comp.emulators.m5.devel/24558 )
Reproducible command line: ../build/X86_MOESI_hammer/gem5.opt --outdir=$outdir ../configs/example/fs.py --ruby --cpu-type=detailed --caches --num-cpus=4 --script ../configs/boot/hack_back_ckpt.rcS --kernel ../../full_system_files/binaries/x86_64-vmlinux-2.6.28.4-smp
Verified that the patch fixes the reproducible bug and tested that booting
Linux works with O3CPU and numerous other system configurations.
Thanks,
Joel Hestness
Loading...