Discussion:
[gem5-dev] Review Request 2183: o3cpu: lsq: Fix TSO implementation
(too old to reply)
Marco Elver
2014-03-05 17:05:00 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2183/
-----------------------------------------------------------

Review request for Default.


Repository: gem5


Description
-------

Changeset 10101:19483fc4f611
---------------------------
o3cpu: lsq: Fix TSO implementation

This patch fixes violation of TSO in the O3CPU, as all loads must be
ordered with all other loads. In the LQ, if a snoop is observed, all
subsequent loads need to be squashed if the system is TSO.

Prior to this patch, the following case could be violated:

P0 | P1 ;
MOV [x],$1 | MOV EAX,[y] ;
MOV [y],$1 | MOV EBX,[x] ;

exists (1:EAX=1 /\ 1:EBX=0) [is a violation]

The problem was found using litmus [http://diy.inria.fr].


Diffs
-----

src/cpu/o3/lsq_unit_impl.hh 24cfe67c0749

Diff: http://reviews.gem5.org/r/2183/diff/


Testing
-------

This patch was originally against a much older version of Gem5 (9677:773aae0990ae), where the effect was tested extensively and diy no longer reports violation. This patch was then ported and rebased against the newest Gem5 version, where I only ran quick/se/00.hello/x86/linux/o3-timing. Performance results of parallel workloads will change slightly.


Thanks,

Marco Elver
Nilay Vaish
2014-03-15 19:41:39 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2183/#review4986
-----------------------------------------------------------



src/cpu/o3/lsq_unit_impl.hh
<http://reviews.gem5.org/r/2183/#comment4591>

Do you think checking force_squash in this second if() condition will
work same as checking needsTSO? If yes, then I would suggest that we
check force_squash here.


- Nilay Vaish
Post by Marco Elver
-----------------------------------------------------------
http://reviews.gem5.org/r/2183/
-----------------------------------------------------------
(Updated March 5, 2014, 5:05 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 10101:19483fc4f611
---------------------------
o3cpu: lsq: Fix TSO implementation
This patch fixes violation of TSO in the O3CPU, as all loads must be
ordered with all other loads. In the LQ, if a snoop is observed, all
subsequent loads need to be squashed if the system is TSO.
P0 | P1 ;
MOV [x],$1 | MOV EAX,[y] ;
MOV [y],$1 | MOV EBX,[x] ;
exists (1:EAX=1 /\ 1:EBX=0) [is a violation]
The problem was found using litmus [http://diy.inria.fr].
Diffs
-----
src/cpu/o3/lsq_unit_impl.hh 24cfe67c0749
Diff: http://reviews.gem5.org/r/2183/diff/
Testing
-------
This patch was originally against a much older version of Gem5 (9677:773aae0990ae), where the effect was tested extensively and diy no longer reports violation. This patch was then ported and rebased against the newest Gem5 version, where I only ran quick/se/00.hello/x86/linux/o3-timing. Performance results of parallel workloads will change slightly.
Thanks,
Marco Elver
Marco Elver
2014-03-17 14:31:15 UTC
Permalink
Post by Marco Elver
src/cpu/o3/lsq_unit_impl.hh, line 482
<http://reviews.gem5.org/r/2183/diff/1/?file=39331#file39331line482>
Do you think checking force_squash in this second if() condition will
work same as checking needsTSO? If yes, then I would suggest that we
check force_squash here.
The result would be unaffected only iff (load_addr == invalidate_addr) always implies (ld_inst->possibleLoadViolation()). If this is not the case (and I don't think it is), it is not the same.

If this is a stylistic issue, to preserve the current semantics, I propose: move the if() at 483-485 after the if() at 481 and before the if() at 482; then it is valid to replace the use of needsTSO in the currently second if() with force_squash:

...
if (load_addr == invalidate_addr || force_squash) {
if (needsTSO) {
force_squash = true;
}
if (ld_inst->possibleLoadViolation() || force_squash) {
...


The following is also equivalent, but less readable and less flexible (and may lead some to think it is a bug):
...
if (load_addr == invalidate_addr || force_squash) {
if (ld_inst->possibleLoadViolation() || (force_squash = needsTSO)) {
...


- Marco


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2183/#review4986
-----------------------------------------------------------
Post by Marco Elver
-----------------------------------------------------------
http://reviews.gem5.org/r/2183/
-----------------------------------------------------------
(Updated March 5, 2014, 5:05 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 10101:19483fc4f611
---------------------------
o3cpu: lsq: Fix TSO implementation
This patch fixes violation of TSO in the O3CPU, as all loads must be
ordered with all other loads. In the LQ, if a snoop is observed, all
subsequent loads need to be squashed if the system is TSO.
P0 | P1 ;
MOV [x],$1 | MOV EAX,[y] ;
MOV [y],$1 | MOV EBX,[x] ;
exists (1:EAX=1 /\ 1:EBX=0) [is a violation]
The problem was found using litmus [http://diy.inria.fr].
Diffs
-----
src/cpu/o3/lsq_unit_impl.hh 24cfe67c0749
Diff: http://reviews.gem5.org/r/2183/diff/
Testing
-------
This patch was originally against a much older version of Gem5 (9677:773aae0990ae), where the effect was tested extensively and diy no longer reports violation. This patch was then ported and rebased against the newest Gem5 version, where I only ran quick/se/00.hello/x86/linux/o3-timing. Performance results of parallel workloads will change slightly.
Thanks,
Marco Elver
Nilay Vaish
2014-03-17 18:39:49 UTC
Permalink
Post by Marco Elver
src/cpu/o3/lsq_unit_impl.hh, line 482
<http://reviews.gem5.org/r/2183/diff/1/?file=39331#file39331line482>
Do you think checking force_squash in this second if() condition will
work same as checking needsTSO? If yes, then I would suggest that we
check force_squash here.
The result would be unaffected only iff (load_addr == invalidate_addr) always implies (ld_inst->possibleLoadViolation()). If this is not the case (and I don't think it is), it is not the same.
...
if (load_addr == invalidate_addr || force_squash) {
if (needsTSO) {
force_squash = true;
}
if (ld_inst->possibleLoadViolation() || force_squash) {
...
I prefer going with this one. In my opinion this is more clear.

--
Nilay
Nilay Vaish
2014-03-18 15:30:19 UTC
Permalink
Marco, can you add some comments to the patch and update it? I'll commit
it sometime later this week.

--
Nilay
Post by Nilay Vaish
Post by Marco Elver
src/cpu/o3/lsq_unit_impl.hh, line 482
<http://reviews.gem5.org/r/2183/diff/1/?file=39331#file39331line482>
Do you think checking force_squash in this second if() condition will
work same as checking needsTSO? If yes, then I would suggest that we
check force_squash here.
The result would be unaffected only iff (load_addr == invalidate_addr)
always implies (ld_inst->possibleLoadViolation()). If this is not the case
(and I don't think it is), it is not the same.
If this is a stylistic issue, to preserve the current semantics, I
propose: move the if() at 483-485 after the if() at 481 and before the
if() at 482; then it is valid to replace the use of needsTSO in the
...
if (load_addr == invalidate_addr || force_squash) {
if (needsTSO) {
force_squash = true;
}
if (ld_inst->possibleLoadViolation() || force_squash) {
...
I prefer going with this one. In my opinion this is more clear.
--
Nilay
Marco Elver
2014-03-24 12:50:59 UTC
Permalink
Finally got round to applying the changes; see latest update to review
request.

Many thanks,

Marco
Marco, can you add some comments to the patch and update it? I'll commit it
sometime later this week.
--
Nilay
Post by Nilay Vaish
src/cpu/o3/lsq_unit_impl.hh, line 482
<http://reviews.gem5.org/r/2183/diff/1/?file=39331#file39331line482>
Post by Nilay Vaish
Do you think checking force_squash in this second if()
condition > > will
work same as checking needsTSO? If yes, then I would suggest
that > > we
check force_squash here.
The result would be unaffected only iff (load_addr == invalidate_addr)
always implies (ld_inst->possibleLoadViolation()). If this is not the case
(and I don't think it is), it is not the same.
If this is a stylistic issue, to preserve the current semantics, I
propose: move the if() at 483-485 after the if() at 481 and before the
if() at 482; then it is valid to replace the use of needsTSO in the
...
if (load_addr == invalidate_addr || force_squash) {
if (needsTSO) {
force_squash = true;
}
if (ld_inst->possibleLoadViolation() || force_squash) {
...
I prefer going with this one. In my opinion this is more clear.
--
Nilay
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
Marco Elver
2014-03-24 12:33:26 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2183/
-----------------------------------------------------------

(Updated March 24, 2014, 12:33 p.m.)


Review request for Default.


Changes
-------

Applied suggested change.


Repository: gem5


Description (updated)
-------

Changeset 10149:456195b55e60
---------------------------
o3cpu: lsq: Fix TSO implementation

This patch fixes violation of TSO in the O3CPU, as all loads must be
ordered with all other loads. In the LQ, if a snoop is observed, all
subsequent loads need to be squashed if the system is TSO.

Prior to this patch, the following case could be violated:

P0 | P1 ;
MOV [x],$1 | MOV EAX,[y] ;
MOV [y],$1 | MOV EBX,[x] ;

exists (1:EAX=1 /\ 1:EBX=0) [is a violation]

The problem was found using litmus [http://diy.inria.fr]."


Diffs (updated)
-----

src/cpu/o3/lsq_unit_impl.hh 4574d5882066

Diff: http://reviews.gem5.org/r/2183/diff/


Testing
-------

This patch was originally against a much older version of Gem5 (9677:773aae0990ae), where the effect was tested extensively and diy no longer reports violation. This patch was then ported and rebased against the newest Gem5 version, where I only ran quick/se/00.hello/x86/linux/o3-timing. Performance results of parallel workloads will change slightly.


Thanks,

Marco Elver
Loading...