Discussion:
[gem5-dev] FW: Review Request 3580: config, cpu, mem, sim: kvm acceleration for apu_se.py
Hashe, David
2016-07-22 19:37:49 UTC
Permalink
Forwarded because I forgot to subscribe to the list before publishing this patch.

From: David Hashe [mailto:***@gem5.org] On Behalf Of David Hashe
Sent: Friday, July 22, 2016 2:20 PM
To: Hashe, David <***@amd.com>; Default <gem5-***@gem5.org>
Subject: Review Request 3580: config, cpu, mem, sim: kvm acceleration for apu_se.py

This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/3580/


Review request for Default.
By David Hashe.
Repository: gem5
Description

Changeset 11562:7375e1f533fa

---------------------------

config, cpu, mem, sim: kvm acceleration for apu_se.py



Add support for using KVM to accelerate APU simulations. The intended use
case is to fast-forward through runtime initialization until the first
kernel launch.


Diffs

* src/sim/system.cc (704b0198f747b766b839c577614eb2924fd1dfee)
* src/mem/physical.hh (704b0198f747b766b839c577614eb2924fd1dfee)
* src/mem/physical.cc (704b0198f747b766b839c577614eb2924fd1dfee)
* src/mem/ruby/system/Sequencer.cc (704b0198f747b766b839c577614eb2924fd1dfee)
* configs/example/apu_se.py (704b0198f747b766b839c577614eb2924fd1dfee)
* src/cpu/kvm/vm.cc (704b0198f747b766b839c577614eb2924fd1dfee)

View Diff: http://reviews.gem5.org/r/3580/diff/
Jason Lowe-Power
2016-07-25 14:38:41 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/#review8532
-----------------------------------------------------------


Maybe this should be two patches? One for the APU changes and one for the changes to the physical memory. Why are the physical memory changes needed?

Seems like maybe this should be "Enable KVM support for Ruby" and "kvm acceleration for apu...".

Also, does KVM+Ruby+FS work now? Or just KVM+Ruby+SE? Or does this patch only make KVM+Ruby+SE work if you're using the APU?

Thanks! I'm excited to see KVM+Ruby support coming.

- Jason Lowe-Power
David Hashe
2016-07-26 23:22:11 UTC
Permalink
Post by Jason Lowe-Power
Maybe this should be two patches? One for the APU changes and one for the changes to the physical memory. Why are the physical memory changes needed?
Seems like maybe this should be "Enable KVM support for Ruby" and "kvm acceleration for apu...".
Also, does KVM+Ruby+FS work now? Or just KVM+Ruby+SE? Or does this patch only make KVM+Ruby+SE work if you're using the APU?
Thanks! I'm excited to see KVM+Ruby support coming.
I've gone ahead and split it into two patches along those lines (this patch for the src changes and /r/3584 for the config changes).

The physical memory changes are needed so that KVM works with the --access-backing-store option, which creates a second copy of main memory for functional accesses. Previously, KVM would try to map all physical memories into the guest physical address range, which caused an error because it would map both copies of main memory onto the same range. Now KVM detects this situation using a heuristic and only maps the copy used for functional accesses onto the guest.
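To make the failure mode concrete, here is a minimal standalone C++ sketch of the idea; the struct, the names, and the selection rule are illustrative assumptions, not the actual code in src/cpu/kvm/vm.cc or src/mem/physical.cc.

    // Standalone sketch (not the gem5 code) of the problem described above:
    // with --access-backing-store there are two host buffers backing the same
    // guest physical range, and KVM must be handed only one of them. The
    // struct, field names, and selection rule here are illustrative only.
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct BackingStore {
        const char *name;
        uint64_t guestStart;     // guest physical start address
        uint64_t size;           // size in bytes
        bool usedForFunctional;  // true for the copy functional accesses hit
    };

    int main()
    {
        // Main memory plus its shadow copy, both covering [0, 512 MiB).
        std::vector<BackingStore> stores = {
            {"system.mem_ctrl", 0x0, 512ull << 20, false},
            {"ruby.phys_mem",   0x0, 512ull << 20, true},
        };

        // Registering both would give KVM two overlapping slots for the same
        // guest range, which it rejects. The heuristic keeps only the copy
        // that functional accesses (and hence the guest-visible data) use.
        for (const auto &s : stores) {
            if (!s.usedForFunctional) {
                std::cout << "skip " << s.name << " (duplicate guest range)\n";
                continue;
            }
            std::cout << "create KVM slot for " << s.name << " at guest 0x"
                      << std::hex << s.guestStart << std::dec << ", "
                      << (s.size >> 20) << " MiB\n";
        }
        return 0;
    }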

So far I've only tested it with the apu_se.py script, but none of the C++ changes specifically rely on using an APU. I think that writing similar configuration code for example/se.py would work. I have not tested it with FS.


- David


David Hashe
2016-07-26 23:22:13 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/
-----------------------------------------------------------

(Updated July 26, 2016, 11:22 p.m.)


Review request for Default.


Changes
-------

Split config changes into a separate patch


Summary (updated)
-----------------

cpu, mem, sim: Enable KVM support for Ruby


Repository: gem5


Description (updated)
-------

Changeset 11562:7375e1f533fa
---------------------------
cpu, mem, sim: Enable KVM support for Ruby

Use heuristic to avoid mapping both main memory and its copy when
--access-backing-store is specified.

Remember whether a BackingStoreEntry is in the global address map.

Fix bug causing incomplete draining of Ruby Sequencer.

Skip mapping ranges reserved by KVM.


Diffs (updated)
-----

src/cpu/kvm/vm.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/ruby/system/Sequencer.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/sim/system.cc 704b0198f747b766b839c577614eb2924fd1dfee

Diff: http://reviews.gem5.org/r/3580/diff/


Testing
-------


Thanks,

David Hashe
Jason Lowe-Power
2016-07-27 14:32:48 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/#review8549
-----------------------------------------------------------


Thanks for the patch. I'm happy to see the backing store made into a class instead of a simple pair. I have a patch to do something similar in my queue.

A few things below. I still think this patch needs to be broken up further :). We should aim to keep each patch a single logical change. Maybe 1) Backing store refactoring, 2) Ruby drain fix, 3) Add reserved range for... Though, I'm flexible on this.

The only big question I have is what is the "reserved" range for and why is it hardcoded?


src/cpu/kvm/vm.cc (line 381)
<http://reviews.gem5.org/r/3580/#comment7433>

What is this range reserved for? x86 I/O? I don't really like that this is hardcoded here. Though I don't have a suggestion as to how to fix it.



src/mem/physical.hh (line 145)
<http://reviews.gem5.org/r/3580/#comment7434>

Maybe make is_in_addr_map default to true. Then you'll have fewer changes in physical.cc. This is up to you, though.



src/mem/ruby/system/Sequencer.cc (line 254)
<http://reviews.gem5.org/r/3580/#comment7435>

This should be in a separate patch. Though, I'm OK with you splitting this out before pushing and leaving it here for the review.



src/sim/system.cc (line 330)
<http://reviews.gem5.org/r/3580/#comment7436>

Again.. what is this range reserved for? Why are we hardcoding it?

Also, how does this change relate to the KVM changes?


- Jason Lowe-Power
David Hashe
2016-08-05 01:39:26 UTC
Permalink
Post by Jason Lowe-Power
Thanks for the patch. I'm happy to see the backing store made into a class instead of a simple pair. I have a patch to do something similar in my queue.
A few things below. I still think this patch needs to be broken up further :). We should aim to keep each patch a single logical change. Maybe 1) Backing store refactoring, 2) Ruby drain fix, 3) Add reserved range for... Though, I'm flexible on this.
The only big question I have is what is the "reserved" range for and why is it hardcoded?
Thanks for the review. Sorry for taking so long to get back to you, but hopefully these changes improve the patch.
Post by Jason Lowe-Power
Post by Jason Lowe-Power
src/sim/system.cc, line 330
<http://reviews.gem5.org/r/3580/diff/2/?file=57357#file57357line330>
Again.. what is this range reserved for? Why are we hardcoding it?
Also, how does this change relate to the KVM changes?
I'm not entirely sure, but without blocking the use of this range, KVM crashes immediately if --mem-size is greater than 4095MB. This is reproducible on both Ubuntu 14.04 and CentOS 6. The crash does not happen unless KVM is used.

We're hardcoding it because Ruby does not accept split AddrRanges for main memory, so the fix can't be placed in the configuration files.

However, since the problem is KVM- and possibly x86-specific, and I'm not sure exactly why this works, I'm okay with leaving this change out until we have a better idea of what the problem is.
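For illustration only, here is a minimal standalone C++ sketch of why carving a reserved hole out of main memory forces a split address range; the hole used below is a placeholder assumption (the exact window the patch reserves is not spelled out here), and none of this is gem5 code.

    // Standalone sketch of why a reserved hole forces a split memory range.
    // The hole below (the 3-4 GiB region x86 platforms commonly reserve for
    // MMIO/APIC) is only a placeholder; it is not necessarily the range the
    // patch reserved.
    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <utility>
    #include <vector>

    using Range = std::pair<uint64_t, uint64_t>;  // [start, end)

    // Remove `hole` from [0, memSize); the result is one or two ranges.
    static std::vector<Range> splitAroundHole(uint64_t memSize, Range hole)
    {
        std::vector<Range> out;
        if (hole.first > 0)
            out.push_back({0, std::min(memSize, hole.first)});
        if (memSize > hole.second)
            out.push_back({hole.second, memSize});
        return out;
    }

    int main()
    {
        const uint64_t GiB = 1ull << 30;
        // A 6 GiB --mem-size with a placeholder hole at [3 GiB, 4 GiB).
        for (const auto &r : splitAroundHole(6 * GiB, {3 * GiB, 4 * GiB}))
            std::cout << "usable range: [0x" << std::hex << r.first
                      << ", 0x" << r.second << ")\n" << std::dec;
        // Ruby expects main memory to be a single contiguous AddrRange,
        // which is why the workaround could not live purely in the configs.
        return 0;
    }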
Post by Jason Lowe-Power
src/mem/ruby/system/Sequencer.cc, line 254
<http://reviews.gem5.org/r/3580/diff/2/?file=57356#file57356line254>
This should be in a separate patch. Though, I'm OK with you splitting this out before pushing and leaving it here for the review.
Good idea. I'll make sure it's split out before it's committed.
Post by Jason Lowe-Power
src/mem/physical.hh, line 145
<http://reviews.gem5.org/r/3580/diff/2/?file=57354#file57354line145>
Maybe make is_in_addr_map default to true. Then you'll have fewer changes in physical.cc. This is up to you, though.
Good idea. I updated the patch to use a separate parameter, kvm_map, which makes the patch overall a lot neater. It defaults to True as a Python configuration parameter for AbstractMemory, so unfortunately I can't make use of this particular trick.


- David


Andreas Hansson
2016-07-27 16:43:56 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/#review8551
-----------------------------------------------------------


I would really prefer if we didn't have to add this complexity.

I thought we were pretty much in a position where we could remove the horrible shadow memory hack. Could we not pursue that route rather? It would be a much better solution long term.

- Andreas Hansson
David Hashe
2016-08-05 01:39:29 UTC
Permalink
Post by Andreas Hansson
I would really prefer if we didn't have to add this complexity.
I thought we were pretty much in a position where we could remove the horrible shadow memory hack. Could we not pursue that route rather? It would be a much better solution long term.
That may be a better longterm solution, but until then this is necessary to get KVM working with the APU model. I rewrote the patch to remove the most complicated C++ changes. Now, instead of detecting the shadow memory using a heuristic, which was admittedly a bit overcomplicated and overspecific to the ruby shadow memory, we add a parameter to AbstractMemory explicitly saying whether a memory should be mapped by KVM.

This has several advantages. Now the only code tying this patch to the shadow memory is in the configuration files. It is also generally useful to have a way to stop KVM from accessing certain memories. For example, if a hypothetical configuration script created a separate memory for a device that could not be accessed directly by the CPU, then it would be pointless for KVM to map that memory, and doing so could even introduce difficult-to-detect errors if KVM were to erroneously write to that memory during acceleration. With this patch it would be possible to avoid that by telling KVM not to map the memory region.
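To make that concrete, here is a minimal standalone C++ sketch of the flag-based approach; the names (kvmMap, addKvmSlot, the example memories) are assumptions for illustration, not the actual vm.cc or physical.cc code.

    // Standalone sketch of the flag-based design described above: KVM maps
    // exactly the memories that are marked as mappable, with no
    // shadow-memory heuristic. Illustrative names and values only.
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct MemoryEntry {
        const char *name;
        uint64_t guestStart;
        uint64_t size;
        bool kvmMap;  // defaults to true in the config; false opts out
    };

    // Stand-in for registering a guest-physical -> host mapping with KVM.
    static void addKvmSlot(const MemoryEntry &m)
    {
        std::cout << "map " << m.name << " (" << (m.size >> 20)
                  << " MiB) at guest 0x" << std::hex << m.guestStart
                  << std::dec << "\n";
    }

    int main()
    {
        std::vector<MemoryEntry> memories = {
            {"system.mem",         0x0,         512ull << 20, true},
            {"ruby.backing_store", 0x0,         512ull << 20, false}, // shadow copy
            {"gpu.private_mem",    0x100000000, 256ull << 20, false}, // CPU-invisible
        };

        for (const auto &m : memories) {
            if (!m.kvmMap)
                continue;  // opted out: never exposed to the KVM guest
            addKvmSlot(m);
        }
        return 0;
    }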

Is this acceptable?


- David


Jason Lowe-Power
2016-08-05 13:17:19 UTC
Permalink
Post by David Hashe
Post by Andreas Hansson
I would really prefer if we didn't have to add this complexity.
I thought we were pretty much in a position where we could remove the horrible shadow memory hack. Could we not pursue that route rather? It would be a much better solution long term.
That may be a better longterm solution, but until then this is necessary to get KVM working with the APU model. I rewrote the patch to remove the most complicated C++ changes. Now, instead of detecting the shadow memory using a heuristic, which was admittedly a bit overcomplicated and overspecific to the ruby shadow memory, we add a parameter to AbstractMemory explicitly saying whether a memory should be mapped by KVM.
This has several advantages. Now the only code tying this patch to the shadow memory is in the configuration files. It is also generally useful to have a way to stop KVM from accessing certain memories. For example, if a hypothetical configuration script created a separate memory for a device that could not be accessed directly by the CPU, then it would be pointless for KVM to map that memory, and doing so could even introduce difficult-to-detect errors if KVM were to erroneously write to that memory during acceleration. With this patch it would be possible to avoid that by telling KVM not to map the memory region.
Is this acceptable?
I agree that this feature makes sense. For instance, I had some similar code for simulating a system with a DRAM cache with KVM. With at least two reasons for this I think it passes the bar for being in mainline gem5 :).


- Jason


David Hashe
2016-08-05 01:39:33 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/
-----------------------------------------------------------

(Updated Aug. 5, 2016, 1:39 a.m.)


Review request for Default.


Changes
-------

Add kvm_map parameter to AbstractMemory.
Store kvm_map in BackingStoreEntry rather than is_in_addr_map.
Simplify KVM mapping logic by using kvm_map.
Remove change to avoid allocating pages within a reserved range.


Repository: gem5


Description (updated)
-------

Changeset 11562:7375e1f533fa
---------------------------
cpu, mem, sim: Enable KVM support for Ruby

Only map memories into the KVM guest address space that are
marked as usable by KVM.

Remember whether a BackingStoreEntry should be mapped by KVM.

Fix bug causing incomplete draining of Ruby Sequencer.


Diffs (updated)
-----

src/cpu/kvm/vm.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/AbstractMemory.py 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/ruby/system/Sequencer.cc 704b0198f747b766b839c577614eb2924fd1dfee

Diff: http://reviews.gem5.org/r/3580/diff/


Testing
-------


Thanks,

David Hashe
Andreas Hansson
2016-08-05 07:15:47 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/#review8579
-----------------------------------------------------------


I see how this works as a stop gap, but ultimately I would like to push for the removal of the shadow memory as the first option. Is it really that much effort?


src/mem/physical.hh (line 67)
<http://reviews.gem5.org/r/3580/#comment7453>

in the guest



src/mem/physical.hh (line 72)
<http://reviews.gem5.org/r/3580/#comment7454>

perhaps beef this up a bit:

the host memory?

same size as range?



src/mem/physical.hh (line 77)
<http://reviews.gem5.org/r/3580/#comment7456>

Again not a very descriptive parameter.



src/mem/physical.hh (line 79)
<http://reviews.gem5.org/r/3580/#comment7455>

whitespace line



src/mem/physical.cc (line 85)
<http://reviews.gem5.org/r/3580/#comment7457>

I find the control flow here very odd. The kvm_addr_map is initialised and set on a path different to where it is used? What memory is it affecting? Why is it sticky? What if one is not kvm-mapped and the others are, etc.?



src/mem/physical.cc (line 142)
<http://reviews.gem5.org/r/3580/#comment7458>

Same oddity with the configuration here. We should check each and every memory.



src/mem/ruby/system/Sequencer.cc (line 254)
<http://reviews.gem5.org/r/3580/#comment7459>

This seems completely unrelated.


- Andreas Hansson
David Hashe
2016-08-05 21:37:34 UTC
Permalink
Post by Andreas Hansson
I see how this works as a stop gap, but ultimately I would like to push for the removal of the shadow memory as the first option. Is it really that much effort?
I'm not personally familiar enough with why the shadow memory is needed to be able to say how much effort it would take to remove, but I believe so.
Post by Andreas Hansson
src/mem/physical.cc, line 142
<http://reviews.gem5.org/r/3580/diff/3/?file=57445#file57445line142>
Same oddity with the configuration here. We should check each and every memory.
I've changed it to now do that. It doesn't make sense to kvmMap some members of an interleaved group and not others, so it is now a fatal error if the values don't all match.
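A minimal standalone C++ sketch of that consistency check, with made-up names and a local stand-in for gem5's fatal():

    // Standalone sketch of the check described above: every memory that is
    // interleaved into one contiguous backing store must agree on the
    // KVM-mapping flag, otherwise the run aborts. Illustrative only.
    #include <cstdio>
    #include <cstdlib>
    #include <vector>

    [[noreturn]] static void fatal(const char *msg)
    {
        std::fprintf(stderr, "fatal: %s\n", msg);
        std::exit(1);
    }

    struct Memory { const char *name; bool kvmMap; };

    // Returns the group's common flag, or aborts if the members disagree.
    static bool groupKvmMap(const std::vector<Memory> &group)
    {
        const bool expected = group.front().kvmMap;
        for (const auto &m : group) {
            if (m.kvmMap != expected)
                fatal("memories in an interleaved range must all have the "
                      "same KVM-mapping setting");
        }
        return expected;
    }

    int main()
    {
        std::vector<Memory> channels = {
            {"mem_ctrl0", true}, {"mem_ctrl1", true}, {"mem_ctrl2", true},
        };
        std::printf("interleaved range %s be mapped by KVM\n",
                    groupKvmMap(channels) ? "will" : "will not");
        return 0;
    }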
Post by Andreas Hansson
src/mem/physical.cc, line 85
<http://reviews.gem5.org/r/3580/diff/3/?file=57445#file57445line85>
I find the control flow here very odd. The kvm_addr_map is initialised and set on a path different to where it is used? What memory is it affecting? Why is it sticky? What if one is not kvm mapped and the others are etc?
I was confused about how the backing stores were created for memories added to addrMap. I've updated the patch to check each memory individually.
Post by Andreas Hansson
src/mem/ruby/system/Sequencer.cc, line 254
<http://reviews.gem5.org/r/3580/diff/3/?file=57446#file57446line254>
This seems completely unrelated.
Removed.


- David


Post by Jason Lowe-Power
-----------------------------------------------------------
http://reviews.gem5.org/r/3580/
-----------------------------------------------------------
(Updated Aug. 5, 2016, 9:37 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11562:7375e1f533fa
---------------------------
cpu, mem, sim: Enable KVM support for Ruby
Only map memories into the KVM guest address space that are
marked as usable by KVM.
Remember whether a BackingStoreEntry should be mapped by KVM.
Fix bug causing incomplete draining of Ruby Sequencer.
Diffs
-----
src/cpu/kvm/vm.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/AbstractMemory.py 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.cc 704b0198f747b766b839c577614eb2924fd1dfee
Diff: http://reviews.gem5.org/r/3580/diff/
Testing
-------
Thanks,
David Hashe
Brandon Potter
2016-08-09 22:43:59 UTC
Permalink
Post by David Hashe
Post by Andreas Hansson
I see how this works as a stop gap, but ultimately I would like to push for the removal of the shadow memory as the first option. Is it really that much effort?
I'm not personally familiar enough with why the shadow memory is needed to be able to say how much effort it would take to remove, but I believe so.
Providing background since some might not be familiar with the problem.

__The following links are relevant:__
http://reviews.gem5.org/r/2466 (Joel Hestness' response to Andreas Hansson)
http://reviews.gem5.org/r/2627 (Joel Hestness' comment)
http://reviews.gem5.org/r/3580 (Andreas Hansson's comment)
https://groups.google.com/forum/#!msg/gem5-gpu-dev/hjMJs_bAwlY/tE05yRQfJysJ (Joel Hestness’ comment)

__Why does Ruby need a shadow copy?__

Ruby needs the shadow copy to allow it to do functional accesses in situations where they would normally fail. Functional accesses are generated by system calls, or by devices that do functional loads and stores to hack around deficiencies in the device model or runtime.

__What is a functional access?__

A functional access is a memory access that resolves immediately in the memory system. Typically, this involves updating the data value of the memory location without generating any events that go into the event queue. The result is that the memory values appear to have been updated magically, without ever creating the events that would have been needed had the access operated in the normal manner.

__What's different about functional accesses compared with timing accesses?__

The difference is that functional accesses must complete immediately before returning control back over to the simulation. For example, system calls are executed in an X86 system when the processor executes either 'int 0x80' or 'syscall'. In SE mode, the system call invocation and all of the resulting loads and stores must be completed by the time that we return control back to the simulated process. That single 'syscall' instruction that the processor executes is supposed to represent an entire set of instructions, many of them necessary loads and stores, that would have executed if we were running the code in a real system with an actual kernel.

Timing accesses, on the other hand, are sent through the cache hierarchy and represent what would happen in a "real" system. For timing accesses, the processor creates events that get put into the event queue and are resolved at specific ticks according to the memory model associated with the simulation. Each memory event can generate subsequent events which may or may not modify the cache state and memory state of the simulated system.

__Why can't Ruby handle functional accesses without the shadow copy?__

Well, Ruby could handle functional accesses without the shadow copy, but it's difficult to implement properly for most protocols. The shadow copy has been considered an acceptable crutch that lets protocol writers avoid the complexities associated with verifying that their protocol is data correct.

Consider the following case: a read request comes into an L1 cache and is about to evict a cache line to be sent to a downstream L2 cache. The eviction is represented by a series of state transitions in Ruby that move the stale data out of the L1 into the L2, or possibly into a temporary buffer, before the new data is copied into the L1 cache. There may be several intermediate states needed to complete the transition; these are termed transient states. While the cache line's state machine is in a transient state, data cannot be read from or written to the cache line. (Ruby has assertions in the code to protect against reads on such lines, which must be because some of the data is "busy".) The asserts were added because the evicted, old data likely resides in some temporary data structure(s) which are not easy to access and update (e.g., MSHRs, write buffers, message buffers, request packets). That doesn't mean that it's conceptually impossible to update all of this temporary data; it's just difficult to do in most cases.

__How does the shadow copy solve the problem?__

The "--access-backing-store" option solves the problem by caching data in a shadow copy of the system memory. __All functional accesses are sent to this shadow copy instead of being directed to the normal, default system memory. Also, hit callbacks from the memory slave ports (which belong to the sequencers that created the request) will write (or read) data into (from) the shadow copy during the hit callback. If I am not mistaken, the hit callback on the memory slave port is equivalent to an L1 hit meaning that the request completed traversing the cache subsystem. In traversing the cache subsystem, the request did touch the default memory through normal behavior, but any returning information carried by the packet will be discarded in favor of what’s in the shadow copy. If data is read from the shadow copy, the request packet (again issued by a timing request) is updated to reflect the shadow copy’s value before the packet is finally handed back to the sequencer.__ The interesting code can be found by searching for "access_backing_store" in "RubyPort.cc".

System call instructions have an ordering semantic that prevents them from being executed before all of the preceding instructions have executed. The ordering semantic protects us from clobbering and/or missing timing accesses with subsequent functional accesses during the system call. The key thing which protects us here is that the Ruby sequencer needs to tell the processor that the instruction has finished. This cannot happen until the L1's hit callback has returned, ensuring that the shadow copy has seen the timing accesses. (I need to verify this by looking through that code, but I believe that's true from previous experience.)

If that's true, then other functional accesses need to be careful about when they are issued, or we may see consistency issues caused by value reordering from the cache hierarchy. For instance, consider what might happen if the system call did not have the ordering property. It would be possible for the system call instruction to issue functional accesses to the shadow copy before still-active timing accesses were seen by it. (There's no way that the processor could prevent the accesses from occurring by checking normal data dependencies, because all it sees is a single instruction: syscall or int 0x80.) So, I am a bit wary of seeing functional accesses in weird places. For instance, I wouldn't embed a functional access into a normal instruction. (I don't know if anyone has ever tried that or if it's even possible, but it would be a bad idea. There might be a magic instruction which does this, or someone might try to do it in the future.)

__What happens if we do not have a shadow copy?__

The behavior without a shadow copy of memory (i.e., no --access-backing-store) is kind of interesting. It highlights why we need the shadow copy in the first place (see functional_read/write in RubySystem.cc). Essentially, the functional_writes will always succeed, writing to as much of the relevant state as possible. However, functional_reads can (and will) fail. It's not completely obvious, but I am confident that the failures stem from cache lines returning "busy" states caused by recent transitions in the cache hierarchy. (It seems that this is what Nilay is referring to in his summary for reviews.gem5.org/r/2466.)
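A toy standalone C++ sketch of that failure mode (purely illustrative; not RubySystem.cc):

    // Toy sketch of the failure mode described above when there is no shadow
    // copy: a functional read has to find at least one holder of the line in
    // a stable state, and a line caught mid-transition ("busy") makes the
    // whole read fail, while functional writes update what they can reach.
    #include <iostream>
    #include <vector>

    enum class LineState { Stable, Busy };  // Busy = transient protocol state

    struct CacheCopy { LineState state; int value; };

    // Succeeds only if some copy of the line is in a stable, readable state.
    static bool functionalRead(const std::vector<CacheCopy> &copies, int &out)
    {
        for (const auto &c : copies) {
            if (c.state == LineState::Stable) { out = c.value; return true; }
        }
        return false;  // every copy is mid-transition, so the read fails
    }

    int main()
    {
        std::vector<CacheCopy> copies = {{LineState::Busy, 1},
                                         {LineState::Busy, 2}};
        int v = 0;
        std::cout << (functionalRead(copies, v) ? "read ok"
                                                : "functional read failed")
                  << "\n";
        return 0;
    }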


- Brandon


Brandon Potter
2016-08-10 01:01:27 UTC
Permalink
__Is it possible to remove the shadow copy?__

Yes, it is possible, but it requires a lot of work; more work than most people can reasonably be expected to contribute as an unrelated patch. The solution requires that the protocols are data correct; this entails making all of the functional accesses propagate correctly through temporary variables. Even if it is possible to remove the shadow copy for the existing public protocols, it's likely that protocol developers will want to retain this functionality to help with developing new protocols. Even if that's done, I suspect heavy resistance if we tried to force other developers with private protocols to ensure that their protocols are data correct even in the face of functional accesses. It's my understanding that the folks here at AMD aren't the only ones who rely on the shadow copy; I think the Wisconsin folks use it too.

Generally, we need better random memory testers to exercise the protocols and uncover problems. In my opinion, that should be the main priority for Ruby developers. I don't have much confidence in running new workloads if the simulation relies on Ruby; the protocols just aren't tested well enough. This memory tester needs to issue functional accesses as well as timing accesses to actually test whether the protocols are always data correct. It's not enough to simply have a few benchmarks that we test in the regressions, even if the benchmarks are long running.


- Brandon


Andreas Hansson
2016-08-10 07:42:25 UTC
Permalink
Thanks for all the comments Brandon.

The reason why I don't like the original patch is that it confuses atomic and functional accesses (we should really rename the latter to debug accesses to align with SystemC TLM), and does so without any sensible rules/assumptions in place. I guess the shadow memory is quite similar though, as it seems near impossible to actually explain what is correct and/or expected behaviour when timing and functional accesses interleave.

At the moment we are experiencing actual ordering (and correctness) issues with the interplay between real memory accesses, i.e. timing and atomic, and the debug memory accesses performed using the functional API. In our case the latter are used by a number of wrapped models that expect instantaneous updates to memory and separate functional and timing packets. The bottom line is that the use of functional accesses causes big problems when ordering and consistency models matter, and we should try to sort this out before we make it even more complicated. We have the same issue with the SST integration. Ultimately the functional read and write accesses should only be used when the guest system is at rest, but clearly we are not sticking to that at the moment.

The MemChecker is a good start in checking that the timing behaviour works as expected with respect to the consistency model, but it is not clear to me how we could add the functional accesses as part of the check. Also, is it at all used with Ruby today? In the interplay between timing/atomic and functional, what is "right"? Technically the functional/debug accesses do not exist in the simulated guest. I am really keen to hear what people think is right.


- Andreas


Brad Beckmann
2016-08-12 06:13:17 UTC
Permalink
This has been a great discussion and I would like to see it continue. However, I think we should all conclude that the discussion is somewhat orthogonal to the current patch. The current patch fixes an immediate problem with using KVM with Ruby. It has gone through 4 revisions, and David has responded to many comments and suggestions. Can we please check it in now?

Brandon is right on regarding the tester gap. However, we should be clear on the capabilities of the current testers, and we should all agree this gap is not unique to Ruby. The current memtest randomizes racy accesses, but only really checks single-writer/multiple-reader coherence. The current rubytest further stresses races, but only checks SC execution. I'm less familiar with the MemChecker, but I believe it simply monitors an execution and ensures SC execution. Please correct me if I'm wrong, but I don't believe it has any concept of an acquire/ld fence or a release/st fence. Brandon's frustration with our recent GPU protocols is due to the lack of a relaxed consistency model checker in gem5. We developed such a tester a few years ago; however, that work stopped, and I have not been able to find someone to take over the development. It is hard and complicated work, and there simply aren't many people who can do it. If anyone is interested, please let me know.
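To illustrate the kind of reasoning a relaxed consistency checker has to do, here is the classic message-passing pattern written with plain C++11 atomics (used purely to show the semantics, not as a gem5 tester): an SC-only checker accepts this trivially, whereas a relaxed-model checker has to know that the release store and acquire load are what make the final assert legal, and that weakening either one makes the forbidden outcome possible.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> data{0};
std::atomic<int> flag{0};

void producer()
{
    data.store(42, std::memory_order_relaxed);
    flag.store(1, std::memory_order_release);               // "release/st fence"
}

void consumer()
{
    while (flag.load(std::memory_order_acquire) == 0) { }   // "acquire/ld fence"
    assert(data.load(std::memory_order_relaxed) == 42);     // guaranteed to hold
}

int main()
{
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
    return 0;
}
```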


- Brad


Andreas Hansson
2016-08-12 07:46:34 UTC
Permalink
The MemChecker does what you want. At least that's my understanding. Stephan or Marco can comment further.

I think we should make sure we get this right, as it is already quite complex code with the various types of memories. Furthermore, once it is committed, we have traditionally had limited success in getting design changes made.


- Andreas


Brad Beckmann
2016-08-12 14:49:24 UTC
Permalink
Andreas, even if MemChecker does what you think it does, that is orthogonal to this patch. This patch fixes a current problem with KVM and Ruby. We will likely never get rid of the backing store for Ruby, even with a perfect checker.

Do I understand your resistance to checking in this patch correctly? You don't want us to check this in, and instead want us to do it "right" by spending 9-12 man-months of effort to remove the Ruby backing store? Do you fully appreciate the amount of work required? Do you understand all the benefits of the backing store, as Brandon and previously Joel outlined?


- Brad


Andreas Hansson
2016-08-12 14:52:04 UTC
Permalink
I think you've got this all wrong.

The current patch only adds a visibility switch, and all I'm asking is that we document how the various switches combine, which combinations make sense, and where they are used.
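For readers following along, the switch being discussed is roughly of the following shape; the field and function names here are illustrative rather than a quote of the patch. Each backing-store entry remembers whether KVM should map it, and the VM setup skips entries that opt out, e.g. a shadow copy that must stay invisible to the guest.

```cpp
// Illustrative sketch of the "visibility switch"; not the patch itself.
#include <cstdint>
#include <vector>

struct BackingStoreEntry {
    uint64_t  base;       // guest physical base address
    uint64_t  size;       // size of the range in bytes
    uint8_t  *hostPtr;    // host memory backing this range
    bool      kvmMap;     // should KVM map this range into the guest?
};

struct MemSlot { uint64_t guestBase; uint8_t *host; uint64_t size; };

struct KvmVM {
    std::vector<MemSlot> slots;

    // Only ranges marked as usable by KVM become guest-visible memory slots.
    void setupMemSlots(const std::vector<BackingStoreEntry> &store) {
        for (const auto &e : store) {
            if (!e.kvmMap)
                continue;          // e.g. a shadow copy stays hidden
            slots.push_back({e.base, e.hostPtr, e.size});
        }
    }
};
```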

The other patch which mixes atomic/functional needs further discussion.


- Andreas


Brad Beckmann
2016-08-18 00:08:45 UTC
Permalink
Post by David Hashe
Post by Andreas Hansson
I see how this works as a stop gap, but ultimately I would like to push for the removal of the shadow memory as the first option. Is it really that much effort?
I'm not personally familiar enough with why the shadow memory is needed to be able to say how much effort it would take to remove, but I believe so.
Providing background since some might not be familiar with the problem.
__The following links are relevant:__
http://reviews.gem5.org/r/2466 (Joel Hestness' response to Andreas Hansson)
http://reviews.gem5.org/r/2627 (Joel Hestness' comment)
http://reviews.gem5.org/r/3580 (Andreas Hansson's comment)
https://groups.google.com/forum/#!msg/gem5-gpu-dev/hjMJs_bAwlY/tE05yRQfJysJ (Joel Hestness’ comment)
__Why does Ruby need a shadow copy?__
Ruby needs the shadow copy to allow functional accesses to succeed in situations where they would otherwise fail. Functional accesses are generated by system calls, or by devices doing functional loads and stores to hack around deficiencies in the device model or runtime.
__What is a functional access?__
A functional access is a memory access that resolves immediately in the memory system. Typically, this involves updating the data value of the memory location without generating any events that go into the event queue. The result is that memory values appear to have been updated magically, without ever creating the events that would have been generated had the access proceeded in the normal manner.
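To make the distinction concrete, here is a minimal sketch (not taken from this patch) of issuing a functional write through a gem5 master port of this era; the helper name and the caller-supplied arguments are assumptions purely for illustration:

#include "base/types.hh"
#include "mem/packet.hh"
#include "mem/port.hh"
#include "mem/request.hh"

// Hypothetical helper, not part of the patch: perform a functional (debug)
// write through a master port. The access resolves inside sendFunctional()
// without scheduling any events, which is the "magic" update described above.
void
functionalWriteExample(MasterPort &port, Addr paddr, uint8_t *data,
                       unsigned size, MasterID mid)
{
    Request *req = new Request(paddr, size, 0, mid);
    PacketPtr pkt = new Packet(req, MemCmd::WriteReq);
    pkt->dataStatic(data);        // packet carries the caller's buffer
    port.sendFunctional(pkt);     // completes immediately, no ticks consumed
    delete pkt;
    delete req;
}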
__What's different about functional accesses compared with timing accesses?__
The difference is that functional accesses must complete immediately before returning control back over to the simulation. For example, system calls are executed in an X86 system when the processor executes either 'int 0x80' or 'syscall'. In SE mode, the system call invocation and all of the resulting loads and stores must be completed by the time that we return control back to the simulated process. That single 'syscall' instruction that the processor executes is supposed to represent an entire set of instructions, many of them necessary loads and stores, that would have executed if we were running the code in a real system with an actual kernel.
Timing accesses, on the other hand, are sent through the cache hierarchy and represent what would happen in a "real" system. For timing accesses, the processor creates events that get put into the event queue and are resolved at specific ticks according to the memory model associated with the simulation. Each memory event can generate subsequent events which may or may not modify the cache state and memory state of the simulated system.
__Why can't Ruby handle functional accesses without the shadow copy?__
Well, it could handle functional accesses without the shadow copy, but it's difficult to implement properly for most protocols. The shadow copy has been considered an acceptable crutch to allow protocol writers to avoid the complexities associated with verifying that their protocol is data correct.
Consider the following case: a read request comes into an L1 cache and is about to evict a cache line to be sent to a downstream L2 cache. The eviction is represented by a series of state transitions in Ruby to handle moving the stale data out of the L1 into the L2, or possibly a temporary buffer, before copying the new data into the L1 cache. There may be several intermediate states needed to complete the transition, which are termed transient states. While the cache line's state machine is in a transient state, data cannot be read from or written to the cache line. (Ruby has assertions in the code to protect against reads on such lines, which must be due to some of the data being "busy".) The asserts were added because the evicted, old data likely resides in some temporary data structure(s) which are likely not easy to access and update (i.e. MSHR, write buffer, message buffer, request packets, etc.). That doesn't mean that it's conceptually impossible to update all of this temporary data; it's just difficult to do in most cases.
__How does the shadow copy solve the problem?__
The "--access-backing-store" option solves the problem by caching data in a shadow copy of the system memory. __All functional accesses are sent to this shadow copy instead of being directed to the normal, default system memory. Also, hit callbacks from the memory slave ports (which belong to the sequencers that created the request) will write (or read) data into (from) the shadow copy during the hit callback. If I am not mistaken, the hit callback on the memory slave port is equivalent to an L1 hit meaning that the request completed traversing the cache subsystem. In traversing the cache subsystem, the request did touch the default memory through normal behavior, but any returning information carried by the packet will be discarded in favor of what’s in the shadow copy. If data is read from the shadow copy, the request packet (again issued by a timing request) is updated to reflect the shadow copy’s value before the packet is finally handed back to the sequencer.__ The interesting code can be found by searching for "access_backing_store" in "RubyPort.cc".
System call instructions have an ordering semantic that prevents them from being executed before all of the preceding instructions have executed. The ordering semantic protects us from clobbering and/or missing timing accesses with subsequent functional accesses during the system call. The key thing which protects us here is that the Ruby sequencer needs to tell the processor that the instruction has finished. This cannot happen until the L1's hit callback has returned, ensuring that the shadow copy has seen the timing accesses. (I need to verify this by looking through that code, but I believe that's true from previous experience.)
If that's true, then other functional accesses need to be careful in how they issue instructions, or we may see consistency issues caused by value reordering from the cache hierarchy. For instance, consider what might happen if the system call did not have the ordering property. It would be possible for the system call instruction to issue functional accesses to the shadow copy before still-active timing accesses were seen by it. (There's no way that the processor could prevent the accesses from occurring by checking normal data dependencies, because all it sees is a single instruction: syscall or int 0x80.) So, I am a bit wary of seeing functional accesses in weird places. For instance, I wouldn't embed a functional access into a normal instruction. (I don't know if anyone has ever tried that or if it's even possible, but it would be a bad idea. There might be a magic instruction which does this, or someone might try to do it in the future.)
__What happens if we do not have a shadow copy?__
The behavior without a shadow copy of memory (i.e. no --access-backing-store) is kind of interesting. It highlights why we need the shadow copy in the first place (see RubySystem.cc::functional_read/write). Essentially, functional_write will always succeed by writing to as much of the relevant state as possible. However, functional_read can (and will) fail. It's not completely obvious, but I am confident that the failures stem from the cache lines returning "busy" states caused by recent transitions in the cache hierarchy. (It seems that this is what Nilay is referring to in his summary for reviews.gem5.org/r/2466.)
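For reference, the failing path described here boils down to a permission scan over all the controllers. The sketch below is a paraphrase of that logic under the assumption that the helper names (makeLineAddress, getAccessPermission, m_abs_cntrl_vec) are approximately right; it is not a copy of the real RubySystem code:

// Illustrative paraphrase of the functional-read failure case.
bool
RubySystem::functionalRead(PacketPtr pkt)
{
    Addr line_addr = makeLineAddress(pkt->getAddr());
    unsigned num_ro = 0, num_rw = 0, num_busy = 0;

    // Ask every controller what permission it currently has on the line.
    for (auto *cntrl : m_abs_cntrl_vec) {
        AccessPermission perm = cntrl->getAccessPermission(line_addr);
        if (perm == AccessPermission_Read_Only)
            num_ro++;
        else if (perm == AccessPermission_Read_Write)
            num_rw++;
        else if (perm == AccessPermission_Busy)
            num_busy++;
    }

    // No stable readable copy anywhere: the line is mid-transition
    // ("busy"), so the functional read has to fail.
    if (num_ro == 0 && num_rw == 0)
        return false;

    // Otherwise the data can be read out of whichever controller holds it.
    return true;
}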
__Is it possible to remove the shadow copy?__
Yes, it is possible, but it requires a lot of work; more work than most people can reasonably be expected to contribute as an unrelated patch. The solution requires that the protocols are data correct; this entails making all of the functional accesses propagate correctly through temporary variables. Even if it is possible to remove it for existing public protocols, it's likely that protocol developers will want to retain this functionality to help with developing new protocols. Even if that's done, I suspect heavy resistance if we tried to force other developers with private protocols to ensure that their protocols are data correct even in the face of functional accesses. It's my understanding that the folks here at AMD aren't the only ones who rely on the shadow copy; I think the Wisconsin folks use it too.
Generally, we need better random memory testers to exercise the protocols and uncover problems. In my opinion, that should be the main priority for Ruby developers. I don't have much confidence in running new workloads if the simulation relies on Ruby; the protocols just aren't tested well enough. This memory tester needs to issue functional accesses as well as timing accesses to actually test whether the protocols are always data correct. It's not enough to simply have a few benchmarks that we test in the regressions even if the benchmarks are long running.
Thanks for all the comments, Brandon.
The reason why I don't like the original patch is that it confuses atomic and functional (we should really rename the latter to debug accesses to align with SystemC TLM), and does so without any sensible rules/assumptions in place. I guess the shadow memory is quite similar though, as it seems near impossible to actually explain what correct and/or expected behaviour is when timing and functional accesses interleave.
At the moment we are experiencing actual ordering (and correctness) issues with the interplay between real memory accesses, i.e. timing and atomic, and the debug memory accesses performed using the functional API. In our case the latter are used by a number of wrapped models that expect instantaneous updates to memory and separate functional and timing packets. The bottom line is that the use of functional accesses causes big problems when ordering and consistency models matter, and we should try to sort this out before we make it even more complicated. We have the same issue with the SST integration. Ultimately the functional read and write accesses should only be used when the guest system is at rest, but clearly we are not sticking to that at the moment.
The MemChecker is a good start in checking that the timing behaviour works as expected with respect to the consistency model, but it is not clear to me how we could add the functional accesses as part of the check. Also, is it at all used with Ruby today? In the interplay between timing/atomic and functional, what is "right"? Technically the functional/debug accesses do not exist in the simulated guest. I am really keen to hear what people think is right.
This has been a great discussion and I would like to see it continue. However, I think we should all conclude that the discussion is somewhat orthogonal to this current patch. The current patch fixes an immediate problem using KVM with Ruby. It has gone through 4 revisions and David has responded to many comments and suggestions. Can we please check it in now?
Brandon is right on regarding the tester gap. However, we should be clear on the capabilities of the current testers, and we should all agree this gap is not unique to Ruby. The current memtest randomizes racy accesses, but only really checks single-writer/multiple-reader coherence. The current rubytest further stresses races, but only checks SC execution. I'm less familiar with the MemChecker, but I believe it simply monitors an execution and ensures SC execution. Please correct me if I'm wrong, but I don't believe it has any concept of an acquire/ld fence or a release/st fence. Brandon's frustration with our recent GPU protocols is due to a lack of a relaxed consistency model checker in gem5. We developed such a tester a few years ago; however, that work stopped and I have not been able to find someone to take over the development. It is hard and complicated work. There simply aren't many people who can do it. If anyone is interested, please let me know.
The MemChecker does what you want. At least that's my understanding. Stephan or Marco can comment further.
I think we should make sure we get this right, as it's already quite complex code with the various types of memories. Furthermore, once it's committed we traditionally have limited success in getting design changes done.
Andreas, even if MemChecker does what you think it does, that is orthogonal to this patch. This patch fixes a current problem with KVM and Ruby. We will likely never get rid of the backing store for Ruby, even with a perfect checker.
Do I understand your resistance to checking in this patch correctly? You don't want us to check this in, and instead want us to do it "right" by spending 9-12 man-months of effort to remove the Ruby backing store? Do you fully appreciate the amount of work required? Do you understand all the benefits of the backing store, as Brandon and previously Joel outlined?
I think you've got this all wrong.
The current patch only adds a visibility switch, and all I'm asking is that we document how the various switches combine, and what combinations make sense, and where they are used.
The other patch which mixes atomic/functional needs further discussion.
I'm happy to be wrong here, and I appreciate the constructive comments since my previous comment last week. I was pretty frustrated that the discussion was degenerating into a complaint about functional accesses in Ruby and questioning whether the backing store was required at all. I'm glad we moved on. Hopefully David will respond to the latest set of comments soon and we can get this patch checked in.

By the way, I looked a bit more at MemChecker. It seems like it ensures LLSC operations read permissible values, as well as ensuring non-racy reads and writes are SC. It does not appear to handle the release and acquire fences/RMWs that exist in our GPU memory model. More importantly, it is a monitor; it does not initiate memory accesses. I suspect MemChecker is only as good as the tests running on top of it. In contrast, memtest and rubytest are self-contained tests and checkers. There looks to be a lot of valuable code in MemChecker, especially in MemChecker::ByteTracker::inExpectedData(), that may be leveraged in a self-contained release consistency tester/checker. I would be interested to hear Stephan's or Marco's thoughts on that.


- Brad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/#review8579
-----------------------------------------------------------
Post by David Hashe
-----------------------------------------------------------
http://reviews.gem5.org/r/3580/
-----------------------------------------------------------
(Updated Aug. 5, 2016, 9:37 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11562:7375e1f533fa
---------------------------
cpu, mem, sim: Enable KVM support for Ruby
Only map memories into the KVM guest address space that are
marked as usable by KVM.
Remember whether a BackingStoreEntry should be mapped by KVM.
Fix bug causing incomplete draining of Ruby Sequencer.
Diffs
-----
src/cpu/kvm/vm.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/AbstractMemory.py 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.cc 704b0198f747b766b839c577614eb2924fd1dfee
Diff: http://reviews.gem5.org/r/3580/diff/
Testing
-------
Thanks,
David Hashe
David Hashe
2016-08-05 21:37:22 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/
-----------------------------------------------------------

(Updated Aug. 5, 2016, 9:37 p.m.)


Review request for Default.


Changes
-------

Comment changes.
Rewrite kvmMap logic for interleaved and non-interleaved memories in the global address map.
Remove drain fix.


Repository: gem5


Description
-------

Changeset 11562:7375e1f533fa
---------------------------
cpu, mem, sim: Enable KVM support for Ruby

Only map memories into the KVM guest address space that are
marked as usable by KVM.

Remember whether a BackingStoreEntry should be mapped by KVM.

Fix bug causing incomplete draining of Ruby Sequencer.


Diffs (updated)
-----

src/cpu/kvm/vm.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/AbstractMemory.py 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.cc 704b0198f747b766b839c577614eb2924fd1dfee

Diff: http://reviews.gem5.org/r/3580/diff/


Testing
-------


Thanks,

David Hashe
Andreas Hansson
2016-08-12 07:44:03 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/#review8602
-----------------------------------------------------------



src/mem/AbstractMemory.py (line 64)
<http://reviews.gem5.org/r/3580/#comment7485>

I am still trying to wrap my head around what the table looks like with in_addr_map, conf_table_reported, and now kvm_map, for "real memories", "device memories", "shadow memories" etc. We technically have 8 different types of memories now, but I suspect we only really have three. Could we perhaps enumerate them and call that out in the comments?

Perhaps we should consider an enum instead?
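One way to read this suggestion (purely illustrative, not part of the patch) is an enum that names only the flag combinations that occur in practice, for example:

// Hypothetical enum, not in the patch: collapse the in_addr_map /
// conf_table_reported / kvm_map combinations that actually occur
// into a handful of named memory kinds.
enum class MemoryKind {
    Normal,        // in the address map, reported, visible to KVM
    DeviceMemory,  // in the address map but not reported to config tables
    ShadowCopy     // off the address map, backing-store / KVM use only
};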



src/mem/abstract_mem.hh (line 120)
<http://reviews.gem5.org/r/3580/#comment7484>

could you make this const, and the two other bools as well?



src/mem/physical.cc (line 82)
<http://reviews.gem5.org/r/3580/#comment7482>

whitespace



src/mem/physical.cc (line 200)
<http://reviews.gem5.org/r/3580/#comment7483>

while you are at it, change to .emplace_back(range, pmem, kvm_map)


- Andreas Hansson
Post by Jason Lowe-Power
-----------------------------------------------------------
http://reviews.gem5.org/r/3580/
-----------------------------------------------------------
(Updated Aug. 5, 2016, 9:37 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11562:7375e1f533fa
---------------------------
cpu, mem, sim: Enable KVM support for Ruby
Only map memories into the KVM guest address space that are
marked as usable by KVM.
Remember whether a BackingStoreEntry should be mapped by KVM.
Fix bug causing incomplete draining of Ruby Sequencer.
Diffs
-----
src/cpu/kvm/vm.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/AbstractMemory.py 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.cc 704b0198f747b766b839c577614eb2924fd1dfee
Diff: http://reviews.gem5.org/r/3580/diff/
Testing
-------
Thanks,
David Hashe
Andreas Sandberg
2016-08-12 08:55:09 UTC
Permalink
Post by Jason Lowe-Power
src/mem/AbstractMemory.py, line 64
<http://reviews.gem5.org/r/3580/diff/4/?file=57455#file57455line64>
I am still trying to wrap my head around what the table looks like with in_addr_map, conf_table_reported, and now kvm_map, for "real memories", "device memories", "shadow memories" etc. We technically have 8 different types of memories now, but I suspect we only really have three. Could we perhaps enumerate them and call that out in the comments?
Perhaps we should consider an enum instead?
A couple of these params are actually orthogonal. At least conf_table_reported is useful as a separate parameter, since there are cases where you want a memory to be in a part of the address map without being listed in the DTB/ACPI tables. Now, at least one combination of in_addr_map and kvm_map is a bit weird. This is what I'd assume the various combinations do:

* in_addr_map && kvm_map: This is well defined, map in KVM & as a physical memory.
* in_addr_map && !kvm_map: Add to gem5's address map, but not KVM. If I understand correctly, this is what you need to accomplish for the APU.
* !in_addr_map && kvm_map: Should this be mapped in KVM? I think it'd make sense not to map it since it's not a part of the global address map.
* !in_addr_map && !kvm_map: Don't map the memory at all.

Would it ever make sense for a memory to not be in the address map but be mapped by KVM? I think we should probably make sure KVM doesn't map any memories that aren't in the address map.


- Andreas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/#review8602
-----------------------------------------------------------
Post by Jason Lowe-Power
-----------------------------------------------------------
http://reviews.gem5.org/r/3580/
-----------------------------------------------------------
(Updated Aug. 5, 2016, 10:37 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11562:7375e1f533fa
---------------------------
cpu, mem, sim: Enable KVM support for Ruby
Only map memories into the KVM guest address space that are
marked as usable by KVM.
Remember whether a BackingStoreEntry should be mapped by KVM.
Fix bug causing incomplete draining of Ruby Sequencer.
Diffs
-----
src/cpu/kvm/vm.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/AbstractMemory.py 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.cc 704b0198f747b766b839c577614eb2924fd1dfee
Diff: http://reviews.gem5.org/r/3580/diff/
Testing
-------
Thanks,
David Hashe
David Hashe
2016-08-18 12:28:05 UTC
Permalink
Post by Andreas Sandberg
src/mem/AbstractMemory.py, line 64
<http://reviews.gem5.org/r/3580/diff/4/?file=57455#file57455line64>
I am still trying to wrap my head around what the table looks like with in_addr_map, conf_table_reported, and now kvm_map, for "real memories", "device memories", "shadow memories" etc. We technically have 8 different types of memories now, but I suspect we only really have three. Could we perhaps enumerate them and call that out in the comments?
Perhaps we should consider an enum instead?
* in_addr_map && kvm_map: This is well defined, map in KVM & as a physical memory.
* in_addr_map && !kvm_map: Add to gem5's address map, but not KVM. If I understand correctly, this is what you need to accomplish for the APU.
* !in_addr_map && kvm_map: Should this be mapped in KVM? I think it'd make sense not to map it since it's not a part of the global address map.
* !in_addr_map && !kvm_map: Don't map the memory at all.
Would it ever make sense for a memory to not be in the address map but be mapped by KVM? I think we should probably make sure KVM doesn't map any memories that aren't in the address map.
We need both the second and third cases for the APU: the third case for the shadow copy (it isn't in the address map because it overlays the regular copy, but it's the copy we need for KVM), and the second case for the regular copy of memory. So I think all four cases are plausible.
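To make the intent concrete, here is a minimal sketch of the KVM-side rule being discussed: skip any backing-store entry whose kvm_map flag is false, regardless of whether it is in the address map. The member and function names (getBackingStore, kvmMap, setUserMemoryRegion) are assumptions standing in for the real vm.cc change, not a quotation of it.

// Illustrative only: map into the KVM guest only the regions flagged kvm_map.
uint32_t slot = 0;
for (const auto &entry : system->getPhysMem().getBackingStore()) {
    if (!entry.kvmMap)
        continue;  // e.g. the regular copy owned by Ruby's timing path
    vm.setUserMemoryRegion(slot++, entry.pmem,
                           entry.range.start(), entry.range.size());
}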


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/#review8602
-----------------------------------------------------------
Post by Andreas Sandberg
-----------------------------------------------------------
http://reviews.gem5.org/r/3580/
-----------------------------------------------------------
(Updated Aug. 5, 2016, 9:37 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11562:7375e1f533fa
---------------------------
cpu, mem, sim: Enable KVM support for Ruby
Only map memories into the KVM guest address space that are
marked as usable by KVM.
Remember whether a BackingStoreEntry should be mapped by KVM.
Fix bug causing incomplete draining of Ruby Sequencer.
Diffs
-----
src/cpu/kvm/vm.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/AbstractMemory.py 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.cc 704b0198f747b766b839c577614eb2924fd1dfee
Diff: http://reviews.gem5.org/r/3580/diff/
Testing
-------
Thanks,
David Hashe
Andreas Sandberg
2016-08-12 09:00:33 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/#review8605
-----------------------------------------------------------


I think this makes sense overall. I can definitely see use cases where I would want to prevent KVM from punching through to the backing store for some memories. I have two high-level comments:

1. Could you make sure to update the commit message to reflect that this change now only deals with how KVM maps memory? The current message describes what the patch series does and not what this individual commit does (it has nothing to do with Ruby).

2. I like the idea of using a struct to describe the backing store; this makes the code much clearer. Could you make it a bit more symmetric and make sure the conf_table_reported flag is included in the struct as well? We should check that flag as well when mapping memories in KVM, but that should probably be a separate commit since it's a bug fix that's orthogonal to this.
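For what it's worth, a sketch of the struct idea in point 2, with field names based on the flags named in this review; the committed version may differ in naming and members.

// Hypothetical shape of the per-region backing-store descriptor.
class BackingStoreEntry
{
  public:
    BackingStoreEntry(AddrRange _range, uint8_t *_pmem,
                      bool _conf_table_reported, bool _in_addr_map,
                      bool _kvm_map)
        : range(_range), pmem(_pmem),
          confTableReported(_conf_table_reported),
          inAddrMap(_in_addr_map), kvmMap(_kvm_map)
    { }

    AddrRange range;         // guest-physical range this host buffer backs
    uint8_t *pmem;           // host pointer to the backing store
    bool confTableReported;  // listed in DTB/ACPI-style config tables
    bool inAddrMap;          // part of gem5's global address map
    bool kvmMap;             // whether KVM should map this into the guest
};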

Thanks for sorting this out!

- Andreas Sandberg
Post by Jason Lowe-Power
-----------------------------------------------------------
http://reviews.gem5.org/r/3580/
-----------------------------------------------------------
(Updated Aug. 5, 2016, 10:37 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11562:7375e1f533fa
---------------------------
cpu, mem, sim: Enable KVM support for Ruby
Only map memories into the KVM guest address space that are
marked as usable by KVM.
Remember whether a BackingStoreEntry should be mapped by KVM.
Fix bug causing incomplete draining of Ruby Sequencer.
Diffs
-----
src/cpu/kvm/vm.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/AbstractMemory.py 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/abstract_mem.cc 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.hh 704b0198f747b766b839c577614eb2924fd1dfee
src/mem/physical.cc 704b0198f747b766b839c577614eb2924fd1dfee
Diff: http://reviews.gem5.org/r/3580/diff/
Testing
-------
Thanks,
David Hashe
David Hashe
2016-08-18 12:28:10 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/
-----------------------------------------------------------

(Updated Aug. 18, 2016, 12:28 p.m.)


Review request for Default.


Changes
-------

Whitespace and style changes.
Rewrite commit message.
Added additional members to BackingStoreEntry.


Summary (updated)
-----------------

cpu, mem, sim: Change how KVM maps memory


Repository: gem5


Description (updated)
-------

Changeset 11562:7375e1f533fa
---------------------------
cpu, mem, sim: Change how KVM maps memory

Only map memories into the KVM guest address space that are
marked as usable by KVM. Create BackingStoreEntry class
containing flags for is_conf_reported, in_addr_map, and
kvm_map.


Diffs (updated)
-----

src/cpu/kvm/vm.cc 7e7157941d70
src/mem/AbstractMemory.py 7e7157941d70
src/mem/abstract_mem.hh 7e7157941d70
src/mem/abstract_mem.cc 7e7157941d70
src/mem/physical.hh 7e7157941d70
src/mem/physical.cc 7e7157941d70

Diff: http://reviews.gem5.org/r/3580/diff/


Testing
-------


Thanks,

David Hashe
Andreas Sandberg
2016-08-18 12:36:16 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/#review8660
-----------------------------------------------------------

Ship it!


Ship It!

- Andreas Sandberg
Post by Jason Lowe-Power
-----------------------------------------------------------
http://reviews.gem5.org/r/3580/
-----------------------------------------------------------
(Updated Aug. 18, 2016, 1:28 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11562:7375e1f533fa
---------------------------
cpu, mem, sim: Change how KVM maps memory
Only map memories into the KVM guest address space that are
marked as usable by KVM. Create BackingStoreEntry class
containing flags for is_conf_reported, in_addr_map, and
kvm_map.
Diffs
-----
src/cpu/kvm/vm.cc 7e7157941d70
src/mem/AbstractMemory.py 7e7157941d70
src/mem/abstract_mem.hh 7e7157941d70
src/mem/abstract_mem.cc 7e7157941d70
src/mem/physical.hh 7e7157941d70
src/mem/physical.cc 7e7157941d70
Diff: http://reviews.gem5.org/r/3580/diff/
Testing
-------
Thanks,
David Hashe
Jason Lowe-Power
2016-08-18 15:10:51 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/#review8661
-----------------------------------------------------------

Ship it!


Ship It!

- Jason Lowe-Power
Post by Jason Lowe-Power
-----------------------------------------------------------
http://reviews.gem5.org/r/3580/
-----------------------------------------------------------
(Updated Aug. 18, 2016, 12:28 p.m.)
Review request for Default.
Repository: gem5
Description
-------
Changeset 11562:7375e1f533fa
---------------------------
cpu, mem, sim: Change how KVM maps memory
Only map memories into the KVM guest address space that are
marked as usable by KVM. Create BackingStoreEntry class
containing flags for is_conf_reported, in_addr_map, and
kvm_map.
Diffs
-----
src/cpu/kvm/vm.cc 7e7157941d70
src/mem/AbstractMemory.py 7e7157941d70
src/mem/abstract_mem.hh 7e7157941d70
src/mem/abstract_mem.cc 7e7157941d70
src/mem/physical.hh 7e7157941d70
src/mem/physical.cc 7e7157941d70
Diff: http://reviews.gem5.org/r/3580/diff/
Testing
-------
Thanks,
David Hashe