Discussion:
[gem5-dev] Review Request: RubyPort: Fix evict/invalidate packet memory leak
(too old to reply)
Nilay Vaish
2013-04-08 04:28:40 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1813/#review4202
-----------------------------------------------------------

Ship it!


Ship It!

- Nilay Vaish
-----------------------------------------------------------
http://reviews.gem5.org/r/1813/
-----------------------------------------------------------
(Updated April 7, 2013, 8:47 p.m.)
Review request for Default.
Description
-------
Changeset 9630:2ea801a03fa5
---------------------------
RubyPort: Fix evict/invalidate packet memory leak
When using the o3 or inorder CPUs with many Ruby protocols, the caches may
need to forward invalidations to the CPUs. The RubyPort was instantiating a
packet to be sent to the CPUs to signal the eviction, but the packets were
not being freed by the CPUs. Consistent with the classic memory model, stack
allocate the packet and heap allocate the request so on
ruby_eviction_callback() completion, the packet deconstructor is called, and
deletes the request (*Note: stack allocating the request causes double
deletion, since it will be deleted in the packet destructor). This results in
the least memory allocations without memory errors.
Diffs
-----
src/mem/ruby/system/RubyPort.cc fa31189e1fb5
Diff: http://reviews.gem5.org/r/1813/diff/
Testing
-------
Long runs with Valgrind and verified that the Ruby transitions that caused
these memory leaks were exercised. Multithreaded benchmarks with elevated
contention witness huge memory consumption decreases.
Thanks,
Joel Hestness
Joel Hestness
2013-04-08 03:47:19 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1813/
-----------------------------------------------------------

Review request for Default.


Description
-------

Changeset 9630:2ea801a03fa5
---------------------------
RubyPort: Fix evict/invalidate packet memory leak

When using the o3 or inorder CPUs with many Ruby protocols, the caches may
need to forward invalidations to the CPUs. The RubyPort was instantiating a
packet to be sent to the CPUs to signal the eviction, but the packets were
not being freed by the CPUs. Consistent with the classic memory model, stack
allocate the packet and heap allocate the request so on
ruby_eviction_callback() completion, the packet deconstructor is called, and
deletes the request (*Note: stack allocating the request causes double
deletion, since it will be deleted in the packet destructor). This results in
the least memory allocations without memory errors.


Diffs
-----

src/mem/ruby/system/RubyPort.cc fa31189e1fb5

Diff: http://reviews.gem5.org/r/1813/diff/


Testing
-------

Long runs with Valgrind and verified that the Ruby transitions that caused
these memory leaks were exercised. Multithreaded benchmarks with elevated
contention witness huge memory consumption decreases.


Thanks,

Joel Hestness
Jason Power
2013-04-08 13:52:12 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1813/#review4209
-----------------------------------------------------------

Ship it!


Ship It!

- Jason Power
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.gem5.org/r/1813/
-----------------------------------------------------------
(Updated April 7, 2013, 8:47 p.m.)
Review request for Default.
Description
-------
Changeset 9630:2ea801a03fa5
---------------------------
RubyPort: Fix evict/invalidate packet memory leak
When using the o3 or inorder CPUs with many Ruby protocols, the caches may
need to forward invalidations to the CPUs. The RubyPort was instantiating a
packet to be sent to the CPUs to signal the eviction, but the packets were
not being freed by the CPUs. Consistent with the classic memory model, stack
allocate the packet and heap allocate the request so on
ruby_eviction_callback() completion, the packet deconstructor is called, and
deletes the request (*Note: stack allocating the request causes double
deletion, since it will be deleted in the packet destructor). This results in
the least memory allocations without memory errors.
Diffs
-----
src/mem/ruby/system/RubyPort.cc fa31189e1fb5
Diff: http://reviews.gem5.org/r/1813/diff/
Testing
-------
Long runs with Valgrind and verified that the Ruby transitions that caused
these memory leaks were exercised. Multithreaded benchmarks with elevated
contention witness huge memory consumption decreases.
Thanks,
Joel Hestness
Andreas Hansson
2013-04-08 15:11:13 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1813/#review4214
-----------------------------------------------------------



src/mem/ruby/system/RubyPort.cc
<http://reviews.gem5.org/r/1813/#comment3990>

Could you add a comment that the request does not need a response and is thus deleted with the packet.

Also, it would be good to highlight that the same packet (potentially altered) is now sent to all the ports and this is making assumptions about what they do (or don't do) to it.

Thanks


- Andreas Hansson
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.gem5.org/r/1813/
-----------------------------------------------------------
(Updated April 7, 2013, 8:47 p.m.)
Review request for Default.
Description
-------
Changeset 9630:2ea801a03fa5
---------------------------
RubyPort: Fix evict/invalidate packet memory leak
When using the o3 or inorder CPUs with many Ruby protocols, the caches may
need to forward invalidations to the CPUs. The RubyPort was instantiating a
packet to be sent to the CPUs to signal the eviction, but the packets were
not being freed by the CPUs. Consistent with the classic memory model, stack
allocate the packet and heap allocate the request so on
ruby_eviction_callback() completion, the packet deconstructor is called, and
deletes the request (*Note: stack allocating the request causes double
deletion, since it will be deleted in the packet destructor). This results in
the least memory allocations without memory errors.
Diffs
-----
src/mem/ruby/system/RubyPort.cc fa31189e1fb5
Diff: http://reviews.gem5.org/r/1813/diff/
Testing
-------
Long runs with Valgrind and verified that the Ruby transitions that caused
these memory leaks were exercised. Multithreaded benchmarks with elevated
contention witness huge memory consumption decreases.
Thanks,
Joel Hestness
Joel Hestness
2013-04-09 16:18:23 UTC
Permalink
Post by Nilay Vaish
src/mem/ruby/system/RubyPort.cc, line 500
<http://reviews.gem5.org/r/1813/diff/1/?file=34762#file34762line500>
Could you add a comment that the request does not need a response and is thus deleted with the packet.
Also, it would be good to highlight that the same packet (potentially altered) is now sent to all the ports and this is making assumptions about what they do (or don't do) to it.
Thanks
Added these comments to the patch for checkin


- Joel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1813/#review4214
-----------------------------------------------------------
Post by Nilay Vaish
-----------------------------------------------------------
http://reviews.gem5.org/r/1813/
-----------------------------------------------------------
(Updated April 7, 2013, 8:47 p.m.)
Review request for Default.
Description
-------
Changeset 9630:2ea801a03fa5
---------------------------
RubyPort: Fix evict/invalidate packet memory leak
When using the o3 or inorder CPUs with many Ruby protocols, the caches may
need to forward invalidations to the CPUs. The RubyPort was instantiating a
packet to be sent to the CPUs to signal the eviction, but the packets were
not being freed by the CPUs. Consistent with the classic memory model, stack
allocate the packet and heap allocate the request so on
ruby_eviction_callback() completion, the packet deconstructor is called, and
deletes the request (*Note: stack allocating the request causes double
deletion, since it will be deleted in the packet destructor). This results in
the least memory allocations without memory errors.
Diffs
-----
src/mem/ruby/system/RubyPort.cc fa31189e1fb5
Diff: http://reviews.gem5.org/r/1813/diff/
Testing
-------
Long runs with Valgrind and verified that the Ruby transitions that caused
these memory leaks were exercised. Multithreaded benchmarks with elevated
contention witness huge memory consumption decreases.
Thanks,
Joel Hestness
Nilay Vaish
2013-04-09 16:44:49 UTC
Permalink
Post by Joel Hestness
src/mem/ruby/system/RubyPort.cc, line 500
<http://reviews.gem5.org/r/1813/diff/1/?file=34762#file34762line500>
Could you add a comment that the request does not need a response and is thus deleted with the packet.
Also, it would be good to highlight that the same packet (potentially altered) is now sent to all the ports and this is making assumptions about what they do (or don't do) to it.
Thanks
Added these comments to the patch for checkin
Why not move the line of code back inside the loop?

--
Nilay
Joel Hestness
2013-04-09 18:35:41 UTC
Permalink
Post by Nilay Vaish
Post by Joel Hestness
src/mem/ruby/system/RubyPort.**cc, line 500
<http://reviews.gem5.org/r/**1813/diff/1/?file=34762#**file34762line500<http://reviews.gem5.org/r/1813/diff/1/?file=34762#file34762line500>
Could you add a comment that the request does not need a response
and is thus deleted with the packet.
Also, it would be good to highlight that the same packet
(potentially altered) is now sent to all the ports and this is making
assumptions about what they do (or don't do) to it.
Thanks
Added these comments to the patch for checkin
Why not move the line of code back inside the loop?
That's a good question. First, note that ALL existing recvTimingSnoopReq()
functions "drop" (e.g. all those in timing CPUs and page table walkers) or
"pass through" (e.g. classic caches, busses) packets that they receive.
Hence, there are no current implementations of recvTimingSnoopReq() that
delete the packet they receive. Given this, these were the options I
considered when trying to decide how to handle this:

1) Allocate packets and requests as I have in this review request. This
results in the least memory allocations, and is correct given that the only
components that observe these requests are the LSQs (O3 and inorder), both
of which do not modify the packet (If we decide to go this route, perhaps
we could consider declaring the recvTimingSnoopReq() packet parameter as
const?).

2) Stack allocate a packet and heap allocate a request inside the
conditional in the loop, which would result in a request heap allocation
for each snooping port (there are usually 2-3 for each snoop: icache,
dcache and page table walker for archs with one). This results in 2-3X
more allocations than my review request and would be correct.

3) Heap allocate both the packet AND request, and delete each packet that
is sent within the conditional in the loop. This results in 4-6X more
allocations than my request and would be correct.

4) Heap allocate both the packet AND request within the conditional in the
loop, and enforce that the destination of the packet delete it. This would
result in 4-6X more allocations than my request and would be correct. This
would require fixing the ~8 places that implement recvTimingSnoopReq() to
delete the packets.

Some other notes from this investigation that are relevant:
- These invalidate snoop requests are handled inconsistently from all
other requests that do not require a response. In other cases (e.g.
FlushReq received by Ruby Sequencers), the destination of the request
deletes the packet and the request if it is not reused.
- Most sendTimingSnoopResp() code paths appear to be stubbed, and the
packet queue is the only component calling sendTimingSnoopResp() currently.
Even in the packet queues, there aren't any master ports that are using
the deferred packet send_as_snoop functionality, so I'm not clear that this
code is actually being used.


There is an obvious simplicity argument to choosing (1), so that's what I
submitted. It may not be the preferred solution though, so I'm willing to
reimplement an alternative. Let me know if you think we should take
another route.

Thanks,
Joel
--
Joel Hestness
PhD Student, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://www.cs.utexas.edu/~hestness
Andreas Hansson
2013-04-09 19:02:29 UTC
Permalink
Hi Joel,

I'm fine with either. I don't think the performance difference is noticeable.

Concerning the snoop response, the cache is the guilty one when it comes to the use of the "send as snoop" functionality.

Andreas

From: Joel Hestness <jthestness-***@public.gmane.org<mailto:jthestness-***@public.gmane.org>>
Date: Tuesday, 9 April 2013 19:35
To: Nilay Vaish <nilay-***@public.gmane.org<mailto:nilay-***@public.gmane.org>>
Cc: gem5 Developer List <gem5-dev-1Gs4CP2/***@public.gmane.org<mailto:gem5-dev-1Gs4CP2/***@public.gmane.org>>, Andreas Hansson <andreas.hansson-***@public.gmane.org<mailto:andreas.hansson-***@public.gmane.org>>
Subject: Re: [gem5-dev] Review Request: RubyPort: Fix evict/invalidate packet memory leak


On Tue, Apr 9, 2013 at 11:44 AM, Nilay Vaish <nilay-***@public.gmane.org<mailto:nilay-***@public.gmane.org>> wrote:
On Tue, 9 Apr 2013, Joel Hestness wrote:
On April 8, 2013, 8:11 a.m., Andreas Hansson wrote:
src/mem/ruby/system/RubyPort.cc, line 500
<http://reviews.gem5.org/r/1813/diff/1/?file=34762#file34762line500>


Could you add a comment that the request does not need a response and is thus deleted with the packet.

Also, it would be good to highlight that the same packet (potentially altered) is now sent to all the ports and this is making assumptions about what they do (or don't do) to it.

Thanks

Added these comments to the patch for checkin



Why not move the line of code back inside the loop?


That's a good question. First, note that ALL existing recvTimingSnoopReq() functions "drop" (e.g. all those in timing CPUs and page table walkers) or "pass through" (e.g. classic caches, busses) packets that they receive. Hence, there are no current implementations of recvTimingSnoopReq() that delete the packet they receive. Given this, these were the options I considered when trying to decide how to handle this:

1) Allocate packets and requests as I have in this review request. This results in the least memory allocations, and is correct given that the only components that observe these requests are the LSQs (O3 and inorder), both of which do not modify the packet (If we decide to go this route, perhaps we could consider declaring the recvTimingSnoopReq() packet parameter as const?).

2) Stack allocate a packet and heap allocate a request inside the conditional in the loop, which would result in a request heap allocation for each snooping port (there are usually 2-3 for each snoop: icache, dcache and page table walker for archs with one). This results in 2-3X more allocations than my review request and would be correct.

3) Heap allocate both the packet AND request, and delete each packet that is sent within the conditional in the loop. This results in 4-6X more allocations than my request and would be correct.

4) Heap allocate both the packet AND request within the conditional in the loop, and enforce that the destination of the packet delete it. This would result in 4-6X more allocations than my request and would be correct. This would require fixing the ~8 places that implement recvTimingSnoopReq() to delete the packets.

Some other notes from this investigation that are relevant:
- These invalidate snoop requests are handled inconsistently from all other requests that do not require a response. In other cases (e.g. FlushReq received by Ruby Sequencers), the destination of the request deletes the packet and the request if it is not reused.
- Most sendTimingSnoopResp() code paths appear to be stubbed, and the packet queue is the only component calling sendTimingSnoopResp() currently. Even in the packet queues, there aren't any master ports that are using the deferred packet send_as_snoop functionality, so I'm not clear that this code is actually being used.


There is an obvious simplicity argument to choosing (1), so that's what I submitted. It may not be the preferred solution though, so I'm willing to reimplement an alternative. Let me know if you think we should take another route.

Thanks,
Joel


--
Joel Hestness
PhD Student, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://www.cs.utexas.edu/~hestness

-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Loading...