Discussion:
[gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl
Joel Hestness
2016-02-01 23:24:04 UTC
Permalink
Hi Andreas,
I'd like to circle back on the thread about removing the QueuedSlavePort
response queue from DRAMCtrl. I've been working to shift over to DRAMCtrl
from the RubyMemoryController, but nearly all of my simulations now crash
on the DRAMCtrl's response queue. Since I need the DRAMCtrl to work, I'll
be looking into this now. However, based on my inspection of the code, it
looks pretty non-trivial to remove the QueuedSlavePort, so I'm hoping you
can at least help me work through the changes.

To reproduce the issue, I've put together a slim gem5 patch (attached) to
use the memtest.py script to generate accesses. Here's the command line I
used:

% build/X86/gem5.opt --debug-flag=DRAM --outdir=$outdir
configs/example/memtest.py -u 100

If you're still willing to take a stab at it, let me know if/how I can
help. Otherwise, I'll start working on it. It seems the trickiest thing is
going to be modeling the arbitrary frontendLatency and backendLatency while
still counting all of the accesses that are in the controller when it needs
to block back to the input queue. These latencies are currently assessed
with scheduling in the port response queue. Any suggestions you could give
would be appreciated.
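[Editor's note: a minimal sketch of the blocking scheme described above, in C++. All names (BoundedCtrl, tryAccept, trySendResp) are illustrative, not gem5's actual interface. The idea is to count every access still inside the controller, stamp each response with a ready time that charges frontendLatency and backendLatency, and refuse new requests once a bound is hit, instead of relying on the QueuedSlavePort's implicit buffer.]

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>

// Hypothetical sketch: bound the responses held inside a DRAM controller.
// Latencies are assessed by stamping a ready tick on each response,
// mirroring how frontendLatency/backendLatency are currently charged by
// scheduling in the port response queue.
struct Response { uint64_t readyTick; };

class BoundedCtrl {
  public:
    BoundedCtrl(size_t maxOutstanding,
                uint64_t frontendLat, uint64_t backendLat)
        : maxOutstanding_(maxOutstanding),
          frontendLat_(frontendLat), backendLat_(backendLat) {}

    // Called on an incoming request: accept only if the controller has
    // room for the eventual response; otherwise block back to the input
    // queue and let the requester retry.
    bool tryAccept(uint64_t nowTick, uint64_t deviceLat) {
        if (respQueue_.size() >= maxOutstanding_)
            return false;  // controller is full: refuse the request
        respQueue_.push({nowTick + frontendLat_ + deviceLat + backendLat_});
        return true;
    }

    // Called when the port is free: send a response whose time has come.
    // Freeing a slot is the point where blocked requesters would be retried.
    bool trySendResp(uint64_t nowTick) {
        if (respQueue_.empty() || respQueue_.front().readyTick > nowTick)
            return false;
        respQueue_.pop();
        return true;
    }

    size_t outstanding() const { return respQueue_.size(); }

  private:
    size_t maxOutstanding_;
    uint64_t frontendLat_, backendLat_;
    std::queue<Response> respQueue_;
};
```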

Thanks!
Joel


Below here is our conversation from the email thread "[gem5-dev] Review
Request 3116: ruby: RubyMemoryControl delete requests"

On Wed, Sep 23, 2015 at 3:51 PM, Andreas Hansson <***@arm.com>
wrote:

> Great. Thanks Joel.
>
> If anything pops up on our side I’ll let you know.
>
> Andreas
>
> From: Joel Hestness <***@gmail.com>
> Date: Wednesday, 23 September 2015 20:29
>
> To: Andreas Hansson <***@arm.com>
> Cc: gem5 Developer List <gem5-***@gem5.org>
> Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl
> delete requests
>
>
>
>> I don’t think there is any big difference in our expectations, quite the
>> contrary :-). GPUs are very important to us (and so is throughput computing
>> in general), and we run plenty of simulations with lots of memory-level
>> parallelism from non-CPU components. Still, we haven’t run into the issue.
>>
>
> Ok, cool. Thanks for the context.
>
>
> If you have practical examples that run into problems let me know, and
>> we’ll get it fixed.
>>
>
> I'm having trouble assembling a practical example (with or without using
> gem5-gpu). I'll keep you posted if I find something reasonable.
>
> Thanks!
> Joel
>
>
>
>> From: Joel Hestness <***@gmail.com>
>> Date: Tuesday, 22 September 2015 19:58
>>
>> To: Andreas Hansson <***@arm.com>
>> Cc: gem5 Developer List <gem5-***@gem5.org>
>> Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl
>> delete requests
>>
>> Hi Andreas,
>>
>>
>>> If it is a real problem affecting end users I am indeed volunteering to
>>> fix the DRAMCtrl use of QueuedSlavePort. In the classic memory system there
>>> are enough points of regulation (LSQs, MSHR limits, crossbar layers etc)
>>> that having a single memory channel with >100 queued up responses waiting
>>> to be sent is extremely unlikely. Hence, until now the added complexity has
>>> not been needed. If there is regulation on the number of requests in Ruby,
>>> then I would argue that it is equally unlikely there…I could be wrong.
>>>
>>
>> Ok. I think a big part of the difference between our expectations is just
>> the cores that we're modeling. AMD and gem5-gpu can model aggressive GPU
>> cores with potential to expose, perhaps, 4-32x more memory-level parallel
>> requests than a comparable number of multithreaded CPU cores. I feel that
>> this difference warrants different handling of accesses in the memory
>> controller.
>>
>> Joel
>>
>>
>>
>> From: Joel Hestness <***@gmail.com>
>>> Date: Tuesday, 22 September 2015 17:48
>>>
>>> To: Andreas Hansson <***@arm.com>
>>> Cc: gem5 Developer List <gem5-***@gem5.org>
>>> Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl
>>> delete requests
>>>
>>> Hi Andreas,
>>>
>>> Thanks for the "ship it!"
>>>
>>>
>>>> Do we really need to remove the use of QueuedSlavePort in DRAMCtrl? It
>>>> will make the controller more complex, and I don’t want to do it “just in
>>>> case”.
>>>>
>>>
>>> Sorry, I misread your email as offering to change the DRAMCtrl. I'm not
>>> sure who should make that change, but I think it should get done. The
>>> memory access response path starts at the DRAMCtrl and ends at the
>>> RubyPort. If we add flow control to the RubyPort, packets will probably
>>> back up more quickly on the response path back to where there are open
>>> buffers. I expect the DRAMCtrl QueuedPort problem becomes more prevalent as
>>> Ruby adds flow control, unless we add a limitation on outstanding requests
>>> to memory from directory controllers.
>>>
>>> How does the classic memory model deal with this?
>>>
>>> Joel
>>>
>>>
>>>
>>>> From: Joel Hestness <***@gmail.com>
>>>> Date: Tuesday, 22 September 2015 17:30
>>>> To: Andreas Hansson <***@arm.com>
>>>> Cc: gem5 Developer List <gem5-***@gem5.org>
>>>>
>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl
>>>> delete requests
>>>>
>>>> Hi guys,
>>>> Thanks for the discussion here. I had quickly tested other memory
>>>> controllers, but hadn't connected the dots that this might be the same
>>>> problem Brad/AMD are running into.
>>>>
>>>> My preference would be that we remove the QueuedSlavePort from the
>>>> DRAMCtrls. That would at least eliminate DRAMCtrls as a potential source of
>>>> the QueueSlavePort packet overflows, and would allow us to more closely
>>>> focus on the RubyPort problem when we get to it.
>>>>
>>>> Can we reach resolution on this patch though? Are we okay with
>>>> actually fixing the memory leak in mainline?
>>>>
>>>> Joel
>>>>
>>>>
>>>> On Tue, Sep 22, 2015 at 11:19 AM, Andreas Hansson <
>>>> ***@arm.com> wrote:
>>>>
>>>>> Hi Brad,
>>>>>
>>>>> We can remove the use of QueuedSlavePort in the memory controller and
>>>>> simply not accept requests if the response queue is full. Is this
>>>>> needed?
>>>>> If so we’ll make sure someone gets this in place. The only reason we
>>>>> haven’t done it is because it hasn’t been needed.
>>>>>
>>>>> The use of QueuedPorts in the Ruby adapters is a whole different
>>>>> story. I
>>>>> think most of these can be removed and actually use flow control. I’m
>>>>> happy to code it up, but there is such a flux at the moment that I
>>>>> didn’t
>>>>> want to post yet another patch changing the Ruby port. I really do
>>>>> think
>>>>> we should avoid having implicit buffers for thousands of kilobytes to the
>>>>> largest extent possible. If we really need a constructor parameter to
>>>>> make
>>>>> it “infinite” for some quirky Ruby use-case, then let’s do that...
>>>>>
>>>>> Andreas
>>>>>
>>>>>
>>>>> On 22/09/2015 17:14, "gem5-dev on behalf of Beckmann, Brad"
>>>>> <gem5-dev-***@gem5.org on behalf of ***@amd.com> wrote:
>>>>>
>>>>> >From AMD's perspective, we have deprecated our usage of
>>>>> RubyMemoryControl
>>>>> >and we are using the new Memory Controllers with the port interface.
>>>>> >
>>>>> >That being said, I completely agree with Joel that the packet queue
>>>>> >finite invisible buffer limit of 100 needs to go! As you know, we
>>>>> tried
>>>>> >very hard several months ago to essentially make this an infinite
>>>>> buffer,
>>>>> >but Andreas would not allow us to check it in. We are going to post
>>>>> that
>>>>> >patch again in a few weeks when we post our GPU model. Our GPU model
>>>>> >will not work unless we increase that limit.
>>>>> >
>>>>> >Andreas you keep arguing that if you exceed that limit, that
>>>>> something is
>>>>> >fundamentally broken. Please keep in mind that there are many uses of
>>>>> >gem5 beyond what you use it for. Also this is a research simulator
>>>>> and
>>>>> >we should not restrict ourselves to what we think is practical in real
>>>>> >hardware. Finally, the fact that the finite limit is invisible to the
>>>>> >producer is just bad software engineering.
>>>>> >
>>>>> >I beg you to please allow us to remove this finite invisible limit!
>>>>> >
>>>>> >Brad
>>>>> >
>>>>> >
>>>>> >
>>>>> >-----Original Message-----
>>>>> >From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf Of
>>>>> Andreas
>>>>> >Hansson
>>>>> >Sent: Tuesday, September 22, 2015 6:35 AM
>>>>> >To: Andreas Hansson; Default; Joel Hestness
>>>>> >Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl
>>>>> >delete requests
>>>>> >
>>>>> >
>>>>> >
>>>>> >> On Sept. 21, 2015, 8:42 a.m., Andreas Hansson wrote:
>>>>> >> > Can we just prune the whole RubyMemoryControl rather? Has it not
>>>>> been
>>>>> >>deprecated long enough?
>>>>> >>
>>>>> >> Joel Hestness wrote:
>>>>> >> Unless I'm overlooking something, for Ruby users, I don't see
>>>>> other
>>>>> >>memory controllers that are guaranteed to work. Besides
>>>>> >>RubyMemoryControl, all others use a QueuedSlavePort for their input
>>>>> >>queues. Given that Ruby hasn't added complete flow control,
>>>>> PacketQueue
>>>>> >>size restrictions can be exceeded (triggering the panic). This occurs
>>>>> >>infrequently/irregularly with aggressive GPUs in gem5-gpu, and
>>>>> appears
>>>>> >>difficult to fix in a systematic way.
>>>>> >>
>>>>> >> Regardless of the fact we've deprecated RubyMemoryControl, this
>>>>> is
>>>>> >>a necessary fix.
>>>>> >
>>>>> >No memory controller is using QueuedSlavePort for any _input_ queues.
>>>>> >The DRAMCtrl class uses it for the response _output_ queue, that's
>>>>> all.
>>>>> >If that is really an issue we can move away from it and enforce an
>>>>> upper
>>>>> >bound on responses by not accepting new requests. That said, if we hit
>>>>> >the limit I would argue something else is fundamentally broken in the
>>>>> >system and should be addressed.
>>>>> >
>>>>> >In any case, the discussion whether to remove RubyMemoryControl or not
>>>>> >should be completely decoupled.
>>>>> >
>>>>> >
>>>>> >- Andreas
>>>>>
>>>>

--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/
Beckmann, Brad
2016-02-02 00:17:12 UTC
Permalink
Hi Joel,

Unless I missed it, I do not believe anything is attached to your email.

I'll try to get someone on our side to reproduce these experiments. The removal of QueuedSlavePort from the DRAM controllers and RubyPort is important to us as well.

Thanks,

Brad


_______________________________________________
gem5-dev mailing list
gem5-***@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
Joel Hestness
2016-02-02 01:13:30 UTC
Permalink
Hi Brad,
Thanks for letting me know about the attachment problem. I uploaded the
patch to Reviewboard to be sure you can access it:
http://reviews.gem5.org/r/3312/

Also, a quick update: I've implemented the removal of the
QueuedMasterPort from Ruby directory controllers. I will post that patch
when I've completed my testing and am able to update the mainline protocols.

Joel



On Mon, Feb 1, 2016 at 6:17 PM, Beckmann, Brad <***@amd.com>
wrote:

> Hi Joel,
>
> Unless I missed it, I do not believe anything is attached to your email.
>
> I'll try to get someone on our side to reproduce these experiments as
> well. The removal of QueuedSlavePort from the DRAM controllers and
> RubyPort is important to us as well.
>
> Thanks,
>
> Brad
>
>
> -----Original Message-----
> From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf Of Joel
> Hestness
> Sent: Monday, February 01, 2016 3:24 PM
> To: Andreas Hansson
> Cc: gem5 Developer List
> Subject: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl
>
> Hi Andreas,
> I'd like to circle back on the thread about removing the QueuedSlavePort
> response queue from DRAMCtrl. I've been working to shift over to DRAMCtrl
> from the RubyMemoryController, but nearly all of my simulations now crash
> on the DRAMCtrl's response queue. Since I need the DRAMCtrl to work, I'll
> be looking into this now. However, based on my inspection of the code, it
> looks pretty non-trivial to remove the QueueSlavePort, so I'm hoping you
> can at least help me work through the changes.
>
> To reproduce the issue, I've put together a slim gem5 patch (attached)
> to use the memtest.py script to generate accesses. Here's the command line I
> used:
>
> % build/X86/gem5.opt --debug-flag=DRAM --outdir=$outdir
> configs/example/memtest.py -u 100
>
> If you're still willing to take a stab at it, let me know if/how I can
> help. Otherwise, I'll start working on it. It seems the trickiest thing is
> going to be modeling the arbitrary frontendLatency and backendLatency while
> still counting all of the accesses that are in the controller when it needs
> to block back to the input queue. These latencies are currently assessed
> with scheduling in the port response queue. Any suggestions you could give
> would be appreciated.
>
> Thanks!
> Joel
>
>
> Below here is our conversation from the email thread "[gem5-dev] Review
> Request 3116: ruby: RubyMemoryControl delete requests"
>
> On Wed, Sep 23, 2015 at 3:51 PM, Andreas Hansson <***@arm.com>
> wrote:
>
> > Great. Thanks Joel.
> >
> > If anything pops up on our side I’ll let you know.
> >
> > Andreas
> >
> > From: Joel Hestness <***@gmail.com>
> > Date: Wednesday, 23 September 2015 20:29
> >
> > To: Andreas Hansson <***@arm.com>
> > Cc: gem5 Developer List <gem5-***@gem5.org>
> > Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl
> > delete requests
> >
> >
> >
> >> I don’t think there is any big difference in our expectations, quite
> >> the contrary :-). GPUs are very important to us (and so is throughput
> >> computing in general), and we run plenty simulations with lots of
> >> memory-level parallelism from non-CPU components. Still, we haven’t run
> into the issue.
> >>
> >
> > Ok, cool. Thanks for the context.
> >
> >
> > If you have practical examples that run into problems let me know, and
> >> we’ll get it fixed.
> >>
> >
> > I'm having trouble assembling a practical example (with or without
> > using gem5-gpu). I'll keep you posted if I find something reasonable.
> >
> > Thanks!
> > Joel
> >
> >
> >
> >> From: Joel Hestness <***@gmail.com>
> >> Date: Tuesday, 22 September 2015 19:58
> >>
> >> To: Andreas Hansson <***@arm.com>
> >> Cc: gem5 Developer List <gem5-***@gem5.org>
> >> Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl
> >> delete requests
> >>
> >> Hi Andreas,
> >>
> >>
> >>> If it is a real problem affecting end users I am indeed volunteering
> >>> to fix the DRAMCtrl use of QueuedSlavePort. In the classic memory
> >>> system there are enough points of regulation (LSQs, MSHR limits,
> >>> crossbar layers etc) that having a single memory channel with >100
> >>> queued up responses waiting to be sent is extremely unlikely. Hence,
> >>> until now the added complexity has not been needed. If there is
> >>> regulation on the number of requests in Ruby, then I would argue that
> it is equally unlikely there…I could be wrong.
> >>>
> >>
> >> Ok. I think a big part of the difference between our expectations is
> >> just the cores that we're modeling. AMD and gem5-gpu can model
> >> aggressive GPU cores with potential to expose, perhaps, 4-32x more
> >> memory-level parallel requests than a comparable number of
> >> multithreaded CPU cores. I feel that this difference warrants
> >> different handling of accesses in the memory controller.
> >>
> >> Joel
> >>
> >>
> >>
> >> From: Joel Hestness <***@gmail.com>
> >>> Date: Tuesday, 22 September 2015 17:48
> >>>
> >>> To: Andreas Hansson <***@arm.com>
> >>> Cc: gem5 Developer List <gem5-***@gem5.org>
> >>> Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl
> >>> delete requests
> >>>
> >>> Hi Andreas,
> >>>
> >>> Thanks for the "ship it!"
> >>>
> >>>
> >>>> Do we really need to remove the use of QueuedSlavePort in DRAMCtrl?
> >>>> It will make the controller more complex, and I don’t want to do it
> >>>> “just in case”.
> >>>>
> >>>
> >>> Sorry, I misread your email as offering to change the DRAMCtrl. I'm
> >>> not sure who should make that change, but I think it should get
> >>> done. The memory access response path starts at the DRAMCtrl and
> >>> ends at the RubyPort. If we add control flow to the RubyPort,
> >>> packets will probably back-up more quickly on the response path back
> >>> to where there are open buffers. I expect the DRAMCtrl QueuedPort
> >>> problem becomes more prevalent as Ruby adds flow control, unless we
> >>> add a limitation on outstanding requests to memory from directory
> controllers.
> >>>
> >>> How does the classic memory model deal with this?
> >>>
> >>> Joel
> >>>
> >>>
> >>>
> >>>> From: Joel Hestness <***@gmail.com>
> >>>> Date: Tuesday, 22 September 2015 17:30
> >>>> To: Andreas Hansson <***@arm.com>
> >>>> Cc: gem5 Developer List <gem5-***@gem5.org>
> >>>>
> >>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >>>> RubyMemoryControl delete requests
> >>>>
> >>>> Hi guys,
> >>>> Thanks for the discussion here. I had quickly tested other memory
> >>>> controllers, but hadn't connected the dots that this might be the
> >>>> same problem Brad/AMD are running into.
> >>>>
> >>>> My preference would be that we remove the QueuedSlavePort from
> >>>> the DRAMCtrls. That would at least eliminate DRAMCtrls as a
> >>>> potential source of the QueueSlavePort packet overflows, and would
> >>>> allow us to more closely focus on the RubyPort problem when we get to
> it.
> >>>>
> >>>> Can we reach resolution on this patch though? Are we okay with
> >>>> actually fixing the memory leak in mainline?
> >>>>
> >>>> Joel
> >>>>
> >>>>
> >>>> On Tue, Sep 22, 2015 at 11:19 AM, Andreas Hansson <
> >>>> ***@arm.com> wrote:
> >>>>
> >>>>> Hi Brad,
> >>>>>
> >>>>> We can remove the use of QueuedSlavePort in the memory controller
> >>>>> and simply not accept requests if the response queue is full. Is
> >>>>> this needed?
> >>>>> If so we’ll make sure someone gets this in place. The only reason
> >>>>> we haven’t done it is because it hasn’t been needed.
> >>>>>
> >>>>> The use of QueuedPorts in the Ruby adapters is a whole different
> >>>>> story. I think most of these can be removed and actually use flow
> >>>>> control. I’m happy to code it up, but there is such a flux at the
> >>>>> moment that I didn’t want to post yet another patch changing the
> >>>>> Ruby port. I really do think we should avoid having implicit
> >>>>> buffers for 1000’s of kilobytes to the largest extend possible. If
> >>>>> we really need a constructor parameter to make it “infinite” for
> >>>>> some quirky Ruby use-case, then let’s do that...
> >>>>>
> >>>>> Andreas
> >>>>>
> >>>>>
> >>>>> On 22/09/2015 17:14, "gem5-dev on behalf of Beckmann, Brad"
> >>>>> <gem5-dev-***@gem5.org on behalf of ***@amd.com>
> wrote:
> >>>>>
> >>>>> >From AMD's perspective, we have deprecated our usage of
> >>>>> RubyMemoryControl
> >>>>> >and we are using the new Memory Controllers with the port interface.
> >>>>> >
> >>>>> >That being said, I completely agree with Joel that the packet
> >>>>> >queue finite invisible buffer limit of 100 needs to go! As you
> >>>>> >know, we
> >>>>> tried
> >>>>> >very hard several months ago to essentially make this a infinite
> >>>>> buffer,
> >>>>> >but Andreas would not allow us to check it in. We are going to
> >>>>> >post
> >>>>> that
> >>>>> >patch again in a few weeks when we post our GPU model. Our GPU
> >>>>> >model will not work unless we increase that limit.
> >>>>> >
> >>>>> >Andreas you keep arguing that if you exceed that limit, that
> >>>>> something is
> >>>>> >fundamentally broken. Please keep in mind that there are many
> >>>>> >uses of
> >>>>> >gem5 beyond what you use it for. Also this is a research
> >>>>> >simulator
> >>>>> and
> >>>>> >we should not restrict ourselves to what we think is practical in
> >>>>> >real hardware. Finally, the fact that the finite limit is
> >>>>> >invisible to the producer is just bad software engineering.
> >>>>> >
> >>>>> >I beg you to please allow us to remove this finite invisible limit!
> >>>>> >
> >>>>> >Brad
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> >-----Original Message-----
> >>>>> >From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf Of
> >>>>> Andreas
> >>>>> >Hansson
> >>>>> >Sent: Tuesday, September 22, 2015 6:35 AM
> >>>>> >To: Andreas Hansson; Default; Joel Hestness
> >>>>> >Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >>>>> >RubyMemoryControl delete requests
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> >> On Sept. 21, 2015, 8:42 a.m., Andreas Hansson wrote:
> >>>>> >> > Can we just prune the whole RubyMemoryControl rather? Has it
> >>>>> >> > not
> >>>>> been
> >>>>> >>deprecated long enough?
> >>>>> >>
> >>>>> >> Joel Hestness wrote:
> >>>>> >> Unless I'm overlooking something, for Ruby users, I don't
> >>>>> >> see
> >>>>> other
> >>>>> >>memory controllers that are guaranteed to work. Besides
> >>>>> >>RubyMemoryControl, all others use a QueuedSlavePort for their
> >>>>> >>input queues. Given that Ruby hasn't added complete flow
> >>>>> >>control,
> >>>>> PacketQueue
> >>>>> >>size restrictions can be exceeded (triggering the panic). This
> >>>>> >>occurs infrequently/irregularly with aggressive GPUs in
> >>>>> >>gem5-gpu, and
> >>>>> appears
> >>>>> >>difficult to fix in a systematic way.
> >>>>> >>
> >>>>> >> Regardless of the fact we've deprecated RubyMemoryControl,
> >>>>> >> this
> >>>>> is
> >>>>> >>a necessary fix.
> >>>>> >
> >>>>> >No memory controller is using QueuedSlaavePort for any _input_
> queues.
> >>>>> >The DRAMCtrl class uses it for the response _output_ queue,
> >>>>> >that's
> >>>>> all.
> >>>>> >If that is really an issue we can move away from it and enforce an
> >>>>> upper
> >>>>> >bound on responses by not accepting new requests. That said, if
> >>>>> >we hit the limit I would argue something else is fundamentally
> >>>>> >broken in the system and should be addressed.
> >>>>> >
> >>>>> >In any case, the discussion whether to remove RubyMemoryControl
> >>>>> >or not should be completely decoupled.
> >>>>> >
> >>>>> >
> >>>>> >- Andreas
> >>>>>
> >>>>
>
> --
> Joel Hestness
> PhD Candidate, Computer Architecture
> Dept. of Computer Science, University of Wisconsin - Madison
> http://pages.cs.wisc.edu/~hestness/
> _______________________________________________
> gem5-dev mailing list
> gem5-***@gem5.org
> http://m5sim.org/mailman/listinfo/gem5-dev



--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/
Andreas Hansson
2016-02-03 17:29:52 UTC
Hi Joel,

I would suggest to keep the queued ports, but add methods to reserve resources, query whether the queue has free space, and register callbacks so that the MemObject is made aware when packets are sent. That way we can use the queue in the cache, memory controller, etc., without having all the issues of the “naked” port interface, while still enforcing a bounded queue.

When a packet arrives at the module we call reserve on the output port. Then, when we actually add the packet, we know that there is space. When request packets arrive we check whether the queue is full, and if so we block any new requests. Then, through the callback, we can unblock the DRAM controller.
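As a rough sketch of what I mean (all class and method names here are purely illustrative, not existing gem5 code):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>

// Illustrative sketch only -- these are not existing gem5 classes. The
// owning MemObject reserves a slot when it accepts a request, fills the
// slot when the response is ready, and registers a callback so it is
// told when a response actually leaves the queue and it can unblock.
class BoundedRespQueue {
  public:
    explicit BoundedRespQueue(std::size_t limit) : limit_(limit) {}

    // Claim a response slot when a request is accepted; false means the
    // queue is full and the owner must block new requests.
    bool reserve() {
        if (full())
            return false;
        ++reserved_;
        return true;
    }

    bool full() const { return reserved_ + queue_.size() >= limit_; }

    // Add the finished response; a slot was reserved, so space exists.
    void push(int pkt) {
        assert(reserved_ > 0);
        --reserved_;
        queue_.push(pkt);
    }

    // Send the head response and notify the owner that space freed up.
    void sendNext() {
        assert(!queue_.empty());
        queue_.pop();
        if (drainCb_)
            drainCb_();
    }

    void registerDrainCallback(std::function<void()> cb) {
        drainCb_ = std::move(cb);
    }

  private:
    std::size_t limit_;
    std::size_t reserved_ = 0;  // slots claimed but not yet filled
    std::queue<int> queue_;
    std::function<void()> drainCb_;
};
```

The module calls reserve() when accepting a request, push() once the response is ready, and is called back when a response leaves the queue so it can unblock.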

What do you think?

Andreas

From: Joel Hestness <***@gmail.com>
Date: Tuesday, 2 February 2016 at 00:24
To: Andreas Hansson <***@arm.com>
Cc: gem5 Developer List <gem5-***@gem5.org>
Subject: Follow-up: Removing QueuedSlavePort from DRAMCtrl

Hi Andreas,
I'd like to circle back on the thread about removing the QueuedSlavePort response queue from DRAMCtrl. I've been working to shift over to DRAMCtrl from the RubyMemoryController, but nearly all of my simulations now crash on the DRAMCtrl's response queue. Since I need the DRAMCtrl to work, I'll be looking into this now. However, based on my inspection of the code, it looks pretty non-trivial to remove the QueuedSlavePort, so I'm hoping you can at least help me work through the changes.

To reproduce the issue, I've put together a slim gem5 patch (attached) to use the memtest.py script to generate accesses. Here's the command line I used:

% build/X86/gem5.opt --debug-flag=DRAM --outdir=$outdir configs/example/memtest.py -u 100

If you're still willing to take a stab at it, let me know if/how I can help. Otherwise, I'll start working on it. It seems the trickiest thing is going to be modeling the arbitrary frontendLatency and backendLatency while still counting all of the accesses that are in the controller when it needs to block back to the input queue. These latencies are currently assessed with scheduling in the port response queue. Any suggestions you could give would be appreciated.

Thanks!
Joel


Below here is our conversation from the email thread "[gem5-dev] Review Request 3116: ruby: RubyMemoryControl delete requests"

On Wed, Sep 23, 2015 at 3:51 PM, Andreas Hansson <***@arm.com> wrote:
Great. Thanks Joel.

If anything pops up on our side I’ll let you know.

Andreas

From: Joel Hestness <***@gmail.com>
Date: Wednesday, 23 September 2015 20:29

To: Andreas Hansson <***@arm.com>
Cc: gem5 Developer List <gem5-***@gem5.org>
Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl delete requests


I don’t think there is any big difference in our expectations, quite the contrary :-). GPUs are very important to us (and so is throughput computing in general), and we run plenty of simulations with lots of memory-level parallelism from non-CPU components. Still, we haven’t run into the issue.

Ok, cool. Thanks for the context.


If you have practical examples that run into problems let me know, and we’ll get it fixed.

I'm having trouble assembling a practical example (with or without using gem5-gpu). I'll keep you posted if I find something reasonable.

Thanks!
Joel


From: Joel Hestness <***@gmail.com>
Date: Tuesday, 22 September 2015 19:58

To: Andreas Hansson <***@arm.com>
Cc: gem5 Developer List <gem5-***@gem5.org>
Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl delete requests

Hi Andreas,

If it is a real problem affecting end users I am indeed volunteering to fix the DRAMCtrl use of QueuedSlavePort. In the classic memory system there are enough points of regulation (LSQs, MSHR limits, crossbar layers etc) that having a single memory channel with >100 queued up responses waiting to be sent is extremely unlikely. Hence, until now the added complexity has not been needed. If there is regulation on the number of requests in Ruby, then I would argue that it is equally unlikely there…I could be wrong.

Ok. I think a big part of the difference between our expectations is just the cores that we're modeling. AMD and gem5-gpu can model aggressive GPU cores with potential to expose, perhaps, 4-32x more memory-level parallel requests than a comparable number of multithreaded CPU cores. I feel that this difference warrants different handling of accesses in the memory controller.

Joel



From: Joel Hestness <***@gmail.com>
Date: Tuesday, 22 September 2015 17:48

To: Andreas Hansson <***@arm.com>
Cc: gem5 Developer List <gem5-***@gem5.org>
Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl delete requests

Hi Andreas,

Thanks for the "ship it!"

Do we really need to remove the use of QueuedSlavePort in DRAMCtrl? It will make the controller more complex, and I don’t want to do it “just in case”.

Sorry, I misread your email as offering to change the DRAMCtrl. I'm not sure who should make that change, but I think it should get done. The memory access response path starts at the DRAMCtrl and ends at the RubyPort. If we add flow control to the RubyPort, packets will probably back up more quickly on the response path to wherever there are open buffers. I expect the DRAMCtrl QueuedPort problem will become more prevalent as Ruby adds flow control, unless we add a limit on outstanding requests to memory from the directory controllers.
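To illustrate the kind of back-pressure I mean, here is a toy reject-and-retry sketch (illustrative names, not the actual port code): a full downstream buffer rejects incoming packets, and the sender retries once space frees up, so packets back up toward wherever there are open buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy sketch of reject-and-retry flow control. Names are illustrative,
// not the real gem5 classes: the receiver refuses a packet when its
// buffer is full, and the blocked sender is told to retry after a
// packet drains and space becomes available.
class Receiver {
  public:
    explicit Receiver(std::size_t capacity) : capacity_(capacity) {}

    // Returns false to reject the packet; the sender must hold it and
    // resend after being notified to retry.
    bool recvTiming(int pkt) {
        if (buf_.size() >= capacity_) {
            retryPending_ = true;
            return false;
        }
        buf_.push_back(pkt);
        return true;
    }

    // Drain one packet; returns true if a blocked sender should now be
    // told to retry (the caller plays the role of the port here).
    bool drainOne() {
        if (buf_.empty())
            return false;
        buf_.pop_front();
        bool notify = retryPending_;
        retryPending_ = false;
        return notify;
    }

  private:
    std::size_t capacity_;
    bool retryPending_ = false;
    std::deque<int> buf_;
};
```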

How does the classic memory model deal with this?

Joel


From: Joel Hestness <***@gmail.com>
Date: Tuesday, 22 September 2015 17:30
To: Andreas Hansson <***@arm.com>
Cc: gem5 Developer List <gem5-***@gem5.org>

Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl delete requests

Hi guys,
Thanks for the discussion here. I had quickly tested other memory controllers, but hadn't connected the dots that this might be the same problem Brad/AMD are running into.

My preference would be that we remove the QueuedSlavePort from the DRAMCtrls. That would at least eliminate DRAMCtrls as a potential source of the QueuedSlavePort packet overflows, and would allow us to more closely focus on the RubyPort problem when we get to it.

Can we reach resolution on this patch though? Are we okay with actually fixing the memory leak in mainline?

Joel


On Tue, Sep 22, 2015 at 11:19 AM, Andreas Hansson <***@arm.com> wrote:
Hi Brad,

We can remove the use of QueuedSlavePort in the memory controller and
simply not accept requests if the response queue is full. Is this needed?
If so we’ll make sure someone gets this in place. The only reason we
haven’t done it is because it hasn’t been needed.

The use of QueuedPorts in the Ruby adapters is a whole different story. I
think most of these can be removed and actually use flow control. I’m
happy to code it up, but things are in such flux at the moment that I didn’t
want to post yet another patch changing the Ruby port. I really do think
we should avoid having implicit buffers for 1000’s of kilobytes to the
largest extent possible. If we really need a constructor parameter to make
it “infinite” for some quirky Ruby use-case, then let’s do that...

Andreas


On 22/09/2015 17:14, "gem5-dev on behalf of Beckmann, Brad"
<gem5-dev-***@gem5.org on behalf of ***@amd.com> wrote:

>From AMD's perspective, we have deprecated our usage of RubyMemoryControl
>and we are using the new Memory Controllers with the port interface.
>
>That being said, I completely agree with Joel that the packet queue
>finite invisible buffer limit of 100 needs to go! As you know, we tried
>very hard several months ago to essentially make this an infinite buffer,
>but Andreas would not allow us to check it in. We are going to post that
>patch again in a few weeks when we post our GPU model. Our GPU model
>will not work unless we increase that limit.
>
>Andreas you keep arguing that if you exceed that limit, that something is
>fundamentally broken. Please keep in mind that there are many uses of
>gem5 beyond what you use it for. Also this is a research simulator and
>we should not restrict ourselves to what we think is practical in real
>hardware. Finally, the fact that the finite limit is invisible to the
>producer is just bad software engineering.
>
>I beg you to please allow us to remove this finite invisible limit!
>
>Brad
>
>
>
>-----Original Message-----
>From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf Of Andreas
>Hansson
>Sent: Tuesday, September 22, 2015 6:35 AM
>To: Andreas Hansson; Default; Joel Hestness
>Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl
>delete requests
>
>
>
>> On Sept. 21, 2015, 8:42 a.m., Andreas Hansson wrote:
>> > Can we just prune the whole RubyMemoryControl rather? Has it not been
>>deprecated long enough?
>>
>> Joel Hestness wrote:
>> Unless I'm overlooking something, for Ruby users, I don't see other
>>memory controllers that are guaranteed to work. Besides
>>RubyMemoryControl, all others use a QueuedSlavePort for their input
>>queues. Given that Ruby hasn't added complete flow control, PacketQueue
>>size restrictions can be exceeded (triggering the panic). This occurs
>>infrequently/irregularly with aggressive GPUs in gem5-gpu, and appears
>>difficult to fix in a systematic way.
>>
>> Regardless of the fact we've deprecated RubyMemoryControl, this is
>>a necessary fix.
>
>No memory controller is using QueuedSlavePort for any _input_ queues.
>The DRAMCtrl class uses it for the response _output_ queue, that's all.
>If that is really an issue we can move away from it and enforce an upper
>bound on responses by not accepting new requests. That said, if we hit
>the limit I would argue something else is fundamentally broken in the
>system and should be addressed.
>
>In any case, the discussion whether to remove RubyMemoryControl or not
>should be completely decoupled.
>
>
>- Andreas


--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Joel Hestness
2016-02-04 16:06:36 UTC
Hi Andreas,
Thanks for the input. I had tried adding front- and back-end queues
within the DRAMCtrl, but it became very difficult to propagate the flow
control back through the component due to the complicated implementation of
timing across different accessAndRespond() calls. I had to put this
solution on hold.

I think your proposed solution should simplify the flow control issue,
and should have the derivative effect of making the Queued*Ports capable of
flow control. I'm a little concerned that your solution would make the
buffering very fluid, and I'm not sufficiently familiar with memory
controller microarchitecture to know if that would be realistic. I wonder
if you might have a way to do performance validation after I work through
either of these implementations.

Thanks!
Joel



On Wed, Feb 3, 2016 at 11:29 AM, Andreas Hansson <***@arm.com>
wrote:

> Hi Joel,
>
> I would suggest to keep the queued ports, but add methods to reserve
> resources, query if it has free space, and a way to register callbacks so
> that the MemObject is made aware when packets are sent. That way we can use
> the queue in the cache, memory controller etc, without having all the
> issues of the “naked” port interface, but still enforcing a bounded queue.
>
> When a packet arrives at the module we call reserve on the output port.
> Then when we actually add the packet we know that there is space. When
> request packets arrive we check if the queue is full, and if so we block
> any new requests. Then through the callback we can unblock the DRAM
> controller in this case.
>
> What do you think?
>
> Andreas



--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/
Joel Hestness
2016-02-05 18:03:47 UTC
Hi guys,
Quick updates on this:
1) I have a finite response buffer implementation working. I removed the
QueuedSlavePort and added a response queue with reservation (Andreas'
underlying suggestion). I have a question with this solution: The
QueuedSlavePort prioritized responses based on their scheduled response time.
However, since writes have a shorter pipeline from request to response,
this architecture prioritized write requests ahead of read requests
received earlier, and it performs ~1-8% worse than a strict queue (what
I've implemented at this point). I can make the response queue a priority
queue if we want the same structure as previously, but I'm wondering if we
might prefer to just have the better-performing strict queue.
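To make the difference concrete, here is a toy sketch of the two orderings (illustrative types, not the actual implementation):

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Illustrative sketch (not gem5 code) of the two response orderings
// discussed: the old QueuedSlavePort effectively ordered responses by
// their scheduled ready time, while a strict queue preserves arrival
// order even when a later write response is ready sooner.
struct Resp {
    uint64_t arrivalTick;  // when the request entered the controller
    uint64_t readyTick;    // when the response is scheduled to be sent
};

struct ByReadyTime {
    bool operator()(const Resp &a, const Resp &b) const {
        return a.readyTick > b.readyTick;  // min-heap on readyTick
    }
};

// Old behavior: the earliest-scheduled response goes first.
using PriorityRespQueue =
    std::priority_queue<Resp, std::vector<Resp>, ByReadyTime>;

// New behavior: strict FIFO in arrival order.
using StrictRespQueue = std::queue<Resp>;
```

With the priority ordering, a write response that arrives later but is ready sooner is sent ahead of an earlier read response; the strict queue preserves arrival order.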

2) To reflect on Andreas' specific suggestion of using unblock callbacks
from the PacketQueue: Modifying the QueuedSlavePort with callbacks gets
ugly at the point where the callback must be invoked: the call needs to
originate from PacketQueue::sendDeferredPacket(), but PacketQueue doesn't
have a pointer to the owner component. The SlavePort has that pointer, so
the PacketQueue would need to first call back to the port, which would then
invoke the owner component's callback.
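Schematically, the two-hop notification looks like this (illustrative classes, not the actual gem5 code):

```cpp
#include <cassert>

// Toy sketch of the callback plumbing described above: the PacketQueue
// has no pointer to the owning component, so sendDeferredPacket() must
// bounce the notification through the port, which does hold the owner
// pointer. All names here are illustrative.
struct Owner {
    bool blocked = true;
    void packetSent() { blocked = false; }  // unblock the controller
};

struct Port {
    Owner *owner;
    void notifyPacketSent() { owner->packetSent(); }
};

struct PacketQueue {
    Port *port;  // the queue only knows its port...
    void sendDeferredPacket() {
        // ...so the unblock notification takes two hops:
        port->notifyPacketSent();  // port forwards to the owner
    }
};
```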
The exercise of getting this to work has solidified my opinion that the
Queued*Ports should probably be removed from the codebase: Queues and ports
are separate subcomponents of simulated components, and only the component
knows how they should interact. Including a Queued*Port inside a component
requires the component to manage the flow-control into the Queued*Port just
as it would need to manage a standard port anyway, and hiding the queue in
the port obfuscates how it is managed.


Thanks!
Joel


On Thu, Feb 4, 2016 at 10:06 AM, Joel Hestness <***@gmail.com> wrote:

> Hi Andreas,
> Thanks for the input. I had tried adding front- and back-end queues
> within the DRAMCtrl, but it became very difficult to propagate the flow
> control back through the component due to the complicated implementation of
> timing across different accessAndRespond() calls. I had to put this
> solution on hold.
>
> I think your proposed solution should simplify the flow control issue,
> and should have the derivative effect of making the Queued*Ports capable of
> flow control. I'm a little concerned that your solution would make the
> buffering very fluid, and I'm not sufficiently familiar with memory
> controller microarchitecture to know if that would be realistic. I wonder
> if you might have a way to do performance validation after I work through
> either of these implementations.
>
> Thanks!
> Joel
>
>
>
> On Wed, Feb 3, 2016 at 11:29 AM, Andreas Hansson <***@arm.com>
> wrote:
>
>> Hi Joel,
>>
>> I would suggest to keep the queued ports, but add methods to reserve
>> resources, query if it has free space, and a way to register callbacks so
>> that the MemObject is made aware when packets are sent. That way we can use
>> the queue in the cache, memory controller etc, without having all the
>> issues of the “naked” port interface, but still enforcing a bounded queue.
>>
>> When a packet arrives at the module we call reserve on the output port.
>> Then when we actually add the packet we know that there is space. When
>> request packets arrive we check if the queue is full, and if so we block
>> any new requests. Then through the callback we can unblock the DRAM
>> controller in this case.
>>
>> What do you think?
>>
>> Andreas
>>>>>> problem Brad/AMD are running into.
>>>>>>
>>>>>> My preference would be that we remove the QueuedSlavePort from the
>>>>>> DRAMCtrls. That would at least eliminate DRAMCtrls as a potential source of
>>>>>> the QueueSlavePort packet overflows, and would allow us to more closely
>>>>>> focus on the RubyPort problem when we get to it.
>>>>>>
>>>>>> Can we reach resolution on this patch though? Are we okay with
>>>>>> actually fixing the memory leak in mainline?
>>>>>>
>>>>>> Joel
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 22, 2015 at 11:19 AM, Andreas Hansson <
>>>>>> ***@arm.com> wrote:
>>>>>>
>>>>>>> Hi Brad,
>>>>>>>
>>>>>>> We can remove the use of QueuedSlavePort in the memory controller and
>>>>>>> simply not accept requests if the response queue is full. Is this
>>>>>>> needed?
>>>>>>> If so we’ll make sure someone gets this in place. The only reason we
>>>>>>> haven’t done it is because it hasn’t been needed.
>>>>>>>
>>>>>>> The use of QueuedPorts in the Ruby adapters is a whole different
>>>>>>> story. I
>>>>>>> think most of these can be removed and actually use flow control. I’m
>>>>>>> happy to code it up, but there is such a flux at the moment that I
>>>>>>> didn’t
>>>>>>> want to post yet another patch changing the Ruby port. I really do
>>>>>>> think
>>>>>>> we should avoid having implicit buffers for 1000’s of kilobytes to
>>>>>>> the
>>>>>>> largest extent possible. If we really need a constructor parameter
>>>>>>> to make
>>>>>>> it “infinite” for some quirky Ruby use-case, then let’s do that...
>>>>>>>
>>>>>>> Andreas
>>>>>>>
>>>>>>>
>>>>>>> On 22/09/2015 17:14, "gem5-dev on behalf of Beckmann, Brad"
>>>>>>> <gem5-dev-***@gem5.org on behalf of ***@amd.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> >From AMD's perspective, we have deprecated our usage of
>>>>>>> RubyMemoryControl
>>>>>>> >and we are using the new Memory Controllers with the port interface.
>>>>>>> >
>>>>>>> >That being said, I completely agree with Joel that the packet queue
>>>>>>> >finite invisible buffer limit of 100 needs to go! As you know, we
>>>>>>> tried
>>>>>>> >very hard several months ago to essentially make this an infinite
>>>>>>> buffer,
>>>>>>> >but Andreas would not allow us to check it in. We are going to
>>>>>>> post that
>>>>>>> >patch again in a few weeks when we post our GPU model. Our GPU
>>>>>>> model
>>>>>>> >will not work unless we increase that limit.
>>>>>>> >
>>>>>>> >Andreas you keep arguing that if you exceed that limit, that
>>>>>>> something is
>>>>>>> >fundamentally broken. Please keep in mind that there are many uses
>>>>>>> of
>>>>>>> >gem5 beyond what you use it for. Also this is a research simulator
>>>>>>> and
>>>>>>> >we should not restrict ourselves to what we think is practical in
>>>>>>> real
>>>>>>> >hardware. Finally, the fact that the finite limit is invisible to
>>>>>>> the
>>>>>>> >producer is just bad software engineering.
>>>>>>> >
>>>>>>> >I beg you to please allow us to remove this finite invisible limit!
>>>>>>> >
>>>>>>> >Brad
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >-----Original Message-----
>>>>>>> >From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf Of
>>>>>>> Andreas
>>>>>>> >Hansson
>>>>>>> >Sent: Tuesday, September 22, 2015 6:35 AM
>>>>>>> >To: Andreas Hansson; Default; Joel Hestness
>>>>>>> >Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl
>>>>>>> >delete requests
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >> On Sept. 21, 2015, 8:42 a.m., Andreas Hansson wrote:
>>>>>>> >> > Can we just prune the whole RubyMemoryControl rather? Has it
>>>>>>> not been
>>>>>>> >>deprecated long enough?
>>>>>>> >>
>>>>>>> >> Joel Hestness wrote:
>>>>>>> >> Unless I'm overlooking something, for Ruby users, I don't see
>>>>>>> other
>>>>>>> >>memory controllers that are guaranteed to work. Besides
>>>>>>> >>RubyMemoryControl, all others use a QueuedSlavePort for their input
>>>>>>> >>queues. Given that Ruby hasn't added complete flow control,
>>>>>>> PacketQueue
>>>>>>> >>size restrictions can be exceeded (triggering the panic). This
>>>>>>> occurs
>>>>>>> >>infrequently/irregularly with aggressive GPUs in gem5-gpu, and
>>>>>>> appears
>>>>>>> >>difficult to fix in a systematic way.
>>>>>>> >>
>>>>>>> >> Regardless of the fact we've deprecated RubyMemoryControl,
>>>>>>> this is
>>>>>>> >>a necessary fix.
>>>>>>> >
>>>>>>> >No memory controller is using QueuedSlavePort for any _input_
>>>>>>> queues.
>>>>>>> >The DRAMCtrl class uses it for the response _output_ queue, that's
>>>>>>> all.
>>>>>>> >If that is really an issue we can move away from it and enforce an
>>>>>>> upper
>>>>>>> >bound on responses by not accepting new requests. That said, if we
>>>>>>> hit
>>>>>>> >the limit I would argue something else is fundamentally broken in
>>>>>>> the
>>>>>>> >system and should be addressed.
>>>>>>> >
>>>>>>> >In any case, the discussion whether to remove RubyMemoryControl or
>>>>>>> not
>>>>>>> >should be completely decoupled.
>>>>>>> >
>>>>>>> >
>>>>>>> >- Andreas
>>>>>>>
>>>>>>
>>
>> --
>> Joel Hestness
>> PhD Candidate, Computer Architecture
>> Dept. of Computer Science, University of Wisconsin - Madison
>> http://pages.cs.wisc.edu/~hestness/
>> IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended
>> recipient, please notify the sender immediately and do not disclose the
>> contents to any other person, use it for any purpose, or store or copy the
>> information in any medium. Thank you.
>>
>
>
>
> --
> Joel Hestness
> PhD Candidate, Computer Architecture
> Dept. of Computer Science, University of Wisconsin - Madison
> http://pages.cs.wisc.edu/~hestness/
>



--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/
Gross, Joe
2016-02-08 18:33:08 UTC
Permalink
Hi Joel,

I'd be curious to see a patch of what you're proposing, as I'm not sure I really follow what you're doing. The reason I ask is that I have been discussing an implementation with Brad and would like to see how similar it is to what you have. Namely, it's an idea similar to what is commonly used in hardware, where senders have tokens that correspond to slots in the receiver queue, so the reservation happens at startup. The only communication that goes from a receiving port back to a sender is token return. The port and queue would still be coupled, and the device which owns the Queued*Port would manage removal from the PacketQueue. In my experience, this is a very effective mechanism for flow control, and it addresses your point about transparency of the queue and its state.
The tokens remove the need for unblock callbacks, but it's the responsibility of the receiver not to send when the queue is full or when it has a conflicting request. There's no implementation yet, but the simplicity and similarity to hardware techniques may prove useful. Anyway, could you post something so I can better understand what you've described? Please don't get rid of the Queued*Ports, as I think there is a simple way to improve them to do efficient flow control.
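A rough sketch of the token scheme described above, under the assumption that all tokens are granted to the sender at startup; the class and method names are illustrative stand-ins, not actual gem5 API:

```python
from collections import deque

class Receiver:
    """Owns a bounded queue; its slots are advertised to the sender as tokens."""
    def __init__(self, depth):
        self.depth = depth
        self.queue = deque()

    def recv(self, pkt):
        # The sender only transmits while holding a token, so this
        # queue can never overflow.
        assert len(self.queue) < self.depth
        self.queue.append(pkt)

    def service_one(self):
        # Dequeue one packet; the freed slot goes back as a token.
        self.queue.popleft()
        return 1

class Sender:
    """May only transmit while it holds at least one token."""
    def __init__(self, receiver):
        self.receiver = receiver
        self.tokens = receiver.depth  # all tokens granted at startup

    def try_send(self, pkt):
        if self.tokens == 0:
            return False          # receiver queue is full: hold the packet
        self.tokens -= 1          # spend a token for the queue slot
        self.receiver.recv(pkt)
        return True

    def return_tokens(self, n):
        self.tokens += n
```

The key property is that the only reverse-path traffic is token return; the receiver never has to nack a packet, because the sender cannot transmit into a full queue.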

Joe
________________________________________
From: gem5-dev <gem5-dev-***@gem5.org> on behalf of Joel Hestness <***@gmail.com>
Sent: Friday, February 5, 2016 12:03 PM
To: Andreas Hansson
Cc: gem5 Developer List
Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl

Hi guys,
Quick updates on this:
1) I have a finite response buffer implementation working. I removed the
QueuedSlavePort and added a response queue with reservation (Andreas'
underlying suggestion). I have a question with this solution: The
QueuedSlavePort prioritized responses based on their scheduled response time.
However, since writes have a shorter pipeline from request to response,
this architecture prioritized write requests ahead of read requests
received earlier, and it performs ~1-8% worse than a strict queue (what
I've implemented at this point). I can make the response queue a priority
queue if we want the same structure as previously, but I'm wondering if we
might prefer to just have the better-performing strict queue.
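To illustrate the ordering difference, here is a small sketch (with made-up latencies) showing how keying the response queue on scheduled response time lets a later write overtake an earlier read, while a strict queue preserves arrival order:

```python
import heapq

# Hypothetical pipeline latencies: writes respond faster than reads.
READ_LAT, WRITE_LAT = 20, 5

# (arrival_tick, request) pairs in arrival order.
arrivals = [(0, "read A"), (2, "write B"), (4, "read C")]

# Priority queue keyed on scheduled response time
# (the old QueuedSlavePort behavior):
pq = []
for t, req in arrivals:
    lat = WRITE_LAT if req.startswith("write") else READ_LAT
    heapq.heappush(pq, (t + lat, req))
priority_order = [heapq.heappop(pq)[1] for _ in range(len(pq))]

# Strict queue: responses leave in arrival order.
fifo_order = [req for _, req in arrivals]

print(priority_order)  # write B overtakes the earlier read A
print(fifo_order)
```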

2) To reflect on Andreas' specific suggestion of using unblock callbacks
from the PacketQueue: Modifying the QueuedSlavePort with callbacks is ugly
when trying to call the callback: The call needs to originate from
PacketQueue::sendDeferredPacket(), but PacketQueue doesn't have a pointer
to the owner component; The SlavePort has the pointer, so the PacketQueue
would need to first callback to the port, which would call the owner
component callback.
The exercise getting this to work has solidified my opinion that the
Queued*Ports should probably be removed from the codebase: Queues and ports
are separate subcomponents of simulated components, and only the component
knows how they should interact. Including a Queued*Port inside a component
requires the component to manage the flow-control into the Queued*Port just
as it would need to manage a standard port anyway, and hiding the queue in
the port obfuscates how it is managed.
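The plumbing problem can be seen in a schematic sketch; the Python names loosely mirror the C++ classes (PacketQueue::sendDeferredPacket() and so on), but this is a simplification, not the real implementation:

```python
class PacketQueue:
    """Knows its port, but not the MemObject that owns the port."""
    def __init__(self, port):
        self.port = port
        self.packets = []

    def send_deferred_packet(self):
        pkt = self.packets.pop(0)
        # ... transmit pkt ...
        # No owner pointer here, so the unblock notification has to
        # bounce through the port first.
        self.port.packet_sent()
        return pkt

class QueuedSlavePort:
    def __init__(self, owner):
        self.owner = owner
        self.queue = PacketQueue(self)

    def packet_sent(self):
        # Forward the notification to the owning component.
        self.owner.queue_drained()

class DRAMCtrlLike:
    """Stand-in for the owning MemObject."""
    def __init__(self):
        self.port = QueuedSlavePort(self)
        self.unblocked = False

    def queue_drained(self):
        self.unblocked = True  # e.g. resume accepting requests
```

Because the PacketQueue only knows its port, the notification must take two hops to reach the owner, which is the awkwardness described above.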


Thanks!
Joel


On Thu, Feb 4, 2016 at 10:06 AM, Joel Hestness <***@gmail.com> wrote:

> Hi Andreas,
> Thanks for the input. I had tried adding front- and back-end queues
> within the DRAMCtrl, but it became very difficult to propagate the flow
> control back through the component due to the complicated implementation of
> timing across different accessAndRespond() calls. I had to put this
> solution on hold.
>
> I think your proposed solution should simplify the flow control issue,
> and should have the derivative effect of making the Queued*Ports capable of
> flow control. I'm a little concerned that your solution would make the
> buffering very fluid, and I'm not sufficiently familiar with memory
> controller microarchitecture to know if that would be realistic. I wonder
> if you might have a way to do performance validation after I work through
> either of these implementations.
>
> Thanks!
> Joel
>
>
>
> On Wed, Feb 3, 2016 at 11:29 AM, Andreas Hansson <***@arm.com>
> wrote:
>
>> Hi Joel,
>>
>> I would suggest to keep the queued ports, but add methods to reserve
>> resources, query if it has free space, and a way to register callbacks so
>> that the MemObject is made aware when packets are sent. That way we can use
>> the queue in the cache, memory controller etc, without having all the
>> issues of the “naked” port interface, but still enforcing a bounded queue.
>>
>> When a packet arrives at the module we call reserve on the output port.
>> Then when we actually add the packet we know that there is space. When
>> request packets arrive we check if the queue is full, and if so we block
>> any new requests. Then through the callback we can unblock the DRAM
>> controller in this case.
>>
>> What do you think?
>>
>> Andreas
>>



--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/
_______________________________________________
gem5-dev mailing list
gem5-***@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
Joel Hestness
2016-02-08 20:15:43 UTC
Permalink
Hi guys,
I just posted a draft of my DRAMCtrl flow-control patch so you can take a
look here: http://reviews.gem5.org/r/3315/

NOTE: I have a separate patch that changes Ruby's QueuedMasterPort from
directories to memory controllers into a MasterPort, and it places a
MessageBuffer in front of the MasterPort, so that the user can make all
buffering finite within a Ruby memory hierarchy. I still need to merge this
patch with gem5, before I can share it. Let me know if you'd like to see
the draft there also.

@Joe:


> I'd be curious to see a patch of what you're proposing as I'm not sure I
> really follow what you're doing. The reason I ask is because I have been
> discussing an implementation with Brad and would like to see how
> similar it is to what you have. Namely it's an idea similar to what is
> commonly used in hardware, where senders have tokens that correspond to
> slots in the receiver queue so the reservation happens at startup. The only
> communication that goes from a receiving port back to a sender is token
> return. The port and queue would still be coupled and the device which owns
> the Queued*Port would manage removal from the PacketQueue. In my
> experience, this is a very effective mechanism for flow control and
> addresses your point about transparency of the queue and its state.
> The tokens remove the need for unblock callbacks, but it's the
> responsibility of the receiver not to send when the queue is full or when
> it has a conflicting request. There's no implementation yet, but the
> simplicity and similarity to hardware techniques may prove useful. Anyway,
> could you post something so I can better understand what you've described?


My implementation effectively does what you're describing: The DRAMCtrl now
has a finite number of buffers (i.e. tokens), and it allocates a buffer
slot when a request is received (senders spend a token when the DRAMCtrl
accepts a request). The only real difference is that the DRAMCtrl now
implements a SlavePort with flow control consistent with the rest of gem5,
so if there are no buffer slots available, the request is nacked and a
retry must be sent (i.e. a token is returned).
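Schematically, the nack/retry handshake looks like the following sketch; the method names echo gem5's timing-port interface (recvTimingReq, sendRetryReq), but this Python model is a simplification under assumed semantics:

```python
class CtrlSlavePort:
    """Bounded request buffer with gem5-style nack/retry flow control."""
    def __init__(self, depth):
        self.depth = depth
        self.buffer = []
        self.need_retry = False

    def recv_timing_req(self, pkt):
        if len(self.buffer) >= self.depth:
            self.need_retry = True   # remember to send a retry later
            return False             # nack: the sender must hold the packet
        self.buffer.append(pkt)
        return True

    def respond_one(self, sender):
        self.buffer.pop(0)           # a buffer slot frees up
        if self.need_retry:
            self.need_retry = False
            sender.recv_req_retry()  # invite the blocked sender to resend

class SenderPort:
    def __init__(self, slave):
        self.slave = slave
        self.blocked_pkt = None

    def send(self, pkt):
        if not self.slave.recv_timing_req(pkt):
            self.blocked_pkt = pkt   # wait for the retry

    def recv_req_retry(self):
        pkt, self.blocked_pkt = self.blocked_pkt, None
        self.send(pkt)
```

Functionally this is equivalent to the token scheme: a nack plus a later retry plays the role of waiting for a token to come back.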


Please don't get rid of the Queued*Ports, as I think there is a simple way
> to improve them to do efficient flow control.
>

Heh... not sure I have the time/motivation to remove the Queued*Ports
myself. I've just been swapping out the Queued*Ports that break when trying
to implement finite buffering in a Ruby memory hierarchy. I'll leave
Queued*Ports for later fixing or removal, as appropriate.


Joel


________________________________________
> From: gem5-dev <gem5-dev-***@gem5.org> on behalf of Joel Hestness <
> ***@gmail.com>
> Sent: Friday, February 5, 2016 12:03 PM
> To: Andreas Hansson
> Cc: gem5 Developer List
> Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl
>
> Hi guys,
> Quick updates on this:
> 1) I have a finite response buffer implementation working. I removed the
> QueuedSlavePort and added a response queue with reservation (Andreas'
> underlying suggestion). I have a question with this solution: The
> QueuedSlavePort prioritized responses based their scheduled response time.
> However, since writes have a shorter pipeline from request to response,
> this architecture prioritized write requests ahead of read requests
> received earlier, and it performs ~1-8% worse than a strict queue (what
> I've implemented at this point). I can make the response queue a priority
> queue if we want the same structure as previously, but I'm wondering if we
> might prefer to just have the better-performing strict queue.
>
> 2) To reflect on Andreas' specific suggestion of using unblock callbacks
> from the PacketQueue: Modifying the QueuedSlavePort with callbacks is ugly
> when trying to call the callback: The call needs to originate from
> PacketQueue::sendDeferredPacket(), but PacketQueue doesn't have a pointer
> to the owner component; The SlavePort has the pointer, so the PacketQueue
> would need to first callback to the port, which would call the owner
> component callback.
> The exercise getting this to work has solidified my opinion that the
> Queued*Ports should probably be removed from the codebase: Queues and ports
> are separate subcomponents of simulated components, and only the component
> knows how they should interact. Including a Queued*Port inside a component
> requires the component to manage the flow-control into the Queued*Port just
> as it would need to manage a standard port anyway, and hiding the queue in
> the port obfuscates how it is managed.
>
>
> Thanks!
> Joel
>
>
> On Thu, Feb 4, 2016 at 10:06 AM, Joel Hestness <***@gmail.com>
> wrote:
>
> > Hi Andreas,
> > Thanks for the input. I had tried adding front- and back-end queues
> > within the DRAMCtrl, but it became very difficult to propagate the flow
> > control back through the component due to the complicated implementation
> of
> > timing across different accessAndRespond() calls. I had to put this
> > solution on hold.
> >
> > I think your proposed solution should simplify the flow control issue,
> > and should have the derivative effect of making the Queued*Ports capable
> of
> > flow control. I'm a little concerned that your solution would make the
> > buffering very fluid, and I'm not sufficiently familiar with memory
> > controller microarchitecture to know if that would be realistic. I wonder
> > if you might have a way to do performance validation after I work through
> > either of these implementations.
> >
> > Thanks!
> > Joel
> >
> >
> >
> > On Wed, Feb 3, 2016 at 11:29 AM, Andreas Hansson <
> ***@arm.com>
> > wrote:
> >
> >> Hi Joel,
> >>
> >> I would suggest o keep the queued ports, but add methods to reserve
> >> resources, query if it has free space, and a way to register callbacks
> so
> >> that the MemObject is made aware when packets are sent. That way we can
> use
> >> the queue in the cache, memory controller etc, without having all the
> >> issues of the “naked” port interface, but still enforcing a bounded
> queue.
> >>
> >> When a packet arrives to the module we call reserve on the output port.
> >> Then when we actually add the packet we know that there is space. When
> >> request packets arrive we check if the queue is full, and if so we block
> >> any new requests. Then through the callback we can unblock the DRAM
> >> controller in this case.
> >>
> >> What do you think?
> >>
> >> Andreas
> >>
> >> From: Joel Hestness <***@gmail.com>
> >> Date: Tuesday, 2 February 2016 at 00:24
> >> To: Andreas Hansson <***@arm.com>
> >> Cc: gem5 Developer List <gem5-***@gem5.org>
> >> Subject: Follow-up: Removing QueuedSlavePort from DRAMCtrl
> >>
> >> Hi Andreas,
> >> I'd like to circle back on the thread about removing the
> >> QueuedSlavePort response queue from DRAMCtrl. I've been working to shift
> >> over to DRAMCtrl from the RubyMemoryController, but nearly all of my
> >> simulations now crash on the DRAMCtrl's response queue. Since I need the
> >> DRAMCtrl to work, I'll be looking into this now. However, based on my
> >> inspection of the code, it looks pretty non-trivial to remove the
> >> QueueSlavePort, so I'm hoping you can at least help me work through the
> >> changes.
> >>
> >> To reproduce the issue, I've put together a slim gem5 patch (attached)
> >> to use the memtest.py script to generate accesses. Here's the command
> line
> >> I used:
> >>
> >> % build/X86/gem5.opt --debug-flag=DRAM --outdir=$outdir
> >> configs/example/memtest.py -u 100
> >>
> >> If you're still willing to take a stab at it, let me know if/how I can
> >> help. Otherwise, I'll start working on it. It seems the trickiest thing
> is
> >> going to be modeling the arbitrary frontendLatency and backendLatency
> while
> >> still counting all of the accesses that are in the controller when it
> needs
> >> to block back to the input queue. These latencies are currently assessed
> >> with scheduling in the port response queue. Any suggestions you could
> give
> >> would be appreciated.
> >>
> >> Thanks!
> >> Joel
> >>
> >>
> >> Below here is our conversation from the email thread "[gem5-dev] Review
> >> Request 3116: ruby: RubyMemoryControl delete requests"
> >>
> >> On Wed, Sep 23, 2015 at 3:51 PM, Andreas Hansson <
> ***@arm.com
> >> > wrote:
> >>
> >>> Great. Thanks Joel.
> >>>
> >>> If anything pops up on our side I’ll let you know.
> >>>
> >>> Andreas
> >>>
> >>> From: Joel Hestness <***@gmail.com>
> >>> Date: Wednesday, 23 September 2015 20:29
> >>>
> >>> To: Andreas Hansson <***@arm.com>
> >>> Cc: gem5 Developer List <gem5-***@gem5.org>
> >>> Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl
> >>> delete requests
> >>>
> >>>
> >>>
> >>>> I don’t think there is any big difference in our expectations, quite
> >>>> the contrary :-). GPUs are very important to us (and so is throughput
> >>>> computing in general), and we run plenty of simulations with lots of
> >>>> memory-level parallelism from non-CPU components. Still, we haven’t
> run
> >>>> into the issue.
> >>>>
> >>>
> >>> Ok, cool. Thanks for the context.
> >>>
> >>>
> >>> If you have practical examples that run into problems let me know, and
> >>>> we’ll get it fixed.
> >>>>
> >>>
> >>> I'm having trouble assembling a practical example (with or without
> using
> >>> gem5-gpu). I'll keep you posted if I find something reasonable.
> >>>
> >>> Thanks!
> >>> Joel
> >>>
> >>>
> >>>
> >>>> From: Joel Hestness <***@gmail.com>
> >>>> Date: Tuesday, 22 September 2015 19:58
> >>>>
> >>>> To: Andreas Hansson <***@arm.com>
> >>>> Cc: gem5 Developer List <gem5-***@gem5.org>
> >>>> Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl
> >>>> delete requests
> >>>>
> >>>> Hi Andreas,
> >>>>
> >>>>
> >>>>> If it is a real problem affecting end users I am indeed volunteering
> >>>>> to fix the DRAMCtrl use of QueuedSlavePort. In the classic memory
> system
> >>>>> there are enough points of regulation (LSQs, MSHR limits, crossbar
> layers
> >>>>> etc) that having a single memory channel with >100 queued up
> responses
> >>>>> waiting to be sent is extremely unlikely. Hence, until now the added
> >>>>> complexity has not been needed. If there is regulation on the number
> of
> >>>>> requests in Ruby, then I would argue that it is equally unlikely
> there…I
> >>>>> could be wrong.
> >>>>>
> >>>>
> >>>> Ok. I think a big part of the difference between our expectations is
> >>>> just the cores that we're modeling. AMD and gem5-gpu can model
> aggressive
> >>>> GPU cores with potential to expose, perhaps, 4-32x more memory-level
> >>>> parallel requests than a comparable number of multithreaded CPU
> cores. I
> >>>> feel that this difference warrants different handling of accesses in
> the
> >>>> memory controller.
> >>>>
> >>>> Joel
> >>>>
> >>>>
> >>>>
> >>>> From: Joel Hestness <***@gmail.com>
> >>>>> Date: Tuesday, 22 September 2015 17:48
> >>>>>
> >>>>> To: Andreas Hansson <***@arm.com>
> >>>>> Cc: gem5 Developer List <gem5-***@gem5.org>
> >>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl
> >>>>> delete requests
> >>>>>
> >>>>> Hi Andreas,
> >>>>>
> >>>>> Thanks for the "ship it!"
> >>>>>
> >>>>>
> >>>>>> Do we really need to remove the use of QueuedSlavePort in DRAMCtrl?
> >>>>>> It will make the controller more complex, and I don’t want to do it
> “just
> >>>>>> in case”.
> >>>>>>
> >>>>>
> >>>>> Sorry, I misread your email as offering to change the DRAMCtrl. I'm
> >>>>> not sure who should make that change, but I think it should get
> done. The
> >>>>> memory access response path starts at the DRAMCtrl and ends at the
> >>>>> RubyPort. If we add flow control to the RubyPort, packets will
> probably
> >>>>> back up more quickly on the response path back to where there are
> open
> >>>>> buffers. I expect the DRAMCtrl QueuedPort problem becomes more
> prevalent as
> >>>>> Ruby adds flow control, unless we add a limitation on outstanding
> requests
> >>>>> to memory from directory controllers.
> >>>>>
> >>>>> How does the classic memory model deal with this?
> >>>>>
> >>>>> Joel
> >>>>>
> >>>>>
> >>>>>
> >>>>>> From: Joel Hestness <***@gmail.com>
> >>>>>> Date: Tuesday, 22 September 2015 17:30
> >>>>>> To: Andreas Hansson <***@arm.com>
> >>>>>> Cc: gem5 Developer List <gem5-***@gem5.org>
> >>>>>>
> >>>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby: RubyMemoryControl
> >>>>>> delete requests
> >>>>>>
> >>>>>> Hi guys,
> >>>>>> Thanks for the discussion here. I had quickly tested other memory
> >>>>>> controllers, but hadn't connected the dots that this might be the
> same
> >>>>>> problem Brad/AMD are running into.
> >>>>>>
> >>>>>> My preference would be that we remove the QueuedSlavePort from the
> >>>>>> DRAMCtrls. That would at least eliminate DRAMCtrls as a potential
> source of
> >>>>>> the QueueSlavePort packet overflows, and would allow us to more
> closely
> >>>>>> focus on the RubyPort problem when we get to it.
> >>>>>>
> >>>>>> Can we reach resolution on this patch though? Are we okay with
> >>>>>> actually fixing the memory leak in mainline?
> >>>>>>
> >>>>>> Joel
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Sep 22, 2015 at 11:19 AM, Andreas Hansson <
> >>>>>> ***@arm.com> wrote:
> >>>>>>
> >>>>>>> Hi Brad,
> >>>>>>>
> >>>>>>> We can remove the use of QueuedSlavePort in the memory controller
> and
> >>>>>>> simply not accept requests if the response queue is full. Is this
> >>>>>>> needed?
> >>>>>>> If so we’ll make sure someone gets this in place. The only reason
> we
> >>>>>>> haven’t done it is because it hasn’t been needed.
> >>>>>>>
> >>>>>>> The use of QueuedPorts in the Ruby adapters is a whole different
> >>>>>>> story. I
> >>>>>>> think most of these can be removed and actually use flow control.
> I’m
> >>>>>>> happy to code it up, but there is such a flux at the moment that I
> >>>>>>> didn’t
> >>>>>>> want to post yet another patch changing the Ruby port. I really do
> >>>>>>> think
> >>>>>>> we should avoid having implicit buffers for 1000’s of kilobytes to
> >>>>>>> the
> >>>>>>> largest extent possible. If we really need a constructor parameter
> >>>>>>> to make
> >>>>>>> it “infinite” for some quirky Ruby use-case, then let’s do that...
> >>>>>>>
> >>>>>>> Andreas
> >>>>>>>
> >>>>>>>
> >>>>>>> On 22/09/2015 17:14, "gem5-dev on behalf of Beckmann, Brad"
> >>>>>>> <gem5-dev-***@gem5.org on behalf of ***@amd.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> >From AMD's perspective, we have deprecated our usage of
> >>>>>>> RubyMemoryControl
> >>>>>>> >and we are using the new Memory Controllers with the port
> interface.
> >>>>>>> >
> >>>>>>> >That being said, I completely agree with Joel that the packet
> queue
> >>>>>>> >finite invisible buffer limit of 100 needs to go! As you know, we
> >>>>>>> tried
> >>>>>>> >very hard several months ago to essentially make this an infinite
> >>>>>>> buffer,
> >>>>>>> >but Andreas would not allow us to check it in. We are going to
> >>>>>>> post that
> >>>>>>> >patch again in a few weeks when we post our GPU model. Our GPU
> >>>>>>> model
> >>>>>>> >will not work unless we increase that limit.
> >>>>>>> >
> >>>>>>> >Andreas you keep arguing that if you exceed that limit, that
> >>>>>>> something is
> >>>>>>> >fundamentally broken. Please keep in mind that there are many
> uses
> >>>>>>> of
> >>>>>>> >gem5 beyond what you use it for. Also this is a research
> simulator
> >>>>>>> and
> >>>>>>> >we should not restrict ourselves to what we think is practical in
> >>>>>>> real
> >>>>>>> >hardware. Finally, the fact that the finite limit is invisible to
> >>>>>>> the
> >>>>>>> >producer is just bad software engineering.
> >>>>>>> >
> >>>>>>> >I beg you to please allow us to remove this finite invisible
> limit!
> >>>>>>> >
> >>>>>>> >Brad
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >-----Original Message-----
> >>>>>>> >From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf Of
> >>>>>>> Andreas
> >>>>>>> >Hansson
> >>>>>>> >Sent: Tuesday, September 22, 2015 6:35 AM
> >>>>>>> >To: Andreas Hansson; Default; Joel Hestness
> >>>>>>> >Subject: Re: [gem5-dev] Review Request 3116: ruby:
> RubyMemoryControl
> >>>>>>> >delete requests
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >> On Sept. 21, 2015, 8:42 a.m., Andreas Hansson wrote:
> >>>>>>> >> > Can we just prune the whole RubyMemoryControl rather? Has it
> >>>>>>> not been
> >>>>>>> >>deprecated long enough?
> >>>>>>> >>
> >>>>>>> >> Joel Hestness wrote:
> >>>>>>> >> Unless I'm overlooking something, for Ruby users, I don't
> see
> >>>>>>> other
> >>>>>>> >>memory controllers that are guaranteed to work. Besides
> >>>>>>> >>RubyMemoryControl, all others use a QueuedSlavePort for their
> input
> >>>>>>> >>queues. Given that Ruby hasn't added complete flow control,
> >>>>>>> PacketQueue
> >>>>>>> >>size restrictions can be exceeded (triggering the panic). This
> >>>>>>> occurs
> >>>>>>> >>infrequently/irregularly with aggressive GPUs in gem5-gpu, and
> >>>>>>> appears
> >>>>>>> >>difficult to fix in a systematic way.
> >>>>>>> >>
> >>>>>>> >> Regardless of the fact we've deprecated RubyMemoryControl,
> >>>>>>> this is
> >>>>>>> >>a necessary fix.
> >>>>>>> >
> >>>>>>> >No memory controller is using QueuedSlavePort for any _input_
> >>>>>>> queues.
> >>>>>>> >The DRAMCtrl class uses it for the response _output_ queue, that's
> >>>>>>> all.
> >>>>>>> >If that is really an issue we can move away from it and enforce an
> >>>>>>> upper
> >>>>>>> >bound on responses by not accepting new requests. That said, if we
> >>>>>>> hit
> >>>>>>> >the limit I would argue something else is fundamentally broken in
> >>>>>>> the
> >>>>>>> >system and should be addressed.
> >>>>>>> >
> >>>>>>> >In any case, the discussion whether to remove RubyMemoryControl or
> >>>>>>> not
> >>>>>>> >should be completely decoupled.
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >- Andreas
> >>>>>>>
> >>>>>>
> >>
> >> --
> >> Joel Hestness
> >> PhD Candidate, Computer Architecture
> >> Dept. of Computer Science, University of Wisconsin - Madison
> >> http://pages.cs.wisc.edu/~hestness/
> >> IMPORTANT NOTICE: The contents of this email and any attachments are
> >> confidential and may also be privileged. If you are not the intended
> >> recipient, please notify the sender immediately and do not disclose the
> >> contents to any other person, use it for any purpose, or store or copy
> the
> >> information in any medium. Thank you.
> >>
> >
> >
> >
> > --
> > Joel Hestness
> > PhD Candidate, Computer Architecture
> > Dept. of Computer Science, University of Wisconsin - Madison
> > http://pages.cs.wisc.edu/~hestness/
> >
>
>
>
> --
> Joel Hestness
> PhD Candidate, Computer Architecture
> Dept. of Computer Science, University of Wisconsin - Madison
> http://pages.cs.wisc.edu/~hestness/
> _______________________________________________
> gem5-dev mailing list
> gem5-***@gem5.org
> http://m5sim.org/mailman/listinfo/gem5-dev
>


--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/
Poremba, Matthew
2016-02-11 17:21:12 UTC
Hi Joel,


I've tried your draft patch and it's working for me in removing backup in the response queue as designed. Thanks for posting it! In regards to the buffersFull() implementation, I can think of a pathological case where the back-end queue is full because the sender is not accepting responses (for whatever reason) but is still issuing requests. buffersFull() will return false in this case and allow the request to be enqueued and eventually scheduled, causing the back-end queue to grow larger than the response_buffer_size parameter.

Perhaps one way to better emulate exchanging tokens (credit) as Joe mentioned is to have buffersFull() "reserve" slots in the queues by making sure there is a slot in both the read queue (or write queue) and a corresponding slot available in the back-end queue. The reservation can be lifted once the response is sent on the port.

Another more aggressive implementation would be to not use buffersFull() and prevent scheduling memory requests from the read/write queue if the back-end queue is full. This would allow a sender to enqueue memory requests even if the back-end queue is full up until the read/write queue fills up, but would require a number of changes to the code.
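
The reservation variant above can be illustrated with a minimal sketch (class and method names are hypothetical, not gem5 code): a request is admitted only if both a read/write-queue slot and a back-end response slot are free, and the back-end reservation is lifted only when the response leaves on the port, which closes the pathological case where the sender keeps issuing while refusing responses.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of coupled front-end/back-end slot reservation. A request
// occupies a request slot while queued and holds a back-end slot until its
// response is actually sent, so the back-end queue stays bounded.
class CtrlBuffers {
  public:
    CtrlBuffers(size_t reqSlots, size_t respSlots)
        : reqFree(reqSlots), respFree(respSlots) {}

    // On request arrival: admit only if both slots are available.
    bool tryAccept() {
        if (reqFree == 0 || respFree == 0)
            return false;   // sender must retry later
        --reqFree;
        --respFree;
        return true;
    }

    // Request scheduled out of the read/write queue: its request slot frees,
    // but the back-end reservation is still held by the pending response.
    void onScheduled() { ++reqFree; }

    // Response sent on the port: lift the back-end reservation.
    void onResponseSent() { ++respFree; }

    size_t reqFree, respFree;
};
```

Note how, with one back-end slot, a second request is refused even though the read queue has room, until the first response is drained.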


In regards to Ruby, I am a bit curious- Are you placing MessageBuffers in the SLICC files and doing away with the queueMemoryRead/queueMemoryWrite calls or are you placing a MessageBuffer in AbstractController? I am currently trying out an implementation using the former for a few additional reasons other than flow control.


-Matt

-----Original Message-----
From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf Of Joel Hestness
Sent: Monday, February 08, 2016 12:16 PM
To: Gross, Joe
Cc: gem5 Developer List
Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl

Hi guys,
I just posted a draft of my DRAMCtrl flow-control patch so you can take a look here: http://reviews.gem5.org/r/3315/

NOTE: I have a separate patch that changes Ruby's QueuedMasterPort from directories to memory controllers into a MasterPort, and it places a MessageBuffer in front of the MasterPort, so that the user can make all buffering finite within a Ruby memory hierarchy. I still need to merge this patch with gem5, before I can share it. Let me know if you'd like to see the draft there also.

@Joe:


> I'd be curious to see a patch of what you're proposing as I'm not sure
> I really follow what you're doing. The reason I ask is because I have
> been discussing an implementation with with Brad and would like to see
> how similar it is to what you have. Namely it's an idea similar to
> what is commonly used in hardware, where senders have tokens that
> correspond to slots in the receiver queue so the reservation happens
> at startup. The only communication that goes from a receiving port
> back to a sender is token return. The port and queue would still be
> coupled and the device which owns the Queued*Port would manage removal
> from the PacketQueue. In my experience, this is a very effective
> mechanism for flow control and addresses your point about transparency of the queue and its state.
> The tokens removes the need for unblock callbacks, but it's the
> responsibility of the receiver not to send when the queue is full or
> when it has a conflicting request. There's no implementation yet, but
> the simplicity and similarity to hardware techniques may prove useful.
> Anyway, could you post something so I can better understand what you've described?


My implementation effectively does what you're describing: The DRAMCtrl now has a finite number of buffers (i.e. tokens), and it allocates a buffer slot when a request is received (senders spend a token when the DRAMCtrl accepts a request). The only real difference is that the DRAMCtrl now implements a SlavePort with flow control consistent with the rest of gem5, so if there are no buffer slots available, the request is nacked and a retry must be sent (i.e. a token is returned).
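
The token/credit mechanism Joe describes can be sketched in a few lines (illustrative names only, not the gem5 port interface): the sender starts with credits equal to the receiver's queue depth, spends one per send, and gets one back when the receiver drains an entry, so the receiver can never overflow and never has to nack.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Toy credit-based flow control between one sender and one receiver.
struct Receiver {
    explicit Receiver(size_t depth) : depth(depth) {}
    void recv(int pkt) { q.push(pkt); }  // space is guaranteed by credits
    // Drain one entry; in the real scheme this is when a credit goes back.
    int drain() { int p = q.front(); q.pop(); return p; }
    size_t depth;
    std::queue<int> q;
};

struct Sender {
    explicit Sender(Receiver &r) : r(r), credits(r.depth) {}
    bool trySend(int pkt) {
        if (credits == 0)
            return false;    // would overflow the receiver: hold the packet
        --credits;
        r.recv(pkt);
        return true;
    }
    void creditReturn() { ++credits; }  // receiver drained an entry
    Receiver &r;
    size_t credits;
};
```

The retry/nack protocol in the draft patch is equivalent in effect: a nack plus later retry plays the role of a returned token.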


Please don't get rid of the Queued*Ports, as I think there is a simple way
> to improve them to do efficient flow control.
>

Heh... not sure I have the time/motivation to remove the Queued*Ports myself. I've just been swapping out the Queued*Ports that break when trying to implement finite buffering in a Ruby memory hierarchy. I'll leave Queued*Ports for later fixing or removal, as appropriate.


Joel


________________________________________
> From: gem5-dev <gem5-dev-***@gem5.org> on behalf of Joel Hestness
> < ***@gmail.com>
> Sent: Friday, February 5, 2016 12:03 PM
> To: Andreas Hansson
> Cc: gem5 Developer List
> Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
> DRAMCtrl
>
> Hi guys,
> Quick updates on this:
> 1) I have a finite response buffer implementation working. I
> removed the QueuedSlavePort and added a response queue with reservation (Andreas'
> underlying suggestion). I have a question with this solution: The
> QueuedSlavePort prioritized responses based on their scheduled response time.
> However, since writes have a shorter pipeline from request to
> response, this architecture prioritized write requests ahead of read
> requests received earlier, and it performs ~1-8% worse than a strict
> queue (what I've implemented at this point). I can make the response
> queue a priority queue if we want the same structure as previously,
> but I'm wondering if we might prefer to just have the better-performing strict queue.
>
> 2) To reflect on Andreas' specific suggestion of using unblock
> callbacks from the PacketQueue: Modifying the QueuedSlavePort with
> callbacks is ugly when trying to call the callback: The call needs to
> originate from PacketQueue::sendDeferredPacket(), but PacketQueue
> doesn't have a pointer to the owner component; The SlavePort has the
> pointer, so the PacketQueue would need to first callback to the port,
> which would call the owner component callback.
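
That double-hop can be shown with a skeleton (all names are illustrative, not the actual gem5 classes): the PacketQueue only knows its port, so the drained notification must bounce from queue to port to owner.

```cpp
#include <cassert>

struct Owner;  // forward declaration: the component that owns port + queue

struct Port {
    Owner *owner;
    void notifyOwner();     // hop 2 lives here, defined after Owner
};

struct PacketQueue {
    Port *port;             // the queue has no pointer to the owner itself
    void sendDeferredPacket() {
        // ... the deferred packet actually leaves here ...
        port->notifyOwner();            // hop 1: queue -> port
    }
};

struct Owner {
    bool unblocked = false;
    void recvQueueDrained() { unblocked = true; }  // unblock input side
};

void Port::notifyOwner() { owner->recvQueueDrained(); }  // hop 2: port -> owner
```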
> The exercise of getting this to work has solidified my opinion that the
> Queued*Ports should probably be removed from the codebase: Queues and
> ports are separate subcomponents of simulated components, and only the
> component knows how they should interact. Including a Queued*Port
> inside a component requires the component to manage the flow-control
> into the Queued*Port just as it would need to manage a standard port
> anyway, and hiding the queue in the port obfuscates how it is managed.
>
>
> Thanks!
> Joel
>
>
> On Thu, Feb 4, 2016 at 10:06 AM, Joel Hestness <***@gmail.com>
> wrote:
>
> > Hi Andreas,
> > Thanks for the input. I had tried adding front- and back-end
> > queues within the DRAMCtrl, but it became very difficult to
> > propagate the flow control back through the component due to the
> > complicated implementation
> of
> > timing across different accessAndRespond() calls. I had to put this
> > solution on hold.
> >
> > I think your proposed solution should simplify the flow control
> > issue, and should have the derivative effect of making the
> > Queued*Ports capable
> of
> > flow control. I'm a little concerned that your solution would make
> > the buffering very fluid, and I'm not sufficiently familiar with
> > memory controller microarchitecture to know if that would be
> > realistic. I wonder if you might have a way to do performance
> > validation after I work through either of these implementations.
> >
> > Thanks!
> > Joel
> >
> >
> >
> > On Wed, Feb 3, 2016 at 11:29 AM, Andreas Hansson <
> ***@arm.com>
> > wrote:
> >
> >> Hi Joel,
> >>
> >> I would suggest to keep the queued ports, but add methods to reserve
> >> resources, query if it has free space, and a way to register
> >> callbacks
> so
> >> that the MemObject is made aware when packets are sent. That way we
> >> can
> use
> >> the queue in the cache, memory controller etc, without having all
> >> the issues of the “naked” port interface, but still enforcing a
> >> bounded
> queue.
> >>
> >> When a packet arrives at the module we call reserve on the output port.
> >> Then when we actually add the packet we know that there is space.
> >> When request packets arrive we check if the queue is full, and if
> >> so we block any new requests. Then through the callback we can
> >> unblock the DRAM controller in this case.
> >>
> >> What do you think?
> >>
> >> Andreas
> >>
> >> From: Joel Hestness <***@gmail.com>
> >> Date: Tuesday, 2 February 2016 at 00:24
> >> To: Andreas Hansson <***@arm.com>
> >> Cc: gem5 Developer List <gem5-***@gem5.org>
> >> Subject: Follow-up: Removing QueuedSlavePort from DRAMCtrl
> >>
> >> Hi Andreas,
> >> I'd like to circle back on the thread about removing the
> >> QueuedSlavePort response queue from DRAMCtrl. I've been working to
> >> shift over to DRAMCtrl from the RubyMemoryController, but nearly
> >> all of my simulations now crash on the DRAMCtrl's response queue.
> >> Since I need the DRAMCtrl to work, I'll be looking into this now.
> >> However, based on my inspection of the code, it looks pretty
> >> non-trivial to remove the QueueSlavePort, so I'm hoping you can at
> >> least help me work through the changes.
> >>
> >> To reproduce the issue, I've put together a slim gem5 patch
> >> (attached) to use the memtest.py script to generate accesses.
> >> Here's the command
> line
> >> I used:
> >>
> >> % build/X86/gem5.opt --debug-flag=DRAM --outdir=$outdir
> >> configs/example/memtest.py -u 100
> >>
> >> If you're still willing to take a stab at it, let me know if/how
> >> I can help. Otherwise, I'll start working on it. It seems the
> >> trickiest thing
> is
> >> going to be modeling the arbitrary frontendLatency and
> >> backendLatency
> while
> >> still counting all of the accesses that are in the controller when
> >> it
> needs
> >> to block back to the input queue. These latencies are currently
> >> assessed with scheduling in the port response queue. Any
> >> suggestions you could
> give
> >> would be appreciated.
> >>
> >> Thanks!
> >> Joel
> >>
> >>
> >>
> >> --
> >> Joel Hestness
> >> PhD Candidate, Computer Architecture
> >> Dept. of Computer Science, University of Wisconsin - Madison
> >> http://pages.cs.wisc.edu/~hestness/
> >> IMPORTANT NOTICE: The contents of this email and any attachments
> >> are confidential and may also be privileged. If you are not the
> >> intended recipient, please notify the sender immediately and do not
> >> disclose the contents to any other person, use it for any purpose,
> >> or store or copy
> the
> >> information in any medium. Thank you.
> >>
> >
> >
> >
> > --
> > Joel Hestness
> > PhD Candidate, Computer Architecture
> > Dept. of Computer Science, University of Wisconsin - Madison
> > http://pages.cs.wisc.edu/~hestness/
> >
>
>
>
> --
> Joel Hestness
> PhD Candidate, Computer Architecture
> Dept. of Computer Science, University of Wisconsin - Madison
> http://pages.cs.wisc.edu/~hestness/
> _______________________________________________
> gem5-dev mailing list
> gem5-***@gem5.org
> http://m5sim.org/mailman/listinfo/gem5-dev
>


--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/
_______________________________________________
gem5-dev mailing list
gem5-***@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
Joel Hestness
2016-02-11 22:52:26 UTC
Permalink
Hi Matt,

In regards to the buffersFull() implementation, I can think of a
> pathological case where the back-end queue is full because the sender is
> not accepting responses (for whatever reason) but is still issuing
> requests. buffersFull() will return false in this case and allow the
> request to be enqueued and eventually scheduled, causing the back-end queue
> to grow larger than the response_buffer_size parameter.



Perhaps one way to better emulate exchanging tokens (credit) as Joe
> mentioned is to have buffersFull() "reserve" slots in the queues by making
> sure there is a slot in both the read queue (or write queue) and a
> corresponding slot available in the back-end queue. The reservation can be
> lifted once the response is sent on the port.
>

I'm not sure I understand the difference between this description and what
I've implemented, except that what I've implemented adds some extra
back-end queuing. The capacity of the back-end queue in my implementation
is equal to the sum of the read and write queue capacities (plus a little
extra: response_buffer_size). The reservation of a slot in this large
back-end queue is released when a response is sent through the port, as you
describe. To me, this seems exactly the way a token-like structure would
reserve back-end queue slots.
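For concreteness, here is a minimal sketch of the reservation scheme I'm describing (the names are hypothetical, not the actual DRAMCtrl code): a request is accepted only if the relevant front-end queue has room, and accepting it also reserves a back-end slot that is held until the response leaves the port. Because the back-end capacity is the sum of the front-end capacities plus slack, a reserved slot always exists for any accepted request.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the buffersFull()/reservation scheme described
// above. Accepting a request occupies a front-end (read or write) queue
// slot AND reserves a back-end response slot; the back-end reservation
// is only released when the response is sent on the port.
class DramCtrlSketch {
  public:
    DramCtrlSketch(size_t readCap, size_t writeCap, size_t slack)
        : readCap(readCap), writeCap(writeCap),
          backendCap(readCap + writeCap + slack) {}

    // True when either the relevant front-end queue or the back-end
    // reservation pool is exhausted.
    bool buffersFull(bool isRead) const {
        size_t frontUsed = isRead ? readUsed : writeUsed;
        size_t frontCap  = isRead ? readCap  : writeCap;
        return frontUsed >= frontCap || backendReserved >= backendCap;
    }

    // Accept a request: enqueue in the front end, reserve a back-end slot.
    bool recvRequest(bool isRead) {
        if (buffersFull(isRead))
            return false;  // caller must retry later
        (isRead ? readUsed : writeUsed)++;
        backendReserved++;
        return true;
    }

    // Request issues to DRAM: leaves the front-end queue, but the
    // back-end reservation is held until the response is sent.
    void schedule(bool isRead) { (isRead ? readUsed : writeUsed)--; }

    // Response sent through the port: release the back-end reservation.
    void sendResponse() { assert(backendReserved > 0); backendReserved--; }

  private:
    size_t readCap, writeCap, backendCap;
    size_t readUsed = 0, writeUsed = 0, backendReserved = 0;
};
```

Note that even if the sender stalls and never accepts responses, the back-end count can never exceed backendCap, because every occupant of the back-end queue was admitted through the front end.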

That said, I realize that by "token" structure, Joe and you might be
describing something more than what I've implemented. Namely, since tokens
are the credits that allow senders to push into a receiver's queues, they
might allow multiple directories/caches sending to a single DRAMCtrl, which
I don't believe is possible with my current implementation. I think we'd
need to allow the DRAMCtrl to receive requests and queue retries while
other requesters are blocked, and sending those retries would need fair
arbitration, which a token scheme might automatically handle. Can you
clarify if that's what you're referring to as a token scheme?


Another more aggressive implementation would be to not use buffersFull()
> and prevent scheduling memory requests from the read/write queue if the
> back-end queue is full. This would allow a sender to enqueue memory
> requests even if the back-end queue is full up until the read/write queue
> fills up, but would require a number of changes to the code.
>

Yes, I tried implementing this first, and it ends up being very difficult
due to the DRAMCtrl's calls to and implementation of accessAndRespond().
Basically, reads and writes require different processing latencies, so we
would need not only a back-end queue, but also separate read and write
delay queues to model the different DRAM access latencies. We'd also need a
well-performing way to arbitrate for slots in the back-end queue that
doesn't conflict with the batching efforts of the front-end. To me, all
this complexity seems misaligned with Andreas et al.'s original aim with the
DRAMCtrl: fast and reasonably accurate simulation of a memory controller
<http://web.eecs.umich.edu/~twenisch/papers/ispass14.pdf>.


In regards to Ruby, I am a bit curious- Are you placing MessageBuffers in
> the SLICC files and doing away with the queueMemoryRead/queueMemoryWrite
> calls or are you placing a MessageBuffer in AbstractController? I am
> currently trying out an implementation using the former for a few
> additional reasons other than flow control.
>

If I understand what you're asking, I think I've also done the former,
though I've modified SLICC and the AbstractController to deal with parts of
the buffer management. I've merged my code with a recent gem5 revision
(11315:10647f5d0f7f) so I could post a draft review request. Here are the
patches (including links) to test all of this:

- http://reviews.gem5.org/r/3331/
- http://reviews.gem5.org/r/3332/
- http://pages.cs.wisc.edu/~hestness/links/MOESI_hammer_test_finite_queues
- http://pages.cs.wisc.edu/~hestness/links/cpu_memory_demand

More holistically, I feel that the best solution would be to hide the
memory request and response queues in an AbstractDirectoryController class
that inherits from AbstractController in C++, and from which all SLICC
directory controller machines descend. This structure would move all the
directory-specific code out of AbstractController and not model it in other
SLICC generated machines. This would also eliminate the need for assertions
that only directory controllers are calling the directory-specific
functions.


Joel


-----Original Message-----
> From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf Of Joel
> Hestness
> Sent: Monday, February 08, 2016 12:16 PM
> To: Gross, Joe
> Cc: gem5 Developer List
> Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl
>
> Hi guys,
> I just posted a draft of my DRAMCtrl flow-control patch so you can take
> a look here: http://reviews.gem5.org/r/3315/
>
> NOTE: I have a separate patch that changes Ruby's QueuedMasterPort from
> directories to memory controllers into a MasterPort, and it places a
> MessageBuffer in front of the MasterPort, so that the user can make all
> buffering finite within a Ruby memory hierarchy. I still need to merge this
> patch with gem5, before I can share it. Let me know if you'd like to see
> the draft there also.
>
> @Joe:
>
>
> > I'd be curious to see a patch of what you're proposing as I'm not sure
> > I really follow what you're doing. The reason I ask is because I have
> > been discussing an implementation with Brad and would like to see
> > how similar it is to what you have. Namely it's an idea similar to
> > what is commonly used in hardware, where senders have tokens that
> > correspond to slots in the receiver queue so the reservation happens
> > at startup. The only communication that goes from a receiving port
> > back to a sender is token return. The port and queue would still be
> > coupled and the device which owns the Queued*Port would manage removal
> > from the PacketQueue. In my experience, this is a very effective
> > mechanism for flow control and addresses your point about transparency
> of the queue and its state.
> > The tokens remove the need for unblock callbacks, but it's the
> > responsibility of the receiver not to send when the queue is full or
> > when it has a conflicting request. There's no implementation yet, but
> > the simplicity and similarity to hardware techniques may prove useful.
> > Anyway, could you post something so I can better understand what you've
> described?
>
>
> My implementation effectively does what you're describing: The DRAMCtrl
> now has a finite number of buffers (i.e. tokens), and it allocates a buffer
> slot when a request is received (senders spend a token when the DRAMCtrl
> accepts a request). The only real difference is that the DRAMCtrl now
> implements a SlavePort with flow control consistent with the rest of gem5,
> so if there are no buffer slots available, the request is nacked and a
> retry must be sent (i.e. a token is returned).
>
>
> Please don't get rid of the Queued*Ports, as I think there is a simple way
> > to improve them to do efficient flow control.
> >
>
> Heh... not sure I have the time/motivation to remove the Queued*Ports
> myself. I've just been swapping out the Queued*Ports that break when trying
> to implement finite buffering in a Ruby memory hierarchy. I'll leave
> Queued*Ports for later fixing or removal, as appropriate.
>
>
> Joel
>
>
> ________________________________________
> > From: gem5-dev <gem5-dev-***@gem5.org> on behalf of Joel Hestness
> > < ***@gmail.com>
> > Sent: Friday, February 5, 2016 12:03 PM
> > To: Andreas Hansson
> > Cc: gem5 Developer List
> > Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
> > DRAMCtrl
> >
> > Hi guys,
> > Quick updates on this:
> > 1) I have a finite response buffer implementation working. I
> > removed the QueuedSlavePort and added a response queue with reservation
> (Andreas'
> > underlying suggestion). I have a question with this solution: The
> > QueuedSlavePort prioritized responses based on their scheduled response
> time.
> > However, since writes have a shorter pipeline from request to
> > response, this architecture prioritized write requests ahead of read
> > requests received earlier, and it performs ~1-8% worse than a strict
> > queue (what I've implemented at this point). I can make the response
> > queue a priority queue if we want the same structure as previously,
> > but I'm wondering if we might prefer to just have the better-performing
> strict queue.
> >
> > 2) To reflect on Andreas' specific suggestion of using unblock
> > callbacks from the PacketQueue: Modifying the QueuedSlavePort with
> > callbacks is ugly when trying to call the callback: The call needs to
> > originate from PacketQueue::sendDeferredPacket(), but PacketQueue
> > doesn't have a pointer to the owner component; The SlavePort has the
> > pointer, so the PacketQueue would need to first callback to the port,
> > which would call the owner component callback.
> > The exercise getting this to work has solidified my opinion that the
> > Queued*Ports should probably be removed from the codebase: Queues and
> > ports are separate subcomponents of simulated components, and only the
> > component knows how they should interact. Including a Queued*Port
> > inside a component requires the component to manage the flow-control
> > into the Queued*Port just as it would need to manage a standard port
> > anyway, and hiding the queue in the port obfuscates how it is managed.
> >
> >
> > Thanks!
> > Joel
> >
> >
> > On Thu, Feb 4, 2016 at 10:06 AM, Joel Hestness <***@gmail.com>
> > wrote:
> >
> > > Hi Andreas,
> > > Thanks for the input. I had tried adding front- and back-end
> > > queues within the DRAMCtrl, but it became very difficult to
> > > propagate the flow control back through the component due to the
> > > complicated implementation
> > of
> > > timing across different accessAndRespond() calls. I had to put this
> > > solution on hold.
> > >
> > > I think your proposed solution should simplify the flow control
> > > issue, and should have the derivative effect of making the
> > > Queued*Ports capable
> > of
> > > flow control. I'm a little concerned that your solution would make
> > > the buffering very fluid, and I'm not sufficiently familiar with
> > > memory controller microarchitecture to know if that would be
> > > realistic. I wonder if you might have a way to do performance
> > > validation after I work through either of these implementations.
> > >
> > > Thanks!
> > > Joel
> > >
> > >
> > >
> > > On Wed, Feb 3, 2016 at 11:29 AM, Andreas Hansson <
> > ***@arm.com>
> > > wrote:
> > >
> > >> Hi Joel,
> > >>
> > >> I would suggest to keep the queued ports, but add methods to reserve
> > >> resources, query if it has free space, and a way to register
> > >> callbacks
> > so
> > >> that the MemObject is made aware when packets are sent. That way we
> > >> can
> > use
> > >> the queue in the cache, memory controller etc, without having all
> > >> the issues of the “naked” port interface, but still enforcing a
> > >> bounded
> > queue.
> > >>
> > >> When a packet arrives at the module we call reserve on the output
> port.
> > >> Then when we actually add the packet we know that there is space.
> > >> When request packets arrive we check if the queue is full, and if
> > >> so we block any new requests. Then through the callback we can
> > >> unblock the DRAM controller in this case.
> > >>
> > >> What do you think?
> > >>
> > >> Andreas
> > >>
> > >> From: Joel Hestness <***@gmail.com>
> > >> Date: Tuesday, 2 February 2016 at 00:24
> > >> To: Andreas Hansson <***@arm.com>
> > >> Cc: gem5 Developer List <gem5-***@gem5.org>
> > >> Subject: Follow-up: Removing QueuedSlavePort from DRAMCtrl
> > >>
> > >> Hi Andreas,
> > >> I'd like to circle back on the thread about removing the
> > >> QueuedSlavePort response queue from DRAMCtrl. I've been working to
> > >> shift over to DRAMCtrl from the RubyMemoryController, but nearly
> > >> all of my simulations now crash on the DRAMCtrl's response queue.
> > >> Since I need the DRAMCtrl to work, I'll be looking into this now.
> > >> However, based on my inspection of the code, it looks pretty
> > >> non-trivial to remove the QueuedSlavePort, so I'm hoping you can at
> > >> least help me work through the changes.
> > >>
> > >> To reproduce the issue, I've put together a slim gem5 patch
> > >> (attached) to use the memtest.py script to generate accesses.
> > >> Here's the command
> > line
> > >> I used:
> > >>
> > >> % build/X86/gem5.opt --debug-flag=DRAM --outdir=$outdir
> > >> configs/example/memtest.py -u 100
> > >>
> > >> If you're still willing to take a stab at it, let me know if/how
> > >> I can help. Otherwise, I'll start working on it. It seems the
> > >> trickiest thing
> > is
> > >> going to be modeling the arbitrary frontendLatency and
> > >> backendLatency
> > while
> > >> still counting all of the accesses that are in the controller when
> > >> it
> > needs
> > >> to block back to the input queue. These latencies are currently
> > >> assessed with scheduling in the port response queue. Any
> > >> suggestions you could
> > give
> > >> would be appreciated.
> > >>
> > >> Thanks!
> > >> Joel
> > >>
> > >>
> > >> Below here is our conversation from the email thread "[gem5-dev]
> > >> Review Request 3116: ruby: RubyMemoryControl delete requests"
> > >>
> > >> On Wed, Sep 23, 2015 at 3:51 PM, Andreas Hansson <
> > ***@arm.com
> > >> > wrote:
> > >>
> > >>> Great. Thanks Joel.
> > >>>
> > >>> If anything pops up on our side I’ll let you know.
> > >>>
> > >>> Andreas
> > >>>
> > >>> From: Joel Hestness <***@gmail.com>
> > >>> Date: Wednesday, 23 September 2015 20:29
> > >>>
> > >>> To: Andreas Hansson <***@arm.com>
> > >>> Cc: gem5 Developer List <gem5-***@gem5.org>
> > >>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> > >>> RubyMemoryControl delete requests
> > >>>
> > >>>
> > >>>
> > >>>> I don’t think there is any big difference in our expectations,
> > >>>> quite the contrary :-). GPUs are very important to us (and so is
> > >>>> throughput computing in general), and we run plenty of simulations
> > >>>> with lots of memory-level parallelism from non-CPU components.
> > >>>> Still, we haven’t
> > run
> > >>>> into the issue.
> > >>>>
> > >>>
> > >>> Ok, cool. Thanks for the context.
> > >>>
> > >>>
> > >>> If you have practical examples that run into problems let me know,
> > >>> and
> > >>>> we’ll get it fixed.
> > >>>>
> > >>>
> > >>> I'm having trouble assembling a practical example (with or without
> > using
> > >>> gem5-gpu). I'll keep you posted if I find something reasonable.
> > >>>
> > >>> Thanks!
> > >>> Joel
> > >>>
> > >>>
> > >>>
> > >>>> From: Joel Hestness <***@gmail.com>
> > >>>> Date: Tuesday, 22 September 2015 19:58
> > >>>>
> > >>>> To: Andreas Hansson <***@arm.com>
> > >>>> Cc: gem5 Developer List <gem5-***@gem5.org>
> > >>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> > >>>> RubyMemoryControl delete requests
> > >>>>
> > >>>> Hi Andreas,
> > >>>>
> > >>>>
> > >>>>> If it is a real problem affecting end users I am indeed
> > >>>>> volunteering to fix the DRAMCtrl use of QueuedSlavePort. In the
> > >>>>> classic memory
> > system
> > >>>>> there are enough points of regulation (LSQs, MSHR limits,
> > >>>>> crossbar
> > layers
> > >>>>> etc) that having a single memory channel with >100 queued up
> > responses
> > >>>>> waiting to be sent is extremely unlikely. Hence, until now the
> > >>>>> added complexity has not been needed. If there is regulation on
> > >>>>> the number
> > of
> > >>>>> requests in Ruby, then I would argue that it is equally unlikely
> > there…I
> > >>>>> could be wrong.
> > >>>>>
> > >>>>
> > >>>> Ok. I think a big part of the difference between our expectations
> > >>>> is just the cores that we're modeling. AMD and gem5-gpu can model
> > aggressive
> > >>>> GPU cores with potential to expose, perhaps, 4-32x more
> > >>>> memory-level parallel requests than a comparable number of
> > >>>> multithreaded CPU
> > cores. I
> > >>>> feel that this difference warrants different handling of accesses
> > >>>> in
> > the
> > >>>> memory controller.
> > >>>>
> > >>>> Joel
> > >>>>
> > >>>>
> > >>>>
> > >>>> From: Joel Hestness <***@gmail.com>
> > >>>>> Date: Tuesday, 22 September 2015 17:48
> > >>>>>
> > >>>>> To: Andreas Hansson <***@arm.com>
> > >>>>> Cc: gem5 Developer List <gem5-***@gem5.org>
> > >>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> > >>>>> RubyMemoryControl delete requests
> > >>>>>
> > >>>>> Hi Andreas,
> > >>>>>
> > >>>>> Thanks for the "ship it!"
> > >>>>>
> > >>>>>
> > >>>>>> Do we really need to remove the use of QueuedSlavePort in
> DRAMCtrl?
> > >>>>>> It will make the controller more complex, and I don’t want to
> > >>>>>> do it
> > “just
> > >>>>>> in case”.
> > >>>>>>
> > >>>>>
> > >>>>> Sorry, I misread your email as offering to change the DRAMCtrl.
> > >>>>> I'm not sure who should make that change, but I think it should
> > >>>>> get
> > done. The
> > >>>>> memory access response path starts at the DRAMCtrl and ends at
> > >>>>> the RubyPort. If we add control flow to the RubyPort, packets
> > >>>>> will
> > probably
> > >>>>> back-up more quickly on the response path back to where there
> > >>>>> are
> > open
> > >>>>> buffers. I expect the DRAMCtrl QueuedPort problem becomes more
> > prevalent as
> > >>>>> Ruby adds flow control, unless we add a limitation on
> > >>>>> outstanding
> > requests
> > >>>>> to memory from directory controllers.
> > >>>>>
> > >>>>> How does the classic memory model deal with this?
> > >>>>>
> > >>>>> Joel
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> From: Joel Hestness <***@gmail.com>
> > >>>>>> Date: Tuesday, 22 September 2015 17:30
> > >>>>>> To: Andreas Hansson <***@arm.com>
> > >>>>>> Cc: gem5 Developer List <gem5-***@gem5.org>
> > >>>>>>
> > >>>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> > >>>>>> RubyMemoryControl delete requests
> > >>>>>>
> > >>>>>> Hi guys,
> > >>>>>> Thanks for the discussion here. I had quickly tested other
> > >>>>>> memory controllers, but hadn't connected the dots that this
> > >>>>>> might be the
> > same
> > >>>>>> problem Brad/AMD are running into.
> > >>>>>>
> > >>>>>> My preference would be that we remove the QueuedSlavePort
> > >>>>>> from the DRAMCtrls. That would at least eliminate DRAMCtrls as
> > >>>>>> a potential
> > source of
> > >>>>>> the QueueSlavePort packet overflows, and would allow us to more
> > closely
> > >>>>>> focus on the RubyPort problem when we get to it.
> > >>>>>>
> > >>>>>> Can we reach resolution on this patch though? Are we okay
> > >>>>>> with actually fixing the memory leak in mainline?
> > >>>>>>
> > >>>>>> Joel
> > >>>>>>
> > >>>>>>
> > >>>>>> On Tue, Sep 22, 2015 at 11:19 AM, Andreas Hansson <
> > >>>>>> ***@arm.com> wrote:
> > >>>>>>
> > >>>>>>> Hi Brad,
> > >>>>>>>
> > >>>>>>> We can remove the use of QueuedSlavePort in the memory
> > >>>>>>> controller
> > and
> > >>>>>>> simply not accept requests if the response queue is full. Is
> > >>>>>>> this needed?
> > >>>>>>> If so we’ll make sure someone gets this in place. The only
> > >>>>>>> reason
> > we
> > >>>>>>> haven’t done it is because it hasn’t been needed.
> > >>>>>>>
> > >>>>>>> The use of QueuedPorts in the Ruby adapters is a whole
> > >>>>>>> different story. I think most of these can be removed and
> > >>>>>>> actually use flow control.
> > I’m
> > >>>>>>> happy to code it up, but there is such a flux at the moment
> > >>>>>>> that I didn’t want to post yet another patch changing the Ruby
> > >>>>>>> port. I really do think we should avoid having implicit
> >>>>>>> buffers for 1000’s of kilobytes to the largest extent
> > >>>>>>> possible. If we really need a constructor parameter to make it
> > >>>>>>> “infinite” for some quirky Ruby use-case, then let’s do that...
> > >>>>>>>
> > >>>>>>> Andreas
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 22/09/2015 17:14, "gem5-dev on behalf of Beckmann, Brad"
> > >>>>>>> <gem5-dev-***@gem5.org on behalf of ***@amd.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>> >From AMD's perspective, we have deprecated our usage of
> > >>>>>>> RubyMemoryControl
> > >>>>>>> >and we are using the new Memory Controllers with the port
> > interface.
> > >>>>>>> >
> > >>>>>>> >That being said, I completely agree with Joel that the packet
> > queue
> > >>>>>>> >finite invisible buffer limit of 100 needs to go! As you
> > >>>>>>> >know, we
> > >>>>>>> tried
> >>>>>>> >very hard several months ago to essentially make this an
> > >>>>>>> >infinite
> > >>>>>>> buffer,
> > >>>>>>> >but Andreas would not allow us to check it in. We are going
> > >>>>>>> >to
> > >>>>>>> post that
> > >>>>>>> >patch again in a few weeks when we post our GPU model. Our
> > >>>>>>> >GPU
> > >>>>>>> model
> > >>>>>>> >will not work unless we increase that limit.
> > >>>>>>> >
> > >>>>>>> >Andreas you keep arguing that if you exceed that limit, that
> > >>>>>>> something is
> > >>>>>>> >fundamentally broken. Please keep in mind that there are
> > >>>>>>> >many
> > uses
> > >>>>>>> of
> > >>>>>>> >gem5 beyond what you use it for. Also this is a research
> > simulator
> > >>>>>>> and
> > >>>>>>> >we should not restrict ourselves to what we think is
> > >>>>>>> >practical in
> > >>>>>>> real
> > >>>>>>> >hardware. Finally, the fact that the finite limit is
> > >>>>>>> >invisible to
> > >>>>>>> the
> > >>>>>>> >producer is just bad software engineering.
> > >>>>>>> >
> > >>>>>>> >I beg you to please allow us to remove this finite invisible
> > limit!
> > >>>>>>> >
> > >>>>>>> >Brad
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> >-----Original Message-----
> > >>>>>>> >From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf
> > >>>>>>> >Of
> > >>>>>>> Andreas
> > >>>>>>> >Hansson
> > >>>>>>> >Sent: Tuesday, September 22, 2015 6:35 AM
> > >>>>>>> >To: Andreas Hansson; Default; Joel Hestness
> > >>>>>>> >Subject: Re: [gem5-dev] Review Request 3116: ruby:
> > RubyMemoryControl
> > >>>>>>> >delete requests
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> >> On Sept. 21, 2015, 8:42 a.m., Andreas Hansson wrote:
> > >>>>>>> >> > Can we just prune the whole RubyMemoryControl rather? Has
> > >>>>>>> >> > it
> > >>>>>>> not been
> > >>>>>>> >>deprecated long enough?
> > >>>>>>> >>
> > >>>>>>> >> Joel Hestness wrote:
> > >>>>>>> >> Unless I'm overlooking something, for Ruby users, I
> > >>>>>>> >> don't
> > see
> > >>>>>>> other
> > >>>>>>> >>memory controllers that are guaranteed to work. Besides
> > >>>>>>> >>RubyMemoryControl, all others use a QueuedSlavePort for
> > >>>>>>> >>their
> > input
> > >>>>>>> >>queues. Given that Ruby hasn't added complete flow control,
> > >>>>>>> PacketQueue
> > >>>>>>> >>size restrictions can be exceeded (triggering the panic).
> > >>>>>>> >>This
> > >>>>>>> occurs
> > >>>>>>> >>infrequently/irregularly with aggressive GPUs in gem5-gpu,
> > >>>>>>> >>and
> > >>>>>>> appears
> > >>>>>>> >>difficult to fix in a systematic way.
> > >>>>>>> >>
> > >>>>>>> >> Regardless of the fact we've deprecated
> > >>>>>>> >> RubyMemoryControl,
> > >>>>>>> this is
> > >>>>>>> >>a necessary fix.
> > >>>>>>> >
> > >>>>>>> >No memory controller is using QueuedSlavePort for any
> > >>>>>>> >_input_
> > >>>>>>> queues.
> > >>>>>>> >The DRAMCtrl class uses it for the response _output_ queue,
> > >>>>>>> >that's
> > >>>>>>> all.
> > >>>>>>> >If that is really an issue we can move away from it and
> > >>>>>>> >enforce an
> > >>>>>>> upper
> > >>>>>>> >bound on responses by not accepting new requests. That said,
> > >>>>>>> >if we
> > >>>>>>> hit
> > >>>>>>> >the limit I would argue something else is fundamentally
> > >>>>>>> >broken in
> > >>>>>>> the
> > >>>>>>> >system and should be addressed.
> > >>>>>>> >
> > >>>>>>> >In any case, the discussion whether to remove
> > >>>>>>> >RubyMemoryControl or
> > >>>>>>> not
> > >>>>>>> >should be completely decoupled.
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> >- Andreas
> > >>>>>>>
> > >>>>>>
> > >>
> > >> --
> > >> Joel Hestness
> > >> PhD Candidate, Computer Architecture
> > >> Dept. of Computer Science, University of Wisconsin - Madison
> > >> http://pages.cs.wisc.edu/~hestness/
> > >>
> > >
> > >
> > >
> > > --
> > > Joel Hestness
> > > PhD Candidate, Computer Architecture
> > > Dept. of Computer Science, University of Wisconsin - Madison
> > > http://pages.cs.wisc.edu/~hestness/
> > >
> >
> >
> >
> > --
> > Joel Hestness
> > PhD Candidate, Computer Architecture
> > Dept. of Computer Science, University of Wisconsin - Madison
> > http://pages.cs.wisc.edu/~hestness/
> > _______________________________________________
> > gem5-dev mailing list
> > gem5-***@gem5.org
> > http://m5sim.org/mailman/listinfo/gem5-dev
> >
>
>
> --
> Joel Hestness
> PhD Candidate, Computer Architecture
> Dept. of Computer Science, University of Wisconsin - Madison
> http://pages.cs.wisc.edu/~hestness/
> _______________________________________________
> gem5-dev mailing list
> gem5-***@gem5.org
> http://m5sim.org/mailman/listinfo/gem5-dev
>


--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/
Poremba, Matthew
2016-02-12 22:55:14 UTC
Permalink
Hi Joel,

‘I'm not sure I understand the difference between this description and what I've implemented, except that what I've implemented adds some extra back-end queuing. The capacity of the back-end queue in my implementation is equal to the sum of the read and write queue capacities (plus a little extra: response_buffer_size). The reservation of a slot in this large back-end queue is released when a response is sent through the port, as you describe. To me, this seems exactly the way a token-like structure would reserve back-end queue slots.’

I see... I mistakenly thought the response_buffer_size corresponded to the maximum number of possible queued responses from memory. If the backend queue is the sum of all other queues, then the pathological situation I described should never happen.

‘That said, I realize that by "token" structure, Joe and you might be describing something more than what I've implemented. Namely, since tokens are the credits that allow senders to push into a receiver's queues, they might allow multiple directories/caches sending to a single DRAMCtrl, which I don't believe is possible with my current implementation. I think we'd need to allow the DRAMCtrl to receive requests and queue retries while other requesters are blocked, and sending those retries would need fair arbitration, which a token scheme might automatically handle. Can you clarify if that's what you're referring to as a token scheme?’

A token scheme would not use retry/unblock mechanisms at all. The number of tokens available is sent to each producer from a consumer when the ports are connected at the start of simulation. This way, the producers know how many requests can be sent and stop sending once the tokens are exhausted. The consumer will return tokens once a request is handled. This removes the need for retries and unblock calls, reduces overall complexity, and is closer to hardware implementations imo. The token scheme would indeed automatically handle the situation where multiple producers are blocked and can also be hidden away in the port without needing to add a retry queue to consumers, which I don’t believe is a great idea.
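To make the scheme concrete, here is a minimal sketch (hypothetical names, not a proposed gem5 API): the consumer grants one token per input-queue slot at connect time, the producer only sends while it holds tokens, and completing a request returns a token. No nack or retry path exists, because a spent token guarantees a free slot.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical credit (token) flow control as described above.
class Consumer;

class Producer {
    friend class Consumer;
  public:
    bool trySend(Consumer &c);
    size_t tokens() const { return credits; }
  private:
    size_t credits = 0;
};

class Consumer {
  public:
    explicit Consumer(size_t slots) : freeSlots(slots) {}

    // Connection setup: grant the producer one token per queue slot.
    void connect(Producer &p) { p.credits = freeSlots; }

    // Called by the producer; always succeeds, because the producer
    // spent a token, which guarantees a free slot.
    void recv() { assert(freeSlots > 0); freeSlots--; occupied++; }

    // Request handled: free the slot and return a token.
    void complete(Producer &p) {
        assert(occupied > 0);
        occupied--;
        freeSlots++;
        p.credits++;
    }

  private:
    size_t freeSlots, occupied = 0;
};

bool Producer::trySend(Consumer &c) {
    if (credits == 0)
        return false;  // out of tokens: hold the request locally
    credits--;
    c.recv();
    return true;
}
```

With multiple producers, the consumer would partition its slots among them at connect time, which is how the scheme arbitrates fairly without any retry queue.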

In any case, I’m hoping to have a patch/draft for this soon.

-Matt


From: Joel Hestness [mailto:***@gmail.com]
Sent: Thursday, February 11, 2016 2:52 PM
To: gem5 Developer List
Cc: Gross, Joe; Poremba, Matthew
Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl

Hi Matt,

In regards to the buffersFull() implementation, I can think of a pathological case where the back-end queue is full because the sender is not accepting responses (for whatever reason) but is still issuing requests. buffersFull() will return false in this case and allow the request to be enqueued and eventually scheduled, causing the back-end queue to grow larger than the response_buffer_size parameter.

Perhaps one way to better emulate exchanging tokens (credit) as Joe mentioned is to have buffersFull() "reserve" slots in the queues by making sure there is a slot in both the read queue (or write queue) and a corresponding slot available in the back-end queue. The reservation can be lifted once the response is sent on the port.

I'm not sure I understand the difference between this description and what I've implemented, except that what I've implemented adds some extra back-end queuing. The capacity of the back-end queue in my implementation is equal to the sum of the read and write queue capacities (plus a little extra: response_buffer_size). The reservation of a slot in this large back-end queue is released when a response is sent through the port, as you describe. To me, this seems exactly the way a token-like structure would reserve back-end queue slots.

That said, I realize that by "token" structure, Joe and you might be describing something more than what I've implemented. Namely, since tokens are the credits that allow senders to push into a receiver's queues, they might allow multiple directories/caches sending to a single DRAMCtrl, which I don't believe is possible with my current implementation. I think we'd need to allow the DRAMCtrl to receive requests and queue retries while other requesters are blocked, and sending those retries would need fair arbitration, which a token scheme might automatically handle. Can you clarify if that's what you're referring to as a token scheme?


Another more aggressive implementation would be to not use buffersFull() and prevent scheduling memory requests from the read/write queue if the back-end queue is full. This would allow a sender to enqueue memory requests even if the back-end queue is full up until the read/write queue fills up, but would require a number of changes to the code.

Yes, I tried implementing this first, and it ends up being very difficult due to the DRAMCtrl's calls to and implementation of accessAndRespond(). Basically, reads and writes require different processing latencies, so we would need not only a back-end queue, but also separate read and write delay queues to model the different DRAM access latencies. We'd also need a well-performing way to arbitrate for slots in the back-end queue that doesn't conflict with the batching efforts of the front-end. To me, all this complexity seems misaligned with Andreas et al.'s original aim with the DRAMCtrl: fast and reasonably accurate simulation of a memory controller<http://web.eecs.umich.edu/~twenisch/papers/ispass14.pdf>.


In regards to Ruby, I am a bit curious- Are you placing MessageBuffers in the SLICC files and doing away with the queueMemoryRead/queueMemoryWrite calls or are you placing a MessageBuffer in AbstractController? I am currently trying out an implementation using the former for a few additional reasons other than flow control.

If I understand what you're asking, I think I've also done the former, though I've modified SLICC and the AbstractController to deal with parts of the buffer management. I've merged my code with a recent gem5 revision (11315:10647f5d0f7f) so I could post a draft review request. Here are the patches (including links) to test all of this:

- http://reviews.gem5.org/r/3331/
- http://reviews.gem5.org/r/3332/
- http://pages.cs.wisc.edu/~hestness/links/MOESI_hammer_test_finite_queues
- http://pages.cs.wisc.edu/~hestness/links/cpu_memory_demand

More holistically, I feel that the best solution would be to hide the memory request and response queues in an AbstractDirectoryController class that inherits from AbstractController in C++, and from which all SLICC directory controller machines descend. This structure would move all the directory-specific code out of AbstractController and not model it in other SLICC generated machines. This would also eliminate the need for assertions that only directory controllers are calling the directory-specific functions.


Joel


-----Original Message-----
From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf Of Joel Hestness
Sent: Monday, February 08, 2016 12:16 PM
To: Gross, Joe
Cc: gem5 Developer List
Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl

Hi guys,
I just posted a draft of my DRAMCtrl flow-control patch so you can take a look here: http://reviews.gem5.org/r/3315/

NOTE: I have a separate patch that changes Ruby's QueuedMasterPort from directories to memory controllers into a MasterPort, and it places a MessageBuffer in front of the MasterPort, so that the user can make all buffering finite within a Ruby memory hierarchy. I still need to merge this patch with gem5, before I can share it. Let me know if you'd like to see the draft there also.

@Joe:


> I'd be curious to see a patch of what you're proposing as I'm not sure
> I really follow what you're doing. The reason I ask is because I have
> been discussing an implementation with Brad and would like to see
> how similar it is to what you have. Namely it's an idea similar to
> what is commonly used in hardware, where senders have tokens that
> correspond to slots in the receiver queue so the reservation happens
> at startup. The only communication that goes from a receiving port
> back to a sender is token return. The port and queue would still be
> coupled and the device which owns the Queued*Port would manage removal
> from the PacketQueue. In my experience, this is a very effective
> mechanism for flow control and addresses your point about transparency of the queue and its state.
> The tokens remove the need for unblock callbacks, but it's the
> responsibility of the receiver not to send when the queue is full or
> when it has a conflicting request. There's no implementation yet, but
> the simplicity and similarity to hardware techniques may prove useful.
> Anyway, could you post something so I can better understand what you've described?


My implementation effectively does what you're describing: The DRAMCtrl now has a finite number of buffers (i.e. tokens), and it allocates a buffer slot when a request is received (senders spend a token when the DRAMCtrl accepts a request). The only real difference is that the DRAMCtrl now implements a SlavePort with flow control consistent with the rest of gem5, so if there are no buffer slots available, the request is nacked and a retry must be sent (i.e. a token is returned).


Please don't get rid of the Queued*Ports, as I think there is a simple way
> to improve them to do efficient flow control.
>

Heh... not sure I have the time/motivation to remove the Queued*Ports myself. I've just been swapping out the Queued*Ports that break when trying to implement finite buffering in a Ruby memory hierarchy. I'll leave Queued*Ports for later fixing or removal, as appropriate.


Joel


________________________________________
> From: gem5-dev <gem5-dev-***@gem5.org> on behalf of Joel Hestness
> < ***@gmail.com>
> Sent: Friday, February 5, 2016 12:03 PM
> To: Andreas Hansson
> Cc: gem5 Developer List
> Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
> DRAMCtrl
>
> Hi guys,
> Quick updates on this:
> 1) I have a finite response buffer implementation working. I
> removed the QueuedSlavePort and added a response queue with reservation (Andreas'
> underlying suggestion). I have a question with this solution: The
> > QueuedSlavePort prioritized responses based on their scheduled response time.
> However, since writes have a shorter pipeline from request to
> response, this architecture prioritized write requests ahead of read
> requests received earlier, and it performs ~1-8% worse than a strict
> queue (what I've implemented at this point). I can make the response
> queue a priority queue if we want the same structure as previously,
> but I'm wondering if we might prefer to just have the better-performing strict queue.
>
> 2) To reflect on Andreas' specific suggestion of using unblock
> callbacks from the PacketQueue: Modifying the QueuedSlavePort with
> callbacks is ugly when trying to call the callback: The call needs to
> originate from PacketQueue::sendDeferredPacket(), but PacketQueue
> doesn't have a pointer to the owner component; The SlavePort has the
> pointer, so the PacketQueue would need to first callback to the port,
> which would call the owner component callback.
> The exercise getting this to work has solidified my opinion that the
> Queued*Ports should probably be removed from the codebase: Queues and
> ports are separate subcomponents of simulated components, and only the
> component knows how they should interact. Including a Queued*Port
> inside a component requires the component to manage the flow-control
> into the Queued*Port just as it would need to manage a standard port
> anyway, and hiding the queue in the port obfuscates how it is managed.
>
>
> Thanks!
> Joel
>
>
> On Thu, Feb 4, 2016 at 10:06 AM, Joel Hestness <***@gmail.com>
> wrote:
>
> > Hi Andreas,
> > Thanks for the input. I had tried adding front- and back-end
> > queues within the DRAMCtrl, but it became very difficult to
> > propagate the flow control back through the component due to the
> > complicated implementation
> of
> > timing across different accessAndRespond() calls. I had to put this
> > solution on hold.
> >
> > I think your proposed solution should simplify the flow control
> > issue, and should have the derivative effect of making the
> > Queued*Ports capable
> of
> > flow control. I'm a little concerned that your solution would make
> > the buffering very fluid, and I'm not sufficiently familiar with
> > memory controller microarchitecture to know if that would be
> > realistic. I wonder if you might have a way to do performance
> > validation after I work through either of these implementations.
> >
> > Thanks!
> > Joel
> >
> >
> >
> > On Wed, Feb 3, 2016 at 11:29 AM, Andreas Hansson <
> ***@arm.com>>
> > wrote:
> >
> >> Hi Joel,
> >>
> >> I would suggest to keep the queued ports, but add methods to reserve
> >> resources, query if it has free space, and a way to register
> >> callbacks
> so
> >> that the MemObject is made aware when packets are sent. That way we
> >> can
> use
> >> the queue in the cache, memory controller etc, without having all
> >> the issues of the “naked” port interface, but still enforcing a
> >> bounded
> queue.
> >>
> >> When a packet arrives to the module we call reserve on the output port.
> >> Then when we actually add the packet we know that there is space.
> >> When request packets arrive we check if the queue is full, and if
> >> so we block any new requests. Then through the callback we can
> >> unblock the DRAM controller in this case.
> >>
> >> What do you think?
> >>
> >> Andreas
> >>
> >> From: Joel Hestness <***@gmail.com>
> >> Date: Tuesday, 2 February 2016 at 00:24
> >> To: Andreas Hansson <***@arm.com>
> >> Cc: gem5 Developer List <gem5-***@gem5.org>
> >> Subject: Follow-up: Removing QueuedSlavePort from DRAMCtrl
> >>
> >> Hi Andreas,
> >> I'd like to circle back on the thread about removing the
> >> QueuedSlavePort response queue from DRAMCtrl. I've been working to
> >> shift over to DRAMCtrl from the RubyMemoryController, but nearly
> >> all of my simulations now crash on the DRAMCtrl's response queue.
> >> Since I need the DRAMCtrl to work, I'll be looking into this now.
> >> However, based on my inspection of the code, it looks pretty
> >> non-trivial to remove the QueuedSlavePort, so I'm hoping you can at
> >> least help me work through the changes.
> >>
> >> To reproduce the issue, I've put together a slim gem5 patch
> >> (attached) to use the memtest.py script to generate accesses.
> >> Here's the command
> line
> >> I used:
> >>
> >> % build/X86/gem5.opt --debug-flag=DRAM --outdir=$outdir
> >> configs/example/memtest.py -u 100
> >>
> >> If you're still willing to take a stab at it, let me know if/how
> >> I can help. Otherwise, I'll start working on it. It seems the
> >> trickiest thing
> is
> >> going to be modeling the arbitrary frontendLatency and
> >> backendLatency
> while
> >> still counting all of the accesses that are in the controller when
> >> it
> needs
> >> to block back to the input queue. These latencies are currently
> >> assessed with scheduling in the port response queue. Any
> >> suggestions you could
> give
> >> would be appreciated.
> >>
> >> Thanks!
> >> Joel
> >>
> >>
> >> Below here is our conversation from the email thread "[gem5-dev]
> >> Review Request 3116: ruby: RubyMemoryControl delete requests"
> >>
> >> On Wed, Sep 23, 2015 at 3:51 PM, Andreas Hansson <
> ***@arm.com
> >> > wrote:
> >>
> >>> Great. Thanks Joel.
> >>>
> >>> If anything pops up on our side I’ll let you know.
> >>>
> >>> Andreas
> >>>
> >>> From: Joel Hestness <***@gmail.com>
> >>> Date: Wednesday, 23 September 2015 20:29
> >>>
> >>> To: Andreas Hansson <***@arm.com>
> >>> Cc: gem5 Developer List <gem5-***@gem5.org>
> >>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >>> RubyMemoryControl delete requests
> >>>
> >>>
> >>>
> >>>> I don’t think there is any big difference in our expectations,
> >>>> quite the contrary :-). GPUs are very important to us (and so is
> >>>> throughput computing in general), and we run plenty of simulations
> >>>> with lots of memory-level parallelism from non-CPU components.
> >>>> Still, we haven’t
> run
> >>>> into the issue.
> >>>>
> >>>
> >>> Ok, cool. Thanks for the context.
> >>>
> >>>
> >>> If you have practical examples that run into problems let me know,
> >>> and
> >>>> we’ll get it fixed.
> >>>>
> >>>
> >>> I'm having trouble assembling a practical example (with or without
> using
> >>> gem5-gpu). I'll keep you posted if I find something reasonable.
> >>>
> >>> Thanks!
> >>> Joel
> >>>
> >>>
> >>>
> >>>> From: Joel Hestness <***@gmail.com>
> >>>> Date: Tuesday, 22 September 2015 19:58
> >>>>
> >>>> To: Andreas Hansson <***@arm.com>
> >>>> Cc: gem5 Developer List <gem5-***@gem5.org>
> >>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >>>> RubyMemoryControl delete requests
> >>>>
> >>>> Hi Andreas,
> >>>>
> >>>>
> >>>>> If it is a real problem affecting end users I am indeed
> >>>>> volunteering to fix the DRAMCtrl use of QueuedSlavePort. In the
> >>>>> classic memory
> system
> >>>>> there are enough points of regulation (LSQs, MSHR limits,
> >>>>> crossbar
> layers
> >>>>> etc) that having a single memory channel with >100 queued up
> responses
> >>>>> waiting to be sent is extremely unlikely. Hence, until now the
> >>>>> added complexity has not been needed. If there is regulation on
> >>>>> the number
> of
> >>>>> requests in Ruby, then I would argue that it is equally unlikely
> there…I
> >>>>> could be wrong.
> >>>>>
> >>>>
> >>>> Ok. I think a big part of the difference between our expectations
> >>>> is just the cores that we're modeling. AMD and gem5-gpu can model
> aggressive
> >>>> GPU cores with potential to expose, perhaps, 4-32x more
> >>>> memory-level parallel requests than a comparable number of
> >>>> multithreaded CPU
> cores. I
> >>>> feel that this difference warrants different handling of accesses
> >>>> in
> the
> >>>> memory controller.
> >>>>
> >>>> Joel
> >>>>
> >>>>
> >>>>
> >>>> From: Joel Hestness <***@gmail.com>
> >>>>> Date: Tuesday, 22 September 2015 17:48
> >>>>>
> >>>>> To: Andreas Hansson <***@arm.com>
> >>>>> Cc: gem5 Developer List <gem5-***@gem5.org>
> >>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >>>>> RubyMemoryControl delete requests
> >>>>>
> >>>>> Hi Andreas,
> >>>>>
> >>>>> Thanks for the "ship it!"
> >>>>>
> >>>>>
> >>>>>> Do we really need to remove the use of QueuedSlavePort in DRAMCtrl?
> >>>>>> It will make the controller more complex, and I don’t want to
> >>>>>> do it
> “just
> >>>>>> in case”.
> >>>>>>
> >>>>>
> >>>>> Sorry, I misread your email as offering to change the DRAMCtrl.
> >>>>> I'm not sure who should make that change, but I think it should
> >>>>> get
> done. The
> >>>>> memory access response path starts at the DRAMCtrl and ends at
> >>>>> the RubyPort. If we add control flow to the RubyPort, packets
> >>>>> will
> probably
> >>>>> back-up more quickly on the response path back to where there
> >>>>> are
> open
> >>>>> buffers. I expect the DRAMCtrl QueuedPort problem becomes more
> prevalent as
> >>>>> Ruby adds flow control, unless we add a limitation on
> >>>>> outstanding
> requests
> >>>>> to memory from directory controllers.
> >>>>>
> >>>>> How does the classic memory model deal with this?
> >>>>>
> >>>>> Joel
> >>>>>
> >>>>>
> >>>>>
> >>>>>> From: Joel Hestness <***@gmail.com>
> >>>>>> Date: Tuesday, 22 September 2015 17:30
> >>>>>> To: Andreas Hansson <***@arm.com>
> >>>>>> Cc: gem5 Developer List <gem5-***@gem5.org>
> >>>>>>
> >>>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >>>>>> RubyMemoryControl delete requests
> >>>>>>
> >>>>>> Hi guys,
> >>>>>> Thanks for the discussion here. I had quickly tested other
> >>>>>> memory controllers, but hadn't connected the dots that this
> >>>>>> might be the
> same
> >>>>>> problem Brad/AMD are running into.
> >>>>>>
> >>>>>> My preference would be that we remove the QueuedSlavePort
> >>>>>> from the DRAMCtrls. That would at least eliminate DRAMCtrls as
> >>>>>> a potential
> source of
> >>>>>> the QueuedSlavePort packet overflows, and would allow us to more
> closely
> >>>>>> focus on the RubyPort problem when we get to it.
> >>>>>>
> >>>>>> Can we reach resolution on this patch though? Are we okay
> >>>>>> with actually fixing the memory leak in mainline?
> >>>>>>
> >>>>>> Joel
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Sep 22, 2015 at 11:19 AM, Andreas Hansson <
> >>>>>> ***@arm.com> wrote:
> >>>>>>
> >>>>>>> Hi Brad,
> >>>>>>>
> >>>>>>> We can remove the use of QueuedSlavePort in the memory
> >>>>>>> controller
> and
> >>>>>>> simply not accept requests if the response queue is full. Is
> >>>>>>> this needed?
> >>>>>>> If so we’ll make sure someone gets this in place. The only
> >>>>>>> reason
> we
> >>>>>>> haven’t done it is because it hasn’t been needed.
> >>>>>>>
> >>>>>>> The use of QueuedPorts in the Ruby adapters is a whole
> >>>>>>> different story. I think most of these can be removed and
> >>>>>>> actually use flow control.
> I’m
> >>>>>>> happy to code it up, but there is such a flux at the moment
> >>>>>>> that I didn’t want to post yet another patch changing the Ruby
> >>>>>>> port. I really do think we should avoid having implicit
> >>>>>>> buffers for 1000’s of kilobytes to the largest extent
> >>>>>>> possible. If we really need a constructor parameter to make it
> >>>>>>> “infinite” for some quirky Ruby use-case, then let’s do that...
> >>>>>>>
> >>>>>>> Andreas
> >>>>>>>
> >>>>>>>
> >>>>>>> On 22/09/2015 17:14, "gem5-dev on behalf of Beckmann, Brad"
> >>>>>>> <gem5-dev-***@gem5.org on behalf of ***@amd.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> >From AMD's perspective, we have deprecated our usage of
> >>>>>>> RubyMemoryControl
> >>>>>>> >and we are using the new Memory Controllers with the port
> interface.
> >>>>>>> >
> >>>>>>> >That being said, I completely agree with Joel that the packet
> queue
> >>>>>>> >finite invisible buffer limit of 100 needs to go! As you
> >>>>>>> >know, we
> >>>>>>> tried
> >>>>>>> >very hard several months ago to essentially make this an
> >>>>>>> >infinite
> >>>>>>> buffer,
> >>>>>>> >but Andreas would not allow us to check it in. We are going
> >>>>>>> >to
> >>>>>>> post that
> >>>>>>> >patch again in a few weeks when we post our GPU model. Our
> >>>>>>> >GPU
> >>>>>>> model
> >>>>>>> >will not work unless we increase that limit.
> >>>>>>> >
> >>>>>>> >Andreas you keep arguing that if you exceed that limit, that
> >>>>>>> something is
> >>>>>>> >fundamentally broken. Please keep in mind that there are
> >>>>>>> >many
> uses
> >>>>>>> of
> >>>>>>> >gem5 beyond what you use it for. Also this is a research
> simulator
> >>>>>>> and
> >>>>>>> >we should not restrict ourselves to what we think is
> >>>>>>> >practical in
> >>>>>>> real
> >>>>>>> >hardware. Finally, the fact that the finite limit is
> >>>>>>> >invisible to
> >>>>>>> the
> >>>>>>> >producer is just bad software engineering.
> >>>>>>> >
> >>>>>>> >I beg you to please allow us to remove this finite invisible
> limit!
> >>>>>>> >
> >>>>>>> >Brad
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >-----Original Message-----
> >>>>>>> >From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf
> >>>>>>> >Of
> >>>>>>> Andreas
> >>>>>>> >Hansson
> >>>>>>> >Sent: Tuesday, September 22, 2015 6:35 AM
> >>>>>>> >To: Andreas Hansson; Default; Joel Hestness
> >>>>>>> >Subject: Re: [gem5-dev] Review Request 3116: ruby:
> RubyMemoryControl
> >>>>>>> >delete requests
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >> On Sept. 21, 2015, 8:42 a.m., Andreas Hansson wrote:
> >>>>>>> >> > Can we just prune the whole RubyMemoryControl rather? Has
> >>>>>>> >> > it
> >>>>>>> not been
> >>>>>>> >>deprecated long enough?
> >>>>>>> >>
> >>>>>>> >> Joel Hestness wrote:
> >>>>>>> >> Unless I'm overlooking something, for Ruby users, I
> >>>>>>> >> don't
> see
> >>>>>>> other
> >>>>>>> >>memory controllers that are guaranteed to work. Besides
> >>>>>>> >>RubyMemoryControl, all others use a QueuedSlavePort for
> >>>>>>> >>their
> input
> >>>>>>> >>queues. Given that Ruby hasn't added complete flow control,
> >>>>>>> PacketQueue
> >>>>>>> >>size restrictions can be exceeded (triggering the panic).
> >>>>>>> >>This
> >>>>>>> occurs
> >>>>>>> >>infrequently/irregularly with aggressive GPUs in gem5-gpu,
> >>>>>>> >>and
> >>>>>>> appears
> >>>>>>> >>difficult to fix in a systematic way.
> >>>>>>> >>
> >>>>>>> >> Regardless of the fact we've deprecated
> >>>>>>> >> RubyMemoryControl,
> >>>>>>> this is
> >>>>>>> >>a necessary fix.
> >>>>>>> >
> >>>>>>> >No memory controller is using QueuedSlavePort for any
> >>>>>>> >_input_
> >>>>>>> queues.
> >>>>>>> >The DRAMCtrl class uses it for the response _output_ queue,
> >>>>>>> >that's
> >>>>>>> all.
> >>>>>>> >If that is really an issue we can move away from it and
> >>>>>>> >enforce an
> >>>>>>> upper
> >>>>>>> >bound on responses by not accepting new requests. That said,
> >>>>>>> >if we
> >>>>>>> hit
> >>>>>>> >the limit I would argue something else is fundamentally
> >>>>>>> >broken in
> >>>>>>> the
> >>>>>>> >system and should be addressed.
> >>>>>>> >
> >>>>>>> >In any case, the discussion whether to remove
> >>>>>>> >RubyMemoryControl or
> >>>>>>> not
> >>>>>>> >should be completely decoupled.
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >- Andreas
> >>>>>>>
> >>>>>>
> >>
> >> --
> >> Joel Hestness
> >> PhD Candidate, Computer Architecture
> >> Dept. of Computer Science, University of Wisconsin - Madison
> >> http://pages.cs.wisc.edu/~hestness/
> >> IMPORTANT NOTICE: The contents of this email and any attachments
> >> are confidential and may also be privileged. If you are not the
> >> intended recipient, please notify the sender immediately and do not
> >> disclose the contents to any other person, use it for any purpose,
> >> or store or copy
> the
> >> information in any medium. Thank you.
> >>
> >
> >
> >
> > --
> > Joel Hestness
> > PhD Candidate, Computer Architecture
> > Dept. of Computer Science, University of Wisconsin - Madison
> > http://pages.cs.wisc.edu/~hestness/
> >
>
>
>
> --
> Joel Hestness
> PhD Candidate, Computer Architecture
> Dept. of Computer Science, University of Wisconsin - Madison
> http://pages.cs.wisc.edu/~hestness/
> _______________________________________________
> gem5-dev mailing list
> gem5-***@gem5.org
> http://m5sim.org/mailman/listinfo/gem5-dev
>


--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/
_______________________________________________
gem5-dev mailing list
gem5-***@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/
Joel Hestness
2016-02-13 17:06:35 UTC
Hi Matt,

‘That said, I realize that by "token" structure, Joe and you might be
> describing something more than what I've implemented. Namely, since tokens
> are the credits that allow senders to push into a receiver's queues, they
> might allow multiple directories/caches sending to a single DRAMCtrl, which
> I don't believe is possible with my current implementation. I think we'd
> need to allow the DRAMCtrl to receive requests and queue retries while
> other requesters are blocked, and sending those retries would need fair
> arbitration, which a token scheme might automatically handle. Can you
> clarify if that's what you're referring to as a token scheme?’
>
>
>
> A token scheme would not use a retry/unblock mechanisms at all. The number
> of tokens available is sent to each producer from a consumer when the ports
> are connected/start of simulation. In this regard, the producers know how
> many requests can be sent and stop sending once the tokens are exhausted.
> The consumer will return tokens once a request is handled. This removes the
> need for retries and unblock calls, reduces overall complexity, and is
> closer to hardware implementations imo. The token scheme would indeed
> automatically handle the situation where multiple producers are blocked and
> can also be hidden away in the port without needing to add a retry queue to
> consumers, which I don’t believe is a great idea.
>
>
>
Ok. Yes, that makes sense. I look forward to seeing your changes.


Thanks!
Joel



> *From:* Joel Hestness [mailto:***@gmail.com]
> *Sent:* Thursday, February 11, 2016 2:52 PM
> *To:* gem5 Developer List
> *Cc:* Gross, Joe; Poremba, Matthew
>
> *Subject:* Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
> DRAMCtrl
>
>
>
> Hi Matt,
>
>
>
> In regards to the buffersFull() implementation, I can think of a
> pathological case where the back-end queue is full because the sender is
> not accepting responses (for whatever reason) but is still issuing
> requests. buffersFull() will return false in this case and allow the
> request to be enqueued and eventually scheduled, causing the back-end queue
> to grow larger than the response_buffer_size parameter.
>
>
>
> Perhaps one way to better emulate exchanging tokens (credit) as Joe
> mentioned is to have buffersFull() "reserve" slots in the queues by making
> sure there is a slot in both the read queue (or write queue) and a
> corresponding slot available in the back-end queue. The reservation can be
> lifted once the response is sent on the port.
>
>
>
> I'm not sure I understand the difference between this description and what
> I've implemented, except that what I've implemented adds some extra
> back-end queuing. The capacity of the back-end queue in my implementation
> is equal to the sum of the read and write queue capacities (plus a little
> extra: response_buffer_size). The reservation of a slot in this large
> back-end queue is released when a response is sent through the port, as you
> describe. To me, this seems exactly the way a token-like structure would
> reserve back-end queue slots.
>
>
>
> That said, I realize that by "token" structure, Joe and you might be
> describing something more than what I've implemented. Namely, since tokens
> are the credits that allow senders to push into a receiver's queues, they
> might allow multiple directories/caches sending to a single DRAMCtrl, which
> I don't believe is possible with my current implementation. I think we'd
> need to allow the DRAMCtrl to receive requests and queue retries while
> other requesters are blocked, and sending those retries would need fair
> arbitration, which a token scheme might automatically handle. Can you
> clarify if that's what you're referring to as a token scheme?
>
>
>
>
>
> Another more aggressive implementation would be to not use buffersFull()
> and prevent scheduling memory requests from the read/write queue if the
> back-end queue is full. This would allow a sender to enqueue memory
> requests even if the back-end queue is full up until the read/write queue
> fills up, but would require a number of changes to the code.
>
>
>
> Yes, I tried implementing this first, and it ends up being very difficult
> due to the DRAMCtrl's calls to and implementation of accessAndRespond().
> Basically, reads and writes require different processing latencies, so we
> would need not only a back-end queue, but also separate read and write
> delay queues to model the different DRAM access latencies. We'd also need a
> well-performing way to arbitrate for slots in the back-end queue that
> doesn't conflict with the batching efforts of the front-end. To me, all
> this complexity seems misaligned with Andreas et al.'s original aim with
> the DRAMCtrl: fast and reasonably accurate simulation of a memory controller
> <http://web.eecs.umich.edu/~twenisch/papers/ispass14.pdf>.
>
>
>
>
>
> In regards to Ruby, I am a bit curious- Are you placing MessageBuffers in
> the SLICC files and doing away with the queueMemoryRead/queueMemoryWrite
> calls or are you placing a MessageBuffer in AbstractController? I am
> currently trying out an implementation using the former for a few
> additional reasons other than flow control.
>
>
>
> If I understand what you're asking, I think I've also done the former,
> though I've modified SLICC and the AbstractController to deal with parts of
> the buffer management. I've merged my code with a recent gem5 revision
> (11315:10647f5d0f7f) so I could post a draft review request. Here are the
> patches (including links) to test all of this:
>
>
>
> - http://reviews.gem5.org/r/3331/
>
> - http://reviews.gem5.org/r/3332/
>
> -
> http://pages.cs.wisc.edu/~hestness/links/MOESI_hammer_test_finite_queues
>
> - http://pages.cs.wisc.edu/~hestness/links/cpu_memory_demand
>
>
>
> More holistically, I feel that the best solution would be to hide the
> memory request and response queues in an AbstractDirectoryController class
> that inherits from AbstractController in C++, and from which all SLICC
> directory controller machines descend. This structure would move all the
> directory-specific code out of AbstractController and not model it in other
> SLICC generated machines. This would also eliminate the need for assertions
> that only directory controllers are calling the directory-specific
> functions.
>
>
>
>
>
> Joel
>
>
>
>
>
> -----Original Message-----
> From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf Of Joel
> Hestness
>
> Sent: Monday, February 08, 2016 12:16 PM
> To: Gross, Joe
> Cc: gem5 Developer List
> Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl
>
> Hi guys,
> I just posted a draft of my DRAMCtrl flow-control patch so you can take
> a look here: http://reviews.gem5.org/r/3315/
>
> NOTE: I have a separate patch that changes Ruby's QueuedMasterPort from
> directories to memory controllers into a MasterPort, and it places a
> MessageBuffer in front of the MasterPort, so that the user can make all
> buffering finite within a Ruby memory hierarchy. I still need to merge this
> patch with gem5, before I can share it. Let me know if you'd like to see
> the draft there also.
>
> @Joe:
>
>
> > I'd be curious to see a patch of what you're proposing as I'm not sure
> > I really follow what you're doing. The reason I ask is because I have
> > been discussing an implementation with Brad and would like to see
> > how similar it is to what you have. Namely it's an idea similar to
> > what is commonly used in hardware, where senders have tokens that
> > correspond to slots in the receiver queue so the reservation happens
> > at startup. The only communication that goes from a receiving port
> > back to a sender is token return. The port and queue would still be
> > coupled and the device which owns the Queued*Port would manage removal
> > from the PacketQueue. In my experience, this is a very effective
> > mechanism for flow control and addresses your point about transparency
> of the queue and its state.
> > The tokens remove the need for unblock callbacks, but it's the
> > responsibility of the receiver not to send when the queue is full or
> > when it has a conflicting request. There's no implementation yet, but
> > the simplicity and similarity to hardware techniques may prove useful.
> > Anyway, could you post something so I can better understand what you've
> described?
>
>
> My implementation effectively does what you're describing: The DRAMCtrl
> now has a finite number of buffers (i.e. tokens), and it allocates a buffer
> slot when a request is received (senders spend a token when the DRAMCtrl
> accepts a request). The only real difference is that the DRAMCtrl now
> implements a SlavePort with flow control consistent with the rest of gem5,
> so if there are no buffer slots available, the request is nacked and a
> retry must be sent (i.e. a token is returned).
>
>
> Please don't get rid of the Queued*Ports, as I think there is a simple way
> > to improve them to do efficient flow control.
> >
>
> Heh... not sure I have the time/motivation to remove the Queued*Ports
> myself. I've just been swapping out the Queued*Ports that break when trying
> to implement finite buffering in a Ruby memory hierarchy. I'll leave
> Queued*Ports for later fixing or removal, as appropriate.
>
>
> Joel
>
>
> ________________________________________
> > From: gem5-dev <gem5-dev-***@gem5.org> on behalf of Joel Hestness
> > < ***@gmail.com>
> > Sent: Friday, February 5, 2016 12:03 PM
> > To: Andreas Hansson
> > Cc: gem5 Developer List
> > Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
> > DRAMCtrl
> >
> > Hi guys,
> > Quick updates on this:
> > 1) I have a finite response buffer implementation working. I
> > removed the QueuedSlavePort and added a response queue with reservation
> (Andreas'
> > underlying suggestion). I have a question with this solution: The
> > QueuedSlavePort prioritized responses based on their scheduled response
> time.
> > However, since writes have a shorter pipeline from request to
> > response, this architecture prioritized write requests ahead of read
> > requests received earlier, and it performs ~1-8% worse than a strict
> > queue (what I've implemented at this point). I can make the response
> > queue a priority queue if we want the same structure as previously,
> > but I'm wondering if we might prefer to just have the better-performing
> strict queue.
> >
> > 2) To reflect on Andreas' specific suggestion of using unblock
> > callbacks from the PacketQueue: Modifying the QueuedSlavePort with
> > callbacks is ugly when trying to call the callback: The call needs to
> > originate from PacketQueue::sendDeferredPacket(), but PacketQueue
> > doesn't have a pointer to the owner component; The SlavePort has the
> > pointer, so the PacketQueue would need to first callback to the port,
> > which would call the owner component callback.
> > The exercise getting this to work has solidified my opinion that the
> > Queued*Ports should probably be removed from the codebase: Queues and
> > ports are separate subcomponents of simulated components, and only the
> > component knows how they should interact. Including a Queued*Port
> > inside a component requires the component to manage the flow-control
> > into the Queued*Port just as it would need to manage a standard port
> > anyway, and hiding the queue in the port obfuscates how it is managed.
> >
> >
> > Thanks!
> > Joel
> >
> >
> > On Thu, Feb 4, 2016 at 10:06 AM, Joel Hestness <***@gmail.com>
> > wrote:
> >
> > > Hi Andreas,
> > > Thanks for the input. I had tried adding front- and back-end
> > > queues within the DRAMCtrl, but it became very difficult to
> > > propagate the flow control back through the component due to the
> > > complicated implementation
> > of
> > > timing across different accessAndRespond() calls. I had to put this
> > > solution on hold.
> > >
> > > I think your proposed solution should simplify the flow control
> > > issue, and should have the derivative effect of making the
> > > Queued*Ports capable
> > of
> > > flow control. I'm a little concerned that your solution would make
> > > the buffering very fluid, and I'm not sufficiently familiar with
> > > memory controller microarchitecture to know if that would be
> > > realistic. I wonder if you might have a way to do performance
> > > validation after I work through either of these implementations.
> > >
> > > Thanks!
> > > Joel
> > >
> > >
> > >
> > > On Wed, Feb 3, 2016 at 11:29 AM, Andreas Hansson <
> > ***@arm.com>
> > > wrote:
> > >
> > >> Hi Joel,
> > >>
> > >> I would suggest to keep the queued ports, but add methods to reserve
> > >> resources, query if it has free space, and a way to register
> > >> callbacks
> > so
> > >> that the MemObject is made aware when packets are sent. That way we
> > >> can
> > use
> > >> the queue in the cache, memory controller etc, without having all
> > >> the issues of the “naked” port interface, but still enforcing a
> > >> bounded
> > queue.
> > >>
> > >> When a packet arrives at the module we call reserve on the output
> port.
> > >> Then when we actually add the packet we know that there is space.
> > >> When request packets arrive we check if the queue is full, and if
> > >> so we block any new requests. Then through the callback we can
> > >> unblock the DRAM controller in this case.
> > >>
> > >> What do you think?
> > >>
> > >> Andreas
> > >>
> > >> From: Joel Hestness <***@gmail.com>
> > >> Date: Tuesday, 2 February 2016 at 00:24
> > >> To: Andreas Hansson <***@arm.com>
> > >> Cc: gem5 Developer List <gem5-***@gem5.org>
> > >> Subject: Follow-up: Removing QueuedSlavePort from DRAMCtrl
> > >>
> > >> Hi Andreas,
> > >> I'd like to circle back on the thread about removing the
> > >> QueuedSlavePort response queue from DRAMCtrl. I've been working to
> > >> shift over to DRAMCtrl from the RubyMemoryController, but nearly
> > >> all of my simulations now crash on the DRAMCtrl's response queue.
> > >> Since I need the DRAMCtrl to work, I'll be looking into this now.
> > >> However, based on my inspection of the code, it looks pretty
> > >> non-trivial to remove the QueuedSlavePort, so I'm hoping you can at
> > >> least help me work through the changes.
> > >>
> > >> To reproduce the issue, I've put together a slim gem5 patch
> > >> (attached) to use the memtest.py script to generate accesses.
> > >> Here's the command
> > line
> > >> I used:
> > >>
> > >> % build/X86/gem5.opt --debug-flag=DRAM --outdir=$outdir
> > >> configs/example/memtest.py -u 100
> > >>
> > >> If you're still willing to take a stab at it, let me know if/how
> > >> I can help. Otherwise, I'll start working on it. It seems the
> > >> trickiest thing
> > is
> > >> going to be modeling the arbitrary frontendLatency and
> > >> backendLatency
> > while
> > >> still counting all of the accesses that are in the controller when
> > >> it
> > needs
> > >> to block back to the input queue. These latencies are currently
> > >> assessed with scheduling in the port response queue. Any
> > >> suggestions you could
> > give
> > >> would be appreciated.
> > >>
> > >> Thanks!
> > >> Joel
> > >>
> > >>
> > >> Below here is our conversation from the email thread "[gem5-dev]
> > >> Review Request 3116: ruby: RubyMemoryControl delete requests"
> > >>
> > >> On Wed, Sep 23, 2015 at 3:51 PM, Andreas Hansson <
> > ***@arm.com
> > >> > wrote:
> > >>
> > >>> Great. Thanks Joel.
> > >>>
> > >>> If anything pops up on our side I’ll let you know.
> > >>>
> > >>> Andreas
> > >>>
> > >>> From: Joel Hestness <***@gmail.com>
> > >>> Date: Wednesday, 23 September 2015 20:29
> > >>>
> > >>> To: Andreas Hansson <***@arm.com>
> > >>> Cc: gem5 Developer List <gem5-***@gem5.org>
> > >>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> > >>> RubyMemoryControl delete requests
> > >>>
> > >>>
> > >>>
> > >>>> I don’t think there is any big difference in our expectations,
> > >>>> quite the contrary :-). GPUs are very important to us (and so is
> > >>>> throughput computing in general), and we run plenty of simulations
> > >>>> with lots of memory-level parallelism from non-CPU components.
> > >>>> Still, we haven’t
> > run
> > >>>> into the issue.
> > >>>>
> > >>>
> > >>> Ok, cool. Thanks for the context.
> > >>>
> > >>>
> > >>> If you have practical examples that run into problems let me know,
> > >>> and
> > >>>> we’ll get it fixed.
> > >>>>
> > >>>
> > >>> I'm having trouble assembling a practical example (with or without
> > using
> > >>> gem5-gpu). I'll keep you posted if I find something reasonable.
> > >>>
> > >>> Thanks!
> > >>> Joel
> > >>>
> > >>>
> > >>>
> > >>>> From: Joel Hestness <***@gmail.com>
> > >>>> Date: Tuesday, 22 September 2015 19:58
> > >>>>
> > >>>> To: Andreas Hansson <***@arm.com>
> > >>>> Cc: gem5 Developer List <gem5-***@gem5.org>
> > >>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> > >>>> RubyMemoryControl delete requests
> > >>>>
> > >>>> Hi Andreas,
> > >>>>
> > >>>>
> > >>>>> If it is a real problem affecting end users I am indeed
> > >>>>> volunteering to fix the DRAMCtrl use of QueuedSlavePort. In the
> > >>>>> classic memory
> > system
> > >>>>> there are enough points of regulation (LSQs, MSHR limits,
> > >>>>> crossbar
> > layers
> > >>>>> etc) that having a single memory channel with >100 queued up
> > responses
> > >>>>> waiting to be sent is extremely unlikely. Hence, until now the
> > >>>>> added complexity has not been needed. If there is regulation on
> > >>>>> the number
> > of
> > >>>>> requests in Ruby, then I would argue that it is equally unlikely
> > there…I
> > >>>>> could be wrong.
> > >>>>>
> > >>>>
> > >>>> Ok. I think a big part of the difference between our expectations
> > >>>> is just the cores that we're modeling. AMD and gem5-gpu can model
> > aggressive
> > >>>> GPU cores with potential to expose, perhaps, 4-32x more
> > >>>> memory-level parallel requests than a comparable number of
> > >>>> multithreaded CPU
> > cores. I
> > >>>> feel that this difference warrants different handling of accesses
> > >>>> in
> > the
> > >>>> memory controller.
> > >>>>
> > >>>> Joel
> > >>>>
> > >>>>
> > >>>>
> > >>>> From: Joel Hestness <***@gmail.com>
> > >>>>> Date: Tuesday, 22 September 2015 17:48
> > >>>>>
> > >>>>> To: Andreas Hansson <***@arm.com>
> > >>>>> Cc: gem5 Developer List <gem5-***@gem5.org>
> > >>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> > >>>>> RubyMemoryControl delete requests
> > >>>>>
> > >>>>> Hi Andreas,
> > >>>>>
> > >>>>> Thanks for the "ship it!"
> > >>>>>
> > >>>>>
> > >>>>>> Do we really need to remove the use of QueuedSlavePort in
> DRAMCtrl?
> > >>>>>> It will make the controller more complex, and I don’t want to
> > >>>>>> do it
> > “just
> > >>>>>> in case”.
> > >>>>>>
> > >>>>>
> > >>>>> Sorry, I misread your email as offering to change the DRAMCtrl.
> > >>>>> I'm not sure who should make that change, but I think it should
> > >>>>> get
> > done. The
> > >>>>> memory access response path starts at the DRAMCtrl and ends at
> > >>>>> the RubyPort. If we add control flow to the RubyPort, packets
> > >>>>> will
> > probably
> > >>>>> back-up more quickly on the response path back to where there
> > >>>>> are
> > open
> > >>>>> buffers. I expect the DRAMCtrl QueuedPort problem becomes more
> > prevalent as
> > >>>>> Ruby adds flow control, unless we add a limitation on
> > >>>>> outstanding
> > requests
> > >>>>> to memory from directory controllers.
> > >>>>>
> > >>>>> How does the classic memory model deal with this?
> > >>>>>
> > >>>>> Joel
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> From: Joel Hestness <***@gmail.com>
> > >>>>>> Date: Tuesday, 22 September 2015 17:30
> > >>>>>> To: Andreas Hansson <***@arm.com>
> > >>>>>> Cc: gem5 Developer List <gem5-***@gem5.org>
> > >>>>>>
> > >>>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> > >>>>>> RubyMemoryControl delete requests
> > >>>>>>
> > >>>>>> Hi guys,
> > >>>>>> Thanks for the discussion here. I had quickly tested other
> > >>>>>> memory controllers, but hadn't connected the dots that this
> > >>>>>> might be the
> > same
> > >>>>>> problem Brad/AMD are running into.
> > >>>>>>
> > >>>>>> My preference would be that we remove the QueuedSlavePort
> > >>>>>> from the DRAMCtrls. That would at least eliminate DRAMCtrls as
> > >>>>>> a potential
> > source of
> > >>>>>> the QueueSlavePort packet overflows, and would allow us to more
> > closely
> > >>>>>> focus on the RubyPort problem when we get to it.
> > >>>>>>
> > >>>>>> Can we reach resolution on this patch though? Are we okay
> > >>>>>> with actually fixing the memory leak in mainline?
> > >>>>>>
> > >>>>>> Joel
> > >>>>>>
> > >>>>>>
> > >>>>>> On Tue, Sep 22, 2015 at 11:19 AM, Andreas Hansson <
> > >>>>>> ***@arm.com> wrote:
> > >>>>>>
> > >>>>>>> Hi Brad,
> > >>>>>>>
> > >>>>>>> We can remove the use of QueuedSlavePort in the memory
> > >>>>>>> controller
> > and
> > >>>>>>> simply not accept requests if the response queue is full. Is
> > >>>>>>> this needed?
> > >>>>>>> If so we’ll make sure someone gets this in place. The only
> > >>>>>>> reason
> > we
> > >>>>>>> haven’t done it is because it hasn’t been needed.
> > >>>>>>>
> > >>>>>>> The use of QueuedPorts in the Ruby adapters is a whole
> > >>>>>>> different story. I think most of these can be removed and
> > >>>>>>> actually use flow control.
> > I’m
> > >>>>>>> happy to code it up, but there is such a flux at the moment
> > >>>>>>> that I didn’t want to post yet another patch changing the Ruby
> > >>>>>>> port. I really do think we should avoid having implicit
> >>>>>>> buffers for 1000s of kilobytes to the largest extent
> > >>>>>>> possible. If we really need a constructor parameter to make it
> > >>>>>>> “infinite” for some quirky Ruby use-case, then let’s do that...
> > >>>>>>>
> > >>>>>>> Andreas
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 22/09/2015 17:14, "gem5-dev on behalf of Beckmann, Brad"
> > >>>>>>> <gem5-dev-***@gem5.org on behalf of ***@amd.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>> >From AMD's perspective, we have deprecated our usage of
> > >>>>>>> RubyMemoryControl
> > >>>>>>> >and we are using the new Memory Controllers with the port
> > interface.
> > >>>>>>> >
> > >>>>>>> >That being said, I completely agree with Joel that the packet
> > queue
> > >>>>>>> >finite invisible buffer limit of 100 needs to go! As you
> > >>>>>>> >know, we
> > >>>>>>> tried
> >>>>>>> >very hard several months ago to essentially make this an
> > >>>>>>> >infinite
> > >>>>>>> buffer,
> > >>>>>>> >but Andreas would not allow us to check it in. We are going
> > >>>>>>> >to
> > >>>>>>> post that
> > >>>>>>> >patch again in a few weeks when we post our GPU model. Our
> > >>>>>>> >GPU
> > >>>>>>> model
> > >>>>>>> >will not work unless we increase that limit.
> > >>>>>>> >
> > >>>>>>> >Andreas you keep arguing that if you exceed that limit, that
> > >>>>>>> something is
> > >>>>>>> >fundamentally broken. Please keep in mind that there are
> > >>>>>>> >many
> > uses
> > >>>>>>> of
> > >>>>>>> >gem5 beyond what you use it for. Also this is a research
> > simulator
> > >>>>>>> and
> > >>>>>>> >we should not restrict ourselves to what we think is
> > >>>>>>> >practical in
> > >>>>>>> real
> > >>>>>>> >hardware. Finally, the fact that the finite limit is
> > >>>>>>> >invisible to
> > >>>>>>> the
> > >>>>>>> >producer is just bad software engineering.
> > >>>>>>> >
> > >>>>>>> >I beg you to please allow us to remove this finite invisible
> > limit!
> > >>>>>>> >
> > >>>>>>> >Brad
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> >-----Original Message-----
> > >>>>>>> >From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf
> > >>>>>>> >Of
> > >>>>>>> Andreas
> > >>>>>>> >Hansson
> > >>>>>>> >Sent: Tuesday, September 22, 2015 6:35 AM
> > >>>>>>> >To: Andreas Hansson; Default; Joel Hestness
> > >>>>>>> >Subject: Re: [gem5-dev] Review Request 3116: ruby:
> > RubyMemoryControl
> > >>>>>>> >delete requests
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> >> On Sept. 21, 2015, 8:42 a.m., Andreas Hansson wrote:
> > >>>>>>> >> > Can we just prune the whole RubyMemoryControl rather? Has
> > >>>>>>> >> > it
> > >>>>>>> not been
> > >>>>>>> >>deprecated long enough?
> > >>>>>>> >>
> > >>>>>>> >> Joel Hestness wrote:
> > >>>>>>> >> Unless I'm overlooking something, for Ruby users, I
> > >>>>>>> >> don't
> > see
> > >>>>>>> other
> > >>>>>>> >>memory controllers that are guaranteed to work. Besides
> > >>>>>>> >>RubyMemoryControl, all others use a QueuedSlavePort for
> > >>>>>>> >>their
> > input
> > >>>>>>> >>queues. Given that Ruby hasn't added complete flow control,
> > >>>>>>> PacketQueue
> > >>>>>>> >>size restrictions can be exceeded (triggering the panic).
> > >>>>>>> >>This
> > >>>>>>> occurs
> > >>>>>>> >>infrequently/irregularly with aggressive GPUs in gem5-gpu,
> > >>>>>>> >>and
> > >>>>>>> appears
> > >>>>>>> >>difficult to fix in a systematic way.
> > >>>>>>> >>
> > >>>>>>> >> Regardless of the fact we've deprecated
> > >>>>>>> >> RubyMemoryControl,
> > >>>>>>> this is
> > >>>>>>> >>a necessary fix.
> > >>>>>>> >
> >>>>>>> >No memory controller is using QueuedSlavePort for any
> > >>>>>>> >_input_
> > >>>>>>> queues.
> > >>>>>>> >The DRAMCtrl class uses it for the response _output_ queue,
> > >>>>>>> >that's
> > >>>>>>> all.
> > >>>>>>> >If that is really an issue we can move away from it and
> >>>>>>> >enforce an
> > >>>>>>> upper
> > >>>>>>> >bound on responses by not accepting new requests. That said,
> > >>>>>>> >if we
> > >>>>>>> hit
> > >>>>>>> >the limit I would argue something else is fundamentally
> > >>>>>>> >broken in
> > >>>>>>> the
> > >>>>>>> >system and should be addressed.
> > >>>>>>> >
> > >>>>>>> >In any case, the discussion whether to remove
> > >>>>>>> >RubyMemoryControl or
> > >>>>>>> not
> > >>>>>>> >should be completely decoupled.
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> >- Andreas
> > >>>>>>>
> > >>>>>>
> > >>
> > >> --
> > >> Joel Hestness
> > >> PhD Candidate, Computer Architecture
> > >> Dept. of Computer Science, University of Wisconsin - Madison
> > >> http://pages.cs.wisc.edu/~hestness/
> > >> IMPORTANT NOTICE: The contents of this email and any attachments
> > >> are confidential and may also be privileged. If you are not the
> > >> intended recipient, please notify the sender immediately and do not
> > >> disclose the contents to any other person, use it for any purpose,
> > >> or store or copy
> > the
> > >> information in any medium. Thank you.
> > >>
> > >
> > >
> > >
> > > --
> > > Joel Hestness
> > > PhD Candidate, Computer Architecture
> > > Dept. of Computer Science, University of Wisconsin - Madison
> > > http://pages.cs.wisc.edu/~hestness/
> > >
> >
> >
> >
> > --
> > Joel Hestness
> > PhD Candidate, Computer Architecture
> > Dept. of Computer Science, University of Wisconsin - Madison
> > http://pages.cs.wisc.edu/~hestness/
> > _______________________________________________
> > gem5-dev mailing list
> > gem5-***@gem5.org
> > http://m5sim.org/mailman/listinfo/gem5-dev
> >
>
>
> --
> Joel Hestness
> PhD Candidate, Computer Architecture
> Dept. of Computer Science, University of Wisconsin - Madison
> http://pages.cs.wisc.edu/~hestness/
> _______________________________________________
> gem5-dev mailing list
> gem5-***@gem5.org
> http://m5sim.org/mailman/listinfo/gem5-dev
>
>
>
>
> --
>
> Joel Hestness
> PhD Candidate, Computer Architecture
> Dept. of Computer Science, University of Wisconsin - Madison
> http://pages.cs.wisc.edu/~hestness/
>



--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/
Poremba, Matthew
2016-02-29 18:50:25 UTC
Hi Joel/All,


I have my proposed changes up on the reviewboard finally. I’ve tried to make the flow control API generic enough that any of the common flow control types could be implemented (i.e., ack/nack [retries], token-based, credit-based, Xon/Xoff, etc.), but there are still a few conflicts between our implementations. Specifically, the serviceMemoryQueue method in AbstractController is handled differently by different flow controls, but the SLICC changes are the same as yours. You can see the patches here (they’ll need to be applied in the order below for anyone interested):

http://reviews.gem5.org/r/3354/
http://reviews.gem5.org/r/3355/
http://reviews.gem5.org/r/3356/

I am still working on making the flow control more easily configurable within python scripts. Currently this is possible but seems to require passing the command line options all over the place. I’m also working on something to pass flow control calls through connector-type MemObjects (Xbar, Bridge, CommMonitor), which will make the potential for QoS capabilities much more interesting (for example, flow control can monitor MasterIDs and do prioritization/throttling/balancing).


-Matt

From: Joel Hestness [mailto:***@gmail.com]
Sent: Saturday, February 13, 2016 9:07 AM
To: Poremba, Matthew
Cc: gem5 Developer List; Gross, Joe
Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl

Hi Matt,

‘That said, I realize that by "token" structure, Joe and you might be describing something more than what I've implemented. Namely, since tokens are the credits that allow senders to push into a receiver's queues, they might allow multiple directories/caches sending to a single DRAMCtrl, which I don't believe is possible with my current implementation. I think we'd need to allow the DRAMCtrl to receive requests and queue retries while other requesters are blocked, and sending those retries would need fair arbitration, which a token scheme might automatically handle. Can you clarify if that's what you're referring to as a token scheme?’

A token scheme would not use a retry/unblock mechanisms at all. The number of tokens available is sent to each producer from a consumer when the ports are connected/start of simulation. In this regard, the producers know how many requests can be sent and stop sending once the tokens are exhausted. The consumer will return tokens once a request is handled. This removes the need for retries and unblock calls, reduces overall complexity, and is closer to hardware implementations imo. The token scheme would indeed automatically handle the situation where multiple producers are blocked and can also be hidden away in the port without needing to add a retry queue to consumers, which I don’t believe is a great idea.
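A minimal sketch of the token exchange described above, written in Python for brevity (gem5 itself is C++, and all class and method names here are hypothetical, not gem5 APIs): the consumer hands out its whole token pool when the ports are connected, and the only backward communication afterwards is the token return.

```python
class TokenConsumer:
    """Receiver that grants its fixed pool of tokens at connection time."""
    def __init__(self, queue_slots):
        self.queue_slots = queue_slots
        self.queue = []

    def connect(self, producer):
        # All tokens are handed out once, at port-binding time.
        producer.tokens = self.queue_slots
        producer.peer = self

    def receive(self, pkt):
        # The producer already spent a token, so space is guaranteed.
        assert len(self.queue) < self.queue_slots
        self.queue.append(pkt)

    def service_one(self, producer):
        # Handling a request returns one token to the producer;
        # no retry or unblock callback is ever needed.
        pkt = self.queue.pop(0)
        producer.tokens += 1
        return pkt


class TokenProducer:
    def __init__(self):
        self.tokens = 0
        self.peer = None

    def try_send(self, pkt):
        # The sender's responsibility: never send without a token in hand.
        if self.tokens == 0:
            return False
        self.tokens -= 1
        self.peer.receive(pkt)
        return True
```

With multiple producers each holding their own token count, a full consumer simply starves all of them until tokens flow back, which is how the scheme sidesteps retry arbitration entirely.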


Ok. Yes, that makes sense. I look forward to seeing your changes.


Thanks!
Joel


From: Joel Hestness [mailto:***@gmail.com<mailto:***@gmail.com>]
Sent: Thursday, February 11, 2016 2:52 PM
To: gem5 Developer List
Cc: Gross, Joe; Poremba, Matthew

Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl

Hi Matt,

In regards to the buffersFull() implementation, I can think of a pathological case where the back-end queue is full because the sender is not accepting responses (for whatever reason) but is still issuing requests. buffersFull() will return false in this case and allow the request to be enqueued and eventually scheduled, causing the back-end queue to grow larger than the response_buffer_size parameter.

Perhaps one way to better emulate exchanging tokens (credit) as Joe mentioned is to have buffersFull() "reserve" slots in the queues by making sure there is a slot in both the read queue (or write queue) and a corresponding slot available in the back-end queue. The reservation can be lifted once the response is sent on the port.

I'm not sure I understand the difference between this description and what I've implemented, except that what I've implemented adds some extra back-end queuing. The capacity of the back-end queue in my implementation is equal to the sum of the read and write queue capacities (plus a little extra: response_buffer_size). The reservation of a slot in this large back-end queue is released when a response is sent through the port, as you describe. To me, this seems exactly the way a token-like structure would reserve back-end queue slots.
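A toy model of that reservation scheme (hypothetical names; the actual DRAMCtrl patch differs in detail): accepting a request reserves one back-end slot up front, and the reservation is released only when the response leaves the port, so the back end can never exceed the sum of the front-end capacities plus the extra response buffering.

```python
class DRAMCtrlModel:
    """Toy controller: accepting a request reserves a back-end response
    slot; the reservation is lifted when the response leaves the port."""
    def __init__(self, rd_slots, wr_slots, extra_resp_slots):
        self.rd_slots, self.wr_slots = rd_slots, wr_slots
        # Back-end capacity = read + write capacity plus a small margin,
        # mirroring the response_buffer_size parameter described above.
        self.backend_cap = rd_slots + wr_slots + extra_resp_slots
        self.read_q, self.write_q, self.backend_q = [], [], []
        self.reserved = 0

    def buffers_full(self, is_read):
        # Full if either the front-end queue or the reserved back-end
        # capacity is exhausted.
        front_q, cap = ((self.read_q, self.rd_slots) if is_read
                        else (self.write_q, self.wr_slots))
        return len(front_q) >= cap or self.reserved >= self.backend_cap

    def recv_request(self, pkt, is_read):
        if self.buffers_full(is_read):
            return False                # nack; sender must retry later
        (self.read_q if is_read else self.write_q).append(pkt)
        self.reserved += 1              # back-end slot spoken for
        return True

    def access_and_respond(self, is_read):
        # Move a serviced request to the back-end response queue.
        q = self.read_q if is_read else self.write_q
        self.backend_q.append(q.pop(0))

    def send_response(self):
        self.reserved -= 1              # reservation lifted on port send
        return self.backend_q.pop(0)
```

Because `reserved` is charged at acceptance rather than at `access_and_respond()` time, a sender that refuses responses while still issuing requests is eventually nacked instead of overflowing the back end, which addresses the pathological case above.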

That said, I realize that by "token" structure, Joe and you might be describing something more than what I've implemented. Namely, since tokens are the credits that allow senders to push into a receiver's queues, they might allow multiple directories/caches sending to a single DRAMCtrl, which I don't believe is possible with my current implementation. I think we'd need to allow the DRAMCtrl to receive requests and queue retries while other requesters are blocked, and sending those retries would need fair arbitration, which a token scheme might automatically handle. Can you clarify if that's what you're referring to as a token scheme?


Another more aggressive implementation would be to not use buffersFull() and prevent scheduling memory requests from the read/write queue if the back-end queue is full. This would allow a sender to enqueue memory requests even if the back-end queue is full up until the read/write queue fills up, but would require a number of changes to the code.

Yes, I tried implementing this first, and it ends up being very difficult due to the DRAMCtrl's calls to and implementation of accessAndRespond(). Basically, reads and writes require different processing latencies, so we would need not only a back-end queue, but also separate read and write delay queues to model the different DRAM access latencies. We'd also need a well-performing way to arbitrate for slots in the back-end queue that doesn't conflict with the batching efforts of the front-end. To me, all this complexity seems misaligned with Andreas et al.'s original aim with the DRAMCtrl: fast and reasonably accurate simulation of a memory controller<http://web.eecs.umich.edu/~twenisch/papers/ispass14.pdf>.


In regards to Ruby, I am a bit curious- Are you placing MessageBuffers in the SLICC files and doing away with the queueMemoryRead/queueMemoryWrite calls or are you placing a MessageBuffer in AbstractController? I am currently trying out an implementation using the former for a few additional reasons other than flow control.

If I understand what you're asking, I think I've also done the former, though I've modified SLICC and the AbstractController to deal with parts of the buffer management. I've merged my code with a recent gem5 revision (11315:10647f5d0f7f) so I could post a draft review request. Here are the patches (including links) to test all of this:

- http://reviews.gem5.org/r/3331/
- http://reviews.gem5.org/r/3332/
- http://pages.cs.wisc.edu/~hestness/links/MOESI_hammer_test_finite_queues
- http://pages.cs.wisc.edu/~hestness/links/cpu_memory_demand

More holistically, I feel that the best solution would be to hide the memory request and response queues in an AbstractDirectoryController class that inherits from AbstractController in C++, and from which all SLICC directory controller machines descend. This structure would move all the directory-specific code out of AbstractController and not model it in other SLICC generated machines. This would also eliminate the need for assertions that only directory controllers are calling the directory-specific functions.
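As a sketch of that class split (hypothetical names and methods, not actual gem5 code), the directory-specific memory queues would move into an intermediate base class so that non-directory machines never carry them:

```python
class AbstractController:
    """Machinery shared by all SLICC-generated machines, cache or
    directory: wakeup scheduling, message buffer plumbing, etc."""
    def wakeup(self):
        pass


class AbstractDirectoryController(AbstractController):
    """Directory-only state: the finite memory request queue and the
    queueMemoryRead/Write-style interface live here, so non-directory
    machines never see them and no runtime type assertions are needed."""
    def __init__(self, max_outstanding_to_mem):
        self.mem_req_q = []
        self.max_outstanding_to_mem = max_outstanding_to_mem

    def memory_queue_has_space(self):
        return len(self.mem_req_q) < self.max_outstanding_to_mem

    def queue_memory_read(self, addr):
        # The finite limit is enforced here, in one place, rather than
        # asserted in every generated machine.
        assert self.memory_queue_has_space()
        self.mem_req_q.append(("rd", addr))
```

Only SLICC directory machines would descend from the second class; cache controllers would keep descending from the first.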


Joel


-----Original Message-----
From: gem5-dev [mailto:gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org>] On Behalf Of Joel Hestness
Sent: Monday, February 08, 2016 12:16 PM
To: Gross, Joe
Cc: gem5 Developer List
Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl

Hi guys,
I just posted a draft of my DRAMCtrl flow-control patch so you can take a look here: http://reviews.gem5.org/r/3315/

NOTE: I have a separate patch that changes Ruby's QueuedMasterPort from directories to memory controllers into a MasterPort, and it places a MessageBuffer in front of the MasterPort, so that the user can make all buffering finite within a Ruby memory hierarchy. I still need to merge this patch with gem5, before I can share it. Let me know if you'd like to see the draft there also.

@Joe:


> I'd be curious to see a patch of what you're proposing as I'm not sure
> I really follow what you're doing. The reason I ask is because I have
> been discussing an implementation with with Brad and would like to see
> how similar it is to what you have. Namely it's an idea similar to
> what is commonly used in hardware, where senders have tokens that
> correspond to slots in the receiver queue so the reservation happens
> at startup. The only communication that goes from a receiving port
> back to a sender is token return. The port and queue would still be
> coupled and the device which owns the Queued*Port would manage removal
> from the PacketQueue. In my experience, this is a very effective
> mechanism for flow control and addresses your point about transparency of the queue and its state.
> The tokens removes the need for unblock callbacks, but it's the
> responsibility of the receiver not to send when the queue is full or
> when it has a conflicting request. There's no implementation yet, but
> the simplicity and similarity to hardware techniques may prove useful.
> Anyway, could you post something so I can better understand what you've described?


My implementation effectively does what you're describing: The DRAMCtrl now has a finite number of buffers (i.e. tokens), and it allocates a buffer slot when a request is received (senders spend a token when the DRAMCtrl accepts a request). The only real difference is that the DRAMCtrl now implements a SlavePort with flow control consistent with the rest of gem5, so if there are no buffer slots available, the request is nacked and a retry must be sent (i.e. a token is returned).
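In the standard port protocol's terms, that nack/retry handshake looks roughly like this (a Python sketch with hypothetical names; gem5's real SlavePort/MasterPort interface is C++ and richer than this):

```python
class RetrySlave:
    """Receiver with a finite buffer: a request arriving when the buffer
    is full is nacked, and the sender is sent a retry when a slot opens."""
    def __init__(self, slots):
        self.slots, self.queue = slots, []
        self.retry_waiting = []

    def recv_timing_req(self, sender, pkt):
        if len(self.queue) >= self.slots:
            self.retry_waiting.append(sender)   # remember who to retry
            return False                        # nack
        self.queue.append(pkt)
        return True

    def service_one(self):
        pkt = self.queue.pop(0)
        # Freeing a slot "returns the token": issue retries in FIFO order.
        while self.retry_waiting and len(self.queue) < self.slots:
            self.retry_waiting.pop(0).recv_retry(self)
        return pkt


class Sender:
    def __init__(self):
        self.pending, self.delivered = [], []

    def send(self, port, pkt):
        if port.recv_timing_req(self, pkt):
            self.delivered.append(pkt)
        else:
            self.pending.append(pkt)            # wait for recv_retry

    def recv_retry(self, port):
        if self.pending:
            self.send(port, self.pending.pop(0))
</n>```

The FIFO retry list is one simple answer to the arbitration question raised below when several requesters are blocked on the same controller.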


Please don't get rid of the Queued*Ports, as I think there is a simple way
> to improve them to do efficient flow control.
>

Heh... not sure I have the time/motivation to remove the Queued*Ports myself. I've just been swapping out the Queued*Ports that break when trying to implement finite buffering in a Ruby memory hierarchy. I'll leave Queued*Ports for later fixing or removal, as appropriate.


Joel


________________________________________
> From: gem5-dev <gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org>> on behalf of Joel Hestness
> < ***@gmail.com<mailto:***@gmail.com>>
> Sent: Friday, February 5, 2016 12:03 PM
> To: Andreas Hansson
> Cc: gem5 Developer List
> Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
> DRAMCtrl
>
> Hi guys,
> Quick updates on this:
> 1) I have a finite response buffer implementation working. I
> removed the QueuedSlavePort and added a response queue with reservation (Andreas'
> underlying suggestion). I have a question with this solution: The
> QueuedSlavePort prioritized responses based on their scheduled response time.
> However, since writes have a shorter pipeline from request to
> response, this architecture prioritized write requests ahead of read
> requests received earlier, and it performs ~1-8% worse than a strict
> queue (what I've implemented at this point). I can make the response
> queue a priority queue if we want the same structure as previously,
> but I'm wondering if we might prefer to just have the better-performing strict queue.
>
> 2) To reflect on Andreas' specific suggestion of using unblock
> callbacks from the PacketQueue: Modifying the QueuedSlavePort with
> callbacks is ugly when trying to call the callback: The call needs to
> originate from PacketQueue::sendDeferredPacket(), but PacketQueue
> doesn't have a pointer to the owner component; The SlavePort has the
> pointer, so the PacketQueue would need to first callback to the port,
> which would call the owner component callback.
> The exercise getting this to work has solidified my opinion that the
> Queued*Ports should probably be removed from the codebase: Queues and
> ports are separate subcomponents of simulated components, and only the
> component knows how they should interact. Including a Queued*Port
> inside a component requires the component to manage the flow-control
> into the Queued*Port just as it would need to manage a standard port
> anyway, and hiding the queue in the port obfuscates how it is managed.
>
>
> Thanks!
> Joel
>
>
> On Thu, Feb 4, 2016 at 10:06 AM, Joel Hestness <***@gmail.com<mailto:***@gmail.com>>
> wrote:
>
> > Hi Andreas,
> > Thanks for the input. I had tried adding front- and back-end
> > queues within the DRAMCtrl, but it became very difficult to
> > propagate the flow control back through the component due to the
> > complicated implementation
> of
> > timing across different accessAndRespond() calls. I had to put this
> > solution on hold.
> >
> > I think your proposed solution should simplify the flow control
> > issue, and should have the derivative effect of making the
> > Queued*Ports capable
> of
> > flow control. I'm a little concerned that your solution would make
> > the buffering very fluid, and I'm not sufficiently familiar with
> > memory controller microarchitecture to know if that would be
> > realistic. I wonder if you might have a way to do performance
> > validation after I work through either of these implementations.
> >
> > Thanks!
> > Joel
> >
> >
> >
> > On Wed, Feb 3, 2016 at 11:29 AM, Andreas Hansson <
> ***@arm.com<mailto:***@arm.com>>
> > wrote:
> >
> >> Hi Joel,
> >>
> >> I would suggest to keep the queued ports, but add methods to reserve
> >> resources, query if it has free space, and a way to register
> >> callbacks
> so
> >> that the MemObject is made aware when packets are sent. That way we
> >> can
> use
> >> the queue in the cache, memory controller etc, without having all
> >> the issues of the “naked” port interface, but still enforcing a
> >> bounded
> queue.
> >>
> >> When a packet arrives at the module we call reserve on the output port.
> >> Then when we actually add the packet we know that there is space.
> >> When request packets arrive we check if the queue is full, and if
> >> so we block any new requests. Then through the callback we can
> >> unblock the DRAM controller in this case.
> >>
> >> What do you think?
> >>
> >> Andreas
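A minimal Python sketch of the reserve/query/callback scheme Andreas proposes above. All names (`BoundedQueuedPort`, `on_drain`, `recv_request`) are invented for illustration; gem5's real ports are C++ and differ in detail.

```python
# Hypothetical sketch: a queued port with explicit reservation, a
# fullness query, and a drain callback so the owner can unblock its
# input side. Not gem5 code -- an illustration of the proposed API.

class BoundedQueuedPort:
    def __init__(self, capacity, on_drain=None):
        self.capacity = capacity
        self.reserved = 0        # slots promised to in-flight requests
        self.queue = []
        self.on_drain = on_drain # callback: owner unblocks its input side

    def can_reserve(self):
        return self.reserved + len(self.queue) < self.capacity

    def reserve(self):
        assert self.can_reserve()
        self.reserved += 1

    def push(self, pkt):
        # A slot was reserved when the request was accepted, so this
        # never overflows the bounded queue.
        assert self.reserved > 0
        self.reserved -= 1
        self.queue.append(pkt)

    def send_next(self):
        pkt = self.queue.pop(0)
        if self.on_drain:
            self.on_drain()      # e.g. DRAM controller retries blocked input
        return pkt


# Owner-side usage: block new requests while no slot can be reserved.
blocked = []
port = BoundedQueuedPort(capacity=2, on_drain=lambda: blocked.clear())

def recv_request(pkt):
    if not port.can_reserve():
        blocked.append(pkt)      # nack; retry after the drain callback
        return False
    port.reserve()
    port.push(pkt)               # respond immediately in this toy model
    return True

assert recv_request("r0") and recv_request("r1")
assert not recv_request("r2")    # queue full -> input blocked
port.send_next()                 # drain one response; callback unblocks
assert not blocked
```

The key property is that the reserve call happens when the request is accepted, while the push happens later, so the owner always knows at accept time whether a response slot exists.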
> >>
> >> From: Joel Hestness <***@gmail.com<mailto:***@gmail.com>>
> >> Date: Tuesday, 2 February 2016 at 00:24
> >> To: Andreas Hansson <***@arm.com<mailto:***@arm.com>>
> >> Cc: gem5 Developer List <gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
> >> Subject: Follow-up: Removing QueuedSlavePort from DRAMCtrl
> >>
> >> Hi Andreas,
> >> I'd like to circle back on the thread about removing the
> >> QueuedSlavePort response queue from DRAMCtrl. I've been working to
> >> shift over to DRAMCtrl from the RubyMemoryController, but nearly
> >> all of my simulations now crash on the DRAMCtrl's response queue.
> >> Since I need the DRAMCtrl to work, I'll be looking into this now.
> >> However, based on my inspection of the code, it looks pretty
> >> non-trivial to remove the QueuedSlavePort, so I'm hoping you can at
> >> least help me work through the changes.
> >>
> >> To reproduce the issue, I've put together a slim gem5 patch
> >> (attached) to use the memtest.py script to generate accesses.
> >> Here's the command
> line
> >> I used:
> >>
> >> % build/X86/gem5.opt --debug-flag=DRAM --outdir=$outdir
> >> configs/example/memtest.py -u 100
> >>
> >> If you're still willing to take a stab at it, let me know if/how
> >> I can help. Otherwise, I'll start working on it. It seems the
> >> trickiest thing
> is
> >> going to be modeling the arbitrary frontendLatency and
> >> backendLatency
> while
> >> still counting all of the accesses that are in the controller when
> >> it
> needs
> >> to block back to the input queue. These latencies are currently
> >> assessed with scheduling in the port response queue. Any
> >> suggestions you could
> give
> >> would be appreciated.
> >>
> >> Thanks!
> >> Joel
> >>
> >>
> >> Below here is our conversation from the email thread "[gem5-dev]
> >> Review Request 3116: ruby: RubyMemoryControl delete requests"
> >>
> >> On Wed, Sep 23, 2015 at 3:51 PM, Andreas Hansson <
> ***@arm.com<mailto:***@arm.com>
> >> > wrote:
> >>
> >>> Great. Thanks Joel.
> >>>
> >>> If anything pops up on our side I’ll let you know.
> >>>
> >>> Andreas
> >>>
> >>> From: Joel Hestness <***@gmail.com<mailto:***@gmail.com>>
> >>> Date: Wednesday, 23 September 2015 20:29
> >>>
> >>> To: Andreas Hansson <***@arm.com<mailto:***@arm.com>>
> >>> Cc: gem5 Developer List <gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
> >>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >>> RubyMemoryControl delete requests
> >>>
> >>>
> >>>
> >>>> I don’t think there is any big difference in our expectations,
> >>>> quite the contrary :-). GPUs are very important to us (and so is
> >>>> throughput computing in general), and we run plenty of simulations
> >>>> with lots of memory-level parallelism from non-CPU components.
> >>>> Still, we haven’t
> run
> >>>> into the issue.
> >>>>
> >>>
> >>> Ok, cool. Thanks for the context.
> >>>
> >>>
> >>> If you have practical examples that run into problems let me know,
> >>> and
> >>>> we’ll get it fixed.
> >>>>
> >>>
> >>> I'm having trouble assembling a practical example (with or without
> using
> >>> gem5-gpu). I'll keep you posted if I find something reasonable.
> >>>
> >>> Thanks!
> >>> Joel
> >>>
> >>>
> >>>
> >>>> From: Joel Hestness <***@gmail.com<mailto:***@gmail.com>>
> >>>> Date: Tuesday, 22 September 2015 19:58
> >>>>
> >>>> To: Andreas Hansson <***@arm.com<mailto:***@arm.com>>
> >>>> Cc: gem5 Developer List <gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
> >>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >>>> RubyMemoryControl delete requests
> >>>>
> >>>> Hi Andreas,
> >>>>
> >>>>
> >>>>> If it is a real problem affecting end users I am indeed
> >>>>> volunteering to fix the DRAMCtrl use of QueuedSlavePort. In the
> >>>>> classic memory
> system
> >>>>> there are enough points of regulation (LSQs, MSHR limits,
> >>>>> crossbar
> layers
> >>>>> etc) that having a single memory channel with >100 queued up
> responses
> >>>>> waiting to be sent is extremely unlikely. Hence, until now the
> >>>>> added complexity has not been needed. If there is regulation on
> >>>>> the number
> of
> >>>>> requests in Ruby, then I would argue that it is equally unlikely
> there…I
> >>>>> could be wrong.
> >>>>>
> >>>>
> >>>> Ok. I think a big part of the difference between our expectations
> >>>> is just the cores that we're modeling. AMD and gem5-gpu can model
> aggressive
> >>>> GPU cores with potential to expose, perhaps, 4-32x more
> >>>> memory-level parallel requests than a comparable number of
> >>>> multithreaded CPU
> cores. I
> >>>> feel that this difference warrants different handling of accesses
> >>>> in
> the
> >>>> memory controller.
> >>>>
> >>>> Joel
> >>>>
> >>>>
> >>>>
> >>>> From: Joel Hestness <***@gmail.com<mailto:***@gmail.com>>
> >>>>> Date: Tuesday, 22 September 2015 17:48
> >>>>>
> >>>>> To: Andreas Hansson <***@arm.com<mailto:***@arm.com>>
> >>>>> Cc: gem5 Developer List <gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
> >>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >>>>> RubyMemoryControl delete requests
> >>>>>
> >>>>> Hi Andreas,
> >>>>>
> >>>>> Thanks for the "ship it!"
> >>>>>
> >>>>>
> >>>>>> Do we really need to remove the use of QueuedSlavePort in DRAMCtrl?
> >>>>>> It will make the controller more complex, and I don’t want to
> >>>>>> do it
> “just
> >>>>>> in case”.
> >>>>>>
> >>>>>
> >>>>> Sorry, I misread your email as offering to change the DRAMCtrl.
> >>>>> I'm not sure who should make that change, but I think it should
> >>>>> get
> done. The
> >>>>> memory access response path starts at the DRAMCtrl and ends at
> >>>>> the RubyPort. If we add control flow to the RubyPort, packets
> >>>>> will
> probably
> >>>>> back-up more quickly on the response path back to where there
> >>>>> are
> open
> >>>>> buffers. I expect the DRAMCtrl QueuedPort problem becomes more
> prevalent as
> >>>>> Ruby adds flow control, unless we add a limitation on
> >>>>> outstanding
> requests
> >>>>> to memory from directory controllers.
> >>>>>
> >>>>> How does the classic memory model deal with this?
> >>>>>
> >>>>> Joel
> >>>>>
> >>>>>
> >>>>>
> >>>>>> From: Joel Hestness <***@gmail.com<mailto:***@gmail.com>>
> >>>>>> Date: Tuesday, 22 September 2015 17:30
> >>>>>> To: Andreas Hansson <***@arm.com<mailto:***@arm.com>>
> >>>>>> Cc: gem5 Developer List <gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
> >>>>>>
> >>>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >>>>>> RubyMemoryControl delete requests
> >>>>>>
> >>>>>> Hi guys,
> >>>>>> Thanks for the discussion here. I had quickly tested other
> >>>>>> memory controllers, but hadn't connected the dots that this
> >>>>>> might be the
> same
> >>>>>> problem Brad/AMD are running into.
> >>>>>>
> >>>>>> My preference would be that we remove the QueuedSlavePort
> >>>>>> from the DRAMCtrls. That would at least eliminate DRAMCtrls as
> >>>>>> a potential
> source of
> >>>>>> the QueuedSlavePort packet overflows, and would allow us to more
> closely
> >>>>>> focus on the RubyPort problem when we get to it.
> >>>>>>
> >>>>>> Can we reach resolution on this patch though? Are we okay
> >>>>>> with actually fixing the memory leak in mainline?
> >>>>>>
> >>>>>> Joel
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Sep 22, 2015 at 11:19 AM, Andreas Hansson <
> >>>>>> ***@arm.com<mailto:***@arm.com>> wrote:
> >>>>>>
> >>>>>>> Hi Brad,
> >>>>>>>
> >>>>>>> We can remove the use of QueuedSlavePort in the memory
> >>>>>>> controller
> and
> >>>>>>> simply not accept requests if the response queue is full. Is
> >>>>>>> this needed?
> >>>>>>> If so we’ll make sure someone gets this in place. The only
> >>>>>>> reason
> we
> >>>>>>> haven’t done it is because it hasn’t been needed.
> >>>>>>>
> >>>>>>> The use of QueuedPorts in the Ruby adapters is a whole
> >>>>>>> different story. I think most of these can be removed and
> >>>>>>> actually use flow control.
> I’m
> >>>>>>> happy to code it up, but there is such a flux at the moment
> >>>>>>> that I didn’t want to post yet another patch changing the Ruby
> >>>>>>> port. I really do think we should avoid having implicit
> >>>>>>> buffers for 1000’s of kilobytes to the largest extent
> >>>>>>> possible. If we really need a constructor parameter to make it
> >>>>>>> “infinite” for some quirky Ruby use-case, then let’s do that...
> >>>>>>>
> >>>>>>> Andreas
> >>>>>>>
> >>>>>>>
> >>>>>>> On 22/09/2015 17:14, "gem5-dev on behalf of Beckmann, Brad"
> >>>>>>> <gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org> on behalf of ***@amd.com<mailto:***@amd.com>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> >From AMD's perspective, we have deprecated our usage of
> >>>>>>> RubyMemoryControl
> >>>>>>> >and we are using the new Memory Controllers with the port
> interface.
> >>>>>>> >
> >>>>>>> >That being said, I completely agree with Joel that the packet
> queue
> >>>>>>> >finite invisible buffer limit of 100 needs to go! As you
> >>>>>>> >know, we
> >>>>>>> tried
> >>>>>>> >very hard several months ago to essentially make this an
> >>>>>>> >infinite
> >>>>>>> buffer,
> >>>>>>> >but Andreas would not allow us to check it in. We are going
> >>>>>>> >to
> >>>>>>> post that
> >>>>>>> >patch again in a few weeks when we post our GPU model. Our
> >>>>>>> >GPU
> >>>>>>> model
> >>>>>>> >will not work unless we increase that limit.
> >>>>>>> >
> >>>>>>> >Andreas you keep arguing that if you exceed that limit, that
> >>>>>>> something is
> >>>>>>> >fundamentally broken. Please keep in mind that there are
> >>>>>>> >many
> uses
> >>>>>>> of
> >>>>>>> >gem5 beyond what you use it for. Also this is a research
> simulator
> >>>>>>> and
> >>>>>>> >we should not restrict ourselves to what we think is
> >>>>>>> >practical in
> >>>>>>> real
> >>>>>>> >hardware. Finally, the fact that the finite limit is
> >>>>>>> >invisible to
> >>>>>>> the
> >>>>>>> >producer is just bad software engineering.
> >>>>>>> >
> >>>>>>> >I beg you to please allow us to remove this finite invisible
> limit!
> >>>>>>> >
> >>>>>>> >Brad
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >-----Original Message-----
> >>>>>>> >From: gem5-dev [mailto:gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org>] On Behalf
> >>>>>>> >Of
> >>>>>>> Andreas
> >>>>>>> >Hansson
> >>>>>>> >Sent: Tuesday, September 22, 2015 6:35 AM
> >>>>>>> >To: Andreas Hansson; Default; Joel Hestness
> >>>>>>> >Subject: Re: [gem5-dev] Review Request 3116: ruby:
> RubyMemoryControl
> >>>>>>> >delete requests
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >> On Sept. 21, 2015, 8:42 a.m., Andreas Hansson wrote:
> >>>>>>> >> > Can we just prune the whole RubyMemoryControl rather? Has
> >>>>>>> >> > it
> >>>>>>> not been
> >>>>>>> >>deprecated long enough?
> >>>>>>> >>
> >>>>>>> >> Joel Hestness wrote:
> >>>>>>> >> Unless I'm overlooking something, for Ruby users, I
> >>>>>>> >> don't
> see
> >>>>>>> other
> >>>>>>> >>memory controllers that are guaranteed to work. Besides
> >>>>>>> >>RubyMemoryControl, all others use a QueuedSlavePort for
> >>>>>>> >>their
> input
> >>>>>>> >>queues. Given that Ruby hasn't added complete flow control,
> >>>>>>> PacketQueue
> >>>>>>> >>size restrictions can be exceeded (triggering the panic).
> >>>>>>> >>This
> >>>>>>> occurs
> >>>>>>> >>infrequently/irregularly with aggressive GPUs in gem5-gpu,
> >>>>>>> >>and
> >>>>>>> appears
> >>>>>>> >>difficult to fix in a systematic way.
> >>>>>>> >>
> >>>>>>> >> Regardless of the fact we've deprecated
> >>>>>>> >> RubyMemoryControl,
> >>>>>>> this is
> >>>>>>> >>a necessary fix.
> >>>>>>> >
> >>>>>>> >No memory controller is using QueuedSlavePort for any
> >>>>>>> >_input_
> >>>>>>> queues.
> >>>>>>> >The DRAMCtrl class uses it for the response _output_ queue,
> >>>>>>> >that's
> >>>>>>> all.
> >>>>>>> >If that is really an issue we can move away from it and
> >>>>>>> >enforce an
> >>>>>>> upper
> >>>>>>> >bound on responses by not accepting new requests. That said,
> >>>>>>> >if we
> >>>>>>> hit
> >>>>>>> >the limit I would argue something else is fundamentally
> >>>>>>> >broken in
> >>>>>>> the
> >>>>>>> >system and should be addressed.
> >>>>>>> >
> >>>>>>> >In any case, the discussion whether to remove
> >>>>>>> >RubyMemoryControl or
> >>>>>>> not
> >>>>>>> >should be completely decoupled.
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >- Andreas
> >>>>>>>
> >>>>>>
> >>
> >> --
> >> Joel Hestness
> >> PhD Candidate, Computer Architecture
> >> Dept. of Computer Science, University of Wisconsin - Madison
> >> http://pages.cs.wisc.edu/~hestness/
> >>
> >
> >
> >
> >
>
>
>
> _______________________________________________
> gem5-dev mailing list
> gem5-***@gem5.org<mailto:gem5-***@gem5.org>
> http://m5sim.org/mailman/listinfo/gem5-dev
>


--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/
_______________________________________________
gem5-dev mailing list
gem5-***@gem5.org<mailto:gem5-***@gem5.org>
http://m5sim.org/mailman/listinfo/gem5-dev
Andreas Hansson
2016-02-29 21:56:00 UTC
Hi Matt,

Can we hit the pause button please?

Your proposed changes all look sensible in terms of allowing different
types of flow control, but I would really like to understand _what_ we are
trying to accomplish here, and what trade-offs in terms of speed and
complexity we are willing to make.

I fundamentally do not think it is wise to try and support all options
(hop-to-hop valid/accept, hop-to-hop credit based, end-to-end credit based
etc). Is your proposal to switch both classic and Ruby to end-to-end
credit based? Fixed credits or opportunistic as well? Etc etc.

At the moment the classic memory system (and the port infrastructure) is
resembling AMBA ACE, in that it uses hop-to-hop valid/accept, with
separate flow control for reads and writes, snoops and snoop responses. It
is also worth noting that the current port flow control is fairly well
aligned with SystemC TLM 2. Any changes need to be considered in this
context.

The patch that Joel posted for adding flow-control within the memory
controller is a clear win and we should get that in once it has converged.
Any news on that front Joel?

Thanks,

Andreas

On 29/02/2016, 18:50, "gem5-dev on behalf of Poremba, Matthew"
<gem5-dev-***@gem5.org on behalf of ***@amd.com> wrote:

>Hi Joel/All,
>
>
>I have my proposed changes up on the reviewboard finally. I’ve tried to
>make the flow control API generic enough that any of the common flow
>control types could be implemented (e.g., ack/nack [retries],
>token-based, credit-based, Xon/Xoff, etc.), but there are still a few
>conflicts between our implementations. Specifically, the
>serviceMemoryQueue method in AbstractController is handled differently by
>different flow controls, but SLICC changes are the same as yours. You can
>see the patches here (they’ll need to be applied in the order below for
>anyone interested):
>
>http://reviews.gem5.org/r/3354/
>http://reviews.gem5.org/r/3355/
>http://reviews.gem5.org/r/3356/
>
>I am still working on making the flow control more easily configurable
>within python scripts. Currently this is possible but seems to require
>passing the command line options all over the place. I’m also working on
>something to pass flow control calls through connector type MemObjects
>(Xbar, Bridge, CommMonitor), which will make the potential for QoS
>capability much more interesting (for example, flow control can monitor
>MasterIDs and do prioritization/throttling/balancing).
>
>
>-Matt
>
>From: Joel Hestness [mailto:***@gmail.com]
>Sent: Saturday, February 13, 2016 9:07 AM
>To: Poremba, Matthew
>Cc: gem5 Developer List; Gross, Joe
>Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl
>
>Hi Matt,
>
>‘That said, I realize that by "token" structure, Joe and you might be
>describing something more than what I've implemented. Namely, since
>tokens are the credits that allow senders to push into a receiver's
>queues, they might allow multiple directories/caches sending to a single
>DRAMCtrl, which I don't believe is possible with my current
>implementation. I think we'd need to allow the DRAMCtrl to receive
>requests and queue retries while other requesters are blocked, and
>sending those retries would need fair arbitration, which a token scheme
>might automatically handle. Can you clarify if that's what you're
>referring to as a token scheme?’
>
>A token scheme would not use a retry/unblock mechanisms at all. The
>number of tokens available is sent to each producer from a consumer when
>the ports are connected/start of simulation. In this regard, the
>producers know how many requests can be sent and stop sending once the
>tokens are exhausted. The consumer will return tokens once a request is
>handled. This removes the need for retries and unblock calls, reduces
>overall complexity, and is closer to hardware implementations imo. The
>token scheme would indeed automatically handle the situation where
>multiple producers are blocked and can also be hidden away in the port
>without needing to add a retry queue to consumers, which I don’t believe
>is a great idea.
>
>
>Ok. Yes, that makes sense. I look forward to seeing your changes.
>
>
> Thanks!
> Joel
>
>
>From: Joel Hestness
>[mailto:***@gmail.com<mailto:***@gmail.com>]
>Sent: Thursday, February 11, 2016 2:52 PM
>To: gem5 Developer List
>Cc: Gross, Joe; Poremba, Matthew
>
>Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl
>
>Hi Matt,
>
>In regards to the buffersFull() implementation, I can think of a
>pathological case where the back-end queue is full because the sender is
>not accepting responses (for whatever reason) but is still issuing
>requests. buffersFull() will return false in this case and allow the
>request to be enqueued and eventually scheduled, causing the back-end
>queue to grow larger than the response_buffer_size parameter.
>
>Perhaps one way to better emulate exchanging tokens (credit) as Joe
>mentioned is to have buffersFull() "reserve" slots in the queues by
>making sure there is a slot in both the read queue (or write queue) and a
>corresponding slot available in the back-end queue. The reservation can
>be lifted once the response is sent on the port.
>
>I'm not sure I understand the difference between this description and
>what I've implemented, except that what I've implemented adds some extra
>back-end queuing. The capacity of the back-end queue in my implementation
>is equal to the sum of the read and write queue capacities (plus a little
>extra: response_buffer_size). The reservation of a slot in this large
>back-end queue is released when a response is sent through the port, as
>you describe. To me, this seems exactly the way a token-like structure
>would reserve back-end queue slots.
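A Python sketch of the reservation Joel describes. The parameter name `response_buffer_size` is taken from the thread; the class and method names, depths, and the immediate-service model are invented for illustration. The point is that a request is accepted only if a back-end response slot can be reserved at the same time, so the back-end queue stays bounded even in the pathological case where the sender stalls responses while still issuing requests.

```python
# Toy DRAM controller with front-end read/write queuing and a bounded,
# reservation-based back-end response queue.

class DRAMCtrlModel:
    def __init__(self, rw_depth, response_buffer_size):
        self.rw_depth = rw_depth
        self.rw_queue = []
        # back-end bound = front-end capacity plus a little extra,
        # as in the implementation described above
        self.backend_cap = rw_depth + response_buffer_size
        self.backend = []
        self.backend_reserved = 0

    def buffers_full(self):
        rw_full = len(self.rw_queue) >= self.rw_depth
        be_full = (len(self.backend) + self.backend_reserved
                   >= self.backend_cap)
        return rw_full or be_full

    def recv_request(self, pkt):
        if self.buffers_full():
            return False             # nack: sender must retry later
        self.rw_queue.append(pkt)
        self.backend_reserved += 1   # reserve the response slot now
        return True

    def service(self):
        pkt = self.rw_queue.pop(0)
        self.backend_reserved -= 1
        self.backend.append(pkt)     # occupies its reserved slot

    def send_response(self):
        return self.backend.pop(0)   # frees a slot for new requests

ctrl = DRAMCtrlModel(rw_depth=2, response_buffer_size=1)
assert ctrl.recv_request("rd0") and ctrl.recv_request("rd1")
ctrl.service(); ctrl.service()       # both now wait in the back end
assert ctrl.recv_request("rd2")      # one reservation still fits
assert not ctrl.recv_request("rd3")  # blocked: back end is bounded
```

Without the reservation, `recv_request("rd3")` would succeed here (the read/write queue has space) and a stalled sender could grow the back end past its bound.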
>
>That said, I realize that by "token" structure, Joe and you might be
>describing something more than what I've implemented. Namely, since
>tokens are the credits that allow senders to push into a receiver's
>queues, they might allow multiple directories/caches sending to a single
>DRAMCtrl, which I don't believe is possible with my current
>implementation. I think we'd need to allow the DRAMCtrl to receive
>requests and queue retries while other requesters are blocked, and
>sending those retries would need fair arbitration, which a token scheme
>might automatically handle. Can you clarify if that's what you're
>referring to as a token scheme?
>
>
>Another more aggressive implementation would be to not use buffersFull()
>and prevent scheduling memory requests from the read/write queue if the
>back-end queue is full. This would allow a sender to enqueue memory
>requests even if the back-end queue is full up until the read/write queue
>fills up, but would require a number of changes to the code.
>
>Yes, I tried implementing this first, and it ends up being very difficult
>due to the DRAMCtrl's calls to and implementation of accessAndRespond().
>Basically, reads and writes require different processing latencies, so we
>would need not only a back-end queue, but also separate read and write
>delay queues to model the different DRAM access latencies. We'd also need
>a well-performing way to arbitrate for slots in the back-end queue that
>doesn't conflict with the batching efforts of the front-end. To me, all
>this complexity seems misaligned with Andreas et al.'s original aim with
>the DRAMCtrl: fast and reasonably accurate simulation of a memory
>controller<http://web.eecs.umich.edu/~twenisch/papers/ispass14.pdf>.
>
>
>In regards to Ruby, I am a bit curious- Are you placing MessageBuffers in
>the SLICC files and doing away with the queueMemoryRead/queueMemoryWrite
>calls or are you placing a MessageBuffer in AbstractController? I am
>currently trying out an implementation using the former for a few
>additional reasons other than flow control.
>
>If I understand what you're asking, I think I've also done the former,
>though I've modified SLICC and the AbstractController to deal with parts
>of the buffer management. I've merged my code with a recent gem5 revision
>(11315:10647f5d0f7f) so I could post a draft review request. Here are the
>patches (including links) to test all of this:
>
> - http://reviews.gem5.org/r/3331/
> - http://reviews.gem5.org/r/3332/
> -
>http://pages.cs.wisc.edu/~hestness/links/MOESI_hammer_test_finite_queues
> - http://pages.cs.wisc.edu/~hestness/links/cpu_memory_demand
>
>More holistically, I feel that the best solution would be to hide the
>memory request and response queues in an AbstractDirectoryController
>class that inherits from AbstractController in C++, and from which all
>SLICC directory controller machines descend. This structure would move
>all the directory-specific code out of AbstractController and not model
>it in other SLICC generated machines. This would also eliminate the need
>for assertions that only directory controllers are calling the
>directory-specific functions.
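A sketch of the class hierarchy Joel proposes, reduced to Python for brevity. The class name `AbstractDirectoryController` comes from the thread; everything else (method names, the depth parameter, the L2 example) is invented. The idea is simply that memory-side queues live only in the directory base class, so non-directory machines never carry them and no runtime assertions are needed.

```python
# Hypothetical shape of the proposed refactor: directory-specific
# memory queues move into an intermediate base class.

class AbstractController:
    """Base for every SLICC-generated machine."""
    def __init__(self):
        self.in_ports = []

class AbstractDirectoryController(AbstractController):
    """Only directory machines get memory-side request queues."""
    def __init__(self, mem_queue_depth):
        super().__init__()
        self.mem_req_queue = []
        self.mem_queue_depth = mem_queue_depth

    def queue_memory_read(self, addr):
        if len(self.mem_req_queue) >= self.mem_queue_depth:
            return False          # finite buffering: caller must stall
        self.mem_req_queue.append(("rd", addr))
        return True

class L2CacheController(AbstractController):
    pass                          # no memory queues, no assertions needed

d = AbstractDirectoryController(mem_queue_depth=1)
assert d.queue_memory_read(0x40)
assert not d.queue_memory_read(0x80)   # queue full, directory stalls
```

In the current layout, by contrast, every generated machine inherits the memory-queue interface and non-directory callers must be fenced off with assertions.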
>
>
> Joel
>
>
>-----Original Message-----
>From: gem5-dev
>[mailto:gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org>] On
>Behalf Of Joel Hestness
>Sent: Monday, February 08, 2016 12:16 PM
>To: Gross, Joe
>Cc: gem5 Developer List
>Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl
>
>Hi guys,
> I just posted a draft of my DRAMCtrl flow-control patch so you can take
>a look here: http://reviews.gem5.org/r/3315/
>
> NOTE: I have a separate patch that changes Ruby's QueuedMasterPort from
>directories to memory controllers into a MasterPort, and it places a
>MessageBuffer in front of the MasterPort, so that the user can make all
>buffering finite within a Ruby memory hierarchy. I still need to merge
>this patch with gem5, before I can share it. Let me know if you'd like to
>see the draft there also.
>
>@Joe:
>
>
>> I'd be curious to see a patch of what you're proposing as I'm not sure
>> I really follow what you're doing. The reason I ask is because I have
>> been discussing an implementation with with Brad and would like to see
>> how similar it is to what you have. Namely it's an idea similar to
>> what is commonly used in hardware, where senders have tokens that
>> correspond to slots in the receiver queue so the reservation happens
>> at startup. The only communication that goes from a receiving port
>> back to a sender is token return. The port and queue would still be
>> coupled and the device which owns the Queued*Port would manage removal
>> from the PacketQueue. In my experience, this is a very effective
>> mechanism for flow control and addresses your point about transparency
>>of the queue and its state.
>> The tokens removes the need for unblock callbacks, but it's the
>> responsibility of the receiver not to send when the queue is full or
>> when it has a conflicting request. There's no implementation yet, but
>> the simplicity and similarity to hardware techniques may prove useful.
>> Anyway, could you post something so I can better understand what you've
>>described?
>
>
>My implementation effectively does what you're describing: The DRAMCtrl
>now has a finite number of buffers (i.e. tokens), and it allocates a
>buffer slot when a request is received (senders spend a token when the
>DRAMCtrl accepts a request). The only real difference is that the
>DRAMCtrl now implements a SlavePort with flow control consistent with the
>rest of gem5, so if there are no buffer slots available, the request is
>nacked and a retry must be sent (i.e. a token is returned).
>
>
>Please don't get rid of the Queued*Ports, as I think there is a simple way
>> to improve them to do efficient flow control.
>>
>
>Heh... not sure I have the time/motivation to remove the Queued*Ports
>myself. I've just been swapping out the Queued*Ports that break when
>trying to implement finite buffering in a Ruby memory hierarchy. I'll
>leave Queued*Ports for later fixing or removal, as appropriate.
>
>
> Joel
>
>
>________________________________________
>> From: gem5-dev
>><gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org>> on behalf
>>of Joel Hestness
>> < ***@gmail.com<mailto:***@gmail.com>>
>> Sent: Friday, February 5, 2016 12:03 PM
>> To: Andreas Hansson
>> Cc: gem5 Developer List
>> Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
>> DRAMCtrl
>>
>> Hi guys,
>> Quick updates on this:
>> 1) I have a finite response buffer implementation working. I
>> removed the QueuedSlavePort and added a response queue with reservation
>>(Andreas'
>> underlying suggestion). I have a question with this solution: The
>> QueuedSlavePort prioritized responses based their scheduled response
>>time.
>> However, since writes have a shorter pipeline from request to
>> response, this architecture prioritized write requests ahead of read
>> requests received earlier, and it performs ~1-8% worse than a strict
>> queue (what I've implemented at this point). I can make the response
>> queue a priority queue if we want the same structure as previously,
>> but I'm wondering if we might prefer to just have the better-performing
>>strict queue.
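The reordering Joel measures can be shown in a few lines. The latencies below are invented round numbers, not gem5's timings: ordering responses by their scheduled ready tick (the old QueuedSlavePort behaviour) lets a later write overtake an earlier read, while a strict FIFO preserves arrival order.

```python
# Illustration only: priority-by-ready-time vs. strict FIFO ordering.
import heapq

READ_LAT, WRITE_LAT = 20, 5        # assumed request-to-response latencies

arrivals = [(0, "read A"), (1, "write B")]   # (arrival tick, request)

# QueuedSlavePort-style: responses ordered by response-ready tick.
pq = []
for t, req in arrivals:
    lat = WRITE_LAT if req.startswith("write") else READ_LAT
    heapq.heappush(pq, (t + lat, req))       # write B ready at 6, read A at 20
priority_order = [heapq.heappop(pq)[1] for _ in range(len(pq))]

# Strict queue: responses leave in arrival order.
fifo_order = [req for _, req in arrivals]

assert priority_order == ["write B", "read A"]
assert fifo_order == ["read A", "write B"]
```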
>>
>> 2) To reflect on Andreas' specific suggestion of using unblock
>> callbacks from the PacketQueue: Modifying the QueuedSlavePort with
>> callbacks is ugly when trying to call the callback: The call needs to
>> originate from PacketQueue::sendDeferredPacket(), but PacketQueue
>> doesn't have a pointer to the owner component; The SlavePort has the
>> pointer, so the PacketQueue would need to first callback to the port,
>> which would call the owner component callback.
>> The exercise getting this to work has solidified my opinion that the
>> Queued*Ports should probably be removed from the codebase: Queues and
>> ports are separate subcomponents of simulated components, and only the
>> component knows how they should interact. Including a Queued*Port
>> inside a component requires the component to manage the flow-control
>> into the Queued*Port just as it would need to manage a standard port
>> anyway, and hiding the queue in the port obfuscates how it is managed.
>>
>>
>> Thanks!
>> Joel
>>
>>
>> On Thu, Feb 4, 2016 at 10:06 AM, Joel Hestness
>><***@gmail.com<mailto:***@gmail.com>>
>> wrote:
>>
>> > Hi Andreas,
>> > Thanks for the input. I had tried adding front- and back-end
>> > queues within the DRAMCtrl, but it became very difficult to
>> > propagate the flow control back through the component due to the
>> > complicated implementation
>> of
>> > timing across different accessAndRespond() calls. I had to put this
>> > solution on hold.
>> >
>> > I think your proposed solution should simplify the flow control
>> > issue, and should have the derivative effect of making the
>> > Queued*Ports capable
>> of
>> > flow control. I'm a little concerned that your solution would make
>> > the buffering very fluid, and I'm not sufficiently familiar with
>> > memory controller microarchitecture to know if that would be
>> > realistic. I wonder if you might have a way to do performance
>> > validation after I work through either of these implementations.
>> >
>> > Thanks!
>> > Joel
>> >
>> >
>> >
>> > On Wed, Feb 3, 2016 at 11:29 AM, Andreas Hansson <
>> ***@arm.com<mailto:***@arm.com>>
>> > wrote:
>> >
>> >> Hi Joel,
>> >>
>> >> I would suggest o keep the queued ports, but add methods to reserve
>> >> resources, query if it has free space, and a way to register
>> >> callbacks
>> so
>> >> that the MemObject is made aware when packets are sent. That way we
>> >> can
>> use
>> >> the queue in the cache, memory controller etc, without having all
>> >> the issues of the “naked” port interface, but still enforcing a
>> >> bounded
>> queue.
>> >>
>> >> When a packet arrives to the module we call reserve on the output
>>port.
>> >> Then when we actually add the packet we know that there is space.
>> >> When request packets arrive we check if the queue is full, and if
>> >> so we block any new requests. Then through the callback we can
>> >> unblock the DRAM controller in this case.
>> >>
>> >> What do you think?
>> >>
>> >> Andreas
>> >>
>> >> From: Joel Hestness
>><***@gmail.com<mailto:***@gmail.com>>
>> >> Date: Tuesday, 2 February 2016 at 00:24
>> >> To: Andreas Hansson
>><***@arm.com<mailto:***@arm.com>>
>> >> Cc: gem5 Developer List <gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
>> >> Subject: Follow-up: Removing QueuedSlavePort from DRAMCtrl
>> >>
>> >> Hi Andreas,
>> >> I'd like to circle back on the thread about removing the
>> >> QueuedSlavePort response queue from DRAMCtrl. I've been working to
>> >> shift over to DRAMCtrl from the RubyMemoryController, but nearly
>> >> all of my simulations now crash on the DRAMCtrl's response queue.
>> >> Since I need the DRAMCtrl to work, I'll be looking into this now.
>> >> However, based on my inspection of the code, it looks pretty
>> >> non-trivial to remove the QueueSlavePort, so I'm hoping you can at
>> >> least help me work through the changes.
>> >>
>> >> To reproduce the issue, I've put together a slim gem5 patch
>> >> (attached) to use the memtest.py script to generate accesses.
>> >> Here's the command
>> line
>> >> I used:
>> >>
>> >> % build/X86/gem5.opt --debug-flag=DRAM --outdir=$outdir
>> >> configs/example/memtest.py -u 100
>> >>
>> >> If you're still willing to take a stab at it, let me know if/how
>> >> I can help. Otherwise, I'll start working on it. It seems the
>> >> trickiest thing
>> is
>> >> going to be modeling the arbitrary frontendLatency and
>> >> backendLatency
>> while
>> >> still counting all of the accesses that are in the controller when
>> >> it
>> needs
>> >> to block back to the input queue. These latencies are currently
>> >> assessed with scheduling in the port response queue. Any
>> >> suggestions you could
>> give
>> >> would be appreciated.
>> >>
>> >> Thanks!
>> >> Joel
>> >>
>> >>
>> >> Below here is our conversation from the email thread "[gem5-dev]
>> >> Review Request 3116: ruby: RubyMemoryControl delete requests"
>> >>
>> >> On Wed, Sep 23, 2015 at 3:51 PM, Andreas Hansson <
>> ***@arm.com<mailto:***@arm.com>
>> >> > wrote:
>> >>
>> >>> Great. Thanks Joel.
>> >>>
>> >>> If anything pops up on our side I’ll let you know.
>> >>>
>> >>> Andreas
>> >>>
>> >>> From: Joel Hestness
>><***@gmail.com<mailto:***@gmail.com>>
>> >>> Date: Wednesday, 23 September 2015 20:29
>> >>>
>> >>> To: Andreas Hansson
>><***@arm.com<mailto:***@arm.com>>
>> >>> Cc: gem5 Developer List
>><gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
>> >>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
>> >>> RubyMemoryControl delete requests
>> >>>
>> >>>
>> >>>
>> >>>> I don’t think there is any big difference in our expectations,
>> >>>> quite the contrary :-). GPUs are very important to us (and so is
>> >>>> throughput computing in general), and we run plenty of simulations
>> >>>> with lots of memory-level parallelism from non-CPU components.
>> >>>> Still, we haven’t
>> run
>> >>>> into the issue.
>> >>>>
>> >>>
>> >>> Ok, cool. Thanks for the context.
>> >>>
>> >>>
>> >>> If you have practical examples that run into problems let me know,
>> >>> and
>> >>>> we’ll get it fixed.
>> >>>>
>> >>>
>> >>> I'm having trouble assembling a practical example (with or without
>> using
>> >>> gem5-gpu). I'll keep you posted if I find something reasonable.
>> >>>
>> >>> Thanks!
>> >>> Joel
>> >>>
>> >>>
>> >>>
>> >>>> From: Joel Hestness
>><***@gmail.com<mailto:***@gmail.com>>
>> >>>> Date: Tuesday, 22 September 2015 19:58
>> >>>>
>> >>>> To: Andreas Hansson
>><***@arm.com<mailto:***@arm.com>>
>> >>>> Cc: gem5 Developer List
>><gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
>> >>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
>> >>>> RubyMemoryControl delete requests
>> >>>>
>> >>>> Hi Andreas,
>> >>>>
>> >>>>
>> >>>>> If it is a real problem affecting end users I am indeed
>> >>>>> volunteering to fix the DRAMCtrl use of QueuedSlavePort. In the
>> >>>>> classic memory
>> system
>> >>>>> there are enough points of regulation (LSQs, MSHR limits,
>> >>>>> crossbar
>> layers
>> >>>>> etc) that having a single memory channel with >100 queued up
>> responses
>> >>>>> waiting to be sent is extremely unlikely. Hence, until now the
>> >>>>> added complexity has not been needed. If there is regulation on
>> >>>>> the number
>> of
>> >>>>> requests in Ruby, then I would argue that it is equally unlikely
>> there…I
>> >>>>> could be wrong.
>> >>>>>
>> >>>>
>> >>>> Ok. I think a big part of the difference between our expectations
>> >>>> is just the cores that we're modeling. AMD and gem5-gpu can model
>> aggressive
>> >>>> GPU cores with potential to expose, perhaps, 4-32x more
>> >>>> memory-level parallel requests than a comparable number of
>> >>>> multithreaded CPU
>> cores. I
>> >>>> feel that this difference warrants different handling of accesses
>> >>>> in
>> the
>> >>>> memory controller.
>> >>>>
>> >>>> Joel
>> >>>>
>> >>>>
>> >>>>
>> >>>> From: Joel Hestness
>><***@gmail.com<mailto:***@gmail.com>>
>> >>>>> Date: Tuesday, 22 September 2015 17:48
>> >>>>>
>> >>>>> To: Andreas Hansson
>><***@arm.com<mailto:***@arm.com>>
>> >>>>> Cc: gem5 Developer List
>><gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
>> >>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
>> >>>>> RubyMemoryControl delete requests
>> >>>>>
>> >>>>> Hi Andreas,
>> >>>>>
>> >>>>> Thanks for the "ship it!"
>> >>>>>
>> >>>>>
>> >>>>>> Do we really need to remove the use of QueuedSlavePort in
>>DRAMCtrl?
>> >>>>>> It will make the controller more complex, and I don’t want to
>> >>>>>> do it
>> “just
>> >>>>>> in case”.
>> >>>>>>
>> >>>>>
>> >>>>> Sorry, I misread your email as offering to change the DRAMCtrl.
>> >>>>> I'm not sure who should make that change, but I think it should
>> >>>>> get
>> done. The
>> >>>>> memory access response path starts at the DRAMCtrl and ends at
>> >>>>> the RubyPort. If we add control flow to the RubyPort, packets
>> >>>>> will
>> probably
>> >>>>> back-up more quickly on the response path back to where there
>> >>>>> are
>> open
>> >>>>> buffers. I expect the DRAMCtrl QueuedPort problem becomes more
>> prevalent as
>> >>>>> Ruby adds flow control, unless we add a limitation on
>> >>>>> outstanding
>> requests
>> >>>>> to memory from directory controllers.
>> >>>>>
>> >>>>> How does the classic memory model deal with this?
>> >>>>>
>> >>>>> Joel
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>> From: Joel Hestness
>><***@gmail.com<mailto:***@gmail.com>>
>> >>>>>> Date: Tuesday, 22 September 2015 17:30
>> >>>>>> To: Andreas Hansson
>><***@arm.com<mailto:***@arm.com>>
>> >>>>>> Cc: gem5 Developer List
>><gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
>> >>>>>>
>> >>>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
>> >>>>>> RubyMemoryControl delete requests
>> >>>>>>
>> >>>>>> Hi guys,
>> >>>>>> Thanks for the discussion here. I had quickly tested other
>> >>>>>> memory controllers, but hadn't connected the dots that this
>> >>>>>> might be the
>> same
>> >>>>>> problem Brad/AMD are running into.
>> >>>>>>
>> >>>>>> My preference would be that we remove the QueuedSlavePort
>> >>>>>> from the DRAMCtrls. That would at least eliminate DRAMCtrls as
>> >>>>>> a potential
>> source of
>> >>>>>> the QueueSlavePort packet overflows, and would allow us to more
>> closely
>> >>>>>> focus on the RubyPort problem when we get to it.
>> >>>>>>
>> >>>>>> Can we reach resolution on this patch though? Are we okay
>> >>>>>> with actually fixing the memory leak in mainline?
>> >>>>>>
>> >>>>>> Joel
>> >>>>>>
>> >>>>>>
>> >>>>>> On Tue, Sep 22, 2015 at 11:19 AM, Andreas Hansson <
>> >>>>>> ***@arm.com<mailto:***@arm.com>> wrote:
>> >>>>>>
>> >>>>>>> Hi Brad,
>> >>>>>>>
>> >>>>>>> We can remove the use of QueuedSlavePort in the memory
>> >>>>>>> controller
>> and
>> >>>>>>> simply not accept requests if the response queue is full. Is
>> >>>>>>> this needed?
>> >>>>>>> If so we’ll make sure someone gets this in place. The only
>> >>>>>>> reason
>> we
>> >>>>>>> haven’t done it is because it hasn’t been needed.
>> >>>>>>>
>> >>>>>>> The use of QueuedPorts in the Ruby adapters is a whole
>> >>>>>>> different story. I think most of these can be removed and
>> >>>>>>> actually use flow control.
>> I’m
>> >>>>>>> happy to code it up, but there is such a flux at the moment
>> >>>>>>> that I didn’t want to post yet another patch changing the Ruby
>> >>>>>>> port. I really do think we should avoid having implicit
>> >>>>>>> buffers for 1000’s of kilobytes to the largest extent
>> >>>>>>> possible. If we really need a constructor parameter to make it
>> >>>>>>> “infinite” for some quirky Ruby use-case, then let’s do that...
>> >>>>>>>
>> >>>>>>> Andreas
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On 22/09/2015 17:14, "gem5-dev on behalf of Beckmann, Brad"
>> >>>>>>> <gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org> on
>>behalf of ***@amd.com<mailto:***@amd.com>>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>> >From AMD's perspective, we have deprecated our usage of
>> >>>>>>> RubyMemoryControl
>> >>>>>>> >and we are using the new Memory Controllers with the port
>> interface.
>> >>>>>>> >
>> >>>>>>> >That being said, I completely agree with Joel that the packet
>> queue
>> >>>>>>> >finite invisible buffer limit of 100 needs to go! As you
>> >>>>>>> >know, we
>> >>>>>>> tried
>> >>>>>>> >very hard several months ago to essentially make this an
>> >>>>>>> >infinite
>> >>>>>>> buffer,
>> >>>>>>> >but Andreas would not allow us to check it in. We are going
>> >>>>>>> >to
>> >>>>>>> post that
>> >>>>>>> >patch again in a few weeks when we post our GPU model. Our
>> >>>>>>> >GPU
>> >>>>>>> model
>> >>>>>>> >will not work unless we increase that limit.
>> >>>>>>> >
>> >>>>>>> >Andreas you keep arguing that if you exceed that limit, that
>> >>>>>>> something is
>> >>>>>>> >fundamentally broken. Please keep in mind that there are
>> >>>>>>> >many
>> uses
>> >>>>>>> of
>> >>>>>>> >gem5 beyond what you use it for. Also this is a research
>> simulator
>> >>>>>>> and
>> >>>>>>> >we should not restrict ourselves to what we think is
>> >>>>>>> >practical in
>> >>>>>>> real
>> >>>>>>> >hardware. Finally, the fact that the finite limit is
>> >>>>>>> >invisible to
>> >>>>>>> the
>> >>>>>>> >producer is just bad software engineering.
>> >>>>>>> >
>> >>>>>>> >I beg you to please allow us to remove this finite invisible
>> limit!
>> >>>>>>> >
>> >>>>>>> >Brad
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>> >-----Original Message-----
>> >>>>>>> >From: gem5-dev
>>[mailto:gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org>] On
>>Behalf
>> >>>>>>> >Of
>> >>>>>>> Andreas
>> >>>>>>> >Hansson
>> >>>>>>> >Sent: Tuesday, September 22, 2015 6:35 AM
>> >>>>>>> >To: Andreas Hansson; Default; Joel Hestness
>> >>>>>>> >Subject: Re: [gem5-dev] Review Request 3116: ruby:
>> RubyMemoryControl
>> >>>>>>> >delete requests
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>> >> On Sept. 21, 2015, 8:42 a.m., Andreas Hansson wrote:
>> >>>>>>> >> > Can we just prune the whole RubyMemoryControl rather? Has
>> >>>>>>> >> > it
>> >>>>>>> not been
>> >>>>>>> >>deprecated long enough?
>> >>>>>>> >>
>> >>>>>>> >> Joel Hestness wrote:
>> >>>>>>> >> Unless I'm overlooking something, for Ruby users, I
>> >>>>>>> >> don't
>> see
>> >>>>>>> other
>> >>>>>>> >>memory controllers that are guaranteed to work. Besides
>> >>>>>>> >>RubyMemoryControl, all others use a QueuedSlavePort for
>> >>>>>>> >>their
>> input
>> >>>>>>> >>queues. Given that Ruby hasn't added complete flow control,
>> >>>>>>> PacketQueue
>> >>>>>>> >>size restrictions can be exceeded (triggering the panic).
>> >>>>>>> >>This
>> >>>>>>> occurs
>> >>>>>>> >>infrequently/irregularly with aggressive GPUs in gem5-gpu,
>> >>>>>>> >>and
>> >>>>>>> appears
>> >>>>>>> >>difficult to fix in a systematic way.
>> >>>>>>> >>
>> >>>>>>> >> Regardless of the fact we've deprecated
>> >>>>>>> >> RubyMemoryControl,
>> >>>>>>> this is
>> >>>>>>> >>a necessary fix.
>> >>>>>>> >
>> >>>>>>> >No memory controller is using QueuedSlavePort for any
>> >>>>>>> >_input_
>> >>>>>>> queues.
>> >>>>>>> >The DRAMCtrl class uses it for the response _output_ queue,
>> >>>>>>> >that's
>> >>>>>>> all.
>> >>>>>>> >If that is really an issue we can move away from it and
>> >>>>>>> >enforce an
>> >>>>>>> upper
>> >>>>>>> >bound on responses by not accepting new requests. That said,
>> >>>>>>> >if we
>> >>>>>>> hit
>> >>>>>>> >the limit I would argue something else is fundamentally
>> >>>>>>> >broken in
>> >>>>>>> the
>> >>>>>>> >system and should be addressed.
>> >>>>>>> >
>> >>>>>>> >In any case, the discussion whether to remove
>> >>>>>>> >RubyMemoryControl or
>> >>>>>>> not
>> >>>>>>> >should be completely decoupled.
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>> >- Andreas
>> >>>>>>>
>> >>>>>>
>> >>
>> >> --
>> >> Joel Hestness
>> >> PhD Candidate, Computer Architecture
>> >> Dept. of Computer Science, University of Wisconsin - Madison
>> >> http://pages.cs.wisc.edu/~hestness/
>> >> IMPORTANT NOTICE: The contents of this email and any attachments
>> >> are confidential and may also be privileged. If you are not the
>> >> intended recipient, please notify the sender immediately and do not
>> >> disclose the contents to any other person, use it for any purpose,
>> >> or store or copy
>> the
>> >> information in any medium. Thank you.
>> >>
>> _______________________________________________
>> gem5-dev mailing list
>> gem5-***@gem5.org<mailto:gem5-***@gem5.org>
>> http://m5sim.org/mailman/listinfo/gem5-dev
>>
>
>
>
>

Poremba, Matthew
2016-03-01 01:20:38 UTC
Permalink
Hi Andreas,


Sure. I did not have any plans to implement any types of flow control other than what I have provided in the posted patch, but I am simply stating it should be possible if someone were to use gem5 for that kind of research.

What I have implemented is closest to hop-to-hop credit based in your terms. One thing I would like to accomplish with this is the ability for the master mem object to decide how to operate given information about slave buffer availability. This can be used for packet prioritization, throttling, etc. based on information available at the directory, especially in Ruby between directory and DRAM, but requires something like a credit/token-based scheme.

I am currently only making changes to very specific mem objects to test this. I don't plan on touching much of the classic memory system beyond what is needed for Ruby. Ruby's flow control is very much credit-based already, so an "end-to-end" credit system is just a matter of changing interfaces between Ruby and classic. I've also tried to set up the configuration so that it would not greatly impact external interfaces such as SystemC TLM 2.

For speed/complexity, I don't see much of a change in simulation time since the event is simply moved from sending a retry to returning a token. Complexity-wise, we can actually reduce common code by moving it into the flow controller and out of the mem object. We may have to make sure that having something called "flow control" implemented in just a few lines in some mem objects isn't confusing to users, though.


-Matt


-----Original Message-----
From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf Of Andreas Hansson
Sent: Monday, February 29, 2016 1:56 PM
To: gem5 Developer List; Joel Hestness
Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl

Hi Matt,

Can we hit the pause button please?

Your proposed changes all look sensible in terms of allowing different types of flow control, but I would really like to understand _what_ we are trying to accomplish here, and what trade-offs in terms of speed and complexity we are willing to make.

I fundamentally do not think it is wise to try and support all options (hop-to-hop valid/accept, hop-to-hop credit based, end-to-end credit based etc). Is your proposal to switch both classic and Ruby to end-to-end credit based? Fixed credits or opportunistic as well? Etc etc.

At the moment the classic memory system (and the port infrastructure) resembles AMBA ACE, in that it uses hop-to-hop valid/accept, with separate flow control for reads and writes, snoops and snoop responses. It is also worth noting that the current port flow control is fairly well aligned with SystemC TLM 2. Any changes need to be considered in this context.

The patch that Joel posted for adding flow-control within the memory controller is a clear win and we should get that in once it has converged.
Any news on that front Joel?

Thanks,

Andreas

On 29/02/2016, 18:50, "gem5-dev on behalf of Poremba, Matthew"
<gem5-dev-***@gem5.org on behalf of ***@amd.com> wrote:

>Hi Joel/All,
>
>
>I have my proposed changes up on the reviewboard finally. I’ve tried to
>make the flow control API generic enough that any of the common flow
>control types could be implemented (i.e., ack/nack [retries],
>token-based, credit-based, Xon/Xoff, etc.), but there are still a few
>conflicts between our implementations. Specifically, the
>serviceMemoryQueue method in AbstractController is handled differently
>by different flow controls, but SLICC changes are the same as yours.
>You can see the patches here (they’ll need to be applied in the order
>below for anyone interested):
>
>http://reviews.gem5.org/r/3354/
>http://reviews.gem5.org/r/3355/
>http://reviews.gem5.org/r/3356/
>
>I am still working on making the flow control more easily configurable
>within python scripts. Currently this is possible but seems to require
>passing the command line options all over the place. I’m also working
>on something to pass flow control calls through connector type
>MemObjects (Xbar, Bridge, CommMonitor), which will make the potential
>for QoS capability much more interesting (for example, flow control can
>monitor MasterIDs and do prioritization/throttling/balancing).
>
>
>-Matt
>
>From: Joel Hestness [mailto:***@gmail.com]
>Sent: Saturday, February 13, 2016 9:07 AM
>To: Poremba, Matthew
>Cc: gem5 Developer List; Gross, Joe
>Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
>DRAMCtrl
>
>Hi Matt,
>
>‘That said, I realize that by "token" structure, Joe and you might be
>describing something more than what I've implemented. Namely, since
>tokens are the credits that allow senders to push into a receiver's
>queues, they might allow multiple directories/caches sending to a
>single DRAMCtrl, which I don't believe is possible with my current
>implementation. I think we'd need to allow the DRAMCtrl to receive
>requests and queue retries while other requesters are blocked, and
>sending those retries would need fair arbitration, which a token scheme
>might automatically handle. Can you clarify if that's what you're
>referring to as a token scheme?’
>
>A token scheme would not use a retry/unblock mechanisms at all. The
>number of tokens available is sent to each producer from a consumer
>when the ports are connected/start of simulation. In this regard, the
>producers know how many requests can be sent and stop sending once the
>tokens are exhausted. The consumer will return tokens once a request is
>handled. This removes the need for retries and unblock calls, reduces
>overall complexity, and is closer to hardware implementations imo. The
>token scheme would indeed automatically handle the situation where
>multiple producers are blocked and can also be hidden away in the port
>without needing to add a retry queue to consumers, which I don’t
>believe is a great idea.
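A bare-bones illustration of the token scheme described above, reduced to its flow-control logic. This is a hypothetical sketch for discussion only: class and method names (`Producer`, `Consumer`, `trySend`, `returnCredit`) are invented here and are not the interfaces in the posted patches.

```cpp
// Credit (token) based flow control: the consumer hands out its queue
// capacity as credits at connect time; the producer sends only while it
// holds credits, so no retry/unblock path is needed. The only feedback
// from consumer to producer is the token return.
#include <cassert>
#include <deque>

class Consumer {
  public:
    explicit Consumer(int slots) : slots_(slots) {}
    int initialCredits() const { return slots_; }
    // Accept can never fail: the producer only sends while holding a credit.
    void recvRequest(int pkt) { queue_.push_back(pkt); }
    // Handling a request frees a slot; the credit goes back to the producer.
    bool handleOne(class Producer &p);
  private:
    int slots_;
    std::deque<int> queue_;
};

class Producer {
  public:
    void connect(Consumer *c) { consumer_ = c; credits_ = c->initialCredits(); }
    bool trySend(int pkt) {
        if (credits_ == 0) return false;   // out of credits: caller holds pkt
        --credits_;
        consumer_->recvRequest(pkt);
        return true;
    }
    void returnCredit() { ++credits_; }
    int credits() const { return credits_; }
  private:
    Consumer *consumer_ = nullptr;
    int credits_ = 0;
};

bool Consumer::handleOne(Producer &p) {
    if (queue_.empty()) return false;
    queue_.pop_front();
    p.returnCredit();                      // token return is the only feedback
    return true;
}
```

With multiple producers connected to one consumer, each would get its own credit pool, which is how the scheme sidesteps retry arbitration entirely.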
>
>
>Ok. Yes, that makes sense. I look forward to seeing your changes.
>
>
> Thanks!
> Joel
>
>
>From: Joel Hestness
>[mailto:***@gmail.com<mailto:***@gmail.com>]
>Sent: Thursday, February 11, 2016 2:52 PM
>To: gem5 Developer List
>Cc: Gross, Joe; Poremba, Matthew
>
>Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
>DRAMCtrl
>
>Hi Matt,
>
>In regards to the buffersFull() implementation, I can think of a
>pathological case where the back-end queue is full because the sender
>is not accepting responses (for whatever reason) but is still issuing
>requests. buffersFull() will return false in this case and allow the
>request to be enqueued and eventually scheduled, causing the back-end
>queue to grow larger than the response_buffer_size parameter.
>
>Perhaps one way to better emulate exchanging tokens (credit) as Joe
>mentioned is to have buffersFull() "reserve" slots in the queues by
>making sure there is a slot in both the read queue (or write queue) and
>a corresponding slot available in the back-end queue. The reservation
>can be lifted once the response is sent on the port.
>
>I'm not sure I understand the difference between this description and
>what I've implemented, except that what I've implemented adds some
>extra back-end queuing. The capacity of the back-end queue in my
>implementation is equal to the sum of the read and write queue
>capacities (plus a little
>extra: response_buffer_size). The reservation of a slot in this large
>back-end queue is released when a response is sent through the port, as
>you describe. To me, this seems exactly the way a token-like structure
>would reserve back-end queue slots.
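The joint reservation Matt suggests, where buffersFull() only admits a request if there is room in the read/write queue *and* a back-end response slot can be reserved, can be sketched in a few lines. This is an illustrative toy (the names `JointReservation`, `accept`, `respSent` are invented, and real reads/writes would have separate queues), not the DRAMCtrl code:

```cpp
// Reserve a front-end (read/write queue) slot and a back-end response slot
// together when a request is accepted, so a sender that refuses responses
// but keeps issuing requests cannot overflow the back-end queue.
#include <cassert>
#include <cstddef>

struct JointReservation {
    JointReservation(size_t rw, size_t be) : rwCap(rw), beCap(be) {}

    bool buffersFull() const { return rwUsed >= rwCap || beUsed >= beCap; }

    bool accept() {                   // reserve both slots atomically
        if (buffersFull()) return false;
        ++rwUsed;
        ++beUsed;
        return true;
    }
    void scheduled() { --rwUsed; }    // request left the read/write queue
    void respSent()  { --beUsed; }    // back-end reservation lifted on send

    size_t rwCap, beCap;              // front-end and back-end capacities
    size_t rwUsed = 0, beUsed = 0;
};
```

Note that the back-end count is only decremented when the response actually leaves on the port, which is exactly the "reservation lifted once the response is sent" behavior described above.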
>
>That said, I realize that by "token" structure, Joe and you might be
>describing something more than what I've implemented. Namely, since
>tokens are the credits that allow senders to push into a receiver's
>queues, they might allow multiple directories/caches sending to a
>single DRAMCtrl, which I don't believe is possible with my current
>implementation. I think we'd need to allow the DRAMCtrl to receive
>requests and queue retries while other requesters are blocked, and
>sending those retries would need fair arbitration, which a token scheme
>might automatically handle. Can you clarify if that's what you're
>referring to as a token scheme?
>
>
>Another more aggressive implementation would be to not use
>buffersFull() and prevent scheduling memory requests from the
>read/write queue if the back-end queue is full. This would allow a
>sender to enqueue memory requests even if the back-end queue is full up
>until the read/write queue fills up, but would require a number of changes to the code.
>
>Yes, I tried implementing this first, and it ends up being very
>difficult due to the DRAMCtrl's calls to and implementation of accessAndRespond().
>Basically, reads and writes require different processing latencies, so
>we would need not only a back-end queue, but also separate read and
>write delay queues to model the different DRAM access latencies. We'd
>also need a well-performing way to arbitrate for slots in the back-end
>queue that doesn't conflict with the batching efforts of the front-end.
>To me, all this complexity seems misaligned with Andreas et al.'s
>original aim with the DRAMCtrl: fast and reasonably accurate simulation
>of a memory controller<http://web.eecs.umich.edu/~twenisch/papers/ispass14.pdf>.
>
>
>In regards to Ruby, I am a bit curious- Are you placing MessageBuffers
>in the SLICC files and doing away with the
>queueMemoryRead/queueMemoryWrite calls or are you placing a
>MessageBuffer in AbstractController? I am currently trying out an
>implementation using the former for a few additional reasons other than flow control.
>
>If I understand what you're asking, I think I've also done the former,
>though I've modified SLICC and the AbstractController to deal with
>parts of the buffer management. I've merged my code with a recent gem5
>revision
>(11315:10647f5d0f7f) so I could post a draft review request. Here are
>the patches (including links) to test all of this:
>
> - http://reviews.gem5.org/r/3331/
> - http://reviews.gem5.org/r/3332/
> -
>http://pages.cs.wisc.edu/~hestness/links/MOESI_hammer_test_finite_queues
> - http://pages.cs.wisc.edu/~hestness/links/cpu_memory_demand
>
>More holistically, I feel that the best solution would be to hide the
>memory request and response queues in an AbstractDirectoryController
>class that inherits from AbstractController in C++, and from which all
>SLICC directory controller machines descend. This structure would move
>all the directory-specific code out of AbstractController and not model
>it in other SLICC generated machines. This would also eliminate the
>need for assertions that only directory controllers are calling the
>directory-specific functions.
>
>
> Joel
>
>
>-----Original Message-----
>From: gem5-dev
>[mailto:gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org>] On
>Behalf Of Joel Hestness
>Sent: Monday, February 08, 2016 12:16 PM
>To: Gross, Joe
>Cc: gem5 Developer List
>Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
>DRAMCtrl
>
>Hi guys,
> I just posted a draft of my DRAMCtrl flow-control patch so you can
>take a look here: http://reviews.gem5.org/r/3315/
>
> NOTE: I have a separate patch that changes Ruby's QueuedMasterPort
>from directories to memory controllers into a MasterPort, and it places
>a MessageBuffer in front of the MasterPort, so that the user can make
>all buffering finite within a Ruby memory hierarchy. I still need to
>merge this patch with gem5, before I can share it. Let me know if you'd
>like to see the draft there also.
>
>@Joe:
>
>
>> I'd be curious to see a patch of what you're proposing as I'm not
>>sure I really follow what you're doing. The reason I ask is because I
>>have been discussing an implementation with with Brad and would like
>>to see how similar it is to what you have. Namely it's an idea
>>similar to what is commonly used in hardware, where senders have
>>tokens that correspond to slots in the receiver queue so the
>>reservation happens at startup. The only communication that goes from
>>a receiving port back to a sender is token return. The port and queue
>>would still be coupled and the device which owns the Queued*Port
>>would manage removal from the PacketQueue. In my experience, this is
>>a very effective mechanism for flow control and addresses your point
>>about transparency of the queue and its state.
>> The tokens removes the need for unblock callbacks, but it's the
>>responsibility of the receiver not to send when the queue is full or
>>when it has a conflicting request. There's no implementation yet, but
>>the simplicity and similarity to hardware techniques may prove useful.
>> Anyway, could you post something so I can better understand what
>>you've described?
>
>
>My implementation effectively does what you're describing: The DRAMCtrl
>now has a finite number of buffers (i.e. tokens), and it allocates a
>buffer slot when a request is received (senders spend a token when the
>DRAMCtrl accepts a request). The only real difference is that the
>DRAMCtrl now implements a SlavePort with flow control consistent with
>the rest of gem5, so if there are no buffer slots available, the
>request is nacked and a retry must be sent (i.e. a token is returned).
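The retry-style flow control described here, nack when buffers are full and send a retry once a slot frees up, can be sketched as below. This is a minimal illustration and not the actual DRAMCtrl patch; `Ctrl` and its members are invented names, and the `retrySent_` flag stands in for the real `sendRetry` call on the slave port:

```cpp
// Valid/accept flow control with retries: recvTimingReq() returns false
// when the finite buffer pool is exhausted; freeing a buffer when a
// response departs is the "token return" that triggers the retry.
#include <cassert>
#include <cstddef>
#include <deque>

class Ctrl {
  public:
    explicit Ctrl(size_t bufs) : bufs_(bufs) {}

    // Mirrors a slave port's recvTimingReq(): false means "retry later".
    bool recvTimingReq(int pkt) {
        if (inFlight_.size() >= bufs_) {
            retryPending_ = true;          // remember to wake the sender
            return false;
        }
        inFlight_.push_back(pkt);
        return true;
    }

    // A response leaving the controller frees a buffer slot; if a sender
    // was nacked, this is where the retry would be issued.
    bool respond() {
        if (inFlight_.empty()) return false;
        inFlight_.pop_front();
        if (retryPending_) {
            retryPending_ = false;
            retrySent_ = true;             // stands in for sendRetry()
        }
        return true;
    }

    bool retrySent_ = false;
  private:
    size_t bufs_;
    std::deque<int> inFlight_;
    bool retryPending_ = false;
};
```

One limitation visible even in this sketch is the single pending-retry flag: with multiple blocked senders, fair arbitration of retries needs extra machinery, which is where the token scheme discussed earlier has an edge.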
>
>
>Please don't get rid of the Queued*Ports, as I think there is a simple
>way
>> to improve them to do efficient flow control.
>>
>
>Heh... not sure I have the time/motivation to remove the Queued*Ports
>myself. I've just been swapping out the Queued*Ports that break when
>trying to implement finite buffering in a Ruby memory hierarchy. I'll
>leave Queued*Ports for later fixing or removal, as appropriate.
>
>
> Joel
>
>
>________________________________________
>> From: gem5-dev
>><gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org>> on
>>behalf of Joel Hestness <
>>***@gmail.com<mailto:***@gmail.com>>
>> Sent: Friday, February 5, 2016 12:03 PM
>> To: Andreas Hansson
>> Cc: gem5 Developer List
>> Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
>>DRAMCtrl
>>
>> Hi guys,
>> Quick updates on this:
>> 1) I have a finite response buffer implementation working. I
>>removed the QueuedSlavePort and added a response queue with
>>reservation (Andreas'
>> underlying suggestion). I have a question with this solution: The
>>QueuedSlavePort prioritized responses based their scheduled response
>>time.
>> However, since writes have a shorter pipeline from request to
>>response, this architecture prioritized write requests ahead of read
>>requests received earlier, and it performs ~1-8% worse than a strict
>>queue (what I've implemented at this point). I can make the response
>>queue a priority queue if we want the same structure as previously,
>>but I'm wondering if we might prefer to just have the
>>better-performing strict queue.
>>
>> 2) To reflect on Andreas' specific suggestion of using unblock
>> callbacks from the PacketQueue: Modifying the QueuedSlavePort with
>> callbacks is ugly when trying to call the callback: The call needs to
>> originate from PacketQueue::sendDeferredPacket(), but PacketQueue
>> doesn't have a pointer to the owner component; The SlavePort has the
>> pointer, so the PacketQueue would need to first callback to the port,
>> which would call the owner component callback.
>> The exercise of getting this to work has solidified my opinion that
>> the Queued*Ports should probably be removed from the codebase: Queues
>> and ports are separate subcomponents of simulated components, and
>> only the component knows how they should interact. Including a
>> Queued*Port inside a component requires the component to manage the
>> flow-control into the Queued*Port just as it would need to manage a
>> standard port anyway, and hiding the queue in the port obfuscates how it is managed.
>>
>>
>> Thanks!
>> Joel
>>
>>
>> On Thu, Feb 4, 2016 at 10:06 AM, Joel Hestness
>><***@gmail.com<mailto:***@gmail.com>>
>> wrote:
>>
>> > Hi Andreas,
>> > Thanks for the input. I had tried adding front- and back-end
>> > queues within the DRAMCtrl, but it became very difficult to
>> > propagate the flow control back through the component due to the
>> > complicated implementation
>> of
>> > timing across different accessAndRespond() calls. I had to put this
>> > solution on hold.
>> >
>> > I think your proposed solution should simplify the flow control
>> > issue, and should have the derivative effect of making the
>> > Queued*Ports capable
>> of
>> > flow control. I'm a little concerned that your solution would make
>> > the buffering very fluid, and I'm not sufficiently familiar with
>> > memory controller microarchitecture to know if that would be
>> > realistic. I wonder if you might have a way to do performance
>> > validation after I work through either of these implementations.
>> >
>> > Thanks!
>> > Joel
>> >
>> >
>> >
>> > On Wed, Feb 3, 2016 at 11:29 AM, Andreas Hansson <
>> ***@arm.com<mailto:***@arm.com>>
>> > wrote:
>> >
>> >> Hi Joel,
>> >>
>> >> I would suggest to keep the queued ports, but add methods to
>> >> reserve resources, query if it has free space, and a way to
>> >> register callbacks
>> so
>> >> that the MemObject is made aware when packets are sent. That way
>> >> we can
>> use
>> >> the queue in the cache, memory controller etc, without having all
>> >> the issues of the “naked” port interface, but still enforcing a
>> >> bounded
>> queue.
>> >>
>> >> When a packet arrives at the module we call reserve on the output
>>port.
>> >> Then when we actually add the packet we know that there is space.
>> >> When request packets arrive we check if the queue is full, and if
>> >> so we block any new requests. Then through the callback we can
>> >> unblock the DRAM controller in this case.
>> >>
>> >> What do you think?
>> >>
>> >> Andreas
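To make the quoted reserve()/callback proposal concrete, here is a rough, hypothetical sketch of a bounded output queue with a free-space query, up-front reservation, and a drain callback. The names (`BoundedRespQueue`, `onDrain`) are invented for illustration and are not the gem5 Queued*Port API:

```cpp
// Bounded response queue per Andreas's proposal: the owner reserves a slot
// when it accepts a request, pushes the response later knowing space exists,
// and gets a callback when a packet drains so it can unblock its input side.
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>

class BoundedRespQueue {
  public:
    BoundedRespQueue(size_t cap, std::function<void()> onDrain)
        : cap_(cap), onDrain_(std::move(onDrain)) {}

    bool hasFreeSpace() const { return reserved_ + queue_.size() < cap_; }

    // Called when a request is accepted: claim a response slot up front.
    void reserve() {
        assert(hasFreeSpace());
        ++reserved_;
    }

    // Called when the response is actually produced; space is guaranteed.
    void push(int pkt) {
        assert(reserved_ > 0);
        --reserved_;
        queue_.push_back(pkt);
    }

    // Called when the port sends a packet downstream: free the slot and
    // notify the owner (e.g. the DRAM controller) so it can unblock.
    void sendOne() {
        if (queue_.empty()) return;
        queue_.pop_front();
        onDrain_();
    }

  private:
    size_t cap_;
    size_t reserved_ = 0;
    std::deque<int> queue_;
    std::function<void()> onDrain_;
};
```

The owning MemObject would check `hasFreeSpace()` on each incoming request, call `reserve()` when accepting it, and stop accepting otherwise; the drain callback is where it would resume.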
>> >>
>> >> From: Joel Hestness
>><***@gmail.com<mailto:***@gmail.com>>
>> >> Date: Tuesday, 2 February 2016 at 00:24
>> >> To: Andreas Hansson
>><***@arm.com<mailto:***@arm.com>>
>> >> Cc: gem5 Developer List
>> >> <gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
>> >> Subject: Follow-up: Removing QueuedSlavePort from DRAMCtrl
>> >>
>> >> Hi Andreas,
>> >> I'd like to circle back on the thread about removing the
>> >> QueuedSlavePort response queue from DRAMCtrl. I've been working to
>> >> shift over to DRAMCtrl from the RubyMemoryController, but nearly
>> >> all of my simulations now crash on the DRAMCtrl's response queue.
>> >> Since I need the DRAMCtrl to work, I'll be looking into this now.
>> >> However, based on my inspection of the code, it looks pretty
>> >> non-trivial to remove the QueuedSlavePort, so I'm hoping you can at
>> >> least help me work through the changes.
>> >>
>> >> To reproduce the issue, I've put together a slim gem5 patch
>> >> (attached) to use the memtest.py script to generate accesses.
>> >> Here's the command
>> line
>> >> I used:
>> >>
>> >> % build/X86/gem5.opt --debug-flag=DRAM --outdir=$outdir
>> >> configs/example/memtest.py -u 100
>> >>
>> >> If you're still willing to take a stab at it, let me know if/how
>> >> I can help. Otherwise, I'll start working on it. It seems the
>> >> trickiest thing
>> is
>> >> going to be modeling the arbitrary frontendLatency and
>> >> backendLatency
>> while
>> >> still counting all of the accesses that are in the controller when
>> >> it
>> needs
>> >> to block back to the input queue. These latencies are currently
>> >> assessed with scheduling in the port response queue. Any
>> >> suggestions you could
>> give
>> >> would be appreciated.
>> >>
>> >> Thanks!
>> >> Joel
>> >>
>> >>
>> >> Below here is our conversation from the email thread "[gem5-dev]
>> >> Review Request 3116: ruby: RubyMemoryControl delete requests"
>> >>
>> >> On Wed, Sep 23, 2015 at 3:51 PM, Andreas Hansson <
>> ***@arm.com<mailto:***@arm.com>
>> >> > wrote:
>> >>
>> >>> Great. Thanks Joel.
>> >>>
>> >>> If anything pops up on our side I’ll let you know.
>> >>>
>> >>> Andreas
>> >>>
>> >>> From: Joel Hestness
>><***@gmail.com<mailto:***@gmail.com>>
>> >>> Date: Wednesday, 23 September 2015 20:29
>> >>>
>> >>> To: Andreas Hansson
>><***@arm.com<mailto:***@arm.com>>
>> >>> Cc: gem5 Developer List
>><gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
>> >>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
>> >>> RubyMemoryControl delete requests
>> >>>
>> >>>
>> >>>
>> >>>> I don’t think there is any big difference in our expectations,
>> >>>> quite the contrary :-). GPUs are very important to us (and so is
>> >>>> throughput computing in general), and we run plenty of simulations
>> >>>> with lots of memory-level parallelism from non-CPU components.
>> >>>> Still, we haven’t
>> run
>> >>>> into the issue.
>> >>>>
>> >>>
>> >>> Ok, cool. Thanks for the context.
>> >>>
>> >>>
>> >>> If you have practical examples that run into problems let me
>> >>> know, and
>> >>>> we’ll get it fixed.
>> >>>>
>> >>>
>> >>> I'm having trouble assembling a practical example (with or
>> >>> without
>> using
>> >>> gem5-gpu). I'll keep you posted if I find something reasonable.
>> >>>
>> >>> Thanks!
>> >>> Joel
>> >>>
>> >>>
>> >>>
>> >>>> From: Joel Hestness
>><***@gmail.com<mailto:***@gmail.com>>
>> >>>> Date: Tuesday, 22 September 2015 19:58
>> >>>>
>> >>>> To: Andreas Hansson
>><***@arm.com<mailto:***@arm.com>>
>> >>>> Cc: gem5 Developer List
>><gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
>> >>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
>> >>>> RubyMemoryControl delete requests
>> >>>>
>> >>>> Hi Andreas,
>> >>>>
>> >>>>
>> >>>>> If it is a real problem affecting end users I am indeed
>> >>>>> volunteering to fix the DRAMCtrl use of QueuedSlavePort. In the
>> >>>>> classic memory
>> system
>> >>>>> there are enough points of regulation (LSQs, MSHR limits,
>> >>>>> crossbar
>> layers
>> >>>>> etc) that having a single memory channel with >100 queued up
>> responses
>> >>>>> waiting to be sent is extremely unlikely. Hence, until now the
>> >>>>> added complexity has not been needed. If there is regulation on
>> >>>>> the number
>> of
>> >>>>> requests in Ruby, then I would argue that it is equally
>> >>>>> unlikely
>> there…I
>> >>>>> could be wrong.
>> >>>>>
>> >>>>
>> >>>> Ok. I think a big part of the difference between our
>> >>>> expectations is just the cores that we're modeling. AMD and
>> >>>> gem5-gpu can model
>> aggressive
>> >>>> GPU cores with potential to expose, perhaps, 4-32x more
>> >>>> memory-level parallel requests than a comparable number of
>> >>>> multithreaded CPU
>> cores. I
>> >>>> feel that this difference warrants different handling of
>> >>>> accesses in
>> the
>> >>>> memory controller.
>> >>>>
>> >>>> Joel
>> >>>>
>> >>>>
>> >>>>
>> >>>> From: Joel Hestness
>><***@gmail.com<mailto:***@gmail.com>>
>> >>>>> Date: Tuesday, 22 September 2015 17:48
>> >>>>>
>> >>>>> To: Andreas Hansson
>><***@arm.com<mailto:***@arm.com>>
>> >>>>> Cc: gem5 Developer List
>><gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
>> >>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
>> >>>>> RubyMemoryControl delete requests
>> >>>>>
>> >>>>> Hi Andreas,
>> >>>>>
>> >>>>> Thanks for the "ship it!"
>> >>>>>
>> >>>>>
>> >>>>>> Do we really need to remove the use of QueuedSlavePort in
>>DRAMCtrl?
>> >>>>>> It will make the controller more complex, and I don’t want to
>> >>>>>> do it
>> “just
>> >>>>>> in case”.
>> >>>>>>
>> >>>>>
>> >>>>> Sorry, I misread your email as offering to change the DRAMCtrl.
>> >>>>> I'm not sure who should make that change, but I think it should
>> >>>>> get
>> done. The
>> >>>>> memory access response path starts at the DRAMCtrl and ends at
>> >>>>> the RubyPort. If we add control flow to the RubyPort, packets
>> >>>>> will
>> probably
>> >>>>> back-up more quickly on the response path back to where there
>> >>>>> are
>> open
>> >>>>> buffers. I expect the DRAMCtrl QueuedPort problem becomes more
>> prevalent as
>> >>>>> Ruby adds flow control, unless we add a limitation on
>> >>>>> outstanding
>> requests
>> >>>>> to memory from directory controllers.
>> >>>>>
>> >>>>> How does the classic memory model deal with this?
>> >>>>>
>> >>>>> Joel
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>> From: Joel Hestness
>><***@gmail.com<mailto:***@gmail.com>>
>> >>>>>> Date: Tuesday, 22 September 2015 17:30
>> >>>>>> To: Andreas Hansson
>><***@arm.com<mailto:***@arm.com>>
>> >>>>>> Cc: gem5 Developer List
>><gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
>> >>>>>>
>> >>>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
>> >>>>>> RubyMemoryControl delete requests
>> >>>>>>
>> >>>>>> Hi guys,
>> >>>>>> Thanks for the discussion here. I had quickly tested other
>> >>>>>> memory controllers, but hadn't connected the dots that this
>> >>>>>> might be the
>> same
>> >>>>>> problem Brad/AMD are running into.
>> >>>>>>
>> >>>>>> My preference would be that we remove the QueuedSlavePort
>> >>>>>> from the DRAMCtrls. That would at least eliminate DRAMCtrls as
>> >>>>>> a potential
>> source of
>> >>>>>> the QueuedSlavePort packet overflows, and would allow us to
>> >>>>>> more
>> closely
>> >>>>>> focus on the RubyPort problem when we get to it.
>> >>>>>>
>> >>>>>> Can we reach resolution on this patch though? Are we okay
>> >>>>>> with actually fixing the memory leak in mainline?
>> >>>>>>
>> >>>>>> Joel
>> >>>>>>
>> >>>>>>
>> >>>>>> On Tue, Sep 22, 2015 at 11:19 AM, Andreas Hansson <
>> >>>>>> ***@arm.com<mailto:***@arm.com>> wrote:
>> >>>>>>
>> >>>>>>> Hi Brad,
>> >>>>>>>
>> >>>>>>> We can remove the use of QueuedSlavePort in the memory
>> >>>>>>> controller
>> and
>> >>>>>>> simply not accept requests if the response queue is full. Is
>> >>>>>>> this needed?
>> >>>>>>> If so we’ll make sure someone gets this in place. The only
>> >>>>>>> reason
>> we
>> >>>>>>> haven’t done it is because it hasn’t been needed.
>> >>>>>>>
>> >>>>>>> The use of QueuedPorts in the Ruby adapters is a whole
>> >>>>>>> different story. I think most of these can be removed and
>> >>>>>>> actually use flow control.
>> I’m
>> >>>>>>> happy to code it up, but there is such a flux at the moment
>> >>>>>>> that I didn’t want to post yet another patch changing the
>> >>>>>>> Ruby port. I really do think we should avoid having implicit
>> >>>>>>> buffers for 1000’s of kilobytes to the largest extent
>> >>>>>>> possible. If we really need a constructor parameter to make
>> >>>>>>> it “infinite” for some quirky Ruby use-case, then let’s do that...
>> >>>>>>>
>> >>>>>>> Andreas
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On 22/09/2015 17:14, "gem5-dev on behalf of Beckmann, Brad"
>> >>>>>>> <gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org>
>> >>>>>>> on
>>behalf of ***@amd.com<mailto:***@amd.com>>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>> >From AMD's perspective, we have deprecated our usage of
>> >>>>>>> RubyMemoryControl
>> >>>>>>> >and we are using the new Memory Controllers with the port
>> interface.
>> >>>>>>> >
>> >>>>>>> >That being said, I completely agree with Joel that the
>> >>>>>>> >packet
>> queue
>> >>>>>>> >finite invisible buffer limit of 100 needs to go! As you
>> >>>>>>> >know, we
>> >>>>>>> tried
>> >>>>>>> >very hard several months ago to essentially make this an
>> >>>>>>> >infinite
>> >>>>>>> buffer,
>> >>>>>>> >but Andreas would not allow us to check it in. We are going
>> >>>>>>> >to
>> >>>>>>> post that
>> >>>>>>> >patch again in a few weeks when we post our GPU model. Our
>> >>>>>>> >GPU
>> >>>>>>> model
>> >>>>>>> >will not work unless we increase that limit.
>> >>>>>>> >
>> >>>>>>> >Andreas you keep arguing that if you exceed that limit, that
>> >>>>>>> something is
>> >>>>>>> >fundamentally broken. Please keep in mind that there are
>> >>>>>>> >many
>> uses
>> >>>>>>> of
>> >>>>>>> >gem5 beyond what you use it for. Also this is a research
>> simulator
>> >>>>>>> and
>> >>>>>>> >we should not restrict ourselves to what we think is
>> >>>>>>> >practical in
>> >>>>>>> real
>> >>>>>>> >hardware. Finally, the fact that the finite limit is
>> >>>>>>> >invisible to
>> >>>>>>> the
>> >>>>>>> >producer is just bad software engineering.
>> >>>>>>> >
>> >>>>>>> >I beg you to please allow us to remove this finite invisible
>> limit!
>> >>>>>>> >
>> >>>>>>> >Brad
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>> >-----Original Message-----
>> >>>>>>> >From: gem5-dev
>>[mailto:gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org>]
>>On Behalf
>> >>>>>>> >Of
>> >>>>>>> Andreas
>> >>>>>>> >Hansson
>> >>>>>>> >Sent: Tuesday, September 22, 2015 6:35 AM
>> >>>>>>> >To: Andreas Hansson; Default; Joel Hestness
>> >>>>>>> >Subject: Re: [gem5-dev] Review Request 3116: ruby:
>> RubyMemoryControl
>> >>>>>>> >delete requests
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>> >> On Sept. 21, 2015, 8:42 a.m., Andreas Hansson wrote:
>> >>>>>>> >> > Can we just prune the whole RubyMemoryControl rather?
>> >>>>>>> >> > Has it
>> >>>>>>> not been
>> >>>>>>> >>deprecated long enough?
>> >>>>>>> >>
>> >>>>>>> >> Joel Hestness wrote:
>> >>>>>>> >> Unless I'm overlooking something, for Ruby users, I
>> >>>>>>> >> don't
>> see
>> >>>>>>> other
>> >>>>>>> >>memory controllers that are guaranteed to work. Besides
>> >>>>>>> >>RubyMemoryControl, all others use a QueuedSlavePort for
>> >>>>>>> >>their
>> input
>> >>>>>>> >>queues. Given that Ruby hasn't added complete flow control,
>> >>>>>>> PacketQueue
>> >>>>>>> >>size restrictions can be exceeded (triggering the panic).
>> >>>>>>> >>This
>> >>>>>>> occurs
>> >>>>>>> >>infrequently/irregularly with aggressive GPUs in gem5-gpu,
>> >>>>>>> >>and
>> >>>>>>> appears
>> >>>>>>> >>difficult to fix in a systematic way.
>> >>>>>>> >>
>> >>>>>>> >> Regardless of the fact we've deprecated
>> >>>>>>> >> RubyMemoryControl,
>> >>>>>>> this is
>> >>>>>>> >>a necessary fix.
>> >>>>>>> >
>> >>>>>>> >No memory controller is using QueuedSlavePort for any
>> >>>>>>> >_input_
>> >>>>>>> queues.
>> >>>>>>> >The DRAMCtrl class uses it for the response _output_ queue,
>> >>>>>>> >that's
>> >>>>>>> all.
>> >>>>>>> >If that is really an issue we can move away from it and
>> >>>>>>> >enforce an
>> >>>>>>> upper
>> >>>>>>> >bound on responses by not accepting new requests. That said,
>> >>>>>>> >if we
>> >>>>>>> hit
>> >>>>>>> >the limit I would argue something else is fundamentally
>> >>>>>>> >broken in
>> >>>>>>> the
>> >>>>>>> >system and should be addressed.
>> >>>>>>> >
>> >>>>>>> >In any case, the discussion whether to remove
>> >>>>>>> >RubyMemoryControl or
>> >>>>>>> not
>> >>>>>>> >should be completely decoupled.
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>> >- Andreas
>> >>>>>>>
>> >>>>>>
>> >>
>> >> --
>> >> Joel Hestness
>> >> PhD Candidate, Computer Architecture
>> >> Dept. of Computer Science, University of Wisconsin - Madison
>> >> http://pages.cs.wisc.edu/~hestness/
>> >> IMPORTANT NOTICE: The contents of this email and any attachments
>> >> are confidential and may also be privileged. If you are not the
>> >> intended recipient, please notify the sender immediately and do
>> >> not disclose the contents to any other person, use it for any
>> >> purpose, or store or copy
>> the
>> >> information in any medium. Thank you.
>> >>
>> >
>> >
>> >
>> >
>>
>>
>>
>> _______________________________________________
>> gem5-dev mailing list
>> gem5-***@gem5.org<mailto:gem5-***@gem5.org>
>> http://m5sim.org/mailman/listinfo/gem5-dev
>>
>
>
>
>

Joel Hestness
2016-03-01 17:33:11 UTC
Permalink
Hi guys,
I haven't had a chance to revise my code per the reviews I've received so
far. It may be after the MICRO deadline before I can get to it. A big part
of posting the patches was so that AMD (Matt) could take a look.

I think I agree with Andreas that it would be good to get on the same
page about the broad aims of flow control to the memory controllers. It
seems like we might be conflating two related changes that are pretty
different in terms of complexity. Specifically, we want (1) flow control to
make DRAMCtrl buffers finite, and it seems AMD wants (2) capability to
enforce some QoS in case a memory controller has multiple requesters (and
they desire token-based flow control).

Here are some notes that haven't been explicitly stated, and a couple
questions that would be helpful for clarification:

1) Currently, in all Ruby protocols, a single directory can only
communicate with a single memory controller. This is important, because
this means Ruby currently doesn't have a need for a QoS mechanism for
multiple requesters to a single memory controller. The intent of my changes
was just to firm up the flow control and remove the infinite buffering
(i.e. not to solve the QoS challenge).

2) My patch: I wasn't sure specifically if the classic memory system
allows multiple requesters to a single memory controller, but I had assumed
I should just try to keep the ack+nack+retry flow control under the
assumption that it would be sufficient to solve the finite buffering aim.
Per Andreas' review, it seems my patch is close to achieving the finite
buffering aim. Correct?

3) QoS: I've looked a bit more at the classic memory system components,
and I'm not sure I understand the need for multiple different mechanisms to
enforce QoS. I might be reading this wrong, but do the xbar components
offer request retry queuing from input ports to blocked output ports? If
so, it seems like there might already be a QoS mechanism for the
ack+nack+retry flow control with multiple requesters issuing to a single
memory controller. Is that true?

4) AMD is interested in token-based flow control for memory controller
QoS and modeling fidelity (hardware often uses token flow control to memory
controllers?). If classic memory system components can offer the
ack+nack+retry flow control QoS without any more DRAMCtrl modification than
my patch, is it necessary or important to model token flow control? If so,
why?
@Matt: Per your reply, I'm not sure I understand why nacks+retries are
insufficient to allow requesters to do internal reordering/prioritization
of further requests. This question would help us understand your views on
the important aspects of the modeling, and where we might fudge to keep the
code simplicity and simulation performance.

@Matt: Also, I'm not sure I understand or agree with your claim that Ruby
currently uses credit-based flow control... It might be useful to clarify
your view.


Thanks!
Joel



On Mon, Feb 29, 2016 at 7:20 PM, Poremba, Matthew <***@amd.com>
wrote:

> Hi Andreas,
>
>
> Sure. I did not have any plans to implement any types of flow control
> other than what I have provided in the posted patch, but I am simply
> stating it should be possible if someone were to use gem5 for that kind of
> research.
>
> What I have implemented is closest to hop-to-hop credit based in your
> terms. One thing I would like to accomplish with this is the ability for
> the master mem object to decide how to operate given information about
> slave buffer availability. This can be used for packet prioritization,
> throttling, etc. based on information available at the directory,
> especially in Ruby between directory and DRAM, but requires something like
> a credit/token-based scheme.
>
> I am currently only making changes to very specific mem objects to test
> this. I don't plan on touching much of the classic memory system besides what
> is needed for Ruby. Ruby's flow control is very much credit-based already,
> so an "end-to-end" credit system is just a matter of changing interfaces
> between Ruby and classic. I've also tried to setup the configuration so
> that it would not impact external interfaces such as SystemC TLM 2 so much.
>
> For speed/complexity, I don't see much of a change in simulation time
> since the event is simply moved from sending a retry to returning a token.
> Complexity wise, we can actually reduce common code by moving it into the
> flow controller and out of the mem object. We may have to ensure having
> something called "flow control" with a few lines in some mem objects isn't
> confusing to users, though.
>
>
> -Matt
>
>
> -----Original Message-----
> From: gem5-dev [mailto:gem5-dev-***@gem5.org] On Behalf Of Andreas
> Hansson
> Sent: Monday, February 29, 2016 1:56 PM
> To: gem5 Developer List; Joel Hestness
> Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from DRAMCtrl
>
> Hi Matt,
>
> Can we hit the pause button please?
>
> Your proposed changes all look sensible in terms of allowing different
> types of flow control, but I would really like to understand _what_ we are
> trying to accomplish here, and what trade-offs in terms of speed and
> complexity we are willing to make.
>
> I fundamentally do not think it is wise to try and support all options
> (hop-to-hop valid/accept, hop-to-hop credit based, end-to-end credit based
> etc). Is your proposal to switch both classic and Ruby to end-to-end credit
> based? Fixed credits or opportunistic as well? Etc etc.
>
> At the moment the classic memory system (and the port infrastructure) is
> resembling AMBA ACE, in that it uses hop-to-hop valid/accept, with separate
> flow control for reads and writes, snoops and snoop responses. It is also
> worth noting that the current port flow control is fairly well aligned with
> SystemC TLM 2. Any changes need to be considered in this context.
>
> The patch that Joel posted for adding flow-control within the memory
> controller is a clear win and we should get that in once it has converged.
> Any news on that front Joel?
>
> Thanks,
>
> Andreas
>
> On 29/02/2016, 18:50, "gem5-dev on behalf of Poremba, Matthew"
> <gem5-dev-***@gem5.org on behalf of ***@amd.com> wrote:
>
> >Hi Joel/All,
> >
> >
> >I have my proposed changes up on the reviewboard finally. I’ve tried to
> >make the flow control API generic enough that any of the common flow
> >control types could be implemented (i.e., ack/nack [retries],
> >token-based, credit-based, Xon/Xoff, etc.), but there are still a few
> >conflicts between our implementations. Specifically, the
> >serviceMemoryQueue method in AbstractController is handled differently
> >by different flow controls, but SLICC changes are the same as yours.
> >You can see the patches here (they’ll need to be applied in the order
> >below for anyone interested):
> >
> >http://reviews.gem5.org/r/3354/
> >http://reviews.gem5.org/r/3355/
> >http://reviews.gem5.org/r/3356/
> >
> >I am still working on making the flow control more easily configurable
> >within python scripts. Currently this is possible but seems to require
> >passing the command line options all over the place. I’m also working
> >on something to pass flow control calls through connector type
> >MemObjects (Xbar, Bridge, CommMonitor), which will make the potential
> >for QoS capability much more interesting (for example, flow control can
> >monitor MasterIDs and do prioritization/throttling/balancing).
> >
> >
> >-Matt
> >
> >From: Joel Hestness [mailto:***@gmail.com]
> >Sent: Saturday, February 13, 2016 9:07 AM
> >To: Poremba, Matthew
> >Cc: gem5 Developer List; Gross, Joe
> >Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
> >DRAMCtrl
> >
> >Hi Matt,
> >
> >‘That said, I realize that by "token" structure, Joe and you might be
> >describing something more than what I've implemented. Namely, since
> >tokens are the credits that allow senders to push into a receiver's
> >queues, they might allow multiple directories/caches sending to a
> >single DRAMCtrl, which I don't believe is possible with my current
> >implementation. I think we'd need to allow the DRAMCtrl to receive
> >requests and queue retries while other requesters are blocked, and
> >sending those retries would need fair arbitration, which a token scheme
> >might automatically handle. Can you clarify if that's what you're
> >referring to as a token scheme?’
> >
> >A token scheme would not use a retry/unblock mechanisms at all. The
> >number of tokens available is sent to each producer from a consumer
> >when the ports are connected/start of simulation. In this regard, the
> >producers know how many requests can be sent and stop sending once the
> >tokens are exhausted. The consumer will return tokens once a request is
> >handled. This removes the need for retries and unblock calls, reduces
> >overall complexity, and is closer to hardware implementations imo. The
> >token scheme would indeed automatically handle the situation where
> >multiple producers are blocked and can also be hidden away in the port
> >without needing to add a retry queue to consumers, which I don’t
> >believe is a great idea.
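[Editor's sketch] The token exchange Matt describes, with tokens granted at connection time, spent per request, and returned when a request is handled, could be modeled like this. The code is a hypothetical illustration with invented names, not an actual gem5 interface.

```cpp
#include <cassert>

class TokenConsumer;

// Producer (e.g. a directory) holds tokens corresponding to free slots
// in the consumer's queue; it self-throttles at zero, so no retry or
// unblock callback path is ever needed.
class TokenProducer {
  public:
    void grantTokens(int n) { tokens_ += n; }
    bool canSend() const { return tokens_ > 0; }
    void send(TokenConsumer &c);

  private:
    int tokens_ = 0;
};

// Consumer (e.g. a memory controller) advertises its queue depth once,
// at connection time, and returns one token per handled request.
class TokenConsumer {
  public:
    explicit TokenConsumer(int depth) : depth_(depth) {}
    int depth() const { return depth_; }

    void receive() {
        ++occupied_;
        assert(occupied_ <= depth_);  // overflow is impossible by design
    }

    // Handling a request frees a slot and returns the token.
    void handleOne(TokenProducer &p) {
        --occupied_;
        p.grantTokens(1);
    }

  private:
    int depth_;
    int occupied_ = 0;
};

void TokenProducer::send(TokenConsumer &c) {
    assert(tokens_ > 0);  // a producer never sends without a token
    --tokens_;
    c.receive();
}
```

Note how this also handles multiple blocked producers without arbitration logic in the consumer: each producer simply resumes when its own tokens come back.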
> >
> >
> >Ok. Yes, that makes sense. I look forward to seeing your changes.
> >
> >
> > Thanks!
> > Joel
> >
> >
> >From: Joel Hestness
> >[mailto:***@gmail.com<mailto:***@gmail.com>]
> >Sent: Thursday, February 11, 2016 2:52 PM
> >To: gem5 Developer List
> >Cc: Gross, Joe; Poremba, Matthew
> >
> >Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
> >DRAMCtrl
> >
> >Hi Matt,
> >
> >In regards to the buffersFull() implementation, I can think of a
> >pathological case where the back-end queue is full because the sender
> >is not accepting responses (for whatever reason) but is still issuing
> >requests. buffersFull() will return false in this case and allow the
> >request to be enqueued and eventually scheduled, causing the back-end
> >queue to grow larger than the response_buffer_size parameter.
> >
> >Perhaps one way to better emulate exchanging tokens (credit) as Joe
> >mentioned is to have buffersFull() "reserve" slots in the queues by
> >making sure there is a slot in both the read queue (or write queue) and
> >a corresponding slot available in the back-end queue. The reservation
> >can be lifted once the response is sent on the port.
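[Editor's sketch] The reservation idea above, accepting a request only when both a front-end slot and a back-end response slot are available, can be illustrated with a small C++ model. Names and parameters here are invented for illustration; this is not the DRAMCtrl code.

```cpp
#include <cassert>

// Toy model of the two queue stages under discussion: a front-end read
// queue and a back-end response queue. A read is admitted only if its
// eventual response also has a reserved slot, which closes the hole
// where a full back-end could let the back-end grow unbounded.
struct DramCtrlBuffers {
    int readSlots;
    int backendSlots;
    int readsQueued = 0;
    int backendReserved = 0;

    DramCtrlBuffers(int r, int b) : readSlots(r), backendSlots(b) {}

    bool acceptRead() {
        if (readsQueued >= readSlots || backendReserved >= backendSlots)
            return false;         // sender gets a nack / must retry
        ++readsQueued;
        ++backendReserved;        // reserve the response slot up front
        return true;
    }

    void scheduleRead() { --readsQueued; }     // enters the DRAM pipeline
    void respondSent()  { --backendReserved; } // response left on the port
};
```

With the reservation held until the response actually leaves on the port, the pathological case (a sender that issues requests but refuses responses) blocks at admission rather than overflowing the back-end queue.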
> >
> >I'm not sure I understand the difference between this description and
> >what I've implemented, except that what I've implemented adds some
> >extra back-end queuing. The capacity of the back-end queue in my
> >implementation is equal to the sum of the read and write queue
> >capacities (plus a little
> >extra: response_buffer_size). The reservation of a slot in this large
> >back-end queue is released when a response is sent through the port, as
> >you describe. To me, this seems exactly the way a token-like structure
> >would reserve back-end queue slots.
> >
> >That said, I realize that by "token" structure, Joe and you might be
> >describing something more than what I've implemented. Namely, since
> >tokens are the credits that allow senders to push into a receiver's
> >queues, they might allow multiple directories/caches sending to a
> >single DRAMCtrl, which I don't believe is possible with my current
> >implementation. I think we'd need to allow the DRAMCtrl to receive
> >requests and queue retries while other requesters are blocked, and
> >sending those retries would need fair arbitration, which a token scheme
> >might automatically handle. Can you clarify if that's what you're
> >referring to as a token scheme?
> >
> >
> >Another more aggressive implementation would be to not use
> >buffersFull() and prevent scheduling memory requests from the
> >read/write queue if the back-end queue is full. This would allow a
> >sender to enqueue memory requests even if the back-end queue is full up
> >until the read/write queue fills up, but would require a number of
> changes to the code.
> >
> >Yes, I tried implementing this first, and it ends up being very
> >difficult due to the DRAMCtrl's calls to and implementation of
> accessAndRespond().
> >Basically, reads and writes require different processing latencies, so
> >we would need not only a back-end queue, but also separate read and
> >write delay queues to model the different DRAM access latencies. We'd
> >also need a well-performing way to arbitrate for slots in the back-end
> >queue that doesn't conflict with the batching efforts of the front-end.
>> >To me, all this complexity seems misaligned with Andreas et al.'s
> >original aim with the DRAMCtrl: fast and reasonably accurate simulation
>> >of a memory controller <http://web.eecs.umich.edu/~twenisch/papers/ispass14.pdf>.
> >
> >
> >In regards to Ruby, I am a bit curious- Are you placing MessageBuffers
> >in the SLICC files and doing away with the
> >queueMemoryRead/queueMemoryWrite calls or are you placing a
> >MessageBuffer in AbstractController? I am currently trying out an
> >implementation using the former for a few additional reasons other than
> flow control.
> >
> >If I understand what you're asking, I think I've also done the former,
> >though I've modified SLICC and the AbstractController to deal with
> >parts of the buffer management. I've merged my code with a recent gem5
> >revision
> >(11315:10647f5d0f7f) so I could post a draft review request. Here are
> >the patches (including links) to test all of this:
> >
> > - http://reviews.gem5.org/r/3331/
> > - http://reviews.gem5.org/r/3332/
> > -
> >http://pages.cs.wisc.edu/~hestness/links/MOESI_hammer_test_finite_queues
> > - http://pages.cs.wisc.edu/~hestness/links/cpu_memory_demand
> >
> >More holistically, I feel that the best solution would be to hide the
> >memory request and response queues in an AbstractDirectoryController
> >class that inherits from AbstractController in C++, and from which all
> >SLICC directory controller machines descend. This structure would move
> >all the directory-specific code out of AbstractController and not model
> >it in other SLICC generated machines. This would also eliminate the
> >need for assertions that only directory controllers are calling the
> >directory-specific functions.
> >
> >
> > Joel
> >
> >
> >-----Original Message-----
> >From: gem5-dev
> >[mailto:gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org>] On
> >Behalf Of Joel Hestness
> >Sent: Monday, February 08, 2016 12:16 PM
> >To: Gross, Joe
> >Cc: gem5 Developer List
> >Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
> >DRAMCtrl
> >
> >Hi guys,
> > I just posted a draft of my DRAMCtrl flow-control patch so you can
> >take a look here: http://reviews.gem5.org/r/3315/
> >
> > NOTE: I have a separate patch that changes Ruby's QueuedMasterPort
> >from directories to memory controllers into a MasterPort, and it places
> >a MessageBuffer in front of the MasterPort, so that the user can make
> >all buffering finite within a Ruby memory hierarchy. I still need to
> >merge this patch with gem5, before I can share it. Let me know if you'd
> >like to see the draft there also.
> >
> >@Joe:
> >
> >
> >> I'd be curious to see a patch of what you're proposing as I'm not
> >>sure I really follow what you're doing. The reason I ask is because I
> >>have been discussing an implementation with with Brad and would like
> >>to see how similar it is to what you have. Namely it's an idea
> >>similar to what is commonly used in hardware, where senders have
> >>tokens that correspond to slots in the receiver queue so the
> >>reservation happens at startup. The only communication that goes from
> >>a receiving port back to a sender is token return. The port and queue
> >>would still be coupled and the device which owns the Queued*Port
> >>would manage removal from the PacketQueue. In my experience, this is
> >>a very effective mechanism for flow control and addresses your point
> >>about transparency of the queue and its state.
> >> The tokens removes the need for unblock callbacks, but it's the
> >>responsibility of the receiver not to send when the queue is full or
> >>when it has a conflicting request. There's no implementation yet, but
> >>the simplicity and similarity to hardware techniques may prove useful.
> >> Anyway, could you post something so I can better understand what
> >>you've described?
> >
> >
> >My implementation effectively does what you're describing: The DRAMCtrl
> >now has a finite number of buffers (i.e. tokens), and it allocates a
> >buffer slot when a request is received (senders spend a token when the
> >DRAMCtrl accepts a request). The only real difference is that the
> >DRAMCtrl now implements a SlavePort with flow control consistent with
> >the rest of gem5, so if there are no buffer slots available, the
> >request is nacked and a retry must be sent (i.e. a token is returned).
> >
> >
> >Please don't get rid of the Queued*Ports, as I think there is a simple
> >way
> >> to improve them to do efficient flow control.
> >>
> >
> >Heh... not sure I have the time/motivation to remove the Queued*Ports
> >myself. I've just been swapping out the Queued*Ports that break when
> >trying to implement finite buffering in a Ruby memory hierarchy. I'll
> >leave Queued*Ports for later fixing or removal, as appropriate.
> >
> >
> > Joel
> >
> >
> >________________________________________
> >> From: gem5-dev
> >><gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org>> on
> >>behalf of Joel Hestness <
> >>***@gmail.com<mailto:***@gmail.com>>
> >> Sent: Friday, February 5, 2016 12:03 PM
> >> To: Andreas Hansson
> >> Cc: gem5 Developer List
> >> Subject: Re: [gem5-dev] Follow-up: Removing QueuedSlavePort from
> >>DRAMCtrl
> >>
> >> Hi guys,
> >> Quick updates on this:
> >> 1) I have a finite response buffer implementation working. I
> >>removed the QueuedSlavePort and added a response queue with
> >>reservation (Andreas'
> >> underlying suggestion). I have a question with this solution: The
> >>QueuedSlavePort prioritized responses based on their scheduled
> >>response time.
> >> However, since writes have a shorter pipeline from request to
> >>response, this architecture prioritized write requests ahead of read
> >>requests received earlier, and it performs ~1-8% worse than a strict
> >>queue (what I've implemented at this point). I can make the response
> >>queue a priority queue if we want the same structure as previously,
> >>but I'm wondering if we might prefer to just have the
> >>better-performing strict queue.
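The two queueing policies being compared can be illustrated with a small sketch. This is not gem5 code; `Resp`, `readyTick`, and the function names are hypothetical, and the tick values are made up to show why a later-arriving write can overtake an earlier read:

```cpp
#include <queue>
#include <string>
#include <vector>

// A response carries the tick at which it becomes ready. Writes have a
// shorter request-to-response pipeline, so a write that arrives after a
// read can be ready earlier.
struct Resp {
    std::string name;
    int readyTick;
};

// Strict queue: responses leave in arrival order.
std::string firstServedFifo(const std::vector<Resp> &arrivals) {
    return arrivals.front().name;
}

// QueuedSlavePort-style policy: responses leave ordered by ready time.
std::string firstServedByReadyTime(const std::vector<Resp> &arrivals) {
    auto later = [](const Resp &a, const Resp &b) {
        return a.readyTick > b.readyTick; // min-heap on readyTick
    };
    std::priority_queue<Resp, std::vector<Resp>, decltype(later)> pq(later);
    for (const auto &r : arrivals)
        pq.push(r);
    return pq.top().name;
}
```

Under the ready-time policy the younger write is served first, which is the reordering behaviour the priority queue would restore relative to the strict queue.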
> >>
> >> 2) To reflect on Andreas' specific suggestion of using unblock
> >> callbacks from the PacketQueue: Modifying the QueuedSlavePort with
> >> callbacks is ugly when trying to call the callback: The call needs to
> >> originate from PacketQueue::sendDeferredPacket(), but PacketQueue
> >> doesn't have a pointer to the owner component; The SlavePort has the
> >> pointer, so the PacketQueue would need to first callback to the port,
> >> which would call the owner component callback.
> >> The exercise of getting this to work has solidified my opinion that
> >> the Queued*Ports should probably be removed from the codebase: Queues
> >> and ports are separate subcomponents of simulated components, and
> >> only the component knows how they should interact. Including a
> >> Queued*Port inside a component requires the component to manage the
> >> flow-control into the Queued*Port just as it would need to manage a
> >> standard port anyway, and hiding the queue in the port obfuscates how
> it is managed.
> >>
> >>
> >> Thanks!
> >> Joel
> >>
> >>
> >> On Thu, Feb 4, 2016 at 10:06 AM, Joel Hestness
> >><***@gmail.com<mailto:***@gmail.com>>
> >> wrote:
> >>
> >> > Hi Andreas,
> >> > Thanks for the input. I had tried adding front- and back-end
> >> > queues within the DRAMCtrl, but it became very difficult to
> >> > propagate the flow control back through the component due to the
> >> > complicated implementation
> >> of
> >> > timing across different accessAndRespond() calls. I had to put this
> >> > solution on hold.
> >> >
> >> > I think your proposed solution should simplify the flow control
> >> > issue, and should have the derivative effect of making the
> >> > Queued*Ports capable
> >> of
> >> > flow control. I'm a little concerned that your solution would make
> >> > the buffering very fluid, and I'm not sufficiently familiar with
> >> > memory controller microarchitecture to know if that would be
> >> > realistic. I wonder if you might have a way to do performance
> >> > validation after I work through either of these implementations.
> >> >
> >> > Thanks!
> >> > Joel
> >> >
> >> >
> >> >
> >> > On Wed, Feb 3, 2016 at 11:29 AM, Andreas Hansson <
> >> ***@arm.com<mailto:***@arm.com>>
> >> > wrote:
> >> >
> >> >> Hi Joel,
> >> >>
> >> >> I would suggest to keep the queued ports, but add methods to
> >> >> reserve resources, query if it has free space, and a way to
> >> >> register callbacks
> >> so
> >> >> that the MemObject is made aware when packets are sent. That way
> >> >> we can
> >> use
> >> >> the queue in the cache, memory controller etc, without having all
> >> >> the issues of the “naked” port interface, but still enforcing a
> >> >> bounded
> >> queue.
> >> >>
> >> >> When a packet arrives at the module we call reserve on the output
> >>port.
> >> >> Then when we actually add the packet we know that there is space.
> >> >> When request packets arrive we check if the queue is full, and if
> >> >> so we block any new requests. Then through the callback we can
> >> >> unblock the DRAM controller in this case.
> >> >>
> >> >> What do you think?
> >> >>
> >> >> Andreas
> >> >>
> >> >> From: Joel Hestness
> >><***@gmail.com<mailto:***@gmail.com>>
> >> >> Date: Tuesday, 2 February 2016 at 00:24
> >> >> To: Andreas Hansson
> >><***@arm.com<mailto:***@arm.com>>
> >> >> Cc: gem5 Developer List
> >> >> <gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
> >> >> Subject: Follow-up: Removing QueuedSlavePort from DRAMCtrl
> >> >>
> >> >> Hi Andreas,
> >> >> I'd like to circle back on the thread about removing the
> >> >> QueuedSlavePort response queue from DRAMCtrl. I've been working to
> >> >> shift over to DRAMCtrl from the RubyMemoryController, but nearly
> >> >> all of my simulations now crash on the DRAMCtrl's response queue.
> >> >> Since I need the DRAMCtrl to work, I'll be looking into this now.
> >> >> However, based on my inspection of the code, it looks pretty
> >> >> non-trivial to remove the QueuedSlavePort, so I'm hoping you can at
> >> >> least help me work through the changes.
> >> >>
> >> >> To reproduce the issue, I've put together a slim gem5 patch
> >> >> (attached) to use the memtest.py script to generate accesses.
> >> >> Here's the command
> >> line
> >> >> I used:
> >> >>
> >> >> % build/X86/gem5.opt --debug-flag=DRAM --outdir=$outdir
> >> >> configs/example/memtest.py -u 100
> >> >>
> >> >> If you're still willing to take a stab at it, let me know if/how
> >> >> I can help. Otherwise, I'll start working on it. It seems the
> >> >> trickiest thing
> >> is
> >> >> going to be modeling the arbitrary frontendLatency and
> >> >> backendLatency
> >> while
> >> >> still counting all of the accesses that are in the controller when
> >> >> it
> >> needs
> >> >> to block back to the input queue. These latencies are currently
> >> >> assessed with scheduling in the port response queue. Any
> >> >> suggestions you could
> >> give
> >> >> would be appreciated.
> >> >>
> >> >> Thanks!
> >> >> Joel
> >> >>
> >> >>
> >> >> Below here is our conversation from the email thread "[gem5-dev]
> >> >> Review Request 3116: ruby: RubyMemoryControl delete requests"
> >> >>
> >> >> On Wed, Sep 23, 2015 at 3:51 PM, Andreas Hansson <
> >> ***@arm.com<mailto:***@arm.com>
> >> >> > wrote:
> >> >>
> >> >>> Great. Thanks Joel.
> >> >>>
> >> >>> If anything pops up on our side I’ll let you know.
> >> >>>
> >> >>> Andreas
> >> >>>
> >> >>> From: Joel Hestness
> >><***@gmail.com<mailto:***@gmail.com>>
> >> >>> Date: Wednesday, 23 September 2015 20:29
> >> >>>
> >> >>> To: Andreas Hansson
> >><***@arm.com<mailto:***@arm.com>>
> >> >>> Cc: gem5 Developer List
> >><gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
> >> >>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >> >>> RubyMemoryControl delete requests
> >> >>>
> >> >>>
> >> >>>
> >> >>>> I don’t think there is any big difference in our expectations,
> >> >>>> quite the contrary :-). GPUs are very important to us (and so is
> >> >>>> throughput computing in general), and we run plenty of simulations
> >> >>>> with lots of memory-level parallelism from non-CPU components.
> >> >>>> Still, we haven’t
> >> run
> >> >>>> into the issue.
> >> >>>>
> >> >>>
> >> >>> Ok, cool. Thanks for the context.
> >> >>>
> >> >>>
> >> >>> If you have practical examples that run into problems let me
> >> >>> know, and
> >> >>>> we’ll get it fixed.
> >> >>>>
> >> >>>
> >> >>> I'm having trouble assembling a practical example (with or
> >> >>> without
> >> using
> >> >>> gem5-gpu). I'll keep you posted if I find something reasonable.
> >> >>>
> >> >>> Thanks!
> >> >>> Joel
> >> >>>
> >> >>>
> >> >>>
> >> >>>> From: Joel Hestness
> >><***@gmail.com<mailto:***@gmail.com>>
> >> >>>> Date: Tuesday, 22 September 2015 19:58
> >> >>>>
> >> >>>> To: Andreas Hansson
> >><***@arm.com<mailto:***@arm.com>>
> >> >>>> Cc: gem5 Developer List
> >><gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
> >> >>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >> >>>> RubyMemoryControl delete requests
> >> >>>>
> >> >>>> Hi Andreas,
> >> >>>>
> >> >>>>
> >> >>>>> If it is a real problem affecting end users I am indeed
> >> >>>>> volunteering to fix the DRAMCtrl use of QueuedSlavePort. In the
> >> >>>>> classic memory
> >> system
> >> >>>>> there are enough points of regulation (LSQs, MSHR limits,
> >> >>>>> crossbar
> >> layers
> >> >>>>> etc) that having a single memory channel with >100 queued up
> >> responses
> >> >>>>> waiting to be sent is extremely unlikely. Hence, until now the
> >> >>>>> added complexity has not been needed. If there is regulation on
> >> >>>>> the number
> >> of
> >> >>>>> requests in Ruby, then I would argue that it is equally
> >> >>>>> unlikely
> >> there…I
> >> >>>>> could be wrong.
> >> >>>>>
> >> >>>>
> >> >>>> Ok. I think a big part of the difference between our
> >> >>>> expectations is just the cores that we're modeling. AMD and
> >> >>>> gem5-gpu can model
> >> aggressive
> >> >>>> GPU cores with potential to expose, perhaps, 4-32x more
> >> >>>> memory-level parallel requests than a comparable number of
> >> >>>> multithreaded CPU
> >> cores. I
> >> >>>> feel that this difference warrants different handling of
> >> >>>> accesses in
> >> the
> >> >>>> memory controller.
> >> >>>>
> >> >>>> Joel
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> From: Joel Hestness
> >><***@gmail.com<mailto:***@gmail.com>>
> >> >>>>> Date: Tuesday, 22 September 2015 17:48
> >> >>>>>
> >> >>>>> To: Andreas Hansson
> >><***@arm.com<mailto:***@arm.com>>
> >> >>>>> Cc: gem5 Developer List
> >><gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
> >> >>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >> >>>>> RubyMemoryControl delete requests
> >> >>>>>
> >> >>>>> Hi Andreas,
> >> >>>>>
> >> >>>>> Thanks for the "ship it!"
> >> >>>>>
> >> >>>>>
> >> >>>>>> Do we really need to remove the use of QueuedSlavePort in
> >>DRAMCtrl?
> >> >>>>>> It will make the controller more complex, and I don’t want to
> >> >>>>>> do it
> >> “just
> >> >>>>>> in case”.
> >> >>>>>>
> >> >>>>>
> >> >>>>> Sorry, I misread your email as offering to change the DRAMCtrl.
> >> >>>>> I'm not sure who should make that change, but I think it should
> >> >>>>> get
> >> done. The
> >> >>>>> memory access response path starts at the DRAMCtrl and ends at
> >> >>>>> the RubyPort. If we add control flow to the RubyPort, packets
> >> >>>>> will
> >> probably
> >> >>>>> back-up more quickly on the response path back to where there
> >> >>>>> are
> >> open
> >> >>>>> buffers. I expect the DRAMCtrl QueuedPort problem becomes more
> >> prevalent as
> >> >>>>> Ruby adds flow control, unless we add a limitation on
> >> >>>>> outstanding
> >> requests
> >> >>>>> to memory from directory controllers.
> >> >>>>>
> >> >>>>> How does the classic memory model deal with this?
> >> >>>>>
> >> >>>>> Joel
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>> From: Joel Hestness
> >><***@gmail.com<mailto:***@gmail.com>>
> >> >>>>>> Date: Tuesday, 22 September 2015 17:30
> >> >>>>>> To: Andreas Hansson
> >><***@arm.com<mailto:***@arm.com>>
> >> >>>>>> Cc: gem5 Developer List
> >><gem5-***@gem5.org<mailto:gem5-***@gem5.org>>
> >> >>>>>>
> >> >>>>>> Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >> >>>>>> RubyMemoryControl delete requests
> >> >>>>>>
> >> >>>>>> Hi guys,
> >> >>>>>> Thanks for the discussion here. I had quickly tested other
> >> >>>>>> memory controllers, but hadn't connected the dots that this
> >> >>>>>> might be the
> >> same
> >> >>>>>> problem Brad/AMD are running into.
> >> >>>>>>
> >> >>>>>> My preference would be that we remove the QueuedSlavePort
> >> >>>>>> from the DRAMCtrls. That would at least eliminate DRAMCtrls as
> >> >>>>>> a potential
> >> source of
> >> >>>>>> the QueuedSlavePort packet overflows, and would allow us to
> >> >>>>>> more
> >> closely
> >> >>>>>> focus on the RubyPort problem when we get to it.
> >> >>>>>>
> >> >>>>>> Can we reach resolution on this patch though? Are we okay
> >> >>>>>> with actually fixing the memory leak in mainline?
> >> >>>>>>
> >> >>>>>> Joel
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On Tue, Sep 22, 2015 at 11:19 AM, Andreas Hansson <
> >> >>>>>> ***@arm.com<mailto:***@arm.com>> wrote:
> >> >>>>>>
> >> >>>>>>> Hi Brad,
> >> >>>>>>>
> >> >>>>>>> We can remove the use of QueuedSlavePort in the memory
> >> >>>>>>> controller
> >> and
> >> >>>>>>> simply not accept requests if the response queue is full. Is
> >> >>>>>>> this needed?
> >> >>>>>>> If so we’ll make sure someone gets this in place. The only
> >> >>>>>>> reason
> >> we
> >> >>>>>>> haven’t done it is because it hasn’t been needed.
> >> >>>>>>>
> >> >>>>>>> The use of QueuedPorts in the Ruby adapters is a whole
> >> >>>>>>> different story. I think most of these can be removed and
> >> >>>>>>> actually use flow control.
> >> I’m
> >> >>>>>>> happy to code it up, but there is such a flux at the moment
> >> >>>>>>> that I didn’t want to post yet another patch changing the
> >> >>>>>>> Ruby port. I really do think we should avoid having implicit
> >> >>>>>>> buffers for 1000s of kilobytes to the largest extent
> >> >>>>>>> possible. If we really need a constructor parameter to make
> >> >>>>>>> it “infinite” for some quirky Ruby use-case, then let’s do
> >> >>>>>>> that...
> >> >>>>>>>
> >> >>>>>>> Andreas
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> On 22/09/2015 17:14, "gem5-dev on behalf of Beckmann, Brad"
> >> >>>>>>> <gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org>
> >> >>>>>>> on
> >>behalf of ***@amd.com<mailto:***@amd.com>>
> >> >>>>>>> wrote:
> >> >>>>>>>
> >> >>>>>>> >From AMD's perspective, we have deprecated our usage of
> >> >>>>>>> RubyMemoryControl
> >> >>>>>>> >and we are using the new Memory Controllers with the port
> >> interface.
> >> >>>>>>> >
> >> >>>>>>> >That being said, I completely agree with Joel that the
> >> >>>>>>> >packet
> >> queue
> >> >>>>>>> >finite invisible buffer limit of 100 needs to go! As you
> >> >>>>>>> >know, we
> >> >>>>>>> tried
> >> >>>>>>> >very hard several months ago to essentially make this an
> >> >>>>>>> >infinite buffer,
> >> >>>>>>> >but Andreas would not allow us to check it in. We are going
> >> >>>>>>> >to
> >> >>>>>>> post that
> >> >>>>>>> >patch again in a few weeks when we post our GPU model. Our
> >> >>>>>>> >GPU
> >> >>>>>>> model
> >> >>>>>>> >will not work unless we increase that limit.
> >> >>>>>>> >
> >> >>>>>>> >Andreas you keep arguing that if you exceed that limit,
> >> >>>>>>> then something is
> >> >>>>>>> >fundamentally broken. Please keep in mind that there are
> >> >>>>>>> >many
> >> uses
> >> >>>>>>> of
> >> >>>>>>> >gem5 beyond what you use it for. Also this is a research
> >> simulator
> >> >>>>>>> and
> >> >>>>>>> >we should not restrict ourselves to what we think is
> >> >>>>>>> >practical in
> >> >>>>>>> real
> >> >>>>>>> >hardware. Finally, the fact that the finite limit is
> >> >>>>>>> >invisible to
> >> >>>>>>> the
> >> >>>>>>> >producer is just bad software engineering.
> >> >>>>>>> >
> >> >>>>>>> >I beg you to please allow us to remove this finite invisible
> >> limit!
> >> >>>>>>> >
> >> >>>>>>> >Brad
> >> >>>>>>> >
> >> >>>>>>> >
> >> >>>>>>> >
> >> >>>>>>> >-----Original Message-----
> >> >>>>>>> >From: gem5-dev
> >>[mailto:gem5-dev-***@gem5.org<mailto:gem5-dev-***@gem5.org>]
> >>On Behalf
> >> >>>>>>> >Of
> >> >>>>>>> Andreas
> >> >>>>>>> >Hansson
> >> >>>>>>> >Sent: Tuesday, September 22, 2015 6:35 AM
> >> >>>>>>> >To: Andreas Hansson; Default; Joel Hestness
> >> >>>>>>> >Subject: Re: [gem5-dev] Review Request 3116: ruby:
> >> RubyMemoryControl
> >> >>>>>>> >delete requests
> >> >>>>>>> >
> >> >>>>>>> >
> >> >>>>>>> >
> >> >>>>>>> >> On Sept. 21, 2015, 8:42 a.m., Andreas Hansson wrote:
> >> >>>>>>> >> > Can we just prune the whole RubyMemoryControl rather?
> >> >>>>>>> >> > Has it
> >> >>>>>>> not been
> >> >>>>>>> >>deprecated long enough?
> >> >>>>>>> >>
> >> >>>>>>> >> Joel Hestness wrote:
> >> >>>>>>> >> Unless I'm overlooking something, for Ruby users, I
> >> >>>>>>> >> don't
> >> see
> >> >>>>>>> other
> >> >>>>>>> >>memory controllers that are guaranteed to work. Besides
> >> >>>>>>> >>RubyMemoryControl, all others use a QueuedSlavePort for
> >> >>>>>>> >>their
> >> input
> >> >>>>>>> >>queues. Given that Ruby hasn't added complete flow control,
> >> >>>>>>> PacketQueue
> >> >>>>>>> >>size restrictions can be exceeded (triggering the panic).
> >> >>>>>>> >>This
> >> >>>>>>> occurs
> >> >>>>>>> >>infrequently/irregularly with aggressive GPUs in gem5-gpu,
> >> >>>>>>> >>and
> >> >>>>>>> appears
> >> >>>>>>> >>difficult to fix in a systematic way.
> >> >>>>>>> >>
> >> >>>>>>> >> Regardless of the fact we've deprecated
> >> >>>>>>> >> RubyMemoryControl,
> >> >>>>>>> this is
> >> >>>>>>> >>a necessary fix.
> >> >>>>>>> >
> >> >>>>>>> >No memory controller is using QueuedSlavePort for any _input_
> >> >>>>>>> >_input_
> >> >>>>>>> queues.
> >> >>>>>>> >The DRAMCtrl class uses it for the response _output_ queue,
> >> >>>>>>> >that's
> >> >>>>>>> all.
> >> >>>>>>> >If that is really an issue we can move away from it and
> >> >>>>>>> >enforce an
> >> >>>>>>> upper
> >> >>>>>>> >bound on responses by not accepting new requests. That said,
> >> >>>>>>> >if we
> >> >>>>>>> hit
> >> >>>>>>> >the limit I would argue something else is fundamentally
> >> >>>>>>> >broken in
> >> >>>>>>> the
> >> >>>>>>> >system and should be addressed.
> >> >>>>>>> >
> >> >>>>>>> >In any case, the discussion whether to remove
> >> >>>>>>> >RubyMemoryControl or
> >> >>>>>>> not
> >> >>>>>>> >should be completely decoupled.
> >> >>>>>>> >
> >> >>>>>>> >
> >> >>>>>>> >- Andreas
> >> >>>>>>>
> >> >>>>>>
> >> >>
> >> >> --
> >> >> Joel Hestness
> >> >> PhD Candidate, Computer Architecture
> >> >> Dept. of Computer Science, University of Wisconsin - Madison
> >> >> http://pages.cs.wisc.edu/~hestness/
> >> >> IMPORTANT NOTICE: The contents of this email and any attachments
> >> >> are confidential and may also be privileged. If you are not the
> >> >> intended recipient, please notify the sender immediately and do
> >> >> not disclose the contents to any other person, use it for any
> >> >> purpose, or store or copy
> >> the
> >> >> information in any medium. Thank you.
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Joel Hestness
> >> > PhD Candidate, Computer Architecture
> >> > Dept. of Computer Science, University of Wisconsin - Madison
> >> > http://pages.cs.wisc.edu/~hestness/
> >> >
> >>
> >>
> >>
> >> --
> >> Joel Hestness
> >> PhD Candidate, Computer Architecture
> >> Dept. of Computer Science, University of Wisconsin - Madison
> >> http://pages.cs.wisc.edu/~hestness/
> >> _______________________________________________
> >> gem5-dev mailing list
> >> gem5-***@gem5.org<mailto:gem5-***@gem5.org>
> >> http://m5sim.org/mailman/listinfo/gem5-dev
> >>
> >
> >
> >--
> > Joel Hestness
> > PhD Candidate, Computer Architecture
> > Dept. of Computer Science, University of Wisconsin - Madison
> > http://pages.cs.wisc.edu/~hestness/
> >
> >
> >--
> > Joel Hestness
> > PhD Candidate, Computer Architecture
> > Dept. of Computer Science, University of Wisconsin - Madison
> > http://pages.cs.wisc.edu/~hestness/
>
>



--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/