Discussion:
[gem5-dev] assertion failures after O3 draining patch (Changeset 6c9e3d624922)
Amin Farmahini
2013-01-25 21:00:16 UTC
Hi,

I have developed a model that frequently switches between CPUs. To be more
specific, I switch between O3 and a CPU model of mine. After the recent changes
to O3 draining (http://reviews.gem5.org/r/1568/), I have encountered two
assertion failures.

1. assert(predHist[i].empty()); in BPredUnit<Impl>::drainSanityCheck()
(src/cpu/o3/bpred_unit_impl.hh)
Prior to the new patch, we squashed the branch history table before switching,
but as far as I understand, we no longer do so. This assertion failure happens,
for example, when you switch from atomic to O3 and then from O3 back to atomic.

2. assert(!cpu->switchedOut()); in
DefaultFetch<Impl>::processCacheCompletion (src/cpu/o3/fetch_impl.hh)
Obviously this happens when the fetch stage in O3 receives a packet from the
cache (possibly after an Icache miss) while the O3 is switched out. Again, we
previously used to detect such a situation and activate fetch only if no drain
was pending.
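
For context, the two checks amount to roughly the following (paraphrased from
my reading of the gem5 source, not verbatim):

// src/cpu/o3/bpred_unit_impl.hh
template <class Impl>
void
BPredUnit<Impl>::drainSanityCheck() const
{
    // Fires unless all speculative branch history was squashed before
    // the drain completed.
    for (int i = 0; i < Impl::MaxThreads; ++i)
        assert(predHist[i].empty());
}

// src/cpu/o3/fetch_impl.hh
template <class Impl>
void
DefaultFetch<Impl>::processCacheCompletion(PacketPtr pkt)
{
    // Fires when an icache response arrives after the CPU has been
    // switched out.
    assert(!cpu->switchedOut());
    // ... stale or squashed responses are discarded further down, but
    // only after this assert has already been evaluated ...
}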

I have found a workaround for these assertion failures, but I am not sure
whether they only happen to me because of the specific way I use O3 draining.
I just wanted to mention that they could be actual bugs.

Thanks,
Amin
Ali Saidi
2013-01-25 21:15:36 UTC
Permalink
Hi Amin,

It's curious that you're running into this issue, because we recently added
regression tests (realview-switcheroo-full and tsunami-switcheroo-full) that
switch frequently between atomic, timing, and O3, and we don't encounter the
error. Any idea what is different between how the switching is done in those
tests and how you do it?

Thanks,

Ali

Anthony Gutierrez
2013-01-25 21:24:56 UTC
Was the testing done on Android at all?

-Tony
Amin Farmahini
2013-01-25 21:29:16 UTC
Ali,

I'll take a look at the regression tests and let you know.

Forgot to mention: I use SE mode, the classic memory system, and the ARM ISA.

Amin
Andreas Sandberg
2013-01-28 09:31:41 UTC
Post by Amin Farmahini
I have developed a model that frequently switches between CPUs. To be more
specific, I switch between O3 and a CPU model of mine. After the recent changes
to O3 draining (http://reviews.gem5.org/r/1568/), I have encountered two
assertion failures.
1. assert(predHist[i].empty()); in BPredUnit<Impl>::drainSanityCheck()
(src/cpu/o3/bpred_unit_impl.hh)
Prior to the new patch, we squashed the branch history table before switching,
but as far as I understand, we no longer do so. This assertion failure happens,
for example, when you switch from atomic to O3 and then from O3 back to atomic.
This is a bug in the draining code. Just comment out the code in
drainSanityCheck and you should be fine. I'm a bit surprised that we haven't
seen this in the regressions; it seems that this assertion would trigger on
every single O3 CPU drain/resume.
Post by Amin Farmahini
2. assert(!cpu->switchedOut()); in
DefaultFetch<Impl>::processCacheCompletion (src/cpu/o3/fetch_impl.hh)
Obviously this happens when the fetch stage in O3 receives a packet from the
cache (possibly after an Icache miss) while the O3 is switched out. Again, we
previously used to detect such a situation and activate fetch only if no drain
was pending.
I don't think this should be possible any more; it's most likely a bug
somewhere else if the assertion triggers. BaseCPU::takeOverFrom disconnects
both the icache and the dcache when switching between CPUs, so the CPU should
never be switched out and connected to a cache at the same time. Besides, the
new O3 draining should wait for /all/ outstanding requests to complete or be
squashed. As far as I'm concerned, the draining code is buggy if there are
still pending ifetches in a drained system.
Post by Amin Farmahini
I have found a workaround for these assertion failures, but I am not sure
whether they only happen to me because of the specific way I use O3 draining.
I just wanted to mention that they could be actual bugs.
The first assertion is almost definitely a bug. I suspect the second one
could be due to a bug in your configuration scripts or in your CPU
model. Are you using any of the example scripts? Or have you rolled your
own? If so, could you send us/me a copy so I can have a look?

//Andreas
Anthony Gutierrez
2013-01-28 16:08:06 UTC
Hey Andreas,

Do you have any idea about this problem:

http://www.mail-archive.com/gem5-users-1Gs4CP2/***@public.gmane.org/msg06550.html

Thanks,
Tony
Amin Farmahini
2013-01-28 16:19:32 UTC
Hi Andreas,

Thanks for the response. I don't use a script for switching CPUs. I have added
an m5 magic instruction that drains the pipeline and switches CPUs. So, as you
mentioned, it is highly likely that the second bug is due to my magic
instruction.
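
For comparison, the stock pseudo instruction only exits the simulation loop
and leaves the actual drain and switch to the Python side (this is my reading
of src/sim/pseudo_inst.cc); my instruction drives the drain and switch from
C++ instead, which is probably where the difference lies:

// src/sim/pseudo_inst.cc (as I read it)
void
switchcpu(ThreadContext *tc)
{
    // No draining here: the simulation loop exits, and the config
    // script is expected to drain and perform the switch.
    exitSimLoop("switchcpu");
}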

Thanks,
Amin
Andreas Sandberg
2013-02-04 20:06:47 UTC
Amin,

I've attached a patch that solves the first of your problems. It is also
available in my fixes [1] branch. I haven't been able to figure out
what's causing the second one.

Could you send me the command line you use when you trigger the problem? Do
you have a minimal (small) test case that reproduces the bug?

I'll post the patches for review once we have sorted out both of the
issues.

//Andreas

[1] https://github.com/andysan/gem5/tree/fixes
Amin Farmahini
2013-02-04 22:03:13 UTC
Hi Andreas,

Thanks for taking the time to fix these. I had already applied the same change
locally to work around the first problem, so your patch should resolve it.

To reproduce the second problem, I would have to send you a lot of my patches,
because I have heavily modified gem5. Since others have not reported such a
problem, I'd guess it is caused by my patches. So, please disregard the second
problem.

Thanks,
Amin
Andreas Sandberg
2013-02-04 20:02:55 UTC
Hi Tony,

I had a quick look and was unable to reproduce it myself. Could you check
whether it is still a problem and, if so, send me your kernel binary?

I suspect that the problem is that there are cases where we don't reset the
memReq[tid] pointer when a request has been squashed. Could you test whether
the patch I've attached solves the issue? The fix is also included in my gem5
fixes branch [1].
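
A sketch of the kind of fix I mean (the attached patch may differ in detail,
and I have abbreviated the function signature): when a pending icache access
is squashed, forget the request so that a late response is recognized as
stale and dropped instead of waking up fetch.

template <class Impl>
void
DefaultFetch<Impl>::doSquash(const TheISA::PCState &newPC, ThreadID tid)
{
    // ... existing squash handling ...
    if (fetchStatus[tid] == IcacheWaitResponse) {
        // processCacheCompletion() compares pkt->req against
        // memReq[tid] and discards mismatches, so clearing the pointer
        // here turns the eventual response into a no-op.
        memReq[tid] = NULL;
    }
    fetchStatus[tid] = Squashing;
}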

//Andreas

[1] https://github.com/andysan/gem5
Anthony Gutierrez
2013-02-05 14:50:56 UTC
Hi Andreas,

The changeset I was using when I ran into this problem was this one:
http://repo.gem5.org/gem5/rev/f9e76b1eb79a.

I tried with the patch; it no longer asserts, but now the simulation seems to
hang. The kernel and disk image I am using are from http://gem5.org/bbench-gem5
(the Gingerbread image with BBench, and the kernel, are both there).

With the latest (unmodified) repo, repeated switching also causes the
simulation to hang, and it never hits that assert.

Thanks,
Tony
Andreas Sandberg
2013-02-07 16:24:52 UTC
Hi Tony,

There was a small mistake, actually a pretty large one, in the patch I sent
you: it breaks draining completely. :(

I've attached the new version of the patch. Sorry for the confusion.

I tried to reproduce the bug using the current tip with the patch applied, and
the simulation gets stuck around tick 6901819000 instead (I used the same
command line as you did). It seems to be something to do with L2 draining, but
I haven't figured out the details yet.

//Andreas
Anthony Gutierrez
2013-02-07 20:46:48 UTC
Hi Andreas,

Where is the patch? It doesn't seem to be attached.

Thanks,
Tony