Discussion:
[gem5-dev] Review Request: arm: mark IT instructions as nops
Mitch Hayenga
2013-03-30 02:47:28 UTC
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1805/
-----------------------------------------------------------

Review request for Default.


Description
-------

Mark ARM IT (if-then) instructions as nops.

ARM's IT instructions predicate up to the next 4 instructions on various condition codes. IT instructions really just send control signals to the decoder; after decode they do not read or write any registers. Marking them as nops (along with the other patch that drops nops at decode) saves execution resources and bandwidth.


Diffs
-----

src/arch/arm/isa/insts/misc.isa 47591444a7c5

Diff: http://reviews.gem5.org/r/1805/diff/


Testing
-------

A fast libquantum run.


Thanks,

Mitch Hayenga
Ali Saidi
2013-03-30 14:31:47 UTC
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1805/#review4177
-----------------------------------------------------------


While this seems harmless enough, I wonder if there is some interaction between faults/interrupts and the instruction that we should worry about. I haven't given it enough thought to say either way, but it seems like it could be a concern.

- Ali Saidi
Mitch Hayenga
2013-03-30 15:54:54 UTC
Post by Ali Saidi
While this seems harmless enough, I wonder if there is some interaction between faults/interrupts and the instruction that we should worry about. I haven't given it enough thought to say either way, but it seems like it could be a concern.
I thought about it somewhat, since IT blocks are required to be able to handle faults and return to execution properly within an IT block. It seems the gem5 solution is probably similar to what a real processor implementation would use, appending the IT state to the PC. So an exception/interrupt within an IT block would just return and the decoder would pick off the extra IT bits from the PC (that detail how to predicate up to the next 3 ops). If the exception/interrupt was just prior to the IT instruction, it would just get sent to the decoder like normal.
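
The "IT state travels with the PC" idea can be sketched as follows. This is a minimal illustration, not gem5 code: `PcWithItState` and the helper names are hypothetical, and the bit layout and advance rule follow the ITSTATE description in the ARM architecture manual as I understand it (bits [7:4] give the current condition, bits [3:0] being non-zero means "still inside an IT block").

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container: a PC that carries the 8-bit Thumb ITSTATE alongside
// the address. The names are illustrative and are not gem5's actual types.
struct PcWithItState {
    std::uint32_t addr;     // instruction address
    std::uint8_t  itstate;  // [7:5] base condition, [4:0] per-slot condition LSBs + stop bit
};

// True while the instruction at this PC is still inside an IT block.
inline bool inItBlock(const PcWithItState &pc) {
    return (pc.itstate & 0x0F) != 0;
}

// Condition code that applies to the instruction at this PC (only meaningful
// inside an IT block).
inline std::uint8_t currentCond(const PcWithItState &pc) {
    assert(inItBlock(pc));
    return static_cast<std::uint8_t>(pc.itstate >> 4);
}

// Step past one instruction of 'size' bytes and shift ITSTATE accordingly:
// if the low three bits are zero the block has just ended, otherwise the low
// five bits shift left by one and the base condition bits are preserved.
inline PcWithItState advance(PcWithItState pc, std::uint32_t size) {
    pc.addr += size;
    if ((pc.itstate & 0x07) == 0) {
        pc.itstate = 0;
    } else {
        pc.itstate = static_cast<std::uint8_t>((pc.itstate & 0xE0) |
                                               ((pc.itstate << 1) & 0x1F));
    }
    return pc;
}
```

With an ITTE EQ block (ITSTATE 0x06 after the IT instruction itself), an exception taken after the first predicated instruction would save ITSTATE 0x0C with the return address, and the decoder could resume predicating the remaining two instructions without ever seeing the IT instruction again.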

I was thinking more about the "discarding nops at decode" part. The only case I think could cause trouble there is self-modifying code, since you'd want to track instruction addresses to know whether a snooped write changed a currently executing instruction. But gem5 doesn't really provide that now anyway, and you could use cheaper structures to perform that check (since false positives would be OK).
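
One possible shape for such a "cheaper structure" is a small counting filter keyed on instruction cache lines. The sketch below is only an assumption about what could work: `InflightLineFilter`, the table size, and the 64-byte line size are all hypothetical, and nothing like this exists in gem5 as far as this thread indicates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical structure, not part of gem5: a small counting filter over the
// cache lines of in-flight instructions. Aliasing between lines can cause a
// false positive (an unnecessary flush) but never a false negative.
class InflightLineFilter {
  public:
    // Record that an instruction at 'addr' entered the pipeline.
    void onFetch(std::uint64_t addr) { ++counts_[index(addr)]; }

    // Record that the instruction at 'addr' no longer needs tracking.
    void onRetire(std::uint64_t addr) {
        assert(counts_[index(addr)] > 0);
        --counts_[index(addr)];
    }

    // Could a snooped write to 'addr' touch an in-flight instruction?
    bool mayConflict(std::uint64_t addr) const {
        return counts_[index(addr)] != 0;
    }

  private:
    static constexpr std::size_t kEntries = 64;  // assumed table size
    static constexpr unsigned kLineShift = 6;    // assumed 64-byte lines

    static std::size_t index(std::uint64_t addr) {
        return static_cast<std::size_t>((addr >> kLineShift) % kEntries);
    }

    std::array<std::uint16_t, kEntries> counts_{};
};
```

Because the filter is indexed by line rather than by instruction, an IT instruction dropped at decode could stay covered for as long as its entry is held, without occupying a window slot; when exactly to release the entry is a design choice this sketch leaves open.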


- Mitch


Korey Sewell
2013-04-01 03:50:11 UTC
Hi Mitch,
Another thing I wonder about with this patch is the impact on stats.

If I recall right, O3 throws away nops. So when we talk about IPC with
this patch in, we aren't giving the CPU "credit" for doing what's necessary
for the ARM IT instruction, right?

I'm thinking there may need to be a supplementary patch that counts the
number of times this optimization happens. That way, we have all the
bases covered for instruction/IPC counting.

Thoughts?

-Korey
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Mitch Hayenga
2013-04-01 06:36:42 UTC
Re-sending this so it gets sent to the list.

Yes, right now this would not properly credit IPC for IT instructions,
since nops don't count towards IPC. I overlooked that since I use
execution time as my evaluation metric.
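
To make the size of the effect concrete, here is a small sketch of the two ways IPC could be computed. The struct and function names are hypothetical (they are not gem5 stats), and any numbers plugged in are purely illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: how IPC shifts depending on whether IT instructions
// dropped at decode are credited. None of these names come from gem5.
struct CommitCounts {
    std::uint64_t cycles;        // simulated cycles
    std::uint64_t countedInsts;  // instructions credited at commit
    std::uint64_t droppedIts;    // IT instructions discarded at decode
};

// IPC as it would be reported if dropped IT instructions are not counted.
inline double ipcAsReported(const CommitCounts &c) {
    assert(c.cycles > 0);
    return static_cast<double>(c.countedInsts) / static_cast<double>(c.cycles);
}

// IPC if every dropped IT instruction were credited as a committed op.
inline double ipcWithItCredit(const CommitCounts &c) {
    assert(c.cycles > 0);
    return static_cast<double>(c.countedInsts + c.droppedIts) /
           static_cast<double>(c.cycles);
}
```

For a made-up run of 1000 cycles with 800 counted instructions and 200 dropped IT instructions, the two functions differ by 0.2 IPC even though execution time is identical, which is why the choice matters only to people who compare IPC rather than runtime.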

Three quick thoughts on this...
1) A quick solution would be to look at the ITstate of committing ops and
infer a dropped IT instruction. This would be a bit hackish and
ARM-specific though.
2) Maintaining the current method of sending nops through the pipeline
could be made to work by going through and modifying the code to be sure
nops do not count against bandwidth or size restrictions. You'd also have
to worry about not impacting stats like ROB reads/writes that McPAT
users feed to their power models. And at commit you'd still have to
special-case the IT instruction to make sure it got counted.
3) If it's already acceptable to not count ISA-level nops towards IPC, why
not IT instructions as well? They do feed some information to the decoder,
but overall their relative work isn't much more than a nop's (being fetched +
decoded). They also potentially do far less work than a prefetch
instruction (which is also not counted).

I personally like 3, since it would require no changes and the current
subset of instructions counted towards IPC already seems a bit arbitrary.

PS: I coded this up because I noticed a few times where up to 1/5 of my
instruction window could be occupied by "useless" IT instructions.
Korey Sewell
2013-04-01 16:16:37 UTC
Hi Mitch,
Thanks for the quick response. I pretty much agree with the sentiment that
this is a valid optimization but probably disagree a bit on going forward
with (3).

I think you pose a valid question of "If it's already acceptable to not
count ISA-level nops towards IPC, why not IT instructions as well?". My
answer to that would be that whereas nops/prefetches can safely be ignored
and not affect instruction order, you can't literally ignore an IT
instruction without affecting instruction order.

If I err in that reasoning, then I think I'd be OK with #3, but if it's the
case where the output of the IT instruction is actually needed to alter
control flow then I don't think it's OK to treat it as a nop and ignore it
in stats.

I'd be for #1 actually. Although it may sound "hackish", each ISA does have
its own quirks, and at commit I wouldn't be against checking the
ISA-specific state to figure out whether this was an optimized instruction
(mark a flag in the DynInst). Then, when it leaves the O3 CPU (instDone()?),
check whether this flag is asserted but the committed flag isn't. If so,
count it as a committed op.

Lastly, this optimization could also be applied to any branch instructions
that get resolved at decode, right?

-Korey
Mitch Hayenga
2013-04-01 19:14:56 UTC
"Lastly, this optimization could also be applied to any branch instructions
that get resolved at decode, right?"
That's a good one that I'm definitely going to implement.

I think whoever wrote the current IPC counting mechanism was trying to
measure backend IPC and not total IPC. That would explain counting data
prefetches but not instruction prefetches towards IPC.

I'm still for ignoring IT instructions though, since the IT instruction was
originally created when ARM shrank their opcodes for the Thumb instruction
set and didn't have enough bits for their normal predication encoding. IT
instructions just allow the decoder to save and append these bits to
recreate the full ARM opcode. They've also made IT blocks as atomic as
possible (only the last instruction is allowed to be a branch, and jumps
into IT blocks, other than exception returns, are not permitted). So, in my
mind IT instructions are effectively part of the "instruction" that the
entire block comprises.
Korey Sewell
2013-04-02 16:50:39 UTC
Hi Mitch,
I see what you are saying about the atomicity aspect of the IT block. Those
are fair points. Likewise, it's fair to optimize them away past decode,
as your patch does.

I'm looking for something extra so that another CPU model (or code) will
not look at that instruction and think it's just a "nop". For instance, the
prefetch instruction is marked with a "Prefetch" flag, which allows a CPU
model to check for prefetches and handle them differently if it wishes to.

To me, it looks like the converged solution is:
1) add a flag called "isPurePredicate" (or a better name!) in DynInst.
2) Then, in your patch you can give the instruction two flags: "isNop" and
"isPurePredicate".
3) Finally, when the instruction is removed from the CPU, you check to see
if the "isPurePredicate" is asserted and if the instruction is not
squashed. If that condition is true, increment a stat counting how many
times we performed this optimization.
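
The three steps above can be sketched roughly as follows. This is a hedged illustration, not the actual patch: `DynInstSketch`, `OptimizationStats`, `makeItInst`, and `onInstRemoved` are hypothetical stand-ins, and gem5's real DynInst and stats classes are structured differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; gem5's real DynInst and stats classes look different.
struct DynInstSketch {
    bool isNop = false;            // lets the front end drop it after decode
    bool isPurePredicate = false;  // step 1: proposed flag, instruction only sets up predication
    bool squashed = false;         // set if the instruction was on a wrong path
};

struct OptimizationStats {
    std::uint64_t purePredicatesElided = 0;  // hypothetical stat name
};

// Step 2: an IT instruction carries both flags.
inline DynInstSketch makeItInst() {
    DynInstSketch inst;
    inst.isNop = true;
    inst.isPurePredicate = true;
    return inst;
}

// Step 3: called when an instruction is removed from the CPU. Only
// non-squashed pure-predicate instructions bump the counter.
inline void onInstRemoved(const DynInstSketch &inst, OptimizationStats &stats) {
    if (inst.isPurePredicate && !inst.squashed) {
        assert(inst.isNop);  // in this sketch, pure predicates are always nops
        ++stats.purePredicatesElided;
    }
}
```

Keeping the counter separate from the committed-instruction count leaves the existing IPC definition untouched while still letting anyone add the elided IT instructions back in when post-processing stats.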

I'm hoping this both eliminates the IT instruction from the back-end (isNop
flag) and allows for a fair accounting of that optimization in the
end-of-simulation stats (isPurePredicate flag).

Would you agree with that?
Post by Mitch Hayenga
"Lastly, this optimization could also applied to any branch instructions
that get resolved at decode, right?"
That's a good one that I'm definitely going to implement.
I think whoever wrote the current IPC counting mechanism was trying to
measure backend IPC and not total IPC. This makes sense by counting data
prefetches but not instruction prefetches towards IPC.
I'm still with ignoring IT instructions though, since it was originally
created when ARM shrank their opcodes for the THUMB instruction set and
didn't have enough bits to do their normal predication encoding. IT
instructions just allow the decoder to save and append these bits to
recreate the full ARM opcode. They've also made IT blocks be as atomic as
possible (only the last instruction is allowed to be a branch and jumps,
other than exception returns, into IT blocks are not permitted). So, in my
mind IT instructions are effectively part of the "instruction" that the
entire block comprises.
Post by Korey Sewell
Hi Mitch,
Thanks for the quick response. I pretty much agree with the sentiment
that
Post by Korey Sewell
this is a valid optimization but probably disagree a bit on going forward
with (3).
I think you pose a valid question of "If it's already acceptable to not
count ISA-level nops towards IPC, why not IT instructions as well?". My
answer to that would be that whereas nops/prefetches can safely be
ignored
Post by Korey Sewell
and not affect instruction order, you can't literally ignore an IT
instruction without affecting instruction order.
If I err in that reasoning, then I think I'd be OK with #3, but if it's
the case where the output of the IT instruction is actually needed to
alter
Post by Korey Sewell
control flow then I don't think it's OK to treat it as a nop and ignore
it
Post by Korey Sewell
in stats.
I'd be for #1 actually. Although it may sound "hackish", each ISA does
have it's own quirks and at commit I wouldn't be against checking the
ISA-specific state to figure out if this were a optimized instruction
(mark
Post by Korey Sewell
a flag in the DynInst) and when it leaves the O3 cpu (instDone()?), check
to see if this is flag is asserted but the committed flag isn't. If not,
count it as a committed op.
Lastly, this optimization could also applied to any branch instructions
that get resolved at decode, right?
-Korey
On Sun, Mar 31, 2013 at 11:36 PM, Mitch Hayenga <
Post by Mitch Hayenga
Re-sending this so it gets sent to the list.
Yes, right now this would not properly credit IPC for IT instructions,
since nops don't count towards IPC. I overlooked that since I use
execution time as my evaluation metric.
Three quick thoughts on this...
1) A quick solution would be to look at the ITstate of committing ops
and infer a dropped IT instruction. This would be a bit hackish and ARM
specific though.
2) Maintaining the current method of sending nops through the pipeline
could be made to work. By going through and modifying the code to be
sure
Post by Korey Sewell
Post by Mitch Hayenga
nops did not count against bandwidth or size restrictions. You'd also
have
Post by Korey Sewell
Post by Mitch Hayenga
to worry about not impacting stats like rob reads/writes that the McPAT
users would feed to their power models. And at commit you'd still have
to
Post by Korey Sewell
Post by Mitch Hayenga
special case the IT instruction to make sure it got counted.
3) If it's already acceptable to not count ISA-level nops towards IPC,
why not IT instructions as well. They do feed some information to the
decoder, but overall their relative work isn't much more than a nop
(being
Post by Korey Sewell
Post by Mitch Hayenga
fetched + decoded). They also potentially do far less work than a
prefetch
Post by Korey Sewell
Post by Mitch Hayenga
instruction (which is also not counted).
I personally like 3, since the current subset of instructions counted
towards IPC already seems to have a bit of arbitrariness and would
require
Post by Korey Sewell
Post by Mitch Hayenga
no changes.
PS: I coded this up because I noticed a few times where up to 1/5 of my
instruction window could be occupied by "useless" IT instructions
Post by Korey Sewell
Hi Mitch,
Another thing I wonder about with this patch is the impact on stats.
If I recall right, O3 throws aways nops. So when we talk about IPC with
this patch in, we aren't giving the CPU "credit" for doing what's
necessary
Post by Korey Sewell
Post by Mitch Hayenga
Post by Korey Sewell
for the ARM IT instruction right?
I'm thinking there may need to be another patch supplemented to this
that counts the # of times this optimization happens. That way, we
have all
Post by Korey Sewell
Post by Mitch Hayenga
Post by Korey Sewell
the bases covered for instruction/IPC counting.
Thoughts?
-Korey
On Sat, Mar 30, 2013 at 8:54 AM, Mitch Hayenga <
Post by Ali Saidi
While this seems harmless enough, I wonder if there is some
interaction between faults/interrupts and the instruction that we should
worry about. I haven't given it enough thought to say either way, but it
seems like it could be a concern.
I thought about it somewhat, since IT blocks are required to be able to
handle faults and return to execution properly within an IT block. It
seems the gem5 solution is probably similar to what a real processor
implementation would use, appending the IT state to the PC. So an
exception/interrupt within an IT block would just return and the decoder
would pick off the extra IT bits from the PC (that detail how to predicate
up to the next 3 ops). If the exception/interrupt was just prior to the IT
instruction, it would just get sent to the decoder like normal.
I was thinking more on the "discarding nops at decode" part. The only
case I think that could give that trouble is self-modifying code, since
you'd want to track instruction addresses to know if a snooped write
changed a currently executing instruction. But gem5 doesn't really provide
that now anyway and you could use cheaper structures to perform that
operation (since false positives would be ok).
- Mitch
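To make the "IT bits travel with the PC" point concrete, here is a minimal sketch of how the 8-bit ITSTATE value advances from one instruction of an IT block to the next, following the architectural layout (bits [7:5] base condition, bits [4:0] mask). The function names are made up for illustration and are not gem5's; how gem5 actually stores the state alongside the PC is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Condition code applied to the current instruction: ITSTATE[7:4].
constexpr std::uint8_t itCondition(std::uint8_t itstate)
{
    return itstate >> 4;
}

// True while the current instruction is inside an IT block.
constexpr bool inItBlock(std::uint8_t itstate)
{
    return (itstate & 0x0F) != 0;
}

// Advance ITSTATE past one instruction of an IT block.
constexpr std::uint8_t itAdvance(std::uint8_t itstate)
{
    if ((itstate & 0x07) == 0)
        return 0;  // that was the last instruction of the block
    // Keep the base condition [7:5], shift the 5-bit mask left by one.
    return static_cast<std::uint8_t>((itstate & 0xE0) |
                                     ((itstate << 1) & 0x1F));
}

// Worked example: "ITTE EQ" sets ITSTATE to 0x06 (EQ, then EQ, then NE).
static_assert(itAdvance(0x06) == 0x0C, "second op still EQ");
static_assert(itAdvance(0x0C) == 0x18, "third op flips to NE");
static_assert(itAdvance(0x18) == 0x00, "block is finished");
```

Because each op carries the state it needs, returning from an exception in the middle of a block only requires restoring that one byte; the IT instruction itself never has to be replayed.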
Mitch Hayenga
2013-04-02 20:59:44 UTC
Permalink
Yeah, I'll see if I get time to do a fuller solution later this week. I
also realized the current patch would break FS mode, since the frontend
signals instruction fetch page faults by creating a nop with a fault
attached (this patch would just discard that nop). So, checking for a
fault would be required before discarding.
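As a rough illustration of that extra check, the sketch below only lets decode drop a nop when no fault is attached to it. The Fault and Inst types and the canDiscardAtDecode name are hypothetical stand-ins for this example, not gem5's real classes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration; not gem5's real classes.
struct Fault
{
    const char *name;
};

struct Inst
{
    bool isNop = false;
    std::shared_ptr<Fault> fault;  // null means "no fault attached"
};

// A nop may be dropped at decode only if it does not carry a fault.
// A fetch page fault rides on a nop, so that nop has to reach commit.
bool canDiscardAtDecode(const Inst &inst)
{
    return inst.isNop && inst.fault == nullptr;
}
```

With this shape, an ordinary nop (or an IT marked as a nop) is dropped, while the fault-carrying nop created by the frontend still flows down the pipeline.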

Discarding unconditional jumps also works. I did a quick mod where I
discarded them if the following was true: "if (isUncondCtrl() &&
isDirectCtrl() && !inst->writesRegs())", where writesRegs returned
true if the instruction wrote something other than the PC or zero reg (on
ARM). But using a flag in the ISA files would be a better way than
checking the destination registers explicitly.
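A sketch of the flag-based variant, assuming the ISA description could tag each instruction with a few static properties. StaticFlags and its field names are invented for this example; only the three-part condition comes from the quick mod described above.

```cpp
#include <cassert>

// Hypothetical per-instruction flags, as the ISA files might emit them.
struct StaticFlags
{
    bool uncondCtrl;      // unconditional control transfer
    bool directCtrl;      // target is known from the encoding alone
    bool writesNonPcReg;  // writes a register other than the PC/zero reg
};

// A branch is fully resolved at decode when it is unconditional,
// direct, and has no side effect beyond redirecting the PC.
constexpr bool resolvedAtDecode(const StaticFlags &f)
{
    return f.uncondCtrl && f.directCtrl && !f.writesNonPcReg;
}

// A plain "B label" qualifies; "BL label" does not, since it writes LR.
static_assert(resolvedAtDecode({true, true, false}), "B label");
static_assert(!resolvedAtDecode({true, true, true}), "BL label");
static_assert(!resolvedAtDecode({true, false, false}), "indirect branch");
```

As with nops, a real implementation would presumably also need the fault check before dropping such a branch.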
Post by Korey Sewell
Hi Mitch,
I see what you are saying about the atomicity aspect of the IT block. Those
are fair points. Likewise, it's fair to optimize them away past decode,
as your patch does.
I'm looking for something extra such that another CPU model (or code) will
not look at that instruction and think it's just a "nop". For instance, the
prefetch instruction is marked with a "Prefetch" flag which allows a CPU
model to check for prefetches and handle them differently if it wishes to.
1) Add a flag called "isPurePredicate" (or a better name!) in DynInst.
2) Then, in your patch you can give the instruction two flags: "isNop" and
"isPurePredicate".
3) Finally, when the instruction is removed from the CPU, you check to see
if the "isPurePredicate" flag is asserted and if the instruction is not
squashed. If that condition is true, increment a stat counting how many
times we performed this optimization.
I'm hoping this both eliminates the IT instruction from the back-end (isNop
flag) and then allows for a fair accounting of that optimization in the
end-of-simulation stats (isPurePredicate flag).
Would you agree with that?
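The three steps above could be sketched roughly as follows. DynInst, Stats, onInstRemoved and ipc are simplified placeholders written for this example, not the O3 model's actual types, and the stat name is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Simplified placeholder for a dynamic instruction record.
struct DynInst
{
    bool isNop;
    bool isPurePredicate;  // e.g. an ARM IT dropped after decode
    bool squashed;
};

struct Stats
{
    std::uint64_t committedOps = 0;
    std::uint64_t elidedPredicateOps = 0;  // hypothetical stat name
};

// Called when an instruction leaves the CPU.
void onInstRemoved(const DynInst &inst, Stats &stats)
{
    if (inst.squashed)
        return;                      // wrong-path work is never credited
    if (inst.isPurePredicate)
        ++stats.elidedPredicateOps;  // count the optimization
    else if (!inst.isNop)
        ++stats.committedOps;        // plain nops stay uncounted
}

// IPC with or without credit for the elided predicate instructions.
double ipc(const Stats &stats, std::uint64_t cycles, bool creditElided)
{
    const std::uint64_t ops =
        stats.committedOps + (creditElided ? stats.elidedPredicateOps : 0);
    return static_cast<double>(ops) / static_cast<double>(cycles);
}
```

Keeping the elided count as its own stat leaves the existing IPC definition untouched while letting anyone who wants "total" IPC add it back in.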
On Mon, Apr 1, 2013 at 12:14 PM, Mitch Hayenga <
Post by Mitch Hayenga
"Lastly, this optimization could also applied to any branch instructions
that get resolved at decode, right?"
That's a good one that I'm definitely going to implement.
I think whoever wrote the current IPC counting mechanism was trying to
measure backend IPC and not total IPC. This makes sense by counting data
prefetches but not instruction prefetches towards IPC.
I'm still with ignoring IT instructions though, since it was originally
created when ARM shrank their opcodes for the THUMB instruction set and
didn't have enough bits to do their normal predication encoding. IT
instructions just allow the decoder to save and append these bits to
recreate the full ARM opcode. They've also made IT blocks be as atomic as
possible (only the last instruction is allowed to be a branch and jumps,
other than exception returns, into IT blocks are not permitted). So, in my
mind IT instructions are effectively part of the "instruction" that the
entire block comprises.
Post by Korey Sewell
Hi Mitch,
Thanks for the quick response. I pretty much agree with the sentiment that
this is a valid optimization but probably disagree a bit on going forward
with (3).
I think you pose a valid question of "If it's already acceptable to not
count ISA-level nops towards IPC, why not IT instructions as well?". My
answer to that would be that whereas nops/prefetches can safely be ignored
and not affect instruction order, you can't literally ignore an IT
instruction without affecting instruction order.
If I err in that reasoning, then I think I'd be OK with #3, but if it's
the case where the output of the IT instruction is actually needed to alter
control flow then I don't think it's OK to treat it as a nop and ignore it
in stats.
I'd be for #1 actually. Although it may sound "hackish", each ISA does
have its own quirks and at commit I wouldn't be against checking the
ISA-specific state to figure out if this were an optimized instruction (mark
a flag in the DynInst) and when it leaves the O3 cpu (instDone()?), check
to see if this flag is asserted but the committed flag isn't. If not,
count it as a committed op.
Lastly, this optimization could also be applied to any branch instructions
that get resolved at decode, right?
-Korey
On Sun, Mar 31, 2013 at 11:36 PM, Mitch Hayenga <
Post by Mitch Hayenga
Re-sending this so it gets sent to the list.
Yes, right now this would not properly credit IPC for IT instructions,
since nops don't count towards IPC. I overlooked that since I use
execution time as my evaluation metric.
Three quick thoughts on this...
1) A quick solution would be to look at the ITstate of committing ops
and infer a dropped IT instruction. This would be a bit hackish and ARM
specific though.
2) Maintaining the current method of sending nops through the pipeline
could be made to work. By going through and modifying the code to be sure
nops did not count against bandwidth or size restrictions. You'd also have
to worry about not impacting stats like rob reads/writes that the McPAT
users would feed to their power models. And at commit you'd still have to
special case the IT instruction to make sure it got counted.
3) If it's already acceptable to not count ISA-level nops towards IPC,
why not IT instructions as well. They do feed some information to the
decoder, but overall their relative work isn't much more than a nop (being
fetched + decoded). They also potentially do far less work than a prefetch
instruction (which is also not counted).
I personally like 3, since the current subset of instructions counted
towards IPC already seems to have a bit of arbitrariness and would require
no changes.
PS: I coded this up because I noticed a few times where up to 1/5 of my
instruction window could be occupied by "useless" IT instructions
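Option 1 above could look something like the sketch below: walk the committed ops in order and infer one dropped IT instruction each time an op carries a non-zero ITSTATE that was not inherited from the previous op. The itAdvance helper follows the architectural ITSTATE layout; inferDroppedIts and the idea of feeding it a vector of per-op ITSTATE bytes are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Advance ITSTATE past one instruction of an IT block
// ([7:5] base condition, [4:0] mask).
constexpr std::uint8_t itAdvance(std::uint8_t s)
{
    return (s & 0x07) == 0
        ? 0
        : static_cast<std::uint8_t>((s & 0xE0) | ((s << 1) & 0x1F));
}

// Count the IT instructions implied by a committed stream, given the
// ITSTATE each committed op was decoded with (0 = not predicated by IT).
std::size_t inferDroppedIts(const std::vector<std::uint8_t> &committed)
{
    std::size_t count = 0;
    std::uint8_t expected = 0;  // state the next op inherits, if any
    for (std::uint8_t s : committed) {
        // A non-zero state with nothing inherited starts a new block,
        // so an IT instruction must have preceded this op.
        if (s != 0 && expected == 0)
            ++count;
        expected = itAdvance(s);
    }
    return count;
}
```

This is where the "hackish" part shows: an exception return into the middle of a block would look like a new block and be miscounted unless the caller resets the tracking around exceptions.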
Post by Korey Sewell
Hi Mitch,
Another thing I wonder about with this patch is the impact on stats.
If I recall right, O3 throws away nops. So when we talk about IPC with
this patch in, we aren't giving the CPU "credit" for doing what's necessary
for the ARM IT instruction right?
I'm thinking there may need to be another patch supplemented to this
that counts the # of times this optimization happens. That way, we have all
the bases covered for instruction/IPC counting.
Thoughts?
-Korey
Korey Sewell
2013-04-02 21:06:04 UTC
Permalink
Good catch on the FS side of things and thanks for looking into this. Once
we get the patch settled, this will be a useful optimization for all the O3
users.

-Korey


Ali Saidi
2013-07-15 15:10:58 UTC
Permalink
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1805/#review4511
-----------------------------------------------------------


Hi Mitch,

Have you run FS code with this change?

Thanks,
Ali


- Ali Saidi
Mitch Hayenga
2013-07-15 17:28:40 UTC
Permalink
Post by Ali Saidi
Hi Mitch,
Have you run FS code with this change?
Thanks,
Ali
Nope, I haven't run FS in months. All of my current benchmarking infrastructure (simpoints, etc) is built on ARM SE. I'm probably one of the few people who do that.


- Mitch

