Discussion:
[gem5-dev] Bug in x86 stack instructions
Jason Lowe-Power
2017-01-24 00:04:03 UTC
Permalink
To those of you with more x86 ISA implementation knowledge than I have:

I've been working through a bug one of our users found (thanks Sanchayan!).
It looks like current versions of ld use the 0x67 instruction prefix
(address size override) as an optimization instead of using a nop. See
https://www.sourceware.org/ml/binutils/2016-05/msg00323.html.

This causes the call instruction to be decoded with with the "address size
override prefix", which is correct, in a sense. From what I can tell, this
is passed to the call instruction via "-env.dataSize" (see call instruction
implementation below).

def macroop CALL_NEAR_I
{
# Make the default data size of calls 64 bits in 64 bit mode
.adjust_env oszIn64Override
.function_call

limm t1, imm
rdip t7
# Check target of call
st t7, ss, [0, t0, rsp], "-env.dataSize"
subi rsp, rsp, ssz
wrip t7, t1
};

Now, the bug is, according to the x86 manual, "For instructions that
implicitly access the stack segment (SS), the address size for stack
accesses is determined by the D (default) bit in the stack-segment
descriptor. In 64-bit mode, the D bit is ignored, and all stack references
have a 64-bit address size." See
https://support.amd.com/TechDocs/24594.pdf page
9.

Thoughts on how to fix this?

Thanks,
Jason
Steve Reinhardt
2017-01-24 15:04:21 UTC
Permalink
My recollection of how all this works is that the arguments to the 'st'
micro-op get turned into arguments to a call to the StoreOp constructor:

597 <http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ldstop.isa#l597>
class StoreOp(LdStOp):
598 <http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ldstop.isa#l598>
def __init__(self, data, segment, addr, disp = 0,
599 <http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ldstop.isa#l599>
dataSize="env.dataSize",
600 <http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ldstop.isa#l600>
addressSize="env.addressSize",
601 <http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ldstop.isa#l601>
atCPL0=False, nonSpec=False):

so the "-env.dataSize" you see is actually the displacement for the store,
not the dataSize or addressSize. I think the problem is that the
addressSize is using the env,addressSize that's calculated the "normal"
way, i.e., including the effects of the override prefix.

Poking around some more, there's an 'ssz' symbol that maps to env.stackSize
[1] which gets used a lot in stack operations; for example, right after the
store in CALL_NEAR_I, 'ssz' is subtracted off of the stack pointer. The
calculation of env.stackSize seems to follow the rule you mentioned about
being fixed at 64 bits in 64-bit mode [2, 3].

So it might be sufficient to add ', addressSize=ssz' to the end of the 'st'
micro-op. Oddly though I don't see addressSize being set on any other stack
ops (just dataSize), so I wonder if this is a problem with other stack
instructions or not. If so, it might be useful to have an override
hardwired in at some lower level to check if the segment is ss and force
the address size to be stackSize (rather than adding this extra parameter
in a dozen different places). I wouldn't know where best to do that though,
so the manual override seems best for now.

Steve

[1] http://grok.gem5.org/xref/gem5/src/arch/x86/isa/microasm.isa#107
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/isa.cc#91
[3] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#400
Post by Jason Lowe-Power
I've been working through a bug one of our users found (thanks Sanchayan!).
It looks like current versions of ld use the 0x67 instruction prefix
(address size override) as an optimization instead of using a nop. See
https://www.sourceware.org/ml/binutils/2016-05/msg00323.html.
This causes the call instruction to be decoded with with the "address size
override prefix", which is correct, in a sense. From what I can tell, this
is passed to the call instruction via "-env.dataSize" (see call instruction
implementation below).
def macroop CALL_NEAR_I
{
# Make the default data size of calls 64 bits in 64 bit mode
.adjust_env oszIn64Override
.function_call
limm t1, imm
rdip t7
# Check target of call
st t7, ss, [0, t0, rsp], "-env.dataSize"
subi rsp, rsp, ssz
wrip t7, t1
};
Now, the bug is, according to the x86 manual, "For instructions that
implicitly access the stack segment (SS), the address size for stack
accesses is determined by the D (default) bit in the stack-segment
descriptor. In 64-bit mode, the D bit is ignored, and all stack references
have a 64-bit address size." See
https://support.amd.com/TechDocs/24594.pdf page
9.
Thoughts on how to fix this?
Thanks,
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Jason Lowe-Power
2017-01-24 16:37:31 UTC
Permalink
Hi Steve,

That was super helpful. I'm now a step closer to solving this!

Your suggestion of =ssz, lead me to search for the uses of that in x86, and
it turns out that almost all of other stack instructions have dataSize=ssz.
So, I added both dataSize=ssz and addressSize=ssz to the call instruction,
though I think only the addressSize is actually needed, but I'm not certain.

Now, the address is passed to the Request object correctly, but the program
still fails. The problem now is that the request object is getting
the AddrSizeFlagBit set to true, because machInst.legacy.addr is true.
Thus, the TLB only uses the lower 32 bits of the 64-bit address.

Any ideas on how to change the instruction's memFlags from the macro-op
definition? They are set on
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ldstop.isa#l334
.

It would be nice if we could fix this in the decoder. I think the logic
should be "if the address prefix is set and this is an implicit stack
operation, ignore the address prefix". However, I'm not sure how to tell if
the instruction is "an implicit stack operation" from the decoder.

Thanks,
Jason

On Tue, Jan 24, 2017 at 9:05 AM Steve Reinhardt <***@gmail.com> wrote:

My recollection of how all this works is that the arguments to the 'st'
micro-op get turned into arguments to a call to the StoreOp constructor:

597 <http://repo.gem5.o, and that address doesn't
existrg/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ldstop.isa#l597
<http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ldstop.isa#l597>
class StoreOp(LdStOp):
598 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ldstop.isa#l598
def __init__(self, data, segment, addr, disp = 0,
599 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ldstop.isa#l599
dataSize="env.dataSize",
600 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ldstop.isa#l600
addressSize="env.addressSize",
601 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ldstop.isa#l601
atCPL0=False, nonSpec=False):

so the "-env.dataSize" you see is actually the displacement for the store,
not the dataSize or addressSize. I think the problem is that the
addressSize is using the env,addressSize that's calculated the "normal"
way, i.e., including the effects of the override prefix.

Poking around some more, there's an 'ssz' symbol that maps to env.stackSize
[1] which gets used a lot in stack operations; for example, right after the
store in CALL_NEAR_I, 'ssz' is subtracted off of the stack pointer. The
calculation of env.stackSize seems to follow the rule you mentioned about
being fixed at 64 bits in 64-bit mode [2, 3].

So it might be sufficient to add ', addressSize=ssz' to the end of the 'st'
micro-op. Oddly though I don't see addressSize being set on any other stack
ops (just dataSize), so I wonder if this is a problem with other stack
instructions or not. If so, it might be useful to have an override
hardwired in at some lower level to check if the segment is ss and force
the address size to be stackSize (rather than adding this extra parameter
in a dozen different places). I wouldn't know where best to do that though,
so the manual override seems best for now.

Steve

[1] http://grok.gem5.org/xref/gem5/src/arch/x86/isa/microasm.isa#107
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/isa.cc#91
[3] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#400
Post by Jason Lowe-Power
I've been working through a bug one of our users found (thanks
Sanchayan!).
Post by Jason Lowe-Power
It looks like current versions of ld use the 0x67 instruction prefix
(address size override) as an optimization instead of using a nop. See
https://www.sourceware.org/ml/binutils/2016-05/msg00323.html.
This causes the call instruction to be decoded with with the "address size
override prefix", which is correct, in a sense. From what I can tell, this
is passed to the call instruction via "-env.dataSize" (see call instruction
implementation below).
def macroop CALL_NEAR_I
{
# Make the default data size of calls 64 bits in 64 bit mode
.adjust_env oszIn64Override
.function_call
limm t1, imm
rdip t7
# Check target of call
st t7, ss, [0, t0, rsp], "-env.dataSize"
subi rsp, rsp, ssz
wrip t7, t1
};
Now, the bug is, according to the x86 manual, "For instructions that
implicitly access the stack segment (SS), the address size for stack
accesses is determined by the D (default) bit in the stack-segment
descriptor. In 64-bit mode, the D bit is ignored, and all stack references
have a 64-bit address size." See
https://support.amd.com/TechDocs/24594.pdf page
9.
Thoughts on how to fix this?
Thanks,
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
gem5-***@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
--
Jason
Steve Reinhardt
2017-01-24 18:18:56 UTC
Permalink
Hmm, seems like there's a little bit of an inconsistency in that the
request is using the legacy.addr bit (which is set by the decoder when it
sees the address size override prefix [1]) directly, while the legacy.addr
bit is also used to calculate the addressSize value [2] but can be
overridden (as we are doing). So if addressSize is overridden then
AddrSizeFlagBit should no longer be set based on legacy.addr.

Or another way of looking at it is that the same process of checking
legacy.addr to override the address size is done in two places, once in the
decoder [2] and again in the TLB [3] and it's not clear how to suppress the
latter one.

I suppose one gnarly way of doing it would be to go into the micro-op
definition somewhere and clear the AddrSizeFlagBit out of memFlags if
addressSize != env.addressSize (i.e., the address size was overridden).
There's probably a cleaner way, but that's the easiest path I can think of
(if it even works).

[1] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#195
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#390
[3] http://grok.gem5.org/source/xref/gem5/src/arch/x86/tlb.cc#315
Post by Jason Lowe-Power
Hi Steve,
That was super helpful. I'm now a step closer to solving this!
Your suggestion of =ssz, lead me to search for the uses of that in x86, and
it turns out that almost all of other stack instructions have dataSize=ssz.
So, I added both dataSize=ssz and addressSize=ssz to the call instruction,
though I think only the addressSize is actually needed, but I'm not certain.
Now, the address is passed to the Request object correctly, but the program
still fails. The problem now is that the request object is getting
the AddrSizeFlagBit set to true, because machInst.legacy.addr is true.
Thus, the TLB only uses the lower 32 bits of the 64-bit address.
Any ideas on how to change the instruction's memFlags from the macro-op
definition? They are set on
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l334
.
It would be nice if we could fix this in the decoder. I think the logic
should be "if the address prefix is set and this is an implicit stack
operation, ignore the address prefix". However, I'm not sure how to tell if
the instruction is "an implicit stack operation" from the decoder.
Thanks,
Jason
My recollection of how all this works is that the arguments to the 'st'
597 <http://repo.gem5.o, and that address doesn't
existrg/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ldstop.isa#l597
<http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l597>
598 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l598
def __init__(self, data, segment, addr, disp = 0,
599 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l599
dataSize="env.dataSize",
600 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l600
addressSize="env.addressSize",
601 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l601
so the "-env.dataSize" you see is actually the displacement for the store,
not the dataSize or addressSize. I think the problem is that the
addressSize is using the env,addressSize that's calculated the "normal"
way, i.e., including the effects of the override prefix.
Poking around some more, there's an 'ssz' symbol that maps to env.stackSize
[1] which gets used a lot in stack operations; for example, right after the
store in CALL_NEAR_I, 'ssz' is subtracted off of the stack pointer. The
calculation of env.stackSize seems to follow the rule you mentioned about
being fixed at 64 bits in 64-bit mode [2, 3].
So it might be sufficient to add ', addressSize=ssz' to the end of the 'st'
micro-op. Oddly though I don't see addressSize being set on any other stack
ops (just dataSize), so I wonder if this is a problem with other stack
instructions or not. If so, it might be useful to have an override
hardwired in at some lower level to check if the segment is ss and force
the address size to be stackSize (rather than adding this extra parameter
in a dozen different places). I wouldn't know where best to do that though,
so the manual override seems best for now.
Steve
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/isa/microasm.isa#107
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/isa.cc#91
[3] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#400
Post by Jason Lowe-Power
I've been working through a bug one of our users found (thanks
Sanchayan!).
Post by Jason Lowe-Power
It looks like current versions of ld use the 0x67 instruction prefix
(address size override) as an optimization instead of using a nop. See
https://www.sourceware.org/ml/binutils/2016-05/msg00323.html.
This causes the call instruction to be decoded with with the "address
size
Post by Jason Lowe-Power
override prefix", which is correct, in a sense. From what I can tell,
this
Post by Jason Lowe-Power
is passed to the call instruction via "-env.dataSize" (see call
instruction
Post by Jason Lowe-Power
implementation below).
def macroop CALL_NEAR_I
{
# Make the default data size of calls 64 bits in 64 bit mode
.adjust_env oszIn64Override
.function_call
limm t1, imm
rdip t7
# Check target of call
st t7, ss, [0, t0, rsp], "-env.dataSize"
subi rsp, rsp, ssz
wrip t7, t1
};
Now, the bug is, according to the x86 manual, "For instructions that
implicitly access the stack segment (SS), the address size for stack
accesses is determined by the D (default) bit in the stack-segment
descriptor. In 64-bit mode, the D bit is ignored, and all stack
references
Post by Jason Lowe-Power
have a 64-bit address size." See
https://support.amd.com/TechDocs/24594.pdf page
9.
Thoughts on how to fix this?
Thanks,
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Jason Lowe-Power
2017-01-24 23:28:15 UTC
Permalink
Thanks for helping me work this out.

I got the binary working by completely removing the AddrSizeFlagBit. The
only place this bit is used is in the LdStOp (it's set to true if
legacy.addr is true) and then, in the TLB, if the AddrSizeFlagBit is set it
truncates the address to 32-bits removing the upper-order bits.

By removing this flag bit, and adding "addressSize=ssz" to the st micro-op,
the binary is now working.

My next question: How can I have any confidence that this doesn't break
other things? Especially when we're talking about 32-bit compatibility
mode. Other than the hello32 binary, are there other things I should run?

My intuition says this is an OK change. The addr field in the Request
should never have the upper-order 32 bits set if the instruction is a
32-bit-mode instruction. The instruction implementation already takes care
of this, as I found with this bug.

Other thoughts? Does anyone know if it is definitely required to use
AddrSizeFlagBit to truncate the address in the TLB?

If not, I'll post a patch for this tomorrow.

Thanks,
Jason
Post by Steve Reinhardt
Hmm, seems like there's a little bit of an inconsistency in that the
request is using the legacy.addr bit (which is set by the decoder when it
sees the address size override prefix [1]) directly, while the legacy.addr
bit is also used to calculate the addressSize value [2] but can be
overridden (as we are doing). So if addressSize is overridden then
AddrSizeFlagBit should no longer be set based on legacy.addr.
Or another way of looking at it is that the same process of checking
legacy.addr to override the address size is done in two places, once in the
decoder [2] and again in the TLB [3] and it's not clear how to suppress the
latter one.
I suppose one gnarly way of doing it would be to go into the micro-op
definition somewhere and clear the AddrSizeFlagBit out of memFlags if
addressSize != env.addressSize (i.e., the address size was overridden).
There's probably a cleaner way, but that's the easiest path I can think of
(if it even works).
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#195
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#390
[3] http://grok.gem5.org/source/xref/gem5/src/arch/x86/tlb.cc#315
Post by Jason Lowe-Power
Hi Steve,
That was super helpful. I'm now a step closer to solving this!
Your suggestion of =ssz, lead me to search for the uses of that in x86,
and
Post by Jason Lowe-Power
it turns out that almost all of other stack instructions have
dataSize=ssz.
Post by Jason Lowe-Power
So, I added both dataSize=ssz and addressSize=ssz to the call
instruction,
Post by Jason Lowe-Power
though I think only the addressSize is actually needed, but I'm not certain.
Now, the address is passed to the Request object correctly, but the
program
Post by Jason Lowe-Power
still fails. The problem now is that the request object is getting
the AddrSizeFlagBit set to true, because machInst.legacy.addr is true.
Thus, the TLB only uses the lower 32 bits of the 64-bit address.
Any ideas on how to change the instruction's memFlags from the macro-op
definition? They are set on
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l334
.
It would be nice if we could fix this in the decoder. I think the logic
should be "if the address prefix is set and this is an implicit stack
operation, ignore the address prefix". However, I'm not sure how to tell
if
Post by Jason Lowe-Power
the instruction is "an implicit stack operation" from the decoder.
Thanks,
Jason
My recollection of how all this works is that the arguments to the 'st'
597 <http://repo.gem5.o, and that address doesn't
existrg/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ldstop.isa#l597
<http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l597>
598 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l598
def __init__(self, data, segment, addr, disp = 0,
599 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l599
dataSize="env.dataSize",
600 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l600
addressSize="env.addressSize",
601 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l601
so the "-env.dataSize" you see is actually the displacement for the
store,
Post by Jason Lowe-Power
not the dataSize or addressSize. I think the problem is that the
addressSize is using the env,addressSize that's calculated the "normal"
way, i.e., including the effects of the override prefix.
Poking around some more, there's an 'ssz' symbol that maps to
env.stackSize
Post by Jason Lowe-Power
[1] which gets used a lot in stack operations; for example, right after
the
Post by Jason Lowe-Power
store in CALL_NEAR_I, 'ssz' is subtracted off of the stack pointer. The
calculation of env.stackSize seems to follow the rule you mentioned about
being fixed at 64 bits in 64-bit mode [2, 3].
So it might be sufficient to add ', addressSize=ssz' to the end of the
'st'
Post by Jason Lowe-Power
micro-op. Oddly though I don't see addressSize being set on any other
stack
Post by Jason Lowe-Power
ops (just dataSize), so I wonder if this is a problem with other stack
instructions or not. If so, it might be useful to have an override
hardwired in at some lower level to check if the segment is ss and force
the address size to be stackSize (rather than adding this extra parameter
in a dozen different places). I wouldn't know where best to do that
though,
Post by Jason Lowe-Power
so the manual override seems best for now.
Steve
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/isa/microasm.isa#107
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/isa.cc#91
[3] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#400
Post by Jason Lowe-Power
I've been working through a bug one of our users found (thanks
Sanchayan!).
Post by Jason Lowe-Power
It looks like current versions of ld use the 0x67 instruction prefix
(address size override) as an optimization instead of using a nop. See
https://www.sourceware.org/ml/binutils/2016-05/msg00323.html.
This causes the call instruction to be decoded with with the "address
size
Post by Jason Lowe-Power
override prefix", which is correct, in a sense. From what I can tell,
this
Post by Jason Lowe-Power
is passed to the call instruction via "-env.dataSize" (see call
instruction
Post by Jason Lowe-Power
implementation below).
def macroop CALL_NEAR_I
{
# Make the default data size of calls 64 bits in 64 bit mode
.adjust_env oszIn64Override
.function_call
limm t1, imm
rdip t7
# Check target of call
st t7, ss, [0, t0, rsp], "-env.dataSize"
subi rsp, rsp, ssz
wrip t7, t1
};
Now, the bug is, according to the x86 manual, "For instructions that
implicitly access the stack segment (SS), the address size for stack
accesses is determined by the D (default) bit in the stack-segment
descriptor. In 64-bit mode, the D bit is ignored, and all stack
references
Post by Jason Lowe-Power
have a 64-bit address size." See
https://support.amd.com/TechDocs/24594.pdf page
9.
Thoughts on how to fix this?
Thanks,
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Gabe Black
2017-01-25 08:03:43 UTC
Permalink
Hi. I think you guys are generally interpreting this code correctly, but
I'll throw in my two cents. If you look at the pseudo code here:

http://www.felixcloutier.com/x86/CALL.html

then it appears that the size of the return address you push onto the stack
should be based on the operand size, ie. the data size. That would mean
that the displacement for the store should be -env.dataSize like it is. By
the same logic, the subtraction below it should also be env.dataSize
(symbolically dsz) and not ssz.

You are also correct, I think, that since the instruction references the
stack segment, it's address size should be set to the stack size when doing
memory operations. Adding addressSize=ssz to the st microop should do that.

According to that pseudo code, the size of the target IP is also the
operand size, but that's the size the wrip microop will use by default and
so should be fine as written.

I think if you make those two changes (change ssz to dsz in the sub, add
addressSize=ssz to the st), then this macroop definition will be correct.
It would be worthwhile to check the other definitions of CALL and make sure
they aren't similarly broken.

Gabe
Post by Jason Lowe-Power
Thanks for helping me work this out.
I got the binary working by completely removing the AddrSizeFlagBit. The
only place this bit is used is in the LdStOp (it's set to true if
legacy.addr is true) and then, in the TLB, if the AddrSizeFlagBit is set it
truncates the address to 32-bits removing the upper-order bits.
By removing this flag bit, and adding "addressSize=ssz" to the st micro-op,
the binary is now working.
My next question: How can I have any confidence that this doesn't break
other things? Especially when we're talking about 32-bit compatibility
mode. Other than the hello32 binary, are there other things I should run?
My intuition says this is an OK change. The addr field in the Request
should never have the upper-order 32 bits set if the instruction is a
32-bit-mode instruction. The instruction implementation already takes care
of this, as I found with this bug.
Other thoughts? Does anyone know if it is definitely required to use
AddrSizeFlagBit to truncate the address in the TLB?
If not, I'll post a patch for this tomorrow.
Thanks,
Jason
Post by Steve Reinhardt
Hmm, seems like there's a little bit of an inconsistency in that the
request is using the legacy.addr bit (which is set by the decoder when it
sees the address size override prefix [1]) directly, while the
legacy.addr
Post by Steve Reinhardt
bit is also used to calculate the addressSize value [2] but can be
overridden (as we are doing). So if addressSize is overridden then
AddrSizeFlagBit should no longer be set based on legacy.addr.
Or another way of looking at it is that the same process of checking
legacy.addr to override the address size is done in two places, once in
the
Post by Steve Reinhardt
decoder [2] and again in the TLB [3] and it's not clear how to suppress
the
Post by Steve Reinhardt
latter one.
I suppose one gnarly way of doing it would be to go into the micro-op
definition somewhere and clear the AddrSizeFlagBit out of memFlags if
addressSize != env.addressSize (i.e., the address size was overridden).
There's probably a cleaner way, but that's the easiest path I can think
of
Post by Steve Reinhardt
(if it even works).
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#195
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#390
[3] http://grok.gem5.org/source/xref/gem5/src/arch/x86/tlb.cc#315
Post by Jason Lowe-Power
Hi Steve,
That was super helpful. I'm now a step closer to solving this!
Your suggestion of =ssz, lead me to search for the uses of that in x86,
and
Post by Jason Lowe-Power
it turns out that almost all of other stack instructions have
dataSize=ssz.
Post by Jason Lowe-Power
So, I added both dataSize=ssz and addressSize=ssz to the call
instruction,
Post by Jason Lowe-Power
though I think only the addressSize is actually needed, but I'm not certain.
Now, the address is passed to the Request object correctly, but the
program
Post by Jason Lowe-Power
still fails. The problem now is that the request object is getting
the AddrSizeFlagBit set to true, because machInst.legacy.addr is true.
Thus, the TLB only uses the lower 32 bits of the 64-bit address.
Any ideas on how to change the instruction's memFlags from the macro-op
definition? They are set on
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l334
.
It would be nice if we could fix this in the decoder. I think the logic
should be "if the address prefix is set and this is an implicit stack
operation, ignore the address prefix". However, I'm not sure how to
tell
Post by Steve Reinhardt
if
Post by Jason Lowe-Power
the instruction is "an implicit stack operation" from the decoder.
Thanks,
Jason
My recollection of how all this works is that the arguments to the 'st'
micro-op get turned into arguments to a call to the StoreOp
597 <http://repo.gem5.o, and that address doesn't
existrg/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/
microops/ldstop.isa#l597
Post by Steve Reinhardt
Post by Jason Lowe-Power
<http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l597>
598 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l598
def __init__(self, data, segment, addr, disp = 0,
599 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l599
dataSize="env.dataSize",
600 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l600
addressSize="env.addressSize",
601 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l601
so the "-env.dataSize" you see is actually the displacement for the
store,
Post by Jason Lowe-Power
not the dataSize or addressSize. I think the problem is that the
addressSize is using the env,addressSize that's calculated the "normal"
way, i.e., including the effects of the override prefix.
Poking around some more, there's an 'ssz' symbol that maps to
env.stackSize
Post by Jason Lowe-Power
[1] which gets used a lot in stack operations; for example, right after
the
Post by Jason Lowe-Power
store in CALL_NEAR_I, 'ssz' is subtracted off of the stack pointer. The
calculation of env.stackSize seems to follow the rule you mentioned
about
Post by Steve Reinhardt
Post by Jason Lowe-Power
being fixed at 64 bits in 64-bit mode [2, 3].
So it might be sufficient to add ', addressSize=ssz' to the end of the
'st'
Post by Jason Lowe-Power
micro-op. Oddly though I don't see addressSize being set on any other
stack
Post by Jason Lowe-Power
ops (just dataSize), so I wonder if this is a problem with other stack
instructions or not. If so, it might be useful to have an override
hardwired in at some lower level to check if the segment is ss and
force
Post by Steve Reinhardt
Post by Jason Lowe-Power
the address size to be stackSize (rather than adding this extra
parameter
Post by Steve Reinhardt
Post by Jason Lowe-Power
in a dozen different places). I wouldn't know where best to do that
though,
Post by Jason Lowe-Power
so the manual override seems best for now.
Steve
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/isa/microasm.isa#107
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/isa.cc#91
[3] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#400
Post by Jason Lowe-Power
To those of you with more x86 ISA implementation knowledge than I
I've been working through a bug one of our users found (thanks
Sanchayan!).
Post by Jason Lowe-Power
It looks like current versions of ld use the 0x67 instruction prefix
(address size override) as an optimization instead of using a nop.
See
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
https://www.sourceware.org/ml/binutils/2016-05/msg00323.html.
This causes the call instruction to be decoded with with the "address
size
Post by Jason Lowe-Power
override prefix", which is correct, in a sense. From what I can tell,
this
Post by Jason Lowe-Power
is passed to the call instruction via "-env.dataSize" (see call
instruction
Post by Jason Lowe-Power
implementation below).
def macroop CALL_NEAR_I
{
# Make the default data size of calls 64 bits in 64 bit mode
.adjust_env oszIn64Override
.function_call
limm t1, imm
rdip t7
# Check target of call
st t7, ss, [0, t0, rsp], "-env.dataSize"
subi rsp, rsp, ssz
wrip t7, t1
};
Now, the bug is, according to the x86 manual, "For instructions that
implicitly access the stack segment (SS), the address size for stack
accesses is determined by the D (default) bit in the stack-segment
descriptor. In 64-bit mode, the D bit is ignored, and all stack
references
Post by Jason Lowe-Power
have a 64-bit address size." See
https://support.amd.com/TechDocs/24594.pdf page
9.
Thoughts on how to fix this?
Thanks,
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Gabe Black
2017-01-25 08:41:34 UTC
Permalink
Forgive me if I'm repeating a bit of what you said, Steve, but I looked a
little more and there's more to fix.

Also, if you look at x86/isa/microops/ldstop.isa, you'll find the python
classes which receive the parameters of the load and store microops and
translate them into C++ instantiations of the underlying microop classes.
In their __init__ methods, you'll see at the end where the AddrSizeFlagBit
is set based on whether or not the machInst.legacy.addr bit is set,
basically whether the address size prefix had been found when the
instruction was decoded. For the two classes involved (LdStOp and
BigLdStOp), I think you'll want to add a check which doesn't set
AddrSizeFlagBit if the segment is the stack segment.

Additionally, you'll want to make sure the TLB knows to use the stack
segment size (as opposed to either the default or alternate address sizes)
when manipulating the segment offset. You could add a new ISA specific flag
for x86 memory requests (there seems to be room for exactly one more in
arch/x86/ldstflags.hh) and set it if the segment is the stack in those ld
and st ops, similar to how AddrSizeFlagBit is being set. Then in the TLB,
specifically where it's calculating logSize, you'll want to make it
recognize your new flag and use the stack size from the m5Reg (called just
m5Reg.stack, I think) instead of either of the address size values.

This is a little gross because it means there's some calculation going on
when address translation is being done (frequently) when it could have been
done earlier during decode (less frequently). One reason I did things this
way (all speculation, I've long since forgot) is that it makes the decoded
instructions hold less state. The same bytes may not always refer to the
same sizes depending on the segmentation state in the machine, so if those
particular bytes are being decoded, the gem5 decode cache needs to either
consider that segmentation state when deciding what instruction to spit
out. If it didn't, it might return an instruction which statically says to
use 32 bit addresses (for instance), when now segmentation state says it
the same bytes should mean it should use 16 bit addresses. Instead, it canh
be factored in dynamically using external state (the m5Reg) to an
instruction that says which version of the sizes to use but doesn't care
what values those actually map to at that moment.

Gabe
Post by Gabe Black
Hi. I think you guys are generally interpreting this code correctly, but
http://www.felixcloutier.com/x86/CALL.html
then it appears that the size of the return address you push onto the
stack should be based on the operand size, ie. the data size. That would
mean that the displacement for the store should be -env.dataSize like it
is. By the same logic, the subtraction below it should also be env.dataSize
(symbolically dsz) and not ssz.
You are also correct, I think, that since the instruction references the
stack segment, it's address size should be set to the stack size when doing
memory operations. Adding addressSize=ssz to the st microop should do that.
According to that pseudo code, the size of the target IP is also the
operand size, but that's the size the wrip microop will use by default and
so should be fine as written.
I think if you make those two changes (change ssz to dsz in the sub, add
addressSize=ssz to the st), then this macroop definition will be correct.
It would be worthwhile to check the other definitions of CALL and make sure
they aren't similarly broken.
Gabe
Post by Jason Lowe-Power
Thanks for helping me work this out.
I got the binary working by completely removing the AddrSizeFlagBit. The
only place this bit is used is in the LdStOp (it's set to true if
legacy.addr is true) and then, in the TLB, if the AddrSizeFlagBit is set it
truncates the address to 32-bits removing the upper-order bits.
By removing this flag bit, and adding "addressSize=ssz" to the st micro-op,
the binary is now working.
My next question: How can I have any confidence that this doesn't break
other things? Especially when we're talking about 32-bit compatibility
mode. Other than the hello32 binary, are there other things I should run?
My intuition says this is an OK change. The addr field in the Request
should never have the upper-order 32 bits set if the instruction is a
32-bit-mode instruction. The instruction implementation already takes care
of this, as I found with this bug.
Other thoughts? Does anyone know if it is definitely required to use
AddrSizeFlagBit to truncate the address in the TLB?
If not, I'll post a patch for this tomorrow.
Thanks,
Jason
Post by Steve Reinhardt
Hmm, seems like there's a little bit of an inconsistency in that the
request is using the legacy.addr bit (which is set by the decoder when
it
Post by Steve Reinhardt
sees the address size override prefix [1]) directly, while the
legacy.addr
Post by Steve Reinhardt
bit is also used to calculate the addressSize value [2] but can be
overridden (as we are doing). So if addressSize is overridden then
AddrSizeFlagBit should no longer be set based on legacy.addr.
Or another way of looking at it is that the same process of checking
legacy.addr to override the address size is done in two places, once in
the
Post by Steve Reinhardt
decoder [2] and again in the TLB [3] and it's not clear how to suppress
the
Post by Steve Reinhardt
latter one.
I suppose one gnarly way of doing it would be to go into the micro-op
definition somewhere and clear the AddrSizeFlagBit out of memFlags if
addressSize != env.addressSize (i.e., the address size was overridden).
There's probably a cleaner way, but that's the easiest path I can think
of
Post by Steve Reinhardt
(if it even works).
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#195
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#390
[3] http://grok.gem5.org/source/xref/gem5/src/arch/x86/tlb.cc#315
Post by Jason Lowe-Power
Hi Steve,
That was super helpful. I'm now a step closer to solving this!
Your suggestion of =ssz, lead me to search for the uses of that in
x86,
Post by Steve Reinhardt
and
Post by Jason Lowe-Power
it turns out that almost all of other stack instructions have
dataSize=ssz.
Post by Jason Lowe-Power
So, I added both dataSize=ssz and addressSize=ssz to the call
instruction,
Post by Jason Lowe-Power
though I think only the addressSize is actually needed, but I'm not certain.
Now, the address is passed to the Request object correctly, but the
program
Post by Jason Lowe-Power
still fails. The problem now is that the request object is getting
the AddrSizeFlagBit set to true, because machInst.legacy.addr is true.
Thus, the TLB only uses the lower 32 bits of the 64-bit address.
Any ideas on how to change the instruction's memFlags from the
macro-op
Post by Steve Reinhardt
Post by Jason Lowe-Power
definition? They are set on
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l334
.
It would be nice if we could fix this in the decoder. I think the
logic
Post by Steve Reinhardt
Post by Jason Lowe-Power
should be "if the address prefix is set and this is an implicit stack
operation, ignore the address prefix". However, I'm not sure how to
tell
Post by Steve Reinhardt
if
Post by Jason Lowe-Power
the instruction is "an implicit stack operation" from the decoder.
Thanks,
Jason
My recollection of how all this works is that the arguments to the
'st'
Post by Steve Reinhardt
Post by Jason Lowe-Power
micro-op get turned into arguments to a call to the StoreOp
597 <http://repo.gem5.o, and that address doesn't
existrg/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/
ldstop.isa#l597
Post by Steve Reinhardt
Post by Jason Lowe-Power
<http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l597>
598 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l598
def __init__(self, data, segment, addr, disp = 0,
599 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l599
dataSize="env.dataSize",
600 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l600
addressSize="env.addressSize",
601 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l601
so the "-env.dataSize" you see is actually the displacement for the
store,
Post by Jason Lowe-Power
not the dataSize or addressSize. I think the problem is that the
addressSize is using the env,addressSize that's calculated the
"normal"
Post by Steve Reinhardt
Post by Jason Lowe-Power
way, i.e., including the effects of the override prefix.
Poking around some more, there's an 'ssz' symbol that maps to
env.stackSize
Post by Jason Lowe-Power
[1] which gets used a lot in stack operations; for example, right
after
Post by Steve Reinhardt
the
Post by Jason Lowe-Power
store in CALL_NEAR_I, 'ssz' is subtracted off of the stack pointer.
The
Post by Steve Reinhardt
Post by Jason Lowe-Power
calculation of env.stackSize seems to follow the rule you mentioned
about
Post by Steve Reinhardt
Post by Jason Lowe-Power
being fixed at 64 bits in 64-bit mode [2, 3].
So it might be sufficient to add ', addressSize=ssz' to the end of the
'st'
Post by Jason Lowe-Power
micro-op. Oddly though I don't see addressSize being set on any other
stack
Post by Jason Lowe-Power
ops (just dataSize), so I wonder if this is a problem with other stack
instructions or not. If so, it might be useful to have an override
hardwired in at some lower level to check if the segment is ss and
force
Post by Steve Reinhardt
Post by Jason Lowe-Power
the address size to be stackSize (rather than adding this extra
parameter
Post by Steve Reinhardt
Post by Jason Lowe-Power
in a dozen different places). I wouldn't know where best to do that
though,
Post by Jason Lowe-Power
so the manual override seems best for now.
Steve
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/isa/microasm.isa#107
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/isa.cc#91
[3] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#400
On Mon, Jan 23, 2017 at 4:04 PM, Jason Lowe-Power <
Post by Jason Lowe-Power
To those of you with more x86 ISA implementation knowledge than I
I've been working through a bug one of our users found (thanks
Sanchayan!).
Post by Jason Lowe-Power
It looks like current versions of ld use the 0x67 instruction prefix
(address size override) as an optimization instead of using a nop.
See
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
https://www.sourceware.org/ml/binutils/2016-05/msg00323.html.
This causes the call instruction to be decoded with with the
"address
Post by Steve Reinhardt
Post by Jason Lowe-Power
size
Post by Jason Lowe-Power
override prefix", which is correct, in a sense. From what I can
tell,
Post by Steve Reinhardt
Post by Jason Lowe-Power
this
Post by Jason Lowe-Power
is passed to the call instruction via "-env.dataSize" (see call
instruction
Post by Jason Lowe-Power
implementation below).
def macroop CALL_NEAR_I
{
# Make the default data size of calls 64 bits in 64 bit mode
.adjust_env oszIn64Override
.function_call
limm t1, imm
rdip t7
# Check target of call
st t7, ss, [0, t0, rsp], "-env.dataSize"
subi rsp, rsp, ssz
wrip t7, t1
};
Now, the bug is, according to the x86 manual, "For instructions that
implicitly access the stack segment (SS), the address size for stack
accesses is determined by the D (default) bit in the stack-segment
descriptor. In 64-bit mode, the D bit is ignored, and all stack
references
Post by Jason Lowe-Power
have a 64-bit address size." See
https://support.amd.com/TechDocs/24594.pdf page
9.
Thoughts on how to fix this?
Thanks,
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Steve Reinhardt
2017-01-25 16:37:26 UTC
Permalink
Thanks, Gabe! That's all really helpful. I appreciate you taking the time
to respond.

One concern I have is that the portion of the manual Jason quoted says the
rule holds "For instructions that
*implicitly* access the stack segment (SS)" (emphasis mine)... would you
say then that instructions that *explicitly* choose SS as the segment
(e.g., with a segment override prefix) would not follow this rule? If so,
then it may not be correct to do things inside the instruction based solely
on the fact that segment==SS. Or maybe that's a case that just can't occur
so we don't have to worry about it?

Also, would there be any harm in setting the AddrSizeFlagBit based on
whether addressSize==32 rather than looking at the legacy.addr bit? If not,
this seems like a simpler solution to me.

Thanks again!

Steve
Post by Gabe Black
Forgive me if I'm repeating a bit of what you said, Steve, but I looked a
little more and there's more to fix.
Also, if you look at x86/isa/microops/ldstop.isa, you'll find the python
classes which receive the parameters of the load and store microops and
translate them into C++ instantiations of the underlying microop classes.
In their __init__ methods, you'll see at the end where the AddrSizeFlagBit
is set based on whether or not the machInst.legacy.addr bit is set,
basically whether the address size prefix had been found when the
instruction was decoded. For the two classes involved (LdStOp and
BigLdStOp), I think you'll want to add a check which doesn't set
AddrSizeFlagBit if the segment is the stack segment.
Additionally, you'll want to make sure the TLB knows to use the stack
segment size (as opposed to either the default or alternate address sizes)
when manipulating the segment offset. You could add a new ISA specific flag
for x86 memory requests (there seems to be room for exactly one more in
arch/x86/ldstflags.hh) and set it if the segment is the stack in those ld
and st ops, similar to how AddrSizeFlagBit is being set. Then in the TLB,
specifically where it's calculating logSize, you'll want to make it
recognize your new flag and use the stack size from the m5Reg (called just
m5Reg.stack, I think) instead of either of the address size values.
This is a little gross because it means there's some calculation going on
when address translation is being done (frequently) when it could have been
done earlier during decode (less frequently). One reason I did things this
way (all speculation, I've long since forgot) is that it makes the decoded
instructions hold less state. The same bytes may not always refer to the
same sizes depending on the segmentation state in the machine, so if those
particular bytes are being decoded, the gem5 decode cache needs to either
consider that segmentation state when deciding what instruction to spit
out. If it didn't, it might return an instruction which statically says to
use 32 bit addresses (for instance), when now segmentation state says it
the same bytes should mean it should use 16 bit addresses. Instead, it canh
be factored in dynamically using external state (the m5Reg) to an
instruction that says which version of the sizes to use but doesn't care
what values those actually map to at that moment.
Gabe
Post by Gabe Black
Hi. I think you guys are generally interpreting this code correctly, but
http://www.felixcloutier.com/x86/CALL.html
then it appears that the size of the return address you push onto the
stack should be based on the operand size, ie. the data size. That would
mean that the displacement for the store should be -env.dataSize like it
is. By the same logic, the subtraction below it should also be
env.dataSize
Post by Gabe Black
(symbolically dsz) and not ssz.
You are also correct, I think, that since the instruction references the
stack segment, it's address size should be set to the stack size when
doing
Post by Gabe Black
memory operations. Adding addressSize=ssz to the st microop should do
that.
Post by Gabe Black
According to that pseudo code, the size of the target IP is also the
operand size, but that's the size the wrip microop will use by default
and
Post by Gabe Black
so should be fine as written.
I think if you make those two changes (change ssz to dsz in the sub, add
addressSize=ssz to the st), then this macroop definition will be correct.
It would be worthwhile to check the other definitions of CALL and make
sure
Post by Gabe Black
they aren't similarly broken.
Gabe
Post by Jason Lowe-Power
Thanks for helping me work this out.
I got the binary working by completely removing the AddrSizeFlagBit. The
only place this bit is used is in the LdStOp (it's set to true if
legacy.addr is true) and then, in the TLB, if the AddrSizeFlagBit is set it
truncates the address to 32-bits removing the upper-order bits.
By removing this flag bit, and adding "addressSize=ssz" to the st micro-op,
the binary is now working.
My next question: How can I have any confidence that this doesn't break
other things? Especially when we're talking about 32-bit compatibility
mode. Other than the hello32 binary, are there other things I should
run?
Post by Gabe Black
Post by Jason Lowe-Power
My intuition says this is an OK change. The addr field in the Request
should never have the upper-order 32 bits set if the instruction is a
32-bit-mode instruction. The instruction implementation already takes
care
Post by Gabe Black
Post by Jason Lowe-Power
of this, as I found with this bug.
Other thoughts? Does anyone know if it is definitely required to use
AddrSizeFlagBit to truncate the address in the TLB?
If not, I'll post a patch for this tomorrow.
Thanks,
Jason
Post by Steve Reinhardt
Hmm, seems like there's a little bit of an inconsistency in that the
request is using the legacy.addr bit (which is set by the decoder when
it
Post by Steve Reinhardt
sees the address size override prefix [1]) directly, while the
legacy.addr
Post by Steve Reinhardt
bit is also used to calculate the addressSize value [2] but can be
overridden (as we are doing). So if addressSize is overridden then
AddrSizeFlagBit should no longer be set based on legacy.addr.
Or another way of looking at it is that the same process of checking
legacy.addr to override the address size is done in two places, once
in
Post by Gabe Black
Post by Jason Lowe-Power
the
Post by Steve Reinhardt
decoder [2] and again in the TLB [3] and it's not clear how to
suppress
Post by Gabe Black
Post by Jason Lowe-Power
the
Post by Steve Reinhardt
latter one.
I suppose one gnarly way of doing it would be to go into the micro-op
definition somewhere and clear the AddrSizeFlagBit out of memFlags if
addressSize != env.addressSize (i.e., the address size was
overridden).
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
There's probably a cleaner way, but that's the easiest path I can
think
Post by Gabe Black
Post by Jason Lowe-Power
of
Post by Steve Reinhardt
(if it even works).
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#195
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#390
[3] http://grok.gem5.org/source/xref/gem5/src/arch/x86/tlb.cc#315
On Tue, Jan 24, 2017 at 8:37 AM, Jason Lowe-Power <
Post by Jason Lowe-Power
Hi Steve,
That was super helpful. I'm now a step closer to solving this!
Your suggestion of =ssz, lead me to search for the uses of that in
x86,
Post by Steve Reinhardt
and
Post by Jason Lowe-Power
it turns out that almost all of other stack instructions have
dataSize=ssz.
Post by Jason Lowe-Power
So, I added both dataSize=ssz and addressSize=ssz to the call
instruction,
Post by Jason Lowe-Power
though I think only the addressSize is actually needed, but I'm not certain.
Now, the address is passed to the Request object correctly, but the
program
Post by Jason Lowe-Power
still fails. The problem now is that the request object is getting
the AddrSizeFlagBit set to true, because machInst.legacy.addr is
true.
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Thus, the TLB only uses the lower 32 bits of the 64-bit address.
Any ideas on how to change the instruction's memFlags from the
macro-op
Post by Steve Reinhardt
Post by Jason Lowe-Power
definition? They are set on
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l334
.
It would be nice if we could fix this in the decoder. I think the
logic
Post by Steve Reinhardt
Post by Jason Lowe-Power
should be "if the address prefix is set and this is an implicit
stack
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
operation, ignore the address prefix". However, I'm not sure how to
tell
Post by Steve Reinhardt
if
Post by Jason Lowe-Power
the instruction is "an implicit stack operation" from the decoder.
Thanks,
Jason
My recollection of how all this works is that the arguments to the
'st'
Post by Steve Reinhardt
Post by Jason Lowe-Power
micro-op get turned into arguments to a call to the StoreOp
597 <http://repo.gem5.o, and that address doesn't
existrg/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/
ldstop.isa#l597
Post by Steve Reinhardt
Post by Jason Lowe-Power
<http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l597>
598 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l598
def __init__(self, data, segment, addr, disp = 0,
599 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l599
dataSize="env.dataSize",
600 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l600
addressSize="env.addressSize",
601 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l601
so the "-env.dataSize" you see is actually the displacement for the
store,
Post by Jason Lowe-Power
not the dataSize or addressSize. I think the problem is that the
addressSize is using the env,addressSize that's calculated the
"normal"
Post by Steve Reinhardt
Post by Jason Lowe-Power
way, i.e., including the effects of the override prefix.
Poking around some more, there's an 'ssz' symbol that maps to
env.stackSize
Post by Jason Lowe-Power
[1] which gets used a lot in stack operations; for example, right
after
Post by Steve Reinhardt
the
Post by Jason Lowe-Power
store in CALL_NEAR_I, 'ssz' is subtracted off of the stack pointer.
The
Post by Steve Reinhardt
Post by Jason Lowe-Power
calculation of env.stackSize seems to follow the rule you mentioned
about
Post by Steve Reinhardt
Post by Jason Lowe-Power
being fixed at 64 bits in 64-bit mode [2, 3].
So it might be sufficient to add ', addressSize=ssz' to the end of
the
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
'st'
Post by Jason Lowe-Power
micro-op. Oddly though I don't see addressSize being set on any
other
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
stack
Post by Jason Lowe-Power
ops (just dataSize), so I wonder if this is a problem with other
stack
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
instructions or not. If so, it might be useful to have an override
hardwired in at some lower level to check if the segment is ss and
force
Post by Steve Reinhardt
Post by Jason Lowe-Power
the address size to be stackSize (rather than adding this extra
parameter
Post by Steve Reinhardt
Post by Jason Lowe-Power
in a dozen different places). I wouldn't know where best to do that
though,
Post by Jason Lowe-Power
so the manual override seems best for now.
Steve
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/isa/
microasm.isa#107
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/isa.cc#91
[3] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#400
On Mon, Jan 23, 2017 at 4:04 PM, Jason Lowe-Power <
Post by Jason Lowe-Power
To those of you with more x86 ISA implementation knowledge than I
I've been working through a bug one of our users found (thanks
Sanchayan!).
Post by Jason Lowe-Power
It looks like current versions of ld use the 0x67 instruction
prefix
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
(address size override) as an optimization instead of using a nop.
See
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
https://www.sourceware.org/ml/binutils/2016-05/msg00323.html.
This causes the call instruction to be decoded with with the
"address
Post by Steve Reinhardt
Post by Jason Lowe-Power
size
Post by Jason Lowe-Power
override prefix", which is correct, in a sense. From what I can
tell,
Post by Steve Reinhardt
Post by Jason Lowe-Power
this
Post by Jason Lowe-Power
is passed to the call instruction via "-env.dataSize" (see call
instruction
Post by Jason Lowe-Power
implementation below).
def macroop CALL_NEAR_I
{
# Make the default data size of calls 64 bits in 64 bit mode
.adjust_env oszIn64Override
.function_call
limm t1, imm
rdip t7
# Check target of call
st t7, ss, [0, t0, rsp], "-env.dataSize"
subi rsp, rsp, ssz
wrip t7, t1
};
Now, the bug is, according to the x86 manual, "For instructions
that
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
implicitly access the stack segment (SS), the address size for
stack
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
accesses is determined by the D (default) bit in the stack-segment
descriptor. In 64-bit mode, the D bit is ignored, and all stack
references
Post by Jason Lowe-Power
have a 64-bit address size." See
https://support.amd.com/TechDocs/24594.pdf page
9.
Thoughts on how to fix this?
Thanks,
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Jason Lowe-Power
2017-01-25 21:28:29 UTC
Permalink
I'm leaning towards deleting the AddrSizeFlagBit. Gabe, can you shed some
light on the purpose of this flag? It's only ever used for masking
addresses in the TLB:
http://grok.gem5.org/xref/gem5/src/arch/x86/tlb.cc#318
http://grok.gem5.org/xref/gem5/src/arch/x86/tlb.cc#330

From what I can understand, it should be unnecessary to mask the address
here. In the instruction implementation, the address is already masked if
it is a 32-bit mode instruction:
http://grok.gem5.org/xref/gem5/src/arch/x86/isa/microops/ldstop.isa#433

It seems straightforward to add addressSize=ssz to the st instructions
where necessary. I'm confused as to why the TLB ever needs to know what the
address mode of the instruction is. Why/how would the TLB ever receive a
request with an address > 4GB for a 32-bit mode instruction?

Thanks again for the help!

Cheers,
Jason
Post by Steve Reinhardt
Thanks, Gabe! That's all really helpful. I appreciate you taking the time
to respond.
One concern I have is that the portion of the manual Jason quoted says the
rule holds "For instructions that
*implicitly* access the stack segment (SS)" (emphasis mine)... would you
say then that instructions that *explicitly* choose SS as the segment
(e.g., with a segment override prefix) would not follow this rule? If so,
then it may not be correct to do things inside the instruction based solely
on the fact that segment==SS. Or maybe that's a case that just can't occur
so we don't have to worry about it?
Also, would there be any harm in setting the AddrSizeFlagBit based on
whether addressSize==32 rather than looking at the legacy.addr bit? If not,
this seems like a simpler solution to me.
Thanks again!
Steve
Post by Gabe Black
Forgive me if I'm repeating a bit of what you said, Steve, but I looked a
little more and there's more to fix.
Also, if you look at x86/isa/microops/ldstop.isa, you'll find the python
classes which receive the parameters of the load and store microops and
translate them into C++ instantiations of the underlying microop classes.
In their __init__ methods, you'll see at the end where the
AddrSizeFlagBit
Post by Gabe Black
is set based on whether or not the machInst.legacy.addr bit is set,
basically whether the address size prefix had been found when the
instruction was decoded. For the two classes involved (LdStOp and
BigLdStOp), I think you'll want to add a check which doesn't set
AddrSizeFlagBit if the segment is the stack segment.
Additionally, you'll want to make sure the TLB knows to use the stack
segment size (as opposed to either the default or alternate address
sizes)
Post by Gabe Black
when manipulating the segment offset. You could add a new ISA specific
flag
Post by Gabe Black
for x86 memory requests (there seems to be room for exactly one more in
arch/x86/ldstflags.hh) and set it if the segment is the stack in those ld
and st ops, similar to how AddrSizeFlagBit is being set. Then in the TLB,
specifically where it's calculating logSize, you'll want to make it
recognize your new flag and use the stack size from the m5Reg (called
just
Post by Gabe Black
m5Reg.stack, I think) instead of either of the address size values.
This is a little gross because it means there's some calculation going on
when address translation is being done (frequently) when it could have
been
Post by Gabe Black
done earlier during decode (less frequently). One reason I did things
this
Post by Gabe Black
way (all speculation, I've long since forgot) is that it makes the
decoded
Post by Gabe Black
instructions hold less state. The same bytes may not always refer to the
same sizes depending on the segmentation state in the machine, so if
those
Post by Gabe Black
particular bytes are being decoded, the gem5 decode cache needs to either
consider that segmentation state when deciding what instruction to spit
out. If it didn't, it might return an instruction which statically says
to
Post by Gabe Black
use 32 bit addresses (for instance), when now segmentation state says it
the same bytes should mean it should use 16 bit addresses. Instead, it
canh
Post by Gabe Black
be factored in dynamically using external state (the m5Reg) to an
instruction that says which version of the sizes to use but doesn't care
what values those actually map to at that moment.
Gabe
Post by Gabe Black
Hi. I think you guys are generally interpreting this code correctly,
but
Post by Gabe Black
Post by Gabe Black
http://www.felixcloutier.com/x86/CALL.html
then it appears that the size of the return address you push onto the
stack should be based on the operand size, ie. the data size. That
would
Post by Gabe Black
Post by Gabe Black
mean that the displacement for the store should be -env.dataSize like
it
Post by Gabe Black
Post by Gabe Black
is. By the same logic, the subtraction below it should also be
env.dataSize
Post by Gabe Black
(symbolically dsz) and not ssz.
You are also correct, I think, that since the instruction references
the
Post by Gabe Black
Post by Gabe Black
stack segment, it's address size should be set to the stack size when
doing
Post by Gabe Black
memory operations. Adding addressSize=ssz to the st microop should do
that.
Post by Gabe Black
According to that pseudo code, the size of the target IP is also the
operand size, but that's the size the wrip microop will use by default
and
Post by Gabe Black
so should be fine as written.
I think if you make those two changes (change ssz to dsz in the sub,
add
Post by Gabe Black
Post by Gabe Black
addressSize=ssz to the st), then this macroop definition will be
correct.
Post by Gabe Black
Post by Gabe Black
It would be worthwhile to check the other definitions of CALL and make
sure
Post by Gabe Black
they aren't similarly broken.
Gabe
Post by Jason Lowe-Power
Thanks for helping me work this out.
I got the binary working by completely removing the AddrSizeFlagBit.
The
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
only place this bit is used is in the LdStOp (it's set to true if
legacy.addr is true) and then, in the TLB, if the AddrSizeFlagBit is
set
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
it
truncates the address to 32-bits removing the upper-order bits.
By removing this flag bit, and adding "addressSize=ssz" to the st micro-op,
the binary is now working.
My next question: How can I have any confidence that this doesn't
break
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
other things? Especially when we're talking about 32-bit compatibility
mode. Other than the hello32 binary, are there other things I should
run?
Post by Gabe Black
Post by Jason Lowe-Power
My intuition says this is an OK change. The addr field in the Request
should never have the upper-order 32 bits set if the instruction is a
32-bit-mode instruction. The instruction implementation already takes
care
Post by Gabe Black
Post by Jason Lowe-Power
of this, as I found with this bug.
Other thoughts? Does anyone know if it is definitely required to use
AddrSizeFlagBit to truncate the address in the TLB?
If not, I'll post a patch for this tomorrow.
Thanks,
Jason
Post by Steve Reinhardt
Hmm, seems like there's a little bit of an inconsistency in that the
request is using the legacy.addr bit (which is set by the decoder
when
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
it
Post by Steve Reinhardt
sees the address size override prefix [1]) directly, while the
legacy.addr
Post by Steve Reinhardt
bit is also used to calculate the addressSize value [2] but can be
overridden (as we are doing). So if addressSize is overridden then
AddrSizeFlagBit should no longer be set based on legacy.addr.
Or another way of looking at it is that the same process of checking
legacy.addr to override the address size is done in two places, once
in
Post by Gabe Black
Post by Jason Lowe-Power
the
Post by Steve Reinhardt
decoder [2] and again in the TLB [3] and it's not clear how to
suppress
Post by Gabe Black
Post by Jason Lowe-Power
the
Post by Steve Reinhardt
latter one.
I suppose one gnarly way of doing it would be to go into the
micro-op
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
definition somewhere and clear the AddrSizeFlagBit out of memFlags
if
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
addressSize != env.addressSize (i.e., the address size was
overridden).
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
There's probably a cleaner way, but that's the easiest path I can
think
Post by Gabe Black
Post by Jason Lowe-Power
of
Post by Steve Reinhardt
(if it even works).
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#195
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#390
[3] http://grok.gem5.org/source/xref/gem5/src/arch/x86/tlb.cc#315
On Tue, Jan 24, 2017 at 8:37 AM, Jason Lowe-Power <
Post by Jason Lowe-Power
Hi Steve,
That was super helpful. I'm now a step closer to solving this!
Your suggestion of =ssz, lead me to search for the uses of that in
x86,
Post by Steve Reinhardt
and
Post by Jason Lowe-Power
it turns out that almost all of other stack instructions have
dataSize=ssz.
Post by Jason Lowe-Power
So, I added both dataSize=ssz and addressSize=ssz to the call
instruction,
Post by Jason Lowe-Power
though I think only the addressSize is actually needed, but I'm
not
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
certain.
Now, the address is passed to the Request object correctly, but
the
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
program
Post by Jason Lowe-Power
still fails. The problem now is that the request object is getting
the AddrSizeFlagBit set to true, because machInst.legacy.addr is
true.
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Thus, the TLB only uses the lower 32 bits of the 64-bit address.
Any ideas on how to change the instruction's memFlags from the
macro-op
Post by Steve Reinhardt
Post by Jason Lowe-Power
definition? They are set on
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l334
.
It would be nice if we could fix this in the decoder. I think the
logic
Post by Steve Reinhardt
Post by Jason Lowe-Power
should be "if the address prefix is set and this is an implicit
stack
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
operation, ignore the address prefix". However, I'm not sure how
to
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
tell
Post by Steve Reinhardt
if
Post by Jason Lowe-Power
the instruction is "an implicit stack operation" from the decoder.
Thanks,
Jason
My recollection of how all this works is that the arguments to the
'st'
Post by Steve Reinhardt
Post by Jason Lowe-Power
micro-op get turned into arguments to a call to the StoreOp
597 <http://repo.gem5.o, and that address doesn't
existrg/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/
ldstop.isa#l597
Post by Steve Reinhardt
Post by Jason Lowe-Power
<http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l597>
598 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l598
def __init__(self, data, segment, addr, disp = 0,
599 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l599
dataSize="env.dataSize",
600 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l600
addressSize="env.addressSize",
601 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l601
so the "-env.dataSize" you see is actually the displacement for
the
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
store,
Post by Jason Lowe-Power
not the dataSize or addressSize. I think the problem is that the
addressSize is using the env,addressSize that's calculated the
"normal"
Post by Steve Reinhardt
Post by Jason Lowe-Power
way, i.e., including the effects of the override prefix.
Poking around some more, there's an 'ssz' symbol that maps to
env.stackSize
Post by Jason Lowe-Power
[1] which gets used a lot in stack operations; for example, right
after
Post by Steve Reinhardt
the
Post by Jason Lowe-Power
store in CALL_NEAR_I, 'ssz' is subtracted off of the stack
pointer.
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
The
Post by Steve Reinhardt
Post by Jason Lowe-Power
calculation of env.stackSize seems to follow the rule you
mentioned
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
about
Post by Steve Reinhardt
Post by Jason Lowe-Power
being fixed at 64 bits in 64-bit mode [2, 3].
So it might be sufficient to add ', addressSize=ssz' to the end of
the
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
'st'
Post by Jason Lowe-Power
micro-op. Oddly though I don't see addressSize being set on any
other
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
stack
Post by Jason Lowe-Power
ops (just dataSize), so I wonder if this is a problem with other
stack
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
instructions or not. If so, it might be useful to have an override
hardwired in at some lower level to check if the segment is ss and
force
Post by Steve Reinhardt
Post by Jason Lowe-Power
the address size to be stackSize (rather than adding this extra
parameter
Post by Steve Reinhardt
Post by Jason Lowe-Power
in a dozen different places). I wouldn't know where best to do
that
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
though,
Post by Jason Lowe-Power
so the manual override seems best for now.
Steve
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/isa/
microasm.isa#107
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/isa.cc#91
[3] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#400
On Mon, Jan 23, 2017 at 4:04 PM, Jason Lowe-Power <
Post by Jason Lowe-Power
To those of you with more x86 ISA implementation knowledge than
I
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
I've been working through a bug one of our users found (thanks
Sanchayan!).
Post by Jason Lowe-Power
It looks like current versions of ld use the 0x67 instruction
prefix
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
(address size override) as an optimization instead of using a
nop.
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
See
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
https://www.sourceware.org/ml/binutils/2016-05/msg00323.html.
This causes the call instruction to be decoded with with the
"address
Post by Steve Reinhardt
Post by Jason Lowe-Power
size
Post by Jason Lowe-Power
override prefix", which is correct, in a sense. From what I can
tell,
Post by Steve Reinhardt
Post by Jason Lowe-Power
this
Post by Jason Lowe-Power
is passed to the call instruction via "-env.dataSize" (see call
instruction
Post by Jason Lowe-Power
implementation below).
def macroop CALL_NEAR_I
{
# Make the default data size of calls 64 bits in 64 bit mode
.adjust_env oszIn64Override
.function_call
limm t1, imm
rdip t7
# Check target of call
st t7, ss, [0, t0, rsp], "-env.dataSize"
subi rsp, rsp, ssz
wrip t7, t1
};
Now, the bug is, according to the x86 manual, "For instructions
that
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
implicitly access the stack segment (SS), the address size for
stack
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
accesses is determined by the D (default) bit in the
stack-segment
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
descriptor. In 64-bit mode, the D bit is ignored, and all stack
references
Post by Jason Lowe-Power
have a 64-bit address size." See
https://support.amd.com/TechDocs/24594.pdf page
9.
Thoughts on how to fix this?
Thanks,
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Gabe Black
2017-01-26 01:55:08 UTC
Permalink
As far as explicitly vs. implicitly using the stack segment, I wasn't able
to find a very conclusive answer while looking through the manuals. This
would be something to check by running a little test program on an actual
computer, although I suspect implicit vs. explicit does matter.

One interesting thing I found while looking for that which might be worth
looking into is that instructions which reference rsp or rbp also
implicitly use the stack segment. I remember knowing that, but I don't
remember whether or not gem5 handles that properly.

Unfortunately, we can't really keep track of the effective address size in
the instruction, just whether we should be using the default size or the
alternative size. This is because whether or not we're using the default is
fixed based on the instruction encoding, while how that maps to the actual
address size is based on that and also segmentation state. The decode cache
doesn't (as far as I remember) keep track of that state, so it would
potentially return an instruction which was set up based on now faulty
assumptions.

As far as whether or not AddrSizeFlagBit can be removed entirely,
unfortunately it can't. The problem ultimately stems from the fact that
gem5 assumes that virtual to physical mappings don't change the page
alignment of a memory address. That means that the address that's presented
to the CPU as a virtual address needs to have its potentially non-page
aligned segment base value added into it already, and it has to be a full
width address. For instance, if the data segment for a byte load has a base
of 0x3 and an offset of 0xff8, the linear address (post segmentation
address) you're accessing is going to be 0x1001. As far as the CPU is
concerned the virtual address and the linear address look like they're from
different cache lines and different pages.

When presented to the TLB, the address than has to have the segment base
removed from it again so that the original offset can be checked against
the segment limit values. This gets tricky when the segment is expand down,
since in that case the limit value is actually an effective lower bound,
and the upper bound is either 0xFFFF or 0xFFFFFFFF depending on the size of
the "d" bit in the appropriate segment descriptor. Once you do the
subtraction, you need to know how to truncate the result you get, or sign
extension, carry bits, etc., cause a problem and, if I remember correctly,
cause spurious GP faults.

This could all be simplified if the TLB applied the segment bases, but then
I think that would likely complicate the CPU models significantly.

Gabe
Post by Jason Lowe-Power
I'm leaning towards deleting the AddrSizeFlagBit. Gabe, can you shed some
light on the purpose of this flag? It's only ever used for masking
http://grok.gem5.org/xref/gem5/src/arch/x86/tlb.cc#318
http://grok.gem5.org/xref/gem5/src/arch/x86/tlb.cc#330
From what I can understand, it should be unnecessary to mask the address
here. In the instruction implementation, the address is already masked if
http://grok.gem5.org/xref/gem5/src/arch/x86/isa/microops/ldstop.isa#433
It seems straightforward to add addressSize=ssz to the st instructions
where necessary. I'm confused as to why the TLB ever needs to know what the
address mode of the instruction is. Why/how would the TLB ever receive a
request with an address > 4GB for a 32-bit mode instruction?
Thanks again for the help!
Cheers,
Jason
Post by Steve Reinhardt
Thanks, Gabe! That's all really helpful. I appreciate you taking the time
to respond.
One concern I have is that the portion of the manual Jason quoted says
the
Post by Steve Reinhardt
rule holds "For instructions that
*implicitly* access the stack segment (SS)" (emphasis mine)... would you
say then that instructions that *explicitly* choose SS as the segment
(e.g., with a segment override prefix) would not follow this rule? If so,
then it may not be correct to do things inside the instruction based
solely
Post by Steve Reinhardt
on the fact that segment==SS. Or maybe that's a case that just can't
occur
Post by Steve Reinhardt
so we don't have to worry about it?
Also, would there be any harm in setting the AddrSizeFlagBit based on
whether addressSize==32 rather than looking at the legacy.addr bit? If
not,
Post by Steve Reinhardt
this seems like a simpler solution to me.
Thanks again!
Steve
Post by Gabe Black
Forgive me if I'm repeating a bit of what you said, Steve, but I
looked a
Post by Steve Reinhardt
Post by Gabe Black
little more and there's more to fix.
Also, if you look at x86/isa/microops/ldstop.isa, you'll find the
python
Post by Steve Reinhardt
Post by Gabe Black
classes which receive the parameters of the load and store microops and
translate them into C++ instantiations of the underlying microop
classes.
Post by Steve Reinhardt
Post by Gabe Black
In their __init__ methods, you'll see at the end where the
AddrSizeFlagBit
Post by Gabe Black
is set based on whether or not the machInst.legacy.addr bit is set,
basically whether the address size prefix had been found when the
instruction was decoded. For the two classes involved (LdStOp and
BigLdStOp), I think you'll want to add a check which doesn't set
AddrSizeFlagBit if the segment is the stack segment.
Additionally, you'll want to make sure the TLB knows to use the stack
segment size (as opposed to either the default or alternate address
sizes)
Post by Gabe Black
when manipulating the segment offset. You could add a new ISA specific
flag
Post by Gabe Black
for x86 memory requests (there seems to be room for exactly one more in
arch/x86/ldstflags.hh) and set it if the segment is the stack in those
ld
Post by Steve Reinhardt
Post by Gabe Black
and st ops, similar to how AddrSizeFlagBit is being set. Then in the
TLB,
Post by Steve Reinhardt
Post by Gabe Black
specifically where it's calculating logSize, you'll want to make it
recognize your new flag and use the stack size from the m5Reg (called
just
Post by Gabe Black
m5Reg.stack, I think) instead of either of the address size values.
This is a little gross because it means there's some calculation going
on
Post by Steve Reinhardt
Post by Gabe Black
when address translation is being done (frequently) when it could have
been
Post by Gabe Black
done earlier during decode (less frequently). One reason I did things
this
Post by Gabe Black
way (all speculation, I've long since forgot) is that it makes the
decoded
Post by Gabe Black
instructions hold less state. The same bytes may not always refer to
the
Post by Steve Reinhardt
Post by Gabe Black
same sizes depending on the segmentation state in the machine, so if
those
Post by Gabe Black
particular bytes are being decoded, the gem5 decode cache needs to
either
Post by Steve Reinhardt
Post by Gabe Black
consider that segmentation state when deciding what instruction to spit
out. If it didn't, it might return an instruction which statically says
to
Post by Gabe Black
use 32 bit addresses (for instance), when now segmentation state says
it
Post by Steve Reinhardt
Post by Gabe Black
the same bytes should mean it should use 16 bit addresses. Instead, it
canh
Post by Gabe Black
be factored in dynamically using external state (the m5Reg) to an
instruction that says which version of the sizes to use but doesn't
care
Post by Steve Reinhardt
Post by Gabe Black
what values those actually map to at that moment.
Gabe
Post by Gabe Black
Hi. I think you guys are generally interpreting this code correctly,
but
Post by Gabe Black
Post by Gabe Black
http://www.felixcloutier.com/x86/CALL.html
then it appears that the size of the return address you push onto the
stack should be based on the operand size, ie. the data size. That
would
Post by Gabe Black
Post by Gabe Black
mean that the displacement for the store should be -env.dataSize like
it
Post by Gabe Black
Post by Gabe Black
is. By the same logic, the subtraction below it should also be
env.dataSize
Post by Gabe Black
(symbolically dsz) and not ssz.
You are also correct, I think, that since the instruction references
the
Post by Gabe Black
Post by Gabe Black
stack segment, it's address size should be set to the stack size when
doing
Post by Gabe Black
memory operations. Adding addressSize=ssz to the st microop should do
that.
Post by Gabe Black
According to that pseudo code, the size of the target IP is also the
operand size, but that's the size the wrip microop will use by
default
Post by Steve Reinhardt
Post by Gabe Black
and
Post by Gabe Black
so should be fine as written.
I think if you make those two changes (change ssz to dsz in the sub,
add
Post by Gabe Black
Post by Gabe Black
addressSize=ssz to the st), then this macroop definition will be
correct.
Post by Gabe Black
Post by Gabe Black
It would be worthwhile to check the other definitions of CALL and
make
Post by Steve Reinhardt
Post by Gabe Black
sure
Post by Gabe Black
they aren't similarly broken.
Gabe
On Tue, Jan 24, 2017 at 3:28 PM, Jason Lowe-Power <
Post by Jason Lowe-Power
Thanks for helping me work this out.
I got the binary working by completely removing the AddrSizeFlagBit.
The
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
only place this bit is used is in the LdStOp (it's set to true if
legacy.addr is true) and then, in the TLB, if the AddrSizeFlagBit is
set
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
it
truncates the address to 32-bits removing the upper-order bits.
By removing this flag bit, and adding "addressSize=ssz" to the st micro-op,
the binary is now working.
My next question: How can I have any confidence that this doesn't
break
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
other things? Especially when we're talking about 32-bit
compatibility
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
mode. Other than the hello32 binary, are there other things I should
run?
Post by Gabe Black
Post by Jason Lowe-Power
My intuition says this is an OK change. The addr field in the
Request
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
should never have the upper-order 32 bits set if the instruction is
a
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
32-bit-mode instruction. The instruction implementation already
takes
Post by Steve Reinhardt
Post by Gabe Black
care
Post by Gabe Black
Post by Jason Lowe-Power
of this, as I found with this bug.
Other thoughts? Does anyone know if it is definitely required to use
AddrSizeFlagBit to truncate the address in the TLB?
If not, I'll post a patch for this tomorrow.
Thanks,
Jason
Post by Steve Reinhardt
Hmm, seems like there's a little bit of an inconsistency in that
the
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
request is using the legacy.addr bit (which is set by the decoder
when
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
it
Post by Steve Reinhardt
sees the address size override prefix [1]) directly, while the
legacy.addr
Post by Steve Reinhardt
bit is also used to calculate the addressSize value [2] but can be
overridden (as we are doing). So if addressSize is overridden then
AddrSizeFlagBit should no longer be set based on legacy.addr.
Or another way of looking at it is that the same process of
checking
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
legacy.addr to override the address size is done in two places,
once
Post by Steve Reinhardt
Post by Gabe Black
in
Post by Gabe Black
Post by Jason Lowe-Power
the
Post by Steve Reinhardt
decoder [2] and again in the TLB [3] and it's not clear how to
suppress
Post by Gabe Black
Post by Jason Lowe-Power
the
Post by Steve Reinhardt
latter one.
I suppose one gnarly way of doing it would be to go into the
micro-op
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
definition somewhere and clear the AddrSizeFlagBit out of memFlags
if
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
addressSize != env.addressSize (i.e., the address size was
overridden).
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
There's probably a cleaner way, but that's the easiest path I can
think
Post by Gabe Black
Post by Jason Lowe-Power
of
Post by Steve Reinhardt
(if it even works).
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#195
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#390
[3] http://grok.gem5.org/source/xref/gem5/src/arch/x86/tlb.cc#315
On Tue, Jan 24, 2017 at 8:37 AM, Jason Lowe-Power <
Post by Jason Lowe-Power
Hi Steve,
That was super helpful. I'm now a step closer to solving this!
Your suggestion of =ssz, lead me to search for the uses of that
in
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
x86,
Post by Steve Reinhardt
and
Post by Jason Lowe-Power
it turns out that almost all of other stack instructions have
dataSize=ssz.
Post by Jason Lowe-Power
So, I added both dataSize=ssz and addressSize=ssz to the call
instruction,
Post by Jason Lowe-Power
though I think only the addressSize is actually needed, but I'm
not
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
certain.
Now, the address is passed to the Request object correctly, but
the
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
program
Post by Jason Lowe-Power
still fails. The problem now is that the request object is
getting
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
the AddrSizeFlagBit set to true, because machInst.legacy.addr is
true.
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Thus, the TLB only uses the lower 32 bits of the 64-bit address.
Any ideas on how to change the instruction's memFlags from the
macro-op
Post by Steve Reinhardt
Post by Jason Lowe-Power
definition? They are set on
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l334
.
It would be nice if we could fix this in the decoder. I think
the
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
logic
Post by Steve Reinhardt
Post by Jason Lowe-Power
should be "if the address prefix is set and this is an implicit
stack
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
operation, ignore the address prefix". However, I'm not sure how
to
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
tell
Post by Steve Reinhardt
if
Post by Jason Lowe-Power
the instruction is "an implicit stack operation" from the
decoder.
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Thanks,
Jason
On Tue, Jan 24, 2017 at 9:05 AM Steve Reinhardt <
My recollection of how all this works is that the arguments to
the
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
'st'
Post by Steve Reinhardt
Post by Jason Lowe-Power
micro-op get turned into arguments to a call to the StoreOp
597 <http://repo.gem5.o, and that address doesn't
existrg/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/
ldstop.isa#l597
Post by Steve Reinhardt
Post by Jason Lowe-Power
<http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l597>
598 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l598
def __init__(self, data, segment, addr, disp = 0,
599 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l599
dataSize="env.dataSize",
600 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l600
addressSize="env.addressSize",
601 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l601
so the "-env.dataSize" you see is actually the displacement for
the
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
store,
Post by Jason Lowe-Power
not the dataSize or addressSize. I think the problem is that
the
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
addressSize is using the env,addressSize that's calculated the
"normal"
Post by Steve Reinhardt
Post by Jason Lowe-Power
way, i.e., including the effects of the override prefix.
Poking around some more, there's an 'ssz' symbol that maps to
env.stackSize
Post by Jason Lowe-Power
[1] which gets used a lot in stack operations; for example,
right
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
after
Post by Steve Reinhardt
the
Post by Jason Lowe-Power
store in CALL_NEAR_I, 'ssz' is subtracted off of the stack
pointer.
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
The
Post by Steve Reinhardt
Post by Jason Lowe-Power
calculation of env.stackSize seems to follow the rule you
mentioned
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
about
Post by Steve Reinhardt
Post by Jason Lowe-Power
being fixed at 64 bits in 64-bit mode [2, 3].
So it might be sufficient to add ', addressSize=ssz' to the end
of
Post by Steve Reinhardt
Post by Gabe Black
the
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
'st'
Post by Jason Lowe-Power
micro-op. Oddly though I don't see addressSize being set on any
other
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
stack
Post by Jason Lowe-Power
ops (just dataSize), so I wonder if this is a problem with other
stack
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
instructions or not. If so, it might be useful to have an
override
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
hardwired in at some lower level to check if the segment is ss
and
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
force
Post by Steve Reinhardt
Post by Jason Lowe-Power
the address size to be stackSize (rather than adding this extra
parameter
Post by Steve Reinhardt
Post by Jason Lowe-Power
in a dozen different places). I wouldn't know where best to do
that
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
though,
Post by Jason Lowe-Power
so the manual override seems best for now.
Steve
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/isa/
microasm.isa#107
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/isa.cc#91
[3] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#400
On Mon, Jan 23, 2017 at 4:04 PM, Jason Lowe-Power <
Post by Jason Lowe-Power
To those of you with more x86 ISA implementation knowledge
than
Post by Steve Reinhardt
I
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
I've been working through a bug one of our users found (thanks
Sanchayan!).
Post by Jason Lowe-Power
It looks like current versions of ld use the 0x67 instruction
prefix
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
(address size override) as an optimization instead of using a
nop.
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
See
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
https://www.sourceware.org/ml/binutils/2016-05/msg00323.html.
This causes the call instruction to be decoded with with the
"address
Post by Steve Reinhardt
Post by Jason Lowe-Power
size
Post by Jason Lowe-Power
override prefix", which is correct, in a sense. From what I
can
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
tell,
Post by Steve Reinhardt
Post by Jason Lowe-Power
this
Post by Jason Lowe-Power
is passed to the call instruction via "-env.dataSize" (see
call
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
instruction
Post by Jason Lowe-Power
implementation below).
def macroop CALL_NEAR_I
{
# Make the default data size of calls 64 bits in 64 bit
mode
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
.adjust_env oszIn64Override
.function_call
limm t1, imm
rdip t7
# Check target of call
st t7, ss, [0, t0, rsp], "-env.dataSize"
subi rsp, rsp, ssz
wrip t7, t1
};
Now, the bug is, according to the x86 manual, "For
instructions
Post by Steve Reinhardt
Post by Gabe Black
that
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
implicitly access the stack segment (SS), the address size for
stack
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
accesses is determined by the D (default) bit in the
stack-segment
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
descriptor. In 64-bit mode, the D bit is ignored, and all
stack
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
references
Post by Jason Lowe-Power
have a 64-bit address size." See
https://support.amd.com/TechDocs/24594.pdf page
9.
Thoughts on how to fix this?
Thanks,
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Jason Lowe-Power
2017-01-26 17:07:14 UTC
Permalink
I think I get it now. x86 is complicated!

Here's a first stab at the patch: http://reviews.gem5.org/r/3793/

Cheers,
Jason
Post by Gabe Black
As far as explicitly vs. implicitly using the stack segment, I wasn't able
to find a very conclusive answer while looking through the manuals. This
would be something to check by running a little test program on an actual
computer, although I suspect implicit vs. explicit does matter.
One interesting thing I found while looking for that which might be worth
looking into is that instructions which reference rsp or rbp also
implicitly use the stack segment. I remember knowing that, but I don't
remember whether or not gem5 handles that properly.
Unfortunately, we can't really keep track of the effective address size in
the instruction, just whether we should be using the default size or the
alternative size. This is because whether or not we're using the default is
fixed based on the instruction encoding, while how that maps to the actual
address size is based on that and also segmentation state. The decode cache
doesn't (as far as I remember) keep track of that state, so it would
potentially return an instruction which was set up based on now faulty
assumptions.
As far as whether or not AddrSizeFlagBit can be removed entirely,
unfortunately it can't. The problem ultimately stems from the fact that
gem5 assumes that virtual to physical mappings don't change the page
alignment of a memory address. That means that the address that's presented
to the CPU as a virtual address needs to have its potentially non-page
aligned segment base value added into it already, and it has to be a full
width address. For instance, if the data segment for a byte load has a base
of 0x3 and an offset of 0xff8, the linear address (post segmentation
address) you're accessing is going to be 0x1001. As far as the CPU is
concerned the virtual address and the linear address look like they're from
different cache lines and different pages.
When presented to the TLB, the address than has to have the segment base
removed from it again so that the original offset can be checked against
the segment limit values. This gets tricky when the segment is expand down,
since in that case the limit value is actually an effective lower bound,
and the upper bound is either 0xFFFF or 0xFFFFFFFF depending on the size of
the "d" bit in the appropriate segment descriptor. Once you do the
subtraction, you need to know how to truncate the result you get, or sign
extension, carry bits, etc., cause a problem and, if I remember correctly,
cause spurious GP faults.
This could all be simplified if the TLB applied the segment bases, but then
I think that would likely complicate the CPU models significantly.
Gabe
Post by Jason Lowe-Power
I'm leaning towards deleting the AddrSizeFlagBit. Gabe, can you shed some
light on the purpose of this flag? It's only ever used for masking
http://grok.gem5.org/xref/gem5/src/arch/x86/tlb.cc#318
http://grok.gem5.org/xref/gem5/src/arch/x86/tlb.cc#330
From what I can understand, it should be unnecessary to mask the address
here. In the instruction implementation, the address is already masked if
http://grok.gem5.org/xref/gem5/src/arch/x86/isa/microops/ldstop.isa#433
It seems straightforward to add addressSize=ssz to the st instructions
where necessary. I'm confused as to why the TLB ever needs to know what
the
Post by Jason Lowe-Power
address mode of the instruction is. Why/how would the TLB ever receive a
request with an address > 4GB for a 32-bit mode instruction?
Thanks again for the help!
Cheers,
Jason
Post by Steve Reinhardt
Thanks, Gabe! That's all really helpful. I appreciate you taking the
time
Post by Jason Lowe-Power
Post by Steve Reinhardt
to respond.
One concern I have is that the portion of the manual Jason quoted says
the
Post by Steve Reinhardt
rule holds "For instructions that
*implicitly* access the stack segment (SS)" (emphasis mine)... would
you
Post by Jason Lowe-Power
Post by Steve Reinhardt
say then that instructions that *explicitly* choose SS as the segment
(e.g., with a segment override prefix) would not follow this rule? If
so,
Post by Jason Lowe-Power
Post by Steve Reinhardt
then it may not be correct to do things inside the instruction based
solely
Post by Steve Reinhardt
on the fact that segment==SS. Or maybe that's a case that just can't
occur
Post by Steve Reinhardt
so we don't have to worry about it?
Also, would there be any harm in setting the AddrSizeFlagBit based on
whether addressSize==32 rather than looking at the legacy.addr bit? If
not,
Post by Steve Reinhardt
this seems like a simpler solution to me.
Thanks again!
Steve
Post by Gabe Black
Forgive me if I'm repeating a bit of what you said, Steve, but I
looked a
Post by Steve Reinhardt
Post by Gabe Black
little more and there's more to fix.
Also, if you look at x86/isa/microops/ldstop.isa, you'll find the
python
Post by Steve Reinhardt
Post by Gabe Black
classes which receive the parameters of the load and store microops
and
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
translate them into C++ instantiations of the underlying microop
classes.
Post by Steve Reinhardt
Post by Gabe Black
In their __init__ methods, you'll see at the end where the
AddrSizeFlagBit
Post by Gabe Black
is set based on whether or not the machInst.legacy.addr bit is set,
basically whether the address size prefix had been found when the
instruction was decoded. For the two classes involved (LdStOp and
BigLdStOp), I think you'll want to add a check which doesn't set
AddrSizeFlagBit if the segment is the stack segment.
Additionally, you'll want to make sure the TLB knows to use the stack
segment size (as opposed to either the default or alternate address
sizes)
Post by Gabe Black
when manipulating the segment offset. You could add a new ISA
specific
Post by Jason Lowe-Power
Post by Steve Reinhardt
flag
Post by Gabe Black
for x86 memory requests (there seems to be room for exactly one more
in
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
arch/x86/ldstflags.hh) and set it if the segment is the stack in
those
Post by Jason Lowe-Power
ld
Post by Steve Reinhardt
Post by Gabe Black
and st ops, similar to how AddrSizeFlagBit is being set. Then in the
TLB,
Post by Steve Reinhardt
Post by Gabe Black
specifically where it's calculating logSize, you'll want to make it
recognize your new flag and use the stack size from the m5Reg (called
just
Post by Gabe Black
m5Reg.stack, I think) instead of either of the address size values.
This is a little gross because it means there's some calculation
going
Post by Jason Lowe-Power
on
Post by Steve Reinhardt
Post by Gabe Black
when address translation is being done (frequently) when it could
have
Post by Jason Lowe-Power
Post by Steve Reinhardt
been
Post by Gabe Black
done earlier during decode (less frequently). One reason I did things
this
Post by Gabe Black
way (all speculation, I've long since forgot) is that it makes the
decoded
Post by Gabe Black
instructions hold less state. The same bytes may not always refer to
the
Post by Steve Reinhardt
Post by Gabe Black
same sizes depending on the segmentation state in the machine, so if
those
Post by Gabe Black
particular bytes are being decoded, the gem5 decode cache needs to
either
Post by Steve Reinhardt
Post by Gabe Black
consider that segmentation state when deciding what instruction to
spit
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
out. If it didn't, it might return an instruction which statically
says
Post by Jason Lowe-Power
Post by Steve Reinhardt
to
Post by Gabe Black
use 32 bit addresses (for instance), when now segmentation state says
it
Post by Steve Reinhardt
Post by Gabe Black
the same bytes should mean it should use 16 bit addresses. Instead,
it
Post by Jason Lowe-Power
Post by Steve Reinhardt
canh
Post by Gabe Black
be factored in dynamically using external state (the m5Reg) to an
instruction that says which version of the sizes to use but doesn't
care
Post by Steve Reinhardt
Post by Gabe Black
what values those actually map to at that moment.
Gabe
Post by Gabe Black
Hi. I think you guys are generally interpreting this code
correctly,
Post by Jason Lowe-Power
Post by Steve Reinhardt
but
Post by Gabe Black
Post by Gabe Black
http://www.felixcloutier.com/x86/CALL.html
then it appears that the size of the return address you push onto
the
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
stack should be based on the operand size, ie. the data size. That
would
Post by Gabe Black
Post by Gabe Black
mean that the displacement for the store should be -env.dataSize
like
Post by Jason Lowe-Power
Post by Steve Reinhardt
it
Post by Gabe Black
Post by Gabe Black
is. By the same logic, the subtraction below it should also be
env.dataSize
Post by Gabe Black
(symbolically dsz) and not ssz.
You are also correct, I think, that since the instruction
references
Post by Jason Lowe-Power
Post by Steve Reinhardt
the
Post by Gabe Black
Post by Gabe Black
stack segment, it's address size should be set to the stack size
when
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
doing
Post by Gabe Black
memory operations. Adding addressSize=ssz to the st microop should
do
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
that.
Post by Gabe Black
According to that pseudo code, the size of the target IP is also
the
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
operand size, but that's the size the wrip microop will use by
default
Post by Steve Reinhardt
Post by Gabe Black
and
Post by Gabe Black
so should be fine as written.
I think if you make those two changes (change ssz to dsz in the
sub,
Post by Jason Lowe-Power
Post by Steve Reinhardt
add
Post by Gabe Black
Post by Gabe Black
addressSize=ssz to the st), then this macroop definition will be
correct.
Post by Gabe Black
Post by Gabe Black
It would be worthwhile to check the other definitions of CALL and
make
Post by Steve Reinhardt
Post by Gabe Black
sure
Post by Gabe Black
they aren't similarly broken.
Gabe
On Tue, Jan 24, 2017 at 3:28 PM, Jason Lowe-Power <
Post by Jason Lowe-Power
Thanks for helping me work this out.
I got the binary working by completely removing the
AddrSizeFlagBit.
Post by Jason Lowe-Power
Post by Steve Reinhardt
The
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
only place this bit is used is in the LdStOp (it's set to true if
legacy.addr is true) and then, in the TLB, if the AddrSizeFlagBit
is
Post by Jason Lowe-Power
Post by Steve Reinhardt
set
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
it
truncates the address to 32-bits removing the upper-order bits.
By removing this flag bit, and adding "addressSize=ssz" to the st
micro-op,
the binary is now working.
My next question: How can I have any confidence that this doesn't
break
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
other things? Especially when we're talking about 32-bit
compatibility
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
mode. Other than the hello32 binary, are there other things I
should
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
run?
Post by Gabe Black
Post by Jason Lowe-Power
My intuition says this is an OK change. The addr field in the
Request
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
should never have the upper-order 32 bits set if the instruction
is
Post by Jason Lowe-Power
a
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
32-bit-mode instruction. The instruction implementation already
takes
Post by Steve Reinhardt
Post by Gabe Black
care
Post by Gabe Black
Post by Jason Lowe-Power
of this, as I found with this bug.
Other thoughts? Does anyone know if it is definitely required to
use
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
AddrSizeFlagBit to truncate the address in the TLB?
If not, I'll post a patch for this tomorrow.
Thanks,
Jason
On Tue, Jan 24, 2017 at 12:19 PM Steve Reinhardt <
Post by Steve Reinhardt
Hmm, seems like there's a little bit of an inconsistency in that
the
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
request is using the legacy.addr bit (which is set by the
decoder
Post by Jason Lowe-Power
Post by Steve Reinhardt
when
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
it
Post by Steve Reinhardt
sees the address size override prefix [1]) directly, while the
legacy.addr
Post by Steve Reinhardt
bit is also used to calculate the addressSize value [2] but can
be
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
overridden (as we are doing). So if addressSize is overridden
then
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
AddrSizeFlagBit should no longer be set based on legacy.addr.
Or another way of looking at it is that the same process of
checking
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
legacy.addr to override the address size is done in two places,
once
Post by Steve Reinhardt
Post by Gabe Black
in
Post by Gabe Black
Post by Jason Lowe-Power
the
Post by Steve Reinhardt
decoder [2] and again in the TLB [3] and it's not clear how to
suppress
Post by Gabe Black
Post by Jason Lowe-Power
the
Post by Steve Reinhardt
latter one.
I suppose one gnarly way of doing it would be to go into the
micro-op
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
definition somewhere and clear the AddrSizeFlagBit out of
memFlags
Post by Jason Lowe-Power
Post by Steve Reinhardt
if
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
addressSize != env.addressSize (i.e., the address size was
overridden).
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
There's probably a cleaner way, but that's the easiest path I
can
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
think
Post by Gabe Black
Post by Jason Lowe-Power
of
Post by Steve Reinhardt
(if it even works).
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#195
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#390
[3]
http://grok.gem5.org/source/xref/gem5/src/arch/x86/tlb.cc#315
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
On Tue, Jan 24, 2017 at 8:37 AM, Jason Lowe-Power <
Post by Jason Lowe-Power
Hi Steve,
That was super helpful. I'm now a step closer to solving this!
Your suggestion of =ssz, lead me to search for the uses of
that
Post by Jason Lowe-Power
in
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
x86,
Post by Steve Reinhardt
and
Post by Jason Lowe-Power
it turns out that almost all of other stack instructions have
dataSize=ssz.
Post by Jason Lowe-Power
So, I added both dataSize=ssz and addressSize=ssz to the call
instruction,
Post by Jason Lowe-Power
though I think only the addressSize is actually needed, but
I'm
Post by Jason Lowe-Power
Post by Steve Reinhardt
not
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
certain.
Now, the address is passed to the Request object correctly,
but
Post by Jason Lowe-Power
Post by Steve Reinhardt
the
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
program
Post by Jason Lowe-Power
still fails. The problem now is that the request object is
getting
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
the AddrSizeFlagBit set to true, because machInst.legacy.addr
is
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
true.
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Thus, the TLB only uses the lower 32 bits of the 64-bit
address.
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Any ideas on how to change the instruction's memFlags from the
macro-op
Post by Steve Reinhardt
Post by Jason Lowe-Power
definition? They are set on
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l334
.
It would be nice if we could fix this in the decoder. I think
the
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
logic
Post by Steve Reinhardt
Post by Jason Lowe-Power
should be "if the address prefix is set and this is an
implicit
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
stack
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
operation, ignore the address prefix". However, I'm not sure
how
Post by Jason Lowe-Power
Post by Steve Reinhardt
to
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
tell
Post by Steve Reinhardt
if
Post by Jason Lowe-Power
the instruction is "an implicit stack operation" from the
decoder.
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Thanks,
Jason
On Tue, Jan 24, 2017 at 9:05 AM Steve Reinhardt <
My recollection of how all this works is that the arguments to
the
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
'st'
Post by Steve Reinhardt
Post by Jason Lowe-Power
micro-op get turned into arguments to a call to the StoreOp
597 <http://repo.gem5.o, and that address doesn't
existrg/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/
ldstop.isa#l597
Post by Steve Reinhardt
Post by Jason Lowe-Power
<http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l597>
598 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l598
def __init__(self, data, segment, addr, disp = 0,
599 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l599
dataSize="env.dataSize",
600 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l600
addressSize="env.addressSize",
601 <
http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
x86/isa/microops/ldstop.isa#l601
so the "-env.dataSize" you see is actually the displacement
for
Post by Jason Lowe-Power
Post by Steve Reinhardt
the
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
store,
Post by Jason Lowe-Power
not the dataSize or addressSize. I think the problem is that
the
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
addressSize is using the env,addressSize that's calculated the
"normal"
Post by Steve Reinhardt
Post by Jason Lowe-Power
way, i.e., including the effects of the override prefix.
Poking around some more, there's an 'ssz' symbol that maps to
env.stackSize
Post by Jason Lowe-Power
[1] which gets used a lot in stack operations; for example,
right
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
after
Post by Steve Reinhardt
the
Post by Jason Lowe-Power
store in CALL_NEAR_I, 'ssz' is subtracted off of the stack
pointer.
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
The
Post by Steve Reinhardt
Post by Jason Lowe-Power
calculation of env.stackSize seems to follow the rule you
mentioned
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
about
Post by Steve Reinhardt
Post by Jason Lowe-Power
being fixed at 64 bits in 64-bit mode [2, 3].
So it might be sufficient to add ', addressSize=ssz' to the
end
Post by Jason Lowe-Power
of
Post by Steve Reinhardt
Post by Gabe Black
the
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
'st'
Post by Jason Lowe-Power
micro-op. Oddly though I don't see addressSize being set on
any
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
other
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
stack
Post by Jason Lowe-Power
ops (just dataSize), so I wonder if this is a problem with
other
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
stack
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
instructions or not. If so, it might be useful to have an
override
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
hardwired in at some lower level to check if the segment is ss
and
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
force
Post by Steve Reinhardt
Post by Jason Lowe-Power
the address size to be stackSize (rather than adding this
extra
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
parameter
Post by Steve Reinhardt
Post by Jason Lowe-Power
in a dozen different places). I wouldn't know where best to do
that
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
though,
Post by Jason Lowe-Power
so the manual override seems best for now.
Steve
[1] http://grok.gem5.org/xref/gem5/src/arch/x86/isa/
microasm.isa#107
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
[2] http://grok.gem5.org/xref/gem5/src/arch/x86/isa.cc#91
[3]
http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#400
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
On Mon, Jan 23, 2017 at 4:04 PM, Jason Lowe-Power <
Post by Jason Lowe-Power
To those of you with more x86 ISA implementation knowledge
than
Post by Steve Reinhardt
I
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
I've been working through a bug one of our users found
(thanks
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Sanchayan!).
Post by Jason Lowe-Power
It looks like current versions of ld use the 0x67
instruction
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
prefix
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
(address size override) as an optimization instead of using
a
Post by Jason Lowe-Power
Post by Steve Reinhardt
nop.
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
See
https://www.sourceware.org/ml/binutils/2016-05/msg00323.html.
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
This causes the call instruction to be decoded with with the
"address
Post by Steve Reinhardt
Post by Jason Lowe-Power
size
Post by Jason Lowe-Power
override prefix", which is correct, in a sense. From what I
can
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
tell,
Post by Steve Reinhardt
Post by Jason Lowe-Power
this
Post by Jason Lowe-Power
is passed to the call instruction via "-env.dataSize" (see
call
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
instruction
Post by Jason Lowe-Power
implementation below).
def macroop CALL_NEAR_I
{
# Make the default data size of calls 64 bits in 64 bit
mode
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
.adjust_env oszIn64Override
.function_call
limm t1, imm
rdip t7
# Check target of call
st t7, ss, [0, t0, rsp], "-env.dataSize"
subi rsp, rsp, ssz
wrip t7, t1
};
Now, the bug is, according to the x86 manual, "For
instructions
Post by Steve Reinhardt
Post by Gabe Black
that
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
implicitly access the stack segment (SS), the address size
for
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Gabe Black
stack
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
accesses is determined by the D (default) bit in the
stack-segment
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
Post by Jason Lowe-Power
descriptor. In 64-bit mode, the D bit is ignored, and all
stack
Post by Steve Reinhardt
Post by Gabe Black
Post by Gabe Black
Post by Jason Lowe-Power
Post by Steve Reinhardt
Post by Jason Lowe-Power
references
Post by Jason Lowe-Power
have a 64-bit address size." See
https://support.amd.com/TechDocs/24594.pdf page
9.
Thoughts on how to fix this?
Thanks,
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
Jason
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Continue reading on narkive:
Loading...