Discussion:
[gem5-dev] Proposal to untemplate the o3 CPU
(too old to reply)
Mitch Hayenga via gem5-dev
2014-05-13 16:23:00 UTC
Permalink
Hi All,

Recently I have written a patch that removes templating from the o3 cpu.
In general templating in o3 makes the code significantly more verbose,
adds compile time overheads, and doesn't actually benefit performance. The
templating is largely pointless as 1) there aren't multiple versions of
fetch, rename, etc to make the compile time Impl pattern worth doing 2)
Modern CPUs have indirect branch predictors that hide the penalties that
the templating was trying to mask.

*I was wondering what peoples feelings were on a patch of this sort? * It
is a quite large modification (~35k line patch file, changes almost all
localized to the o3 directory). Many of the lines are simply because the
"impl" header files were changed to source files.

Here are a few benefits of the patch

- Cleaner, less verbose code.
- Due to the current templating/DynInst interaction, gem5 often requires
rebuilding the function execution signatures (o3_cpu_exec.o) when a
modification is made to the o3 cpu. This patch eliminates having to
rebuild the execution signatures on o3 changes.
- Marginally better compile/run times.
- Moved "base_dyn_inst_impl.hh" into o3, it's too dependent on o3 as is.
No other cpu does/should inherit from it anyway.
- Made the checker directly templated on the execution context (DynInst)
instead of an "Impl" like o3. Seems like it was coded dependently on o3.


Here are some performance results for gem5.fast on GCC 4.9 and CLANG on
twolf from spec2k.

*Binary Size*
CLANG: 1.1% smaller without templating
GCC: Difference is negligible <0.0001%


*CLANG Compile Time (single threaded, no turboboost, two runs)*
*Templated*
real 21m32.240s
user 20m20.019s
sys 1m6.721s

real 21m29.963s
user 20m17.016s
sys 1m7.108s

*Untempated:*
real 21m24.396s
user 20m13.158s
sys 1m5.798s

real 21m23.177s
user 20m11.911s
sys 1m5.843s


*GCC Compile Time (-j8, did not disable turboboost)*
*Templated*
real 11m35.848s
user 67m20.828s
sys 2m2.292s

*Untemplated:*
real 11m42.167s
user 67m7.572s
sys 2m2.056s


*CLANG Run Time (Spec2k twolf)*
*Templated*
Run 1) 1187.63
Run 2) 1167.50
Run 3) 1172.06

*Untemplated*
Run 1) 1142.29
Run 2) 1154.49
Run 3) 1165.53


*GCC Run Time (Spec2k twolf, did not disable turboboost)*
*Templated*
Run 1) 12m20.528s
*Untemplated*
Run 1) 12m19.700s



Any thoughts on eventually merging this?
Andreas Sandberg via gem5-dev
2014-05-15 13:27:54 UTC
Permalink
Hi Mitch,

In general, I like the idea of removing some of the pointless/awkward
templates we have in gem5. I would definitely support moving in this
direction. However, I really dislike the idea of reviewing a 32k line
patch. Reviewing such a patch would be a headache and I suspect RB would
just fall over. Would it be possible to split this change into a series
of smaller patches?

For example, you could split it into one patch per functional unit and a
final patch that does some cleaning up. You could probably just 'fake'
new un-templated class names as typedefs in the relevant header files.

//Andreas
Post by Mitch Hayenga via gem5-dev
Hi All,
Recently I have written a patch that removes templating from the o3 cpu.
In general templating in o3 makes the code significantly more verbose,
adds compile time overheads, and doesn't actually benefit performance. The
templating is largely pointless as 1) there aren't multiple versions of
fetch, rename, etc to make the compile time Impl pattern worth doing 2)
Modern CPUs have indirect branch predictors that hide the penalties that
the templating was trying to mask.
*I was wondering what peoples feelings were on a patch of this sort? * It
is a quite large modification (~35k line patch file, changes almost all
localized to the o3 directory). Many of the lines are simply because the
"impl" header files were changed to source files.
Here are a few benefits of the patch
- Cleaner, less verbose code.
- Due to the current templating/DynInst interaction, gem5 often requires
rebuilding the function execution signatures (o3_cpu_exec.o) when a
modification is made to the o3 cpu. This patch eliminates having to
rebuild the execution signatures on o3 changes.
- Marginally better compile/run times.
- Moved "base_dyn_inst_impl.hh" into o3, it's too dependent on o3 as is.
No other cpu does/should inherit from it anyway.
- Made the checker directly templated on the execution context (DynInst)
instead of an "Impl" like o3. Seems like it was coded dependently on o3.
Here are some performance results for gem5.fast on GCC 4.9 and CLANG on
twolf from spec2k.
*Binary Size*
CLANG: 1.1% smaller without templating
GCC: Difference is negligible <0.0001%
*CLANG Compile Time (single threaded, no turboboost, two runs)*
*Templated*
real 21m32.240s
user 20m20.019s
sys 1m6.721s
real 21m29.963s
user 20m17.016s
sys 1m7.108s
*Untempated:*
real 21m24.396s
user 20m13.158s
sys 1m5.798s
real 21m23.177s
user 20m11.911s
sys 1m5.843s
*GCC Compile Time (-j8, did not disable turboboost)*
*Templated*
real 11m35.848s
user 67m20.828s
sys 2m2.292s
*Untemplated:*
real 11m42.167s
user 67m7.572s
sys 2m2.056s
*CLANG Run Time (Spec2k twolf)*
*Templated*
Run 1) 1187.63
Run 2) 1167.50
Run 3) 1172.06
*Untemplated*
Run 1) 1142.29
Run 2) 1154.49
Run 3) 1165.53
*GCC Run Time (Spec2k twolf, did not disable turboboost)*
*Templated*
Run 1) 12m20.528s
*Untemplated*
Run 1) 12m19.700s
Any thoughts on eventually merging this?
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Korey Sewell via gem5-dev
2014-05-15 15:29:07 UTC
Permalink
Hi Mitch/gem5-ers,
I think I would support the "untemplating" movement as well, so you have my
approval there too :)

With regards to implementation, I agree that splitting up the patches
makes for a better review although it will take a small amount of work to
get a reasonable splitting granularity.

Have you been able to test that un-templated o3 passes all the regressions
as well?

Lastly, once this is reviewed you'll need to coordinate with people who
have outstanding o3 patches as it could be a real pain for them to pull in
a patch that all of a sudden untemplates the o3 code.

-Korey



On Thu, May 15, 2014 at 6:27 AM, Andreas Sandberg via gem5-dev <
Post by Andreas Sandberg via gem5-dev
Hi Mitch,
In general, I like the idea of removing some of the pointless/awkward
templates we have in gem5. I would definitely support moving in this
direction. However, I really dislike the idea of reviewing a 32k line
patch. Reviewing such a patch would be a headache and I suspect RB would
just fall over. Would it be possible to split this change into a series of
smaller patches?
For example, you could split it into one patch per functional unit and a
final patch that does some cleaning up. You could probably just 'fake' new
un-templated class names as typedefs in the relevant header files.
//Andreas
Post by Mitch Hayenga via gem5-dev
Hi All,
Recently I have written a patch that removes templating from the o3 cpu.
In general templating in o3 makes the code significantly more verbose,
adds compile time overheads, and doesn't actually benefit performance.
The
templating is largely pointless as 1) there aren't multiple versions of
fetch, rename, etc to make the compile time Impl pattern worth doing 2)
Modern CPUs have indirect branch predictors that hide the penalties that
the templating was trying to mask.
*I was wondering what peoples feelings were on a patch of this sort? * It
is a quite large modification (~35k line patch file, changes almost all
localized to the o3 directory). Many of the lines are simply because the
"impl" header files were changed to source files.
Here are a few benefits of the patch
- Cleaner, less verbose code.
- Due to the current templating/DynInst interaction, gem5 often requires
rebuilding the function execution signatures (o3_cpu_exec.o) when a
modification is made to the o3 cpu. This patch eliminates having to
rebuild the execution signatures on o3 changes.
- Marginally better compile/run times.
- Moved "base_dyn_inst_impl.hh" into o3, it's too dependent on o3 as is.
No other cpu does/should inherit from it anyway.
- Made the checker directly templated on the execution context (DynInst)
instead of an "Impl" like o3. Seems like it was coded dependently on o3.
Here are some performance results for gem5.fast on GCC 4.9 and CLANG on
twolf from spec2k.
*Binary Size*
CLANG: 1.1% smaller without templating
GCC: Difference is negligible <0.0001%
*CLANG Compile Time (single threaded, no turboboost, two runs)*
*Templated*
real 21m32.240s
user 20m20.019s
sys 1m6.721s
real 21m29.963s
user 20m17.016s
sys 1m7.108s
*Untempated:*
real 21m24.396s
user 20m13.158s
sys 1m5.798s
real 21m23.177s
user 20m11.911s
sys 1m5.843s
*GCC Compile Time (-j8, did not disable turboboost)*
*Templated*
real 11m35.848s
user 67m20.828s
sys 2m2.292s
*Untemplated:*
real 11m42.167s
user 67m7.572s
sys 2m2.056s
*CLANG Run Time (Spec2k twolf)*
*Templated*
Run 1) 1187.63
Run 2) 1167.50
Run 3) 1172.06
*Untemplated*
Run 1) 1142.29
Run 2) 1154.49
Run 3) 1165.53
*GCC Run Time (Spec2k twolf, did not disable turboboost)*
*Templated*
Run 1) 12m20.528s
*Untemplated*
Run 1) 12m19.700s
Any thoughts on eventually merging this?
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey
Ali Saidi via gem5-dev
2014-05-15 15:38:45 UTC
Permalink
I don't think this is really something you can half do, or at least the
time to break it into multiple pieces and individually test each one far
exceeds any possible gain. While it might be somewhat more annoying to
thumb though 15 pages of diffs in one sitting vs 3 pages of diffs 5
times, I'd just say if we're going to do it we just need to do it. While
we should pick a time when people are happy for it to be done, I think
this also needs to fall into the usual approach of, if you have
uncommitted patches to the cpu and the underlying cpu model changes it's
the patch owners responsibility to merge.

Thanks,

Ali
Post by Korey Sewell via gem5-dev
Hi Mitch/gem5-ers,
I think I would support the "untemplating" movement as well, so you have my
approval there too :)
With regards to implementation, I agree that splitting up the patches
makes for a better review although it will take a small amount of work to
get a reasonable splitting granularity.
Have you been able to test that un-templated o3 passes all the regressions
as well?
Lastly, once this is reviewed you'll need to coordinate with people who
have outstanding o3 patches as it could be a real pain for them to pull in
a patch that all of a sudden untemplates the o3 code.
-Korey
On Thu, May 15, 2014 at 6:27 AM, Andreas Sandberg via gem5-dev <
Hi Mitch, In general, I like the idea of removing some of the pointless/awkward templates we have in gem5. I would definitely support moving in this direction. However, I really dislike the idea of reviewing a 32k line patch. Reviewing such a patch would be a headache and I suspect RB would just fall over. Would it be possible to split this change into a series of smaller patches? For example, you could split it into one patch per functional unit and a final patch that does some cleaning up. You could probably just 'fake' new un-templated class names as typedefs in the relevant header files. //Andreas On 2014-05-13 18:23, Mitch Hayenga via gem5-dev wrote: Hi All, Recently I have written a patch that removes templating from the o3 cpu. In general templating in o3 makes the code significan
tly more verbose, adds compile time overheads, and doesn't actually benefit performance. The templating is largely pointless as 1) there aren't multiple versions of fetch, rename, etc to mak
e the
compile time Impl pattern worth doing 2) Modern CPUs have indirect branch predictors that hide the penalties that the templating was trying to mask. *I was wondering what peoples feelings were on a patch of this sort? * It is a quite large modification (~35k line patch file, changes almost all localized to the o3 directory). Many of the lines are simply because the "impl" header files were changed to source files. Here are a few benefits of the patch - Cleaner, less verbose code. - Due to the current templating/DynInst interaction, gem5 often requires rebuilding the function execution signatures (o3_cpu_exec.o) when a modification is made to the o3 cpu. This patch eliminates having to rebuild the execution signatures on o3 changes. - Marginally better compile/run times. - Moved "base_dyn_i
nst_impl.hh" into o3, it's too dependent on o3 as is. No other cpu does/should inherit from it anyway. - Made the checker directly templated on the execution context (DynInst) instead of an
"Impl"
like o3. Seems like it was coded dependently on o3. Here are some performance results for gem5.fast on GCC 4.9 and CLANG on twolf from spec2k. *Binary Size* CLANG: 1.1% smaller without templating GCC: Difference is negligible <0.0001% *CLANG Compile Time (single threaded, no turboboost, two runs)* *Templated* real 21m32.240s user 20m20.019s sys 1m6.721s real 21m29.963s user 20m17.016s sys 1m7.108s *Untempated:* real 21m24.396s user 20m13.158s sys 1m5.798s real 21m23.177s user 20m11.911s sys 1m5.843s *GCC Compile Time (-j8, did not disable turboboost)* *Templated* real 11m35.848s user 67m20.828s sys 2m2.292s *Untemplated:* real 11m42.167s user 67m7.572s sys 2m2.056s *CLANG Run Time (Spec2k twolf)* *Templated* Run 1) 1187.63 Run 2) 1167.50 Run 3) 1172.06 *Untemplated* Run 1) 1142.29 Run 2) 1
154.49 Run 3) 1165.53 *GCC Run Time (Spec2k twolf, did not disable turboboost)* *Templated* Run 1) 12m20.528s *Untemplated* Run 1) 12m19.700s Any thoughts on eventually merging this?
_______________________________________________ gem5-dev mailing list gem5-dev-1Gs4CP2/***@public.gmane.org http://m5sim.org/mailman/listinfo/gem5-dev [1] _______________________________________________ gem5-dev mailing list gem5-dev-1Gs4CP2/***@public.gmane.org http://m5sim.org/mailman/listinfo/gem5-dev [1]



Links:
------
[1] http://m5sim.org/mailman/listinfo/gem5-dev
Anthony Gutierrez via gem5-dev
2014-05-15 15:39:44 UTC
Permalink
I like the idea of this patch as well. In fact, the templating doesn't
really help with extending the CPU model in my experience.

As far as splitting it up into multiple smaller patches, I don't think that
is necessary or really a good idea unless the changes are truly
independent. Instead of having a single 35k line patch, we'll have many
(tens or hundreds?) of patches that add up to 35k lines.


Anthony Gutierrez
http://web.eecs.umich.edu/~atgutier


On Thu, May 15, 2014 at 11:29 AM, Korey Sewell via gem5-dev <
Post by Korey Sewell via gem5-dev
Hi Mitch/gem5-ers,
I think I would support the "untemplating" movement as well, so you have my
approval there too :)
With regards to implementation, I agree that splitting up the patches
makes for a better review although it will take a small amount of work to
get a reasonable splitting granularity.
Have you been able to test that un-templated o3 passes all the regressions
as well?
Lastly, once this is reviewed you'll need to coordinate with people who
have outstanding o3 patches as it could be a real pain for them to pull in
a patch that all of a sudden untemplates the o3 code.
-Korey
On Thu, May 15, 2014 at 6:27 AM, Andreas Sandberg via gem5-dev <
Post by Andreas Sandberg via gem5-dev
Hi Mitch,
In general, I like the idea of removing some of the pointless/awkward
templates we have in gem5. I would definitely support moving in this
direction. However, I really dislike the idea of reviewing a 32k line
patch. Reviewing such a patch would be a headache and I suspect RB would
just fall over. Would it be possible to split this change into a series
of
Post by Andreas Sandberg via gem5-dev
smaller patches?
For example, you could split it into one patch per functional unit and a
final patch that does some cleaning up. You could probably just 'fake'
new
Post by Andreas Sandberg via gem5-dev
un-templated class names as typedefs in the relevant header files.
//Andreas
Post by Mitch Hayenga via gem5-dev
Hi All,
Recently I have written a patch that removes templating from the o3 cpu.
In general templating in o3 makes the code significantly more verbose,
adds compile time overheads, and doesn't actually benefit performance.
The
templating is largely pointless as 1) there aren't multiple versions of
fetch, rename, etc to make the compile time Impl pattern worth doing 2)
Modern CPUs have indirect branch predictors that hide the penalties that
the templating was trying to mask.
*I was wondering what peoples feelings were on a patch of this sort? *
It
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
is a quite large modification (~35k line patch file, changes almost all
localized to the o3 directory). Many of the lines are simply because
the
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
"impl" header files were changed to source files.
Here are a few benefits of the patch
- Cleaner, less verbose code.
- Due to the current templating/DynInst interaction, gem5 often requires
rebuilding the function execution signatures (o3_cpu_exec.o) when a
modification is made to the o3 cpu. This patch eliminates having to
rebuild the execution signatures on o3 changes.
- Marginally better compile/run times.
- Moved "base_dyn_inst_impl.hh" into o3, it's too dependent on o3 as is.
No other cpu does/should inherit from it anyway.
- Made the checker directly templated on the execution context (DynInst)
instead of an "Impl" like o3. Seems like it was coded dependently
on
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
o3.
Here are some performance results for gem5.fast on GCC 4.9 and CLANG on
twolf from spec2k.
*Binary Size*
CLANG: 1.1% smaller without templating
GCC: Difference is negligible <0.0001%
*CLANG Compile Time (single threaded, no turboboost, two runs)*
*Templated*
real 21m32.240s
user 20m20.019s
sys 1m6.721s
real 21m29.963s
user 20m17.016s
sys 1m7.108s
*Untempated:*
real 21m24.396s
user 20m13.158s
sys 1m5.798s
real 21m23.177s
user 20m11.911s
sys 1m5.843s
*GCC Compile Time (-j8, did not disable turboboost)*
*Templated*
real 11m35.848s
user 67m20.828s
sys 2m2.292s
*Untemplated:*
real 11m42.167s
user 67m7.572s
sys 2m2.056s
*CLANG Run Time (Spec2k twolf)*
*Templated*
Run 1) 1187.63
Run 2) 1167.50
Run 3) 1172.06
*Untemplated*
Run 1) 1142.29
Run 2) 1154.49
Run 3) 1165.53
*GCC Run Time (Spec2k twolf, did not disable turboboost)*
*Templated*
Run 1) 12m20.528s
*Untemplated*
Run 1) 12m19.700s
Any thoughts on eventually merging this?
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Korey Sewell via gem5-dev
2014-05-15 15:56:46 UTC
Permalink
Fair points Ali and Tony.

I think at the end of the day a 35k line patch (if it is that) will be a
pain to review no matter how you slice/dice it although I'd still maintain
at least "dicing" it into pieces would allow the reviewers to handle it
better.

If people all agree that this is the way to go, then maybe Mitch should
just go ahead and provide the full patch to the RB (no matter how
gi-normous!). This at least keeps us in the process of
local_patch->RB->commit. I'm doubtful that any one person is going to get
to 35k lines whether it is one patch or multiple patches anyway.


-Korey



On Thu, May 15, 2014 at 8:39 AM, Anthony Gutierrez via gem5-dev <
Post by Anthony Gutierrez via gem5-dev
I like the idea of this patch as well. In fact, the templating doesn't
really help with extending the CPU model in my experience.
As far as splitting it up into multiple smaller patches, I don't think that
is necessary or really a good idea unless the changes are truly
independent. Instead of having a single 35k line patch, we'll have many
(tens or hundreds?) of patches that add up to 35k lines.
Anthony Gutierrez
http://web.eecs.umich.edu/~atgutier
On Thu, May 15, 2014 at 11:29 AM, Korey Sewell via gem5-dev <
Post by Korey Sewell via gem5-dev
Hi Mitch/gem5-ers,
I think I would support the "untemplating" movement as well, so you have
my
Post by Korey Sewell via gem5-dev
approval there too :)
With regards to implementation, I agree that splitting up the patches
makes for a better review although it will take a small amount of work to
get a reasonable splitting granularity.
Have you been able to test that un-templated o3 passes all the
regressions
Post by Korey Sewell via gem5-dev
as well?
Lastly, once this is reviewed you'll need to coordinate with people who
have outstanding o3 patches as it could be a real pain for them to pull
in
Post by Korey Sewell via gem5-dev
a patch that all of a sudden untemplates the o3 code.
-Korey
On Thu, May 15, 2014 at 6:27 AM, Andreas Sandberg via gem5-dev <
Post by Andreas Sandberg via gem5-dev
Hi Mitch,
In general, I like the idea of removing some of the pointless/awkward
templates we have in gem5. I would definitely support moving in this
direction. However, I really dislike the idea of reviewing a 32k line
patch. Reviewing such a patch would be a headache and I suspect RB
would
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
just fall over. Would it be possible to split this change into a series
of
Post by Andreas Sandberg via gem5-dev
smaller patches?
For example, you could split it into one patch per functional unit and
a
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
final patch that does some cleaning up. You could probably just 'fake'
new
Post by Andreas Sandberg via gem5-dev
un-templated class names as typedefs in the relevant header files.
//Andreas
Post by Mitch Hayenga via gem5-dev
Hi All,
Recently I have written a patch that removes templating from the o3
cpu.
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
In general templating in o3 makes the code significantly more
verbose,
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
adds compile time overheads, and doesn't actually benefit performance.
The
templating is largely pointless as 1) there aren't multiple versions
of
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
fetch, rename, etc to make the compile time Impl pattern worth doing
2)
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
Modern CPUs have indirect branch predictors that hide the penalties
that
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
the templating was trying to mask.
*I was wondering what peoples feelings were on a patch of this sort? *
It
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
is a quite large modification (~35k line patch file, changes almost
all
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
localized to the o3 directory). Many of the lines are simply because
the
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
"impl" header files were changed to source files.
Here are a few benefits of the patch
- Cleaner, less verbose code.
- Due to the current templating/DynInst interaction, gem5 often requires
rebuilding the function execution signatures (o3_cpu_exec.o) when
a
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
modification is made to the o3 cpu. This patch eliminates having
to
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
rebuild the execution signatures on o3 changes.
- Marginally better compile/run times.
- Moved "base_dyn_inst_impl.hh" into o3, it's too dependent on o3
as
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
is.
No other cpu does/should inherit from it anyway.
- Made the checker directly templated on the execution context (DynInst)
instead of an "Impl" like o3. Seems like it was coded dependently
on
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
o3.
Here are some performance results for gem5.fast on GCC 4.9 and CLANG
on
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
twolf from spec2k.
*Binary Size*
CLANG: 1.1% smaller without templating
GCC: Difference is negligible <0.0001%
*CLANG Compile Time (single threaded, no turboboost, two runs)*
*Templated*
real 21m32.240s
user 20m20.019s
sys 1m6.721s
real 21m29.963s
user 20m17.016s
sys 1m7.108s
*Untempated:*
real 21m24.396s
user 20m13.158s
sys 1m5.798s
real 21m23.177s
user 20m11.911s
sys 1m5.843s
*GCC Compile Time (-j8, did not disable turboboost)*
*Templated*
real 11m35.848s
user 67m20.828s
sys 2m2.292s
*Untemplated:*
real 11m42.167s
user 67m7.572s
sys 2m2.056s
*CLANG Run Time (Spec2k twolf)*
*Templated*
Run 1) 1187.63
Run 2) 1167.50
Run 3) 1172.06
*Untemplated*
Run 1) 1142.29
Run 2) 1154.49
Run 3) 1165.53
*GCC Run Time (Spec2k twolf, did not disable turboboost)*
*Templated*
Run 1) 12m20.528s
*Untemplated*
Run 1) 12m19.700s
Any thoughts on eventually merging this?
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey
Mitch Hayenga via gem5-dev
2014-05-15 20:13:16 UTC
Permalink
*Would it be possible to split this change into a series of smaller
patches?*

Thinking about what could be easily split off.

1) The moving of cpu/base_dyn_inst_impl.hh into o3's DynInst
2) The changing of the checker templating

Both of those could go in before the full untemplating patch. But I'd
guess those are only ~1k lines of the 35k. I'm unsure how easy the rest
would be to part out given how many cross dependencies exist. I'll try to
re-do it for a single stage, to see if it is possible, but expect to
discover pain.

This patch was originally written in late December, so some re-basing work
is needed to bring it up to date. Luckily, other than Tony's fetch patches
and the recent review requests by Steve, not many people have made
significant o3 changes. Any feedback from here can go into the rebasing
effort.



*and I suspect RB would just fall over.*

Luckily it doesn't. It ended up being split across 3 very large pages on
the internal ARM review board.



*Have you been able to test that un-templated o3 passes all the
regressions as well?*

Yes, the patch passed all of the regression tests. It makes no difference
in the stats.




On Thu, May 15, 2014 at 10:56 AM, Korey Sewell via gem5-dev <
Post by Korey Sewell via gem5-dev
Fair points Ali and Tony.
I think at the end of the day a 35k line patch (if it is that) will be a
pain to review no matter how you slice/dice it although I'd still maintain
at least "dicing" it into pieces would allow the reviewers to handle it
better.
If people all agree that this is the way to go, then maybe Mitch should
just go ahead and provide the full patch to the RB (no matter how
gi-normous!). This at least keeps us in the process of
local_patch->RB->commit. I'm doubtful that any one person is going to get
to 35k lines whether it is one patch or multiple patches anyway.
-Korey
On Thu, May 15, 2014 at 8:39 AM, Anthony Gutierrez via gem5-dev <
Post by Anthony Gutierrez via gem5-dev
I like the idea of this patch as well. In fact, the templating doesn't
really help with extending the CPU model in my experience.
As far as splitting it up into multiple smaller patches, I don't think
that
Post by Anthony Gutierrez via gem5-dev
is necessary or really a good idea unless the changes are truly
independent. Instead of having a single 35k line patch, we'll have many
(tens or hundreds?) of patches that add up to 35k lines.
Anthony Gutierrez
http://web.eecs.umich.edu/~atgutier
On Thu, May 15, 2014 at 11:29 AM, Korey Sewell via gem5-dev <
Post by Korey Sewell via gem5-dev
Hi Mitch/gem5-ers,
I think I would support the "untemplating" movement as well, so you
have
Post by Anthony Gutierrez via gem5-dev
my
Post by Korey Sewell via gem5-dev
approval there too :)
With regards to implementation, I agree that splitting up the patches
makes for a better review although it will take a small amount of work
to
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
get a reasonable splitting granularity.
Have you been able to test that un-templated o3 passes all the
regressions
Post by Korey Sewell via gem5-dev
as well?
Lastly, once this is reviewed you'll need to coordinate with people who
have outstanding o3 patches as it could be a real pain for them to pull
in
Post by Korey Sewell via gem5-dev
a patch that all of a sudden untemplates the o3 code.
-Korey
On Thu, May 15, 2014 at 6:27 AM, Andreas Sandberg via gem5-dev <
Post by Andreas Sandberg via gem5-dev
Hi Mitch,
In general, I like the idea of removing some of the pointless/awkward
templates we have in gem5. I would definitely support moving in this
direction. However, I really dislike the idea of reviewing a 32k line
patch. Reviewing such a patch would be a headache and I suspect RB
would
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
just fall over. Would it be possible to split this change into a
series
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
of
Post by Andreas Sandberg via gem5-dev
smaller patches?
For example, you could split it into one patch per functional unit
and
Post by Anthony Gutierrez via gem5-dev
a
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
final patch that does some cleaning up. You could probably just
'fake'
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
new
Post by Andreas Sandberg via gem5-dev
un-templated class names as typedefs in the relevant header files.
//Andreas
Post by Mitch Hayenga via gem5-dev
Hi All,
Recently I have written a patch that removes templating from the o3
cpu.
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
In general templating in o3 makes the code significantly more
verbose,
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
adds compile time overheads, and doesn't actually benefit
performance.
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
The
templating is largely pointless as 1) there aren't multiple versions
of
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
fetch, rename, etc to make the compile time Impl pattern worth
doing
Post by Anthony Gutierrez via gem5-dev
2)
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
Modern CPUs have indirect branch predictors that hide the penalties
that
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
the templating was trying to mask.
*I was wondering what peoples feelings were on a patch of this
sort? *
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
It
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
is a quite large modification (~35k line patch file, changes almost
all
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
localized to the o3 directory). Many of the lines are simply
because
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
the
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
"impl" header files were changed to source files.
Here are a few benefits of the patch
- Cleaner, less verbose code.
- Due to the current templating/DynInst interaction, gem5 often requires
rebuilding the function execution signatures (o3_cpu_exec.o)
when
Post by Anthony Gutierrez via gem5-dev
a
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
modification is made to the o3 cpu. This patch eliminates
having
Post by Anthony Gutierrez via gem5-dev
to
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
rebuild the execution signatures on o3 changes.
- Marginally better compile/run times.
- Moved "base_dyn_inst_impl.hh" into o3, it's too dependent on
o3
Post by Anthony Gutierrez via gem5-dev
as
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
is.
No other cpu does/should inherit from it anyway.
- Made the checker directly templated on the execution context (DynInst)
instead of an "Impl" like o3. Seems like it was coded
dependently
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
on
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
o3.
Here are some performance results for gem5.fast on GCC 4.9 and CLANG
on
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
twolf from spec2k.
*Binary Size*
CLANG: 1.1% smaller without templating
GCC: Difference is negligible <0.0001%
*CLANG Compile Time (single threaded, no turboboost, two runs)*
*Templated*
real 21m32.240s
user 20m20.019s
sys 1m6.721s
real 21m29.963s
user 20m17.016s
sys 1m7.108s
*Untempated:*
real 21m24.396s
user 20m13.158s
sys 1m5.798s
real 21m23.177s
user 20m11.911s
sys 1m5.843s
*GCC Compile Time (-j8, did not disable turboboost)*
*Templated*
real 11m35.848s
user 67m20.828s
sys 2m2.292s
*Untemplated:*
real 11m42.167s
user 67m7.572s
sys 2m2.056s
*CLANG Run Time (Spec2k twolf)*
*Templated*
Run 1) 1187.63
Run 2) 1167.50
Run 3) 1172.06
*Untemplated*
Run 1) 1142.29
Run 2) 1154.49
Run 3) 1165.53
*GCC Run Time (Spec2k twolf, did not disable turboboost)*
*Templated*
Run 1) 12m20.528s
*Untemplated*
Run 1) 12m19.700s
Any thoughts on eventually merging this?
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Steve Reinhardt via gem5-dev
2014-05-17 03:59:13 UTC
Permalink
I support getting rid of the templating in O3 for sure. Templates were new
and shiny when we started writing some parts of M5 and we got a little
carried away ;-).

As far as patch reviewing, I'm guessing many of those 35k lines of diffs
are mind-numbingly repetitive, right? If Mitch can split out the
"interesting" parts, or at least point to the parts that are worth
reviewing specifically, then having a giant patch that's mostly boilerplate
is not a problem for me.

Steve



On Thu, May 15, 2014 at 1:13 PM, Mitch Hayenga via gem5-dev <
Post by Mitch Hayenga via gem5-dev
*Would it be possible to split this change into a series of smaller
patches?*
Thinking about what could be easily split off.
1) The moving of cpu/base_dyn_inst_impl.hh into o3's DynInst
2) The changing of the checker templating
Both of those could go in before the full untemplating patch. But I'd
guess those are only ~1k lines of the 35k. I'm unsure how easy the rest
would be to part out given how many cross dependencies exist. I'll try to
re-do it for a single stage, to see if it is possible, but expect to
discover pain.
This patch was originally written in late December, so some re-basing work
is needed to bring it up to date. Luckily, other than Tony's fetch patches
and the recent review requests by Steve, not many people have made
significant o3 changes. Any feedback from here can go into the rebasing
effort.
*and I suspect RB would just fall over.*
Luckily it doesn't. It ended up being split across 3 very large pages on
the internal ARM review board.
*Have you been able to test that un-templated o3 passes all the
regressions as well?*
Yes, the patch passed all of the regression tests. It makes no difference
in the stats.
On Thu, May 15, 2014 at 10:56 AM, Korey Sewell via gem5-dev <
Post by Korey Sewell via gem5-dev
Fair points Ali and Tony.
I think at the end of the day a 35k line patch (if it is that) will be a
pain to review no matter how you slice/dice it although I'd still
maintain
Post by Korey Sewell via gem5-dev
at least "dicing" it into pieces would allow the reviewers to handle it
better.
If people all agree that this is the way to go, then maybe Mitch should
just go ahead and provide the full patch to the RB (no matter how
gi-normous!). This at least keeps us in the process of
local_patch->RB->commit. I'm doubtful that any one person is going to get
to 35k lines whether it is one patch or multiple patches anyway.
-Korey
On Thu, May 15, 2014 at 8:39 AM, Anthony Gutierrez via gem5-dev <
Post by Anthony Gutierrez via gem5-dev
I like the idea of this patch as well. In fact, the templating doesn't
really help with extending the CPU model in my experience.
As far as splitting it up into multiple smaller patches, I don't think
that
Post by Anthony Gutierrez via gem5-dev
is necessary or really a good idea unless the changes are truly
independent. Instead of having a single 35k line patch, we'll have many
(tens or hundreds?) of patches that add up to 35k lines.
Anthony Gutierrez
http://web.eecs.umich.edu/~atgutier
On Thu, May 15, 2014 at 11:29 AM, Korey Sewell via gem5-dev <
Post by Korey Sewell via gem5-dev
Hi Mitch/gem5-ers,
I think I would support the "untemplating" movement as well, so you
have
Post by Anthony Gutierrez via gem5-dev
my
Post by Korey Sewell via gem5-dev
approval there too :)
With regards to implementation, I agree that splitting up the
patches
Post by Korey Sewell via gem5-dev
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
makes for a better review although it will take a small amount of
work
Post by Korey Sewell via gem5-dev
to
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
get a reasonable splitting granularity.
Have you been able to test that un-templated o3 passes all the
regressions
Post by Korey Sewell via gem5-dev
as well?
Lastly, once this is reviewed you'll need to coordinate with people
who
Post by Korey Sewell via gem5-dev
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
have outstanding o3 patches as it could be a real pain for them to
pull
Post by Korey Sewell via gem5-dev
Post by Anthony Gutierrez via gem5-dev
in
Post by Korey Sewell via gem5-dev
a patch that all of a sudden untemplates the o3 code.
-Korey
On Thu, May 15, 2014 at 6:27 AM, Andreas Sandberg via gem5-dev <
Post by Andreas Sandberg via gem5-dev
Hi Mitch,
In general, I like the idea of removing some of the
pointless/awkward
Post by Korey Sewell via gem5-dev
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
templates we have in gem5. I would definitely support moving in
this
Post by Korey Sewell via gem5-dev
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
direction. However, I really dislike the idea of reviewing a 32k
line
Post by Korey Sewell via gem5-dev
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
patch. Reviewing such a patch would be a headache and I suspect RB
would
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
just fall over. Would it be possible to split this change into a
series
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
of
Post by Andreas Sandberg via gem5-dev
smaller patches?
For example, you could split it into one patch per functional unit
and
Post by Anthony Gutierrez via gem5-dev
a
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
final patch that does some cleaning up. You could probably just
'fake'
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
new
Post by Andreas Sandberg via gem5-dev
un-templated class names as typedefs in the relevant header files.
//Andreas
Post by Mitch Hayenga via gem5-dev
Hi All,
Recently I have written a patch that removes templating from the
o3
Post by Korey Sewell via gem5-dev
Post by Anthony Gutierrez via gem5-dev
cpu.
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
In general templating in o3 makes the code significantly more
verbose,
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
adds compile time overheads, and doesn't actually benefit
performance.
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
The
templating is largely pointless as 1) there aren't multiple
versions
Post by Korey Sewell via gem5-dev
Post by Anthony Gutierrez via gem5-dev
of
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
fetch, rename, etc to make the compile time Impl pattern worth
doing
Post by Anthony Gutierrez via gem5-dev
2)
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
Modern CPUs have indirect branch predictors that hide the
penalties
Post by Korey Sewell via gem5-dev
Post by Anthony Gutierrez via gem5-dev
that
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
the templating was trying to mask.
*I was wondering what peoples feelings were on a patch of this
sort? *
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
It
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
is a quite large modification (~35k line patch file, changes
almost
Post by Korey Sewell via gem5-dev
Post by Anthony Gutierrez via gem5-dev
all
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
localized to the o3 directory). Many of the lines are simply
because
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
the
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
"impl" header files were changed to source files.
Here are a few benefits of the patch
- Cleaner, less verbose code.
- Due to the current templating/DynInst interaction, gem5
often
Post by Korey Sewell via gem5-dev
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
requires
rebuilding the function execution signatures (o3_cpu_exec.o)
when
Post by Anthony Gutierrez via gem5-dev
a
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
modification is made to the o3 cpu. This patch eliminates
having
Post by Anthony Gutierrez via gem5-dev
to
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
rebuild the execution signatures on o3 changes.
- Marginally better compile/run times.
- Moved "base_dyn_inst_impl.hh" into o3, it's too dependent on
o3
Post by Anthony Gutierrez via gem5-dev
as
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
is.
No other cpu does/should inherit from it anyway.
- Made the checker directly templated on the execution context
(DynInst)
instead of an "Impl" like o3. Seems like it was coded
dependently
Post by Anthony Gutierrez via gem5-dev
Post by Korey Sewell via gem5-dev
on
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
o3.
Here are some performance results for gem5.fast on GCC 4.9 and
CLANG
Post by Korey Sewell via gem5-dev
Post by Anthony Gutierrez via gem5-dev
on
Post by Korey Sewell via gem5-dev
Post by Andreas Sandberg via gem5-dev
Post by Mitch Hayenga via gem5-dev
twolf from spec2k.
*Binary Size*
CLANG: 1.1% smaller without templating
GCC: Difference is negligible <0.0001%
*CLANG Compile Time (single threaded, no turboboost, two runs)*
*Templated*
real 21m32.240s
user 20m20.019s
sys 1m6.721s
real 21m29.963s
user 20m17.016s
sys 1m7.108s
*Untempated:*
real 21m24.396s
user 20m13.158s
sys 1m5.798s
real 21m23.177s
user 20m11.911s
sys 1m5.843s
*GCC Compile Time (-j8, did not disable turboboost)*
*Templated*
real 11m35.848s
user 67m20.828s
sys 2m2.292s
*Untemplated:*
real 11m42.167s
user 67m7.572s
sys 2m2.056s
*CLANG Run Time (Spec2k twolf)*
*Templated*
Run 1) 1187.63
Run 2) 1167.50
Run 3) 1172.06
*Untemplated*
Run 1) 1142.29
Run 2) 1154.49
Run 3) 1165.53
*GCC Run Time (Spec2k twolf, did not disable turboboost)*
*Templated*
Run 1) 12m20.528s
*Untemplated*
Run 1) 12m19.700s
Any thoughts on eventually merging this?
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
Loading...