Mitch Hayenga via gem5-dev
2014-05-13 16:23:00 UTC
I recently wrote a patch that removes templating from the o3 CPU.
In general, templating in o3 makes the code significantly more verbose,
adds compile-time overhead, and doesn't actually benefit performance. The
templating is largely pointless because 1) there aren't multiple versions of
fetch, rename, etc. to make the compile-time Impl pattern worth doing, and 2)
modern CPUs have indirect branch predictors that hide the indirect-call
penalties that the templating was trying to avoid.
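To make the tradeoff concrete, here is a minimal sketch of the two styles. All names in it (DemoDynInst, DemoImpl, TemplatedRename, PlainRename) are hypothetical and much simpler than the real o3 classes; it only illustrates that when a single Impl exists, the templated stage and the plain class do the same work.

```cpp
// Hypothetical illustration only; this is NOT gem5's actual class layout.

// A stand-in for a dynamic instruction.
struct DemoDynInst {
    int seqNum = 0;
};

// Templated "Impl" style: a policy struct bundles the types a stage uses.
struct DemoImpl {
    using DynInst = DemoDynInst;
};

// Every stage is a template over Impl, so its member functions must live
// in headers (or "_impl.hh" files) and are instantiated by each includer.
template <class Impl>
class TemplatedRename {
  public:
    using DynInst = typename Impl::DynInst;

    // Assigns the next sequence number to the instruction.
    int rename(DynInst &inst) {
        inst.seqNum = ++count;
        return inst.seqNum;
    }

  private:
    int count = 0;
};

// Untemplated style: the same stage written against the concrete type.
// The definition below could live in a .cc file and be compiled once.
class PlainRename {
  public:
    int rename(DemoDynInst &inst);

  private:
    int count = 0;
};

int PlainRename::rename(DemoDynInst &inst) {
    inst.seqNum = ++count;
    return inst.seqNum;
}
```

With only one Impl ever instantiated, TemplatedRename<DemoImpl> and PlainRename behave identically; the template adds syntax and header dependencies without selecting between alternatives.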
*I was wondering what people's feelings were on a patch of this sort?* It
is quite a large modification (~35k-line patch file, with changes almost all
localized to the o3 directory). Many of the lines are simply because the
"impl" header files were changed to source files.
Here are a few benefits of the patch:
- Cleaner, less verbose code.
- Due to the current templating/DynInst interaction, gem5 often requires
rebuilding the function execution signatures (o3_cpu_exec.o) when a
modification is made to the o3 cpu. This patch eliminates having to
rebuild the execution signatures on o3 changes.
- Marginally better compile/run times.
- Moved "base_dyn_inst_impl.hh" into o3; it's too dependent on o3 as is.
No other CPU does/should inherit from it anyway.
- Made the checker directly templated on the execution context (DynInst)
instead of an "Impl" like o3. It seems to have been coded with a dependence on o3.
Here are some performance results for gem5.fast on GCC 4.9 and CLANG on
twolf from spec2k.
CLANG: 1.1% smaller without templating
GCC: Difference is negligible (<0.0001%)
*CLANG Compile Time (single threaded, no turboboost, two runs)*
*GCC Compile Time (-j8, did not disable turboboost)*
*CLANG Run Time (Spec2k twolf)*
Run 1) 1187.63
Run 2) 1167.50
Run 3) 1172.06
Run 1) 1142.29
Run 2) 1154.49
Run 3) 1165.53
*GCC Run Time (Spec2k twolf, did not disable turboboost)*
Run 1) 12m20.528s
Run 1) 12m19.700s
Any thoughts on eventually merging this?