[gem5-dev] custom allocator for Request/Packet objects

Discussion:

Gabe Black

2018-11-29 03:21:56 UTC

I was just wondering whether it would make sense to have custom allocators
defined for Request and Packet types which would keep around a pool of them
rather than defaulting to the normal allocator. I suspect since both types
of objects are allocated very frequently this could save a lot of heap
overhead.

I think there are two complications to doing this. First, we'd want to make
sure Request and Packet objects are truly those objects and are of the
appropriate size. Subclasses could add new members and make the types
bigger. I think to ensure that, we'd need to add the "final" keyword (added
in C++11) to ensure those types can't be subclassed and have unpredictable
sizes.

Second, we use make_shared a lot to build Requests, and that actually
allocates a larger amount of memory to hold the object and reference count
information. It allocates that larger object with new, and it looks like
you're supposed to use allocate_shared if you want to use a custom
allocator. Writing a custom allocator looks relatively complicated, but it
might make a big difference if it works correctly.

Also we'd probably want to put the Request allocator incantation in a small
function rather than calling make_shared or allocate_shared directly to
avoid a lot of boilerplate and churn if things need to change.

Thoughts? Like I said I suspect this may make a significant difference, but
it might not be easy to implement and may not actually make much of a
difference at all.

Gabe

Jason Lowe-Power

2018-11-29 16:39:44 UTC

Permalink

Hey Gabe,

Are you thinking that a custom allocator would make a difference in terms
of memory footprint or in terms of performance (or both)?

A couple of thoughts:
- I'm hesitant to put the final keyword on Packet. I think we could see
some code cleanup by making Packet a polymorphic object (e.g., Daniel has
been talking about implementing packets with compressed data). Obviously,
using polymorphism isn't required, and I don't have any plans to make this
change in the near future. But it's something that I've been thinking about.
- I like the idea of requiring the use of a small function to allocate
requests and packets. In many cases, for Packets, we use
Packet::createRead()/createWrite(), which is a step in that direction. I
think this would be a relatively non-controversial change moving in the
direction of custom allocators. It may also allow us to do some testing on
this more easily.

Another argument against custom allocators is that we'll be adding a
significant amount of error-prone code. I like the idea of using standard
libraries as much as possible to aid in understanding the code and reducing
the bug surface.

That said, if we get a 20% speedup or a reduction in memory footprint then
this change would likely be worth it!

Cheers,
Jason

Post by Gabe Black
I was just wondering whether it would make sense to have custom allocators
defined for Request and Packet types which would keep around a pool of them
rather than defaulting to the normal allocator. I suspect since both types
of objects are allocated very frequently this could save a lot of heap
overhead.
I think there are two complications to doing this. First, we'd want to make
sure Request and Packet objects are truly those objects and are of the
appropriate size. Subclasses could add new members and make the types
bigger. I think to ensure that, we'd need to add the "final" keyword (added
in C++11) to ensure those types can't be subclassed and have unpredictable
sizes.
Second, we use make_shared a lot to build Requests, and that actually
allocates a larger amount of memory to hold the object and reference count
information. It allocates that larger object with new, and it looks like
you're supposed to use allocate_shared if you want to use a custom
allocator. Writing a custom allocator looks relatively complicated, but it
might make a big difference if it works correctly.
Also we'd probably want to put the Request allocator incantation in a small
function rather than calling make_shared or allocate_shared directly to
avoid a lot of boilerplate and churn if things need to change.
Thoughts? Like I said I suspect this may make a significant difference, but
it might not be easy to implement and may not actually make much of a
difference at all.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev

Nikos Nikoleris

2018-11-29 17:21:13 UTC

Permalink

This post might be inappropriate. Click to display it.

Daniel Carvalho

2018-11-29 17:43:19 UTC

Permalink

This post might be inappropriate. Click to display it.

Gabe Black

2018-11-29 23:07:45 UTC

Permalink

I looked at the code as is, and Packets themselves aren't ever inherited by
anything, and they maintain a stack (I think) of sender state objects which
doesn't affect the size of the packet. The sender state is inherited in a
bunch of places since that is often customized. As far as compressed
packets, what would the benefit be? It seems like there would be a lot of
overhead to compress and decompress them, and I don't think they ever get
all that big. I'm also hesitant to add "final" in there since I think it
would be potentially limiting, but sometimes sensible limits are useful :-).

The reason exactly how big Request/Packets are is a concern is that I think
you'd get a meaningful performance benefit if you could use extra knowledge
to do things faster than just using new. For instance, if you knew that
what you were allocating was always X bytes, then you could just have a
bunch of blocks of X bytes laying around in a list and return one when
asked. You wouldn't have to try to keep things compacted like new would, or
keep track of (or even check) how big things were. That marginal bit of
savings could make a big difference if allocations happened a lot,
particularly if the type of allocations were ones new (and/or malloc if
it's built on it) was bad at. My brief Googling says malloc is bad at lots
of small allocations, for what that's worth.

Also, a custom allocator could make some types of debugging marginally
harder. For instance, if you used valgrind, it would replace malloc/new,
and that wouldn't be used in the typical way for those types. They would
likely be mostly batch allocated at the start of simulation, and then just
returned by the various mechanisms later. That would make information about
where a particular Request object came from less useful, although you would
still likely be able to see that it was accessed in an illegal way, and
where that access happened.

I agree that taking an incremental approach and doing some experiments are
worth while, and that wrapping request allocation in a small function would
be a good first step. I can't promise a timeline, but hopefully I'll have
something to share at some point in the future.

Gabe

Post by Nikos Nikoleris
Hey all,
The bigger opportunity is probably around packets as sometimes we are
allocating a number of them per request.
A custom allocator would be a great contribution if it would make gem5
substantially faster. But if it proved make things only marginally
better, I agree with Jason, we would need to factor in the fact that we
will be making the code more fragile. Would debugging/profiling be
harder if we used a custom deallocator?
If the size of the Packet is a concern, we will probably have to deal
with members that are not fixed in size such as the senderState. It
might also be worth looking into something similar for the pointers from
packets such as data.
Packet and Request memory management is something that we have been
concerned about as well. Giacomo, recently, changed all naked Request
pointers for shared_ptr and now we don't have to manually deallocate
them. But when we tried to do this for Packet pointers (figuring out
when to de-allocate a packet is not trivial) there was a substantial
overhead.
Nikos

Post by Jason Lowe-Power
Hey Gabe,
Are you thinking that a custom allocator would make a difference in terms
of memory footprint or in terms of performance (or both)?
- I'm hesitant to put the final keyword on Packet. I think we could see
some code cleanup by making Packet a polymorphic object (e.g., Daniel has
been talking about implementing packets with compressed data). Obviously,
using polymorphism isn't required, and I don't have any plans to make

this

Post by Jason Lowe-Power
change in the near future. But it's something that I've been thinking

about.

Post by Jason Lowe-Power
- I like the idea of requiring the use of a small function to allocate
requests and packets. In many cases, for Packets, we use
Packet::createRead()/createWrite(), which is a step in that direction. I
think this would be a relatively non-controversial change moving in the
direction of custom allocators. It may also allow us to do some testing

Post by Jason Lowe-Power
this more easily.
Another argument against custom allocators is that we'll be adding a
significant amount of error-prone code. I like the idea of using standard
libraries as much as possible to aid in understanding the code and

reducing

Post by Jason Lowe-Power
the bug surface.
That said, if we get a 20% speedup or a reduction in memory footprint

then

Post by Jason Lowe-Power
this change would likely be worth it!
Cheers,
Jason

Post by Gabe Black
I was just wondering whether it would make sense to have custom

allocators