I looked at the code as is, and Packets themselves aren't ever inherited by
anything, and they maintain a stack (I think) of sender state objects which
doesn't affect the size of the packet. The sender state is inherited in a
bunch of places since that is often customized. As far as compressed
packets, what would the benefit be? It seems like there would be a lot of
overhead to compress and decompress them, and I don't think they ever get
all that big. I'm also hesitant to add "final" in there since I think it
would be potentially limiting, but sometimes sensible limits are useful :-).
The reason exactly how big Request/Packets are is a concern is that I think
you'd get a meaningful performance benefit if you could use extra knowledge
to do things faster than just using new. For instance, if you knew that
what you were allocating was always X bytes, then you could just have a
bunch of blocks of X bytes laying around in a list and return one when
asked. You wouldn't have to try to keep things compacted like new would, or
keep track of (or even check) how big things were. That marginal bit of
savings could make a big difference if allocations happened a lot,
particularly if the type of allocations were ones new (and/or malloc if
it's built on it) was bad at. My brief Googling says malloc is bad at lots
of small allocations, for what that's worth.
Also, a custom allocator could make some types of debugging marginally
harder. For instance, if you used valgrind, it would replace malloc/new,
and that wouldn't be used in the typical way for those types. They would
likely be mostly batch allocated at the start of simulation, and then just
returned by the various mechanisms later. That would make information about
where a particular Request object came from less useful, although you would
still likely be able to see that it was accessed in an illegal way, and
where that access happened.
I agree that taking an incremental approach and doing some experiments are
worth while, and that wrapping request allocation in a small function would
be a good first step. I can't promise a timeline, but hopefully I'll have
something to share at some point in the future.
Gabe
Post by Nikos NikolerisHey all,
The bigger opportunity is probably around packets as sometimes we are
allocating a number of them per request.
A custom allocator would be a great contribution if it would make gem5
substantially faster. But if it proved make things only marginally
better, I agree with Jason, we would need to factor in the fact that we
will be making the code more fragile. Would debugging/profiling be
harder if we used a custom deallocator?
If the size of the Packet is a concern, we will probably have to deal
with members that are not fixed in size such as the senderState. It
might also be worth looking into something similar for the pointers from
packets such as data.
Packet and Request memory management is something that we have been
concerned about as well. Giacomo, recently, changed all naked Request
pointers for shared_ptr and now we don't have to manually deallocate
them. But when we tried to do this for Packet pointers (figuring out
when to de-allocate a packet is not trivial) there was a substantial
overhead.
Nikos
Post by Jason Lowe-PowerHey Gabe,
Are you thinking that a custom allocator would make a difference in terms
of memory footprint or in terms of performance (or both)?
- I'm hesitant to put the final keyword on Packet. I think we could see
some code cleanup by making Packet a polymorphic object (e.g., Daniel has
been talking about implementing packets with compressed data). Obviously,
using polymorphism isn't required, and I don't have any plans to make
this
Post by Jason Lowe-Powerchange in the near future. But it's something that I've been thinking
about.
Post by Jason Lowe-Power- I like the idea of requiring the use of a small function to allocate
requests and packets. In many cases, for Packets, we use
Packet::createRead()/createWrite(), which is a step in that direction. I
think this would be a relatively non-controversial change moving in the
direction of custom allocators. It may also allow us to do some testing
on
Post by Jason Lowe-Powerthis more easily.
Another argument against custom allocators is that we'll be adding a
significant amount of error-prone code. I like the idea of using standard
libraries as much as possible to aid in understanding the code and
reducing
Post by Jason Lowe-Powerthe bug surface.
That said, if we get a 20% speedup or a reduction in memory footprint
then
Post by Jason Lowe-Powerthis change would likely be worth it!
Cheers,
Jason
Post by Gabe BlackI was just wondering whether it would make sense to have custom
allocators
Post by Jason Lowe-PowerPost by Gabe Blackdefined for Request and Packet types which would keep around a pool of
them
Post by Jason Lowe-PowerPost by Gabe Blackrather than defaulting to the normal allocator. I suspect since both
types
Post by Jason Lowe-PowerPost by Gabe Blackof objects are allocated very frequently this could save a lot of heap
overhead.
I think there are two complications to doing this. First, we'd want to
make
Post by Jason Lowe-PowerPost by Gabe Blacksure Request and Packet objects are truly those objects and are of the
appropriate size. Subclasses could add new members and make the types
bigger. I think to ensure that, we'd need to add the "final" keyword
(added
Post by Jason Lowe-PowerPost by Gabe Blackin C++11) to ensure those types can't be subclassed and have
unpredictable
Post by Jason Lowe-PowerPost by Gabe Blacksizes.
Second, we use make_shared a lot to build Requests, and that actually
allocates a larger amount of memory to hold the object and reference
count
Post by Jason Lowe-PowerPost by Gabe Blackinformation. It allocates that larger object with new, and it looks like
you're supposed to use allocate_shared if you want to use a custom
allocator. Writing a custom allocator looks relatively complicated, but
it
Post by Jason Lowe-PowerPost by Gabe Blackmight make a big difference if it works correctly.
Also we'd probably want to put the Request allocator incantation in a
small
Post by Jason Lowe-PowerPost by Gabe Blackfunction rather than calling make_shared or allocate_shared directly to
avoid a lot of boilerplate and churn if things need to change.
Thoughts? Like I said I suspect this may make a significant difference,
but
Post by Jason Lowe-PowerPost by Gabe Blackit might not be easy to implement and may not actually make much of a
difference at all.
Gabe
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended
recipient, please notify the sender immediately and do not disclose the
contents to any other person, use it for any purpose, or store or copy the
information in any medium. Thank you.
_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev