Closed Bug 1502383 Opened 6 years ago Closed 6 years ago

Long pauses due to profiler buffer reallocation

Categories

(Core :: Gecko Profiler, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox65 --- fixed

People

(Reporter: jesup, Unassigned)

References

(Blocks 1 open bug)

Details

Growing the profiler buffer (which means copying it to a new allocation) can be expensive when recording a profile, and might lead to gaps in sampling. We start at 4K buffer entries, and grow by powers of 2. This can lead to large pauses as the sample grows, unless you leave the profiler running long enough to max out the buffer size, before starting the test you want to capture (and never stop it). This may require checking the configuerd sizes used for mobile captures.
How big are the pauses and how do they happen? They should only happen at the very start, and I would have expected buffer sizes to be small enough to not take too long during allocation. The dynamic growth is a recent changed that we made in bug 1476757.
Summary: Allocate profiler buffer at configured size, instead of starting at 4k → Long pauses due to profiler buffer reallocation
This shows the details of that apparent growth better: http://bit.ly/2OPfmtw
I have to say I agree with njn in bug 1476757 comment 8. If we want to use fixed-time buffers, there are other ways (and we can eat (some) chance of allocation/copy in that case (only). Sure, being able to effectively set an unlimited buffersize without allocating it would be useful -- but that's not the case I care the most about, and I'm not willing to sacrifice profiler accuracy/stability for it. If we want long-duration sampling, we want the equivalent to lttng-ust to send the profiling samples to -- something very-low-overhead, based on recycled shmem buffers, etc. (might not be bad idea in general; move every bit of overhead out of the realtime capture we can - dump samples down a shmem hole basically to be processed/stored elsewhere/later)
(In reply to Markus Stange [:mstange] from comment #1) > How big are the pauses and how do they happen? Er, I meant, how *often* do they happen? (In reply to Randell Jesup [:jesup] from comment #3) > If we want to use > fixed-time buffers, there are other ways (and we can eat (some) chance of > allocation/copy in that case (only). Could you elaborate on that? I'd really like fixed-time on-at-all-times profiling to be the default workflow, so if you have suggestions to improve this workflow I'd love to hear them. > Sure, being able to effectively set an > unlimited buffersize without allocating it would be useful -- but that's not > the case I care the most about, and I'm not willing to sacrifice profiler > accuracy/stability for it. I agree, we shouldn't sacrifice profiler stability for this case.
So yes, the approach chosen in bug 1476757 is based on the assumption that allocation would be fast enough to not interfere with profiling stability. But your profile shows a pause of 300ms! Do you know how big the buffer was at that point?
(In reply to Markus Stange [:mstange] from comment #5) > So yes, the approach chosen in bug 1476757 is based on the assumption that > allocation would be fast enough to not interfere with profiling stability. > But your profile shows a pause of 300ms! Do you know how big the buffer was > at that point? No. And I think that particular pause was closer to 210ms, but still way beyond acceptable. I think that profile is set to 180MB max, on a dual XEON
Allocations are (generally) fast, though not always (paging operation, etc). it's the (large) memset's/memcpy's that are hitting us... If we had a segmented bufferqueue or arena of buffers we recycle, insertion and recycling would be fast without allocation/free/copy overhead. This patch took a design optimized for ultra-low-overhead insertion (at cost of other things) and lost that aspect, while retaining all the restrictions that gave us (power-of-twoness, etc). If we want to do what the patch wanted to, it should use a queue of bufferblocks, and recycle blocks as needed if the current one fills up, then you can later layer time limits on top of that (but realize that until you're at steady-state, there still will be occasional allocations if you want a time-based limit. If you want to be really smart, with a segmented queue you can background the "more buffers" operation, so that the overhead for that at least doesn't pause sampling; it merely slightly delays the buffer expanding slightly (and adds some CPU/memory/allocator overhead in that short time)). In the case of a fixed-size buffer, I'd recommend allocating enough segments to cover than at start time, instead of dynamically growing the number of segments-- though it would be much less problematic than the current impl if it was grown
I implemented a segmented queue with recycling in dom/media/doctor/MultiWriterQueue.h, so I'd be happy to try working on this bug here, which sounds like a similar thing if I correctly understand Randell's proposal. (Unless someone else can do it quicker/better, as I'm still learning about the profiler guts.)
Let's do the simplest thing possible and back out bug 1476757, and then rebase bug 1476775 on top of the backout. That'll mean that in the fixed-time window profiling mode, we'll sometimes run out of buffer. But that is vastly preferable over frequent gaps in sampling. A segmented predicting allocator sounds nice, but saving memory during profiling isn't important enough to warrant much effort at this point, I'd say.
The backout of bug 1476757 is currently on inbound.
Backout has merged to central.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
You need to log in before you can comment on or make changes to this bug.