1502383 - Long pauses due to profiler buffer reallocation

Reporter

Description

•

6 years ago

Growing the profiler buffer (which means copying it to a new allocation) can be expensive when recording a profile, and might lead to gaps in sampling. We start at 4K buffer entries, and grow by powers of 2. This can lead to large pauses as the sample grows, unless you leave the profiler running long enough to max out the buffer size, before starting the test you want to capture (and never stop it). This may require checking the configuerd sizes used for mobile captures.

Markus Stange [:mstange]

Comment 1

•

6 years ago

How big are the pauses and how do they happen? They should only happen at the very start, and I would have expected buffer sizes to be small enough to not take too long during allocation. The dynamic growth is a recent changed that we made in bug 1476757.

Markus Stange [:mstange]

Updated

•

6 years ago

Summary: Allocate profiler buffer at configured size, instead of starting at 4k → Long pauses due to profiler buffer reallocation

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 2

•

6 years ago

This shows the details of that apparent growth better: http://bit.ly/2OPfmtw

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 3

•

6 years ago

I have to say I agree with njn in bug 1476757 comment 8. If we want to use fixed-time buffers, there are other ways (and we can eat (some) chance of allocation/copy in that case (only). Sure, being able to effectively set an unlimited buffersize without allocating it would be useful -- but that's not the case I care the most about, and I'm not willing to sacrifice profiler accuracy/stability for it. If we want long-duration sampling, we want the equivalent to lttng-ust to send the profiling samples to -- something very-low-overhead, based on recycled shmem buffers, etc. (might not be bad idea in general; move every bit of overhead out of the realtime capture we can - dump samples down a shmem hole basically to be processed/stored elsewhere/later)

Markus Stange [:mstange]

Comment 4

•

6 years ago

(In reply to Markus Stange [:mstange] from comment #1) > How big are the pauses and how do they happen? Er, I meant, how *often* do they happen? (In reply to Randell Jesup [:jesup] from comment #3) > If we want to use > fixed-time buffers, there are other ways (and we can eat (some) chance of > allocation/copy in that case (only). Could you elaborate on that? I'd really like fixed-time on-at-all-times profiling to be the default workflow, so if you have suggestions to improve this workflow I'd love to hear them. > Sure, being able to effectively set an > unlimited buffersize without allocating it would be useful -- but that's not > the case I care the most about, and I'm not willing to sacrifice profiler > accuracy/stability for it. I agree, we shouldn't sacrifice profiler stability for this case.

Markus Stange [:mstange]

Comment 5

•

6 years ago

So yes, the approach chosen in bug 1476757 is based on the assumption that allocation would be fast enough to not interfere with profiling stability. But your profile shows a pause of 300ms! Do you know how big the buffer was at that point?

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 6

•

6 years ago

(In reply to Markus Stange [:mstange] from comment #5) > So yes, the approach chosen in bug 1476757 is based on the assumption that > allocation would be fast enough to not interfere with profiling stability. > But your profile shows a pause of 300ms! Do you know how big the buffer was > at that point? No. And I think that particular pause was closer to 210ms, but still way beyond acceptable. I think that profile is set to 180MB max, on a dual XEON

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 7

•

6 years ago

Allocations are (generally) fast, though not always (paging operation, etc). it's the (large) memset's/memcpy's that are hitting us... If we had a segmented bufferqueue or arena of buffers we recycle, insertion and recycling would be fast without allocation/free/copy overhead. This patch took a design optimized for ultra-low-overhead insertion (at cost of other things) and lost that aspect, while retaining all the restrictions that gave us (power-of-twoness, etc). If we want to do what the patch wanted to, it should use a queue of bufferblocks, and recycle blocks as needed if the current one fills up, then you can later layer time limits on top of that (but realize that until you're at steady-state, there still will be occasional allocations if you want a time-based limit. If you want to be really smart, with a segmented queue you can background the "more buffers" operation, so that the overhead for that at least doesn't pause sampling; it merely slightly delays the buffer expanding slightly (and adds some CPU/memory/allocator overhead in that short time)). In the case of a fixed-size buffer, I'd recommend allocating enough segments to cover than at start time, instead of dynamically growing the number of segments-- though it would be much less problematic than the current impl if it was grown

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Comment 8

•

6 years ago

I implemented a segmented queue with recycling in dom/media/doctor/MultiWriterQueue.h, so I'd be happy to try working on this bug here, which sounds like a similar thing if I correctly understand Randell's proposal. (Unless someone else can do it quicker/better, as I'm still learning about the profiler guts.)

Markus Stange [:mstange]

Comment 9

•

6 years ago

Let's do the simplest thing possible and back out bug 1476757, and then rebase bug 1476775 on top of the backout. That'll mean that in the fixed-time window profiling mode, we'll sometimes run out of buffer. But that is vastly preferable over frequent gaps in sampling. A segmented predicting allocator sounds nice, but saving memory during profiling isn't important enough to warrant much effort at this point, I'd say.

Markus Stange [:mstange]

Comment 10

•

6 years ago

The backout of bug 1476757 is currently on inbound.

Markus Stange [:mstange]

Comment 11

•

6 years ago

Backout has merged to central.

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox65: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla65

Bugzilla

Quick Search

Long pauses due to profiler buffer reallocation

Categories

(Core :: Gecko Profiler, enhancement)

Tracking

()

People

(Reporter: jesup, Unassigned)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Description

Comment 1

Updated

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11