Closed Bug 1678651 Opened 4 years ago Closed 4 years ago

Sampling suddenly stops on the content process

Categories

(Core :: Gecko Profiler, defect, P2)

Tracking


RESOLVED FIXED
86 Branch
Tracking Status
firefox86 --- fixed

People

(Reporter: denispal, Assigned: mozbugz)

References

(Blocks 1 open bug)

Details

Attachments

(9 files)


Here are a couple of profiles recently reported in bug 1664067. The sampling data on the content process seems to suddenly stop, but there is clearly activity going on, based on the screenshots and markers.

https://share.firefox.dev/3nghqaG
https://share.firefox.dev/32zSqTJ

Looking at the stack heights, it seems to have been broken after reaching a really deep stack containing 746 frames. (I got this number by going to the deepest node in the call tree, choosing "Copy stack" and pasting it into a text editor.)

I'm not sure what could break based on the stack depth, but maybe it's a hint.

I suspect something is going on with the sampling thread here, as the markers are still being captured. It may be that the existing hang issues from that bug are affecting profiling in some way.

ni: Gerald, to bring this to your attention.

Flags: needinfo?(gsquelart)

Looking at the first profile, there's a "DOM Worker" thread in the same process, which continues to be sampled. So the sampler loop still runs in the process.
Looking at the code, either the thread was removed from the list of "live" threads (very unlikely), or samples were in fact taken but somehow lost.
This last hypothesis, together with Markus' clue that stack sizes have been growing, would point at the code that handles too-big stack samples.

I managed to get too-big stacks on foxnews.com; they were larger than the current maximum of almost 68 KB.

While investigating, I saw that this code (and supporting code around it) is in fact incorrect!

  • It checks for "cleared" blocks, which won't happen because we don't actually destroy old blocks. So the first test never catches the too-big stack issue, and then an empty stack is silently recorded.
  • More worryingly, I suspect that the ProfileBufferChunkManagerSingle's single chunk is not actually re-used, which could prevent later samples from getting stored. (Though this is contradicted by the DOM Worker samples, I'd still like to review this code and make sure it does its job.)

As a quick solution here, we could bump up the buffer size during profiling. And I should at least make the size checks work.

A better solution would be to dynamically increase the size when needed. But this cannot be done while a thread is suspended, which makes things more complicated.
I'll do what I can here, and file follow-ups as necessary...

Assignee: nobody → gsquelart
Severity: -- → S3
Flags: needinfo?(gsquelart)
Priority: -- → P2

In DEBUG builds, instead of only testing for the expected chunk state(s), there are now separate assertions for each unexpected state, which makes it much easier to track the source of failures.

This bug has revealed some issues when the single chunk gets filled, and there are different paths depending on whether the chunk is filled right to the end, or past it.
Later patches will fix these issues and update these tests accordingly.

Depends on D99975

When a "Put" operation fails (most probably because no chunk was available to store the data), we remember the number of bytes that couldn't be stored.
This can be useful to give an indication of how much more memory would have been needed for successful puts.

Depends on D99976

The profiler sampler was using the "cleared block count" as indication that some data could not be stored.
But this was incorrect: cleared blocks only account for blocks in chunks that have been destroyed or recycled, so this check missed data that could not be stored due to lack of space.
In particular, ProfileBufferChunkManagerSingle only provides one chunk, so when it gets full, a new chunk cannot be allocated, and the data is lost; but the "cleared block count" is left unchanged (the one chunk is only released, but not destroyed nor recycled).

Now the sampler correctly looks at the "failed puts bytes", and can report precisely how many bytes could not be stored.

Depends on D99977

Instead of discarding released chunks, we keep them as "next" chunks, and we make sure there's a valid "current" chunk when possible.

Recycling released chunks means that when using the ProfileBufferChunkManagerSingle, the one chunk, in whichever state it may be, will be kept alive and reused.

Depends on D99978

Some sites do have stacks that require more than 64KB to store in the profiler buffer.
Note that this only affects one semi-permanent buffer per process during profiling, and short-lived buffers when capturing stacks in markers.

Depends on D99980

The "expected maximum stack size" (currently 64KiB) value was present in multiple places.
Now it's accessible from everywhere as ProfileBufferChunkManager::scExpectedMaximumStackSize, so it's easier to modify as needed.

Depends on D99980

Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e6890e92e588 Better debugging assertions in ProfileBufferChunk - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/f52a13d8eba8 Added ProfileChunkedBuffer state tests - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/b597a84c7777 Rewrote TestChunkedBufferSingle() to better test buffer-filling scenarios - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/9bf07ef7bfb5 ProfileChunkedBuffer now keeps track of failed puts - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/ba92771a968a Fixed sampler's detection of too-big stacks - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/07b13d6485bc Better ProfileChunkedBuffer::Clear(), recycles released chunks - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/9116bff9e29b ProfilerChunkedBuffer: Always mark current chunk full when entry doesn't fit - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/dce478a960db Centralize scExpectedMaximumStackSize in ProfileBufferChunkManager - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/ece3bd1f7094 Increase expected max stack record size to 128KB - r=canaltinova
