Sampling suddenly stops on the content process
Categories
(Core :: Gecko Profiler, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox86 | --- | fixed |
People
(Reporter: denispal, Assigned: mozbugz)
References
(Blocks 1 open bug)
Details
Attachments
(9 files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
Here are a couple profiles recently reported in bug 1664067. The sampling data on the content process seems to suddenly stop but there is clearly activity going on based on the screenshots and markers.
https://share.firefox.dev/3nghqaG
https://share.firefox.dev/32zSqTJ
Comment 1•4 years ago
|
||
Looking at the stack heights, it seems to have been broken after reaching a really deep stack containing 746 frames. (I got this number by going to the deepest node in the call tree, choosing "Copy stack" and pasting it into a text editor.)
I'm not sure what could break based on the stack depth, but maybe it's a hint.
Comment 2•4 years ago
|
||
I suspect something is going on with the sampling thread here as the markers are still being captured. This may be that the existing hangs issue from that bug are affecting profiling in some way.
ni: Gerald, to bring this to your attention.
Assignee | ||
Comment 3•4 years ago
|
||
Looking at the first profile, there's a "DOM Worker" threads in the same process, which continues to be sampled. So the sampler loop still runs in the process.
Looking at the code, either the thread was removed from the list of "live" threads (very unlikely), or samples were in fact taken but somehow lost.
This last hypothesis, with Markus' clue that stack size have been growing, would point at the too-big stack sample code.
I managed to get too-big stacks on foxnews.com, they were more than the current maximum of almost 68 KB.
While investigating, I saw that this code (and supporting code around it) is in fact incorrect!
- It checks for "cleared" blocks, which won't happen because we don't actually destroy old blocks. So the first test never catches the too-big stack issue, and then an empty stack is silently recorded.
- More worryingly I suspect that the
ProfileBufferChunkManagerSingle
's single chunk is not actually re-used, which could prevent later samples from getting stored. (Though this is contradicted by the DOM Worker samples -- but I'd still like to review this code and make sure it does its job.)
As a quick solution here, we could bump up the buffer size during profiling. And I should at least make the size checks work.
A better solution would be to dynamically increase the size when needed. But this cannot be done while a thread is suspended, which makes things more complicated.
I'll do what I can here, and file follow-ups as necessary...
Assignee | ||
Comment 4•4 years ago
|
||
In DEBUG builds, instead of only testing for the expected chunk state(s), there are now separate assertions for each unexpected state, which makes it much easier to track the source of failures.
Assignee | ||
Comment 5•4 years ago
|
||
Depends on D99974
Assignee | ||
Comment 6•4 years ago
|
||
This bug has revealed some issues when the single chunk gets filled, and there are different paths depending on whether the chunk is filled right to the end, or past it.
Later patches will fix these issues and update these tests accordingly.
Depends on D99975
Assignee | ||
Comment 7•4 years ago
|
||
When a "Put" operation fails (most probably because no chunk was available to store the data), we remember the number of bytes that couldn't be stored.
This can be useful to give an indication of how much more memory would have been needed for successful puts.
Depends on D99976
Assignee | ||
Comment 8•4 years ago
|
||
The profiler sampler was using the "cleared block count" as indication that some data could not be stored.
But this was incorrect, because cleared blocks only account for blocks in chunks that have been destroyed or recycled, but this missed data that could not be stored due to lack of space.
In particular, ProfileBufferChunkManagerSingle
only provides one chunk, so when it gets full, a new chunk cannot be allocated, and the data is lost; but the "cleared block count" is left unchanged (the one chunk is only released, but not destroyed nor recycled).
Now the sampler correctly looks at the "failed puts bytes", and can report precisely how many bytes could not be stored.
Depends on D99977
Assignee | ||
Comment 9•4 years ago
|
||
Instead of discarding released chunks, we keep them as "next" chunks, and we make sure there's a valid "current" chunk when possible.
Recycling released chunks means that when using the ProfileBufferChunkManagerSingle
, the one chunk, in whichever state it may be, will be kept alive and reused.
Depends on D99978
Assignee | ||
Comment 10•4 years ago
|
||
Depends on D99979
Assignee | ||
Comment 11•4 years ago
|
||
Some sites do have stacks that require more than 64KB to store in the profiler buffer.
Note that this only affects one semi-permanent buffer per process during profiling, and short-lived buffers when capturing stacks in markers.
Depends on D99980
Assignee | ||
Comment 12•4 years ago
|
||
The "expected maximum stack size" (currently 64KiB) value was present in multiple places.
Now it's accessible from everywhere as ProfileBufferChunkManager::scExpectedMaximumStackSize, so it's easier to modify as needed.
Depends on D99980
Comment 13•4 years ago
|
||
Comment 14•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/e6890e92e588
https://hg.mozilla.org/mozilla-central/rev/f52a13d8eba8
https://hg.mozilla.org/mozilla-central/rev/b597a84c7777
https://hg.mozilla.org/mozilla-central/rev/9bf07ef7bfb5
https://hg.mozilla.org/mozilla-central/rev/ba92771a968a
https://hg.mozilla.org/mozilla-central/rev/07b13d6485bc
https://hg.mozilla.org/mozilla-central/rev/9116bff9e29b
https://hg.mozilla.org/mozilla-central/rev/dce478a960db
https://hg.mozilla.org/mozilla-central/rev/ece3bd1f7094
Description
•