Crash in [@ stackoverflow | mozilla::profiler::PlatformData::ProfiledThread] during BackgroundHangThread::Notify()
Categories
(Core :: Gecko Profiler, defect, P3)
Tracking
()
People
(Reporter: release-mgmt-account-bot, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: crash, Whiteboard: [fxp])
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/919a857f-5a5c-48ea-a4fa-106710231026
Reason: EXCEPTION_STACK_OVERFLOW
Top 10 frames of crashing thread:
0 xul.dll mozilla::profiler::PlatformData::ProfiledThread const tools/profiler/public/ProfilerThreadPlatformData.h:31
0 xul.dll DoMozStackWalkBacktrace tools/profiler/core/platform.cpp:2194
1 xul.dll profiler_suspend_and_sample_thread::<lambda_120>::operator() const tools/profiler/core/platform.cpp:7105
1 xul.dll Sampler::SuspendAndSampleAndResumeThread tools/profiler/core/platform-win32.cpp:286
1 xul.dll profiler_suspend_and_sample_thread tools/profiler/core/platform.cpp:7136
2 xul.dll profiler_suspend_and_sample_thread::<lambda_19>::operator() const tools/profiler/core/platform.cpp:7190
2 xul.dll mozilla::profiler::ThreadRegistry::OffThreadRef::WithLockedRWFromAnyThread tools/profiler/public/ProfilerThreadRegistry.h:188
2 xul.dll profiler_suspend_and_sample_thread::<lambda_19>::operator() const tools/profiler/core/platform.cpp:7186
2 xul.dll mozilla::profiler::ThreadRegistry::WithOffThreadRef tools/profiler/public/ProfilerThreadRegistry.h:259
2 xul.dll profiler_suspend_and_sample_thread tools/profiler/core/platform.cpp:7184
By querying Nightly crashes reported within the last 2 months, here are some insights about the signature:
- First crash report: 2023-08-22
- Process type: Multiple distinct types
- Is startup crash: No
- Has user comments: No
- Is null crash: No
Comment 2•7 months ago
I looked at about 10 crashes across the two signatures; they are all happening on the BHMgr Monitor thread, and there's always BackgroundHangThread::Notify() on the stack. It looks like the main thread has been hanging long enough that a stack is being collected. I think this also means it isn't people who have opted in to turning on the profiler, which would make this less of an issue.
Florian, do you know if there's anything odd about this thread, like a smaller stack space, that would cause these kinds of stack overflow when trying to report a hang from background hang monitoring? Thanks.
Reporter
Comment 3•7 months ago
Copying crash signatures from duplicate bugs.
Comment 4•7 months ago
Some of these crashes are OOMs: there isn't enough commit space to enlarge the stack. Others aren't, and it's unclear what might be causing them.
Comment 5•3 months ago
(In reply to Andrew McCreight [:mccr8] from comment #2)
> I think this also means it isn't people who have opted in to turning on the profiler, which would make this less of an issue.

It's indeed not people who have turned on the profiler, but BHR is only enabled on the Nightly channel anyway.

> Florian, do you know if there's anything odd about this thread, like a smaller stack space, that would cause these kinds of stack overflow when trying to report a hang from background hang monitoring?

I don't know. If these are OOM crashes, maybe this thread should reserve enough memory to capture a profiler stack when the thread is created.
Updated•3 months ago
Comment 6•2 months ago
There was a spike on Feb 27 (24 reports), but it looks like they are all from the same machine, as they're happening all within 7 seconds, with the same hardware.
These reports say that there are 13 GB of available physical memory, though. Gabriele, what makes you think this is really an OOM?
Comment 7•2 months ago
(In reply to Julien Wajsberg [:julienw] from comment #6)
> There was a spike on Feb 27 (24 reports), but it looks like they are all from the same machine, as they're happening all within 7 seconds, with the same hardware.
> These reports say that there are 13 GB of available physical memory, though. Gabriele, what makes you think this is really an OOM?

The ones from the 27th don't look like OOMs. To tell whether a crash is an OOM, check the available page file, because commit space is the hard limit Windows enforces on memory allocations (see this crash for example, which still has some available physical memory but no page file left; my old article has a more in-depth explanation of how commit space works on Windows, in case you're curious).
Something odd about the crashes from the 27th is that the available page file increases across successive crashes, as if the dying child processes were freeing memory. It could have been a user with a bazillion open tabs who turned on the profiler, causing a cascade of crashes due to the increased memory consumption, but that's just a theory.