Crash in [@ stackoverflow | profiler_suspend_and_sample_thread] in background hang monitor
Categories
(Core :: Gecko Profiler, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox-esr115 | --- | disabled |
firefox117 | --- | disabled |
firefox118 | --- | disabled |
firefox119 | --- | fixed |
People
(Reporter: mccr8, Assigned: yannis)
References
Details
(Keywords: crash)
Crash Data
Attachments
(1 file)
Crash report: https://crash-stats.mozilla.org/report/index/0cb8feec-53af-4565-bbf0-202e70230622
Reason: EXCEPTION_STACK_OVERFLOW
Top 10 frames of crashing thread:
0 xul.dll __chkstk /builds/worker/workspace/obj-build/toolkit/library/build/D:/a/_work/1/s/src/vctools/crt/vcstartup/src/misc/amd64/chkstk.asm:109
1 xul.dll profiler_suspend_and_sample_thread tools/profiler/core/platform.cpp:6936
2 xul.dll profiler_suspend_and_sample_thread::<lambda_19>::operator const tools/profiler/core/platform.cpp:7058
2 xul.dll mozilla::profiler::ThreadRegistry::OffThreadRef::WithLockedRWFromAnyThread tools/profiler/public/ProfilerThreadRegistry.h:188
2 xul.dll profiler_suspend_and_sample_thread::<lambda_19>::operator const tools/profiler/core/platform.cpp:7054
2 xul.dll mozilla::profiler::ThreadRegistry::WithOffThreadRef tools/profiler/public/ProfilerThreadRegistry.h:259
2 xul.dll profiler_suspend_and_sample_thread tools/profiler/core/platform.cpp:7052
3 xul.dll mozilla::ThreadStackHelper::GetStack toolkit/components/backgroundhangmonitor/ThreadStackHelper.cpp:146
3 xul.dll mozilla::BackgroundHangThread::Notify toolkit/components/backgroundhangmonitor/BackgroundHangMonitor.cpp:505
4 xul.dll nsTimerImpl::Fire::<lambda_12>::operator const xpcom/threads/nsTimerImpl.cpp:675
It looks like the BackgroundHang monitor has detected a hang, then it tries to get a stack and overflow the stack? Not sure what might be going wrong there. In some, but not all, of these crashes the main thread is in BackgroundHangThread::NotifyActivity().
Assignee | ||
Comment 1•2 years ago
•
|
||
To me these are crashes that could appear when a first background hang is detected but we are in a low memory condition.
The background hang monitor thread has 256KB of reserved stack space, but only a few are committed before the first hang is detected. Then, when a first hang is detected, __chkstk
tries to commit 18600 bytes of stack, which are needed mostly to store the 2 * 1024 pointers in the NativeStack nativeStack
stack variable.
These crashes may be favored by the weird trick (bug 1716727), which, by introducing artificial delays when we are in a low memory condition instead of crashes, could favor triggering background hang detection while we are in a low memory condition.
We could try to avoid these crashes by pre-committing the required 18600 bytes of stack when the thread is created, unless we have a better idea? (pre-allocating the nativeStack
variable in heap could also be a cleaner way to achieve the same thing)
Assignee | ||
Comment 2•2 years ago
|
||
When a first hang is detected, the BHMgr Monitor thread needs to commit
5 more pages of stack to run profiler_suspend_and_sample_thread, which
contains big stack variables. If that occurs while we are low on memory,
failure to commit stack pages can crash the process.
In bug 1716727, we have added delays on failed allocations to try
to avoid crashing the main process under low memory condition. These
delays could trigger the background hang monitor, which could in
turn crash the process, as they occur in a low memory condition where we
will likely fail to commit.
We can pre-commit the 5 pages of stack at thread initialization to
ensure that they will already be commited when we later need them. Or at
least, we can try to see if that works.
Updated•2 years ago
|
Comment 3•2 years ago
|
||
The severity field is not set for this bug.
:canova, could you have a look please?
For more information, please visit BugBot documentation.
Comment 4•2 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 desktop browser crashes on nightly
:yannis, could you consider increasing the severity of this top-crash bug?
For more information, please visit BugBot documentation.
Assignee | ||
Comment 5•2 years ago
|
||
Crashes are coming from the same install, no need to increase severity here. But good reminder as this got out of my mind.
Comment 6•2 years ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit BugBot documentation.
Comment 8•2 years ago
|
||
bugherder |
Updated•2 years ago
|
Description
•