Closed Bug 1840164 Opened 2 years ago Closed 2 years ago

Crash in [@ stackoverflow | profiler_suspend_and_sample_thread] in background hang monitor

Categories

(Core :: Gecko Profiler, defect, P1)

Unspecified
Windows 11
defect

Tracking

()

RESOLVED FIXED
119 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox-esr115 --- disabled
firefox117 --- disabled
firefox118 --- disabled
firefox119 --- fixed

People

(Reporter: mccr8, Assigned: yannis)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/0cb8feec-53af-4565-bbf0-202e70230622

Reason: EXCEPTION_STACK_OVERFLOW

Top 10 frames of crashing thread:

0  xul.dll  __chkstk  /builds/worker/workspace/obj-build/toolkit/library/build/D:/a/_work/1/s/src/vctools/crt/vcstartup/src/misc/amd64/chkstk.asm:109
1  xul.dll  profiler_suspend_and_sample_thread  tools/profiler/core/platform.cpp:6936
2  xul.dll  profiler_suspend_and_sample_thread::<lambda_19>::operator const  tools/profiler/core/platform.cpp:7058
2  xul.dll  mozilla::profiler::ThreadRegistry::OffThreadRef::WithLockedRWFromAnyThread  tools/profiler/public/ProfilerThreadRegistry.h:188
2  xul.dll  profiler_suspend_and_sample_thread::<lambda_19>::operator const  tools/profiler/core/platform.cpp:7054
2  xul.dll  mozilla::profiler::ThreadRegistry::WithOffThreadRef  tools/profiler/public/ProfilerThreadRegistry.h:259
2  xul.dll  profiler_suspend_and_sample_thread  tools/profiler/core/platform.cpp:7052
3  xul.dll  mozilla::ThreadStackHelper::GetStack  toolkit/components/backgroundhangmonitor/ThreadStackHelper.cpp:146
3  xul.dll  mozilla::BackgroundHangThread::Notify  toolkit/components/backgroundhangmonitor/BackgroundHangMonitor.cpp:505
4  xul.dll  nsTimerImpl::Fire::<lambda_12>::operator const  xpcom/threads/nsTimerImpl.cpp:675

It looks like the BackgroundHang monitor has detected a hang, then it tries to get a stack and overflow the stack? Not sure what might be going wrong there. In some, but not all, of these crashes the main thread is in BackgroundHangThread::NotifyActivity().

To me these are crashes that could appear when a first background hang is detected but we are in a low memory condition.

The background hang monitor thread has 256KB of reserved stack space, but only a few are committed before the first hang is detected. Then, when a first hang is detected, __chkstk tries to commit 18600 bytes of stack, which are needed mostly to store the 2 * 1024 pointers in the NativeStack nativeStack stack variable.

These crashes may be favored by the weird trick (bug 1716727), which, by introducing artificial delays when we are in a low memory condition instead of crashes, could favor triggering background hang detection while we are in a low memory condition.

We could try to avoid these crashes by pre-committing the required 18600 bytes of stack when the thread is created, unless we have a better idea? (pre-allocating the nativeStack variable in heap could also be a cleaner way to achieve the same thing)

When a first hang is detected, the BHMgr Monitor thread needs to commit
5 more pages of stack to run profiler_suspend_and_sample_thread, which
contains big stack variables. If that occurs while we are low on memory,
failure to commit stack pages can crash the process.

In bug 1716727, we have added delays on failed allocations to try
to avoid crashing the main process under low memory condition. These
delays could trigger the background hang monitor, which could in
turn crash the process, as they occur in a low memory condition where we
will likely fail to commit.

We can pre-commit the 5 pages of stack at thread initialization to
ensure that they will already be commited when we later need them. Or at
least, we can try to see if that works.

Assignee: nobody → yjuglaret
Status: NEW → ASSIGNED

The severity field is not set for this bug.
:canova, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(canaltinova)
Severity: -- → S3
Flags: needinfo?(canaltinova)
Priority: -- → P1

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 desktop browser crashes on nightly

:yannis, could you consider increasing the severity of this top-crash bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(yjuglaret)
Keywords: topcrash

Crashes are coming from the same install, no need to increase severity here. But good reminder as this got out of my mind.

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash
Pushed by yjuglaret@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a4df96b719fb Pre-commit stack pages on background hang monitor thread. r=mhowell
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 119 Branch
Flags: needinfo?(yjuglaret) → in-testsuite+
See Also: → 1958276
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: