Closed Bug 1930437 Opened 6 days ago Closed 5 days ago

Intermittent Shutdown hanging at step XPCOMShutdownThreads. Something is blocking the main-thread | mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&)

Categories

(Core :: mozglue, defect)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1358898

People

(Reporter: whimboo, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure)

Crash Data

Various test jobs like this one from mozilla-release are failing due to a crash as triggered by a shutdown hang in:

mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&)

[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO - Linux Ubuntu 18.04 - bionic (Ubuntu 18.04.6 LTS)
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO - 
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO - Crash reason:  SIGSEGV / SEGV_MAPERR
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO - Crash address: 0x0000000000000000
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO - Crashing instruction: `mov dword [0x0], 0xf4`
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO - Memory accessed by instruction:
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO -   0. Address: 0x0000000000000000
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO -      Size: 4
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO - Process uptime: not available
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO - 
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO - Thread 25 Shutdow~minator (crashed)
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO -  0  libxul.so!MOZ_Crash(char const*, int, char const*) [Assertions.h:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 317]
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO -     Found by: inlining
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO -  1  libxul.so!mozilla::(anonymous namespace)::RunWatchdog(void*) [nsTerminator.cpp:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 244 + 0xa]
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO -      rax = 0x00007f180ab63220    rdx = 0x2e6461657268742d
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO -      rcx = 0x00005555d3467b90    rbx = 0x00007f18293fdd50
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO -      rsi = 0x00007f180ab42fd0    rdi = 0x00007f180ab63268
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO -      rbp = 0x0000000000000276    rsp = 0x00007f18293fdd50
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO -       r8 = 0x000000000000000d     r9 = 0x0000000000002003
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO -      r10 = 0x00007f18460178f0    r11 = 0x00007f1846041720
[task 2024-11-07T19:28:21.700Z] 19:28:21     INFO -      r12 = 0x0000000000000009    r13 = 0x0000000000000000
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -      r14 = 0x00007f18338c7010    r15 = 0x0002000100000000
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -      rip = 0x00007f183962fc25
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -     Found by: given as instruction pointer in context
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -  2  libnspr4.so!_pt_root [ptthread.c:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 201 + 0x6]
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -      rbx = 0x00007f180a17c940    rbp = 0x0000000000000007
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -      rsp = 0x00007f18293fdda0    r12 = 0x00007f18471e6378
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -      r13 = 0x0000000000000000    r14 = 0x00007f18293fe700
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -      r15 = 0x00007f1845c33c40    rip = 0x00007f18471c3bd2
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -     Found by: call frame info
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -  3  firefox-bin!set_alt_signal_stack_and_start(PthreadCreateParams*) [pthread_create_interposer.cpp:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 81 + 0x5]
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -      rbx = 0x00007f180a17c940    rbp = 0x0000000000000000
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -      rsp = 0x00007f18293fddf0    r12 = 0x00007f18471c3a40
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -      r13 = 0x0000000000000000    r14 = 0x00007f1845e15000
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -      r15 = 0x00007f18293fde20    rip = 0x00005555d3412620
[task 2024-11-07T19:28:21.701Z] 19:28:21     INFO -     Found by: call frame info

Checking the BgThreadPool thread shows the following stack:

[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -  0  libpthread.so.0 + 0xded9
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -      rax = 0xfffffffffffffdfc    rdx = 0x0000000000000000
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -      rcx = 0x00007f1846290ed9    rbx = 0x00007f18160ea8e0
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -      rsi = 0x0000000000000080    rdi = 0x00007f18160ea908
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -      rbp = 0x0000000000000000    rsp = 0x00007f18160ea6c0
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -       r8 = 0x0000000000000000     r9 = 0x0000000000000003
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -      r10 = 0x00007f18160ea720    r11 = 0x0000000000000246
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -      r12 = 0x00007f1831ccc6c0    r13 = 0x00007f18160ea908
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -      r14 = 0x00007f18160ea7d0    r15 = 0x00007f18160ea904
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -      rip = 0x00007f1846290ed9
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -     Found by: given as instruction pointer in context
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -  1  libpthread.so.0 + 0xd68f
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -      rsp = 0x00007f18160ea738    rip = 0x00007f1846290690
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -     Found by: stack scanning
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -  2  firefox-bin!mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&) [ConditionVariable_posix.cpp:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 140 + 0xa]
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -      rsp = 0x00007f18160ea7c0    rip = 0x00005555d3415c59
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -     Found by: stack scanning
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -  3  libxul.so!mozilla::OffTheBooksCondVar::Wait(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>) [CondVar.h:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 65]
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -     Found by: inlining
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -  4  libxul.so!nsThreadPool::Run() [nsThreadPool.cpp:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 424 + 0x64]
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -      rbx = 0x00007f1831ccc6a0    rsp = 0x00007f18160ea820
[task 2024-11-07T19:28:21.747Z] 19:28:21     INFO -      r12 = 0x00007f1831ccc6c0    r14 = 0x00007f1831c963c0
[task 2024-11-07T19:28:21.748Z] 19:28:21     INFO -      r15 = 0x00000045d964b800    rip = 0x00007f1834f893db
[task 2024-11-07T19:28:21.748Z] 19:28:21     INFO -     Found by: call frame info
[task 2024-11-07T19:28:21.748Z] 19:28:21     INFO -  5  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 1149]
[task 2024-11-07T19:28:21.748Z] 19:28:21     INFO -     Found by: inlining
[task 2024-11-07T19:28:21.748Z] 19:28:21     INFO -  6  libxul.so!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 480 + 0x474]
[task 2024-11-07T19:28:21.748Z] 19:28:21     INFO -      rbx = 0x00007f180ad27580    rbp = 0x00007f1834f88900
[task 2024-11-07T19:28:21.748Z] 19:28:21     INFO -      rsp = 0x00007f18160ea9b0    r12 = 0x00007f180ab3efc0
[task 2024-11-07T19:28:21.748Z] 19:28:21     INFO -      r13 = 0x0000000000000000    r14 = 0x00000000ffffffff
[task 2024-11-07T19:28:21.748Z] 19:28:21     INFO -      r15 = 0x00007f1831ccc6a0    rip = 0x00007f1834f8b9b0
[task 2024-11-07T19:28:21.748Z] 19:28:21     INFO -     Found by: call frame info

It looks like that this situation happens quite a lot with a focus on release builds on mozilla-beta and mozilla-release. I've asked sheriffs to re-classify existing failures to see how wide-spread this issue is. So numbers should show up soon.

Mike do you have an idea what this might have caused it?

Flags: needinfo?(mh+mozilla)

This signature is just saying that the BgThreadPool is idling and waiting for more events. And indeed the main thread SpinEventLoopUntil waits for IPDL Background thread, which is also indicated by the nested event loop warning.

In general I do not think we gain much by trying to file new bugs with signatures, as we probably will end up finding the same instances as bug RunWatchdogShutdownhang. Which does not mean we should not file bugs at all, but we should refrain from adding (easily misleading) signatures. Once I spot a case, I usually file an investigation bug for that case. But first there should be a short explanation of findings as comment on the main bug.

Sorry, when I did this wrongly. I could certainly need more details in how to analyze those hangs. Do we have some documentation for that?

Am I getting it right that you are saying we should just duplicate this bug to bug 1358898 and classifying all the failures regarding those hangs against that meta bug?

Flags: needinfo?(mh+mozilla) → needinfo?(jstutte)

This specific bug seems wrong, at least, as being in mozilla::detail::ConditionVariableImpl::wait_for can happen on almost any thread (or at least on any thread pool thread), and the thread we are waiting for is a different one, that unfortunately seems to be idling, too. SO the log does not contain any direct hint you could filter on. My impression in general is that for this category of bugs it is hard to get the signatures right without having already understood the root cause.

What's more, they are rare in the wild these days, fortunately.

I believe in this case there is something ongoing with the session file late in shutdown that breaks a JS promise chain which might make something wait forever. Which is unfortunately not visible from the stack traces.

Flags: needinfo?(jstutte)

Thanks. Then lets dupe to bug 1358898.

Status: NEW → RESOLVED
Closed: 5 days ago
Duplicate of bug: RunWatchdogShutdownhang
Resolution: --- → DUPLICATE

Removing unwanted signature to stop the bot from updating bug 1358898.

Crash Signature: [@ mozilla::(anonymous namespace)::RunWatchdog] [@ mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&)] → [@ mozilla::(anonymous namespace)::RunWatchdog]
You need to log in before you can comment on or make changes to this bug.