Intermittent Shutdown hanging at step XPCOMShutdownThreads. Something is blocking the main-thread | mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&)
Categories
(Core :: mozglue, defect)
Tracking
()
People
(Reporter: whimboo, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure)
Crash Data
Various test jobs like this one from mozilla-release are failing due to a crash as triggered by a shutdown hang in:
mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&)
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - Linux Ubuntu 18.04 - bionic (Ubuntu 18.04.6 LTS)
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO -
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - Crash reason: SIGSEGV / SEGV_MAPERR
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - Crash address: 0x0000000000000000
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - Crashing instruction: `mov dword [0x0], 0xf4`
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - Memory accessed by instruction:
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - 0. Address: 0x0000000000000000
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - Size: 4
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - Process uptime: not available
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO -
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - Thread 25 Shutdow~minator (crashed)
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - 0 libxul.so!MOZ_Crash(char const*, int, char const*) [Assertions.h:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 317]
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - Found by: inlining
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - 1 libxul.so!mozilla::(anonymous namespace)::RunWatchdog(void*) [nsTerminator.cpp:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 244 + 0xa]
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - rax = 0x00007f180ab63220 rdx = 0x2e6461657268742d
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - rcx = 0x00005555d3467b90 rbx = 0x00007f18293fdd50
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - rsi = 0x00007f180ab42fd0 rdi = 0x00007f180ab63268
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - rbp = 0x0000000000000276 rsp = 0x00007f18293fdd50
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - r8 = 0x000000000000000d r9 = 0x0000000000002003
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - r10 = 0x00007f18460178f0 r11 = 0x00007f1846041720
[task 2024-11-07T19:28:21.700Z] 19:28:21 INFO - r12 = 0x0000000000000009 r13 = 0x0000000000000000
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - r14 = 0x00007f18338c7010 r15 = 0x0002000100000000
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - rip = 0x00007f183962fc25
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - Found by: given as instruction pointer in context
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - 2 libnspr4.so!_pt_root [ptthread.c:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 201 + 0x6]
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - rbx = 0x00007f180a17c940 rbp = 0x0000000000000007
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - rsp = 0x00007f18293fdda0 r12 = 0x00007f18471e6378
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - r13 = 0x0000000000000000 r14 = 0x00007f18293fe700
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - r15 = 0x00007f1845c33c40 rip = 0x00007f18471c3bd2
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - Found by: call frame info
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - 3 firefox-bin!set_alt_signal_stack_and_start(PthreadCreateParams*) [pthread_create_interposer.cpp:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 81 + 0x5]
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - rbx = 0x00007f180a17c940 rbp = 0x0000000000000000
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - rsp = 0x00007f18293fddf0 r12 = 0x00007f18471c3a40
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - r13 = 0x0000000000000000 r14 = 0x00007f1845e15000
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - r15 = 0x00007f18293fde20 rip = 0x00005555d3412620
[task 2024-11-07T19:28:21.701Z] 19:28:21 INFO - Found by: call frame info
Checking the BgThreadPool thread shows the following stack:
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - 0 libpthread.so.0 + 0xded9
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - rax = 0xfffffffffffffdfc rdx = 0x0000000000000000
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - rcx = 0x00007f1846290ed9 rbx = 0x00007f18160ea8e0
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - rsi = 0x0000000000000080 rdi = 0x00007f18160ea908
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - rbp = 0x0000000000000000 rsp = 0x00007f18160ea6c0
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - r8 = 0x0000000000000000 r9 = 0x0000000000000003
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - r10 = 0x00007f18160ea720 r11 = 0x0000000000000246
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - r12 = 0x00007f1831ccc6c0 r13 = 0x00007f18160ea908
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - r14 = 0x00007f18160ea7d0 r15 = 0x00007f18160ea904
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - rip = 0x00007f1846290ed9
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - Found by: given as instruction pointer in context
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - 1 libpthread.so.0 + 0xd68f
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - rsp = 0x00007f18160ea738 rip = 0x00007f1846290690
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - Found by: stack scanning
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - 2 firefox-bin!mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&) [ConditionVariable_posix.cpp:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 140 + 0xa]
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - rsp = 0x00007f18160ea7c0 rip = 0x00005555d3415c59
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - Found by: stack scanning
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - 3 libxul.so!mozilla::OffTheBooksCondVar::Wait(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>) [CondVar.h:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 65]
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - Found by: inlining
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - 4 libxul.so!nsThreadPool::Run() [nsThreadPool.cpp:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 424 + 0x64]
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - rbx = 0x00007f1831ccc6a0 rsp = 0x00007f18160ea820
[task 2024-11-07T19:28:21.747Z] 19:28:21 INFO - r12 = 0x00007f1831ccc6c0 r14 = 0x00007f1831c963c0
[task 2024-11-07T19:28:21.748Z] 19:28:21 INFO - r15 = 0x00000045d964b800 rip = 0x00007f1834f893db
[task 2024-11-07T19:28:21.748Z] 19:28:21 INFO - Found by: call frame info
[task 2024-11-07T19:28:21.748Z] 19:28:21 INFO - 5 libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 1149]
[task 2024-11-07T19:28:21.748Z] 19:28:21 INFO - Found by: inlining
[task 2024-11-07T19:28:21.748Z] 19:28:21 INFO - 6 libxul.so!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:936d749a83c1cee83d6e8e8bb26e9acbd5bd5cd6 : 480 + 0x474]
[task 2024-11-07T19:28:21.748Z] 19:28:21 INFO - rbx = 0x00007f180ad27580 rbp = 0x00007f1834f88900
[task 2024-11-07T19:28:21.748Z] 19:28:21 INFO - rsp = 0x00007f18160ea9b0 r12 = 0x00007f180ab3efc0
[task 2024-11-07T19:28:21.748Z] 19:28:21 INFO - r13 = 0x0000000000000000 r14 = 0x00000000ffffffff
[task 2024-11-07T19:28:21.748Z] 19:28:21 INFO - r15 = 0x00007f1831ccc6a0 rip = 0x00007f1834f8b9b0
[task 2024-11-07T19:28:21.748Z] 19:28:21 INFO - Found by: call frame info
Reporter | ||
Comment 1•6 days ago
|
||
It looks like that this situation happens quite a lot with a focus on release builds on mozilla-beta and mozilla-release. I've asked sheriffs to re-classify existing failures to see how wide-spread this issue is. So numbers should show up soon.
Mike do you have an idea what this might have caused it?
Comment 2•6 days ago
•
|
||
This signature is just saying that the BgThreadPool is idling and waiting for more events. And indeed the main thread SpinEventLoopUntil
waits for IPDL Background thread, which is also indicated by the nested event loop warning.
In general I do not think we gain much by trying to file new bugs with signatures, as we probably will end up finding the same instances as bug RunWatchdogShutdownhang. Which does not mean we should not file bugs at all, but we should refrain from adding (easily misleading) signatures. Once I spot a case, I usually file an investigation bug for that case. But first there should be a short explanation of findings as comment on the main bug.
Reporter | ||
Comment 3•5 days ago
|
||
Sorry, when I did this wrongly. I could certainly need more details in how to analyze those hangs. Do we have some documentation for that?
Am I getting it right that you are saying we should just duplicate this bug to bug 1358898 and classifying all the failures regarding those hangs against that meta bug?
Comment 4•5 days ago
|
||
This specific bug seems wrong, at least, as being in mozilla::detail::ConditionVariableImpl::wait_for
can happen on almost any thread (or at least on any thread pool thread), and the thread we are waiting for is a different one, that unfortunately seems to be idling, too. SO the log does not contain any direct hint you could filter on. My impression in general is that for this category of bugs it is hard to get the signatures right without having already understood the root cause.
What's more, they are rare in the wild these days, fortunately.
I believe in this case there is something ongoing with the session file late in shutdown that breaks a JS promise chain which might make something wait forever. Which is unfortunately not visible from the stack traces.
Reporter | ||
Comment 5•5 days ago
|
||
Thanks. Then lets dupe to bug 1358898.
Reporter | ||
Comment 6•5 days ago
|
||
Removing unwanted signature to stop the bot from updating bug 1358898.
Description
•