ASSERTION: Dispatch to non-main thread after xpcom-shutdown-threads: '!gXPCOMThreadsShutDownNotified || mIsMainThread || PR_GetCurrentThread() == mThread'
Categories
(GeckoView :: General, defect, P5)
Tracking
(Not tracked)
People
(Reporter: valentin, Assigned: m_kato)
References
Details
(Keywords: intermittent-failure)
Crash Data
Attachments
(1 obsolete file)
I've seen a couple of bugs that seem to assert on !gXPCOMThreadsShutDownNotified || mIsMainThread || PR_GetCurrentThread()
So far all of the intermittents I've seen only happen on Android. Could be a regression from bug 1797688, but possibly something else.
I'm filing this bug to track related issues.
Stack trace from bug 1811127.
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=402812108&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/PcUVyo-ARVStDEKDvtrx1w/runs/0/artifacts/public/logs/live_backing.log
[task 2023-01-18T22:32:04.640Z] 22:32:04 WARNING - PROCESS-CRASH | netwerk/test/unit_ipc/test_alt-data_closeWithStatus_wrap.js | application crashed [@ MOZ_Crash(char const*, int, char const*)]
[task 2023-01-18T22:32:04.640Z] 22:32:04 INFO - Mozilla crash reason: [Parent 15245, Unnamed thread 764d68023ef0] ###!!! ASSERTION: Dispatch to non-main thread after xpcom-shutdown-threads: '!gXPCOMThreadsShutDownNotified || mIsMainThread || PR_GetCurrentThread() == mThread', file /builds/worker/checkouts/gecko/xpcom/threads/ThreadEventTarget.cpp:77
[task 2023-01-18T22:32:04.640Z] 22:32:04 INFO - Crash dump filename: /tmp/tmpk47or_ql/280841ae-447d-48c4-52f5-c748e90c9a64.dmp
[task 2023-01-18T22:32:04.641Z] 22:32:04 INFO - Operating system: Android
[task 2023-01-18T22:32:04.641Z] 22:32:04 INFO - 0.0.0 Linux 3.10.0+ #260 SMP PREEMPT Fri May 19 12:48:14 PDT 2017 x86_64
[task 2023-01-18T22:32:04.641Z] 22:32:04 INFO - CPU: amd64
[task 2023-01-18T22:32:04.641Z] 22:32:04 INFO - family 6 model 6 stepping 3
[task 2023-01-18T22:32:04.641Z] 22:32:04 INFO - 4 CPUs
[task 2023-01-18T22:32:04.641Z] 22:32:04 INFO - Crash reason: SIGSEGV / SEGV_MAPERR
[task 2023-01-18T22:32:04.641Z] 22:32:04 INFO - Crash address: 0x0
[task 2023-01-18T22:32:04.641Z] 22:32:04 INFO - Crashing instruction: `mov dword [0x0], 0x201`
[task 2023-01-18T22:32:04.641Z] 22:32:04 INFO - Memory accessed by instruction:
[task 2023-01-18T22:32:04.641Z] 22:32:04 INFO - 0. Address: 0x0000000000000000
[task 2023-01-18T22:32:04.641Z] 22:32:04 INFO - Size: 4
[task 2023-01-18T22:32:04.641Z] 22:32:04 INFO - Process uptime: not available
[task 2023-01-18T22:32:04.642Z] 22:32:04 INFO - Thread 0 AndroidUI (crashed)
[task 2023-01-18T22:32:04.642Z] 22:32:04 INFO - 0 libxul.so!MOZ_Crash(char const*, int, char const*) [Assertions.h:8a0fa777a6a9525721bceed324e9e1ae332d2715 : 261]
[task 2023-01-18T22:32:04.642Z] 22:32:04 INFO - Found by: inlining
[task 2023-01-18T22:32:04.642Z] 22:32:04 INFO - 1 libxul.so!Abort(char const*) [nsDebugImpl.cpp:8a0fa777a6a9525721bceed324e9e1ae332d2715 : 513 + 0x17]
[task 2023-01-18T22:32:04.642Z] 22:32:04 INFO - rax = 0x0000764d7a5eed50 rdx = 0x0000000000000004
[task 2023-01-18T22:32:04.642Z] 22:32:04 INFO - rcx = 0x257b0bf81092e3e0 rbx = 0x00007fff30d1da10
[task 2023-01-18T22:32:04.642Z] 22:32:04 INFO - rsi = 0x00007fff30d1d250 rdi = 0x000000000000001b
[task 2023-01-18T22:32:04.642Z] 22:32:04 INFO - rbp = 0x00007fff30d1d910 rsp = 0x00007fff30d1d900
[task 2023-01-18T22:32:04.642Z] 22:32:04 INFO - r8 = 0x0000000000000000 r9 = 0x0000764d8674e090
[task 2023-01-18T22:32:04.642Z] 22:32:04 INFO - r10 = 0x0000000000000017 r11 = 0x0000000000000246
[task 2023-01-18T22:32:04.643Z] 22:32:04 INFO - r12 = 0x00007fff30d1d9a0 r13 = 0x00007fff30d1d958
[task 2023-01-18T22:32:04.643Z] 22:32:04 INFO - r14 = 0x00007fff30d1d988 r15 = 0x00007fff30d1d9e8
[task 2023-01-18T22:32:04.643Z] 22:32:04 INFO - rip = 0x0000764d5ceeb37d
[task 2023-01-18T22:32:04.643Z] 22:32:04 INFO - Found by: given as instruction pointer in context
[task 2023-01-18T22:32:04.643Z] 22:32:04 INFO - 2 libxul.so!NS_DebugBreak [nsTSubstring.h:8a0fa777a6a9525721bceed324e9e1ae332d2715 : 0 + 0x20]
[task 2023-01-18T22:32:04.643Z] 22:32:04 INFO - rbx = 0x0000764d680030e2 rbp = 0x00007fff30d1e050
[task 2023-01-18T22:32:04.643Z] 22:32:04 INFO - rsp = 0x00007fff30d1d920 r12 = 0x00007fff30d1d9a0
[task 2023-01-18T22:32:04.643Z] 22:32:04 INFO - r13 = 0x00007fff30d1d958 r14 = 0x00007fff30d1d988
[task 2023-01-18T22:32:04.643Z] 22:32:04 INFO - r15 = 0x00007fff30d1d9e8 rip = 0x0000764d5ceeb035
[task 2023-01-18T22:32:04.643Z] 22:32:04 INFO - Found by: call frame info
[task 2023-01-18T22:32:04.644Z] 22:32:04 INFO - 3 libxul.so!mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) [ThreadEventTarget.cpp:8a0fa777a6a9525721bceed324e9e1ae332d2715 : 75 + 0x25]
[task 2023-01-18T22:32:04.644Z] 22:32:04 INFO - rbx = 0x0000000080070057 rbp = 0x00007fff30d1e200
[task 2023-01-18T22:32:04.644Z] 22:32:04 INFO - rsp = 0x00007fff30d1e060 r12 = 0x0000764d59c665b0
[task 2023-01-18T22:32:04.644Z] 22:32:04 INFO - r13 = 0x00007fff30d1e258 r14 = 0x0000000000000000
[task 2023-01-18T22:32:04.644Z] 22:32:04 INFO - r15 = 0x0000764d5b80f910 rip = 0x0000764d5cfd4c30
[task 2023-01-18T22:32:04.644Z] 22:32:04 INFO - Found by: call frame info
[task 2023-01-18T22:32:04.644Z] 22:32:04 INFO - 4 libxul.so!nsThread::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) [nsThread.cpp:8a0fa777a6a9525721bceed324e9e1ae332d2715 : 676 + 0xd]
[task 2023-01-18T22:32:04.644Z] 22:32:04 INFO - rbx = 0x00007fff30d1e210 rbp = 0x00007fff30d1e240
[task 2023-01-18T22:32:04.645Z] 22:32:04 INFO - rsp = 0x00007fff30d1e210 r12 = 0x00007fff30d1e258
[task 2023-01-18T22:32:04.645Z] 22:32:04 INFO - r13 = 0x00007fff30d1e258 r14 = 0x0000000000000000
[task 2023-01-18T22:32:04.645Z] 22:32:04 INFO - r15 = 0x0000764d680937b0 rip = 0x0000764d5cfdcc9b
[task 2023-01-18T22:32:04.645Z] 22:32:04 INFO - Found by: call frame info
[task 2023-01-18T22:32:04.645Z] 22:32:04 INFO - 5 libxul.so!XPCOMEventTargetWrapper::DispatchNative(mozilla::jni::Ref<mozilla::jni::Object, _jobject*> const&) [nsAppShell.cpp:8a0fa777a6a9525721bceed324e9e1ae332d2715 : 361 + 0xd]
[task 2023-01-18T22:32:04.645Z] 22:32:04 INFO - rbx = 0x0000764d68093760 rbp = 0x00007fff30d1e2a0
[task 2023-01-18T22:32:04.645Z] 22:32:04 INFO - rsp = 0x00007fff30d1e250 r12 = 0x00007fff30d1e260
[task 2023-01-18T22:32:04.645Z] 22:32:04 INFO - r13 = 0x00007fff30d1e258 r14 = 0x00007fff30d1e250
[task 2023-01-18T22:32:04.645Z] 22:32:04 INFO - r15 = 0x00007fff30d1e268 rip = 0x0000764d5fd7c0bb
[task 2023-01-18T22:32:04.645Z] 22:32:04 INFO - Found by: call frame info
[task 2023-01-18T22:32:04.646Z] 22:32:04 INFO - 6 libxul.so!mozilla::jni::NativeStub<mozilla::java::XPCOMEventTarget::DispatchNative_t, XPCOMEventTargetWrapper, mozilla::jni::Args<mozilla::jni::Ref<mozilla::jni::Object, _jobject*> const&> >::Wrap<&XPCOMEventTargetWrapper::DispatchNative(mozilla::jni::Ref<mozilla::jni::Object, _jobject*> const&)>(_JNIEnv*, _jobject*, _jobject*) [Natives.h:8a0fa777a6a9525721bceed324e9e1ae332d2715 : 1447 + 0x7]
[task 2023-01-18T22:32:04.646Z] 22:32:04 INFO - rbx = 0x00007fff30d1e308 rbp = 0x00007fff30d1e2d0
[task 2023-01-18T22:32:04.646Z] 22:32:04 INFO - rsp = 0x00007fff30d1e2b0 r12 = 0x0000000036c01ce0
[task 2023-01-18T22:32:04.646Z] 22:32:04 INFO - r13 = 0x0000000036cc94d0 r14 = 0x0000000036cc94f0
[task 2023-01-18T22:32:04.646Z] 22:32:04 INFO - r15 = 0x0000000000000001 rip = 0x0000764d5fd7bf4a
[task 2023-01-18T22:32:04.646Z] 22:32:04 INFO - Found by: call frame info
Comment 1•3 years ago
|
||
Hmm, that check might be actually not checking what it thinks to check on Android? Not sure about the thread model on Android, but IIUC what normally would be our thread 0 is Thread 11 Gecko here in this trace. And we seem to do some wanted/needed thread hopping here between that and thread 0.
Nika, do you think we can find a working check for Android or should we just disable this check there?
Comment 5•3 years ago
|
||
Copying crash signatures from duplicate bugs.
Comment 6•3 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #1)
Hmm, that check might be actually not checking what it thinks to check on Android? Not sure about the thread model on Android, but IIUC what normally would be our thread 0 is
Thread 11 Geckohere in this trace. And we seem to do some wanted/needed thread hopping here between that and thread 0.Nika, do you think we can find a working check for Android or should we just disable this check there?
As we discussed in the XPCOM meeting, the Gecko thread is already quite dead, and performing shutdown CCs at this point, so even if the dispatch was to the main thread (which it isn't), the thread which the dispatch is going to is definitely already dead.
I don't think the assertion needs to be changed as much as we need to fix whatever caller in Java is dispatching this runnable. It's hard to tell what that could be though, unfortunately, given the lack of Java stack information.
Perhaps someone on the geckoview team could dig into this more and figure out what's doing the dispatch here, and why it's happening after XPCOM has almost fully shut down?
Comment 7•3 years ago
|
||
(In reply to Nika Layzell [:nika] (ni? for response) from comment #6)
Perhaps someone on the geckoview team could dig into this more and figure out what's doing the dispatch here, and why it's happening after XPCOM has almost fully shut down?
Thanks, moving to Geckoview for further analysis.
Comment 9•3 years ago
|
||
Copying crash signatures from duplicate bugs.
| Comment hidden (Intermittent Failures Robot) |
Comment 11•3 years ago
|
||
The severity field is not set for this bug.
:cpeterson, could you have a look please?
For more information, please visit auto_nag documentation.
| Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
Comment 14•3 years ago
|
||
Copying crash signatures from duplicate bugs.
| Comment hidden (Intermittent Failures Robot) |
Comment 18•3 years ago
|
||
Copying crash signatures from duplicate bugs.
Updated•2 years ago
|
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 33•2 years ago
|
||
The remaining instances are all from (old) Android. I assume there is a race between this check on the Android UI thread and rushing through ShutdownPhase::XPCOMShutdownThreads and flipping the flag on the XPCOM main thread.
Not sure if transforming the runnable involves calling Java code or blocks on mutexes or what not that would make it slow? Maybe we should check the shutdown phase after that?
| Comment hidden (Intermittent Failures Robot) |
Updated•1 year ago
|
Comment 35•1 year ago
|
||
This looks like the same assertion as bug 1859551 --> added as see-also.
(Looks like the last report on this bug here was about a month ago, but bug 1859551 is continuing to get reports of intermittent failures, for what it's worth.)
| Assignee | ||
Comment 36•1 year ago
|
||
This still have a race condition that is comment #33. We should have something mutex to block main thread and XPCOMEventTargetWrapper::DispatchNative.(In reply to Jens Stutte [:jstutte] from comment #33)
The remaining instances are all from (old) Android. I assume there is a race between this check on the Android UI thread and rushing through
ShutdownPhase::XPCOMShutdownThreadsand flipping the flag on the XPCOM main thread.
Right. This still has a race condition on main thread and Android UI thread for checking whether shutting down. I think we should use RWLock or Mutex to avoid this assertion.
| Assignee | ||
Comment 37•1 year ago
|
||
When dispatching task from Java, we have a check whether Gecko is shutting
down (bug 1558103). But this wasn't enough according to our CI result.
This still has a race condition on main thread and Android UI thread.
So we should use a lock instead.
Updated•1 year ago
|
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 51•1 year ago
|
||
Copying crash signatures from duplicate bugs.
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Updated•7 months ago
|
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Description
•