Open Bug 1812544 Opened 3 years ago Updated 6 months ago

ASSERTION: Dispatch to non-main thread after xpcom-shutdown-threads: '!gXPCOMThreadsShutDownNotified || mIsMainThread || PR_GetCurrentThread() == mThread'

Categories

(GeckoView :: General, defect, P5)

Unspecified
Android
defect

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: valentin, Assigned: m_kato)

References

Details

(Keywords: intermittent-failure)

Crash Data

Attachments

(1 obsolete file)

I've seen a couple of bugs that seem to assert on !gXPCOMThreadsShutDownNotified || mIsMainThread || PR_GetCurrentThread()
So far all of the intermittents I've seen only happen on Android. Could be a regression from bug 1797688, but possibly something else.
I'm filing this bug to track related issues.

Stack trace from bug 1811127.
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=402812108&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/PcUVyo-ARVStDEKDvtrx1w/runs/0/artifacts/public/logs/live_backing.log

[task 2023-01-18T22:32:04.640Z] 22:32:04  WARNING -  PROCESS-CRASH | netwerk/test/unit_ipc/test_alt-data_closeWithStatus_wrap.js | application crashed [@ MOZ_Crash(char const*, int, char const*)]
[task 2023-01-18T22:32:04.640Z] 22:32:04     INFO -  Mozilla crash reason: [Parent 15245, Unnamed thread 764d68023ef0] ###!!! ASSERTION: Dispatch to non-main thread after xpcom-shutdown-threads: '!gXPCOMThreadsShutDownNotified || mIsMainThread || PR_GetCurrentThread() == mThread', file /builds/worker/checkouts/gecko/xpcom/threads/ThreadEventTarget.cpp:77
[task 2023-01-18T22:32:04.640Z] 22:32:04     INFO -  Crash dump filename: /tmp/tmpk47or_ql/280841ae-447d-48c4-52f5-c748e90c9a64.dmp
[task 2023-01-18T22:32:04.641Z] 22:32:04     INFO -  Operating system: Android
[task 2023-01-18T22:32:04.641Z] 22:32:04     INFO -                    0.0.0 Linux 3.10.0+ #260 SMP PREEMPT Fri May 19 12:48:14 PDT 2017 x86_64
[task 2023-01-18T22:32:04.641Z] 22:32:04     INFO -  CPU: amd64
[task 2023-01-18T22:32:04.641Z] 22:32:04     INFO -       family 6 model 6 stepping 3
[task 2023-01-18T22:32:04.641Z] 22:32:04     INFO -       4 CPUs
[task 2023-01-18T22:32:04.641Z] 22:32:04     INFO -  Crash reason:  SIGSEGV / SEGV_MAPERR
[task 2023-01-18T22:32:04.641Z] 22:32:04     INFO -  Crash address: 0x0
[task 2023-01-18T22:32:04.641Z] 22:32:04     INFO -  Crashing instruction: `mov dword [0x0], 0x201`
[task 2023-01-18T22:32:04.641Z] 22:32:04     INFO -  Memory accessed by instruction:
[task 2023-01-18T22:32:04.641Z] 22:32:04     INFO -    0. Address: 0x0000000000000000
[task 2023-01-18T22:32:04.641Z] 22:32:04     INFO -       Size: 4
[task 2023-01-18T22:32:04.641Z] 22:32:04     INFO -  Process uptime: not available
[task 2023-01-18T22:32:04.642Z] 22:32:04     INFO -  Thread 0 AndroidUI (crashed)
[task 2023-01-18T22:32:04.642Z] 22:32:04     INFO -   0  libxul.so!MOZ_Crash(char const*, int, char const*) [Assertions.h:8a0fa777a6a9525721bceed324e9e1ae332d2715 : 261]
[task 2023-01-18T22:32:04.642Z] 22:32:04     INFO -      Found by: inlining
[task 2023-01-18T22:32:04.642Z] 22:32:04     INFO -   1  libxul.so!Abort(char const*) [nsDebugImpl.cpp:8a0fa777a6a9525721bceed324e9e1ae332d2715 : 513 + 0x17]
[task 2023-01-18T22:32:04.642Z] 22:32:04     INFO -       rax = 0x0000764d7a5eed50    rdx = 0x0000000000000004
[task 2023-01-18T22:32:04.642Z] 22:32:04     INFO -       rcx = 0x257b0bf81092e3e0    rbx = 0x00007fff30d1da10
[task 2023-01-18T22:32:04.642Z] 22:32:04     INFO -       rsi = 0x00007fff30d1d250    rdi = 0x000000000000001b
[task 2023-01-18T22:32:04.642Z] 22:32:04     INFO -       rbp = 0x00007fff30d1d910    rsp = 0x00007fff30d1d900
[task 2023-01-18T22:32:04.642Z] 22:32:04     INFO -        r8 = 0x0000000000000000     r9 = 0x0000764d8674e090
[task 2023-01-18T22:32:04.642Z] 22:32:04     INFO -       r10 = 0x0000000000000017    r11 = 0x0000000000000246
[task 2023-01-18T22:32:04.643Z] 22:32:04     INFO -       r12 = 0x00007fff30d1d9a0    r13 = 0x00007fff30d1d958
[task 2023-01-18T22:32:04.643Z] 22:32:04     INFO -       r14 = 0x00007fff30d1d988    r15 = 0x00007fff30d1d9e8
[task 2023-01-18T22:32:04.643Z] 22:32:04     INFO -       rip = 0x0000764d5ceeb37d
[task 2023-01-18T22:32:04.643Z] 22:32:04     INFO -      Found by: given as instruction pointer in context
[task 2023-01-18T22:32:04.643Z] 22:32:04     INFO -   2  libxul.so!NS_DebugBreak [nsTSubstring.h:8a0fa777a6a9525721bceed324e9e1ae332d2715 : 0 + 0x20]
[task 2023-01-18T22:32:04.643Z] 22:32:04     INFO -       rbx = 0x0000764d680030e2    rbp = 0x00007fff30d1e050
[task 2023-01-18T22:32:04.643Z] 22:32:04     INFO -       rsp = 0x00007fff30d1d920    r12 = 0x00007fff30d1d9a0
[task 2023-01-18T22:32:04.643Z] 22:32:04     INFO -       r13 = 0x00007fff30d1d958    r14 = 0x00007fff30d1d988
[task 2023-01-18T22:32:04.643Z] 22:32:04     INFO -       r15 = 0x00007fff30d1d9e8    rip = 0x0000764d5ceeb035
[task 2023-01-18T22:32:04.643Z] 22:32:04     INFO -      Found by: call frame info
[task 2023-01-18T22:32:04.644Z] 22:32:04     INFO -   3  libxul.so!mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) [ThreadEventTarget.cpp:8a0fa777a6a9525721bceed324e9e1ae332d2715 : 75 + 0x25]
[task 2023-01-18T22:32:04.644Z] 22:32:04     INFO -       rbx = 0x0000000080070057    rbp = 0x00007fff30d1e200
[task 2023-01-18T22:32:04.644Z] 22:32:04     INFO -       rsp = 0x00007fff30d1e060    r12 = 0x0000764d59c665b0
[task 2023-01-18T22:32:04.644Z] 22:32:04     INFO -       r13 = 0x00007fff30d1e258    r14 = 0x0000000000000000
[task 2023-01-18T22:32:04.644Z] 22:32:04     INFO -       r15 = 0x0000764d5b80f910    rip = 0x0000764d5cfd4c30
[task 2023-01-18T22:32:04.644Z] 22:32:04     INFO -      Found by: call frame info
[task 2023-01-18T22:32:04.644Z] 22:32:04     INFO -   4  libxul.so!nsThread::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) [nsThread.cpp:8a0fa777a6a9525721bceed324e9e1ae332d2715 : 676 + 0xd]
[task 2023-01-18T22:32:04.644Z] 22:32:04     INFO -       rbx = 0x00007fff30d1e210    rbp = 0x00007fff30d1e240
[task 2023-01-18T22:32:04.645Z] 22:32:04     INFO -       rsp = 0x00007fff30d1e210    r12 = 0x00007fff30d1e258
[task 2023-01-18T22:32:04.645Z] 22:32:04     INFO -       r13 = 0x00007fff30d1e258    r14 = 0x0000000000000000
[task 2023-01-18T22:32:04.645Z] 22:32:04     INFO -       r15 = 0x0000764d680937b0    rip = 0x0000764d5cfdcc9b
[task 2023-01-18T22:32:04.645Z] 22:32:04     INFO -      Found by: call frame info
[task 2023-01-18T22:32:04.645Z] 22:32:04     INFO -   5  libxul.so!XPCOMEventTargetWrapper::DispatchNative(mozilla::jni::Ref<mozilla::jni::Object, _jobject*> const&) [nsAppShell.cpp:8a0fa777a6a9525721bceed324e9e1ae332d2715 : 361 + 0xd]
[task 2023-01-18T22:32:04.645Z] 22:32:04     INFO -       rbx = 0x0000764d68093760    rbp = 0x00007fff30d1e2a0
[task 2023-01-18T22:32:04.645Z] 22:32:04     INFO -       rsp = 0x00007fff30d1e250    r12 = 0x00007fff30d1e260
[task 2023-01-18T22:32:04.645Z] 22:32:04     INFO -       r13 = 0x00007fff30d1e258    r14 = 0x00007fff30d1e250
[task 2023-01-18T22:32:04.645Z] 22:32:04     INFO -       r15 = 0x00007fff30d1e268    rip = 0x0000764d5fd7c0bb
[task 2023-01-18T22:32:04.645Z] 22:32:04     INFO -      Found by: call frame info
[task 2023-01-18T22:32:04.646Z] 22:32:04     INFO -   6  libxul.so!mozilla::jni::NativeStub<mozilla::java::XPCOMEventTarget::DispatchNative_t, XPCOMEventTargetWrapper, mozilla::jni::Args<mozilla::jni::Ref<mozilla::jni::Object, _jobject*> const&> >::Wrap<&XPCOMEventTargetWrapper::DispatchNative(mozilla::jni::Ref<mozilla::jni::Object, _jobject*> const&)>(_JNIEnv*, _jobject*, _jobject*) [Natives.h:8a0fa777a6a9525721bceed324e9e1ae332d2715 : 1447 + 0x7]
[task 2023-01-18T22:32:04.646Z] 22:32:04     INFO -       rbx = 0x00007fff30d1e308    rbp = 0x00007fff30d1e2d0
[task 2023-01-18T22:32:04.646Z] 22:32:04     INFO -       rsp = 0x00007fff30d1e2b0    r12 = 0x0000000036c01ce0
[task 2023-01-18T22:32:04.646Z] 22:32:04     INFO -       r13 = 0x0000000036cc94d0    r14 = 0x0000000036cc94f0
[task 2023-01-18T22:32:04.646Z] 22:32:04     INFO -       r15 = 0x0000000000000001    rip = 0x0000764d5fd7bf4a
[task 2023-01-18T22:32:04.646Z] 22:32:04     INFO -      Found by: call frame info

Hmm, that check might be actually not checking what it thinks to check on Android? Not sure about the thread model on Android, but IIUC what normally would be our thread 0 is Thread 11 Gecko here in this trace. And we seem to do some wanted/needed thread hopping here between that and thread 0.

Nika, do you think we can find a working check for Android or should we just disable this check there?

Flags: needinfo?(nika)
Duplicate of this bug: 1811565
See Also: → 1803948
Duplicate of this bug: 1803235
Duplicate of this bug: 1802581

Copying crash signatures from duplicate bugs.

Crash Signature: [@ MOZ_Crash(char const*, int, char const*)]

(In reply to Jens Stutte [:jstutte] from comment #1)

Hmm, that check might be actually not checking what it thinks to check on Android? Not sure about the thread model on Android, but IIUC what normally would be our thread 0 is Thread 11 Gecko here in this trace. And we seem to do some wanted/needed thread hopping here between that and thread 0.

Nika, do you think we can find a working check for Android or should we just disable this check there?

As we discussed in the XPCOM meeting, the Gecko thread is already quite dead, and performing shutdown CCs at this point, so even if the dispatch was to the main thread (which it isn't), the thread which the dispatch is going to is definitely already dead.

I don't think the assertion needs to be changed as much as we need to fix whatever caller in Java is dispatching this runnable. It's hard to tell what that could be though, unfortunately, given the lack of Java stack information.

Perhaps someone on the geckoview team could dig into this more and figure out what's doing the dispatch here, and why it's happening after XPCOM has almost fully shut down?

Flags: needinfo?(nika)

(In reply to Nika Layzell [:nika] (ni? for response) from comment #6)

Perhaps someone on the geckoview team could dig into this more and figure out what's doing the dispatch here, and why it's happening after XPCOM has almost fully shut down?

Thanks, moving to Geckoview for further analysis.

Component: XPCOM → General
OS: Unspecified → Android
Product: Core → GeckoView
Version: unspecified → Trunk
Duplicate of this bug: 1815486

Copying crash signatures from duplicate bugs.

Crash Signature: [@ MOZ_Crash(char const*, int, char const*)] → [@ MOZ_Crash(char const*, int, char const*)] [@ mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int)]

The severity field is not set for this bug.
:cpeterson, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(cpeterson)
Duplicate of this bug: 1819539
Crash Signature: [@ MOZ_Crash(char const*, int, char const*)] [@ mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int)] → [@ MOZ_Crash] [@ mozilla::ThreadEventTarget::Dispatch]

Copying crash signatures from duplicate bugs.

Crash Signature: [@ MOZ_Crash] [@ mozilla::ThreadEventTarget::Dispatch] → [@ MOZ_Crash] [@ mozilla::ThreadEventTarget::Dispatch] [@ mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int)]
Duplicate of this bug: 1822490
Duplicate of this bug: 1825439

Copying crash signatures from duplicate bugs.

Crash Signature: [@ MOZ_Crash] [@ mozilla::ThreadEventTarget::Dispatch] [@ mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int)] → [@ MOZ_Crash] [@ mozilla::ThreadEventTarget::Dispatch] [@ mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int)] [@ libxul.so + 0x000000000024d547]
Flags: needinfo?(cpeterson)
Crash Signature: [@ MOZ_Crash] [@ mozilla::ThreadEventTarget::Dispatch] [@ mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int)] [@ libxul.so + 0x000000000024d547] → [@ MOZ_Crash] [@ mozilla::ThreadEventTarget::Dispatch] [@ mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int)] [@ libxul.so + 0x000000000024d547]

The remaining instances are all from (old) Android. I assume there is a race between this check on the Android UI thread and rushing through ShutdownPhase::XPCOMShutdownThreads and flipping the flag on the XPCOM main thread.

Not sure if transforming the runnable involves calling Java code or blocks on mutexes or what not that would make it slow? Maybe we should check the shutdown phase after that?

Flags: needinfo?(bugzeeeeee)
Severity: -- → S4
Flags: needinfo?(bugzeeeeee)
Priority: -- → P5
See Also: → 1859551

This looks like the same assertion as bug 1859551 --> added as see-also.

(Looks like the last report on this bug here was about a month ago, but bug 1859551 is continuing to get reports of intermittent failures, for what it's worth.)

This still have a race condition that is comment #33. We should have something mutex to block main thread and XPCOMEventTargetWrapper::DispatchNative.(In reply to Jens Stutte [:jstutte] from comment #33)

The remaining instances are all from (old) Android. I assume there is a race between this check on the Android UI thread and rushing through ShutdownPhase::XPCOMShutdownThreads and flipping the flag on the XPCOM main thread.

Right. This still has a race condition on main thread and Android UI thread for checking whether shutting down. I think we should use RWLock or Mutex to avoid this assertion.

When dispatching task from Java, we have a check whether Gecko is shutting
down (bug 1558103). But this wasn't enough according to our CI result.
This still has a race condition on main thread and Android UI thread.

So we should use a lock instead.

Assignee: nobody → m_kato
Status: NEW → ASSIGNED
Duplicate of this bug: 1859551
Duplicate of this bug: 1914103
Duplicate of this bug: 1921998
Duplicate of this bug: 1935211

Copying crash signatures from duplicate bugs.

Crash Signature: [@ MOZ_Crash] [@ mozilla::ThreadEventTarget::Dispatch] [@ mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int)] [@ libxul.so + 0x000000000024d547] → [@ MOZ_Crash] [@ mozilla::ThreadEventTarget::Dispatch] [@ mozilla::ThreadEventTarget::Dispatch(already_AddRefed<nsIRunnable>, unsigned int)] [@ libxul.so + 0x000000000024d547] [@ libxul.so + 0x00000000004ffa79]
Attachment #9408804 - Attachment is obsolete: true
Depends on: 1981209
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: