Closed Bug 1584396 Opened 1 year ago Closed 3 months ago

Intermittent GECKO(1931) | Assertion failure: !mOtherThreadsDispatchingViaEventTarget (XPCOM Dispatch hapenning at the same time our thread is being unset! This should not be possible!), at /builds/worker/workspace/build/src/dom/workers/WorkerThread.cpp

Categories

(Core :: DOM: Workers, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla79
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- fixed
firefox77 --- wontfix
firefox78 --- wontfix
firefox79 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: asuth)

References

(Depends on 1 open bug)

Details

(Keywords: intermittent-failure, regression)

Attachments

(1 file)

Filed by: cbrindusan [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=268670790&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/DD2_ZDWERRyDzuv4IEIFlg/runs/0/artifacts/public/logs/live_backing.log


[task 2019-09-26T22:49:50.774Z] 22:49:50 INFO - TEST-START | dom/canvas/test/test_imagebitmap.html
[task 2019-09-26T22:49:50.774Z] 22:49:50 INFO - GECKO(1931) | --DOMWINDOW == 17 (0x12140e400) [pid = 1933] [serial = 213] [outer = 0x0] [url = http://mochi.test:8888/tests/SimpleTest/iframe-between-tests.html]
[task 2019-09-26T22:49:50.774Z] 22:49:50 INFO - GECKO(1931) | --DOMWINDOW == 16 (0x1255a0000) [pid = 1933] [serial = 215] [outer = 0x0] [url = http://mochi.test:8888/tests/SimpleTest/iframe-between-tests.html]
[task 2019-09-26T22:49:50.819Z] 22:49:50 INFO - GECKO(1931) | ++DOMWINDOW == 17 (0x121410400) [pid = 1933] [serial = 222] [outer = 0x12145bc40]
[task 2019-09-26T22:49:51.069Z] 22:49:51 INFO - GECKO(1931) | [Child 1933, Main Thread] WARNING: '!surface', file /builds/worker/workspace/build/src/dom/canvas/ImageBitmap.cpp, line 425
[task 2019-09-26T22:49:51.069Z] 22:49:51 INFO - GECKO(1931) | [Child 1933, Main Thread] WARNING: 'aRv.Failed()', file /builds/worker/workspace/build/src/dom/canvas/ImageBitmap.cpp, line 694
[task 2019-09-26T22:49:51.069Z] 22:49:51 INFO - GECKO(1931) | Assertion failure: !mOtherThreadsDispatchingViaEventTarget (XPCOM Dispatch hapenning at the same time our thread is being unset! This should not be possible!), at /builds/worker/workspace/build/src/dom/workers/WorkerThread.cpp:130

Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 11 months ago9 months ago
Resolution: --- → INCOMPLETE
Status: REOPENED → RESOLVED
Closed: 9 months ago8 months ago
Resolution: --- → INCOMPLETE

New occurrence: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=289714296&repo=autoland&lineNumber=5051

[task 2020-02-20T17:45:56.034Z] 17:45:56 INFO - GECKO(1753) | Assertion failure: !mOtherThreadsDispatchingViaEventTarget (XPCOM Dispatch hapenning at the same time our thread is being unset! This should not be possible!), at /builds/worker/workspace/build/src/dom/workers/WorkerThread.cpp:130
[task 2020-02-20T17:46:20.827Z] 17:46:20 INFO - GECKO(1753) | #01: mozilla::dom::WorkerPrivate::ResetWorkerPrivateInWorkerThread() [dom/workers/WorkerPrivate.cpp:4861]
[task 2020-02-20T17:46:20.827Z] 17:46:20 INFO -
[task 2020-02-20T17:46:20.827Z] 17:46:20 INFO - GECKO(1753) | #02: mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() [dom/workers/RuntimeService.cpp:2346]
[task 2020-02-20T17:46:20.827Z] 17:46:20 INFO -
[task 2020-02-20T17:46:20.827Z] 17:46:20 INFO - GECKO(1753) | #03: nsThread::ProcessNextEvent(bool, bool*) [mfbt/RefPtr.h:314]
[task 2020-02-20T17:46:20.827Z] 17:46:20 INFO -
[task 2020-02-20T17:46:20.827Z] 17:46:20 INFO - GECKO(1753) | #04: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/threads/nsThreadUtils.cpp:481]
[task 2020-02-20T17:46:20.827Z] 17:46:20 INFO -
[task 2020-02-20T17:46:20.828Z] 17:46:20 INFO - GECKO(1753) | #05: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:332]
[task 2020-02-20T17:46:20.828Z] 17:46:20 INFO -
[task 2020-02-20T17:46:20.828Z] 17:46:20 INFO - GECKO(1753) | #06: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:291]
[task 2020-02-20T17:46:20.828Z] 17:46:20 INFO -
[task 2020-02-20T17:46:20.828Z] 17:46:20 INFO - GECKO(1753) | #07: nsThread::ThreadFunc(void*) [xpcom/threads/nsThread.cpp:466]
[task 2020-02-20T17:46:20.828Z] 17:46:20 INFO -
[task 2020-02-20T17:46:21.327Z] 17:46:21 INFO - GECKO(1753) | #08: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
[task 2020-02-20T17:46:21.328Z] 17:46:21 INFO -
[task 2020-02-20T17:46:21.328Z] 17:46:21 INFO - GECKO(1753) | #09: libsystem_pthread.dylib + 0x32eb
[task 2020-02-20T17:46:21.328Z] 17:46:21 INFO -
[task 2020-02-20T17:46:21.328Z] 17:46:21 INFO - GECKO(1753) | #10: libsystem_pthread.dylib + 0x6249
[task 2020-02-20T17:46:21.328Z] 17:46:21 INFO -
[task 2020-02-20T17:46:21.328Z] 17:46:21 INFO - GECKO(1753) | [Parent 1753, Main Thread] WARNING: IPC message discarded: actor cannot send: file /builds/worker/workspace/build/src/ipc/glue/ProtocolUtils.cpp, line 481
[task 2020-02-20T17:46:21.328Z] 17:46:21 INFO - GECKO(1753) | [Parent 1753, Main Thread] WARNING: IPC message discarded: actor cannot send: file /builds/worker/workspace/build/src/ipc/glue/ProtocolUtils.cpp, line 481
[task 2020-02-20T17:46:21.329Z] 17:46:21 INFO - GECKO(1753) | [Parent 1753, Main Thread] WARNING: IPC message discarded: actor cannot send: file /builds/worker/workspace/build/src/ipc/glue/ProtocolUtils.cpp, line 481
[task 2020-02-20T17:46:21.329Z] 17:46:21 INFO - GECKO(1753) | ###!!! [Parent][MessageChannel] Error: (msgtype=0x370136,name=PContent::Msg_DetachBrowsingContext) Channel error: cannot send/recv
[task 2020-02-20T17:46:21.330Z] 17:46:21 INFO - GECKO(1753) | [Parent 1753, Main Thread] WARNING: IPC message discarded: actor cannot send: file /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/ipc/ProtocolUtils.h, line 295
[task 2020-02-20T17:46:21.330Z] 17:46:21 ERROR - GECKO(1753) | A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down

Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 8 months ago6 months ago
Resolution: --- → INCOMPLETE
Depends on: 1633392

I see the interesting situation here, that we first assert on !mOtherThreadsDispatchingViaEventTarget and then wait for mOtherThreadsDispatchingViaEventTarget to be decremented to 0 anyway.

So most probably in an optimized build nothing evil happens at all (at least at this specific code position). Is this assert then really needed?

Flags: needinfo?(bugmail)

So I was asking myself, if this assert really helps. Actually I see in the last dump another thread dispatching a message to the worker. This seems to happen here in WebCryptoTask which happens to do some interesting thread-hopping here. If I understand right, this dispatch can happen at arbitrary moments after some heavy encryption work finished and without further checking, if it actually should be still dispatched? Or is this just a situation we should be able to deal with and thus the assert is wrong?

Yeah, bad assertion.

I think we're seeing pathological thread scheduling here, with the following series of events happening:
0. There's a WebCryptoTask that holds a StrongWorkerRef that will keep the worker alive.

  1. The WebCryptoTask invokes the nsIEventTarget::Dispatch() on its threadpool thread that calls WorkerThread::Dispatch. Execution reaches the call to nsThread::Dispatch after doing mOtherThreadsDispatchingViaEventTarget++. Dispatch completes enough that the runnable can run on the worker.
  2. WebCryptoTask runs on the worker thread, releasing its StrongWorkerRef and allowing for the worker event loop to immediately shutdown.
  3. The worker shutdown code reaches the assertion.
  4. (Hasn't happened yet) Execution returns to the WorkerThread::Dispatch which does --mOtherThreadsDispatchingViaEventTarget.

So indeed the assertion is bad because it assumes some combination of:

  • The dispatch call is atomic. Which it's not. And probably can't be because in general we hold it to be unsafe to call code outside of a module while holding a lock. (Even if it's safe at the time the code is written, there's nothing stopping the other module from changing, etc.)
  • Multi-threading is our friend and that when scheduling dispatch of a runnable on another thread, our code invoking that dispatch will get to run to completion before that other code runs.

And given that we already have the while loop that handles what's happening, it's appropriate to lose the assertion and gain a comment.

Assignee: nobody → bugmail
Status: REOPENED → ASSIGNED
Flags: needinfo?(bugmail)

Hi Andrew, is there anything I am not aware of that prevents us from landing this patch?

Flags: needinfo?(bugmail)
Pushed by bugmail@asutherland.org:
https://hg.mozilla.org/integration/autoland/rev/383bbc14ece1
Do not assert about other thread dispatches at worker shutdown. r=dom-workers-and-storage-reviewers,ttung
Flags: needinfo?(bugmail)
Status: ASSIGNED → RESOLVED
Closed: 6 months ago3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
You need to log in before you can comment on or make changes to this bug.