Closed Bug 1315446 Opened 3 years ago Closed 3 years ago
Crash in mozilla::dom::workers::Worker
This bug was filed from the Socorro interface and is report bp-dc523270-1352-47b4-a7bf-67bdd2161105. ============================================================= Crashing Thread (33) Frame Module Signature Source 0 xul.dll mozilla::dom::workers::WorkerRunnable::Run() dom/workers/WorkerRunnable.cpp:327 1 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1067 2 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp:290 3 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:354 4 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:228 5 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:208 6 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:467 7 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397 8 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:95 9 ucrtbase.dll _crt_at_quick_exit 10 kernel32.dll BaseThreadInitThunk 11 ntdll.dll __RtlUserThreadStart 12 ntdll.dll _RtlUserThreadStart this signature has been around for a while but is particularly increasing in volume in firefox 49, so i'm marking it as a regression for that reason. overall it's rather mid- to low-volume though, as it's accounting for 0.05% of all browser crashes on the 49 release. correlations: (99.67% in signature vs 00.10% overall) address = 0x2d0 (99.67% in signature vs 36.74% overall) reason = EXCEPTION_ACCESS_VIOLATION_READ (60.00% in signature vs 10.61% overall) contains_memory_report = 1 (47.54% in signature vs 08.26% overall) useragent_locale = ru (44.59% in signature vs 08.02% overall) is_garbage_collecting = 1
Have you seen something that might have caused this sort-of-regression since FF49? Thanks!
Can it be connected with the new queue?
Flags: needinfo?(amarchesini) → needinfo?(bkelly)
(In reply to Andrea Marchesini [:baku] from comment #2) > Can it be connected with the new queue? No. That only landed in FF51 AFAIK. These crashes are in FF49.
Though it's hard, wondering if we could get STR or regression-window?
baku, can you take a look here to see if anything stands out?
The reason why we crash is that, we dispatch a runnable from a child worker to a parent one. But the parent worker is gone, somehow. I've seen this crash before. At that time the reason was a wrong busyCount value in the parent worker.
Do you have ideas of how to proceed? Or someone who's familiar with these types of crashes?
I think I see a way to at least mitigate these.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
I'm fairly certain these are WorkerRunnable instances queued on the thread when we mark the worker Dead here: https://dxr.mozilla.org/mozilla-release/source/dom/workers/WorkerPrivate.cpp#4507 If that is the last JSContext ref dropped there then the GetCurrentThreadJSContext() may get a nullptr worker private here: https://dxr.mozilla.org/mozilla-release/source/dom/workers/RuntimeService.cpp#1483 We should check for nullptr there and probably in GetCurrentThreadWorkerGlobal() as well.
This makes us gracefully return nullptr instead of crashing. These methods can get nullptr if they are called during various stages of worker shutdown. For example, mScope is cleared during Killing and mContext is cleared when the worker is marked Dead. We can't guarantee, though, that there are not any orkerRunnable objects in the thread event queue. So its possible something will try to use these methods after shutdown. https://treeherder.mozilla.org/#/jobs?repo=try&revision=b3bfe118acfefe222c97c023c9c22c7c8d43770a
Attachment #8828847 - Flags: review?(amarchesini)
Attachment #8828847 - Flags: review?(amarchesini) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/df94a2bfc243 Avoid crashing if GetCurrentThreadJSContext() or GetCurrentThreadWorkerGlobal() are called after worker shutdown. r=baku
Comment on attachment 8828847 [details] [diff] [review] Avoid crashing if GetCurrentThreadJSContext() or GetCurrentThreadWorkerGlobal() are called after worker shutdown. r=baku Approval Request Comment [Feature/Bug causing the regression]: Workers [User impact if declined]: Low frequency crashes. [Is this code covered by automated tests?]: We have many tests, but they do not reliably trigger this failure. [Has the fix been verified in Nightly?]: Its a race condition so its hard to manually verify. [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Minimal risk [Why is the change risky/not risky?]: This code is only adding nullptr checks to allow us to gracefully handle the shutdown of the worker thread during some racy code paths. [String changes made/needed]: None.
Comment on attachment 8828847 [details] [diff] [review] Avoid crashing if GetCurrentThreadJSContext() or GetCurrentThreadWorkerGlobal() are called after worker shutdown. r=baku fix crash on worker shutdown, aurora53+, beta52+
Setting qe-verify- per Comment 14.
(In reply to Ben Kelly [PTO, back late March][:bkelly] from comment #10) > I'm fairly certain these are WorkerRunnable instances queued on the thread > when we mark the worker Dead here: https://hg.mozilla.org/releases/mozilla-release/annotate/7d9973f1bba7/dom/workers/WorkerPrivate.cpp?revcount=240#l4507 > If that is the last JSContext ref dropped there then the > GetCurrentThreadJSContext() may get a nullptr worker private here: https://hg.mozilla.org/releases/mozilla-release/annotate/7d9973f1bba7/dom/workers/RuntimeService.cpp?revcount=240#l1483
(In reply to [:philipp] from comment #0) > 0 xul.dll mozilla::dom::workers::WorkerRunnable::Run() > dom/workers/WorkerRunnable.cpp:327 If this is 49, then it is https://hg.mozilla.org/releases/mozilla-release/annotate/416dc3163a1f/dom/workers/WorkerRunnable.cpp#l327
You need to log in before you can comment on or make changes to this bug.