Closed Bug 1315446 Opened 3 years ago Closed 3 years ago

Crash in mozilla::dom::workers::WorkerRunnable::Run

Categories

(Core :: DOM: Workers, defect, P3, critical)

49 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- fixed
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: philipp, Assigned: bkelly)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-dc523270-1352-47b4-a7bf-67bdd2161105.
=============================================================
Crashing Thread (33)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::dom::workers::WorkerRunnable::Run() 	dom/workers/WorkerRunnable.cpp:327
1 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1067
2 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp:290
3 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:354
4 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:228
5 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:208
6 	xul.dll 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:467
7 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:397
8 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:95
9 	ucrtbase.dll 	_crt_at_quick_exit 	
10 	kernel32.dll 	BaseThreadInitThunk 	
11 	ntdll.dll 	__RtlUserThreadStart 	
12 	ntdll.dll 	_RtlUserThreadStart

this signature has been around for a while but is particularly increasing in volume in firefox 49, so i'm marking it as a regression for that reason. overall it's rather mid- to low-volume though, as it's accounting for 0.05% of all browser crashes on the 49 release.

correlations:
(99.67% in signature vs 00.10% overall) address = 0x2d0
(99.67% in signature vs 36.74% overall) reason = EXCEPTION_ACCESS_VIOLATION_READ
(60.00% in signature vs 10.61% overall) contains_memory_report = 1
(47.54% in signature vs 08.26% overall) useragent_locale = ru
(44.59% in signature vs 08.02% overall) is_garbage_collecting = 1
Have you seen something that might have caused this sort-of-regression since FF49? Thanks!
Flags: needinfo?(amarchesini)
Can it be connected with the new queue?
Flags: needinfo?(amarchesini) → needinfo?(bkelly)
(In reply to Andrea Marchesini [:baku] from comment #2)
> Can it be connected with the new queue?

No.  That only landed in FF51 AFAIK.  These crashes are in FF49.
Flags: needinfo?(bkelly)
Though it's hard, wondering if we could get STR or regression-window?
Priority: -- → P3
baku, can you take a look here to see if anything stands out?
Flags: needinfo?(amarchesini)
The reason why we crash is that, we dispatch a runnable from a child worker to a parent one. But the parent worker is gone, somehow. I've seen this crash before. At that time the reason was a wrong busyCount value in the parent worker.
Flags: needinfo?(amarchesini)
Do you have ideas of how to proceed? Or someone who's familiar with these types of crashes?
Flags: needinfo?(amarchesini)
Too late to fix in 50.1.0 release
I think I see a way to at least mitigate these.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(amarchesini)
I'm fairly certain these are WorkerRunnable instances queued on the thread when we mark the worker Dead here:

https://dxr.mozilla.org/mozilla-release/source/dom/workers/WorkerPrivate.cpp#4507

If that is the last JSContext ref dropped there then the GetCurrentThreadJSContext() may get a nullptr worker private here:

https://dxr.mozilla.org/mozilla-release/source/dom/workers/RuntimeService.cpp#1483

We should check for nullptr there and probably in GetCurrentThreadWorkerGlobal() as well.
Too late for FF51.
This makes us gracefully return nullptr instead of crashing.  These methods can get nullptr if they are called during various stages of worker shutdown.  For example, mScope is cleared during Killing and mContext is cleared when the worker is marked Dead.  We can't guarantee, though, that there are not any orkerRunnable objects in the thread event queue.  So its possible something will try to use these methods after shutdown.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b3bfe118acfefe222c97c023c9c22c7c8d43770a
Attachment #8828847 - Flags: review?(amarchesini)
Attachment #8828847 - Flags: review?(amarchesini) → review+
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/df94a2bfc243
Avoid crashing if GetCurrentThreadJSContext() or GetCurrentThreadWorkerGlobal() are called after worker shutdown. r=baku
Comment on attachment 8828847 [details] [diff] [review]
Avoid crashing if GetCurrentThreadJSContext() or GetCurrentThreadWorkerGlobal() are called after worker shutdown. r=baku

Approval Request Comment
[Feature/Bug causing the regression]: Workers
[User impact if declined]: Low frequency crashes.
[Is this code covered by automated tests?]: We have many tests, but they do not reliably trigger this failure.
[Has the fix been verified in Nightly?]: Its a race condition so its hard to manually verify.
[Needs manual test from QE? If yes, steps to reproduce]:  No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Minimal risk
[Why is the change risky/not risky?]: This code is only adding nullptr checks to allow us to gracefully handle the shutdown of the worker thread during some racy code paths.
[String changes made/needed]: None.
Attachment #8828847 - Flags: approval-mozilla-beta?
Attachment #8828847 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/df94a2bfc243
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Comment on attachment 8828847 [details] [diff] [review]
Avoid crashing if GetCurrentThreadJSContext() or GetCurrentThreadWorkerGlobal() are called after worker shutdown. r=baku

fix crash on worker shutdown, aurora53+, beta52+
Attachment #8828847 - Flags: approval-mozilla-beta?
Attachment #8828847 - Flags: approval-mozilla-beta+
Attachment #8828847 - Flags: approval-mozilla-aurora?
Attachment #8828847 - Flags: approval-mozilla-aurora+
Setting qe-verify- per Comment 14.
Flags: qe-verify-
(In reply to Ben Kelly [PTO, back late March][:bkelly] from comment #10)
> I'm fairly certain these are WorkerRunnable instances queued on the thread
> when we mark the worker Dead here:

https://hg.mozilla.org/releases/mozilla-release/annotate/7d9973f1bba7/dom/workers/WorkerPrivate.cpp?revcount=240#l4507

> If that is the last JSContext ref dropped there then the
> GetCurrentThreadJSContext() may get a nullptr worker private here:

https://hg.mozilla.org/releases/mozilla-release/annotate/7d9973f1bba7/dom/workers/RuntimeService.cpp?revcount=240#l1483
(In reply to [:philipp] from comment #0)
> 0 	xul.dll 	mozilla::dom::workers::WorkerRunnable::Run() 
> dom/workers/WorkerRunnable.cpp:327

If this is 49, then it is
https://hg.mozilla.org/releases/mozilla-release/annotate/416dc3163a1f/dom/workers/WorkerRunnable.cpp#l327
You need to log in before you can comment on or make changes to this bug.