Closed Bug 1270783 Opened 4 years ago Closed 4 years ago

Crash in `anonymous namespace''::CompileScriptRunnable::WorkerRun

Categories

(Core :: DOM: Workers, defect, critical)

47 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox47 + fixed
firefox48 --- fixed
firefox49 --- fixed

People

(Reporter: philipp, Assigned: bzbarsky)

References

Details

(Keywords: crash, regression, Whiteboard: btpp-followup-2016-05-13)

Crash Data

Attachments

(1 file)

[Tracking Requested - why for this release]:
request tracking, since it is a recently regressing signature. it's currently at around #30 in beta with 0.23% of all crashes.

This bug was filed from the Socorro interface and is 
report bp-9a910ea0-5756-4fd6-9996-f47962160505.
=============================================================
Crashing Thread (1150)
Frame 	Module 	Signature 	Source
0 	xul.dll 	`anonymous namespace'::CompileScriptRunnable::WorkerRun(JSContext*, mozilla::dom::workers::WorkerPrivate*) 	dom/workers/WorkerPrivate.cpp:532
1 	xul.dll 	mozilla::dom::workers::WorkerRunnable::Run() 	dom/workers/WorkerRunnable.cpp:375
2 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:994
3 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp:297
4 	xul.dll 	mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) 	dom/workers/WorkerPrivate.cpp:4520
5 	xul.dll 	`anonymous namespace'::WorkerThreadPrimaryRunnable::Run() 	dom/workers/RuntimeService.cpp:2632
6 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:994
7 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp:297
8 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:326
9 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:227
10 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:201
11 	xul.dll 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:396
12 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:397
13 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:95
14 	msvcr120.dll 	_callthreadstartex 	f:\dd\vctools\crt\crtw32\startup\threadex.c:376
15 	msvcr120.dll 	msvcr120.dll@0x2c000 	
16 	kernel32.dll 	BaseThreadInitThunk 	
17 	ntdll.dll 	__RtlUserThreadStart 	
18 	ntdll.dll 	_RtlUserThreadStart

This signature is showing up in pre-release builds since 47 and looks to be related to the work in bug 1249652.
Hmm.  We're crashing with a read at 0x24 == 36 in a 32-bit build on the second line of:

    JSAutoCompartment ac(aCx,
                         aWorkerPrivate->GlobalScope()->GetGlobalJSObject());

The scope is a WorkerGlobalScope which inherits directly from DOMEventTargetHelper which inherits directly from dom::EventTarget which inherits from nsIDOMEventTarget and then nsWrapperCache.

GetGlobalJSObject just returns GetWrapper, which just returns mWrapper once you inline everything.  mWrapper is at offset 0 in nsWrapperCache, so at offset 1 word (4 bytes in this case) inside WorkerGlobalScope; the one word is the vtable pointer.

GlobalScope() returns the mScope of the WorkerPrivate.  The stuff that comes before mScope is... well, there's a lot of it.  All the members of WorkerPrivateParent, for one.  Way more than 36 bytes.

So maybe the crash is really in the JSAutoCompartment bits.  That does ->compartment() on the given JSObject, which tries to dereference JSObject::group_.  But group_ is the first member of JSObject, so should be at offset 0.

I'm really not sure which thing is null here.  :(  Kyle, any idea?
Flags: needinfo?(khuey)
Whiteboard: btpp-followup-2016-05-13
I downloaded the minidump, the scope is null.

rv here is NS_ERROR_DOM_INVALID_STATE_ERR.  The handling in https://hg.mozilla.org/mozilla-central/annotate/043082cb7bd8/dom/workers/WorkerPrivate.cpp#l508 seems kind of bogus (and yes, I reviewed it) because there are non-NS_BINDING_ABORTED exceptions thrown by C++ ...
Flags: needinfo?(khuey) → needinfo?(bzbarsky)
> I downloaded the minidump, the scope is null.

OK, thanks.  Still don't know where that 0x24 comes from.  I guess there are two vtable pointers (one for nsIDOMEventTarget and one for nsWrapperCache), but that would still put things at 0x8.

Anyway, I think the only place NS_ERROR_DOM_INVALID_STATE_ERR can come from is ScriptExecutorRunnable::ShutdownScriptLoader when aResult is false and mScriptLoader.mRv.Failed() is also false.  And as the comments there say, this can only happen if we totally fail to allocate the global in GetOrCreateGlobalScope() or if ScriptExecutorRunnable::Cancel gets called.

We could change the Cancel() case to use NS_BINDING_ABORTED.  But maybe the right thing here is that we should, in addition to explicitly ignoring all errors that are NS_BINDING_ABORTED, also explicitly ignore all errors when we have a null scope.  There's really not much we can do with them anyway, afaict.
Flags: needinfo?(bzbarsky)
Oh, and this is a regression from bug 1249652.
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
Comment on attachment 8751437 [details] [diff] [review]
If we end up erroring out of a worker main-script load before we get to creating a global for the worker, don't try to enter that global's compartment to throw an exception.  Just squelch it instead

Approval Request Comment
[Feature/regressing bug #]: Bug 1249652
[User impact if declined]: Crashes in some situations (e.g. when navigating away
   from a page while a worker is loading, I expect).
[Describe test coverage new/current, TreeHerder]: I haven't found a way to
   reproduce these situations reliably yet.
[Risks and why]: I think this is low-risk.  We basically avoid doing a null-deref
   in a situation where we don't have an expected global object.
[String/UUID change made/needed]: None.
Attachment #8751437 - Flags: approval-mozilla-beta?
Attachment #8751437 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/6ea22dfff1e3
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Comment on attachment 8751437 [details] [diff] [review]
If we end up erroring out of a worker main-script load before we get to creating a global for the worker, don't try to enter that global's compartment to throw an exception.  Just squelch it instead

Fix for a new crash that shows up only in Fx47, Beta47+, Aurora48+
Attachment #8751437 - Flags: approval-mozilla-beta?
Attachment #8751437 - Flags: approval-mozilla-beta+
Attachment #8751437 - Flags: approval-mozilla-aurora?
Attachment #8751437 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.