Closed Bug 1270783 Opened 4 years ago Closed 4 years ago
Crash in `anonymous namespace''::Compile
Script Runnable::Worker Run
[Tracking Requested - why for this release]: request tracking, since it is a recently regressing signature. it's currently at around #30 in beta with 0.23% of all crashes. This bug was filed from the Socorro interface and is report bp-9a910ea0-5756-4fd6-9996-f47962160505. ============================================================= Crashing Thread (1150) Frame Module Signature Source 0 xul.dll `anonymous namespace'::CompileScriptRunnable::WorkerRun(JSContext*, mozilla::dom::workers::WorkerPrivate*) dom/workers/WorkerPrivate.cpp:532 1 xul.dll mozilla::dom::workers::WorkerRunnable::Run() dom/workers/WorkerRunnable.cpp:375 2 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:994 3 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp:297 4 xul.dll mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) dom/workers/WorkerPrivate.cpp:4520 5 xul.dll `anonymous namespace'::WorkerThreadPrimaryRunnable::Run() dom/workers/RuntimeService.cpp:2632 6 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:994 7 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp:297 8 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:326 9 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:227 10 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:201 11 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:396 12 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397 13 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:95 14 msvcr120.dll _callthreadstartex f:\dd\vctools\crt\crtw32\startup\threadex.c:376 15 msvcr120.dll msvcr120.dll@0x2c000 16 kernel32.dll BaseThreadInitThunk 17 ntdll.dll __RtlUserThreadStart 18 ntdll.dll _RtlUserThreadStart This signature is showing up in pre-release builds since 47 and looks to be related to the work in bug 1249652.
Hmm. We're crashing with a read at 0x24 == 36 in a 32-bit build on the second line of: JSAutoCompartment ac(aCx, aWorkerPrivate->GlobalScope()->GetGlobalJSObject()); The scope is a WorkerGlobalScope which inherits directly from DOMEventTargetHelper which inherits directly from dom::EventTarget which inherits from nsIDOMEventTarget and then nsWrapperCache. GetGlobalJSObject just returns GetWrapper, which just returns mWrapper once you inline everything. mWrapper is at offset 0 in nsWrapperCache, so at offset 1 word (4 bytes in this case) inside WorkerGlobalScope; the one word is the vtable pointer. GlobalScope() returns the mScope of the WorkerPrivate. The stuff that comes before mScope is... well, there's a lot of it. All the members of WorkerPrivateParent, for one. Way more than 36 bytes. So maybe the crash is really in the JSAutoCompartment bits. That does ->compartment() on the given JSObject, which tries to dereference JSObject::group_. But group_ is the first member of JSObject, so should be at offset 0. I'm really not sure which thing is null here. :( Kyle, any idea?
I downloaded the minidump, the scope is null. rv here is NS_ERROR_DOM_INVALID_STATE_ERR. The handling in https://hg.mozilla.org/mozilla-central/annotate/043082cb7bd8/dom/workers/WorkerPrivate.cpp#l508 seems kind of bogus (and yes, I reviewed it) because there are non-NS_BINDING_ABORTED exceptions thrown by C++ ...
Flags: needinfo?(khuey) → needinfo?(bzbarsky)
> I downloaded the minidump, the scope is null. OK, thanks. Still don't know where that 0x24 comes from. I guess there are two vtable pointers (one for nsIDOMEventTarget and one for nsWrapperCache), but that would still put things at 0x8. Anyway, I think the only place NS_ERROR_DOM_INVALID_STATE_ERR can come from is ScriptExecutorRunnable::ShutdownScriptLoader when aResult is false and mScriptLoader.mRv.Failed() is also false. And as the comments there say, this can only happen if we totally fail to allocate the global in GetOrCreateGlobalScope() or if ScriptExecutorRunnable::Cancel gets called. We could change the Cancel() case to use NS_BINDING_ABORTED. But maybe the right thing here is that we should, in addition to explicitly ignoring all errors that are NS_BINDING_ABORTED, also explicitly ignore all errors when we have a null scope. There's really not much we can do with them anyway, afaict.
Oh, and this is a regression from bug 1249652.
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
Attachment #8751437 - Flags: review?(khuey) → review+
Comment on attachment 8751437 [details] [diff] [review] If we end up erroring out of a worker main-script load before we get to creating a global for the worker, don't try to enter that global's compartment to throw an exception. Just squelch it instead Approval Request Comment [Feature/regressing bug #]: Bug 1249652 [User impact if declined]: Crashes in some situations (e.g. when navigating away from a page while a worker is loading, I expect). [Describe test coverage new/current, TreeHerder]: I haven't found a way to reproduce these situations reliably yet. [Risks and why]: I think this is low-risk. We basically avoid doing a null-deref in a situation where we don't have an expected global object. [String/UUID change made/needed]: None.
Comment on attachment 8751437 [details] [diff] [review] If we end up erroring out of a worker main-script load before we get to creating a global for the worker, don't try to enter that global's compartment to throw an exception. Just squelch it instead Fix for a new crash that shows up only in Fx47, Beta47+, Aurora48+
You need to log in before you can comment on or make changes to this bug.