Closed Bug 1291535 Opened 4 years ago Closed 11 days ago

Crash in xpc::NativeGlobal

Categories

(Core :: JavaScript Engine, defect, critical)

Unspecified
Windows
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla74
Tracking Status
firefox48 --- wontfix
firefox49 --- wontfix
firefox-esr45 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox-esr60 --- wontfix
firefox-esr68 --- fixed
firefox64 --- wontfix
firefox65 --- wontfix
firefox72 --- wontfix
firefox73 --- fixed
firefox74 --- fixed

People

(Reporter: ting, Assigned: jonco)

References

(Regression, )

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-0e195650-7323-4eb6-8741-654022160802.
=============================================================
#9 with 0731 Nightly on Windows, 18 crashes from 14 installations.
20 crashes for 0801 nightly.

Stack:

xpc::NativeGlobal
`anonymous namespace'::MessageEventRunnable::DispatchDOMEvent
`anonymous namespace'::MessageEventRunnable::WorkerRun
mozilla::dom::workers::WorkerRunnable::Run
nsThread::ProcessNextEvent
NS_ProcessNextEvent
mozilla::dom::workers::WorkerPrivate::DoRunLoop
`anonymous namespace'::WorkerThreadPrimaryRunnable::Run
nsThread::ProcessNextEvent
NS_ProcessNextEvent
mozilla::ipc::MessagePumpForNonMainThreads::Run
MessageLoop::RunHandler
MessageLoop::Run
nsThread::ThreadFunc
...

I looked at the minidump and find native is 0 at https://hg.mozilla.org/mozilla-unified/annotate/465d150bc8be/js/xpconnect/wrappers/WrapperFactory.cpp#l633 so we are likely getting the new native from https://hg.mozilla.org/mozilla-unified/annotate/465d150bc8be/js/xpconnect/wrappers/WrapperFactory.cpp#l642

We crashed trying to dereference native (address 0xfffc000030828d40)

Andrea, Boris, any idea?
Flags: needinfo?(bzbarsky)
Flags: needinfo?(amarchesini)
> and find native is 0 at https://hg.mozilla.org/mozilla-unified/annotate/465d150bc8be/js/xpconnect/wrappers/WrapperFactory.cpp#l633

Uh... That is quite odd.  That means the global is not a Web IDL global.  But I believe the only non-webidl globals we have are messagemanager globals.  And I somewhat doubt those are doing worker stuff.

If you have a minidump and can try debugging this a bit, I'd be interested in the following pieces of information:

1)  In `anonymous namespace'::MessageEventRunnable::DispatchDOMEvent, what is this->mBehavior?

2)  In xpc::NativeGlobal, what are obj->getClass()->name and obj->getClass()->flags?  If you have to just walk data structures and can't call functions, I believe you can replace obj->getClass() with reinterpret_cast<const js::shadow::Object*>(obj)->group->clasp to get this info.

Between those bits of info, we should at least know what sort of thing obj is or might be.
Flags: needinfo?(bzbarsky) → needinfo?(kchen)
(In reply to Boris Zbarsky [:bz] from comment #2)
> > and find native is 0 at https://hg.mozilla.org/mozilla-unified/annotate/465d150bc8be/js/xpconnect/wrappers/WrapperFactory.cpp#l633
> 
> Uh... That is quite odd.  That means the global is not a Web IDL global. 
> But I believe the only non-webidl globals we have are messagemanager
> globals.  And I somewhat doubt those are doing worker stuff.
> 
> If you have a minidump and can try debugging this a bit, I'd be interested
> in the following pieces of information:
> 
> 1)  In `anonymous namespace'::MessageEventRunnable::DispatchDOMEvent, what
> is this->mBehavior?
> 
> 2)  In xpc::NativeGlobal, what are obj->getClass()->name and
> obj->getClass()->flags?  If you have to just walk data structures and can't
> call functions, I believe you can replace obj->getClass() with
> reinterpret_cast<const js::shadow::Object*>(obj)->group->clasp to get this
> info.
> 
> Between those bits of info, we should at least know what sort of thing obj
> is or might be.

Unfortunately the minidump does not contain any heap data so I can't look into those object members. However we know that mBehavior != ParentThreadUnchangedBusyCount because we run to here: https://hg.mozilla.org/mozilla-unified/annotate/465d150bc8be/dom/workers/WorkerPrivate.cpp#l832

Also according to https://hg.mozilla.org/mozilla-unified/annotate/465d150bc8be/dom/workers/WorkerPrivate.cpp#l701 this is a worker without window.

The crash I'm looking at is bp-6192e0d1-5e2c-46f2-ae70-b6a182160803

About 12 crash reports has .pdf URL so might be related.
Flags: needinfo?(kchen) → needinfo?(bzbarsky)
See Also: → 1291190
> because we run to here: https://hg.mozilla.org/mozilla-unified/annotate/465d150bc8be/dom/workers/WorkerPrivate.cpp#l832

OK.  So we're not making the call on line 826?  In that case sounds like we're running on the worker thread, ok.

In that case, having aTarget->GetParentObject() is completely expected.  The "worker without a window" comment is talking about the case when aTarget == aWorkerPrivate, which would come through line 826.

But then, modulo memory corruption or something like that, JS::CurrentGlobalOrNull(aCx) should be the global of the worker (or _maybe_ the worker debugger, though that seems unlikely based on the stack aboce) which should certainly not return null from UnwrapDOMObjectToISupports, since WorkerGlobalScope and WorkerDebuggerGlobalScope quite clearly inherit from nsISupports.
Flags: needinfo?(bzbarsky)
As seen in bug 1291190, there could be some memory corruption-y issue here, as some users are getting a bunch of different crashes.
Memory corruption could cause this pretty easily, fwiw.  UnwrapDOMObjectToISupports will return null if the JSClass doesn't have JSCLASS_IS_DOMJSCLASS in its flags or if mDOMObjectIsISupports is false in the DOMJSclass.  So a single memory bitflip can make the whole thing go off the rails...
A fix for bug 1291190 landed in probably the 8-7 Nightly, and this crash persists, so I guess I was wrong about this being related.
See Also: 1291190
See Also: → 1286904
Crash volume for signature 'xpc::NativeGlobal':
 - nightly (version 51): 97 crashes from 2016-08-01.
 - aurora  (version 50): 195 crashes from 2016-08-01.
 - beta    (version 49): 12 crashes from 2016-08-02.
 - release (version 48): 4 crashes from 2016-07-25.
 - esr     (version 45): 9 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       4      13      78
 - aurora       69      96      23
 - beta          5       6       0
 - release       2       0       1
 - esr           0       0       0

Affected platforms: Windows, Mac OS X

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly           #321
 - aurora            #28
 - beta    #4633
 - release #5598
 - esr
Looks like 1291190 is still on going.
Kan-Ru, what should we do here? It looks like a low but non-zero volume of these crashes persists :/
Flags: needinfo?(amarchesini) → needinfo?(kchen)
Priority: -- → P3
Closing because no crashes reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WONTFIX
There are still some crashes so reopen it.

Bughunter can reproduce on Windows with http://auravault.com/

Firefox 72.0.1 : bp-d3426af3-4b37-466b-bf98-e8ec40200109
Nightly opt crashes but it fails to submit the crash report.

Bob, thank you! That site reproduces for me in a debug m-c build as well, and I have an rr trace. Debugging now.

So the basic issue (and for my future reference this is the "firefox-860" rr trace locally) is that we are doing this in ScriptLoader::FinishDynamicImport:

989       MOZ_ALWAYS_TRUE(jsapi.Init(aRequest->mDynamicPromise));

but aRequest->mDynamicPromise is null. In a debug build this crashes when it tries to assert !js::IsCrossCompartmentWrapper at the top of AutoJSAPI::Init, while in an opt build it would then pass null to xpc::NativeGlobal and crash somewhere in there. Crashing on the obj = JS::GetNonCCWObjectGlobal(obj); line totally makes sense in that case. Note that this is not the same crash location as earlier in the bug, but maybe crashreporter was messing things up back then.

Anyway, looking at the reports in crashreporter, all the recent ones are coming via ScriptLoader::FinishDynamicImport. That code dates back to bug 1342012, which is much newer than this bug report. Chances are, a new bug should have been filed instead of reopening this one, but here we are...

Anyway, In ScriptLoader::FinishDynamicImport we have aResult == NS_BINDING_ABORTED, so the dynamic import got canceled (e.g. by the user hitting stop, or navigating away from the page or whatever).

And the reason the promise is null is that we cleared it out by calling FinishDynamicImport from ScriptLoader::ParsingComplete, because aTerminated was true. So we're ending up calling FinishDynamicImport twice for the same ModuleLoadRequest and that blows up.

Presumably we should either stop canceling things in ScriptLoader::ParsingComplete or handle the case when they were canceled there in later callsites to FinishDynamicImport or in FinishDynamicImport itself.

Component: XPConnect → JavaScript Engine
Flags: needinfo?(kanru) → needinfo?(jcoppeard)
Priority: P3 → --
Regressed by: 1342012

e.g. by the user hitting stop, or navigating away from the page or whatever

And since I know I did none of those things, more likely by the iframe involved being removed from the DOM or the page calling window.stop() on it. Which happened before parsing completed, given the rest of the debugging info.

https://pernos.co/debug/R5l4wBNaRDUpvwG4zaWn4w/index.html#f{m[BjdG,AQsm_,t[iw,Q6E_,f{e[BjdG,AQsZ_,s{abG1yFwAA,bAYk,oD/ixFA,uD+1UwA___ is this crash in pernosco: the two notebook entries correspond to the two calls to ScriptLoader::FinishDynamicImport

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #18)
Thanks for debugging this.

I think we need to clear the list of dynamic import requests in ScriptLoader::ParsingComplete when aTerminated is true, like we do for all the other kinds of requests.

I wasn't able to write a crashtest for this. Using window.stop() works, but makes the test harness hang waiting for the load event to fire. I couldn't get a version to work using iframes.

Assignee: nobody → jcoppeard
Flags: needinfo?(jcoppeard)

I think we need to clear the list of dynamic import requests in ScriptLoader::ParsingComplete when aTerminated is true, like we do for all the other kinds of requests.

It's not entirely clear to me that we really need to clear anything there, but OK.

Using window.stop() works, but makes the test harness hang waiting for the load event to fire

Did you call stop() on a toplevel window, or on a subframe that contained the dynamic module load?

I couldn't get a version to work using iframes.

The crashing site is crashing because it navigates a subframe with a pending dynamic module load, right? Did that not reproduce the crash in our test harness?

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #21)
It turns out I didn't understand how to do iframes properly. window.stop() in an iframe does work. I'll update the patch.

Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/260d440770c6
Clear the list of dynamic import requests after cancelling them in ScriptLoader::ParsingComplete as we do for other requests r=bzbarsky
Status: REOPENED → RESOLVED
Closed: 1 year ago11 days ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74

Initial Nightly data is looking good so far. What do you think about nominating this for Beta and ESR approval, Jon?

Flags: needinfo?(jcoppeard)
Flags: in-testsuite+

Comment on attachment 9120445 [details]
Bug 1291535 - Clear the list of dynamic import requests after cancelling them in ScriptLoader::ParsingComplete as we do for other requests r?bzbarsky

Beta/Release Uplift Approval Request

  • User impact if declined: Possible crash when using import() in some situations.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This is a small and simple change and is covered by tests.
  • String changes made/needed:

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: This fixes a crash.
  • User impact if declined: Possible crash when using import() in some situations.
  • Fix Landed on Version: 74
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This is a small and simple change and is covered by tests.
  • String or UUID changes made by this patch:
Flags: needinfo?(jcoppeard)
Attachment #9120445 - Flags: approval-mozilla-esr68?
Attachment #9120445 - Flags: approval-mozilla-beta?

Comment on attachment 9120445 [details]
Bug 1291535 - Clear the list of dynamic import requests after cancelling them in ScriptLoader::ParsingComplete as we do for other requests r?bzbarsky

Fixes a lingering crash with dynamic module imports. Approved for 73.0b8 and 68.5esr.

Attachment #9120445 - Flags: approval-mozilla-esr68?
Attachment #9120445 - Flags: approval-mozilla-esr68+
Attachment #9120445 - Flags: approval-mozilla-beta?
Attachment #9120445 - Flags: approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.