Crash in mozilla::dom::WindowBinding::get_onunload

RESOLVED FIXED in Firefox 54

Status

()

defect
--
critical
RESOLVED FIXED
2 years ago
6 months ago

People

(Reporter: marcia, Assigned: bzbarsky)

Tracking

({crash})

53 Branch
mozilla55
x86
Windows 10
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox53+ wontfix, firefox54+ fixed, firefox55+ fixed)

Details

(crash signature)

Attachments

(1 attachment)

This bug was filed from the Socorro interface and is 
report bp-22f19d31-7768-4f89-b1ba-0a0820170531.
=============================================================

Guessing on the component, please rebucket.

Seen while looking at the explosiveness report: https://crash-analysis.mozilla.com/release-mgmt/2017-05-31/2017-05-31.firefox.53.explosiveness.html

This seems to have spiked in 53.0.3 and does occur in 54 as well.

Comments:

I clicked a facebook notification icon and FF crached. 
watching facebook, screen blanked,sorry error message appeared. 
viewing facebook
Boris, do you know what could be happening here? (Sorry, I'm not sure who to ask about bindings things other than you and Peter and I rolled the dice)
Flags: needinfo?(bzbarsky)
The other person to ask is qdot.

Anyway, the relevant bit of get_onunload on the 53 release is this:

  RefPtr<EventHandlerNonNull> result(self->GetOnunload());
  MOZ_ASSERT(!JS_IsExceptionPending(cx));
  if (result) {
    args.rval().setObject(*GetCallbackFromCallbackObject(result));
    if (!MaybeWrapObjectOrNullValue(cx, args.rval())) {

where the crashing line is that MaybeWrapObjectOrNullValue call.

Now the thing is... GetCallbackFromCallbackObject can return null.  And clearly the MaybeWrapObjectOrNullValue is expecting that the value might be null (though this is more or less accidental in this case).  But the setObject() call and its dereference of its argument are assuming null was not returned.  

Now it used to be the case that GetCallbackFromCallbackObject could not return null, until bug 1273251 was fixed.  So this is in fact a regression in Firefox 53.

The proximate fix is to use setObjectOrNull here.  But it looks to me like in the new setup of bug 1273251 we should disallow non-nullable callback-typed return values, because we _can_ in fact return null from any such thing if the callback has been nuked.
Blocks: 1273251
Flags: needinfo?(bzbarsky)
Er, I meant to cc folks from bug 1273251.
> we should disallow non-nullable callback-typed return values

This is actually somewhat of a PITA, I expect.  In particular, it means that we can't have non-nullable callback-typed arguments to JS-implemented stuff, right?  Because there's absolutely nothing that prevents them from being nuked after the CallbackObject is created...

Similarly, it's not clear to me what the comment on CallbackObject::CallbackOrNull (see <http://searchfox.org/mozilla-central/rev/972b7a5149bbb2eaab48aafa87678c33d5f2f045/dom/bindings/CallbackObject.h#87-89>) means.  The callee it's talking about generally doesn't control whether script got to run after the CallbackObject was created!
Flags: needinfo?(peterv)
Flags: needinfo?(kmaglione+bmo)
Hrm. So if bug 1273251 is responsible, that means that an extension is setting a cross-compartment onunload property and then getting nuked. Which extensions shouldn't be doing (for a variety of reasons), so we could consider disallowing it.

(In reply to Boris Zbarsky [:bz] (if a patch has no decent message, automatic r-) from comment #4)
> This is actually somewhat of a PITA, I expect.  In particular, it means that
> we can't have non-nullable callback-typed arguments to JS-implemented stuff,
> right?  Because there's absolutely nothing that prevents them from being
> nuked after the CallbackObject is created...

We could just prevent them being set to values that belong to compartments that can be nuked.

> Similarly, it's not clear to me what the comment on
> CallbackObject::CallbackOrNull (see
> <http://searchfox.org/mozilla-central/rev/
> 972b7a5149bbb2eaab48aafa87678c33d5f2f045/dom/bindings/CallbackObject.h#87-
> 89>) means.  The callee it's talking about generally doesn't control whether
> script got to run after the CallbackObject was created!

It's particularly referring to the few places where we take a CallbackObject as an argument to a binding method and immediately retrieve the callback function from it. In those cases, there's no chance for the object to be nuked between the time the CallbackObject is created and the time the function is retrieved.

I suppose it may be just about possible for the binding code to trigger a getter with synchronously nukes a sandbox, but that would have to be done by a chrome caller, intentionally, and even then I'm not sure it would be possible.
Flags: needinfo?(kmaglione+bmo)
> We could just prevent them being set to values that belong to compartments that can be nuked.

Are we sure this doesn't include APIs that extensions might actually want to use legitimately?

> In those cases, there's no chance for the object to be nuked between the time the CallbackObject is created and the time the function is retrieved.

I don't believe this statement is true.

> I suppose it may be just about possible for the binding code to trigger a getter with synchronously nukes a sandbox

It's possible for the binding code to trigger a getter which spins the event loop (e.g. via alert()) and then while that's happening chrome code happens to nuke a sandbox, etc.

Which APIs/codepaths are currently depending on this "CallbackOrNull won't return null in this situation" bit?  getWrapObjectBody for CGJSImplClass does, but that "callback" is the chrome-side impl of the object, so isn't subject to nuking, so I _think_that code is safe.  Is there anyone else?
Flags: needinfo?(kmaglione+bmo)
I spun off bug 1369533 on the "we should disallow non-nullable callback-typed return values" issue.
Blocks: 1369533
(In reply to Boris Zbarsky [:bz] (if a patch has no decent message, automatic r-) from comment #7)
> > We could just prevent them being set to values that belong to compartments that can be nuked.
> 
> Are we sure this doesn't include APIs that extensions might actually want to
> use legitimately?

It might, but it would only be an issue for cross-compartment setters. For event listener properties, they can (and should) use addEventListener instead, especially when adding listeners to objects in content compartments. If there are other cases where this would apply, I can't think of them.

> > In those cases, there's no chance for the object to be nuked between the time the CallbackObject is created and the time the function is retrieved.
> 
> I don't believe this statement is true.
> 
> > I suppose it may be just about possible for the binding code to trigger a getter with synchronously nukes a sandbox
> 
> It's possible for the binding code to trigger a getter which spins the event
> loop (e.g. via alert()) and then while that's happening chrome code happens
> to nuke a sandbox, etc.

I suppose that's true. If some content script gets a chance to spin the event loop while a callback function from an add-on compartment is being passed to a DOM method, and that add-on compartment is nuked before the time it's finished, it may be possible for it to be unexpectedly null by the time it's accessed.

At the time I wrote this, at least, though, all of the affected functions took callbacks as their last arguments, and there wasn't a chance for any getters to run after they were created.

> Which APIs/codepaths are currently depending on this "CallbackOrNull won't
> return null in this situation" bit?  getWrapObjectBody for CGJSImplClass
> does, but that "callback" is the chrome-side impl of the object, so isn't
> subject to nuking, so I _think_that code is safe.  Is there anyone else?

At this point, it looks like just some event listener setters, InstallTrigger, and a few WebRTC methods.
Flags: needinfo?(kmaglione+bmo)
> but it would only be an issue for cross-compartment setters

I'm not sure what you mean.  If an extensions tries to use webrtc on a web page right now, it will be passing its callbacks through a JS-implemented API that exercises the "convert this secretly nullable thing to JS" codepaths I'm worried about.  I haven't done an exhaustive audit of other APIs that might also be affected.

> At the time I wrote this, at least, though, all of the affected functions took callbacks as their last arguments

Um... addEventListener hasn't done that ever, and in particular has had a dictionary argument after the callback ever since Firefox 49.

> At this point, it looks like just some event listener setters, InstallTrigger, and a few WebRTC methods.

Can you point me to where those assumptions are made?

Also happy to move this part of the conversation to a separate bug...
Flags: needinfo?(kmaglione+bmo)
[Tracking Requested - why for this release]: Crash regression
(In reply to Boris Zbarsky [:bz] (if a patch has no decent message, automatic r-) from comment #10)
> > but it would only be an issue for cross-compartment setters
> 
> I'm not sure what you mean.  If an extensions tries to use webrtc on a web
> page right now, it will be passing its callbacks through a JS-implemented
> API that exercises the "convert this secretly nullable thing to JS"
> codepaths I'm worried about.  I haven't done an exhaustive audit of other
> APIs that might also be affected.

There are two cases here: 1) We store the CallbackObject and the value in the stored callback eventually becomes null. 2) We immediately retrieve the callback function, expecting it to always be non-null, and store that rather than storing the CallbackObject.

Event listener properties are case 1, WebRTC is case 2 (at least in the instances that I'm aware of).

> > At the time I wrote this, at least, though, all of the affected functions took callbacks as their last arguments
> 
> Um... addEventListener hasn't done that ever, and in particular has had a
> dictionary argument after the callback ever since Firefox 49.

addEventListener isn't affected by this. It stores the CallbackObject directly, null checks where necessary, and doesn't expose stored listeners via any web-accessible API. We do expose them via some internal APIs, but we null check and remove dead listeners in those cases.

> > At this point, it looks like just some event listener setters, InstallTrigger, and a few WebRTC methods.
> 
> Can you point me to where those assumptions are made?
> 
> Also happy to move this part of the conversation to a separate bug...

I can give you a complete list in a separate bug, but for example, for install trigger, the CallbackObject is generated:

http://searchfox.org/mozilla-central/source/__GENERATED__/dom/bindings/InstallTriggerBinding.cpp#679-704

And then accessed:

http://searchfox.org/mozilla-central/source/__GENERATED__/dom/bindings/InstallTriggerBinding.cpp#1363

The WebRTC cases are similar, but don't involve JS-implemented bindings.
Flags: needinfo?(kmaglione+bmo)
> There are two cases here:

Yes, agreed.  I'm talking about case 2 here.

The auto-generated callsites for JS-implemented stuff like InstallTrigger and WebRTC (which does involve JS-implemented bindings, last I checked) will be fixed by a combination of the fix in this bug and fixes in bug 
1369533, hopefully...
Oh, and thinking about this further, it's not actually possible for a CallbackObjects generated for most binding methods to become null before binding method returns, since we only ever clear the callback function references when traced by the cycle collector, and in the non-JS-implemented binding cases, we manually root the CallbackObjects, and only call HoldJSObjects if it's still held after the binding method returns.
Ah, indeed.  That is a very good point!

We could still have issues with a CallbackObject in a sequence/dictionary/union, possibly.
Tracking this crash regression for all branches based on Comment 11.
We're a week away from releasing 54 so I don't think we need to track this for 53. It's still possible to uplift a patch to 54 if we think it's worth the last minute risk.
The spot fix patch is _very_ safe.

Given that Peter is not responding, qdot, would you feel ok reviewing this?
Flags: needinfo?(kyle)
Comment on attachment 8873616 [details] [diff] [review]
Spot fix for the crash, leaving the bigger problem still open

Review of attachment 8873616 [details] [diff] [review]:
-----------------------------------------------------------------

Read through the bug comments thread, and I agree that the spotfix seems safe and should at least do the job for the moment..
Attachment #8873616 - Flags: review+
Flags: needinfo?(kyle)
Comment on attachment 8873616 [details] [diff] [review]
Spot fix for the crash, leaving the bigger problem still open

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1273251
[User impact if declined]: Crashes.
[Is this code covered by automated tests?]: No.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: Presumably need an
   extension that sets onunload (or some other event handler) and then gets
   disabled....
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No
[Why is the change risky/not risky?]: It effectively just adds a null-check in
   a place where null can be expected.
[String changes made/needed]: None
Attachment #8873616 - Flags: review?(peterv)
Attachment #8873616 - Flags: approval-mozilla-release?
Attachment #8873616 - Flags: approval-mozilla-beta?

Comment 21

2 years ago
Pushed by bzbarsky@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8aee9eb0b349
DOM callbacks can now store a null object; teach codegen to handle that.  r=qdot
https://hg.mozilla.org/mozilla-central/rev/8aee9eb0b349
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment on attachment 8873616 [details] [diff] [review]
Spot fix for the crash, leaving the bigger problem still open

Fix a crash. Beta54+. Should be in 54 beta RC1.
Attachment #8873616 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Assignee: nobody → bzbarsky
Comment on attachment 8873616 [details] [diff] [review]
Spot fix for the crash, leaving the bigger problem still open

As the 54 was merged to m-r and mark 53 won't fix, remove the approval‑mozilla‑release flag.
Attachment #8873616 - Flags: approval-mozilla-release? → approval-mozilla-release-
I was unable to reproduce this crash on a few affected builds (Nightly 55.0a1 2017-06-01 x86 and 53.0.3
Build ID 20170518000419) on the following OSes:
- Windows 10 x64
- macOS 10.12.5
- Ubuntu 16.04 x64

I've tried to follow the STR provided on comment 0, but FF did not crashed on Facebook when clicking notifications. So, I cannot confirm if this crash is fixed or not, since I wasn't able to reproduce it.
Just to be clear, reproducing this involves using some extensions (which ones?) and then getting unlucky with the timing of extension unloading and page unloading and GC all interacting.

So lack of ability to reproduce in a clean profile means absolutely nothing; this bug is not reproducible in a clean profile to start with.
Flags: needinfo?(peterv)
You need to log in before you can comment on or make changes to this bug.