Closed Bug 1328861 (CVE-2017-5401) Opened 7 years ago Closed 7 years ago

Crash in arena_dalloc | mozilla::binding_danger::TErrorResult<T>::ClearDOMExceptionInfo

Categories

(Core :: DOM: Core & HTML, defect)

50 Branch
x86_64
Linux
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla54
Tracking Status
firefox-esr45 52+ verified
firefox51 + wontfix
firefox52 + verified
firefox-esr52 52+ verified
firefox53 + verified
firefox54 + verified

People

(Reporter: devel, Assigned: bzbarsky)

References

Details

(5 keywords, Whiteboard: [post-critsmash-triage][adv-main52+][adv-esr45.8+])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-57766b22-bc36-42be-b8b0-e60982170105.
=============================================================

This is a blank Firefox profile I created to create a clean report of this bug. Steps to reproduce (happens every time):

1. Install Zotero for Firefox 4.0.29.16 from https://www.zotero.org/download/
2. Install YesScript 2.2 from addons.mozilla.org
3. Restart Firefox
4. Go to www.google.com
5. Click the YesScript blacklist button
6. Press F5 to refresh page
7. Firefox crashes.

Platform: Ubuntu 16.04 64-bit. Firefox 50.1.0.

Cross-reported to both YesScript and Zotero:

* https://github.com/JasonBarnabe/yesscript/issues/13
* https://forums.zotero.org/discussion/63733/854545803-firefox-crashes-when-addons-yesscript-and-zotero-are-both-installed

Not sure I would classify this as critical severity, but that is what the reporting tool defaulted to.
I tried Nightly on Win 7, I can't reproduce the crash.
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0

I have tested this issue on latest Firefox release v50.1.0 and latest Firefox Beta (v51.0b12) and the issue is reproducible.
Also, I have tried to reproduce the issue on latest Nightly (Build ID: 20170109030209) and latest Dev Edition and couldn't reproduce it. I did a find fix range with mozregression and reached the following pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=10daeea4d322f4d396538ebac11b4bfdd5f5a798&tochange=578dfd6dba5e20d87b7642acef8d093d8be8e268 

It seems the fix was made in bug 1293305. The fix will probably reach to Firefox release version in v52.
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Devel, could you download Fiefox DE and confim it's fixed too?
https://www.mozilla.org/en-US/firefox/developer/all/
Depends on: 1293305
Flags: needinfo?(devel)
I've tested with e10s ON and OFF. On Nightly versions e10s remains ON even if the add-ons are installed. On the release version e10s is disabled when installing add-ons. 

The issue is reproducible on Nightly with e10s OFF. When testing try both scenarios, e10s ON and e10s OFF.
It's all fine in 52.0.a2. No crashes and both addons work after preliminary testing. According to Add-on Compatibility Reporter, "multiprocess is enabled". Thanks for looking into this bug! I'm looking forward to when 52 gets released as stable.
Flags: needinfo?(devel)
I can still reproduce this crash in Beta 52 and Aurora 53. You don't need to install the YesScript extension. You can just flip the "javascript.enabled" pref to false, install the "Zotero for Firefox" extension, then load any web page.

@ Nick, could this crash be a regression from your CC bug 1261106?

I bisected this crash to the following regression range. Your commit looks suspicious because it deals with JS memory management.

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c4f86ad95de8bd9bb32727f7221e5d57e6ec1bfd&tochange=7db4faf3e32b32c9862c92d895ac5a014b29738f

Here is one of my crash reports:

https://crash-stats.mozilla.com/report/index/6f90b7df-5c4e-4e28-975c-db3b02170209
Status: RESOLVED → REOPENED
Crash Signature: [@ arena_dalloc | mozilla::binding_danger::TErrorResult<T>::ClearDOMExceptionInfo] → [@ arena_dalloc | mozilla::binding_danger::TErrorResult<T>::ClearDOMExceptionInfo] [@ ReleaseData ]
Ever confirmed: true
Flags: needinfo?(nfitzgerald)
Keywords: crash, reproducible
Resolution: WORKSFORME → ---
btw, I can't reproduce this crash in Nightly 54. I bisected the possible fix range to this (unfortunately quite long) pushlog on the autoland repo:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=352cd8e264990a86fb486088e0a83b20aa9a1274&tochange=27eef31f9434af59344a313481ada7366c151f59
(In reply to Chris Peterson [:cpeterson] from comment #6)
> @ Nick, could this crash be a regression from your CC bug 1261106?

Every patch that landed in that bug was backed out before getting into a nightly, so I am pretty sure it isn't that bug's patches.

A bit of an aside, but I concluded that shutdown interaction between the CC and GC is so borked and complicated that the investment required to fix it isn't worth the pay off :-/
Flags: needinfo?(nfitzgerald)
Boris, could this crash be a regression from your fix for "Provide better error messages when CallSetup fails" bug 1269400? It is in the regression range and touches DOM bindings code, which is in the neighborhood of this ClearDOMExceptionInfo() crash.
Flags: needinfo?(bzbarsky)
Hmm.  I dug into this a bit in bug 1313340 (of which this seems to be a duplicate).

But yes, this could be related to bug 1269400, especially given the steps to reproduce from comment 6, which would in fact trigger the modified codepath (due to disabling script).

Having steps to reproduce is huge.  Let me spin up a debug beta build and take a look!
Blocks: 1269400
Flags: needinfo?(bzbarsky)
Yes, this is totally a regression from bug 1269400.  A debug build makes it pretty obvious what happens.  The sequence of events is as follows:

1)  document.evaluate is called.  How is it called if script is disabled, you might ask?  Extensions, of course.  I did not debug the details, past noting that this call is coming from a sandbox with some web page principal attached; we call into that sandbox via Xrays.  Presumably whatever codepath evals in sandbox doesn't check for script-enabled?  In any case, this function takes an XPathNSResolver? argument, which is a WebIDL callback.

2)  We call XPathEvaluator::Evaluate and thence XPathEvaluator::CreateExpression.  We're passing along that callback argument.

3)  We stash the callback in an XPathEvaluatorParseContext and call CreateExpression.

4)  While parsing the thing we were passed, we call the callback to do that namespace-resolving thing it's supposed to do.

5)  The callback function comes from that same sandbox, as far as I can tell.  We deny the call to it, because we decide that the script is disabled for this sandbox (via xpc::Scriptability::Get(realCallback).Allowed()).  This seems weird; I filed bug 1338313 on that.

6)  We hit the code added in bug 1269400 and ThrowDOMException() on the ErrorResult.

7)  We unwind to XPathEvaluatorParseContext::resolveNamespacePrefix which has code like this:

   217          ErrorResult rv;
   218          mResolver->LookupNamespaceURI(prefix, ns, rv);
   219          if (rv.Failed()) {
   220              return rv.StealNSResult();

Note that in this case it returns NS_ERROR_DOM_DOMEXCEPTION.  This is all sort of ok so far, but then...

8) We keep unwinding to XPathEvaluator::CreateExpression where we do this:

   150      aRv = txExprParser::createExpr(PromiseFlatString(aExpression), aContext,
   151                                     getter_Transfers(expression));
   152      if (aRv.Failed()) {
   153          if (!aRv.ErrorCodeIs(NS_ERROR_DOM_NAMESPACE_ERR)) {
   154              aRv.SuppressException();
   155              aRv.Throw(NS_ERROR_DOM_INVALID_EXPRESSION_ERR);
   156          }

where aRv is an ErrorResult.  Keep in mind that NS_ERROR_DOMEXCEPTION is what got returned.  This, of course, asserts in TErrorResult::AssignErrorCode from that assignment to aRv:

    MOZ_ASSERT(aRv != NS_ERROR_DOM_DOMEXCEPTION, "Use ThrowDOMException()");

but then goes ahead and sets mResult = aRv.  Then aRv.Failed() tests true, aRv.ErrorCodeIs(NS_ERROR_DOM_NAMESPACE_ERR) tests false, so we call aRv.SuppressException().  This tries to do cleanup based on mResult being NS_ERROR_DOM_DOMEXCEPTION, which involves "delete mDOMExceptionInfo;"... but mDOMExceptionInfo is pointing to random memory (because we don't actually have anything in our union; if we got this far in a debug build without failing the earlier asserts we'd fail the mUnionState assert in TErrorResultClearDOMExceptionInfo).  Typically we crash as a result.

OK, so there are several possible fixes:

1)  Make TErrorResult::AssignErrorCode safer and have it sanitize any of the "disallowed" values (specifically NS_ERROR_TYPE_ERR, NS_ERROR_RANGE_ERR, NS_ERROR_DOM_JS_EXCEPTION, and NS_ERROR_DOM_DOMEXCEPTION) to some innocuous nsresult value.  This is probably the generally safest fix, since it makes sure we can't end up in a situation where we have on of those as mResult.

2)  Make TErrorResult::StealNSResult sanitize those "disallowed" values.

Option 2 is arguably a bit faster.  Option 1 is safer.  Ccing some DOM folks who may have opinions.

Anyway, I just went and looked at where we use those 4 error codes.  NS_ERROR_DOM_JS_EXCEPTION and NS_ERROR_DOM_DOMEXCEPTION are only used inside ErrorResult.  

NS_ERROR_RANGE_ERR is used in nsSVGTransform::SetSkewX/Y.  Luckily, no one looks at the return values.

NS_ERROR_TYPE_ERR is used all sorts of places.  A bunch of them return directly via xpconnect, but a few...

* nsGlobalWindow::CreateImageBitmap has some "we don't handle that" codepaths that throw it.  We clearly didn't add tests for those codepaths.  And indeed, window.createImageBitmap(document.createElement("canvas"), 0, 0, "RGBA32", []) will corrupt memory for you very nicely.  This was added in bug 1141979.  Similar for WorkerGlobalScope::CreateImageBitmap, though it would need something more like createImageBitmap(new Blob(['']), 0, 0, 'RGBA32', []).  I filed bug 1338319 on this.

* Various serviceworker stuff, which I can't trace reliably in a short period of time.  At least the pieces that reject promises with NS_ERROR_TYPE_ERR are safe, because that doesn't go through an ErrorResult.

* AudioEventTimeline.  I can't figure out whether there's a way to sneak non-finite floats into there, but if there is, it's bad.

* DataTransfer::SetDataAtInternal when aFormat is "application/x-moz-custom-clipdata".  Evaluating new DataTransfer("copy", false).setData("application/x-moz-custom-clipdata", "") in the browser console does the memory corruption thing too.  I'm sure a web page could do that from a relevant event handler.  Filed bug 

Alright, so given all that I think I'm going to go with option 1.  And we should consider using less-likely-to-be-reached-for nsresult names here, I guess... :(

I have not filed a bug on the DataTransfer thing yet.
Group: dom-core-security
Component: Untriaged → DOM
Product: Firefox → Core
For what it's worth, 1141979 also dates back to 49.  The DataTransfer code dates back to bug 860857, which is Firefox 48.  So I _think_ esr45 is unaffected by all this.

The createImageBitmap crash is reproducible just fine with a current-ish tip build.

The "zotero + disabled script" thing is not, probably because on tip various parts of the zotero add-on are broken due to "for each" support being removed on nightly only.  That happened in bug 1293305.  I did just try re-enabling "for each" in my tip opt build, and then it reproduces the "zotero + disabled script" crash just fine.
> and then it reproduces the "zotero + disabled script" crash just fine.

Er, maybe not.  I was running the wrong build there.  But anyway, I don't think anything happened to the underlying issue in the meantime.  Maybe we're getting lucky and our random pointer happens to be null...
Upon further thought, I'm going to do both options 1 and 2.  Option 1 is needed for code that sort of reasonably does StealNSResult() and then throws it on another ErrorResult to not fail with fatal asserts when interesting things were thrown on the source ErrorResult.  Option 2 is needed to deal with the various places that throw our internal nsresults.  :(
Assignee: nobody → bzbarsky
Status: REOPENED → ASSIGNED
I think we should check this in and backport it, then maybe do a followup where we use some more-clearly-private nsresult values and remove the bits in AssignErrorCode.  Maybe.
Comment on attachment 8835773 [details] [diff] [review]
Be a little less trusting about our error codes in ErrorResult

[Security approval request comment]
How easily could an exploit be constructed based on the patch? It's hard to say.  Once I figured out the basic problem, it took just a few minutes of code searching to find the bugs in createImageBitmap and DataTransfer.  Unfortunately, I can't think of a way to write this patch that doesn't make it clear that _something_ is up with ErrorResult mResult values.  :(

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?  No.

Which older supported branches are affected by this flaw?  In theory, all of them, but in practice I think all the problem callsites were added in 48 or later.  So anything newer than that.

If not all supported branches, which bug introduced the flaw?  There are multiple separate flaws, introduced in bug 1269400, bug 1141979, bug 860857.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?  This applies cleanly to 52 and tip and anything in between.

How likely is this patch to cause regressions; how much testing does it need? I think this is a pretty safe patch.
Attachment #8835773 - Flags: sec-approval?
Comment on attachment 8835773 [details] [diff] [review]
Be a little less trusting about our error codes in ErrorResult

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: Crashes, crashes everywhere.  Also, I think this is sec:crit.
User impact if declined: Crashes.  And security bugs.
Fix Landed on Version: Will be 54.
Risk to taking this patch (and alternatives if risky): Low risk, in my opinion. 
  The alternatives all seem riskier, including doing nothing.
String or UUID changes made by this patch:  None.

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Approval Request Comment
[Feature/Bug causing the regression]: bug 1269400, bug 1141979, bug 860857
[User impact if declined]: Crashes, security bugs.  Repetitive template
   syndrome.
[Is this code covered by automated tests?]: No, which is the problem; we have
   assertions for all this stuff already.  More precisely, we have general
   coverage of ErrorResult, obviously, but not for the three codepaths that we
   have found so far that can cause issues.
[Has the fix been verified in Nightly?]: Not yet.
[Needs manual test from QE? If yes, steps to reproduce]: Probably a good idea. 
   See comment 6 for steps.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: In my opinion, no.
[Why is the change risky/not risky?]: It just changes which exact exceptions we
   end up throwing in some cases.  And the cases it changes would likely lead
   to a crash otherwise.
[String changes made/needed]: None.
Attachment #8835773 - Flags: approval-mozilla-release?
Attachment #8835773 - Flags: approval-mozilla-esr45?
Attachment #8835773 - Flags: approval-mozilla-beta?
Attachment #8835773 - Flags: approval-mozilla-aurora?
The bug I filed for the datatransfer thing is bug 1338328.
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #12)

> * nsGlobalWindow::CreateImageBitmap has some "we don't handle that"
> codepaths that throw it.  We clearly didn't add tests for those codepaths. 
> And indeed, window.createImageBitmap(document.createElement("canvas"), 0, 0,
> "RGBA32", []) will corrupt memory for you very nicely.  This was added in
> bug 1141979.  Similar for WorkerGlobalScope::CreateImageBitmap, though it
> would need something more like createImageBitmap(new Blob(['']), 0, 0,
> 'RGBA32', []).  I filed bug 1338319 on this.
Urgh, error reporting is really fragile. Throwing NS_ERROR_TYPE_ERR should just work.
Comment on attachment 8835773 [details] [diff] [review]
Be a little less trusting about our error codes in ErrorResult

>   nsresult StealNSResult() {
>     nsresult rv = ErrorCode();
>     SuppressException();
>+    // Don't propagate out our internal error codes that have special meaning.
>+    if (rv == NS_ERROR_TYPE_ERR ||
>+        rv == NS_ERROR_RANGE_ERR ||
>+        rv == NS_ERROR_DOM_JS_EXCEPTION ||
>+        rv == NS_ERROR_DOM_DOMEXCEPTION) {
>+      // What about NS_ERROR_DOM_EXCEPTION_ON_JSCONTEXT?  I guess that can be
>+      // legitimately passed on through....
>+      // What to pick here?
>+      return NS_ERROR_DOM_INVALID_STATE_ERR;
>+    }
Hmm, I guess this is fine, though it is sad to lose information.


>@@ -390,17 +401,25 @@ private:
>     MOZ_ASSERT(!IsErrorWithMessage(), "Don't overwrite errors with message");
>     MOZ_ASSERT(aRv != NS_ERROR_DOM_JS_EXCEPTION, "Use ThrowJSException()");
>     MOZ_ASSERT(!IsJSException(), "Don't overwrite JS exceptions");
>     MOZ_ASSERT(aRv != NS_ERROR_DOM_DOMEXCEPTION, "Use ThrowDOMException()");
>     MOZ_ASSERT(!IsDOMException(), "Don't overwrite DOM exceptions");
>     MOZ_ASSERT(aRv != NS_ERROR_XPC_NOT_ENOUGH_ARGS, "May need to bring back ThrowNotEnoughArgsError");
>     MOZ_ASSERT(aRv != NS_ERROR_DOM_EXCEPTION_ON_JSCONTEXT,
>                "Use NoteJSContextException");
>-    mResult = aRv;
>+    // Don't trust people anyway, though.
>+    if (aRv == NS_ERROR_TYPE_ERR ||
>+        aRv == NS_ERROR_RANGE_ERR ||
>+        aRv == NS_ERROR_DOM_JS_EXCEPTION ||
>+        aRv == NS_ERROR_DOM_DOMEXCEPTION) {
>+      mResult = NS_ERROR_UNEXPECTED;
>+    } else {
>+      mResult = aRv;
>+    }
>   }
We really must rename these magical error codes to something else. Just looking at the ErrorResult API nothing hints that some NS_ERROR_ shouldn't be used
(well, unless one reads all the assertions).
Could we for now make NS_ERROR_TYPE_ERR to call ThrowTypeError() and NS_ERROR_RANGE_ERR ThrowRangeError() ? But perhaps that isn't needed if we rename the error codes and fix callers and keep this patch for branches.

r+ for branches, but this is very broken setup, so we must change error code names (or fix the API in some other way). NS_ERROR_DANGER_BINDING_LAYER_INTERNAL_TYPE_ERR or some such might be scary enough. Same for the other errors.
Attachment #8835773 - Flags: review?(bugs) → review+
> Throwing NS_ERROR_TYPE_ERR should just work.

Work and do what?  NS_ERROR_TYPE_ERR is a standin for "synthesize a TypeError", and that only works with ErrorResult.  Throwing it in general is not very meaningful.

Again, we can just move ErrorResult to a different error code here, but then I would argue we should remove NS_ERROR_TYPE_ERR because it's just an attractive nuisance that doesn't do what people want.

> Hmm, I guess this is fine, though it is sad to lose information.

I agree.  The reason we need this part is because I want to keep the asserts in AssignErrorCode and doing aRv = otherRv.StealNSResult() (indirectly) should really not trigger them.  I'm open to better ideas.  I mean, long-term the answer is to disallow throwing an nsresult on an ErrorResult or something, but that's pretty long-term.

> We really must rename these magical error codes to something else

Yes.  I will do that in a followup.  I'd like to land a single patch here on trunk and branches first to fix the immediate issue.

> Could we for now make NS_ERROR_TYPE_ERR to call ThrowTypeError() and NS_ERROR_RANGE_ERR ThrowRangeError() ?

We could, I guess, with some sort of default message.  But I'd rather not do that, honestly; people should be doing things the right way instead.
Comment on attachment 8835773 [details] [diff] [review]
Be a little less trusting about our error codes in ErrorResult

sec-approval+ for trunk and, hell, I'll give Aurora and Beta too. I'll let release management do ESR45 though.
Attachment #8835773 - Flags: sec-approval?
Attachment #8835773 - Flags: sec-approval+
Attachment #8835773 - Flags: approval-mozilla-beta?
Attachment #8835773 - Flags: approval-mozilla-beta+
Attachment #8835773 - Flags: approval-mozilla-aurora?
Attachment #8835773 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/mozilla-central/rev/150d0b07434b
Status: ASSIGNED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Comment on attachment 8835773 [details] [diff] [review]
Be a little less trusting about our error codes in ErrorResult

Fix a crash and sec issue. ESR45+.
Attachment #8835773 - Flags: approval-mozilla-esr45? → approval-mozilla-esr45+
It's sec-critical, imo.  We're deallocating a random address; if it happens to be the actual address of something, we have a problem.

Going to go out on a limb and just rate this.
Keywords: sec-critical
> I will do that in a followup. 

Bug 1339540.
Updating the tracking flags for all branches except ESR 45, which doesn't seem to have the correct version in the tracking dropdown.
Blocks: 1338319
Group: dom-core-security → core-security-release
Flagging this for manual testing, instructions in Comment 6.
Flags: qe-verify+
Comment on attachment 8835773 [details] [diff] [review]
Be a little less trusting about our error codes in ErrorResult

We don't have then plan to have dot release for 51. Rel51-.
Attachment #8835773 - Flags: approval-mozilla-release? → approval-mozilla-release-
Whiteboard: [post-critsmash-triage]
I reproduced this issue using Fx 51.0.1, on Ubuntu 14.04 LTS.
I can confirm this issue is fixed, I verified Fx 54.0a1, build ID: 20170227030203, Fx53.0a2, build ID: 20170227004004, Fx52.0b9, build ID: 20170223185858, Fx 45.8.0esr and Fx 52.0esr Ubuntu 14.04 LTS.

Cheers!
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main52+][adv-esr45.8+]
Alias: CVE-2017-5401
Blocks: 1141979, 860857
Keywords: regression
Group: core-security-release
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: