Closed Bug 1268845 Opened 4 years ago Closed 4 years ago

WidevineCDMManifest crash in xpc::AddonWindowOrNull

Categories

(Core :: XPConnect, defect, P1, critical)

48 Branch
x86
Windows 8.1
defect

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox46 --- unaffected
firefox47 --- unaffected
firefox48 + fixed
firefox49 + fixed
firefox-esr45 --- unaffected

People

(Reporter: marco, Assigned: bzbarsky)

References

Details

(Keywords: crash, regression, Whiteboard: btpp-active)

Crash Data

Attachments

(2 files, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-e1bc3a75-6b76-4457-be9b-e5d6b2160429.
=============================================================

Windows startup crash on Firefox 48, #8 top crasher (although there are only a few reports for now, so the crash might be less widespread).

It could be related to Widevine, since in the stack there's the mozilla::dom::WidevineCDMManifest::Init(nsAString_internal const&) function.
Rank: 10
Priority: -- → P1
Anthony -- This is likely Widevine related and a top crasher.  Can you find someone to take this?  Thanks.
Flags: needinfo?(ajones)
Summary: crash in xpc::AddonWindowOrNull → WidevineCDMManifest crash in xpc::AddonWindowOrNull
Daniel - seeing that Bobby Holley is busy, do you have any thought on this crash? It has spiked recently (most likely due to Widevine) but it goes back a long way.
Flags: needinfo?(ajones) → needinfo?(dveditz)
CompartmentPrivate::Get() is returning nullptr, and we're crashing trying to access 'scope' within ObjectScope (within IsInAddonScope).

Looks like 'data' on the JSCompartment was nullptr.
The specific null deref doesn't worry me but I don't know what the implications of having an unexpected null is. Is it reasonable to add a null check and just deal with it, or do we have to work up the chain and see what else has gone wrong?

The "spike" due to Widevine seems to be one person with a startup crash, not a real spike.

https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=14&signature=xpc%3A%3AAddonWindowOrNull#tab-reports

(sort by version and note the "dup" column).
Flags: needinfo?(dveditz)
Anthony says Gerald is looking at Widevine crashes.
Assignee: nobody → gsquelart
Boris, your code seems to be involved in this crash.
Would you mind having a look at it, and giving some clues about the issue, e.g.: who else could take over, or even a fix if you know what's wrong. (Feel free to take the bug. Please!)

Micro-analysis, based on: https://crash-stats.mozilla.com/report/index/e1bc3a75-6b76-4457-be9b-e5d6b2160429
Your code is in AutoJSAPI::ReportException(): https://hg.mozilla.org/releases/mozilla-aurora/annotate/d6d1b8b21016/dom/base/ScriptSettings.cpp#l586
It retrieves 'errorGlobal' somehow, then calls xpc::AddonWindowOrNull(errorGlobal), which calls IsInAddonScope(aObj=errorGlobal), which calls ObjectScope(obj)->IsAddonScope() where there's a null-deref, probably because ObjectScope returned a nullptr (see comment 3) -- Is errorGlobal null itself, or doesn't have the expected contents?
Flags: needinfo?(bzbarsky)
[Tracking Requested - why for this release]: Null-deref crashes on malformed JSON input to some subsystems.

errorGlobal can't be null itself.  If it were, you would have crashed back when AutoJSAPI::ReportException called xpc::WindowGlobalOrNull(errorGlobal).

On the other hand, CompartmentPrivate::Get(obj) inside ObjectScope really ought to return null here; the globals created by SimpleGlobalObject::Create do not have any sort of interesting compartment private.  In fact, the only thing that does are compartments attached to an XPCWrappedNativeScope, which I guess is all other globals on the main thread _except_ these ones.

This should be quite simple to reproduce with web crypto API too, I expect.  Just need to send malformed JSON through the string Init() method on a dictionary.  I'll try to write a testcase tomorrow.

Bobby, do we want to hook up SimpleGlobalObjects to an XPCWrappedNativeScope on the main thread?  If not, seems like we should null-check any uses of CompartmentPrivate::Get() and ObjectScope() that might have SimpleGlobalObject bits flowing through them...

I should note that some consumers along those lines (e.g xpc::IsInContentXBLScope) already perform just such null checks.  So adding the checks seems like the right thing to me, personally.
Blocks: 1246153
Flags: needinfo?(bzbarsky) → needinfo?(bobbyholley)
Keywords: regression
Thank you Boris, much appreciated.

De-assigning myself, as someone more knowledgeable in the XPConnect area (Bobby?) really should handle this.
Assignee: gsquelart → nobody
Component: Audio/Video: GMP → XPConnect
Tim, is it possible to enter ImportKeyTask::SetKeyDataMaybeParseJWK with arbitrary user-provided data (and in particular malformed JSON), and if so how?  All our testcases I've found so far seem to be passing in something that was created by wrapKey...
Flags: needinfo?(ttaubert)
Per IRC discussion, every main-thread global needs an XPCWrappedNativeScope. Sorry I missed that during review.
Flags: needinfo?(bobbyholley)
(In reply to Boris Zbarsky [:bz] from comment #9)
> Tim, is it possible to enter ImportKeyTask::SetKeyDataMaybeParseJWK with
> arbitrary user-provided data (and in particular malformed JSON), and if so
> how?  All our testcases I've found so far seem to be passing in something
> that was created by wrapKey...

Indeed, (un)wrapKey should be the only way to pass data to this method:

https://dvcs.w3.org/hg/webcrypto-api/raw-file/tip/spec/Overview.html#SubtleCrypto-method-unwrapKey
Flags: needinfo?(ttaubert)
I wonder thought if I could pass malformed json to encrypt() and then pass the result of that to unwrapKey() somehow. This might work or maybe I'm forgetting something that would disallow that.
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
Attached patch Now with tests (obsolete) — Splinter Review
Tim, could you please review the test?

Eddy, could you please review the worker error reporter changes?
Attachment #8750483 - Flags: review?(ttaubert)
Attachment #8750483 - Flags: review?(ejpbruel)
Attachment #8750483 - Flags: review?(bobbyholley)
Attachment #8750455 - Attachment is obsolete: true
Attachment #8750455 - Flags: review?(bobbyholley)
Oh, and the worker changes were needed because the crypto test is run in workers too, and failed an assertion in the worker code that expects there to be no such globals.... but of course there are!
Comment on attachment 8750483 [details] [diff] [review]
Now with tests

Review of attachment 8750483 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/bindings/SimpleGlobalObject.cpp
@@ +126,5 @@
>  
> +  if (isMainThread) {
> +    // Invoking the XPCWrappedNativeScope constructor automatically hooks it
> +    // up to the compartment of global.
> +    (void) new XPCWrappedNativeScope(cx, global);

Instead of doing this, can we just call CreateGlobalObject in the main-thread case and JS_NewGlobalObject in the non-main-thread case? That function doesn't do any extra/unnecessary work modulo a few assertions and branches, and this feels like a breach of abstraction.

@@ +136,5 @@
>        JS_ClearPendingException(cx);
>        return nullptr;
>      }
>  
> +    if (!JS_SplicePrototype(cx, global, protoObj)) {

Why this change?
Attachment #8750483 - Flags: review?(bobbyholley) → review-
> Instead of doing this, can we just call CreateGlobalObject in the main-thread case

You mean xpc::CreateGlobalObject?  I guess we could; that will also fail the TraceXPCGlobal assert, which means we need to fix _that_ too.  Which we probably needed anyway now that we're creating an XPCWrappedNativeScope.

> Why this change?

Drive-by cleanup, basically.
Attachment #8750483 - Attachment is obsolete: true
Attachment #8750483 - Flags: review?(ttaubert)
Attachment #8750483 - Flags: review?(ejpbruel)
Comment on attachment 8750612 [details] [diff] [review]
Make sure to set up an XPCWrappedNativeScope for SimpleGlobalObject globals on the main thread

Review of attachment 8750612 [details] [diff] [review]:
-----------------------------------------------------------------

r=me SimpleGlobalObject.cpp and nsXPConnect.cpp
Attachment #8750612 - Flags: review?(bobbyholley) → review+
Comment on attachment 8750612 [details] [diff] [review]
Make sure to set up an XPCWrappedNativeScope for SimpleGlobalObject globals on the main thread

Review of attachment 8750612 [details] [diff] [review]:
-----------------------------------------------------------------

r=me for the test

::: dom/crypto/test/test_WebCrypto_Wrap_Unwrap.html
@@ +321,5 @@
> +      var str = "I am so not JSON";
> +      var abv = new ArrayBuffer(str.length);
> +      for (var i = 0; i < str.length; ++i) {
> +        abv[i] = str.charCodeAt(i);
> +      }

Could also do:

var abv = new TextEncoder("utf-8").encode("I am so not JSON");

@@ +326,5 @@
> +      return crypto.subtle.encrypt(wrapAlg, wrapKey, abv);
> +    }
> +    function doUnwrap(wrappedKey) {
> +      return crypto.subtle.unwrapKey("jwk", wrappedKey, wrapKey, wrapAlg,
> +                                     { name: "HMAC", hash: "SHA-384"},

nit: please remove space after "{"

@@ +333,5 @@
> +
> +    crypto.subtle.importKey("jwk", tv.aes_gcm_enc.key_jwk,
> +                            "AES-GCM", false, ['encrypt','unwrapKey'])
> +      .then(function(x) { wrapKey = x; })
> +      .then(doBogusWrap)

It might be worth passing "error(that)" as a second argument to the two .then() calls above so that we don't yield success when importKey() or encrypt() fail.

@@ +336,5 @@
> +      .then(function(x) { wrapKey = x; })
> +      .then(doBogusWrap)
> +      .then(doUnwrap)
> +      .then(
> +        complete(that, false),

error(that),
Attachment #8750612 - Flags: review?(ttaubert) → review+
Comment on attachment 8750612 [details] [diff] [review]
Make sure to set up an XPCWrappedNativeScope for SimpleGlobalObject globals on the main thread

Review of attachment 8750612 [details] [diff] [review]:
-----------------------------------------------------------------

r+ for the worker error reporter changes.
Attachment #8750612 - Flags: review?(ejpbruel) → review+
Whiteboard: btpp-active
Comment on attachment 8750612 [details] [diff] [review]
Make sure to set up an XPCWrappedNativeScope for SimpleGlobalObject globals on the main thread

Approval Request Comment
[Feature/regressing bug #]: Bug 1246153.
[User impact if declined]: Crashes when dealing with malformed Widevine manifest
   JSON data.
[Describe test coverage new/current, TreeHerder]: Test included in the patch.
   Not for Widevine, but an equivalent codepath.
[Risks and why]: Fairly low.  The xpconnect change is debug-only, the worker
   change just adjusts some assertions.  The main change here affects code
   that's used for worker debugger sandboxes (for which it doesn't change
   anything) and for webcrypto (tested in the patch) and the Widevine thing.
   I _think_ this should be pretty safe.
[String/UUID change made/needed]: None.
Attachment #8750612 - Flags: approval-mozilla-aurora?
By the way, note that crashing here means the JSON in the Widevine manifest was invalid.  Is this somewhat expected?  How do we end up actually handling that case (of not being able to parse the manifest JSON)?
Flags: needinfo?(gsquelart)
> var abv = new TextEncoder("utf-8").encode("I am so not JSON");

Good idea.  Done.

> nit: please remove space after "{"

Done.

> It might be worth passing "error(that)" as a second argument to the two .then() calls

Done.

> error(that),

Wasn't sure about throwing that exception and producing an uncaught-rejection promise, but I guess it's fine.
(In reply to Boris Zbarsky [:bz] from comment #24)
> By the way, note that crashing here means the JSON in the Widevine manifest
> was invalid.  Is this somewhat expected?  How do we end up actually handling
> that case (of not being able to parse the manifest JSON)?

If JSON-parsing fails, we reject the promise associated with initializing the GMP:
https://hg.mozilla.org/releases/mozilla-aurora/annotate/d6d1b8b21016/dom/media/gmp/GMPParent.cpp#l941
ParseChromiumManifest is async-called by ReadChromiumManifestFile, called by ReadGMPMetaData, called by GMPParent::Init, called by GeckoMediaPluginServiceParent::AddOnGMPThread, where the rejected promise will be logged and forgotten, so the GMP won't be used.
So at least we should fail gracefully.

Now I'm not sure if invalid Widevine manifests are actually expected. Chris?
Flags: needinfo?(gsquelart) → needinfo?(cpearce)
(In reply to Gerald Squelart [:gerald] from comment #26)
> Now I'm not sure if invalid Widevine manifests are actually expected. Chris?

The manifests are downloaded from Google's servers, and we check the sha512 sum of the download to ensure its what we expect. So the manifest should be valid.
Flags: needinfo?(cpearce)
https://hg.mozilla.org/mozilla-central/rev/58d84f830e72
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Hmm.  Valid in the sense that it is what Google meant to send.  That makes the question whether Google is possibly sending invalid JSON in some cases and expecting us to deal somehow.
bz, is this fix safe enough to uplift to Beta 47? This bug was marked as status-firefox47=unaffected, but that is incorrect. There are crash reports for 47 and we uplifted the Widevine code that can hit this crash to 47.
Flags: needinfo?(bzbarsky)
Chris, there shouldn't be crash reports due to the code changed in this bug in 47, because that code doesn't exist in 47.  It was added in bug 1246153, which was only checked in for 48.

I do see crashes in xpc::AddonWindowOrNull on 47, and even older versions, but they all seem to be failures very early in global setup (which we should still look into, possibly), and none involve the Widevine code.  Is there a particular crash report you're seeing that you think is relevant to Widevine?
Flags: needinfo?(bzbarsky)
Depends on: 1272160
Sorry. I see now that dveditz says (in comment 4) that the Widevine-related crashes appear to be one person. None of the crash reports for 47 point to Widevine, so I guess this crash isn't really a regression from Widevine.
Comment on attachment 8750612 [details] [diff] [review]
Make sure to set up an XPCWrappedNativeScope for SimpleGlobalObject globals on the main thread

Crash fix, recent regression, includes a test. Let's uplift to aurora.
Attachment #8750612 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
has problems uplifting to aurora

grafting 344272:58d84f830e72 "Bug 1268845.  Make sure to set up an XPCWrappedNativeScope for SimpleGlobalObject globals on the main thread.  r=bholley,ttaubert,ejpbruel"
merging dom/bindings/SimpleGlobalObject.cpp
merging dom/workers/WorkerPrivate.cpp
warning: conflicts while merging dom/bindings/SimpleGlobalObject.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use 'hg resolve' and 'hg graft --continue')
Flags: needinfo?(bzbarsky)
Yeah, looks like the patch from bug 1263496 landed before this one but after bug 1246153 and wasn't uplifted to Aurora.  I'll put up a diff against Aurora for checkin.
Flags: needinfo?(bzbarsky)
Flags: needinfo?(cbook)
Flags: needinfo?(cbook)
Version: unspecified → 48 Branch
Crash volume for signature 'xpc::AddonWindowOrNull':
 - nightly (version 50): 0 crashes from 2016-06-06.
 - aurora  (version 49): 0 crashes from 2016-06-07.
 - beta    (version 48): 6 crashes from 2016-06-06.
 - release (version 47): 82 crashes from 2016-05-31.
 - esr     (version 45): 6 crashes from 2016-04-07.

Crash volume on the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly       0       0       0       0       0       0       0
 - aurora        0       0       0       0       0       0       0
 - beta          1       0       1       0       2       1       1
 - release      20       7      11      11       8      10       8
 - esr           0       0       0       0       1       2       0

Affected platforms: Windows, Mac OS X, Linux
This signature may be found on ESR-45, but it's not the same as this Widevine-related bug.
Depends on: 1306384
No longer depends on: 1306384
You need to log in before you can comment on or make changes to this bug.