Closed Bug 1210099 Opened 9 years ago Closed 9 years ago

crash in mozilla::OriginAttributes::CreateSuffix(nsACString_internal&) const

Categories

(Core :: Security: CAPS, defect)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
e10s + ---
firefox45 blocking fixed
firefox46 + fixed
firefox47 --- fixed

People

(Reporter: azakai, Assigned: billm)

References

()

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-a4b19af6-485d-4ea6-ad75-ca41c2150930.
=============================================================

I get this crash 100% of the time when I visit

https://www.opengl.org/sdk/docs/man2/xhtml/glGetIntegerv.xml

There are some similar reports (mentioned on crash-stats), but at closer inspection they look different - stacks are not identical, and they are not consistent like this one is. Some also involve plugins but this is on a plugin-free install, just plain Nightly.
Also occurs when attempting to view comments on http://www.reddit.com. Started with today's OSX Nightly.

bp-4e346e6c-a5e6-4431-8753-432662151001
Probably related to https://bugzilla.mozilla.org/show_bug.cgi?id=1171432, which was RESOLVED WORKSFORME.
Actually, I suspect it's a dupe of bug 1209587 which was duped to bug 1167100.
Crash Signature: [@ mozilla::OriginAttributes::CreateSuffix(nsACString_internal&) const] → [@ mozilla::OriginAttributes::CreateSuffix(nsACString_internal&) const] [@ mozilla::OriginAttributes::CreateSuffix const]
Crash Signature: [@ mozilla::OriginAttributes::CreateSuffix(nsACString_internal&) const] [@ mozilla::OriginAttributes::CreateSuffix const] → [@ mozilla::OriginAttributes::CreateSuffix(nsACString_internal&) const] [@ mozilla::OriginAttributes::CreateSuffix const] [@ mozilla::OriginAttributes::CreateSuffix ]
These are happening pretty persistently in crash-stats.  Hitting this MOZ_RELEASE_ASSERT seems bad.  How bad is it, and what should we do?
Flags: needinfo?(bobbyholley)
Component: General → Security: CAPS
Hm. We're hitting the release-mode assert here:

http://hg.mozilla.org/mozilla-central/annotate/9aa2dae27ae5/caps/BasePrincipal.cpp#l124

This is basically saying that we're encountering an addon ID that has a character in it that we can't serialize into a filename.

That list of characters is here: http://mxr.mozilla.org/mozilla-central/source/dom/quota/ActorsParent.cpp#123

However, none of the crash reports _seem_ that have extensions installed that fit that description (at least according to my wetware RegExp engine). So what we probably need to do is to land some telemetry to tell us what the addon id is (assuming the data collection folks will let us do that).

I don't have the cycles to follow up on this. Michael, are you able to?
Flags: needinfo?(bobbyholley) → needinfo?(michael)
I'm so sorry - I completely forgot about this bug during the holiday stuff - classes are starting again for me, and I don't think I'll have the cycles to work on it for at least a week or so - I'm so sorry for that!

Is there any chance you could find someone else to work on that... Sorry again.
Flags: needinfo?(michael) → needinfo?(bobbyholley)
Huseby is the OriginAttributes guy now. :-)
Flags: needinfo?(bobbyholley) → needinfo?(huseby)
I'm on it.
Assignee: nobody → huseby
Status: NEW → ASSIGNED
Flags: needinfo?(huseby)
Awesome. Thanks huseby!
Yoshi, can you take a look at this crasher instead?
Flags: needinfo?(allstars.chh)
Assignee: huseby → nobody
Status: ASSIGNED → NEW
Comment 0 and Comment 1 both refers to http://hg.mozilla.org/mozilla-central/annotate/acdb22976ff8/caps/BasePrincipal.cpp#l56

which has been removed by Bobby in Bug 1209843.

IMO we could duplicate this bug with Bug 1209843.

However I don't know where to find the crash report from Bobby's comment 5.
Bobby could you share your crash report ?
I'll file another bug and check it.
Flags: needinfo?(allstars.chh)
I went to crash-stats.mozilla.com and searched for CreateSuffix.

Here is one from a few days ago: https://crash-stats.mozilla.com/report/index/5eb55a53-3c39-4d54-94a7-8c2082160113
And to be clear, this isn't related to the UNKNOWN_APP_ID thing. This is about forbidden chars in an addon id.

The crash is here: http://hg.mozilla.org/mozilla-central/annotate/e790bba372f1/caps/BasePrincipal.cpp#l124

All of this is explained in comment 5. We need to add Telemetry to figure out what the problematic addon id is.
Flags: needinfo?(allstars.chh)
Yeah, I'd work on it.

But is the Extension Id in the crash report represents 'mAddonId' ?
or how is mAddonId generated?
looks from Extension.generate?

ni? for billm for this.
Assignee: nobody → allstars.chh
Flags: needinfo?(allstars.chh) → needinfo?(wmccloskey)
Typically we generate the UUID here:
https://dxr.mozilla.org/mozilla-central/source/toolkit/components/extensions/Extension.jsm#689
That ultimately goes to here:
http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsUUIDGenerator.cpp#85
I had assumed this function did something simple, but it actually looks pretty crazy. That may or may not be the problem.

I'm actually surprised that so many people have WebExtensions installed. I think that's the only place we use this OriginAttribute, but maybe I'm forgetting.

Kris, could you look into replacing the UUID generator with something reasonable?
Flags: needinfo?(wmccloskey) → needinfo?(kmaglione+bmo)
If cryptographic security isn't a requirement, perhaps it can just use the new xorshift128+ PRNG in MFBT on all platforms.
I can definitely look into replacing it in our code. Changing the service might cause some friction.

I don't think this is the problem, though. The generated UUIDs are used for the URL identifier, but mAddonId is the public ID of the add-on, from manifest.json.

Those IDs have to meet a pretty strict regular expression, which should rule out any character in that list. None of the extension IDs in the crash report look like they contain them either, so I'm not sure how this could be happening.
Flags: needinfo?(kmaglione+bmo)
Oh yeah, you're right. Maybe the right thing to do is to move the assertion closer to the source where mAddonId is created. Or at least include the add-on ID in the crash report with AnnotateCrashReport.
It might also be helpful if someone can provide a copy of those extensions. Maybe one of them is doing something strange, like instantiating an Extension instance directly (probably another good reason to move the check).

75% of those crash reports have light_plugin_D772DC8D6FAF43A29B25C4EBAA5AD1DE@kaspersky.com installed, which I can't get a hold of without buying a Kaspersky subscription.
I don't think this is e10s-related.
(In reply to Bill McCloskey (:billm) from comment #20)
> I don't think this is e10s-related.

Is there any place where we structured clone principals when e10s is disabled?
I think we take the same message manager paths with and without e10s. The message manager always uses structured clone.
I don't think there's anywhere we send principals using sendRpcMessage except in remote browsers, though.

It looks like the only places we do that is in RemoteAddonsChild for content policy and about: proxies, in browser-child for webprogress proxies, and in content.js for contextmenu data, specifically if we're in a content process.

So I'm pretty sure this can only happen we have a webextension installed (or a regular extension using our code), and e10s enabled.

Also, crash-stats is reporting nearly 100% of crashes in content processes: https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3AOriginAttributes%3A%3ACreateSuffix#aggregations
This could happen through sendSyncMessage as well as sendRpcMessage. And there are 3 crashes without e10s :-). But maybe you're right. I haven't looked at each callsite. I'll + this for now. It needs to get fixed either way.
Yeah, it looks like all of the crashes that aren't in content processes are a different assertion, though. I haven't looked through all of the call stacks, but all of the ones I saw were sendRpcMessage.
Blocks: 1234647
[Tracking Requested - why for this release]:
This is 7.5% of all crashes in 45.0b1, almost all of them are at startup, and we see about 5 crashes per installation. I think this should be a blocker, it probably has already lost us a good number of Beta users in the last days.
The stack all only have this one frame, so we probably need someone to load a dump into WinDbg and get a better stack. Addresses all seem to end in 254a, so that could be a clue of some sort.
A number of those crashes are on http://home.tb.ask.com/index.jhtml but facebook.com, mail.ru, google.fr and google.co.uk, ok.ru, youtube.com are also in the top positions.

These add-on correlations are probably interesting:
94% (1308/1391)	23% (3968/17561)	{82AF8DCA-6DE9-405D-BD5E-43525BDAD38A}
61% (846/1391)	9% (1652/17561)	wrc@avast.com
98% (1369/1391)	50% (8850/17561)	e10s-beta45-withaddons@experiments.mozilla.org

The first of those is Skype Click-to-Call, and the latter is our e10s Experiment, both show strong correlations, sounds to me like the click-to-call add-on doesn't work with e10s.

100% of the reports have DOMIPCEnabled set and are therefore on e10s.
Tracking as it is a major issue. This would block the release
Kairo, could you also look at the crash reports to see if all of them have the following annotation:

  AddonsShouldHaveBlockedE10s = 1

It doesn't show up on Socorro so I don't know how to look for that myself
Flags: needinfo?(kairo)
(In reply to :Felipe Gomes (needinfo me!) from comment #31)
> Kairo, could you also look at the crash reports to see if all of them have
> the following annotation:
> 
>   AddonsShouldHaveBlockedE10s = 1
> 
> It doesn't show up on Socorro so I don't know how to look for that myself

I have no magic to be able to facet or search for stuff that nobody has requested Socorro to know about. If we will need to look for this more often, please file a bug against Socorro for this field to be added to Super Search.

That said, I have looked through a number of crashes, and almost all of them do have this flag set to 1, but 96aa82f0-2da6-4db0-a894-c69232160129 has 0 for that flag.
Flags: needinfo?(kairo)
I know, it's just that this is a very temporary thing that it probably doesn't make sense to add to Socorro. Unless adding it is a one-liner..
https://crash-stats.mozilla.com/signature/?product=Firefox&version=45.0b2&signature=msvcr120.dll%400xf608+|+nsAString_internal%3A%3AAssign+|+nsAString_internal%3A%3AAssign+|+mozilla%3A%3Adom%3A%3AURLParams%3A%3ASet seems to be pretty much the same issue, CreateSuffix is on the stack, all crashes are with e10s enabled, and the correlations are as follows:
     87% (13/15) vs.   7% (43/599) wrc@avast.com
     87% (13/15) vs.  17% (100/599) {82AF8DCA-6DE9-405D-BD5E-43525BDAD38A}
    100% (15/15) vs.  44% (263/599) e10s-beta45-withaddons@experiments.mozilla.org
Crash Signature: [@ mozilla::OriginAttributes::CreateSuffix(nsACString_internal&) const] [@ mozilla::OriginAttributes::CreateSuffix const] [@ mozilla::OriginAttributes::CreateSuffix ] → [@ mozilla::OriginAttributes::CreateSuffix(nsACString_internal&) const] [@ mozilla::OriginAttributes::CreateSuffix const] [@ mozilla::OriginAttributes::CreateSuffix ] [@ msvcr120.dll@0xf608 | nsAString_internal::Assign | nsAString_internal::Assign | mo…
Bobby, could you please look at this again? The instrumentation I landed hasn't been all that helpful. We're crashing trying to write out the crash annotation. You can see an example here:
https://crash-stats.mozilla.com/report/index/16af67f2-6482-4614-a0e3-1d4382160203

I also noticed some reports where we OOM trying to allocate a string with the add-on ID:
https://crash-stats.mozilla.com/report/index/4f0c927a-c4e1-4bfd-940b-6078c2160202
There are 14 of these OOM crashes for yesterday's nightly, so it's more common than you would expect for a random string allocation.

This leads me to believe that mAddonId just contains garbage. I tried to figure out how that could happen, but I'm not seeing anything. Bobby, can you do an audit and make sure we don't have any uninitialized memory here? This is a pretty serious crash.
Flags: needinfo?(bobbyholley)
Hm. The only thing that jumps out at me is the possibility that ExtensionURIToAddonId might fail in MaybeSetAddonIdFromURI, and leave its out-param in an inconsistent state.

I think the next debugging step would be to duplicate the MOZ_ASSERT and crashreporter stuff at the various places mAddonId gets set - during deserialization and in MaybeSetAddonIdFromURI. That'll either give us a new crash (a bit closer to the source of the problem), or we'll continue crashing where we're crashing now, which would indicate heap corruption (assuming there's nowhere else in the tree that is allowed to twiddle that value).
Flags: needinfo?(bobbyholley)
I just installed Skype Click to Call. By itself there seems to be no problem, but when I also installed Avast, I started getting instant crashes on beta. I'll dig into this tomorrow.
(In reply to Bill McCloskey (:billm) from comment #37)
> I just installed Skype Click to Call. By itself there seems to be no
> problem, but when I also installed Avast, I started getting instant crashes
> on beta. I'll dig into this tomorrow.

We got a similar reply from the Skype developers. They were able to reproduce consistently when both Skype and Ghostery were installed. Their conclusion is that it's probably the presence of an extension that scans outbound network traffic from content scripts that's causing this issue.

They commented out all the code in their content script and replaced it with this:
 
    $.ajax({
        type: "POST",
        dataType: "text",
        url: "https://www.google.com",
        contentType: "application/json",
        jsonp: false,
        beforeSend: function (xhr) {
            xhr.setRequestHeader('Pragma', 'no-cache');
        },
        data: JSON.stringify({ id: "0", languages: "en", url: "https://www.google.com/search?q=ham"}),
        success: function(response) { },
        error: function (xhr, ajaxOptions, thrownError) {  }
    });
 
And it still caused it to crash.
(In reply to Bill McCloskey (:billm) from comment #37)
> I just installed Skype Click to Call. By itself there seems to be no
> problem, but when I also installed Avast, I started getting instant crashes
> on beta. I'll dig into this tomorrow.

\o/
In a debug build, the STR I mentioned above asserts here:
https://dxr.mozilla.org/mozilla-central/source/caps/nsJSPrincipals.cpp#195
If that had been a release assertion, this bug would have been 10x easier to track down.

We can make that happen by converting the generated IPDL union discriminator checks to MOZ_DIAGNOSTIC_ASSERTs. This patch does that. I don't think the performance impact should be too big, since we probably don't do these conversions much. In any case, it won't affect release.

This also affects "actor not managed by this!" assertions. That also seems like a pretty serious error, so I see no reason why that shouldn't be a diagnostic assertion as well.
Assignee: allstars.chh → wmccloskey
Status: NEW → ASSIGNED
Attachment #8716064 - Flags: review?(jld)
This fixes the central problem: we don't implement structured cloning of expanded principals. I piggybacked onto an existing test.
Attachment #8716065 - Flags: review?(bobbyholley)
Attachment #8716065 - Flags: review?(bobbyholley) → review+
Attachment #8716064 - Flags: review?(jld) → review+
https://hg.mozilla.org/mozilla-central/rev/0e54a3a870e9
https://hg.mozilla.org/mozilla-central/rev/91b300b43f10
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Blocks: 1246180
Comment on attachment 8716065 [details] [diff] [review]
implement structured cloning of expanded principals

We should get this uplifted. It only affects e10s, but we have a lot of e10s users on Aurora and beta.

Approval Request Comment
[Feature/regressing bug #]: bug 1133189 (sort of, it was more broken before that)
[User impact if declined]: crashes, security issues
[Describe test coverage new/current, TreeHerder]: on m-c, test added
[Risks and why]: Should be low risk.
[String/UUID change made/needed]: None.
Attachment #8716065 - Flags: approval-mozilla-beta?
Attachment #8716065 - Flags: approval-mozilla-aurora?
> [Risks and why]: Should be low risk.
Could you explain why? This is a pretty big patch and it has been in m-c just for 2 days. Thanks
(In reply to Sylvestre Ledru [:sylvestre] from comment #45)
> > [Risks and why]: Should be low risk.
> Could you explain why? This is a pretty big patch and it has been in m-c
> just for 2 days. Thanks

The code is mostly just boilerplate. It doesn't change any of our invariants or APIs. It just does some work that we were failing to correctly before. It only changes code paths where we would have crashed without the patch, so it's hard to see how it can make things worse than they are now.

It's possible that we won't hit this crash on beta now that the add-ons part of the e10s experiment is over. I'm not entirely sure though. We definitely need it on Aurora though.
This looks like a bad startup crash here (on 45), https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3AOriginAttributes%3A%3ACreateSuffix  

Let's uplift this to aurora now. Adding tracking for 46 since this crash affects 46. 

Over the next couple of days, we should be able to see if the end of the e10s experiment affects the crash rate on beta 45.
Comment on attachment 8716065 [details] [diff] [review]
implement structured cloning of expanded principals

Crash fix, hard to verify or test, but let's try it in aurora.
Attachment #8716065 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8716065 [details] [diff] [review]
implement structured cloning of expanded principals

OK, let's take it in 45. Should be in 45 beta 5.
Thanks
Attachment #8716065 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: