Last Comment Bug 772615 - about:memory crash in nsScriptSecurityManager::GetSubjectPrincipal when refreshed
: about:memory crash in nsScriptSecurityManager::GetSubjectPrincipal when refre...
Status: RESOLVED FIXED
[qa-]
: crash, regression
Product: Core
Classification: Components
Component: XPConnect (show other bugs)
: 16 Branch
: All Windows 7
: -- critical with 1 vote (vote)
: mozilla17
Assigned To: Nicholas Nethercote [:njn]
:
Mentors:
Depends on:
Blocks: 687724
  Show dependency treegraph
 
Reported: 2012-07-10 13:03 PDT by Grant
Modified: 2012-09-20 08:55 PDT (History)
10 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
unaffected
+
fixed
fixed


Attachments
(part 1) - Use the safe JSContext in the JS memory reporter. (3.28 KB, patch)
2012-07-15 21:04 PDT, Nicholas Nethercote [:njn]
bobbyholley: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
(part 2) - Enter compartments before calling GetNativeOfWrapper() in the JS memory reporter. (2.23 KB, patch)
2012-07-15 21:16 PDT, Nicholas Nethercote [:njn]
bobbyholley: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Grant 2012-07-10 13:03:17 PDT
Hi all,

Every few days I get a crash with the latest nightlies, When I refresh about:memory it crashes Firefox completely: Here are some of the bug reports:

https://crash-stats.mozilla.com/report/index/bp-b5a081db-8c77-4872-b51b-e96132120710

http://crash-stats.mozilla.com/report/index/bp-ec98a57b-9589-4e13-bd58-8cf792120708

Seems to be crashing on JS_GetCompartmentPrincipals.

Any help?
Comment 1 Scoobidiver (away) 2012-07-10 14:34:48 PDT
Does it happen in Safe Mode (see http://support.mozilla.org/kb/troubleshoot-firefox-issues-using-safe-mode)?
Which extension causes the issue?
Comment 2 Paul Silaghi, QA [:pauly] 2012-07-11 00:50:53 PDT
WFM fine on Nightly 16.0a1 (2012-07-10) on Win 7.
Comment 3 Grant 2012-07-11 01:05:39 PDT
Thanks for the feedback, I've had this happen a fair number of times recently - I'm going to leave a session on overnight without addons and see if it still does it in the morning.

I really wish I could reproduce it more frequently, such is bug finding....
Comment 4 Scoobidiver (away) 2012-07-11 12:04:27 PDT
There are other crashes with the same kind of comment.
It first appeared in 16.0a1/20120706075126. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=675f55c4310c&tochange=a8f682801a6d

More reports at:
https://crash-stats.mozilla.com/report/list?signature=JS_GetCompartmentPrincipals
https://crash-stats.mozilla.com/report/list?signature=JS_GetFlatStringChars
Comment 5 Scoobidiver (away) 2012-07-12 01:43:44 PDT
It might be a regression from bug 687724.
Comment 6 Nicholas Nethercote [:njn] 2012-07-12 02:46:21 PDT
Bug 687724 definitely looks relevant to the stack traces in comment 0.  However...

> More reports at:
> https://crash-stats.mozilla.com/report/
> list?signature=JS_GetCompartmentPrincipals
> https://crash-stats.mozilla.com/report/list?signature=JS_GetFlatStringChars

The first group crashes in JS_GetCompartmentPrincipals().  The second group crashes in JS_GetFlatStringChars().  We've had plenty of crashes in both of those prior to bug 687724 landing.  So I wonder if this is just making an existing problem more noticeable by executing the buggy code more frequently.
Comment 7 Scoobidiver (away) 2012-07-12 03:00:10 PDT
(In reply to Nicholas Nethercote [:njn] from comment #6)
>  We've had plenty of crashes in both of those prior to bug 687724 landing.
But not with the following stack trace which the bug is about:
Frame 	Module 	Signature 	Source
0 	mozjs.dll 	JS_GetCompartmentPrincipals 	js/src/jsfriendapi.cpp:179
1 	xul.dll 	nsScriptSecurityManager::GetSubjectPrincipal 	caps/src/nsScriptSecurityManager.cpp:2291
2 	xul.dll 	nsScriptSecurityManager::IsCapabilityEnabled 	caps/src/nsScriptSecurityManager.cpp:2443
3 	xul.dll 	xpc::ContentScriptHasUniversalXPConnect 	js/xpconnect/wrappers/XrayWrapper.cpp:773
4 	xul.dll 	xpc::XrayUtils::IsTransparent 	js/xpconnect/wrappers/XrayWrapper.cpp:1178
5 	xul.dll 	xpc::AccessCheck::isScriptAccessOnly 	js/xpconnect/wrappers/AccessCheck.cpp:379
6 	xul.dll 	XPCWrappedNative::GetWrappedNativeOfJSObject 	js/xpconnect/src/XPCWrappedNative.cpp:1849
7 	xul.dll 	nsXPConnect::GetNativeOfWrapper 	js/xpconnect/src/nsXPConnect.cpp:1449
8 	xul.dll 	xpc::XPCJSRuntimeStats::initExtraCompartmentStats 	js/xpconnect/src/XPCJSRuntime.cpp:1721
9 	mozjs.dll 	JS::StatsCompartmentCallback 	js/src/MemoryMetrics.cpp:62
10 	mozjs.dll 	js::IterateCompartmentsArenasCells 	js/src/jsgc.cpp:4054
11 	mozjs.dll 	js::IterateChunks 	js/src/jsgc.cpp:4077
12 	xul.dll 	xpc::JSMemoryMultiReporter::CollectReports 	js/xpconnect/src/XPCJSRuntime.cpp:1759
Comment 8 Nicholas Nethercote [:njn] 2012-07-12 03:06:37 PDT
(In reply to Scoobidiver from comment #7)
> (In reply to Nicholas Nethercote [:njn] from comment #6)
> >  We've had plenty of crashes in both of those prior to bug 687724 landing.
> But not with the following stack trace which the bug is about:

I realize that.  It's still possible that bug 687724 made an existing problem more obvious.
Comment 9 Bobby Holley (:bholley) (busy with Stylo) 2012-07-12 05:23:24 PDT
Probably has something to do with bug 754202, but that's just the bug where we pull principals off the compartment. Here we seem to be ending up with a bad context compartment.
Comment 10 Nicholas Nethercote [:njn] 2012-07-12 17:05:00 PDT
> Here we seem to be ending up with a bad context compartment.

The context in question is one that I fake up before doing the memory measurements:

http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSRuntime.cpp#1740

Is that relevant?
Comment 11 Bobby Holley (:bholley) (busy with Stylo) 2012-07-13 00:49:17 PDT
Yes, because it appears that you don't enter a compartment, and thus will hit the assertion here, or crash shortly thereafter on release builds:

http://mxr.mozilla.org/mozilla-central/source/caps/src/nsScriptSecurityManager.cpp#2287

Can you use the SafeJSContext instead?
Comment 12 Nicholas Nethercote [:njn] 2012-07-13 02:24:36 PDT
> Yes, because it appears that you don't enter a compartment, and thus will
> hit the assertion here, or crash shortly thereafter on release builds:

Is that explanation consistent with the fact that this crash is intermittent?  I've never seen this crash myself...

> Can you use the SafeJSContext instead?

Do you mean nsXPConnect::GetSafeJSContext()?  Sure, that's easy.  How does SafeJSContext differ from JSContext?
Comment 13 Bobby Holley (:bholley) (busy with Stylo) 2012-07-13 02:50:49 PDT
(In reply to Nicholas Nethercote [:njn] from comment #12)
> Is that explanation consistent with the fact that this crash is
> intermittent?  I've never seen this crash myself...

If the codepaths were non-deterministic such that we sometimes did and sometimes didn't enter a compartment, yes.


> > Can you use the SafeJSContext instead?
> 
> Do you mean nsXPConnect::GetSafeJSContext()?  Sure, that's easy.  How does
> SafeJSContext differ from JSContext?

It's a special JSContext for when chrome needs to do something that requires a JSContext (so that not everybody needs to spin up a special JSContext). It runs in a default compartment with an nsNullPrincipal.
Comment 14 Nicholas Nethercote [:njn] 2012-07-13 03:05:04 PDT
> > Is that explanation consistent with the fact that this crash is
> > intermittent?  I've never seen this crash myself...
> 
> If the codepaths were non-deterministic such that we sometimes did and
> sometimes didn't enter a compartment, yes.

The memory reporter never enters any compartments.
Comment 15 Bobby Holley (:bholley) (busy with Stylo) 2012-07-13 04:07:05 PDT
(In reply to Nicholas Nethercote [:njn] from comment #14)
> The memory reporter never enters any compartments.

Then it's sure going to crash if it ever triggers a call to nsScriptSecurityManager::IsCapabilityEnabled.
Comment 16 Nicholas Nethercote [:njn] 2012-07-15 19:29:15 PDT
> Then it's sure going to crash if it ever triggers a call to
> nsScriptSecurityManager::IsCapabilityEnabled.

Can you suggest steps to trigger that?

Grant, have you noticed this happening after visiting any particular web sites?  I'd really like to be able to reproduce this myself.
Comment 17 Nicholas Nethercote [:njn] 2012-07-15 21:04:06 PDT
Created attachment 642475 [details] [diff] [review]
(part 1) - Use the safe JSContext in the JS memory reporter.

Use the safe JSContext in the JS memory reporter.  This let me remove
XPCJSRuntimeStats::mCx.
Comment 18 Nicholas Nethercote [:njn] 2012-07-15 21:16:46 PDT
Created attachment 642478 [details] [diff] [review]
(part 2) - Enter compartments before calling GetNativeOfWrapper() in the JS memory reporter.

Let's enter the compartment before calling GetNativeOfWrapper().  I haven't
managed to reproduce this crash, maybe I'll get lucky and this will fix it.
Comment 19 Grant 2012-07-15 21:49:43 PDT
Hi Nicholas,

The only thing that seems to be consistent in this is my use of Chatzilla open for a few days - I don't think I've seen it with Chatzilla not being opened during a session. Its very hard to reproduce.
Comment 20 Nicholas Nethercote [:njn] 2012-07-15 21:53:30 PDT
Thanks, Grant.  I use Chatzilla myself and I haven't seen this, though I don't keep it open for more than about 8 hours at a time.
Comment 21 Bobby Holley (:bholley) (busy with Stylo) 2012-07-16 01:36:49 PDT
Comment on attachment 642478 [details] [diff] [review]
(part 2) - Enter compartments before calling GetNativeOfWrapper() in the JS memory reporter.


> JSAutoEnterCompartment::enter(JSContext *cx, JSObject *target)
> {
>-    AssertHeapIsIdle(cx);
>+    AssertHeapIsIdleOrIterating(cx);

I don't know what this means, but presumably you do.

>diff --git a/js/xpconnect/src/XPCJSRuntime.cpp b/js/xpconnect/src/XPCJSRuntime.cpp
>--- a/js/xpconnect/src/XPCJSRuntime.cpp
>+++ b/js/xpconnect/src/XPCJSRuntime.cpp
>@@ -1747,16 +1747,21 @@ class XPCJSRuntimeStats : public JS::Run
>         nsCAutoString cJSPathPrefix, cDOMPathPrefix;
>         nsCString cName;
>         GetCompartmentName(c, cName);
> 
>         // Get the compartment's global.
>         nsXPConnect *xpc = nsXPConnect::GetXPConnect();
>         JSContext *cx = xpc->GetSafeJSContext();
>         if (JSObject *global = JS_GetGlobalForCompartmentOrNull(cx, c)) {
>+            // Need to enter the compartment, otherwise GetNativeOfWrapper()
>+            // might crash.
>+            JSAutoEnterCompartment aec;
>+            aec.enterAndIgnoreErrors(cx, global);

Should we maybe just return here if we hit an error rather than ignoring? Note that entering a compartment will become infallible with bug 625199, so it probably doesn't matter too much.
Comment 22 Nicholas Nethercote [:njn] 2012-07-16 01:47:08 PDT
> > JSAutoEnterCompartment::enter(JSContext *cx, JSObject *target)
> > {
> >-    AssertHeapIsIdle(cx);
> >+    AssertHeapIsIdleOrIterating(cx);
> 
> I don't know what this means, but presumably you do.

Yes.  The heap has three states:  Idle, Iterating, Collecting.  I added "Iterating" recently.  The important thing in this case is that the heap isn't in the "Collecting" state.


> >+            // Need to enter the compartment, otherwise GetNativeOfWrapper()
> >+            // might crash.
> >+            JSAutoEnterCompartment aec;
> >+            aec.enterAndIgnoreErrors(cx, global);
> 
> Should we maybe just return here if we hit an error rather than ignoring?
> Note that entering a compartment will become infallible with bug 625199, so
> it probably doesn't matter too much.

I started doing that but the fallibility bubbled up through several layers of calls all the way to jsapi.h, and it felt like more trouble than it was worth.  And if bug 625199 is on the cards, I feel even more justified!
Comment 23 Bobby Holley (:bholley) (busy with Stylo) 2012-07-16 01:48:23 PDT
(In reply to Nicholas Nethercote [:njn] from comment #22)

> I started doing that but the fallibility bubbled up through several layers
> of calls all the way to jsapi.h, and it felt like more trouble than it was
> worth.  And if bug 625199 is on the cards, I feel even more justified!

I just meant to return, in the same way we handle the "OrNull" case for the global.
Comment 24 Nicholas Nethercote [:njn] 2012-07-16 17:14:30 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/4785b5869803
https://hg.mozilla.org/integration/mozilla-inbound/rev/3b89b2b1340b

I'm going to be optimistic and let this bug be marked as RESOLVED FIXED once it merges with mozilla-central.  Grant, if you still hit it once these patches have made it to Nightly (probably two days from now), can you open a new bug?  Thanks very much.
Comment 27 Scoobidiver (away) 2012-07-17 10:36:25 PDT
(In reply to SpeciesX from comment #26)
> I get still crashes
That's normal. The first Nightly that will contain the patch is 17.0a1/20120718.
Comment 28 Nicholas Nethercote [:njn] 2012-07-17 18:26:25 PDT
If these patches fix the crashes, we'll need to backport them to Aurora 16, or back out the original patches from bug 687724.
Comment 29 Bobby Holley (:bholley) (busy with Stylo) 2012-07-24 01:30:40 PDT
Comment on attachment 642475 [details] [diff] [review]
(part 1) - Use the safe JSContext in the JS memory reporter.

Flagging these patches for for m-a at mccr8's request. They're low-risk. The first one gets rid of a hand-rolled JSContext and just uses the one from XPConnect (which njn didn't know about when he did the original implementation). The second one enters a compartment, which is basically always a Good Thing.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 687724
User impact if declined: crashes
Testing completed (on m-c, etc.): baked on m-c for >1 week.
Risk to taking this patch (and alternatives if risky): Low risk. 
String or UUID changes made by this patch: None.
Comment 30 Andrew McCreight [:mccr8] 2012-07-24 06:33:56 PDT
SpeciesX, grant, are you still experiencing these crashes?

Looking at crash stats, it seems there are still some of these around.  I guess there could be other ways to trigger it.  I'll try to look into them.
Comment 31 Grant 2012-07-24 12:56:24 PDT
(In reply to Andrew McCreight [:mccr8] from comment #30)
> SpeciesX, grant, are you still experiencing these crashes?
> 
> Looking at crash stats, it seems there are still some of these around.  I
> guess there could be other ways to trigger it.  I'll try to look into them.


Hey Andrew,

Haven't so far, I had 70~ tabs and chatzilla open and it didn't crash, hasn't since the change.

Will let you know if it does though.

Cheers
Comment 32 Lukas Blakk [:lsblakk] use ?needinfo 2012-07-24 14:39:37 PDT
Comment on attachment 642478 [details] [diff] [review]
(part 2) - Enter compartments before calling GetNativeOfWrapper() in the JS memory reporter.

low risk and crash fixing, approving for Aurora.
Comment 34 Grant 2012-07-25 23:39:10 PDT
Damn, im still getting a crash on about:memory - but it looks like a different reason:


https://crash-stats.mozilla.com/report/index/bp-50c761f5-e9ca-4496-a5f7-faef42120726

Firefox 17.0a1 Crash Report [@ gfxFontCache::SizeOfFontEntryExcludingThis(gfxFontCache::HashEntry*, unsigned int (*)(void const*), void*) ]

If I get it again, I will post another bug.
Comment 35 Nicholas Nethercote [:njn] 2012-08-07 17:41:29 PDT
mccr8:  Thanks for landing on Aurora.

Grant:  Yes, that's a different crash;  a new bug would be appropriate if you see it again.
Comment 36 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-09-18 16:08:41 PDT
Grant, to confirm, you no longer see *this* crash in Firefox 16 and 17?
Comment 37 Grant 2012-09-18 19:44:47 PDT
Hi Nicholas/Anthony,

I have since moved to a new PC and am more then happy with FF - I haven't have a crash yet :) I think its safe to say this has been fixed.

Thanks for your help guys.
Comment 38 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-09-19 09:43:04 PDT
Thanks for checking back in, Grant. Unfortunately, I don't think we'll be able to call this verified fixed, especially if your new PC cannot reproduce the original crash on a reportedly affected build.

If anyone else on this bug is able to help verify it fixed, please help. Thanks.
Comment 39 Nicholas Nethercote [:njn] 2012-09-19 17:13:34 PDT
> If anyone else on this bug is able to help verify it fixed, please help.

I think your best bet for verifying is to look for the crash signature in FF16 and FF17.  Hopefully it won't appear.
Comment 40 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-09-20 08:55:26 PDT
No crashes found for the following signatures with Firefox 17 or 18 in the last week:
 * JS_GetCompartmentPrincipals
 * nsScriptSecurityManager::GetSubjectPrincipal(JSContext*, unsigned int*)
 * nsScriptSecurityManager::GetSubjectPrincipal

However, JS_GetFlatStringChars has a single crash on Firefox 17.0a2 in the last week:
https://crash-stats.mozilla.com/report/index/7b599cf7-e977-4b6b-bf99-d7dc72120916

Note You need to log in before you can comment on or make changes to this bug.