Closed Bug 1348666 Opened 7 years ago Closed 7 years ago

Crash in js::ScriptSource::addSizeOfIncludingThis

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox52 --- unaffected
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 --- fixed

People

(Reporter: marcia, Assigned: kmag)

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-a237ff1d-8f31-4da7-a134-5b4ee2170319.
=============================================================

Seen while looking at Nightly crash stats - crashes started with 20170318030202 and seem to be happening both on Windows and Mac: 

Windows crashes: http://bit.ly/2n3O1US

Mac crashes: http://bit.ly/2naCzXT

Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=39607304b774591fa6e32c4b06158d869483c312&tochange=23a4b7430dd7e83a2809bf3dc41471f154301eda

Fairly high crash count so far, but smaller number of installs (406 crashes on Windows with 24 installs)
CollectScriptSourceStats() assumes that |ss| is non-null. My guess is that it is. Fixing this might be as simple as null-checking script->scriptSource() on line 513.

Bug 1333990 is in the regression range in comment 0. kmag, any chance that bug's patches might have affected how ScriptSource works?
Flags: needinfo?(kmaglione+bmo)
Yeah, this is happening to me a ton over the last few days on both my iMac and MacBook Pro on OS X 10.12.4 beta releases. Sometimes reloading the tab works, but usually it just crashes again.

Browser startup usually includes at least half my front-most tabs being crashed. So something's wonky.
(This tab crashed right after I finished submitting comment #2 above.)
Not likely, but the changes do cache and clone a lot of JSScripts. It's possible that something strange is happening with the script source being shared across a bunch of compartments with different principals. Or the fact that we disable lazy parsing. I'd probably have to be able to reproduce it to say more.


Something weird is going on with the correlations. This report is for Windows 10, but it says 100% correlation with platform = Linux, and 14 extensions:

https://crash-stats.mozilla.com/report/index/df6c308f-e7ba-4063-ae72-ecc722170318#tab-details
Flags: needinfo?(kmaglione+bmo)
Shu, do you have any thoughts here?
Flags: needinfo?(shu)
It might also have something to do with the fact that we don't keep the original scripts alive indefinitely after we clone them. I'd expect the clones to keep the script sources alive through their wrappers, but it's possible they're somehow being finalized before the clones are. It doesn't look like there's anywhere else where we clone scripts and then let go of the originals.

If that seems likely, I can change the caching to keep the scripts alive as long as they're in use. It probably makes more sense for the sake of memory management anyway, since we'd be less likely to wind up with extra in-memory script sources.
Ah. Actually, I bet I know what the problem is. We load the cloned scripts into extension sandboxes. When we destroy those, we cut all of their wrappers, both in and out. That would include the script source. We should probably make those wrappers immune to nuking...
I'm not really sure how to test this change. I could try to inspect a script from a nuked sandbox with the debugger, but that would rely a lot on the behavior of the GC, and I suspect any way I came at it, it would turn out pretty flaky.
Flags: needinfo?(shu)
Sheppy, any chance you can try to reproduce on this build?

https://queue.taskcluster.net/v1/task/bwFwgkgzTyinWHXObFpsdg/runs/0/artifacts/public/build/target.dmg
Assignee: nobody → kmaglione+bmo
Flags: needinfo?(eshepherd)
(In reply to Kris Maglione [:kmag] from comment #10)
> Sheppy, any chance you can try to reproduce on this build?
> 
> https://queue.taskcluster.net/v1/task/bwFwgkgzTyinWHXObFpsdg/runs/0/
> artifacts/public/build/target.dmg

That disk image is apparently invalid. At least, when I mount it, I'm told it appears to be damaged. I hesitate to do a test run on a busted image. Got another one?
Flags: needinfo?(eshepherd)
Flags: needinfo?(eshepherd)
I'm still trying to understand the bug. Where is the SSO *wrapper* that should not the nuked? In the extension sandbox or in chrome? If in the sandbox, why are we trying to get the size of a nuked compartment? If in chrome, it seems to me that we probably shouldn't CCW SSOs. ScriptSources aren't compartment things, they're just refcounted, so in that sense ScriptSourceObjects are already wrappers. For this special API, perhaps we should just create new SSOs that increase the refcount instead of creating CCWs and getting in the way of nuking semantics.
(In reply to Shu-yu Guo [:shu] from comment #13)
> I'm still trying to understand the bug. Where is the SSO *wrapper* that
> should not the nuked? In the extension sandbox or in chrome?

It's in the sandbox.

> If in the sandbox, why are we trying to get the size of a nuked compartment?

Because nuking a compartment doesn't guarantee it will be destroyed any time
soon. Cutting wrappers makes that more likely, but there can still be
references that keep it alive. In this case, it's most likely event listeners
attached to the content page, which guarantee it'll be kept alive until at
least the next full CC. But even without those, it can take some time for the
sandbox to be GCed.

> For this special API, perhaps we should just create new SSOs that increase
> the refcount instead of creating CCWs and getting in the way of nuking
> semantics.

I'll leave that decision up to you, but in the mean time this is causing
crashes, so I'd prefer to go with the simpler solution first.
(In reply to Kris Maglione [:kmag] from comment #14)
> (In reply to Shu-yu Guo [:shu] from comment #13)
> > I'm still trying to understand the bug. Where is the SSO *wrapper* that
> > should not the nuked? In the extension sandbox or in chrome?
> 
> It's in the sandbox.
> 
> > If in the sandbox, why are we trying to get the size of a nuked compartment?
> 
> Because nuking a compartment doesn't guarantee it will be destroyed any time
> soon. Cutting wrappers makes that more likely, but there can still be
> references that keep it alive. In this case, it's most likely event listeners
> attached to the content page, which guarantee it'll be kept alive until at
> least the next full CC. But even without those, it can take some time for the
> sandbox to be GCed.

Ah ha, that's the info I was looking for.

> 
> > For this special API, perhaps we should just create new SSOs that increase
> > the refcount instead of creating CCWs and getting in the way of nuking
> > semantics.
> 
> I'll leave that decision up to you, but in the mean time this is causing
> crashes, so I'd prefer to go with the simpler solution first.

I'd like to investigate if we can destroy compartments sooner if we don't keep CCWs of SSOs. I agree let's go with the simpler solution first.
Comment on attachment 8848994 [details]
Bug 1348666: Don't nuke cross-compartment wrappers for ScriptSourceObjects.

https://reviewboard.mozilla.org/r/121850/#review124298

I need a refresher on compartment nuking. The bug here is that we need to keep SSO wrappers from the add-on compartment -> chrome? Or the other way around? I
Attachment #8848994 - Flags: review?(shu) → review+
(In reply to Shu-yu Guo [:shu] from comment #15)
> I'd like to investigate if we can destroy compartments sooner if we don't
> keep CCWs of SSOs. I agree let's go with the simpler solution first.

I'd love it if we could, but I'm not sure how feasible it is. When we're
only dealing with pure JS objects, it would probably be doable. The problem
comes when there are references to JS objects in the compartment that are held
externally by non-JS objects, and they aren't wrapped.

The problem with DOM event listeners is one example. Those are intentionally
stored unwrapped for the sake of certain security requirements. When we nuke
the compartment, the references stay alive until the next time the CC
traverses them and notices they should be dead. We could try to cut them
sooner, the way we do for the CCW table, but there are performance concerns.

The problem is worse for things like XPCWrappedJS objects, though. We never
actually cut those, so if any native code (or even JS code that uses XPConnect
in certain weird ways) is holding on to them, they stay alive indefinitely.
That's probably fixable, but not yet fixed.

And then of course there's the possibility of any other random code storing
references in their own ways that we don't expect.

I suppose we could probably just have the GC replace any heap references to
objects in nuked compartments with dead wrappers, though... I'm not 100% sure
what the side-effects would be.

(In reply to Shu-yu Guo [:shu] from comment #16)
> I need a refresher on compartment nuking. The bug here is that we need to
> keep SSO wrappers from the add-on compartment -> chrome? Or the other way
> around? I

Yes, from add-on sandbox to chrome. This API is only available to chrome
callers, so it's not possible for this to happen the other way around.
Pushed by maglione.k@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/5bd473699d77
Don't nuke cross-compartment wrappers for ScriptSourceObjects. r=shu
https://hg.mozilla.org/mozilla-central/rev/5bd473699d77
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Installing the new nightly now. will advise on how it goes for me.
Flags: needinfo?(eshepherd)
Is this issue 55-only or was there an older latent bug that warrants backport consideration?
Flags: needinfo?(kmaglione+bmo)
I don't think we ever clone scripts into compartments that might be nuked prior to 55, so it's probably 55-only.
Flags: needinfo?(kmaglione+bmo)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: