LAST_CONTEXT GCs are totally insane

RESOLVED FIXED in mozilla27

Status

()

Core
JavaScript Engine
RESOLVED FIXED
5 years ago
3 years ago

People

(Reporter: bholley, Assigned: bholley)

Tracking

unspecified
mozilla27
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(9 attachments, 2 obsolete attachments)

2.98 KB, patch
jonco
: review+
Details | Diff | Splinter Review
3.53 KB, patch
billm
: review+
Details | Diff | Splinter Review
2.02 KB, patch
billm
: review+
Details | Diff | Splinter Review
9.07 KB, patch
billm
: review+
Details | Diff | Splinter Review
3.22 KB, patch
billm
: review+
Details | Diff | Splinter Review
7.88 KB, patch
jonco
: review+
Details | Diff | Splinter Review
11.43 KB, patch
billm
: review+
jonco
: review+
Details | Diff | Splinter Review
4.86 KB, patch
billm
: review+
Details | Diff | Splinter Review
1.25 KB, patch
billm
: review+
Details | Diff | Splinter Review
(Assignee)

Description

5 years ago
In bug 905364, I worked some magic so that we don't always create the SafeJSContext whenever CAPS fires up. In general, this just delays the creation a bit until it's really needed. But in certain very trivial xpcshell situations, we actually never create the SafeJSContext at all.

This caused crashes, which I spent some time investigating. As it turns out, the JS engine has a panic attack when the last JSContext is destroyed, and starts freeing memory buffers willy-nilly without calling finalize hooks. In particular, during the 'last gc', we just call js_delete() on every compartment, even if it's marked and has a live global with all sorts of stuff hanging off of it: http://mxr.mozilla.org/mozilla-central/source/js/src/jsgc.cpp#2530

So in effect, the engine has an invariant that there must be a JSContext alive for the lifespan of anything useful happening on the JSRuntime. The SafeJSContext used to do this for the browser, but doesn't (necessarily) anymore. So now we run into a chicken-and-egg scenario with xpcshell. If we destroy xpcshell's JSContext before shutting down XPCOM, SpiderMonkey takes a bulldozer to XPConnect's carefully-managed data structures, causing us to crash. If we shut down XPCOM first, we end up simultaneously destroying the runtime, which probably isn't kosher to do with an outstanding context.

A simple workaround is to force the creation of the SafeJSContext in xpcshell. But it seems like we should fix this. Why do we do it this way, and not in JS_DestroyRuntime or somesuch?
(Assignee)

Updated

5 years ago
Summary: LAST_CONTEXT GC is totally insane → LAST_CONTEXT GCs are totally insane
Some time ago we used to need a JSContext in order to perform a GC because we would pass the context to finalizers. Igor did a big refactor a while ago that removed this requirement. I'm pretty sure it should be possible to do all this teardown stuff in JS_DestroyRuntime now.
That sounds like an improvement.

I probably said this elsewhere, but DestroyRuntime should really call all the finalizers. Leaking everything is outrageous.
(Assignee)

Updated

5 years ago
Assignee: general → bobbyholley+bmo
(Assignee)

Comment 5

5 years ago
Created attachment 792959 [details]
patches.tar.bz2

Bill is going to take a look here.
Created attachment 793087 [details] [diff] [review]
additional patch

This patch makes some changes that I think are needed to allow us to run safely without any JSContexts around. It passes jit-tests. I wasn't able to test otherwise since I get a crash at startup with just Bobby's patches applied.

Bobby, it's possible that this patch will fix the xpcshell test problem. I'm not sure what the deal is with the startup crash. It claims it's OOMing, but that doesn't make any sense. Have you tried to build your stuff locally? I based your patches on top of 764ac2c7cb0a, which was green on tinderbox, although I haven't tried it locally.
Never mind, the OOM was a stupid thing caused by updating Ubuntu without rebooting.
I couldn't reproduce the failures from try, so I pushed my patch to try on top of everything else. Maybe we'll get lucky and it will fix the problem.

https://tbpl.mozilla.org/?tree=Try&rev=130fb1068768
(Assignee)

Comment 9

5 years ago
hm, still GC crashes. :-(
There are still some crashes left, but I think some stuff got fixed. Everything that's left seems to be caused by calling cyclecollector::DeferredFinalize during the final GC, which apparently is forbidden because the cycle collector has already shut down. I'm not too familiar with this code.
There's two parts to CC shutdown.  First, there's the CC part, which doesn't matter here.  The second part is nsCycleCollector_forgetJSRuntime, which nulls out the runtime component of the CC per-thread data which is what the DeferredFinalize is routed through.  It looks like that gets called in ~CycleCollectedJSRuntime, right before we call JS_DestroyRuntime().  Is that also used in XPCShell?
Just at a glance, moving the nsCycleCollector_forgetJSRuntime after JS_DestroyRuntime() might work, though you'd have to be very careful, because presumably you'd end up with a dangling pointer in the CC at some point in during JS_DestroyRuntime().
I took Andrew's suggestion and moved down nsCycleCollector_forgetJSRuntime.

https://tbpl.mozilla.org/?tree=Try&rev=22ba816f2e7f
(Assignee)

Updated

4 years ago
Blocks: 911303
(Assignee)

Comment 17

4 years ago
(In reply to Bill McCloskey (:billm) from comment #16)
> This one actually compiles.
> https://tbpl.mozilla.org/?tree=Try&rev=3af588d149f7

The basic problem with the crash here is that XPCWrappedJS::Release needs to acquire the map lock, which is destroyed by the time we invoke JS_DestroyRuntime. We're going to just expose an idempotent mechanism on CycleCollectedJSRuntime so that we can destroy the JSRuntime in the XPCJSRuntime destructor. I've filed bug 911303 on doing this properly.
(Assignee)

Comment 22

4 years ago
This is all green except for one jit-test, Debugger-onNewGlobalObject-15.js, which I actually wrote not so long ago.

The issue is that the list of debuggers is non-empty when the runtime is destroyed, because the Debugger object never got GCed. The problem goes away when I revert my patch to make SHG-marking lastContext()-independent. Weird.

I'll talk to bill about this on monday. Hopefully we can get it figured out and get this stuff up for review.
(Assignee)

Comment 25

4 years ago
_finally_ green on linux64. I'm going to do a full try push to see if there are any other lingering failures.
(Assignee)

Comment 27

4 years ago
This is looking green, though the macosx results haven't come in yet. I'm going to upload patches and flag for review.
(Assignee)

Updated

4 years ago
Attachment #793087 - Attachment is obsolete: true
(Assignee)

Updated

4 years ago
Attachment #792959 - Attachment is obsolete: true
(Assignee)

Comment 28

4 years ago
Created attachment 803483 [details] [diff] [review]
Part 1 - Fix small bugs. v1

This patch fixes two small issues.

(1) It tightens an assertion in finalizeNow. Stuff finalized in finalizeNow
    should never be finalized off the main thread, so we shouldn't have to
    worry about BFS_JUST_FINISHED.

(2) If the only compartment that's left to GC is the atoms compartment, then
    we're not collecting it. I just moved the |any| check down below the
    code that decides whether to schedule the atoms compartment for GC.
Attachment #803483 - Flags: review?(jcoppeard)
(Assignee)

Comment 29

4 years ago
Created attachment 803484 [details] [diff] [review]
Part 2 - Explicitly track whether a context has been created. v1

The current mechanism is pretty sloppy, and falls over if the context count ever
drops to zero and then increases again. Let's do this right.
Attachment #803484 - Flags: review?(wmccloskey)
(Assignee)

Comment 30

4 years ago
Created attachment 803485 [details] [diff] [review]
Part 3 - Stop creating an ephemeral JSContext during XPConnect shutdown. v1

This interacts badly with our attempts to shut things down nicely, and happens
to be entirely unused. the primary impact of it ends up being that the cx
destruction triggers a GC, which we depend on. But we can do that manually.
Attachment #803485 - Flags: review?(wmccloskey)
(Assignee)

Comment 31

4 years ago
Created attachment 803486 [details] [diff] [review]
Part 4 - Invoke JS_DestroyRuntime before we totally tear down the XPCJSRuntime. v1
Attachment #803486 - Flags: review?(wmccloskey)
(Assignee)

Comment 32

4 years ago
Created attachment 803487 [details] [diff] [review]
Part 5 - Remove watchpoints on all compartments during shutdown, not just the cx compartment. v1
Attachment #803487 - Flags: review?(wmccloskey)
(Assignee)

Comment 33

4 years ago
Created attachment 803488 [details] [diff] [review]
Part 6 - Move rambo GC to runtime destruction. v1
Attachment #803488 - Flags: review?(jcoppeard)
(Assignee)

Comment 34

4 years ago
Created attachment 803489 [details] [diff] [review]
Part 7 - Stop using hasContexts(). v2

This patch is a joint effort between billm and myself. I'm marking myself as
the author because I worked on it more recently, which means that Bill should
probably review it again in addition to jonco, and self-review is frowned upon.

There were a few places where we relied on !rt->hasContexts() to mean
"this is the last GC". That was never really valid, and it especially isn't
valid with the previous patch. This patch replaces the check everywhere it's
used in the GC.
Attachment #803489 - Flags: review?(wmccloskey)
Attachment #803489 - Flags: review?(jcoppeard)
(Assignee)

Comment 35

4 years ago
Created attachment 803490 [details] [diff] [review]
Part 8 - Be more explicit about GCing twice in the nsXPConnect destructor. v1

In the current setup, we'll end up GCing once when destroying the SafeJSContext,
and then once again immediately after in the explicit GC we do before destroying
the XPCJSRuntime. It would be nice to avoid the first GC entirely, but we
currently need both to avoid leaking.

All in all, these patches cause us to GC three times during shutdown, rather
than twice as we did before, because the second GC was rolled together with
the runtime destruction GC when we destroyed the last JSContext. There are a
number of ways to eliminate these, at least in opt builds, but mccr8 thinks
it probably doesn't matter, since there shouldn't be much left in the heap after
the second GC.

We can probably get away with eliminating rambo GC entirely at some point. But
this might become irrelevant for the browser if we end up doing bug 662444.

It would also be interesting to see what, if anything, the rambo GC actually
collects. We might even be able to get away with asserting that all the zones
are gone and removing the GC entirely.

We also take the opportunity to kill mOwnSafeJSContext, which no longer holds
any meaning.
Attachment #803490 - Flags: review?(wmccloskey)
(Assignee)

Comment 36

4 years ago
Created attachment 803491 [details] [diff] [review]
Part 9 - Remove workaround in xpcshell. v1
Attachment #803491 - Flags: review?(wmccloskey)
Comment on attachment 803484 [details] [diff] [review]
Part 2 - Explicitly track whether a context has been created. v1

Review of attachment 803484 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jscntxt.cpp
@@ +186,5 @@
>       * keywords, numbers, strings and self-hosted scripts. If one of these
>       * steps should fail, the runtime will be left in a partially initialized
>       * state, with zeroes and nulls stored in the default-initialized remainder
>       * of the struct. We'll clean the runtime up under DestroyContext, because
>       * cx will be "last" as well as "first".

Please remove the last sentence from the comment.
Attachment #803484 - Flags: review?(wmccloskey) → review+
Comment on attachment 803485 [details] [diff] [review]
Part 3 - Stop creating an ephemeral JSContext during XPConnect shutdown. v1

Review of attachment 803485 [details] [diff] [review]:
-----------------------------------------------------------------

I think there's probably more we can do to improve this sequence, but it's definitely outside the scope of the bug.
Attachment #803485 - Flags: review?(wmccloskey) → review+
Comment on attachment 803486 [details] [diff] [review]
Part 4 - Invoke JS_DestroyRuntime before we totally tear down the XPCJSRuntime. v1

Review of attachment 803486 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/CycleCollectedJSRuntime.h
@@ +86,5 @@
>    CycleCollectedJSRuntime(uint32_t aMaxbytes,
>                            JSUseHelperThreads aUseHelperThreads);
>    virtual ~CycleCollectedJSRuntime();
>  
> +  // Idempotent. Subclasses may destroy their runtime earlier in execution if

I think runtimes should be plural, although I'm not sure.
Attachment #803486 - Flags: review?(wmccloskey) → review+
Attachment #803487 - Flags: review?(wmccloskey) → review+
Comment on attachment 803489 [details] [diff] [review]
Part 7 - Stop using hasContexts(). v2

Review of attachment 803489 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/Runtime.cpp
@@ +422,5 @@
>  
> +    /*
> +     * Flag us as being destroyed. This allows the GC to free things like
> +     * interned atoms and Ion trampolines.
> +     * */

Last line of comment is messed up.
Attachment #803489 - Flags: review?(wmccloskey) → review+
Comment on attachment 803490 [details] [diff] [review]
Part 8 - Be more explicit about GCing twice in the nsXPConnect destructor. v1

Review of attachment 803490 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/nsXPConnect.cpp
@@ +110,5 @@
>  
> +    // In order to clean up everything properly, we need to GC twice: once now,
> +    // to clean anything that can go away on its own (like the Junk Scope, which
> +    // we unrooted above), and once after forcing a bunch of shutdown in
> +    // XPConnect, to clean the stuff we forcibly disconnected.

This doesn't explain why we couldn't get away with only the latter GC. If you know the answer, can you add something?
Attachment #803490 - Flags: review?(wmccloskey) → review+
Attachment #803491 - Flags: review?(wmccloskey) → review+

Updated

4 years ago
Attachment #803483 - Flags: review?(jcoppeard) → review+

Updated

4 years ago
Attachment #803489 - Flags: review?(jcoppeard) → review+

Updated

4 years ago
Attachment #803488 - Flags: review?(jcoppeard) → review+
(Assignee)

Comment 42

4 years ago
Updated patches to address review comments. One more full push to try just to be super safe. There were no substantive changes to the patches since comment 26, so the only danger would be a mismerge while updating to tip:

https://tbpl.mozilla.org/?tree=Try&rev=b7b1b04fcccd

Updated

4 years ago
Depends on: 917757
(Assignee)

Updated

4 years ago
Blocks: 917909

Updated

3 years ago
Duplicate of this bug: 890243

Updated

3 years ago
Duplicate of this bug: 1120934
You need to log in before you can comment on or make changes to this bug.