Last Comment Bug 716014 - Investigate if we could use CompartmentGC more often
: Investigate if we could use CompartmentGC more often
Status: RESOLVED FIXED
[Snappy:P2]
:
Product: Core
Classification: Components
Component: DOM (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla15
Assigned To: Olli Pettay [:smaug]
:
Mentors:
: 707863 (view as bug list)
Depends on: 721933 723907 752244
Blocks: 698919 716394 k9o-perf 721543
  Show dependency treegraph
 
Reported: 2012-01-06 12:29 PST by Olli Pettay [:smaug]
Modified: 2013-12-27 14:24 PST (History)
21 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
-
disabled


Attachments
hack (6.81 KB, patch)
2012-01-06 12:29 PST, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
patch to use compartment GCs after CCs (5.51 KB, patch)
2012-01-06 17:23 PST, Bill McCloskey (:billm)
no flags Details | Diff | Splinter Review
patch (9.51 KB, patch)
2012-01-29 09:47 PST, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
patch (10.39 KB, patch)
2012-01-30 04:31 PST, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
a bit nicer patch (12.32 KB, patch)
2012-01-30 06:27 PST, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
patch (12.34 KB, patch)
2012-01-30 07:02 PST, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
patch (17.76 KB, patch)
2012-01-31 04:03 PST, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
patch (17.25 KB, patch)
2012-01-31 04:05 PST, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
patch (17.35 KB, patch)
2012-02-01 09:06 PST, Olli Pettay [:smaug]
continuation: review+
wmccloskey: review+
Details | Diff | Splinter Review
patch (20.72 KB, patch)
2012-02-02 12:17 PST, Olli Pettay [:smaug]
continuation: review+
Details | Diff | Splinter Review
interdiff (6.23 KB, patch)
2012-02-02 12:18 PST, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
less compartment GC (1.61 KB, patch)
2012-02-07 01:32 PST, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
v8 (20.82 KB, patch)
2012-02-07 10:11 PST, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
v9 (20.91 KB, patch)
2012-02-07 10:22 PST, Olli Pettay [:smaug]
continuation: review+
Details | Diff | Splinter Review
interdiff (1.40 KB, patch)
2012-02-07 10:24 PST, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
patch (20.22 KB, patch)
2012-05-01 09:44 PDT, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
patch (16.95 KB, patch)
2012-05-01 11:58 PDT, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
check if a GC is scheduled (1.55 KB, patch)
2012-05-02 13:56 PDT, Bill McCloskey (:billm)
terrence: review+
Details | Diff | Splinter Review
Olli's patch (17.01 KB, patch)
2012-05-04 14:06 PDT, Bill McCloskey (:billm)
wmccloskey: review+
Details | Diff | Splinter Review
patch (17.02 KB, patch)
2012-05-05 01:55 PDT, Olli Pettay [:smaug]
no flags Details | Diff | Splinter Review
skip compartment for GC (2.14 KB, patch)
2012-05-08 19:00 PDT, Bill McCloskey (:billm)
terrence: review+
Details | Diff | Splinter Review
Olli' (19.45 KB, patch)
2012-05-08 19:00 PDT, Bill McCloskey (:billm)
no flags Details | Diff | Splinter Review
v20 (20.57 KB, patch)
2012-05-09 08:51 PDT, Olli Pettay [:smaug]
wmccloskey: review+
Details | Diff | Splinter Review

Description Olli Pettay [:smaug] 2012-01-06 12:29:46 PST
Created attachment 586517 [details] [diff] [review]
hack

The patch is working surprisingly well, at least with purple buffer clean up
patches.
But it is really just a quick hack to try something.
Comment 1 Bill McCloskey (:billm) 2012-01-06 17:23:39 PST
Created attachment 586625 [details] [diff] [review]
patch to use compartment GCs after CCs

I created bug 716142, which allows us to do multi-compartment GCs. I think they could be useful for this bug. For example, we could GC all the compartments where ScriptEvaluated was called in the last n seconds. And we could also GC the chrome compartments along with them.

This attachment uses the multi-compartment GC code to schedule a GC on every compartment where the cycle collector unlinked something. On the next MaybeGC call, all these compartments will be collected. This makes it so we don't need a full GC after every CC.
Comment 2 Olli Pettay [:smaug] 2012-01-07 12:11:48 PST
Comment on attachment 586625 [details] [diff] [review]
patch to use compartment GCs after CCs


> nsXPConnect::Unlink(void *p)
> {
>+    JS_ScheduleCompartmentGC(JS_GetCompartmentOfGCThing(p));
>     return NS_OK;
> }
Unlink gets called a lot. How fast is JS_ScheduleCompartmentGC ?
Comment 3 Bill McCloskey (:billm) 2012-01-07 17:18:35 PST
(In reply to Olli Pettay [:smaug] from comment #2)
> Unlink gets called a lot. How fast is JS_ScheduleCompartmentGC ?

It's pretty fast. I suspect the common case is that we schedule a compartment GC on a compartment where one is already scheduled. In that case I would guess that the whole sequence is ~20 instructions. In the worst case it might be ~100 instructions. If it's a big deal, we could optimize it.
Comment 4 Olli Pettay [:smaug] 2012-01-09 04:36:14 PST
Comment on attachment 586625 [details] [diff] [review]
patch to use compartment GCs after CCs

Would this lead to GCing all the relevant compartments simultaneously or
will they get GCed one by one?
Comment 5 Bill McCloskey (:billm) 2012-01-09 10:46:26 PST
(In reply to Olli Pettay [:smaug] from comment #4)
> Comment on attachment 586625 [details] [diff] [review]
> patch to use compartment GCs after CCs
> 
> Would this lead to GCing all the relevant compartments simultaneously or
> will they get GCed one by one?

With bug 716142, they would all be collected at once. And any cycles will be collected if they are contained entirely within the subset of compartments being collected. So as long as we collect chrome compartments along with a given content compartment, we're likely to collect most cycles related to that content compartment.
Comment 6 Olli Pettay [:smaug] 2012-01-09 10:59:47 PST
I just wonder if we can do something faster. Even a single compartment GC isn't exactly fast
operation.
Comment 7 Bill McCloskey (:billm) 2012-01-09 13:18:20 PST
(In reply to Olli Pettay [:smaug] from comment #6)
> I just wonder if we can do something faster. Even a single compartment GC
> isn't exactly fast
> operation.

Something faster to clean up the unlinked objects? That would be pretty complex, and I'm not sure it would be worth it.
Comment 8 Andrew McCreight [:mccr8] 2012-01-27 17:51:15 PST
Ideally, we'd pass along the reason the compartment GC is being invoked, but that will require adding a new function to jsfriendapi.  I can probably whip something up there.
Comment 9 Andrew McCreight [:mccr8] 2012-01-28 18:57:07 PST
Running with Olli's patches, the GCs mostly look reasonable, but there's the occasional bizarre one.  For instance this one:

GC(T+30523.8) Type:Comp, Total:490.2, Wait:415.2, Mark:14.1, Sweep:58.7, FinObj:3.8, FinStr:0.2, FinScr:0.4, FinShp:1.4, DisCod:2.6, DisAnl:43.8, XPCnct:2.5, Destry:0.0, End:4.6, +Chu:0, -Chu:0, Reason:API

Most of the time is spent in Wait, whatever that is.

This one is less weird but still weird:

GC(T+30787.0) Type:Comp, Total:111.8, Wait:42.6, Mark:20.5, Sweep:39.9, FinObj:3.3, FinStr:1.0, FinScr:0.1, FinShp:1.4, DisCod:0.5, DisAnl:22.8, XPCnct:2.6, Destry:0.0, End:11.0, +Chu:0, -Chu:0, Reason:API

There are also some medium length ones like this that spend a lot of time in DisAnl:

GC(T+30742.9) Type:Glob, Total:271.2, Wait:8.2, Mark:91.0, Sweep:164.8, FinObj:16.2, FinStr:2.8, FinScr:1.0, FinShp:7.0, DisCod:3.4, DisAnl:122.1, XPCnct:3.5, Destry:0.5, End:10.3, +Chu:0, -Chu:0, Reason:SET_NEW_DOCUMENT

The typical one looks like this, which seems reasonable:
GC(T+30949.5) Type:Comp, Total:55.4, Wait:0.1, Mark:23.5, Sweep:18.5, FinObj:5.8, FinStr:2.9, FinScr:0.1, FinShp:1.3, DisCod:0.5, DisAnl:0.2, XPCnct:3.1, Destry:0.0, End:16.0, +Chu:0, -Chu:0, Reason:API
Comment 10 Olli Pettay [:smaug] 2012-01-29 04:10:07 PST
There is a bug open, IIRC, about Wait.
Trying to find it...
Comment 11 Olli Pettay [:smaug] 2012-01-29 04:26:49 PST
Er, I remembered wrong.
There is this https://bugzilla.mozilla.org/show_bug.cgi?id=714770#c1
(which is even stranger)
Comment 12 Olli Pettay [:smaug] 2012-01-29 04:29:02 PST
I see Wait times 0.x and once 1.x
Comment 13 Olli Pettay [:smaug] 2012-01-29 09:47:49 PST
Created attachment 592515 [details] [diff] [review]
patch

Um, I thought I had attached this patch to some bug, but I can't find it.
And this is not in my waiting-for-review queue.
Comment 14 Olli Pettay [:smaug] 2012-01-29 09:48:37 PST
The patch applies on top of Bug 721543
Comment 15 Bill McCloskey (:billm) 2012-01-29 13:35:14 PST
Andrew, do you know whether the long Wait and DisAnl times are caused by this patch, or were they happening before you started using it? Doing fewer global GCs seems like it could cause longer DisAnl times, but I don't see why Wait times should increase.
Comment 16 Andrew McCreight [:mccr8] 2012-01-29 13:46:05 PST
Here's another weird one just now:

GC(T+98261.9) Type:Comp, Total:463.9, Wait:92.6, Mark:26.0, Sweep:283.6, FinObj:17.1, FinStr:1.5, FinScr:0.1, FinShp:1.6, DisCod:1.3, DisAnl:251.8, XPCnct:4.5, Destry:0.0, End:65.6, +Chu:0, -Chu:0, Reason:API

I haven't really noticed anything like that before, but on the other hand, it is really only something that stands out once you most GCs are < 100ms.  Also, I've only been running the MemChaser addon for a few weeks, and that really helps you notice these spikey collections.  And there, it only really stands out once you have smaug's patches, and both GC and CC are usually < 100ms.  When it goes above 100ms, it shows up as red in the addon.  I really haven't noticed any pauses from these apparently longer GCs, I just see the MemChaser reading, then check out the log.

And see ones like this one just now (long End):

GC(T+98707.9) Type:Comp, Total:277.7, Wait:0.1, Mark:28.3, Sweep:66.7, FinObj:6.7, FinStr:46.5, FinScr:0.1, FinShp:1.6, DisCod:1.1, DisAnl:0.3, XPCnct:5.2, Destry:0.0, End:187.0, +Chu:0, -Chu:0, Reason:API

I can try out a regular Nightly and see how that goes.
Comment 17 Andrew McCreight [:mccr8] 2012-01-29 13:53:09 PST
Looking at my telemetry, the average GC time is 180ms.  There's a peak around 48ms, then another smaller one around 268ms, with a fairly long tail.
Comment 18 Andrew McCreight [:mccr8] 2012-01-29 13:53:35 PST
Also, with this patch, the GC_REASON is API 90%+ of the time.
Comment 19 Andrew McCreight [:mccr8] 2012-01-29 14:12:22 PST
My facebook compartment had 90megs of object-elements, that went away when I pressed the GC button in about:memory.  Then it built up another 30megs that went away on its own.  Seems like a lot of junk, but then this is Facebook so who knows.
Comment 20 Bill McCloskey (:billm) 2012-01-29 15:59:42 PST
Comment on attachment 592515 [details] [diff] [review]
patch

Review of attachment 592515 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, Olli. These heuristics make sense, and we really need something like this. However, I still have a lot of reservations about checking this in so close to the merge. I'll talk to dmandelin about it on Monday.

I am a little worried about the situation where the browser goes idle for a long time. It would be nice if we could guarantee that the last GC was a full GC. Could we maybe have a timer that schedules a global GC, set for 30 seconds or so, and we keep resetting it to 30 seconds every time we do a compartment GC? That way, as long as we're running compartment GCs, it never fires. But once we stop doing compartment GCs, we would be guaranteed to get a full GC later on.

::: dom/base/nsJSEnvironment.cpp
@@ +173,5 @@
>  static PRUint32 sPreviousSuspectedCount = 0;
>  
>  static bool sPreviousCollectionWasGC = false;
> +static PRUint32 sCompartmentGCCount = 0;
> +static PRUint32 sGlobalGCGeneration = 0;

Instead of Generation, could you use Epoch here (and in the member variable as well)? Otherwise people might think this has something to do with generational GC.

@@ +3173,5 @@
> +
> +    // If there has been a global GC after previous
> +    // script evaluation, reset mEvaluationCount to 0,
> +    // and mark the current GC generation.
> +    if (mGlobalGCGeneration != sGlobalGCGeneration) {

I think all this generation logic would be cleaner if you consolidated it into three inline functions: one to read an nsJSContext's evaluation count (returning 0 if the generation doesn't match), one to increment its count, and one to reset all the counts to zero. This code could go at the top of nsJSEnvironment.cpp.

@@ +3280,5 @@
>  
> +  if (aGckind != nsGCShrinking && !aGlobal &&
> +      sTopEvaluator &&
> +      sTopEvaluator->mGlobalGCGeneration == sGlobalGCGeneration &&
> +      !sContextDeleted && sCompartmentGCCount < 10) {

Could you put a comment somewhere explaining when we do a compartment GC? The logic is currently split between here and ScriptEvaluated. I think the comment should go here. Also, do I understand right that only every other compartment GC is allowed to collect a chrome compartment? Is this working around the problem that we can't GC multiple compartments at once?

@@ +3287,5 @@
> +    top->mEvaluationCount = 0;
> +    JSContext* cx = top->GetNativeContext();
> +    if (cx) {
> +      JSObject* global = top->GetNativeGlobal();
> +      if (global) {

Is it possible for GetNativeGlobal to return NULL?

@@ +3289,5 @@
> +    if (cx) {
> +      JSObject* global = top->GetNativeGlobal();
> +      if (global) {
> +        JSCompartment* comp = js::GetObjectCompartment(global);
> +        if (comp) {

It's not possible for this to be NULL.

@@ +3302,3 @@
>    if (nsContentUtils::XPConnect()) {
> +    if (sTopEvaluator) {
> +      sTopEvaluator->mEvaluationCount = 0;

Is this line needed? Won't a global GC cause all the evaluation counts to be invalidated anyway?

@@ +3632,5 @@
> +  if (!comp) {
> +    sCompartmentGCCount = 0;
> +    if (!++sGlobalGCGeneration) {
> +      ++sGlobalGCGeneration;
> +    }

What's this for? Some sort of overflow check? It doesn't seem like a big deal if sGlobalGCGeneration overflows to 0: some newly-created contexts will have their generation equal to 0, but they will also have an evaluation count of 0, which is correct. So why is the extra increment needed?
Comment 21 Andrew McCreight [:mccr8] 2012-01-29 16:31:01 PST
Comment on attachment 592515 [details] [diff] [review]
patch

I'm not sure I know what is going on here well enough to evaluate it.  I'll try taking another look tomorrow.
Comment 22 Olli Pettay [:smaug] 2012-01-30 03:41:10 PST
(In reply to Bill McCloskey (:billm) from comment #20)

> I am a little worried about the situation where the browser goes idle for a
> long time. It would be nice if we could guarantee that the last GC was a
> full GC. Could we maybe have a timer that schedules a global GC, set for 30
> seconds or so, and we keep resetting it to 30 seconds every time we do a
> compartment GC? That way, as long as we're running compartment GCs, it never
> fires. But once we stop doing compartment GCs, we would be guaranteed to get
> a full GC later on.
I can add that.



> 
> ::: dom/base/nsJSEnvironment.cpp
> @@ +173,5 @@
> >  static PRUint32 sPreviousSuspectedCount = 0;
> >  
> >  static bool sPreviousCollectionWasGC = false;
> > +static PRUint32 sCompartmentGCCount = 0;
> > +static PRUint32 sGlobalGCGeneration = 0;
> 
> Instead of Generation, could you use Epoch here (and in the member variable
> as well)? Otherwise people might think this has something to do with
> generational GC.
sGlobalGCEpoch? ok.



> 
> @@ +3173,5 @@
> > +
> > +    // If there has been a global GC after previous
> > +    // script evaluation, reset mEvaluationCount to 0,
> > +    // and mark the current GC generation.
> > +    if (mGlobalGCGeneration != sGlobalGCGeneration) {
> 
> I think all this generation logic would be cleaner if you consolidated it
> into three inline functions: one to read an nsJSContext's evaluation count
> (returning 0 if the generation doesn't match), one to increment its count,
> and one to reset all the counts to zero. This code could go at the top of
> nsJSEnvironment.cpp.
Ok


> 
> @@ +3280,5 @@
> >  
> > +  if (aGckind != nsGCShrinking && !aGlobal &&
> > +      sTopEvaluator &&
> > +      sTopEvaluator->mGlobalGCGeneration == sGlobalGCGeneration &&
> > +      !sContextDeleted && sCompartmentGCCount < 10) {
> 
> Could you put a comment somewhere explaining when we do a compartment GC?
> The logic is currently split between here and ScriptEvaluated. I think the
> comment should go here. Also, do I understand right that only every other
> compartment GC is allowed to collect a chrome compartment?
Yes

> Is this working
> around the problem that we can't GC multiple compartments at once?
It is working around the problem that otherwise chrome would be GCed all the time.
We have so much chrome js which runs pretty much whenever user does something in the browser, so
chrome js gets evaluated way more often than content js.

> @@ +3287,5 @@
> > +    top->mEvaluationCount = 0;
> > +    JSContext* cx = top->GetNativeContext();
> > +    if (cx) {
> > +      JSObject* global = top->GetNativeGlobal();
> > +      if (global) {
> 
> Is it possible for GetNativeGlobal to return NULL?
I don't know. Adding a null check is safe.

> 
> @@ +3289,5 @@
> > +    if (cx) {
> > +      JSObject* global = top->GetNativeGlobal();
> > +      if (global) {
> > +        JSCompartment* comp = js::GetObjectCompartment(global);
> > +        if (comp) {
> 
> It's not possible for this to be NULL.
Documentation doesn't tell that but ok.
And sorry, I'm a Gecko developer and in Gecko Get* methods can in general
return null.

> 
> @@ +3302,3 @@
> >    if (nsContentUtils::XPConnect()) {
> > +    if (sTopEvaluator) {
> > +      sTopEvaluator->mEvaluationCount = 0;
> 
> Is this line needed? Won't a global GC cause all the evaluation counts to be
> invalidated anyway?
leftover from a previous version.

> 
> @@ +3632,5 @@
> > +  if (!comp) {
> > +    sCompartmentGCCount = 0;
> > +    if (!++sGlobalGCGeneration) {
> > +      ++sGlobalGCGeneration;
> > +    }
> 
> What's this for? Some sort of overflow check?
Yes.

> It doesn't seem like a big
> deal if sGlobalGCGeneration overflows to 0: some newly-created contexts will
> have their generation equal to 0, but they will also have an evaluation
> count of 0, which is correct. So why is the extra increment needed?
Not strictly needed, but to have similar behavior what sCCGeneration has.
Comment 23 Olli Pettay [:smaug] 2012-01-30 04:31:19 PST
Created attachment 592657 [details] [diff] [review]
patch
Comment 24 Olli Pettay [:smaug] 2012-01-30 06:27:07 PST
Created attachment 592694 [details] [diff] [review]
a bit nicer patch

But I need to actually test this.
Comment 25 Olli Pettay [:smaug] 2012-01-30 07:02:57 PST
Created attachment 592702 [details] [diff] [review]
patch
Comment 26 Bill McCloskey (:billm) 2012-01-30 18:59:46 PST
I thought about this some more, and I talked to Andrew, dmandelin, and jst. I think we should wait until the next release to do this.
Comment 27 Olli Pettay [:smaug] 2012-01-31 02:05:38 PST
Ok, but let's get this landed early in the FF13 cycle.

I realized I didn't add the 30s timer. I'll add that today.
Comment 28 Olli Pettay [:smaug] 2012-01-31 04:03:14 PST
Created attachment 593037 [details] [diff] [review]
patch

60s full gc timer after last compartment GC.
And if after that full gc there is significant amount suspected objects, 
CC will be triggered.
Comment 29 Olli Pettay [:smaug] 2012-01-31 04:05:38 PST
Created attachment 593038 [details] [diff] [review]
patch
Comment 30 Olli Pettay [:smaug] 2012-02-01 09:06:25 PST
Created attachment 593474 [details] [diff] [review]
patch

Just a minor change to keep shutdown handling the way it is now;
don't call GC if we can't access XPConnect.
Comment 31 Bill McCloskey (:billm) 2012-02-01 18:46:58 PST
Comment on attachment 593474 [details] [diff] [review]
patch

Review of attachment 593474 [details] [diff] [review]:
-----------------------------------------------------------------

Please use the API in bug 721933 for specifying the reason for a compartment GC. Otherwise, it looks fine.
Comment 32 Andrew McCreight [:mccr8] 2012-02-02 10:58:07 PST
Comment on attachment 593474 [details] [diff] [review]
patch

Review of attachment 593474 [details] [diff] [review]:
-----------------------------------------------------------------

Is making a compartmental GC, rather than global, the default a good idea?  You missed two places GarbageCollectNow is called in ContentChild that I would guess should stay global, not silently become compartmental.  It may also be surprising to any future callers that the default is compartmental.

Using an enum like nsGCGlobal and nsGCCompartmental might be clearer than an arbitrary boolean, but if you want to leave it as is, that's okay.

It seems a little odd to me that you have to add a new boolean argument for compartmental vs. not, instead of adding another GC type.  Can you have a compartmental shrinking GC or a compartmental incremental GC slice?  The latter makes sense, but maybe not the former.  Perhaps more a question for Bill.

::: dom/base/nsJSEnvironment.cpp
@@ +3314,5 @@
> +                                             NS_FULL_GC_DELAY,
> +                                             nsITimer::TYPE_ONE_SHOT);
> +        }
> +        JSCompartment* comp = js::GetObjectCompartment(global);
> +        JS_CompartmentGC(cx, comp);        

As Bill said, you should put the reason in here now that that works.

::: dom/base/nsJSEnvironment.h
@@ +232,5 @@
> +    aCx->mEvaluationCount = 0;
> +  }
> +  
> +  static void DOMGCFinishedCallback(JSRuntime* aRt, JSCompartment* aComp,
> +                                    const char* aStatus);

Why did you need to make this callback public?
Comment 33 Andrew McCreight [:mccr8] 2012-02-02 10:58:37 PST
You should fix the callers on ContentChild or change the default so they are okay.
Comment 34 Olli Pettay [:smaug] 2012-02-02 11:30:40 PST
(In reply to Andrew McCreight [:mccr8] from comment #32)
> > +  static void DOMGCFinishedCallback(JSRuntime* aRt, JSCompartment* aComp,
> > +                                    const char* aStatus);
> 
> Why did you need to make this callback public?
Because the method needs to be accessed outside nsJSContext
Comment 35 Andrew McCreight [:mccr8] 2012-02-02 11:34:52 PST
(In reply to Olli Pettay [:smaug] from comment #34)
> > Why did you need to make this callback public?
> Because the method needs to be accessed outside nsJSContext
The patch doesn't add any additional uses of it as far as I can see (the JS_SetGCFinishedCallback).  Is it just that it should have been made public from the start?  I'm just curious.
Comment 36 Olli Pettay [:smaug] 2012-02-02 11:39:42 PST
nsJSRuntime::Init() needs to be able to access DOMGCFinishedCallback, and
DOMGCFinishedCallback need to be able to access nsJSContext::sGlobalGCEpoch.
Comment 37 Olli Pettay [:smaug] 2012-02-02 11:41:57 PST
I'll remove the default parameters. They are errorprone
Comment 38 Andrew McCreight [:mccr8] 2012-02-02 11:44:35 PST
(In reply to Olli Pettay [:smaug] from comment #36)
> nsJSRuntime::Init() needs to be able to access DOMGCFinishedCallback, and
> DOMGCFinishedCallback need to be able to access nsJSContext::sGlobalGCEpoch.

Ah, I see now!  Thanks for the explanation.

(In reply to Olli Pettay [:smaug] from comment #37)
> I'll remove the default parameters. They are errorprone

Great.
Comment 39 Olli Pettay [:smaug] 2012-02-02 12:17:27 PST
Created attachment 593936 [details] [diff] [review]
patch

I'll upload interdiff in a second.
Comment 40 Olli Pettay [:smaug] 2012-02-02 12:18:34 PST
Created attachment 593937 [details] [diff] [review]
interdiff
Comment 41 Andrew McCreight [:mccr8] 2012-02-02 12:27:49 PST
Comment on attachment 593936 [details] [diff] [review]
patch

Review of attachment 593936 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.
Comment 42 Olli Pettay [:smaug] 2012-02-02 13:32:30 PST
https://hg.mozilla.org/mozilla-central/rev/c18523b51058
Comment 43 Jonathan Kew (:jfkthame) 2012-02-03 06:05:13 PST
Since this merged to inbound, Win Debug reftest has timed-out on about 90% of test runs. I've backed it out there to see if this resolves the orangeness. (If not, apologies in advance!)
Comment 44 Jonathan Kew (:jfkthame) 2012-02-03 06:09:05 PST
Hmm, just found that sgautherie filed bug 723907 on this. If only tbpl and bugzilla were more responsive...
Comment 45 Jonathan Kew (:jfkthame) 2012-02-04 05:18:32 PST
FTR, this was backed out of inbound, and then the backout merged to m-c:
https://hg.mozilla.org/integration/mozilla-inbound/rev/137e8fbde75a
https://hg.mozilla.org/mozilla-central/rev/137e8fbde75a
Comment 46 Andrew McCreight [:mccr8] 2012-02-04 07:54:09 PST
Olli, I can look into the shutdown issue.
Comment 47 Olli Pettay [:smaug] 2012-02-04 07:56:55 PST
I'm pushing some test patches to tryserver.
I have vague feeling that the problem might have something to do with the FullGCTimer, which was
added very lately, since I didn't see the problem earlier with patches which didn't have that.
Comment 48 Andrew McCreight [:mccr8] 2012-02-06 14:02:16 PST
I think it is more than just ref tests that are being slowed. Cww was experiencing really awful GC times on a Nightly with this patch in it.  25% or more of his GCs were > 1 second.  This is on OSX.

Here's one of them:

GC(T+219862.8) Type:Glob, Total:5606.0, Wait:0.2, Mark:290.4, Sweep:844.9, FinObj:112.5, FinStr:36.9, FinScr:2.3, FinShp:16.9, DisCod:56.6, DisAnl:574.2, XPCnct:9.7, Destry:0.1, End:4479.3, +Chu:0, -Chu:0, Reason:FULL_GC_TIMER

Note that most of the time is spent in "End", and that it is a full GC triggered by the timer.  That's just after being open for about a day.

He usually has a lot of tabs open.  His CC times are pretty good, though, so your earlier patches helped a lot with that. :)
Comment 49 Bill McCloskey (:billm) 2012-02-06 14:14:45 PST
The main thing that happens in the end phase is calling the JSGC_END callback, which triggers deferred release in XPConnect. So it sounds like it might be doing a ton of NS_RELEASE calls. Also, we've seen problems before that suggest that calling free() is really slow on MacOS, so that might be involved as well. To confirm this, we'd need to be able to reproduce the problem and time the JSGC_END callback specifically.

It's possible that there's just a lot of stuff that's not getting GCed until the global GC. I'm not sure why that would be, though. It might be helpful if you can get a more complete list of the GCs that happened on Cww's machine before the long one. Then we could see how recent the previous global GC was.
Comment 50 Andrew McCreight [:mccr8] 2012-02-06 14:19:11 PST
Ah, sorry, I should have gotten that info before I asked him to restarted with a newer version.  I saw 2 or 3 of the FULL_GC_TIMER GCs and they were all pretty slow.  Then there were a few compartmental GCs in between.  Probably not more than 4 or 5?  But there were many more CCs in between, so it could have been a long time in between the GCs.
Comment 51 Bill McCloskey (:billm) 2012-02-06 14:34:51 PST
If there are a lot of FULL_GC_TIMER GCs, that means that the GC isn't being triggered very often. That might account for all the deferred release calls.

It used to be that a lot of our GCs were triggered by the CC. Now that the CC is run less, we'll also have fewer GCs.
Comment 52 Gregor Wagner [:gwagner] 2012-02-06 14:59:21 PST
Does this happen during regular browsing? 

I see similar GC pauses if I don't interact with the browser over night for example. In such use-cases they seem fine to me.
We should just make sure that we don't allocate additional memory and allocate out of our fragmented heap.
Comment 53 Olli Pettay [:smaug] 2012-02-07 00:52:46 PST
(In reply to Andrew McCreight [:mccr8] from comment #50)
> Ah, sorry, I should have gotten that info before I asked him to restarted
> with a newer version.  I saw 2 or 3 of the FULL_GC_TIMER GCs and they were
> all pretty slow.  Then there were a few compartmental GCs in between. 
> Probably not more than 4 or 5?  But there were many more CCs in between, so
> it could have been a long time in between the GCs.
Interesting. that is very different to what I get. Usually comp GC runs quite often, and
global GC is also triggered, so I basically never get FULL_GC_TIMER GC.
Comment 54 Olli Pettay [:smaug] 2012-02-07 00:55:22 PST
I wonder if just doing full GC a bit more often would be ok (for now).
Like every 5th GC could be full (not every 10th like with the patch) and then
decrease full gc timer to 30s from 60s.
I'll test and push some patches to try.
Comment 55 Olli Pettay [:smaug] 2012-02-07 01:32:52 PST
Created attachment 594954 [details] [diff] [review]
less compartment GC

I uploaded this to tryserver to see how it behaves.
There can be at most 5 explicit compartment GCs in a row, and
20s after the last explicit compartment gc there is a global gc.
Comment 56 Olli Pettay [:smaug] 2012-02-07 04:12:59 PST
I wonder if bug 716142 would help here.
Comment 57 Olli Pettay [:smaug] 2012-02-07 07:54:06 PST
So I wonder if we could call scheduleGC() on each JSContext a script is evaluated.
Comment 58 Olli Pettay [:smaug] 2012-02-07 09:06:28 PST
I'll try still some changes where global gc would be called more often.
Comment 59 Olli Pettay [:smaug] 2012-02-07 09:10:15 PST
...and once the incremental GC and bug 716142 are ready, we could re-evaluate.
But let's see how the changes work. Pushing to try soon.
Comment 60 Olli Pettay [:smaug] 2012-02-07 10:11:42 PST
Created attachment 595076 [details] [diff] [review]
v8

I'm still waiting for tryserver results, but in cases like reftest this
should give pretty much the same behavior what m-c has now.
But in cases like FotN, compartment GC can be used still quite often.

I'll upload an interdiff in a minute.
Comment 61 Olli Pettay [:smaug] 2012-02-07 10:22:30 PST
Created attachment 595081 [details] [diff] [review]
v9

This is a tiny bit cleaner
Comment 62 Olli Pettay [:smaug] 2012-02-07 10:24:41 PST
Created attachment 595082 [details] [diff] [review]
interdiff

this shows the changes to the patch which was pushed to (and backed out from)
m-c
Comment 63 Andrew McCreight [:mccr8] 2012-02-09 16:26:57 PST
Sorry for the delay here.

So the change here is
1) always do a full GC at startup
2) do a full GC every 30 seconds instead of 60 seconds
3) make sure the context GC is a full gc

Earlier you were talking about making it be a global every 5th compartmental, instead of 10th, but you don't seem to have that in the latest version.  Was that intentional?  I guess if the GC is running every 5-10 seconds the global timer will hit you first.

Looks good to me, but I do worry a bit about the recent memory spike we saw around the original landing of the patch.  Maybe you should try out something like Gregor's MemBuster test and see what memory usage looks like 30 seconds after closing all tabs.  You should also coordinate with Bill, as he's looking into a GC issue, so I'm not sure how this may interfere with that.
Comment 64 Andrew McCreight [:mccr8] 2012-02-09 16:27:28 PST
Comment on attachment 595081 [details] [diff] [review]
v9

Review of attachment 595081 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the interdiff.
Comment 65 Olli Pettay [:smaug] 2012-02-09 16:29:15 PST
(In reply to Andrew McCreight [:mccr8] from comment #63)
> Sorry for the delay here.
> 
> So the change here is
> 1) always do a full GC at startup
> 2) do a full GC every 30 seconds instead of 60 seconds
> 3) make sure the context GC is a full gc
Yes

> 
> Earlier you were talking about making it be a global every 5th
> compartmental, instead of 10th, but you don't seem to have that in the
> latest version.  Was that intentional?
We end up calling full GC anyway very often.


> Looks good to me, but I do worry a bit about the recent memory spike we saw
> around the original landing of the patch.  Maybe you should try out
> something like Gregor's MemBuster test
What is that?
Comment 66 Andrew McCreight [:mccr8] 2012-02-09 16:34:41 PST
(In reply to Olli Pettay [:smaug] from comment #65)
> > Looks good to me, but I do worry a bit about the recent memory spike we saw
> > around the original landing of the patch.  Maybe you should try out
> > something like Gregor's MemBuster test
> What is that?
I guess it isn't actually Gregor's.  This site: http://random.pavlov.net/membuster/index.html

It opens up a ton of sites and closes them.  Try that out with this patch, see what memory looks like in about:memory.  Or some other kind of benchmark you can compare with and without this patch that involved loading and closing a bunch of sites.
Comment 67 Olli Pettay [:smaug] 2012-02-10 11:25:55 PST
(In reply to Andrew McCreight [:mccr8] from comment #63)
> Looks good to me, but I do worry a bit about the recent memory spike we saw
> around the original landing of the patch.  Maybe you should try out
> something like Gregor's MemBuster test and see what memory usage looks like
> 30 seconds after closing all tabs.
Memusage was pretty much the same with or without the patch.
Comment 68 Lawrence Mandel [:lmandel] (use needinfo) 2012-04-30 07:13:38 PDT
noming for k9o as per recommendation by billm.
Comment 69 Olli Pettay [:smaug] 2012-05-01 09:44:30 PDT
Created attachment 619957 [details] [diff] [review]
patch

Bill, does this affect badly to iGC?
The patch is otherwise the same as earlier, but some static variable handling
happens in GarbageCollectNow since DOMGCSliceCallback is quite different from
the earlier callback.

https://tbpl.mozilla.org/?tree=Try&rev=b831e9e65b54
Comment 70 Olli Pettay [:smaug] 2012-05-01 09:47:21 PDT
Though, I think iGC should be working before trying this.
Comment 71 Bill McCloskey (:billm) 2012-05-01 10:56:44 PDT
I was kinda hoping you could use the new multi-compartment GC stuff here. That would avoid the need for tracking the top evaluator. Instead, we would just GC every compartment that had run code.

The APIs for this are in jsfriendapi. First you call PrepareCompartmentForGC on every compartment you want to collect. This won't actually do any collection, so it can be done from  the ScriptEvaluated hook. Then when you actually want to collect, call IncrementalGC with the runtime and the reason.
Comment 72 Olli Pettay [:smaug] 2012-05-01 11:08:48 PDT
(In reply to Bill McCloskey (:billm) from comment #71)
> I was kinda hoping you could use the new multi-compartment GC stuff here.
I can try it.
Comment 74 Olli Pettay [:smaug] 2012-05-01 12:35:21 PDT
Comment on attachment 620010 [details] [diff] [review]
patch

This looks leaky. Waiting for some more tryserver results.
Comment 75 Bill McCloskey (:billm) 2012-05-02 13:56:03 PDT
Created attachment 620459 [details] [diff] [review]
check if a GC is scheduled

It looks like the try push had some assertions caused by GCing without any compartments scheduled for GC. This patch makes it possible to check if any compartments are scheduled.

I updated Olli's patch to call this and pushed the result to try:
https://tbpl.mozilla.org/?tree=Try&rev=01d57bb0e7a4
Comment 76 Bill McCloskey (:billm) 2012-05-04 14:03:13 PDT
Comment on attachment 620459 [details] [diff] [review]
check if a GC is scheduled

We need this to determine whether it's worth doing a compartment GC.
Comment 77 Bill McCloskey (:billm) 2012-05-04 14:06:59 PDT
Created attachment 621167 [details] [diff] [review]
Olli's patch

This is a slight modification of Olli's patch to fix an assertion. I'll review it soon.
Comment 78 Bill McCloskey (:billm) 2012-05-04 14:11:10 PDT
Comment on attachment 620459 [details] [diff] [review]
check if a GC is scheduled

Giving to Terrence to expedite this.
Comment 79 Terrence Cole [:terrence] 2012-05-04 14:14:50 PDT
Comment on attachment 620459 [details] [diff] [review]
check if a GC is scheduled

Review of attachment 620459 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me.
Comment 80 Bill McCloskey (:billm) 2012-05-04 15:21:32 PDT
Comment on attachment 621167 [details] [diff] [review]
Olli's patch

Review of attachment 621167 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsJSEnvironment.cpp
@@ +2963,5 @@
> +                                         reinterpret_cast<void *>(reason),
> +                                         NS_FULL_GC_DELAY,
> +                                         nsITimer::TYPE_ONE_SHOT);
> +    }
> +    if (js::IsGCScheduled(nsJSRuntime::sRuntime))

I just realized I forgot to fix the bracing here.

@@ +3584,5 @@
>    sPendingLoadCount = 0;
>    sLoadingInProgress = false;
>    sCCollectedWaitingForGC = 0;
>    sPostGCEventsToConsole = false;
> +  sDisableExplicitCompartmentGC = false;

I guess as long as you're initializing this one, it would make sense to initialize the other static vars.

::: dom/base/nsJSEnvironment.h
@@ +185,5 @@
>    static void LoadEnd();
>  
> +  static void GarbageCollectNow(js::gcreason::Reason reason,
> +                                PRUint32 aGckind,
> +                                bool aGlobal);

Could you make an enum for aGlobal?
Comment 81 Olli Pettay [:smaug] 2012-05-05 01:52:14 PDT
(In reply to Bill McCloskey (:billm) from comment #80)
> @@ +3584,5 @@
> >    sPendingLoadCount = 0;
> >    sLoadingInProgress = false;
> >    sCCollectedWaitingForGC = 0;
> >    sPostGCEventsToConsole = false;
> > +  sDisableExplicitCompartmentGC = false;
> 
> I guess as long as you're initializing this one, it would make sense to
> initialize the other static vars.
I do initialize the others, just a few lines below.


> Could you make an enum for aGlobal?
Hmm, so many enums and similar: readon, kind, different-type-of-kind...
For now I think bool should work just fine.
Comment 82 Olli Pettay [:smaug] 2012-05-05 01:55:04 PDT
Created attachment 621265 [details] [diff] [review]
patch
Comment 84 Olli Pettay [:smaug] 2012-05-05 08:37:44 PDT
This may have caused dromaeo crashes on Win-opt.
Unfortunately no stack traces.

Backed out
https://hg.mozilla.org/mozilla-central/rev/79f78105c451
Comment 85 Bill McCloskey (:billm) 2012-05-08 11:19:12 PDT
*** Bug 707863 has been marked as a duplicate of this bug. ***
Comment 86 Bill McCloskey (:billm) 2012-05-08 19:00:03 PDT
Created attachment 622256 [details] [diff] [review]
skip compartment for GC

This is a little patch to unschedule a compartment for GC.
Comment 87 Bill McCloskey (:billm) 2012-05-08 19:00:19 PDT
Created attachment 622257 [details] [diff] [review]
Olli'
Comment 88 Bill McCloskey (:billm) 2012-05-08 19:02:29 PDT
Oops. Anyway, the previous patch works the way we talked about. First it schedules all compartments for GC. Then it de-schedules any compartments that have been inactive. I didn't have much time, and I added a really gross linked list of nsJSContexts. We can fix this up.

I also just realized that we should probably check if any compartments are left to GC before calling js::IncrementalGC. But that should be easy.

This is totally untested. I just wanted to get it posted before the end of the day.
Comment 89 Olli Pettay [:smaug] 2012-05-09 08:51:44 PDT
Created attachment 622383 [details] [diff] [review]
v20

This seems to pass all the tests.
Because of the PrepareForFullGC/SkipCompartmentForGC thing we end up collecting 
quite a few more compartments than in the previous versions, so I increased
NS_FULL_GC_DELAY and NS_MAX_COMPARTMENT_GC_COUNT.
Comment 90 Olli Pettay [:smaug] 2012-05-09 14:07:27 PDT
https://hg.mozilla.org/mozilla-central/rev/e6529138e338

Let's see if it sticks this time.
Comment 91 Andrew McCreight [:mccr8] 2012-05-09 17:51:48 PDT
Olli, could you obsolete some of the older patches that you didn't land?  It is hard to tell what is what.  Thanks.
Comment 92 Jared Wein [:jaws] (please needinfo? me) 2012-05-12 16:14:09 PDT
Should the target milestone be updated to mozilla15 now?
Comment 93 Dietrich Ayala (:dietrich) 2012-07-24 13:14:22 PDT
We'd ship Basecamp without fixing this, minusing.
Comment 94 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-08-14 21:28:36 PDT
This was disabled in Firefox 15 by bug 782825.

Note You need to log in before you can comment on or make changes to this bug.