Closed Bug 716014 Opened 12 years ago Closed 12 years ago

Investigate if we could use CompartmentGC more often

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla15
blocking-kilimanjaro +
blocking-basecamp -
Tracking Status
firefox15 --- disabled

People

(Reporter: smaug, Assigned: smaug)

References

(Blocks 1 open bug)

Details

(Whiteboard: [Snappy:P2])

Attachments

(14 files, 9 obsolete files)

5.51 KB, patch
Details | Diff | Splinter Review
17.35 KB, patch
mccr8
: review+
billm
: review+
Details | Diff | Splinter Review
20.72 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
6.23 KB, patch
Details | Diff | Splinter Review
1.61 KB, patch
Details | Diff | Splinter Review
v9
20.91 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
1.40 KB, patch
Details | Diff | Splinter Review
16.95 KB, patch
Details | Diff | Splinter Review
1.55 KB, patch
terrence
: review+
Details | Diff | Splinter Review
17.01 KB, patch
billm
: review+
Details | Diff | Splinter Review
17.02 KB, patch
Details | Diff | Splinter Review
2.14 KB, patch
terrence
: review+
Details | Diff | Splinter Review
19.45 KB, patch
Details | Diff | Splinter Review
v20
20.57 KB, patch
billm
: review+
Details | Diff | Splinter Review
Attached patch hack (obsolete) — Splinter Review
The patch is working surprisingly well, at least with purple buffer clean up
patches.
But it is really just a quick hack to try something.
Whiteboard: [snappy]
I created bug 716142, which allows us to do multi-compartment GCs. I think they could be useful for this bug. For example, we could GC all the compartments where ScriptEvaluated was called in the last n seconds. And we could also GC the chrome compartments along with them.

This attachment uses the multi-compartment GC code to schedule a GC on every compartment where the cycle collector unlinked something. On the next MaybeGC call, all these compartments will be collected. This makes it so we don't need a full GC after every CC.
Comment on attachment 586625 [details] [diff] [review]
patch to use compartment GCs after CCs


> nsXPConnect::Unlink(void *p)
> {
>+    JS_ScheduleCompartmentGC(JS_GetCompartmentOfGCThing(p));
>     return NS_OK;
> }
Unlink gets called a lot. How fast is JS_ScheduleCompartmentGC ?
(In reply to Olli Pettay [:smaug] from comment #2)
> Unlink gets called a lot. How fast is JS_ScheduleCompartmentGC ?

It's pretty fast. I suspect the common case is that we schedule a compartment GC on a compartment where one is already scheduled. In that case I would guess that the whole sequence is ~20 instructions. In the worst case it might be ~100 instructions. If it's a big deal, we could optimize it.
Blocks: 716394
Comment on attachment 586625 [details] [diff] [review]
patch to use compartment GCs after CCs

Would this lead to GCing all the relevant compartments simultaneously or
will they get GCed one by one?
(In reply to Olli Pettay [:smaug] from comment #4)
> Comment on attachment 586625 [details] [diff] [review]
> patch to use compartment GCs after CCs
> 
> Would this lead to GCing all the relevant compartments simultaneously or
> will they get GCed one by one?

With bug 716142, they would all be collected at once. And any cycles will be collected if they are contained entirely within the subset of compartments being collected. So as long as we collect chrome compartments along with a given content compartment, we're likely to collect most cycles related to that content compartment.
I just wonder if we can do something faster. Even a single compartment GC isn't exactly fast
operation.
(In reply to Olli Pettay [:smaug] from comment #6)
> I just wonder if we can do something faster. Even a single compartment GC
> isn't exactly fast
> operation.

Something faster to clean up the unlinked objects? That would be pretty complex, and I'm not sure it would be worth it.
Blocks: 721543
Blocks: 705582
Ideally, we'd pass along the reason the compartment GC is being invoked, but that will require adding a new function to jsfriendapi.  I can probably whip something up there.
Depends on: 721933
Running with Olli's patches, the GCs mostly look reasonable, but there's the occasional bizarre one.  For instance this one:

GC(T+30523.8) Type:Comp, Total:490.2, Wait:415.2, Mark:14.1, Sweep:58.7, FinObj:3.8, FinStr:0.2, FinScr:0.4, FinShp:1.4, DisCod:2.6, DisAnl:43.8, XPCnct:2.5, Destry:0.0, End:4.6, +Chu:0, -Chu:0, Reason:API

Most of the time is spent in Wait, whatever that is.

This one is less weird but still weird:

GC(T+30787.0) Type:Comp, Total:111.8, Wait:42.6, Mark:20.5, Sweep:39.9, FinObj:3.3, FinStr:1.0, FinScr:0.1, FinShp:1.4, DisCod:0.5, DisAnl:22.8, XPCnct:2.6, Destry:0.0, End:11.0, +Chu:0, -Chu:0, Reason:API

There are also some medium length ones like this that spend a lot of time in DisAnl:

GC(T+30742.9) Type:Glob, Total:271.2, Wait:8.2, Mark:91.0, Sweep:164.8, FinObj:16.2, FinStr:2.8, FinScr:1.0, FinShp:7.0, DisCod:3.4, DisAnl:122.1, XPCnct:3.5, Destry:0.5, End:10.3, +Chu:0, -Chu:0, Reason:SET_NEW_DOCUMENT

The typical one looks like this, which seems reasonable:
GC(T+30949.5) Type:Comp, Total:55.4, Wait:0.1, Mark:23.5, Sweep:18.5, FinObj:5.8, FinStr:2.9, FinScr:0.1, FinShp:1.3, DisCod:0.5, DisAnl:0.2, XPCnct:3.1, Destry:0.0, End:16.0, +Chu:0, -Chu:0, Reason:API
There is a bug open, IIRC, about Wait.
Trying to find it...
Er, I remembered wrong.
There is this https://bugzilla.mozilla.org/show_bug.cgi?id=714770#c1
(which is even stranger)
I see Wait times 0.x and once 1.x
Attached patch patch (obsolete) — Splinter Review
Um, I thought I had attached this patch to some bug, but I can't find it.
And this is not in my waiting-for-review queue.
Assignee: nobody → bugs
Attachment #592515 - Flags: review?(wmccloskey)
Attachment #592515 - Flags: review?(continuation)
The patch applies on top of Bug 721543
Andrew, do you know whether the long Wait and DisAnl times are caused by this patch, or were they happening before you started using it? Doing fewer global GCs seems like it could cause longer DisAnl times, but I don't see why Wait times should increase.
Here's another weird one just now:

GC(T+98261.9) Type:Comp, Total:463.9, Wait:92.6, Mark:26.0, Sweep:283.6, FinObj:17.1, FinStr:1.5, FinScr:0.1, FinShp:1.6, DisCod:1.3, DisAnl:251.8, XPCnct:4.5, Destry:0.0, End:65.6, +Chu:0, -Chu:0, Reason:API

I haven't really noticed anything like that before, but on the other hand, it is really only something that stands out once you most GCs are < 100ms.  Also, I've only been running the MemChaser addon for a few weeks, and that really helps you notice these spikey collections.  And there, it only really stands out once you have smaug's patches, and both GC and CC are usually < 100ms.  When it goes above 100ms, it shows up as red in the addon.  I really haven't noticed any pauses from these apparently longer GCs, I just see the MemChaser reading, then check out the log.

And see ones like this one just now (long End):

GC(T+98707.9) Type:Comp, Total:277.7, Wait:0.1, Mark:28.3, Sweep:66.7, FinObj:6.7, FinStr:46.5, FinScr:0.1, FinShp:1.6, DisCod:1.1, DisAnl:0.3, XPCnct:5.2, Destry:0.0, End:187.0, +Chu:0, -Chu:0, Reason:API

I can try out a regular Nightly and see how that goes.
Looking at my telemetry, the average GC time is 180ms.  There's a peak around 48ms, then another smaller one around 268ms, with a fairly long tail.
Also, with this patch, the GC_REASON is API 90%+ of the time.
My facebook compartment had 90megs of object-elements, that went away when I pressed the GC button in about:memory.  Then it built up another 30megs that went away on its own.  Seems like a lot of junk, but then this is Facebook so who knows.
Comment on attachment 592515 [details] [diff] [review]
patch

Review of attachment 592515 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, Olli. These heuristics make sense, and we really need something like this. However, I still have a lot of reservations about checking this in so close to the merge. I'll talk to dmandelin about it on Monday.

I am a little worried about the situation where the browser goes idle for a long time. It would be nice if we could guarantee that the last GC was a full GC. Could we maybe have a timer that schedules a global GC, set for 30 seconds or so, and we keep resetting it to 30 seconds every time we do a compartment GC? That way, as long as we're running compartment GCs, it never fires. But once we stop doing compartment GCs, we would be guaranteed to get a full GC later on.

::: dom/base/nsJSEnvironment.cpp
@@ +173,5 @@
>  static PRUint32 sPreviousSuspectedCount = 0;
>  
>  static bool sPreviousCollectionWasGC = false;
> +static PRUint32 sCompartmentGCCount = 0;
> +static PRUint32 sGlobalGCGeneration = 0;

Instead of Generation, could you use Epoch here (and in the member variable as well)? Otherwise people might think this has something to do with generational GC.

@@ +3173,5 @@
> +
> +    // If there has been a global GC after previous
> +    // script evaluation, reset mEvaluationCount to 0,
> +    // and mark the current GC generation.
> +    if (mGlobalGCGeneration != sGlobalGCGeneration) {

I think all this generation logic would be cleaner if you consolidated it into three inline functions: one to read an nsJSContext's evaluation count (returning 0 if the generation doesn't match), one to increment its count, and one to reset all the counts to zero. This code could go at the top of nsJSEnvironment.cpp.

@@ +3280,5 @@
>  
> +  if (aGckind != nsGCShrinking && !aGlobal &&
> +      sTopEvaluator &&
> +      sTopEvaluator->mGlobalGCGeneration == sGlobalGCGeneration &&
> +      !sContextDeleted && sCompartmentGCCount < 10) {

Could you put a comment somewhere explaining when we do a compartment GC? The logic is currently split between here and ScriptEvaluated. I think the comment should go here. Also, do I understand right that only every other compartment GC is allowed to collect a chrome compartment? Is this working around the problem that we can't GC multiple compartments at once?

@@ +3287,5 @@
> +    top->mEvaluationCount = 0;
> +    JSContext* cx = top->GetNativeContext();
> +    if (cx) {
> +      JSObject* global = top->GetNativeGlobal();
> +      if (global) {

Is it possible for GetNativeGlobal to return NULL?

@@ +3289,5 @@
> +    if (cx) {
> +      JSObject* global = top->GetNativeGlobal();
> +      if (global) {
> +        JSCompartment* comp = js::GetObjectCompartment(global);
> +        if (comp) {

It's not possible for this to be NULL.

@@ +3302,3 @@
>    if (nsContentUtils::XPConnect()) {
> +    if (sTopEvaluator) {
> +      sTopEvaluator->mEvaluationCount = 0;

Is this line needed? Won't a global GC cause all the evaluation counts to be invalidated anyway?

@@ +3632,5 @@
> +  if (!comp) {
> +    sCompartmentGCCount = 0;
> +    if (!++sGlobalGCGeneration) {
> +      ++sGlobalGCGeneration;
> +    }

What's this for? Some sort of overflow check? It doesn't seem like a big deal if sGlobalGCGeneration overflows to 0: some newly-created contexts will have their generation equal to 0, but they will also have an evaluation count of 0, which is correct. So why is the extra increment needed?
Comment on attachment 592515 [details] [diff] [review]
patch

I'm not sure I know what is going on here well enough to evaluate it.  I'll try taking another look tomorrow.
Attachment #592515 - Flags: review?(continuation)
(In reply to Bill McCloskey (:billm) from comment #20)

> I am a little worried about the situation where the browser goes idle for a
> long time. It would be nice if we could guarantee that the last GC was a
> full GC. Could we maybe have a timer that schedules a global GC, set for 30
> seconds or so, and we keep resetting it to 30 seconds every time we do a
> compartment GC? That way, as long as we're running compartment GCs, it never
> fires. But once we stop doing compartment GCs, we would be guaranteed to get
> a full GC later on.
I can add that.



> 
> ::: dom/base/nsJSEnvironment.cpp
> @@ +173,5 @@
> >  static PRUint32 sPreviousSuspectedCount = 0;
> >  
> >  static bool sPreviousCollectionWasGC = false;
> > +static PRUint32 sCompartmentGCCount = 0;
> > +static PRUint32 sGlobalGCGeneration = 0;
> 
> Instead of Generation, could you use Epoch here (and in the member variable
> as well)? Otherwise people might think this has something to do with
> generational GC.
sGlobalGCEpoch? ok.



> 
> @@ +3173,5 @@
> > +
> > +    // If there has been a global GC after previous
> > +    // script evaluation, reset mEvaluationCount to 0,
> > +    // and mark the current GC generation.
> > +    if (mGlobalGCGeneration != sGlobalGCGeneration) {
> 
> I think all this generation logic would be cleaner if you consolidated it
> into three inline functions: one to read an nsJSContext's evaluation count
> (returning 0 if the generation doesn't match), one to increment its count,
> and one to reset all the counts to zero. This code could go at the top of
> nsJSEnvironment.cpp.
Ok


> 
> @@ +3280,5 @@
> >  
> > +  if (aGckind != nsGCShrinking && !aGlobal &&
> > +      sTopEvaluator &&
> > +      sTopEvaluator->mGlobalGCGeneration == sGlobalGCGeneration &&
> > +      !sContextDeleted && sCompartmentGCCount < 10) {
> 
> Could you put a comment somewhere explaining when we do a compartment GC?
> The logic is currently split between here and ScriptEvaluated. I think the
> comment should go here. Also, do I understand right that only every other
> compartment GC is allowed to collect a chrome compartment?
Yes

> Is this working
> around the problem that we can't GC multiple compartments at once?
It is working around the problem that otherwise chrome would be GCed all the time.
We have so much chrome js which runs pretty much whenever user does something in the browser, so
chrome js gets evaluated way more often than content js.

> @@ +3287,5 @@
> > +    top->mEvaluationCount = 0;
> > +    JSContext* cx = top->GetNativeContext();
> > +    if (cx) {
> > +      JSObject* global = top->GetNativeGlobal();
> > +      if (global) {
> 
> Is it possible for GetNativeGlobal to return NULL?
I don't know. Adding a null check is safe.

> 
> @@ +3289,5 @@
> > +    if (cx) {
> > +      JSObject* global = top->GetNativeGlobal();
> > +      if (global) {
> > +        JSCompartment* comp = js::GetObjectCompartment(global);
> > +        if (comp) {
> 
> It's not possible for this to be NULL.
Documentation doesn't tell that but ok.
And sorry, I'm a Gecko developer and in Gecko Get* methods can in general
return null.

> 
> @@ +3302,3 @@
> >    if (nsContentUtils::XPConnect()) {
> > +    if (sTopEvaluator) {
> > +      sTopEvaluator->mEvaluationCount = 0;
> 
> Is this line needed? Won't a global GC cause all the evaluation counts to be
> invalidated anyway?
leftover from a previous version.

> 
> @@ +3632,5 @@
> > +  if (!comp) {
> > +    sCompartmentGCCount = 0;
> > +    if (!++sGlobalGCGeneration) {
> > +      ++sGlobalGCGeneration;
> > +    }
> 
> What's this for? Some sort of overflow check?
Yes.

> It doesn't seem like a big
> deal if sGlobalGCGeneration overflows to 0: some newly-created contexts will
> have their generation equal to 0, but they will also have an evaluation
> count of 0, which is correct. So why is the extra increment needed?
Not strictly needed, but to have similar behavior what sCCGeneration has.
Attached patch patch (obsolete) — Splinter Review
Attachment #586517 - Attachment is obsolete: true
Attachment #592515 - Attachment is obsolete: true
Attachment #592515 - Flags: review?(wmccloskey)
Attachment #592657 - Flags: review?(wmccloskey)
Attachment #592657 - Flags: review?(continuation)
Attached patch a bit nicer patch (obsolete) — Splinter Review
But I need to actually test this.
Attached patch patch (obsolete) — Splinter Review
Attachment #592657 - Attachment is obsolete: true
Attachment #592694 - Attachment is obsolete: true
Attachment #592657 - Flags: review?(wmccloskey)
Attachment #592657 - Flags: review?(continuation)
Attachment #592702 - Flags: review?(wmccloskey)
Attachment #592702 - Flags: review?(continuation)
I thought about this some more, and I talked to Andrew, dmandelin, and jst. I think we should wait until the next release to do this.
Ok, but let's get this landed early in the FF13 cycle.

I realized I didn't add the 30s timer. I'll add that today.
Attached patch patch (obsolete) — Splinter Review
60s full gc timer after last compartment GC.
And if after that full gc there is significant amount suspected objects, 
CC will be triggered.
Attachment #592702 - Attachment is obsolete: true
Attachment #592702 - Flags: review?(wmccloskey)
Attachment #592702 - Flags: review?(continuation)
Attachment #593037 - Flags: review?(wmccloskey)
Attachment #593037 - Flags: review?(continuation)
Attached patch patch (obsolete) — Splinter Review
Attachment #593037 - Attachment is obsolete: true
Attachment #593037 - Flags: review?(wmccloskey)
Attachment #593037 - Flags: review?(continuation)
Attachment #593038 - Flags: review?(wmccloskey)
Attachment #593038 - Flags: review?(continuation)
Attached patch patchSplinter Review
Just a minor change to keep shutdown handling the way it is now;
don't call GC if we can't access XPConnect.
Attachment #593038 - Attachment is obsolete: true
Attachment #593038 - Flags: review?(wmccloskey)
Attachment #593038 - Flags: review?(continuation)
Attachment #593474 - Flags: review?(wmccloskey)
Attachment #593474 - Flags: review?(continuation)
Comment on attachment 593474 [details] [diff] [review]
patch

Review of attachment 593474 [details] [diff] [review]:
-----------------------------------------------------------------

Please use the API in bug 721933 for specifying the reason for a compartment GC. Otherwise, it looks fine.
Attachment #593474 - Flags: review?(wmccloskey) → review+
OS: Linux → All
Hardware: x86_64 → All
Comment on attachment 593474 [details] [diff] [review]
patch

Review of attachment 593474 [details] [diff] [review]:
-----------------------------------------------------------------

Is making a compartmental GC, rather than global, the default a good idea?  You missed two places GarbageCollectNow is called in ContentChild that I would guess should stay global, not silently become compartmental.  It may also be surprising to any future callers that the default is compartmental.

Using an enum like nsGCGlobal and nsGCCompartmental might be clearer than an arbitrary boolean, but if you want to leave it as is, that's okay.

It seems a little odd to me that you have to add a new boolean argument for compartmental vs. not, instead of adding another GC type.  Can you have a compartmental shrinking GC or a compartmental incremental GC slice?  The latter makes sense, but maybe not the former.  Perhaps more a question for Bill.

::: dom/base/nsJSEnvironment.cpp
@@ +3314,5 @@
> +                                             NS_FULL_GC_DELAY,
> +                                             nsITimer::TYPE_ONE_SHOT);
> +        }
> +        JSCompartment* comp = js::GetObjectCompartment(global);
> +        JS_CompartmentGC(cx, comp);        

As Bill said, you should put the reason in here now that that works.

::: dom/base/nsJSEnvironment.h
@@ +232,5 @@
> +    aCx->mEvaluationCount = 0;
> +  }
> +  
> +  static void DOMGCFinishedCallback(JSRuntime* aRt, JSCompartment* aComp,
> +                                    const char* aStatus);

Why did you need to make this callback public?
Attachment #593474 - Flags: review?(continuation) → review+
You should fix the callers on ContentChild or change the default so they are okay.
(In reply to Andrew McCreight [:mccr8] from comment #32)
> > +  static void DOMGCFinishedCallback(JSRuntime* aRt, JSCompartment* aComp,
> > +                                    const char* aStatus);
> 
> Why did you need to make this callback public?
Because the method needs to be accessed outside nsJSContext
(In reply to Olli Pettay [:smaug] from comment #34)
> > Why did you need to make this callback public?
> Because the method needs to be accessed outside nsJSContext
The patch doesn't add any additional uses of it as far as I can see (the JS_SetGCFinishedCallback).  Is it just that it should have been made public from the start?  I'm just curious.
nsJSRuntime::Init() needs to be able to access DOMGCFinishedCallback, and
DOMGCFinishedCallback need to be able to access nsJSContext::sGlobalGCEpoch.
I'll remove the default parameters. They are errorprone
(In reply to Olli Pettay [:smaug] from comment #36)
> nsJSRuntime::Init() needs to be able to access DOMGCFinishedCallback, and
> DOMGCFinishedCallback need to be able to access nsJSContext::sGlobalGCEpoch.

Ah, I see now!  Thanks for the explanation.

(In reply to Olli Pettay [:smaug] from comment #37)
> I'll remove the default parameters. They are errorprone

Great.
Attached patch patchSplinter Review
I'll upload interdiff in a second.
Attachment #593936 - Flags: review?(continuation)
Attached patch interdiffSplinter Review
Comment on attachment 593936 [details] [diff] [review]
patch

Review of attachment 593936 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.
Attachment #593936 - Flags: review?(continuation) → review+
https://hg.mozilla.org/mozilla-central/rev/c18523b51058
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Depends on: 723782
No longer depends on: 723782
Target Milestone: --- → mozilla13
Version: 12 Branch → Trunk
Since this merged to inbound, Win Debug reftest has timed-out on about 90% of test runs. I've backed it out there to see if this resolves the orangeness. (If not, apologies in advance!)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Hmm, just found that sgautherie filed bug 723907 on this. If only tbpl and bugzilla were more responsive...
Depends on: 723907
Olli, I can look into the shutdown issue.
I'm pushing some test patches to tryserver.
I have vague feeling that the problem might have something to do with the FullGCTimer, which was
added very lately, since I didn't see the problem earlier with patches which didn't have that.
I think it is more than just ref tests that are being slowed. Cww was experiencing really awful GC times on a Nightly with this patch in it.  25% or more of his GCs were > 1 second.  This is on OSX.

Here's one of them:

GC(T+219862.8) Type:Glob, Total:5606.0, Wait:0.2, Mark:290.4, Sweep:844.9, FinObj:112.5, FinStr:36.9, FinScr:2.3, FinShp:16.9, DisCod:56.6, DisAnl:574.2, XPCnct:9.7, Destry:0.1, End:4479.3, +Chu:0, -Chu:0, Reason:FULL_GC_TIMER

Note that most of the time is spent in "End", and that it is a full GC triggered by the timer.  That's just after being open for about a day.

He usually has a lot of tabs open.  His CC times are pretty good, though, so your earlier patches helped a lot with that. :)
The main thing that happens in the end phase is calling the JSGC_END callback, which triggers deferred release in XPConnect. So it sounds like it might be doing a ton of NS_RELEASE calls. Also, we've seen problems before that suggest that calling free() is really slow on MacOS, so that might be involved as well. To confirm this, we'd need to be able to reproduce the problem and time the JSGC_END callback specifically.

It's possible that there's just a lot of stuff that's not getting GCed until the global GC. I'm not sure why that would be, though. It might be helpful if you can get a more complete list of the GCs that happened on Cww's machine before the long one. Then we could see how recent the previous global GC was.
Ah, sorry, I should have gotten that info before I asked him to restarted with a newer version.  I saw 2 or 3 of the FULL_GC_TIMER GCs and they were all pretty slow.  Then there were a few compartmental GCs in between.  Probably not more than 4 or 5?  But there were many more CCs in between, so it could have been a long time in between the GCs.
If there are a lot of FULL_GC_TIMER GCs, that means that the GC isn't being triggered very often. That might account for all the deferred release calls.

It used to be that a lot of our GCs were triggered by the CC. Now that the CC is run less, we'll also have fewer GCs.
Does this happen during regular browsing? 

I see similar GC pauses if I don't interact with the browser over night for example. In such use-cases they seem fine to me.
We should just make sure that we don't allocate additional memory and allocate out of our fragmented heap.
(In reply to Andrew McCreight [:mccr8] from comment #50)
> Ah, sorry, I should have gotten that info before I asked him to restarted
> with a newer version.  I saw 2 or 3 of the FULL_GC_TIMER GCs and they were
> all pretty slow.  Then there were a few compartmental GCs in between. 
> Probably not more than 4 or 5?  But there were many more CCs in between, so
> it could have been a long time in between the GCs.
Interesting. that is very different to what I get. Usually comp GC runs quite often, and
global GC is also triggered, so I basically never get FULL_GC_TIMER GC.
I wonder if just doing full GC a bit more often would be ok (for now).
Like every 5th GC could be full (not every 10th like with the patch) and then
decrease full gc timer to 30s from 60s.
I'll test and push some patches to try.
I uploaded this to tryserver to see how it behaves.
There can be at most 5 explicit compartment GCs in a row, and
20s after the last explicit compartment gc there is a global gc.
I wonder if bug 716142 would help here.
So I wonder if we could call scheduleGC() on each JSContext a script is evaluated.
I'll try still some changes where global gc would be called more often.
...and once the incremental GC and bug 716142 are ready, we could re-evaluate.
But let's see how the changes work. Pushing to try soon.
Attached patch v8 (obsolete) — Splinter Review
I'm still waiting for tryserver results, but in cases like reftest this
should give pretty much the same behavior what m-c has now.
But in cases like FotN, compartment GC can be used still quite often.

I'll upload an interdiff in a minute.
Attachment #595076 - Flags: review?(continuation)
Attached patch v9Splinter Review
This is a tiny bit cleaner
Attachment #595076 - Attachment is obsolete: true
Attachment #595076 - Flags: review?(continuation)
Attachment #595081 - Flags: review?(continuation)
Attached patch interdiffSplinter Review
this shows the changes to the patch which was pushed to (and backed out from)
m-c
Blocks: 698919
No longer blocks: 705582
Sorry for the delay here.

So the change here is
1) always do a full GC at startup
2) do a full GC every 30 seconds instead of 60 seconds
3) make sure the context GC is a full gc

Earlier you were talking about making it be a global every 5th compartmental, instead of 10th, but you don't seem to have that in the latest version.  Was that intentional?  I guess if the GC is running every 5-10 seconds the global timer will hit you first.

Looks good to me, but I do worry a bit about the recent memory spike we saw around the original landing of the patch.  Maybe you should try out something like Gregor's MemBuster test and see what memory usage looks like 30 seconds after closing all tabs.  You should also coordinate with Bill, as he's looking into a GC issue, so I'm not sure how this may interfere with that.
Comment on attachment 595081 [details] [diff] [review]
v9

Review of attachment 595081 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the interdiff.
Attachment #595081 - Flags: review?(continuation) → review+
(In reply to Andrew McCreight [:mccr8] from comment #63)
> Sorry for the delay here.
> 
> So the change here is
> 1) always do a full GC at startup
> 2) do a full GC every 30 seconds instead of 60 seconds
> 3) make sure the context GC is a full gc
Yes

> 
> Earlier you were talking about making it be a global every 5th
> compartmental, instead of 10th, but you don't seem to have that in the
> latest version.  Was that intentional?
We end up calling full GC anyway very often.


> Looks good to me, but I do worry a bit about the recent memory spike we saw
> around the original landing of the patch.  Maybe you should try out
> something like Gregor's MemBuster test
What is that?
(In reply to Olli Pettay [:smaug] from comment #65)
> > Looks good to me, but I do worry a bit about the recent memory spike we saw
> > around the original landing of the patch.  Maybe you should try out
> > something like Gregor's MemBuster test
> What is that?
I guess it isn't actually Gregor's.  This site: http://random.pavlov.net/membuster/index.html

It opens up a ton of sites and closes them.  Try that out with this patch, see what memory looks like in about:memory.  Or some other kind of benchmark you can compare with and without this patch that involved loading and closing a bunch of sites.
(In reply to Andrew McCreight [:mccr8] from comment #63)
> Looks good to me, but I do worry a bit about the recent memory spike we saw
> around the original landing of the patch.  Maybe you should try out
> something like Gregor's MemBuster test and see what memory usage looks like
> 30 seconds after closing all tabs.
Memusage was pretty much the same with or without the patch.
Whiteboard: [snappy] → [Snappy:P2]
noming for k9o as per recommendation by billm.
blocking-kilimanjaro: --- → ?
Attached patch patch (obsolete) — Splinter Review
Bill, does this affect badly to iGC?
The patch is otherwise the same as earlier, but some static variable handling
happens in GarbageCollectNow since DOMGCSliceCallback is quite different from
the earlier callback.

https://tbpl.mozilla.org/?tree=Try&rev=b831e9e65b54
Attachment #619957 - Flags: feedback?(wmccloskey)
Though, I think iGC should be working before trying this.
I was kinda hoping you could use the new multi-compartment GC stuff here. That would avoid the need for tracking the top evaluator. Instead, we would just GC every compartment that had run code.

The APIs for this are in jsfriendapi. First you call PrepareCompartmentForGC on every compartment you want to collect. This won't actually do any collection, so it can be done from  the ScriptEvaluated hook. Then when you actually want to collect, call IncrementalGC with the runtime and the reason.
(In reply to Bill McCloskey (:billm) from comment #71)
> I was kinda hoping you could use the new multi-compartment GC stuff here.
I can try it.
Attached patch patchSplinter Review
https://tbpl.mozilla.org/?tree=Try&rev=509d2295dba9
Attachment #619957 - Attachment is obsolete: true
Attachment #619957 - Flags: feedback?(wmccloskey)
Attachment #620010 - Flags: review?(wmccloskey)
Comment on attachment 620010 [details] [diff] [review]
patch

This looks leaky. Waiting for some more tryserver results.
Attachment #620010 - Flags: review?(wmccloskey)
It looks like the try push had some assertions caused by GCing without any compartments scheduled for GC. This patch makes it possible to check if any compartments are scheduled.

I updated Olli's patch to call this and pushed the result to try:
https://tbpl.mozilla.org/?tree=Try&rev=01d57bb0e7a4
blocking-kilimanjaro: ? → +
Comment on attachment 620459 [details] [diff] [review]
check if a GC is scheduled

We need this to determine whether it's worth doing a compartment GC.
Attachment #620459 - Flags: review?(igor)
Attached patch Olli's patchSplinter Review
This is a slight modification of Olli's patch to fix an assertion. I'll review it soon.
Attachment #621167 - Flags: review?(wmccloskey)
Comment on attachment 620459 [details] [diff] [review]
check if a GC is scheduled

Giving to Terrence to expedite this.
Attachment #620459 - Flags: review?(igor) → review?(terrence)
Comment on attachment 620459 [details] [diff] [review]
check if a GC is scheduled

Review of attachment 620459 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me.
Attachment #620459 - Flags: review?(terrence) → review+
Comment on attachment 621167 [details] [diff] [review]
Olli's patch

Review of attachment 621167 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsJSEnvironment.cpp
@@ +2963,5 @@
> +                                         reinterpret_cast<void *>(reason),
> +                                         NS_FULL_GC_DELAY,
> +                                         nsITimer::TYPE_ONE_SHOT);
> +    }
> +    if (js::IsGCScheduled(nsJSRuntime::sRuntime))

I just realized I forgot to fix the bracing here.

@@ +3584,5 @@
>    sPendingLoadCount = 0;
>    sLoadingInProgress = false;
>    sCCollectedWaitingForGC = 0;
>    sPostGCEventsToConsole = false;
> +  sDisableExplicitCompartmentGC = false;

I guess as long as you're initializing this one, it would make sense to initialize the other static vars.

::: dom/base/nsJSEnvironment.h
@@ +185,5 @@
>    static void LoadEnd();
>  
> +  static void GarbageCollectNow(js::gcreason::Reason reason,
> +                                PRUint32 aGckind,
> +                                bool aGlobal);

Could you make an enum for aGlobal?
Attachment #621167 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #80)
> @@ +3584,5 @@
> >    sPendingLoadCount = 0;
> >    sLoadingInProgress = false;
> >    sCCollectedWaitingForGC = 0;
> >    sPostGCEventsToConsole = false;
> > +  sDisableExplicitCompartmentGC = false;
> 
> I guess as long as you're initializing this one, it would make sense to
> initialize the other static vars.
I do initialize the others, just a few lines below.


> Could you make an enum for aGlobal?
Hmm, so many enums and similar: readon, kind, different-type-of-kind...
For now I think bool should work just fine.
Attached patch patchSplinter Review
https://hg.mozilla.org/mozilla-central/rev/ec3c29434dee
https://hg.mozilla.org/mozilla-central/rev/39daf6dc21ef
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
This may have caused dromaeo crashes on Win-opt.
Unfortunately no stack traces.

Backed out
https://hg.mozilla.org/mozilla-central/rev/79f78105c451
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 752244
This is a little patch to unschedule a compartment for GC.
Oops. Anyway, the previous patch works the way we talked about. First it schedules all compartments for GC. Then it de-schedules any compartments that have been inactive. I didn't have much time, and I added a really gross linked list of nsJSContexts. We can fix this up.

I also just realized that we should probably check if any compartments are left to GC before calling js::IncrementalGC. But that should be easy.

This is totally untested. I just wanted to get it posted before the end of the day.
Attached patch v20Splinter Review
This seems to pass all the tests.
Because of the PrepareForFullGC/SkipCompartmentForGC thing we end up collecting 
quite a few more compartments than in the previous versions, so I increased
NS_FULL_GC_DELAY and NS_MAX_COMPARTMENT_GC_COUNT.
Attachment #622383 - Flags: review?(wmccloskey)
Attachment #622256 - Flags: review?(terrence) → review+
Attachment #622383 - Flags: review?(wmccloskey) → review+
https://hg.mozilla.org/mozilla-central/rev/e6529138e338

Let's see if it sticks this time.
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Olli, could you obsolete some of the older patches that you didn't land?  It is hard to tell what is what.  Thanks.
Should the target milestone be updated to mozilla15 now?
Target Milestone: mozilla13 → mozilla15
No longer blocks: 698919
Blocks: 698919
blocking-basecamp: --- → ?
We'd ship Basecamp without fixing this, minusing.
blocking-basecamp: ? → -
This was disabled in Firefox 15 by bug 782825.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.