If incremental GC or compartment GC runs all the time, CC never runs

VERIFIED FIXED in Firefox 13

Status

()

Core
DOM
VERIFIED FIXED
5 years ago
5 years ago

People

(Reporter: smaug, Assigned: billm)

Tracking

12 Branch
mozilla13
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox13+ fixed)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

5 years ago
We should not kill CC timer all the time.
(Reporter)

Comment 1

5 years ago
https://bugzilla.mozilla.org/show_bug.cgi?id=730753 is an example for this problem
Blocks: 730753
(Reporter)

Updated

5 years ago
Blocks: 641025
tracking-firefox13: --- → ?
One thing we could do is to not reset the counter when we kill the CC timer, so we'll eventually make progress and fire off a CC.  That may not help all of the time, but it should reduce the problem.  When looking at the behavior of the timer callback in bug 728460, I've noticed a number of times where a ton of things (20k+) get dumped into the purple buffer, then the timer counts up to 10, gets reset, counts up to 10, etc. a few times until we actually run the CC.  I assume this is due to the GC, but I can't be sure.
We could also track how many times we've been reset, and somehow get increasingly aggressive about what value we set the counter to.  If we are only rarely getting interrupted, it is probably best to just wait a bit, but if we're in a situation where the GC is running full bore the CC should get pushier.
One thing we may be able to do is to track how many deferred releases we do.  If there aren't any, we don't need to trigger a CC.  Maybe there are always some, but maybe not.
Whiteboard: [MemShrink]
Assignee: nobody → continuation
Whiteboard: [MemShrink] → [MemShrink:P2]
status-firefox13: --- → affected
tracking-firefox13: ? → +
Bill has an idea how to fix this.  Seems pretty simple.
Assignee: continuation → wmccloskey
Blocks: 731437
Blocks: 712704
(Assignee)

Comment 6

5 years ago
Created attachment 602503 [details] [diff] [review]
patch

I think that the main effect of this patch will be to remove the KilCCTimer calls from the GC callback. However, I also simplified the code, since I don't have any explanation for the questions in bug 712704.
Attachment #602503 - Flags: review?(bugs)
Would it make sense move the nsJSContext::PokeShrinkGCBuffers() to an else branch after the initial aDesc.isCompartment check?
(Reporter)

Comment 8

5 years ago
Comment on attachment 602503 [details] [diff] [review]
patch

Do we really want _every_ compartment GC to trigger CC?
(Assignee)

Comment 9

5 years ago
(In reply to Olli Pettay [:smaug] from comment #8)
> Do we really want _every_ compartment GC to trigger CC?

Honestly, I don't know.

I do know that it won't hurt our benchmark scores, since the CCs will only run when we've returned to the event loop, after the benchmark has finished.

Also, as we move more in the direction of more compartment GCs, I think it's a bad idea to treat compartment GCs differently from full GCs.

It's possible that there will be animation-heavy pages that need to do a lot of compartment GCs, but where CCs are not necessary. However, any CCs we do will be faster than normal, since they'll only look at JS objects from the compartment being collected. So maybe that wouldn't be so bad.

Comment 10

5 years ago
>I do know that it won't hurt our benchmark scores,
That's like saying cheating won't hurt your test scores, hardly something to look at in a positive light.
No, it's like saying that partying after the test won't hurt your test scores.

Comment 12

5 years ago
Well,after some thought, it's more like refusing to have a test that used to cover one topic now cover two, won't hurt test scores. The question is, are there tests covering both topics and should there be one test to rule them both?
(Reporter)

Comment 13

5 years ago
Comment on attachment 602503 [details] [diff] [review]
patch

Also, is it possible that we end up having sCCLockedOut true almost all the
time so that CCTimerFired returns always early and never ends up
calling CC.

Could we have some way to say "don't run iGC while CC timer is active".

I think we want to have something like
run iGC (and prevent CC)-> run CC (and prevent iGC) -> run iGC (and prevent CC)->...
"run CC (and prevent iGC)" might not run after compartment GC if we
just call MaybePokeCC(), but don't set sNeedsFullCC.

Non-incremental GCs should be possible (but hopefully very very rare) during CC steps.

If during CC we try to do iGC, we could start iGC right after CC has run.
(Reporter)

Comment 14

5 years ago
Comment on attachment 602503 [details] [diff] [review]
patch

I don't see how this guarantees that CC actually runs.
sCCLockedOut may be true all the time.
Attachment #602503 - Flags: review?(bugs) → review-
Bill, is there any guarantee that we'll not run the GC for at least, say, half a second after it finishes? That should be enough to let the CC make progress.
(Assignee)

Comment 16

5 years ago
Created attachment 603809 [details] [diff] [review]
patch v2

If we only consider timer-based GCs, then they'll always be at least 4 seconds apart, and so the CC will always have time to run. So the ones we're worried about are JS engine-induced incremental GCs. In some corner cases, we probably could get a lot of these.

I tried taking Olli's suggestion, but it turned out to be somewhat difficult to implement. It required that the JS engine ask the browser whether it was allowed to run an incremental GC. Also, implementing the behavior "don't schedule a GC until the current CC is done" required a lot of extra state variables.

I think the best solution would be to use a single timer for scheduling the GC and the CC. However, that turned out to be a big patch also. I'd like to fix this problem soon, so I've put together a more conservative patch that directly addresses Olli's concern. If the CC has been locked out for too long, it just finishes the current incremental GC non-incrementally and then skips to the second-to-last timer step (which will do one ForgetSkippable step and then later a CC).
Attachment #602503 - Attachment is obsolete: true
Attachment #603809 - Flags: review?(bugs)
(Reporter)

Comment 17

5 years ago
(In reply to Bill McCloskey (:billm) from comment #16)
> If we only consider timer-based GCs, then they'll always be at least 4
> seconds apart, and so the CC will always have time to run. So the ones we're
> worried about are JS engine-induced incremental GCs. In some corner cases,
> we probably could get a lot of these.
That is the case with bug 730753, I think.
(I can't reproduce the problem, where CC doesn't run at all)


> I think the best solution would be to use a single timer for scheduling the
> GC and the CC. However, that turned out to be a big patch also. I'd like to
> fix this problem soon, so I've put together a more conservative patch that
> directly addresses Olli's concern. If the CC has been locked out for too
> long, it just finishes the current incremental GC non-incrementally and then
> skips to the second-to-last timer step (which will do one ForgetSkippable
> step and then later a CC).
Sounds good, for now. Looking at the patch...
(Reporter)

Comment 18

5 years ago
Comment on attachment 603809 [details] [diff] [review]
patch v2

What if iGC is happens quite fast.
sCCLockedOut is set true, CCTimerFired is called, 
PR_Now() - sCCLockedOutTimer < NS_MAX_CC_LOCKEDOUT_TIME is true, 
sCCLockedOut is set false, new iGC starts and sCCLockedOut is true, 
CCTimerFired is called, but it again returns early.
Attachment #603809 - Flags: review?(bugs) → review-
(Assignee)

Comment 19

5 years ago
Created attachment 603853 [details] [diff] [review]
patch v3

You're right, that seems like a problem. Unfortunately I haven't been able to hit a situation where this actually happens, but I think it should be fixed now.
Attachment #603809 - Attachment is obsolete: true
Attachment #603853 - Flags: review?(bugs)
(Reporter)

Comment 20

5 years ago
Comment on attachment 603853 [details] [diff] [review]
patch v3

>diff --git a/dom/base/nsJSEnvironment.cpp b/dom/base/nsJSEnvironment.cpp
>--- a/dom/base/nsJSEnvironment.cpp
>+++ b/dom/base/nsJSEnvironment.cpp
>@@ -142,31 +142,34 @@ static PRLogModuleInfo* gJSDiagnostics;
> // and doing the actual CC.
> #define NS_CC_DELAY                 6000 // ms
> 
> #define NS_CC_SKIPPABLE_DELAY       400 // ms
> 
> // Force a CC after this long if there's anything in the purple buffer.
> #define NS_CC_FORCED                (2 * 60 * PR_USEC_PER_SEC) // 2 min
> 
>+// Don't allow an incremental GC to lock out the CC for too long.
>+#define NS_MAX_CC_LOCKEDOUT_TIME    (5 * PR_USEC_PER_SEC) // 5 seconds
This is quite short time. But we can change this later.

>-static bool sGCHasRun;
> static bool sCCLockedOut;
>+static PRTime sCCLockedOutTimer;
This is not timer but the start time of the lock,
the first time the lock is checked, so
maybe sCCLockedOutTime
Attachment #603853 - Flags: review?(continuation)
Attachment #603853 - Flags: review?(bugs)
Attachment #603853 - Flags: review+
Comment on attachment 603853 [details] [diff] [review]
patch v3

Review of attachment 603853 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, looks good!

::: dom/base/nsJSEnvironment.cpp
@@ +3327,5 @@
> +
> +  if (sCCLockedOut) {
> +    PRTime now = PR_Now();
> +    if (sCCLockedOutTimer == 0) {
> +      sCCLockedOutTimer = now;

You could just return here. Might be a little clearer.

@@ +3542,5 @@
>        cs->LogStringMessage(msg.get());
>      }
>    }
>  
> +  // Prevent cycle collections and shrinking during incremental GC.

"shrinking GC" instead of "shrinking"?

@@ +3568,3 @@
>        // If this is a compartment GC, restart it. We still want
>        // a full GC to happen. Compartment GCs usually happen as a
>        // result of last-ditch or MaybeGC. In both cases its

While you are here, "its" should be "it's" or just "it is".
Attachment #603853 - Flags: review?(continuation) → review+
(Assignee)

Comment 22

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/24e233ee2677
Target Milestone: --- → mozilla13
https://hg.mozilla.org/mozilla-central/rev/24e233ee2677
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Blocks: 728871

Updated

5 years ago
status-firefox13: affected → ---

Updated

5 years ago
status-firefox13: --- → fixed
Is there a test QA can use to verify this fix?
Whiteboard: [MemShrink:P2] → [MemShrink:P2][qa?]
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #24)
> Is there a test QA can use to verify this fix?

Taking silence as a 'no'. Please mark as [qa+] if there is something QA can do to verify this fix.
Whiteboard: [MemShrink:P2][qa?] → [MemShrink:P2][qa-]

Comment 26

5 years ago
Verified as fixed on Firefox Aurora(20120609042006), Nightly(20120611030526), and on the latest Firefox 14.0 beta (20120605113340 - I enabled IGC manually on this one before testing).
Status: RESOLVED → VERIFIED
Whiteboard: [MemShrink:P2][qa-] → [MemShrink:P2]
You need to log in before you can comment on or make changes to this bug.