Closed Bug 727965 Opened 8 years ago Closed 5 years ago

[CC] change WaitingForGC to be based on the number of JS objects, not all objects

Categories: Core :: XPCOM, defect
Status: RESOLVED FIXED
Target Milestone: mozilla36
People: (Reporter: mccr8, Assigned: mccr8)
References: (Blocks 2 open bugs)
Attachments: (1 file, 2 obsolete files)

We shouldn't trigger the GC based on the CC freeing refcounted nodes.
Depends on: 697115
On top of bug 697115, this is a single line change, so I'm trying it out a bit.

For visited and collected there are two numbers: the first is the number of refcounted nodes, the second is the number of GCed nodes.  Suggestions on how to improve that format are welcome.

Here are some random CCs that are kind of interesting:

CC(T+74.4) duration: 15 ms, visited: 1612 1475, collected: 579 27 (27 waiting for GC), suspected: 520.
ForgetSkippable 27 times before CC, min: 0 ms, max: 5 ms, avg: 0 ms, total: 25 ms, removed: 3219

CC(T+258.5) duration: 44 ms, visited: 4112 3723, collected: 2336 1106 (1106 waiting for GC), suspected: 857.
ForgetSkippable 1 times before CC, min: 4 ms, max: 4 ms, avg: 4 ms, total: 4 ms, removed: 359

CC(T+268.7) duration: 17 ms, visited: 2522 2015, collected: 219 5 (5 waiting for GC), suspected: 712.
ForgetSkippable 10 times before CC, min: 0 ms, max: 3 ms, avg: 0 ms, total: 8 ms, removed: 6474

The threshold is 250, so the first and third CCs there would have triggered GCs without this more fine-grained counting.
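To make the two-number format concrete, here is a minimal sketch comparing the old all-objects trigger with the JS-only one.  The threshold of 250 is from the comments here, but the code itself is illustrative, not the real nsJSEnvironment.cpp; note also that the real counter accumulates across CCs since the last GC, so a single CC below the threshold can still trip it.

```cpp
#include <cstdint>

// Threshold per the discussion above: 250 collected objects since the last GC.
static const uint32_t kCCGCThreshold = 250;

// Old behavior: refcounted and GCed nodes both count toward the trigger.
bool TripsOldThreshold(uint32_t aRefCounted, uint32_t aGCed) {
  return aRefCounted + aGCed > kCCGCThreshold;
}

// New behavior: only nodes the CC believes are GCed JS objects count,
// since freeing refcounted nodes creates no garbage for the GC to reclaim.
bool TripsNewThreshold(uint32_t /*aRefCounted*/, uint32_t aGCed) {
  return aGCed > kCCGCThreshold;
}
```

For the first logged CC (collected: 579 27), the old rule trips (606 > 250) while the new one does not (27 < 250); the second (collected: 2336 1106) trips either way.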
This is probably silly, but we could reset the waiting-for-GC count after a forced GC, before incrementing it with the newly collected objects.
We do that already.
Ah, great, for some reason I thought the reset happened in GarbageCollectNow, not in the callback.  That makes sense...
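As a sketch of the reset placement being discussed (names loosely modeled on nsJSEnvironment.cpp but illustrative, not the actual code): the counter is zeroed from the GC's end callback, so any completed GC, forced or not, clears it before subsequent CC results accumulate.

```cpp
#include <cstdint>

// Freed GCed objects the CC has reported since the last GC finished.
static uint32_t sCCollectedWaitingForGC = 0;

// Runs from the GC end callback, so *every* GC (including forced ones)
// resets the count, not just GCs initiated via GarbageCollectNow.
void OnGCEnd() {
  sCCollectedWaitingForGC = 0;
}

// After each cycle collection, add only the GCed nodes the CC freed.
void NoteCycleCollected(uint32_t aFreedGCed) {
  sCCollectedWaitingForGC += aFreedGCed;
}

bool NeedsGCAfterCC() {
  return sCCollectedWaitingForGC > 250;
}
```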
Attached patch simple patch (obsolete) — Splinter Review
Assignee: nobody → continuation
Just a minor rebasing over the previous patch.  We may also want to lower the threshold for triggering a GC.  But maybe not.
Attachment #598456 - Attachment is obsolete: true
Attachment #599739 - Flags: review?(bugs)
Attachment #599739 - Flags: review?(bugs) → review+
Comment on attachment 599739 [details] [diff] [review]
use num JS objects freed to decide when to trigger GC

Bill, does this sound reasonable to you?  Currently we trigger a GC when the CC has freed a total of 250 objects of any kind since the last GC.  This patch changes it so that it will only be triggered when the CC has freed 250 GCed objects.
Attachment #599739 - Flags: feedback?(wmccloskey)
CC doesn't really free GC objects ;)
Ah, right.  Well, when the CC thinks it has freed GCed objects. ;)
Comment on attachment 599739 [details] [diff] [review]
use num JS objects freed to decide when to trigger GC

This sounds great. As usual, I'm worried that it will cause too few GCs. But I guess we'll just have to wait and see (and keep this bug in mind if anything goes wrong in the future).
Attachment #599739 - Flags: feedback?(wmccloskey) → feedback+
Thanks.  When I tried it out, the GC still seemed to fire fairly frequently anyway, just with reason MAYBE_GC instead of CC_WAITING.  But yeah, you are right, we should keep an eye out.  We can also lower the threshold later, to, say, 100, which would at least avoid the fairly common case where the CC frees a few hundred refcounted things and no JS things.
This patch (or the one from Bug 697115, but that seems less likely) seems to be causing a leak on Linux Moth about half the time:

https://tbpl.mozilla.org/?tree=Try&rev=a74f5df82474

I haven't seen any sign that that leak is a preexisting one, so I guess I'll have to investigate.
I think it isn't a leak per se.  This test installs a listener, then waits for it to be triggered with some condition.  When that happens, the listener is uninstalled.  So I think what is happening is that the listener is never hitting the finish condition, so the test times out.  But then the listener is still installed, so we leak.
What kind of listener?
Oops, sorry, I mean "observer" not "listener".  I guess I should be glad we don't also have "tasters" in our code.

the observer is added to this:

var annosvc = Cc["@mozilla.org/browser/annotation-service;1"].
                getService(Ci.nsIAnnotationService);
This is what a successful run looks like (with the exact same build):

13529 INFO TEST-START | chrome://mochitests/content/chrome/toolkit/components/places/tests/chrome/test_341972a.xul
++DOMWINDOW == 647 (0xc3cf4c8) [serial = 1337] [outer = 0xb737e98]
WARNING: NS_ENSURE_SUCCESS(rv, false) failed with result 0x8000FFFF: file ../../../../content/base/src/nsContentUtils.cpp, line 2630
WARNING: NS_ENSURE_TRUE(pusher.Push(aBoundElement)) failed: file ../../../../content/xbl/src/nsXBLProtoImplMethod.cpp, line 359
13530 INFO TEST-PASS | chrome://mochitests/content/chrome/toolkit/components/places/tests/chrome/test_341972a.xul | livemark site URI changed to http://example.org/ not to value in feed
WARNING: NS_ENSURE_TRUE((aFolderId) >= 1) failed: file ../../../../toolkit/components/places/nsNavBookmarks.cpp, line 1139
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80070057: file ../../../../toolkit/components/places/nsNavBookmarks.cpp, line 2769
13531 INFO TEST-END | chrome://mochitests/content/chrome/toolkit/components/places/tests/chrome/test_341972a.xul | finished in 1635ms

This is what a failing run looks like:
13529 INFO TEST-START | chrome://mochitests/content/chrome/toolkit/components/places/tests/chrome/test_341972a.xul
++DOMWINDOW == 647 (0xbfcd5a8) [serial = 1337] [outer = 0x9ff1dc0]
WARNING: NS_ENSURE_SUCCESS(rv, false) failed with result 0x8000FFFF: file ../../../../content/base/src/nsContentUtils.cpp, line 2630
WARNING: NS_ENSURE_TRUE(pusher.Push(aBoundElement)) failed: file ../../../../content/xbl/src/nsXBLProtoImplMethod.cpp, line 359
13530 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/components/places/tests/chrome/test_341972a.xul | Test timed out.

Which isn't too interesting.  But then later it produces an error like this twice:

************************************************************
* Call to xpconnect wrapped JSObject produced this error:  *
[Exception... "'[JavaScript Error: "gLivemarkId is not defined" {file: "chrome://mochitests/content/chrome/toolkit/components/places/tests/chrome/test_341972a.xul" line: 52}]' when calling method: [nsIAnnotationObserver::onItemAnnotationSet]"  nsresult: "0x80570021 (NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS)"  location: "JS frame :: resource:///components/nsLivemarkService.js :: LM__setAnno :: line 430"  data: yes]
************************************************************

But maybe that's just expected, because we've ended the test?  Though it seems a little odd that the observer wouldn't keep alive all of the variables it refers to.
I repushed on top of an up-to-date inbound and now it fails 100% of the time.  Sweet.

https://tbpl.mozilla.org/?tree=Try&rev=092e30b328a5
Here's a run with incremental GC disabled: https://tbpl.mozilla.org/?tree=Try&rev=d63c9a6167d1

I need to retrigger it a few times, but the first run of Moth passed.
Yeah, out of a total of 4 runs, it didn't hit the error at all, so it must be some kind of interaction with incremental GC.
I was hoping Bill's patch to stop the GC from killing the CC would fix my permaorange, but instead I have a different Moth failure, another timeout. For a bonus, there's also a Linux64 test that is timing out, too.  Both are only timing out in debug builds, though.

https://tbpl.mozilla.org/?tree=Try&rev=2e11992ad14f
Here's another go at this in the hope that bug 750424 fixed the problem: https://tbpl.mozilla.org/?tree=Try&rev=1373adf0c3db
Try run looks fine.  I think I'll still wait a little bit for CPG and the GC crashing to shake out a little more.
No longer blocks: 698919
Blocks: 698919
This is still probably a good idea, though if we free a merged compartment, we probably want to trigger a GC, so I'm not sure how to deal with that.
The GC people pointed out that almost every GC in the browser is triggered by the CC, which is not great, so I'm going to look at tuning these heuristics a bit.  I think this is a good place to start, as the current trigger is based on some obviously bogus data.
There's some trickiness here, because I think we want to track if we "collected" any merged JS zones, and always trigger a GC later if we did.
Attachment #599739 - Attachment is obsolete: true
Blocks: 1083418
Comment on attachment 8507222 [details] [diff] [review]
Trigger CC_WAITING GCs based on number of freed JS objects and zones. WIP

try run: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=27d707c607fd
Attachment #8507222 - Flags: review?(bugs)
Comment on attachment 8507222 [details] [diff] [review]
Trigger CC_WAITING GCs based on number of freed JS objects and zones. WIP

> NeedsGCAfterCC()
> {
>   return sCCollectedWaitingForGC > 250 ||
>+    sCCollectedZonesWaitingForGC > 1 ||
Why 1 and not 0?
With that explained or changed, r+.

(Not that I see merged CCs almost ever.)
Attachment #8507222 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #27)
> Why 1 and not 0?
Oops, yeah that should be 0.

> (Not that I see merged CCs almost ever.)
Yeah, I'm not sure they are working now. :(  I should figure out why, or maybe just remove them.  They matter less with ICC, and they are kind of weird and complicated.
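Putting the review fix together, the zone-aware predicate from the quoted patch would read roughly as follows, with the comparison corrected to > 0 as agreed above.  This is a sketch, not the landed code.

```cpp
#include <cstdint>

static uint32_t sCCollectedWaitingForGC = 0;       // freed GCed objects since last GC
static uint32_t sCCollectedZonesWaitingForGC = 0;  // "collected" merged JS zones

bool NeedsGCAfterCC() {
  // Trigger on enough freed JS objects, or on *any* collected zone
  // (> 0, per the review comment, rather than the WIP's > 1).
  return sCCollectedWaitingForGC > 250 ||
         sCCollectedZonesWaitingForGC > 0;
}
```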
Blocks: 1085447
https://hg.mozilla.org/mozilla-central/rev/81d1dd9f5666
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36