Closed Bug 506125 Opened 15 years ago Closed 13 years ago

Experiment with memory-pressure-based GC scheduler


(Core :: JavaScript Engine, defect)

Not set





(Reporter: dmandelin, Assigned: gwagner)




(1 file, 14 obsolete files)

40.50 KB, patch
Details | Diff | Splinter Review
The starting point here is that we know that the current JS GC scheduler is not particularly performant, and we'd like to try some new ideas. I'll start with one idea to narrow this down to a sane level.

First, consider a copying generational GC with a nursery (which is of course not what we do). This design is used when a high fraction of objects created are short-lived. They can then be collected early, and at low cost, by doing a 'minor' collection on the nursery. In the basic design of this kind of GC, a 'minor' collection is done exactly when the nursery is full. Thus, if the nursery size is S, a minor collection is done when S bytes have been allocated since the last GC, and the main scheduling parameter is the size of the nursery.

Now, let's follow the noble tradition of pretending a solution to one problem will work on a related but different problem. For SpiderMonkey, this would mean doing a collection once S bytes have been allocated since the previous collection. Here, bytes allocated are ideally computed thus:

- For NewGCThing(), the number of bytes allocated (8).

- For JS_malloc(),
     if the pointer gets stored in a transitively GC'd object, the size
     otherwise 0

- For JS_realloc(),
     if the pointer gets stored in a transitively GC'd object, (newsize-oldsize)
     otherwise 0

Trying this out requires creating some new allocation APIs and changing old uses:

  JS_malloc   -- for stuff that won't go under GC control
  JS_mallocGC -- for stuff that will go under GC control
  JS_reallocDouble -- stuff under GC control, when caller knows he is doubling the size
  JS_reallocFromTo -- caller gives old bytes as explicit argument
  JS_reallocWeirdness -- we might need some more ones to cover other cases

The experiment is to set up this memory accounting and GC scheduling design, and then run some tests. The parameter S will vary from test to test. For each test, max heap size, GC pause time distribution, and overall GC overhead (as a fraction of total JS time) should be measured. The same measurements should be taken for the current scheduler. Picking a workload is hard but we can start with SunSpider, v8, and bug 504640.
"Double" is confusing in light of jsdouble, and realloc is the wrong API (too prone to leak on OOM, overflow of size_t, etc.) But I do like "Weirdness" ;-)

Main point here is to suggest different name scheme, probably using Grow and Shrink to separate direction by method.

But I'm not bikeshedding -- the usability of realloc is poor, whatever its name. We want different grow&shrink APIs, also split as Dave says by growth policy. The names are secondary but of course if it no longer looks like realloc, it should not be named anything that matches *realloc*.

Blocks: 505933
Assignee: general → gal
Attached patch patch (obsolete) — Splinter Review
The attached patch implements an experimental actual-RSS size-based GC heuristics. We sample the RSS size after every GC, and trigger a new GC once we allocate 20% more pages than after the last GC. Delayed freeing during GC is accounted for.

Allocating JSGCThings is no longer tracked, and is no longer part of the heuristics. As far as I can tell triggering a GC based on JSGCThings was mostly a kludge to trigger a GC based on the DOM memory contribution per JSGCThing which we can't measure direct. Since we measure total process memory use now, thats no longer necessary. A JS program is welcome to use the entire (fairly limited) 32MB JS heap and we GC last ditch if we have to, if overall (malloc) memory pressure didn't trigger a GC already.

The patch results in about 10% better average memory use on the very memory happy benchmark.
Attached patch patch (obsolete) — Splinter Review
Cleaned up the patch a bit. malloc_size() support for win32 and linux. Getting the RSS for win and linux still missing. I am not sure whether we currently inhibit GC during startup the same way we did before. If not, I would like to add a simple N s delay using js_IntervalNow().
Attachment #390688 - Attachment is obsolete: true
Could some windows and linux fanboys help out with the patch? The windows code is in place but doesn't compile. For linux I think we have to sscanf /proc/self/stat, which is borderline funny. Maybe we can mmap(/proc/self/stat), and do a memory sscanf?
I think you want to just seek and read; proc files are pretty quick to access like that.
js_GetRSS() sits in MaybeGC which the browser calls a bunch during startup, so I want to make sure its fast Otoh we probably want to return a fake number for js_GetRSS() anyway to avoid GC during startup. I guess we will figure it out by measuring once the patch compiles everywhere.
I can fix the Windows part for you but the latest (3rd) patch doesn't apply to tip. Please rebase or tell me your parent rev.
Attached patch patch (obsolete) — Splinter Review
Untested linux support.
Attachment #390751 - Attachment is obsolete: true
Attachment #390896 - Attachment is obsolete: true
Previous version didn't incorporate the latest linux additions.
Attachment #390908 - Attachment is obsolete: true
Attachment #390902 - Attachment is obsolete: true
Updated patch seems to work well on Windows. It did fine on the image evolution page, which the old GC (before it was specifically fixed for that problem) failed on.

The more I think about it, the more I like the idea of *not* doing extra memory accounting outside of the allocator.
Updating title to reflect patch content. We can refile if we end up wanting to revive the original idea.
Summary: Experiment with total-bytes-based GC scheduler → Experiment with memory-pressure-based GC scheduler
Attached patch latest and greatest (obsolete) — Splinter Review
This patch combined with the parallel sweeping is really improving our behavior on the box2d examples (pretty dramatic improvement in comparison to 3.5.1).
Attachment #390909 - Attachment is obsolete: true
Attachment #391041 - Flags: review?(igor)
Comment on attachment 391041 [details] [diff] [review]
latest and greatest

Still have to sort out the exact balancing of the heuristics and WinCE support. I will have igor take a look in the meantime.

igor: any suggestions what to do with trigger_factor? do we want to try to somehow calculate that value backwards into something useful in the world?
... in the new world?
We pass talos, startup and tp_rss look good. mochitests seem to die on Linux and WIN32:


86709 INFO TEST-PASS | /tests/toolkit/content/tests/widgets/test_panelfrommenu.xul | focus blurred on panel from toolbarbutton open
86710 INFO TEST-PASS | /tests/toolkit/content/tests/widgets/test_panelfrommenu.xul | focus not modified on cursor down from toolbarbutton
86712 INFO Running /tests/toolkit/content/tests/widgets/test_popup_attribute.xul...

command timed out: 300 seconds without output, killing pid 24364
process killed by signal 9
program finished with exit code -1

MacOSX had an error but that looks spurious:

TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/places/tests/perf/browser_ui_history_sidebar.js | Timed out

Win32 had an error too:

1900 INFO Running chrome://mochikit/content/chrome/dom/tests/mochitest/chrome/test_focus.xul...
1901 INFO TEST-PASS | chrome://mochikit/content/chrome/dom/tests/mochitest/chrome/test_focus.xul | activeWindow
1902 INFO TEST-PASS | chrome://mochikit/content/chrome/dom/tests/mochitest/chrome/test_focus.xul |  child document hasFocus
1903 ERROR TEST-UNEXPECTED-FAIL | chrome://mochikit/content/chrome/dom/tests/mochitest/chrome/test_focus.xul | [SimpleTest/SimpleTest.js, window.onerror] An error occurred: [ fm.focusedElement is null ]

command timed out: 300 seconds without output
program finished with exit code 1
(In reply to comment #17)
> igor: any suggestions what to do with trigger_factor? do we want to try to
> somehow calculate that value backwards into something useful in the world?

I suggest to remove this and any other no-longer used consts from the API, fine-tune the patch for best performance with the browser and add new API like setting the growth factor for RSS or similar. Embeddings as far as I remember rarely do any tuning of GC parameters besides calling MaybeGC. If by default MaybeGC and friends will do the right job for the browser, then the embedders should not have any troubles with code-porting.
(In reply to comment #14)
> The more I think about it, the more I like the idea of *not* doing extra memory
> accounting outside of the allocator.

IMO ideally we should have universal GC-allocator for things of arbitrary sizes and do accounting based on big chunks of mmap-allocated memory. But with the current mixture of malloc and GC-based allocations some extra accounting could be necessary. Otherwise it is hard to avoid situations when the GC thinks that its memory is over-used while having mostly-freed malloced memory or vise-verse.
Blocks: 506315
(In reply to comment #16)
> Created an attachment (id=391041) [details]
> latest and greatest

Does it passes tests in single-threaded builds?
#22: I have a mochitest failure. Debugging that.
Attached patch patch (obsolete) — Splinter Review
Attachment #391041 - Attachment is obsolete: true
Attachment #391041 - Flags: review?(igor)
For the life of me I can't reproduce the mochitest failure tinderbox is giving me. I tried mac and linux (in a vm). Will try windows in a vm now.
Ok, I try-servered this again:

linux talos try01 passed. linux talos try02 crashed. wtf?

OSX unit tests has a bunch of leaks but I am going to claim thats some unrelated **** in TM tip.

Patch passed talos on 2 WINNT columns. On try02 we got a ts_shutdown of 584.95. On try04 ts_shutdown was 296.75. Aroo?

WINNT unit tests also show a leak:

Vista talos passed too.

I think I definitively slow down ts with this patch. I will submit one that avoids GC in MaybeGC until startup is over.
This patch adopts the same cheesy startup 10s timer idea we use in the DOM elsewhere to suppress forced GCs (via JS_GC) during startup. MaybeGC is supposed to be called when the engine is idle. We really shouldn't be called it during startup (we call it some 30 times right now). I will defer to Johnny to oversee those calls to be removed. Until then this patch works around it.
Attachment #392041 - Attachment is obsolete: true
This seems to have fixed the startup time issue. I am still seeing all sorts of random test failures, but nothing that could reasonably be related to my patch.
Attached patch ready for review (obsolete) — Splinter Review
Attachment #392064 - Attachment is obsolete: true
Attachment #392157 - Flags: review?(igor)
Attachment #392157 - Flags: review?(pavlov)
With the patch the monster 3d demo peaks at about 10mb less memory use than 3.5.2 (160mb vs 170mb for 3.5.2). An empty google page started up is 60mb vs 59mb (3.5.2). All numbers for x86 MACOSX.

Mobile might want to look at more aggressive GC heuristics. Now that we have an idea of the working set size, on mobile we could constantly GC really hard if we get near the device limit (beats running out of memory).
Blocks: 508140
Comment on attachment 392157 [details] [diff] [review]
ready for review

>@@ -2553,41 +2488,39 @@ JS_IsAboutToBeFinalized(JSContext *cx, v
> JS_SetGCParameter(JSRuntime *rt, JSGCParamKey key, uint32 value)
> {
>     switch (key) {
>       case JSGC_MAX_BYTES:
>         rt->gcMaxBytes = value;
>         break;
>-        rt->gcMaxMallocBytes = value;
>         break;
>         rt->gcEmptyArenaPoolLifespan = value;
>         break;
>       default:
>         JS_ASSERT(value >= 100);
>-        rt->setGCTriggerFactor(value);
>         return;

We should remove JSGC_MAX_MALLOC_BYTES and JSGC_TRIGGER_FACTOR. The latter is not present in any public release of SM and If some embedders would complain about JSGC_MAX_MALLOC_BYTES, we can consider restoring this particular parameter. 

>diff --git a/js/src/jsgc.cpp b/js/src/jsgc.cpp
>-    doGC = IsGCThresholdReached(rt);
>+    doGC = rt->gcIsNeeded;
>     for (;;) {
>         if (doGC
> #ifdef JS_TRACER
>             && !JS_ON_TRACE(cx) && !JS_TRACE_MONITOR(cx).useReservedObjects
> #endif
>             ) {
>             /*
>              * Keep rt->gcLock across the call into js_GC so we don't starve
>@@ -1856,23 +1799,21 @@ NewGCThing(JSContext *cx, uintN flags)
>         if (thing) {
>             arenaList->freeList = thing->next;
>             flagp = thing->flagp;
>             JS_ASSERT(*flagp & GCF_FINAL);
>             /*
>              * Refill the local free list by taking several things from the
>-             * global free list unless the free list is already populated or
>-             * we are still at rt->gcMaxMallocBytes barrier. The former is
>+             * global free list unless the free list is already populated.
>+             * we are still at rt->gcMaxMallocBytes barrier. This is
>              * caused via allocating new things in gcCallback(cx, 

The comments changes looks semi-finished.
Attachment #392157 - Flags: review?(igor) → review+
Blocks: 505308
Attachment #392592 - Flags: review?(igor) → review+
Increased ceiling to 125% (from 118.5%), trying to identify tjss regression cause.
We are hitting a bunch of OOM crashes on talos nochrome. MaybeGC is not called or not called sufficiently I assume. Brendan suggest calling MaybeGC from the engine as well instead of relying on the embedding.
Backed out. I will look at the tjss issue first.
Depends on: 508128
had to back this out again, unit tests failing.
Whiteboard: fixed-in-tracemonkey
Attachment #392157 - Attachment is obsolete: true
Attachment #392592 - Attachment is obsolete: true
Attachment #392157 - Flags: review?(pavlov)
Attachment #394957 - Flags: review?(jwalden+bmo)
Attachment #394957 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 394957 [details] [diff] [review]
make xpcshell call MaybeGC once a second via the operation callback

>+    // Trigger a GC if the working set grew by more than 32MB and at least 25%
>+    if (rss > lastRSS + 32*1024*1024 && rss > (lastRSS + lastRSS/4))
>+        JS_GC(cx);

I think we've generally stuck to C-style comments outside of jstracer.cpp; please do that here.

Please put spaces around the *, and nix the parens around |lastRSS + lastRSS / 4|.

>+static const char*
>+skipFields(const char* p, unsigned n)
>+    while (*++p)
>+        if ((*p == ' ') && (--n == 0))
>+            return p+1;
>+    return NULL;

Brace the body of the loop.

>+static int statfd = 0;

Is there any difference if this is placed inside |js_GetRSS|?  I'd prefer it there if there isn't.

>+    char buf[1024];
>+    if (!statfd) {
>+        sprintf(buf, "/proc/%d/stat", getpid());

Paranoia, but use snprintf here with some generous value like 100, and overall size like 128 (avoid page and cache faults more often, 1024 seems more likely to occasionally unlucky).

>+    if (statfd == -1)
>+        return 0;

< 0, please, equality looks like it could go haywire if you got unlucky in the right ways.

>+    lseek(statfd, 0, SEEK_SET);
>+    int n = read(statfd, buf, sizeof(buf)-1);

|if (n < 0) return 0|, looks like you'd smash the stack if you don't do this!

>+    buf[n] = 0;
>+    const char* p = strrchr(buf, ')') + 2;
>+    if (!p)
>+        return 0;

Um...I don't think !p is ever going to be the case if you add 2 here!

Few enough changes to r+, but only with the above fixed!  :-)
No longer depends on: 508128
No longer blocks: 505933
No longer blocks: 506315
static int foo inside a function is different from outside according to luke. Fixed everything else.
How is POD static in a function different from out? It ain't in C.

Whiteboard: fixed-in-tracemonkey
Backed out due to build failures.
Whiteboard: fixed-in-tracemonkey
Looks like try-server is back amongst the living so doing another run of that ...
I think I have the GC experts in this bug. So, excuse me putting this post here.

Is it possible that the GC is not triggered if you have a Javascript and you don't touch FF?

We have clear proof that Google pages eat memory when you keep FF idle. See bug 510854. However, when you touch the UI, all the memory is released. But without it can take as much memory as you want (although I did not wait longer after 30Mb of eaten memory). If you want to try by yourself, open the google book page in the bug, in 10 tabs. You can also do it with the Google main page, but then you have to wait for 4 minutes.

So, it looks that this GC related. Has anyone suggestion? It doesn't reproduce when I copy the page locally.

With the patch in this bug we GC every time the overall process memory use grows by 25% (and at least 32MB). After the GC we sample the heap size and then use 125% of that as new water mark. So what you describe shouldn't happen with the attached patch. Without it, anything is possible. The old heuristics is rather strange and hard to predict.
So, then bug 510854 is a dupe of this one? 

Can you test the google book page with your patch? I think I gave you a free test case :-)
I am pretty busy landing and debugging patches for a while. I don't think I will get around to testing this particular behavior but feel free to give it a spin.
This is known to regress tjss. Landing so we can get some data.
Whiteboard: fixed-in-tracemonkey
malloc_usable_size is not a portable function.
It's not available on Solaris.

Should I re-implement js_*alloc to get around?
e.g. for js_malloc(), do
size_t* iaddr = malloc(bytes+sizeof(size_t)); 
if (iaddr == nsnull)
  return nsnull;
iaddr[0] = bytes;
return (&iaddr[1]);

Is there another way to get around?
Thats pretty unfortunate. We should double check that we really funnel all allocations through js_malloc and friends.
I think it may waste a lot of memory, if js_malloc is commonly used for small allocations.
Is it possible to disable building memory-pressure-based GC?
We will back out the patch in the morning due to failures (it was meant to cycle overnight to collect some data). We should discuss our options in the bug. The problem isn't so much the memory pressure measurement. Its accounting for the background deallocation. If we disable that selectively, malloc_size is not needed.
We should autoconf-test for malloc_usable_size, not use a rat's nest of OS-specific ifdefs.

Whiteboard: fixed-in-tracemonkey
How's this going? How soon are we to getting this something which will fix the excessive memory usage of xpcshell on trunk? We're seeing a lot of random mochitests timeout on Tinderbox, and I think this is a major cause.
Blocks: 516458
Attached patch refreshed version (obsolete) — Splinter Review
Refreshed and fixed compiling errors for OS X 10.5.2
Attachment #394957 - Attachment is obsolete: true
gregor++. Thanks for the refresh.
Attached patch refreshSplinter Review
Another fix for linux build. 
Tryserver shows that CreateTimerQueueTimer is not supported by all windows versions (WinCE, WinMo)
Attachment #400932 - Attachment is obsolete: true
No longer blocks: 508140
Blocks: 505004
Assignee: gal → anygregor
Is this bug still relevant?  Can I WONTFIX it?
Bug 664291 is also related.
(In reply to comment #65)
> Is this bug still relevant?  Can I WONTFIX it?

We do still want to work on the general concept, but it looks like this particular bug hasn't been moving it a while. So let's WONTFIX it, and create a new bug if we decide to try again, or reopen if someone does want to revive this patch.
Closed: 13 years ago
Resolution: --- → WONTFIX
Bug 655455 is also similar, in terms of making the JS GC better about responding to memory pressure, though I think the focus there is going to be more on keeping the heap size under control.
You need to log in before you can comment on or make changes to this bug.