Experiment with memory-pressure-based GC scheduler

RESOLVED WONTFIX

Status

()

Core
JavaScript Engine
RESOLVED WONTFIX
9 years ago
7 years ago

People

(Reporter: dmandelin, Assigned: gwagner)

Tracking

(Blocks: 1 bug)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 14 obsolete attachments)

40.50 KB, patch
Details | Diff | Splinter Review
(Reporter)

Description

9 years ago
The starting point here is that we know that the current JS GC scheduler is not particularly performant, and we'd like to try some new ideas. I'll start with one idea to narrow this down to a sane level.

First, consider a copying generational GC with a nursery (which is of course not what we do). This design is used when a high fraction of objects created are short-lived. They can then be collected early, and at low cost, by doing a 'minor' collection on the nursery. In the basic design of this kind of GC, a 'minor' collection is done exactly when the nursery is full. Thus, if the nursery size is S, a minor collection is done when S bytes have been allocated since the last GC, and the main scheduling parameter is the size of the nursery.

Now, let's follow the noble tradition of pretending a solution to one problem will work on a related but different problem. For SpiderMonkey, this would mean doing a collection once S bytes have been allocated since the previous collection. Here, bytes allocated are ideally computed thus:

- For NewGCThing(), the number of bytes allocated (8).

- For JS_malloc(),
     if the pointer gets stored in a transitively GC'd object, the size
     otherwise 0

- For JS_realloc(),
     if the pointer gets stored in a transitively GC'd object, (newsize-oldsize)
     otherwise 0

Trying this out requires creating some new allocation APIs and changing old uses:

  JS_malloc   -- for stuff that won't go under GC control
  JS_mallocGC -- for stuff that will go under GC control
  JS_realloc
  JS_reallocDouble -- stuff under GC control, when caller knows he is doubling the size
  JS_reallocFromTo -- caller gives old bytes as explicit argument
  JS_reallocWeirdness -- we might need some more ones to cover other cases

The experiment is to set up this memory accounting and GC scheduling design, and then run some tests. The parameter S will vary from test to test. For each test, max heap size, GC pause time distribution, and overall GC overhead (as a fraction of total JS time) should be measured. The same measurements should be taken for the current scheduler. Picking a workload is hard but we can start with SunSpider, v8, and bug 504640.
"Double" is confusing in light of jsdouble, and realloc is the wrong API (too prone to leak on OOM, overflow of size_t, etc.) But I do like "Weirdness" ;-)

Main point here is to suggest different name scheme, probably using Grow and Shrink to separate direction by method.

/be
But I'm not bikeshedding -- the usability of realloc is poor, whatever its name. We want different grow&shrink APIs, also split as Dave says by growth policy. The names are secondary but of course if it no longer looks like realloc, it should not be named anything that matches *realloc*.

/be

Updated

9 years ago
Blocks: 505933

Updated

9 years ago
Assignee: general → gal

Comment 3

9 years ago
Created attachment 390688 [details] [diff] [review]
patch

The attached patch implements an experimental actual-RSS size-based GC heuristics. We sample the RSS size after every GC, and trigger a new GC once we allocate 20% more pages than after the last GC. Delayed freeing during GC is accounted for.

Allocating JSGCThings is no longer tracked, and is no longer part of the heuristics. As far as I can tell triggering a GC based on JSGCThings was mostly a kludge to trigger a GC based on the DOM memory contribution per JSGCThing which we can't measure direct. Since we measure total process memory use now, thats no longer necessary. A JS program is welcome to use the entire (fairly limited) 32MB JS heap and we GC last ditch if we have to, if overall (malloc) memory pressure didn't trigger a GC already.

The patch results in about 10% better average memory use on the very memory happy http://people.mozilla.org/~jmuizelaar/world-map-fn.html benchmark.

Comment 4

9 years ago
Created attachment 390692 [details] [diff] [review]
patch

Cleaned up the patch a bit. malloc_size() support for win32 and linux. Getting the RSS for win and linux still missing. I am not sure whether we currently inhibit GC during startup the same way we did before. If not, I would like to add a simple N s delay using js_IntervalNow().
Attachment #390688 - Attachment is obsolete: true

Comment 5

9 years ago
Created attachment 390751 [details] [diff] [review]
patch, still doesn't build on windows and linux, but passes try server (with some leaks? probably not mine) on macosx
Attachment #390692 - Attachment is obsolete: true

Comment 6

9 years ago
Could some windows and linux fanboys help out with the patch? The windows code is in place but doesn't compile. For linux I think we have to sscanf /proc/self/stat, which is borderline funny. Maybe we can mmap(/proc/self/stat), and do a memory sscanf?
I think you want to just seek and read; proc files are pretty quick to access like that.

Comment 8

9 years ago
js_GetRSS() sits in MaybeGC which the browser calls a bunch during startup, so I want to make sure its fast Otoh we probably want to return a fake number for js_GetRSS() anyway to avoid GC during startup. I guess we will figure it out by measuring once the patch compiles everywhere.
(Reporter)

Comment 9

9 years ago
I can fix the Windows part for you but the latest (3rd) patch doesn't apply to tip. Please rebase or tell me your parent rev.

Comment 10

9 years ago
Created attachment 390896 [details] [diff] [review]
patch

Untested linux support.
Attachment #390751 - Attachment is obsolete: true

Comment 11

9 years ago
Created attachment 390902 [details] [diff] [review]
patch with working linux support (tested in shell only)
Attachment #390896 - Attachment is obsolete: true
(Reporter)

Comment 12

9 years ago
Created attachment 390908 [details] [diff] [review]
Patch with untested windows support
(Reporter)

Comment 13

9 years ago
Created attachment 390909 [details] [diff] [review]
Patch with untested windows support, v2

Previous version didn't incorporate the latest linux additions.
Attachment #390908 - Attachment is obsolete: true
(Reporter)

Updated

9 years ago
Attachment #390902 - Attachment is obsolete: true
(Reporter)

Comment 14

9 years ago
Updated patch seems to work well on Windows. It did fine on the image evolution page, which the old GC (before it was specifically fixed for that problem) failed on.

The more I think about it, the more I like the idea of *not* doing extra memory accounting outside of the allocator.
(Reporter)

Comment 15

9 years ago
Updating title to reflect patch content. We can refile if we end up wanting to revive the original idea.
Summary: Experiment with total-bytes-based GC scheduler → Experiment with memory-pressure-based GC scheduler

Comment 16

9 years ago
Created attachment 391041 [details] [diff] [review]
latest and greatest

This patch combined with the parallel sweeping is really improving our behavior on the box2d examples (pretty dramatic improvement in comparison to 3.5.1).

http://box2d-js.sourceforge.net/index2.html
Attachment #390909 - Attachment is obsolete: true

Updated

9 years ago
Attachment #391041 - Flags: review?(igor)

Comment 17

9 years ago
Comment on attachment 391041 [details] [diff] [review]
latest and greatest

Still have to sort out the exact balancing of the heuristics and WinCE support. I will have igor take a look in the meantime.

igor: any suggestions what to do with trigger_factor? do we want to try to somehow calculate that value backwards into something useful in the world?

Comment 18

9 years ago
... in the new world?

Comment 19

9 years ago
We pass talos, startup and tp_rss look good. mochitests seem to die on Linux and WIN32:

Linux

86709 INFO TEST-PASS | /tests/toolkit/content/tests/widgets/test_panelfrommenu.xul | focus blurred on panel from toolbarbutton open
86710 INFO TEST-PASS | /tests/toolkit/content/tests/widgets/test_panelfrommenu.xul | focus not modified on cursor down from toolbarbutton
86712 INFO Running /tests/toolkit/content/tests/widgets/test_popup_attribute.xul...

command timed out: 300 seconds without output, killing pid 24364
process killed by signal 9
program finished with exit code -1
elapsedTime=2159.091762

MacOSX had an error but that looks spurious:

TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/places/tests/perf/browser_ui_history_sidebar.js | Timed out

Win32 had an error too:

1900 INFO Running chrome://mochikit/content/chrome/dom/tests/mochitest/chrome/test_focus.xul...
1901 INFO TEST-PASS | chrome://mochikit/content/chrome/dom/tests/mochitest/chrome/test_focus.xul | activeWindow
1902 INFO TEST-PASS | chrome://mochikit/content/chrome/dom/tests/mochitest/chrome/test_focus.xul |  child document hasFocus
1903 ERROR TEST-UNEXPECTED-FAIL | chrome://mochikit/content/chrome/dom/tests/mochitest/chrome/test_focus.xul | [SimpleTest/SimpleTest.js, window.onerror] An error occurred: [ fm.focusedElement is null ]

command timed out: 300 seconds without output
program finished with exit code 1
elapsedTime=364.031000

Comment 20

9 years ago
(In reply to comment #17)
> igor: any suggestions what to do with trigger_factor? do we want to try to
> somehow calculate that value backwards into something useful in the world?

I suggest to remove this and any other no-longer used consts from the API, fine-tune the patch for best performance with the browser and add new API like setting the growth factor for RSS or similar. Embeddings as far as I remember rarely do any tuning of GC parameters besides calling MaybeGC. If by default MaybeGC and friends will do the right job for the browser, then the embedders should not have any troubles with code-porting.

Comment 21

9 years ago
(In reply to comment #14)
> The more I think about it, the more I like the idea of *not* doing extra memory
> accounting outside of the allocator.

IMO ideally we should have universal GC-allocator for things of arbitrary sizes and do accounting based on big chunks of mmap-allocated memory. But with the current mixture of malloc and GC-based allocations some extra accounting could be necessary. Otherwise it is hard to avoid situations when the GC thinks that its memory is over-used while having mostly-freed malloced memory or vise-verse.

Updated

9 years ago
Blocks: 506315

Comment 22

9 years ago
(In reply to comment #16)
> Created an attachment (id=391041) [details]
> latest and greatest

Does it passes tests in single-threaded builds?

Comment 23

9 years ago
#22: I have a mochitest failure. Debugging that.

Comment 24

9 years ago
Created attachment 392041 [details] [diff] [review]
patch
Attachment #391041 - Attachment is obsolete: true
Attachment #391041 - Flags: review?(igor)

Comment 25

9 years ago
For the life of me I can't reproduce the mochitest failure tinderbox is giving me. I tried mac and linux (in a vm). Will try windows in a vm now.

Comment 26

9 years ago
Ok, I try-servered this again:

linux talos try01 passed. linux talos try02 crashed. wtf?

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1249105825.37.1249108480.14413.gz
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1249105825.37.1249107101.32056.gz

OSX unit tests has a bunch of leaks but I am going to claim thats some unrelated crap in TM tip.

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1249102976.1249112136.21358.gz

Patch passed talos on 2 WINNT columns. On try02 we got a ts_shutdown of 584.95. On try04 ts_shutdown was 296.75. Aroo?

WINNT unit tests also show a leak:

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1249102976.1249111240.11614.gz

Vista talos passed too.

I think I definitively slow down ts with this patch. I will submit one that avoids GC in MaybeGC until startup is over.

Comment 27

9 years ago
Created attachment 392064 [details] [diff] [review]
don't let MaybeGC trigger a GC during startup (first 10s)

This patch adopts the same cheesy startup 10s timer idea we use in the DOM elsewhere to suppress forced GCs (via JS_GC) during startup. MaybeGC is supposed to be called when the engine is idle. We really shouldn't be called it during startup (we call it some 30 times right now). I will defer to Johnny to oversee those calls to be removed. Until then this patch works around it.
Attachment #392041 - Attachment is obsolete: true

Comment 28

9 years ago
This seems to have fixed the startup time issue. I am still seeing all sorts of random test failures, but nothing that could reasonably be related to my patch.

Comment 29

9 years ago
Created attachment 392157 [details] [diff] [review]
ready for review
Attachment #392064 - Attachment is obsolete: true

Updated

9 years ago
Attachment #392157 - Flags: review?(igor)

Updated

9 years ago
Attachment #392157 - Flags: review?(pavlov)

Comment 30

9 years ago
With the patch the monster 3d demo peaks at about 10mb less memory use than 3.5.2 (160mb vs 170mb for 3.5.2). An empty google page started up is 60mb vs 59mb (3.5.2). All numbers for x86 MACOSX.

http://deanm.github.com/pre3d/monster.html

Mobile might want to look at more aggressive GC heuristics. Now that we have an idea of the working set size, on mobile we could constantly GC really hard if we get near the device limit (beats running out of memory).

Updated

9 years ago
Blocks: 508140

Comment 31

9 years ago
Comment on attachment 392157 [details] [diff] [review]
ready for review

>@@ -2553,41 +2488,39 @@ JS_IsAboutToBeFinalized(JSContext *cx, v
> JS_PUBLIC_API(void)
> JS_SetGCParameter(JSRuntime *rt, JSGCParamKey key, uint32 value)
> {
>     switch (key) {
>       case JSGC_MAX_BYTES:
>         rt->gcMaxBytes = value;
>         break;
>       case JSGC_MAX_MALLOC_BYTES:
>-        rt->gcMaxMallocBytes = value;
>         break;
>       case JSGC_STACKPOOL_LIFESPAN:
>         rt->gcEmptyArenaPoolLifespan = value;
>         break;
>       default:
>         JS_ASSERT(key == JSGC_TRIGGER_FACTOR);
>         JS_ASSERT(value >= 100);
>-        rt->setGCTriggerFactor(value);
>         return;

We should remove JSGC_MAX_MALLOC_BYTES and JSGC_TRIGGER_FACTOR. The latter is not present in any public release of SM and If some embedders would complain about JSGC_MAX_MALLOC_BYTES, we can consider restoring this particular parameter. 

>diff --git a/js/src/jsgc.cpp b/js/src/jsgc.cpp
>-    doGC = IsGCThresholdReached(rt);
>+    doGC = rt->gcIsNeeded;
>     for (;;) {
>         if (doGC
> #ifdef JS_TRACER
>             && !JS_ON_TRACE(cx) && !JS_TRACE_MONITOR(cx).useReservedObjects
> #endif
>             ) {
>             /*
>              * Keep rt->gcLock across the call into js_GC so we don't starve
>@@ -1856,23 +1799,21 @@ NewGCThing(JSContext *cx, uintN flags)
>         if (thing) {
>             arenaList->freeList = thing->next;
>             flagp = thing->flagp;
>             JS_ASSERT(*flagp & GCF_FINAL);
> 
> #ifdef JS_THREADSAFE
>             /*
>              * Refill the local free list by taking several things from the
>-             * global free list unless the free list is already populated or
>-             * we are still at rt->gcMaxMallocBytes barrier. The former is
>+             * global free list unless the free list is already populated.
>+             * we are still at rt->gcMaxMallocBytes barrier. This is
>              * caused via allocating new things in gcCallback(cx, 

The comments changes looks semi-finished.
Attachment #392157 - Flags: review?(igor) → review+

Updated

9 years ago
Blocks: 505308

Updated

9 years ago
Duplicate of this bug: 506315

Updated

9 years ago
Duplicate of this bug: 505933

Comment 34

9 years ago
http://hg.mozilla.org/tracemonkey/rev/9fbf8217bd47
Whiteboard: fixed-in-tracemonkey

Comment 35

9 years ago
Created attachment 392592 [details] [diff] [review]
follow-up patch to remove unused pref and gc tuning parameters from DOM code
Attachment #392592 - Flags: review?(igor)

Updated

9 years ago
Attachment #392592 - Flags: review?(igor) → review+

Comment 37

9 years ago
Increased ceiling to 125% (from 118.5%), trying to identify tjss regression cause.

Comment 39

9 years ago
We are hitting a bunch of OOM crashes on talos nochrome. MaybeGC is not called or not called sufficiently I assume. Brendan suggest calling MaybeGC from the engine as well instead of relying on the embedding.

Comment 40

9 years ago
Backed out. I will look at the tjss issue first.

http://hg.mozilla.org/tracemonkey/rev/809e7d5f5b78

Updated

9 years ago
Depends on: 508128

Comment 41

9 years ago
had to back this out again, unit tests failing.

Updated

9 years ago
Whiteboard: fixed-in-tracemonkey

Comment 42

9 years ago
Created attachment 394957 [details] [diff] [review]
make xpcshell call MaybeGC once a second via the operation callback
Attachment #392157 - Attachment is obsolete: true
Attachment #392592 - Attachment is obsolete: true
Attachment #392157 - Flags: review?(pavlov)

Updated

9 years ago
Attachment #394957 - Flags: review?(jwalden+bmo)

Updated

9 years ago
Attachment #394957 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 394957 [details] [diff] [review]
make xpcshell call MaybeGC once a second via the operation callback

>+    // Trigger a GC if the working set grew by more than 32MB and at least 25%
>+    if (rss > lastRSS + 32*1024*1024 && rss > (lastRSS + lastRSS/4))
>+        JS_GC(cx);

I think we've generally stuck to C-style comments outside of jstracer.cpp; please do that here.

Please put spaces around the *, and nix the parens around |lastRSS + lastRSS / 4|.


>+static const char*
>+skipFields(const char* p, unsigned n)
>+{
>+    while (*++p)
>+        if ((*p == ' ') && (--n == 0))
>+            return p+1;
>+    return NULL;
>+}

Brace the body of the loop.


>+static int statfd = 0;

Is there any difference if this is placed inside |js_GetRSS|?  I'd prefer it there if there isn't.


>+size_t
>+js_GetRSS()
>+{
>+    char buf[1024];
>+    if (!statfd) {
>+        sprintf(buf, "/proc/%d/stat", getpid());

Paranoia, but use snprintf here with some generous value like 100, and overall size like 128 (avoid page and cache faults more often, 1024 seems more likely to occasionally unlucky).


>+    if (statfd == -1)
>+        return 0;

< 0, please, equality looks like it could go haywire if you got unlucky in the right ways.


>+    lseek(statfd, 0, SEEK_SET);
>+    int n = read(statfd, buf, sizeof(buf)-1);

|if (n < 0) return 0|, looks like you'd smash the stack if you don't do this!


>+    buf[n] = 0;
>+    const char* p = strrchr(buf, ')') + 2;
>+    if (!p)
>+        return 0;

Um...I don't think !p is ever going to be the case if you add 2 here!


Few enough changes to r+, but only with the above fixed!  :-)

Updated

9 years ago
Duplicate of this bug: 508128
No longer depends on: 508128
No longer blocks: 505933
No longer blocks: 506315

Comment 45

9 years ago
static int foo inside a function is different from outside according to luke. Fixed everything else.
How is POD static in a function different from out? It ain't in C.

/be

Updated

9 years ago
Whiteboard: fixed-in-tracemonkey

Comment 48

9 years ago
Backed out due to build failures.
Whiteboard: fixed-in-tracemonkey

Comment 50

9 years ago
Looks like try-server is back amongst the living so doing another run of that ...

Comment 51

9 years ago
I think I have the GC experts in this bug. So, excuse me putting this post here.

Is it possible that the GC is not triggered if you have a Javascript and you don't touch FF?

We have clear proof that Google pages eat memory when you keep FF idle. See bug 510854. However, when you touch the UI, all the memory is released. But without it can take as much memory as you want (although I did not wait longer after 30Mb of eaten memory). If you want to try by yourself, open the google book page in the bug, in 10 tabs. You can also do it with the Google main page, but then you have to wait for 4 minutes.

So, it looks that this GC related. Has anyone suggestion? It doesn't reproduce when I copy the page locally.

Lucas

Comment 52

9 years ago
With the patch in this bug we GC every time the overall process memory use grows by 25% (and at least 32MB). After the GC we sample the heap size and then use 125% of that as new water mark. So what you describe shouldn't happen with the attached patch. Without it, anything is possible. The old heuristics is rather strange and hard to predict.

Comment 53

9 years ago
So, then bug 510854 is a dupe of this one? 

Can you test the google book page with your patch? I think I gave you a free test case :-)

Comment 54

9 years ago
I am pretty busy landing and debugging patches for a while. I don't think I will get around to testing this particular behavior but feel free to give it a spin.

Comment 55

9 years ago
This is known to regress tjss. Landing so we can get some data.

http://hg.mozilla.org/tracemonkey/rev/9b6b17a275ec
Whiteboard: fixed-in-tracemonkey

Comment 56

9 years ago
malloc_usable_size is not a portable function.
It's not available on Solaris.

Should I re-implement js_*alloc to get around?
e.g. for js_malloc(), do
{
size_t* iaddr = malloc(bytes+sizeof(size_t)); 
if (iaddr == nsnull)
  return nsnull;
iaddr[0] = bytes;
return (&iaddr[1]);
}

Is there another way to get around?

Comment 57

9 years ago
Thats pretty unfortunate. We should double check that we really funnel all allocations through js_malloc and friends.

Comment 58

9 years ago
I think it may waste a lot of memory, if js_malloc is commonly used for small allocations.
Is it possible to disable building memory-pressure-based GC?

Comment 59

9 years ago
We will back out the patch in the morning due to failures (it was meant to cycle overnight to collect some data). We should discuss our options in the bug. The problem isn't so much the memory pressure measurement. Its accounting for the background deallocation. If we disable that selectively, malloc_size is not needed.
We should autoconf-test for malloc_usable_size, not use a rat's nest of OS-specific ifdefs.

/be

Updated

9 years ago
Whiteboard: fixed-in-tracemonkey
How's this going? How soon are we to getting this something which will fix the excessive memory usage of xpcshell on trunk? We're seeing a lot of random mochitests timeout on Tinderbox, and I think this is a major cause.

Updated

9 years ago
Blocks: 516458

Comment 62

9 years ago
Created attachment 400932 [details] [diff] [review]
refreshed version

Refreshed and fixed compiling errors for OS X 10.5.2
Attachment #394957 - Attachment is obsolete: true

Comment 63

9 years ago
gregor++. Thanks for the refresh.

Comment 64

9 years ago
Created attachment 401038 [details] [diff] [review]
refresh

Another fix for linux build. 
Tryserver shows that CreateTimerQueueTimer is not supported by all windows versions (WinCE, WinMo)
Attachment #400932 - Attachment is obsolete: true

Updated

9 years ago
No longer blocks: 508140
(Reporter)

Updated

8 years ago
Blocks: 505004

Updated

8 years ago
Assignee: gal → anygregor
Is this bug still relevant?  Can I WONTFIX it?
Bug 664291 is also related.
(Reporter)

Comment 67

7 years ago
(In reply to comment #65)
> Is this bug still relevant?  Can I WONTFIX it?

We do still want to work on the general concept, but it looks like this particular bug hasn't been moving it a while. So let's WONTFIX it, and create a new bug if we decide to try again, or reopen if someone does want to revive this patch.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → WONTFIX
Bug 655455 is also similar, in terms of making the JS GC better about responding to memory pressure, though I think the focus there is going to be more on keeping the heap size under control.
You need to log in before you can comment on or make changes to this bug.