Closed Bug 580803 Opened 11 years ago Closed 9 years ago

TM: de-crazy-ify memory pressure handling

Categories

(Core :: JavaScript Engine, defect)

Other Branch
x86
macOS
defect
Not set
normal

Tracking


RESOLVED WONTFIX

People

(Reporter: gal, Assigned: gal)

References

Details

Attachments

(1 file, 4 obsolete files)

We currently have 4 memory pressure heuristics applied at 6 different points, and 2 of them are partially overridden by the DOM code. The 4 heuristics are spread out over 5 files, which makes it hard to see just how redundant they are. This first patch tries to clean up the mess a bit: it moves all the heuristics into one central place (JS_MaybeGC), makes the DOM use the same set of heuristics, and establishes a single trigger point (also JS_MaybeGC!). This means that the embedding must call JS_MaybeGC or JS_GC to make GCs happen. To be more explicit, this eliminates last-ditch GCs. The allocator path will try to grow the GC heap and the malloc heap until we get a hard error (malloc or mmap failure). If we run into the soft quotas (GC malloc bytes or GC heap bytes limits), the next call to JS_MaybeGC will cause a GC and hopefully we will get back into compliance with the soft limits.

When and from where should we call JS_MaybeGC? The shell currently doesn't call it at all, which means the shell will start accumulating unlimited memory. I will fix this in the next version. The DOM already calls JS_MaybeGC in the operation callback. We should probably move this into a parallel thread that wakes up occasionally and calls JS_MaybeGC. This would have the advantage that the GC runs on that thread and doesn't thrash the main thread's cache.
Attached patch patch (obsolete) — Splinter Review
Assignee: general → gal
Attachment #459179 - Attachment is patch: true
Attachment #459179 - Attachment mime type: application/octet-stream → text/plain
Blocks: 580458
I do not agree on removing the last ditch GC, since many embeddings, in all likelihood including Firefox, rely on it.

Think about it: you have a GC heap. You need more memory. But you're up against the runtime heap-size limit. If you OOM while tons of garbage are stinking up the heap, you'll hard-stop the script and break the web. Better to GC unless you can prove there's no (or too little) garbage amid the megabytes of heap.

/be
(De-crazifying otherwise sounds good.)
Attached patch patch (obsolete) — Splinter Review
This patch schedules an operation callback if we go over quota. Ideally we probably want to do this with setrlimit and trigger the callback whenever the OS detects that we suck up too much memory, but we can do that in a second patch.

The new OOM behavior is as follows:
- If we notice that we are going over quota (GC heap or malloc), trigger an operation callback.
- If the callback notices that we are over quota, it forces a GC. If we are still over quota after the GC, we throw an OOM exception.

The callback also does MaybeGC. The DOM no longer does it.
Attachment #459179 - Attachment is obsolete: true
#2: There is minimal behavioral change here. I still force a GC at approximately the same time. The main difference is that the GC now happens in one place in all cases.

I am also moving MaybeGC into the event loop in gecko. With that we will have a shallow stack to scan and hopefully no pending requests, instead of scanning a deep stack in LastDitch.
(In reply to comment #5)
> #2: There is minimal behavioral change here.

How do you know? I mean, measure and stuff. Science!

/be
I am on it. Measuring, measuring, measuring! :) The shell behaves the same. You can allocate in a loop to your heart's desire with the JIT on or off. Trying the browser now.
(In reply to comment #6)
> (In reply to comment #5)
> > #2: There is minimal behavioral change here.
> 
> How do you know? I mean, measure and stuff. Science!

The problem can only come from a long trace that allocates a lot of stuff. With fatvals this is much less likely, and, if problematic, an extra operation count guard could be inserted into the middle of such a trace after some threshold of potentially allocating operations.

But this is a moot problem since we neither enforce any malloc quotas nor run the last-ditch GC when malloc returns null. For this reason embeddings already face the problem of getting OOM reports while having tons of garbage. So gal's plan is very sound.
Comment on attachment 459232 [details] [diff] [review]
patch

>diff --git a/js/src/jscntxt.h b/js/src/jscntxt.h
>--- a/js/src/jscntxt.h
>+++ b/js/src/jscntxt.h
>@@ -1565,29 +1545,45 @@ struct JSRuntime {
> 
>     JSWrapObjectCallback wrapObjectCallback;
> 
>     JSRuntime();
>     ~JSRuntime();
> 
>     bool init(uint32 maxbytes);
> 
>+    void triggerGC() {
>+        gcIsNeeded = true;

The function must still trigger the operation callback. Otherwise the runtime may not notice the breach of GC limits for quite some time.

>diff --git a/js/src/jsgc.cpp b/js/src/jsgc.cpp
>--- a/js/src/jsgc.cpp
>+++ b/js/src/jsgc.cpp
>@@ -636,27 +636,16 @@ ReleaseGCChunk(JSRuntime *rt, jsuword ch
>     METER(rt->gcStats.nchunks--);
>     rt->gcChunkAllocator->free(p);
> }
> 
> static JSGCArena *
> NewGCArena(JSContext *cx)
> {
>     JSRuntime *rt = cx->runtime;
>-    if (!JS_THREAD_DATA(cx)->waiveGCQuota && rt->gcBytes >= rt->gcMaxBytes) {
>-        /*
>-         * FIXME bug 524051 We cannot run a last-ditch GC on trace for now, so
>-         * just pretend we are out of memory which will throw us off trace and
>-         * we will re-try this code path from the interpreter.
>-         */
>-        if (!JS_ON_TRACE(cx))
>-            return NULL;
>-        js_TriggerGC(cx, true);

I do not spot the place that would trigger the GC when the limit is breached. So this code should remain in a simpler form that just does js_TriggerGC(cx, true) when rt->gcBytes >= rt->gcMaxBytes.

> static JSGCThing *
> RefillFinalizableFreeList(JSContext *cx, unsigned thingKind)
> {
>-        bool canGC = !JS_ON_TRACE(cx) && !JS_THREAD_DATA(cx)->waiveGCQuota;
>-        bool doGC = canGC && IsGCThresholdReached(rt);

This is righteous. We do not need to check IsGCThresholdReached, as we rely, both in the GC and in malloc, on triggerGC scheduling the GC ASAP.
(In reply to comment #8)
> But this is a moot problem since we do not enforce any malloc quotas nor run
> the last ditch GC when malloc returns null.

We've always run it when rt->gcMaxBytes is hit, of course.

> For this reason embeddings already
> faces that problem of getting OOM reports while having tons of garbage.

That's not true if gcMaxBytes is << memory available from the OS to the process data segment. Like in Firefox.

Anyway, science will save us! Gal has a white smock on.

/be
Attached patch patch (obsolete) — Splinter Review
The patch wasn't ready for review yet. This one should be close.
Attachment #459232 - Attachment is obsolete: true
This patch does triggerGC() if we go over the malloc quota while malloc-ing or over the GC heap quota while getting a new arena. Ideally, instead of checking directly, this should be done through a setrlimit/getrlimit scheme where we chase an ever-rising memory ceiling, and once we hit either of the quotas we trigger the GC. This is a bit harder to do on Windows, so I am leaving this for a follow-up bug.
Attachment #459479 - Flags: review?(brendan)
Could we get some fuzzing for this (especially gczeal)?
(In reply to comment #13)
> Could we got some fuzzing for this (especially gczeal).

Early info: this patch seems to hold up fine after 10+ minutes of preliminary fuzzing. :)
Comment on attachment 459479 [details] [diff] [review]
patch

Bouncing to Igor, he is the right reviewer and I'm focused on my patch queue.

/be
Attachment #459479 - Flags: review?(brendan) → review?(igor)
Comment on attachment 459479 [details] [diff] [review]
patch

>diff --git a/js/src/jscntxt.cpp b/js/src/jscntxt.cpp
>--- a/js/src/jscntxt.cpp
>+++ b/js/src/jscntxt.cpp
>@@ -1888,43 +1887,36 @@ js_InvokeOperationCallback(JSContext *cx
>+#ifdef JS_THREADSAFE
>+    JS_YieldRequest(cx);
>+#endif

If the callback is called under memory pressure, yielding here just delays the GC. If simpler logic is desired, at least call this after the GC, and add a comment about that!

>@@ -1573,16 +1553,36 @@ struct JSRuntime {
>     void setGCTriggerFactor(uint32 factor);
>     void setGCLastBytes(size_t lastBytes);
> 
>-    void* malloc(size_t bytes) { return ::js_malloc(bytes); }
>-
>-    void* calloc(size_t bytes) { return ::js_calloc(bytes); }
>-
>-    void* realloc(void* p, size_t bytes) { return ::js_realloc(p, bytes); }
>+    bool gcQuotaReached() {
>+        return gcBytes >= gcMaxBytes;
>+    }
>+
>+    size_t updateMallocCounter(size_t bytes) {

Returning bytes from the function complicates following the code, so keep the return type void here and update the callers accordingly.

>+        gcMallocBytes -= bytes; /* We tolerate races and lost counts here. */

gcMallocBytes should be volatile.

AFAICS it is better to call triggerGC here so the compiler does not need to load gcMallocBytes twice, once before and once after the malloc/realloc, when checking gcMallocBytes again.

>+    inline void triggerGC() {
>+        if (!runtime->gcIsNeeded) {
>+            runtime->gcIsNeeded = true;
>+            JS_TriggerOperationCallback(this);

This is incompatible with thread workers. The GC callback in Firefox prevents the GC from running on non-main threads. So if a worker thread hits the GC limit while the main thread is busy in a non-allocating loop, the GC can be delayed for a long time. For now just keep the triggerAllOperationCallbacks here and file a follow-up bug about better scheduling of the GC from non-main threads. For example, we can add a hook that Firefox can use to trigger the GC only on the main thread.

>@@ -1774,45 +1732,21 @@ RefillFinalizableFreeList(JSContext *cx,

Comment in the function that we check for memory pressure only if we need to get a new arena.
Forgot to write: r+ with the above addressed.
http://hg.mozilla.org/tracemonkey/rev/b4aeef9c6c91
Whiteboard: fixed-in-tracemonkey
I forgot to change triggerGC to notify all callbacks, not just the current context's

http://hg.mozilla.org/tracemonkey/rev/277470ab29c8
Attachment #459479 - Flags: review?(igor) → review+
The change to JS_TriggerAllOperationCallbacks introduces a deadlock.
I think you might have turned a test intermittently orange (at least as far as I can tell, it started with your first push here):

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1279929778.1279930497.23454.gz&fulltext=1

The test in question is here, and I think you're causing the first check in checkVeryLongRequestLine to fail:

http://hg.mozilla.org/tracemonkey/file/default/netwerk/test/httpserver/test/test_request_line_split_in_two_packets.js

If this problem was caused by this patch, or by another patch of yours, don't hesitate to poke me for debugging help.  If the problem is far upstream of that failure point, as I suspect it is, I can probably point you at the code that generated the faulty data more quickly than you can from reading the code.  I'm unlikely to be on IRC until Monday; catch me then if you haven't figured it out by then.
(In reply to comment #19)
> I forgot to change triggerGC to notify all callbacks, not just the current
> context's
> 
> http://hg.mozilla.org/tracemonkey/rev/277470ab29c8

In comment 16 I asked to move the triggerGC call into updateMallocCounter for code simplicity and to avoid reloading the malloc counter. Either do it here or in a follow-up.
All the code is inlined. The location of the access doesn't matter.
(In reply to comment #24)
> All the code is inlined. The location of the access doesn't matter.

malloc itself is not inlined, so to implement the current code:

gcMallocBytes -= size;
p = malloc(size);
if (gcMallocBytes < 0)
    triggerGC();

the compiler will have to reload gcMallocBytes. This extra load is clearly visible in the following example, where test1 follows the code from the patch and test2 is the suggested change:

#include <stdlib.h>

struct A {
    long buffer[100];
    volatile long gcMallocBytes;
    volatile bool triggerGC;
};
    
void report_oom(A *a);

    
void *test1(A *a, size_t size) {
    a->gcMallocBytes -= size;
    void *p = malloc(size);
    if (a->gcMallocBytes < 0)
        a->triggerGC = true;
    if (!p)
        report_oom(a);
    return p;
}


void *test2(A *a, size_t size) {
    a->gcMallocBytes -= size;
    if (a->gcMallocBytes < 0)
        a->triggerGC = true;
    void *p = malloc(size);
    if (!p)
        report_oom(a);
    return p;
}

I compile the code with GCC for i386 using:

g++ -Os -fno-rtti -fno-exceptions -m32 -S x.cpp -Wall -Wextra

The assembly for test1:

	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	subl	$32, %esp
	movl	8(%ebp), %ebx
	movl	12(%ebp), %edx
	movl	400(%ebx), %eax
	subl	%edx, %eax
	movl	%eax, 400(%ebx)
	pushl	%edx
	call	malloc
	movl	400(%ebx), %edx
	addl	$16, %esp
	testl	%edx, %edx
	jns	.L7
	movb	$1, 404(%ebx)
.L7:
	testl	%eax, %eax
	jne	.L8
	subl	$12, %esp
	pushl	%ebx
	movl	%eax, -12(%ebp)
	call	_Z10report_oomP1A
	movl	-12(%ebp), %eax
	addl	$16, %esp
.L8:
	movl	-4(%ebp), %ebx
	leave
	ret

for test2:

	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	subl	$20, %esp
	movl	8(%ebp), %ebx
	movl	12(%ebp), %edx
	movl	400(%ebx), %eax
	subl	%edx, %eax
	movl	%eax, 400(%ebx)
	movl	400(%ebx), %eax
	testl	%eax, %eax
	jns	.L2
	movb	$1, 404(%ebx)
.L2:
	subl	$12, %esp
	pushl	%edx
	call	malloc
	addl	$16, %esp
	testl	%eax, %eax
	jne	.L3
	subl	$12, %esp
	pushl	%ebx
	movl	%eax, -12(%ebp)
	call	_Z10report_oomP1A
	movl	-12(%ebp), %eax
	addl	$16, %esp
.L3:
	movl	-4(%ebp), %ebx
	leave
	ret

Here, for test1, the compiler needs to issue movl	400(%ebx), %eax twice, while for test2 only a single load is present.
Are we seriously arguing here about an extra load surrounding a malloc? You know, malloc, the thing that does mmap, locking, touching lots of memory, and stuff like that. This branch is predicted with near 100% accuracy. What problem are we solving here?
Attachment #460066 - Flags: review?(igor)
(In reply to comment #26)
> Are we seriously arguing here about an extra load surrounding a malloc?

Comment 24 contains:
> All the code is inlined. The location of the access doesn't matter.

My previous comment shows that it does matter. I was not arguing about the relevance of the optimization as the comment 24 also does not talk about it.

In comment 23 I asked about this change because the resulting code is simpler and gives the compiler more opportunities to optimize. It is simpler because updating and checking the counter in the same function is easier to follow than looking into two functions, one of which is named reportIfOutOfMemory, a name that does not hint at the counter check.
Comment on attachment 460066 [details] [diff] [review]
move gc triggering into the runtime

Thanks, this is much nicer!
Attachment #460066 - Flags: review?(igor) → review+
(In reply to comment #12)
> Ideally, instead of checking directly
> this should be done through a setrlimit/getrlimit scheme where we chase an ever
> raising memory ceiling, and once we hit either of the quota we trigger the GC.
> This is a bit harder to do on windows, so I am leaving this for a follow-up
> bug.

Filed yet?

/be
Whiteboard: fixed-in-tracemonkey
Attached patch patch — Splinter Review
Backout patch.

This sadly also loses the waiveGC elimination. Sigh.
Attachment #459479 - Attachment is obsolete: true
Attachment #460066 - Attachment is obsolete: true
Backed out.

http://hg.mozilla.org/tracemonkey/rev/babf32a45349
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID
Why INVALIDate this bug? No one wants crazy-fied code. We may have to defer it to post firefox 4, if there's no benchmark-based need, but why throw the bug away? It loses information. I would hate to see someone reinvent as much of the wheel as we have here.

Bugzilla is a searchable knowledge base. I even save bugmail for faster searching of my well-curated corpus.

/be
Could we have for the record the list of issues that triggered orange?
Understanding all the orange would be good but might take a while.

My best guess is that it also would not eliminate the need for the last-ditch GC in the current state of the codebase, even setting aside whatever bugs caused the orange. I could be wrong, but the whole "precognitive GC scheduling" idea is unquantified and probabilistic.

Even without bugs, it has non-zero likelihood of resulting in OOM with heap full of garbage. I don't think we can take that chance, and again: I have seen no profile data showing we need to take that chance *right now*.

/be
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
(In reply to comment #35)
> Understanding all the orange would be good but might take a while.

There are two unrelated parts in the patch. The first one is eliminating the last-ditch GC. The second is changing JSContext::malloc and friends to use simplified memory pressure detection. To make sure that at least that part is sound I am going to factor it out into a separate bug.

> It also in my best guess would not leave us with no need for last ditch GC in
> the current state of the codebase minus whatever bugs result in the orange. I
> could be wrong, but the whole "precognitive GC scheduling" idea is unquantified
> and probabilistic.

I do not see how that could be unsound. If js_TriggerGC does not work, it implies that operation callback support is broken. But those bugs can be fixed. At worst it may be necessary to inject the callback checks into long traces or a few places like the parser.

> 
> Even without bugs, it has non-zero likelihood of resulting in OOM with heap
> full of garbage.

Yet SpiderMonkey has not run the last-ditch GC when malloc returns NULL for ages. So I really see no fundamental difference between that and OOM in the GC when it tries to bring in more pages.

> I don't think we can take that chance, and again: I have seen
> no profile data showing we need to take that chance *right now*.

Eliminating the last-ditch GC helps make the code safer since it reduces the number of places where the GC can run. Granted, with a conservative GC this is less relevant, but the knowledge that NewGCThing does not run the GC still makes it much easier to prove correctness.
Depends on: 582718
(Igor, if you want to grab this bug, go for it)
(In reply to comment #36)
> There are two unrelated parts in the patch. The first one is eliminating the
> last ditch GC. The second is changing JSContext::malloc and friends to use
> simplified memory pressure detection. To make sure that at least that part is
> sound I am going to factor that out into separated bug.

Great -- smaller is better if possible.

> > It also in my best guess would not leave us with no need for last ditch GC in
> > the current state of the codebase minus whatever bugs result in the orange. I
> > could be wrong, but the whole "precognitive GC scheduling" idea is unquantified
> > and probabilistic.
> 
> I do not see how that could be unsound. If js_TriggerGC does not work, it
> implies that operation callback support is broken.

No, I'm saying if you don't run a last ditch GC from the allocator slow path, you will hit OOM right there -- triggering a GC later or earlier may not help.

/be
(In reply to comment #38)
> No, I'm saying if you don't run a last ditch GC from the allocator slow path,
> you will hit OOM right there -- triggering a GC later or earlier may not help.

Yes, but (I have to repeat myself) how is this different from hitting OOM on malloc?
(In reply to comment #39)
> (In reply to comment #38)
> > No, I'm saying if you don't run a last ditch GC from the allocator slow path,
> > you will hit OOM right there -- triggering a GC later or earlier may not help.
> 
> Yes, but (I have to repeat myself) how this is different from hitting OOM on
> malloc?

It is very different! If the OS actually fails malloc (many do not), you've used up the heap due to a leak or real workload demand.

If we hit the last ditch, the heap may instead be full of dead temps and ripe for GC and OOM is not only user-hostile or embedding-hostile, it is unnecessary and completely backward-incompatible.

/be
(In reply to comment #40)
> (In reply to comment #39)
> > (In reply to comment #38)
> > > No, I'm saying if you don't run a last ditch GC from the allocator slow path,
> > > you will hit OOM right there -- triggering a GC later or earlier may not help.
> > 
> > Yes, but (I have to repeat myself) how this is different from hitting OOM on
> > malloc?
> 
> It is very different! If the OS actually fails malloc (many do not), you've
> used up the heap due to a leak or real workload demand
> 
> If we hit the last ditch, the heap

Oh, and different heap here -- sorry. JS GC heap != OS data segment. JS GC heap << OS data segment.

/be
(In reply to comment #41)
> Oh, and different heap here -- sorry. JS GC heap != OS data segment. JS GC heap
> << OS data segment.

The patch explicitly turns the hard GC heap quota into a soft one that can be breached. Thus with the patch NewGCThing can only return null when the mmap fails which is exactly the same condition for jemalloc to return null.

So I see no difference here, especially given that the trace code was operating under this soft-quota regime for quite some time; see the waiveGC usage in jsgc.cpp.
We need to figure out why JS_GetStringBytes returned "" instead of the expected big string. I have a hard time believing mmap returned 0, but it's possible.

/be
Still a valid non-TM bug?
This is no longer relevant - the compartment GC changed a lot in this area.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 9 years ago
Resolution: --- → WONTFIX