Last Comment Bug 746253 - OOM when creating a lot of garbage without pauses with IGC on
: OOM when creating a lot of garbage without pauses with IGC on
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla15
Assigned To: Till Schneidereit [till] (pto until Dec 5)
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on:
Blocks: gecko-games 735099
  Show dependency treegraph
 
Reported: 2012-04-17 11:28 PDT by Alon Zakai (:azakai)
Modified: 2012-05-10 07:59 PDT (History)
14 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
BananaBread very early unoptimized build (3.90 MB, application/octet-stream)
2012-04-17 11:28 PDT, Alon Zakai (:azakai)
no flags Details
patch (3.88 KB, patch)
2012-05-02 18:48 PDT, [PTO to Dec5] Bill McCloskey (:billm)
no flags Details | Diff | Splinter Review
Use non-incremental GCs for compartmental TOO_MUCH_MALLOC GCs (2.70 KB, patch)
2012-05-03 04:31 PDT, Till Schneidereit [till] (pto until Dec 5)
igor: review+
Details | Diff | Splinter Review

Description Alon Zakai (:azakai) 2012-04-17 11:28:06 PDT
Created attachment 615804 [details]
BananaBread very early unoptimized build

STR:

1. Unpack attached archive
2. python -m SimpleHTTPServer 8888
3. Browse to localhost:8888/client.html

In Firefox, memory usage rises dramatically until we run out of it, and there is a silent failure (the only way you can tell an OOM happened is to view memory usage on your machine). In Chrome, memory usage rises very little, and loading proceeds up to a later point where a "Shader compilation halt" exception is thrown (this is expected).
Comment 1 Alon Zakai (:azakai) 2012-04-17 11:31:50 PDT
(The expected exception can be seen in the web console.)
Comment 2 Alon Zakai (:azakai) 2012-04-18 09:52:47 PDT
This only happens when javascript.options.mem.gc_incremental is set to true. So it looks like when IGC is on, garbage accumulates until we eventually OOM, it is not collected quickly enough.

There are 2 bugs here, though. Aside from garbage not being collected quickly enough, there is the question of why garbage is accumulating at all. The code itself is emscripten-compiled and does not generate any significant amount of garbage. So perhaps this is TI-generated stuff? (Note that all the unnecessary GCing probably explains why it takes much longer to load in Firefox than Chrome.)
Comment 3 Andrew McCreight [:mccr8] 2012-04-18 11:38:05 PDT
What does about:memory look like when the memory usage is rising?  In other words, what is it that is using more and more memory?
Comment 4 Alon Zakai (:azakai) 2012-04-18 11:49:22 PDT
The problem is that the memory increase happens during a continuous span of processing, without returning to the browser event loop, the UI hangs during that time. So I can't open about:memory until the OOM happens and memory goes back down.

Actually come to think of it, that might be connected to the OOM, if IGC only runs when code stops running.
Comment 5 Alon Zakai (:azakai) 2012-04-21 18:54:07 PDT
Turns out there was a bug in the JS here that did generate garbage, so GCing is expected in this testcase. Still, there remains the bug shown here that when garbage is created quickly before returning to the browser event loop, we can OOM with IGC on.
Comment 6 [PTO to Dec5] Bill McCloskey (:billm) 2012-05-01 16:38:18 PDT
I'll take a look at this soon.
Comment 7 [PTO to Dec5] Bill McCloskey (:billm) 2012-05-02 18:48:36 PDT
Created attachment 620548 [details] [diff] [review]
patch

This seems to fix the problem. All these GCs are TOO_MUCH_MALLOC GCs. When one of these triggered, we would start incremental GC. However, since the event loop wasn't spinning, we wouldn't GC until the next 100MB of malloc bytes were used. There are checks in place if gcBytes grows too quickly, but not gcMallocBytes. This fixes that problem.
Comment 8 Till Schneidereit [till] (pto until Dec 5) 2012-05-03 02:55:03 PDT
This problem indicates that there is value in having gcMallocBytes tracked in the runtime, too, as opposed to what I was saying in bug 723350. Maybe the solution for that would be to set a flag on the runtime as soon as at least one compartment has triggered a TOO_MUCH_MALLOC GC?

I'll test with the attached test-case and a modified version of the patch from comment 7 and report back.
Comment 9 Igor Bukanov 2012-05-03 02:58:39 PDT
(In reply to Bill McCloskey (:billm) from comment #7)
> Created attachment 620548 [details] [diff] [review]
> patch
> 
> This seems to fix the problem. All these GCs are TOO_MUCH_MALLOC GCs. When
> one of these triggered, we would start incremental GC. However, since the
> event loop wasn't spinning, we wouldn't GC until the next 100MB of malloc
> bytes were used. There are checks in place if gcBytes grows too quickly, but
> not gcMallocBytes. This fixes that problem.

But what is exactly the problem then? Bad incremental GC scheduling so its slice does not run as needed?
Comment 10 Till Schneidereit [till] (pto until Dec 5) 2012-05-03 04:31:54 PDT
Created attachment 620649 [details] [diff] [review]
Use non-incremental GCs for compartmental TOO_MUCH_MALLOC GCs

This patch builds on Bug 723350 and is strictly a reduced version of billm's patch that leaves out the runtime trigger.

Testing with azakai's test-case shows that it effectively removes the unbounded memory growth and leads to non-incremental GCs that only collect the affected compartment instead of being global.
Comment 11 [PTO to Dec5] Bill McCloskey (:billm) 2012-05-03 08:10:43 PDT
(In reply to Igor Bukanov from comment #9)
> But what is exactly the problem then? Bad incremental GC scheduling so its
> slice does not run as needed?

Yes, it's bad incremental scheduling. Once an incremental GC starts, there are three reasons why we'll do more slices:

1. There's a 100ms timer in nsJSEnvironment.
2. If we draw a frame, we do another slice.
3. If gcBytes grows too large, we'll immediately finish the GC.

Since we never run the event loop in this code, #1 and #2 will not trigger. And since the demo allocates a lot of malloc bytes but almost no JS objects, we won't hit #3.

We will do a slice every time the malloc trigger hits, which is every 100MB of malloc allocation. But this is not frequent enough to stop us from OOMing.

Trigger #3 is a "safety valve" to ensure that we GC frequently enough. The problem is that it wasn't tracking malloc bytes. This patch makes it so it does.

I'm fine with Till's patch since it fixes the problem as well.
Comment 12 Igor Bukanov 2012-05-04 02:01:21 PDT
Comment on attachment 620649 [details] [diff] [review]
Use non-incremental GCs for compartmental TOO_MUCH_MALLOC GCs

Review of attachment 620649 [details] [diff] [review]:
-----------------------------------------------------------------

As a spot fix I like this better.
Comment 13 Igor Bukanov 2012-05-04 02:04:19 PDT
(In reply to Bill McCloskey (:billm) from comment #11)
> 1. There's a 100ms timer in nsJSEnvironment.
> 2. If we draw a frame, we do another slice.
> 3. If gcBytes grows too large, we'll immediately finish the GC.
> 
> Since we never run the event loop in this code, #1 and #2 will not trigger.
> And since the demo allocates a lot of malloc bytes but almost no JS objects,
> we won't hit #3.

We have all that wonderful operational callback infrastructure including the extra thread that reliably triggers the callback to stop the long-running script or do other things. We should use that to trigger the incremental GC scheduling reliably.
Comment 14 Till Schneidereit [till] (pto until Dec 5) 2012-05-04 02:13:44 PDT
Thanks for the review, Igor!
Comment 15 [PTO to Dec5] Bill McCloskey (:billm) 2012-05-04 14:30:08 PDT
(In reply to Igor Bukanov from comment #13)
> We have all that wonderful operational callback infrastructure including the
> extra thread that reliably triggers the callback to stop the long-running
> script or do other things. We should use that to trigger the incremental GC
> scheduling reliably.

Yeah, I guess it would be nice to use that to trigger the incremental slices.

However, I still don't see this patch as a spot fix. I've tried to design the incremental GC heuristics so that we always start the GC before any allocation triggers hit. If we do hit an allocation trigger (like gcTriggerBytes or gcMaxMallocBytes) then I think we should always do a non-incremental GC right away.

It would probably make sense to start an incremental GC from MaybeGC if gcMallocBytes == gcMaxMallocBytes/2, or something like that. I'm not sure how useful that would be in practice, though.
Comment 16 Ryan VanderMeulen [:RyanVM] 2012-05-04 19:41:14 PDT
Assuming that attachment 620649 [details] [diff] [review] is the patch intended for checkin, it does not apply cleanly to inbound. Please rebase and re-request.
Comment 17 [PTO to Dec5] Bill McCloskey (:billm) 2012-05-09 15:11:37 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/bd39bb6ae2f2
Comment 18 Ed Morley [:emorley] 2012-05-10 07:59:26 PDT
https://hg.mozilla.org/mozilla-central/rev/bd39bb6ae2f2

Note You need to log in before you can comment on or make changes to this bug.