Closed Bug 619822 Opened 14 years ago Closed 13 years ago

Excessively infrequent garbage collection


(Core :: JavaScript Engine, defect)

Not set



Tracking Status
blocking2.0 --- -


(Reporter: jseward, Unassigned)



(Whiteboard: [cib-memory])


(3 files, 1 obsolete file)

I profiled M-C + bhackett's code discard patch (bug 617656)
loading 10 tabs.  I left the browser completely idle
for about 30 minutes, then quit.  I was surprised to see a memory
profile as per the attached screenshot.

Direct activity to do with loading the 10 tabs ceases somewhere
around the 20-25 billion instruction mark.  The browser then idles
(more or less) for the next ~20 minutes, executing another approx
30 billion instructions.

During this time the total memory use (total process VM size) 
rises continuously from 920 MB to 1040 MB.

Moving the mouse at any point in this process triggers GC and the
image size falls back down.  So it's not a per se leak, but it
does seem like GC can be excessively infrequent.

Putting a printf in js_GC shows that during that time, if there
is no mouse movement over the browser, then there are no GCs for
very long periods -- in this case, the entire slope up from circa
25 billion to 50 billion instructions.

Further instrumentation/experimentation reveals that:

* if left to idle long enough (overnight), GC does indeed
  occur, but only about once every 70 billion instructions

* there are never any calls into JS_MaybeGC

* the leaking that happens is due to allocations done by the method
  jit.  Calling chain is

  generated code
   js::mjit::ic::NativeCall(js::VMFrame&, ...)
     array_slice(JSContext*, unsigned int, ...)
      js_NewArrayObject(JSContext*, unsigned int, ...)
       RefillFinalizableFreeList(JSContext*, ...)
Attached patch patch (obsolete) — Splinter Review
call JS_MaybeGC more often with a low stack. Maybe we still have to change the behavior in MaybeGC to actually perform the GC.
Blocks: Jaeger
blocking2.0: --- → betaN+
Whiteboard: [cib-memory]
Note that this eliminates the forced GC from the DOM callback, which is probably good for our SS score. We should tune MaybeGC. It should GC based on time and volume to catch the slow leak above.
> which is probably good for our SS score.

But bad for memory usage and probably non-crashiness in cases when a loop creates a bunch of DOM objects with a sizeable C++ object backing the jsobject....

Then again, the operation callback JS_MaybeGC is sorta a hack around that problem anyway.
(In reply to comment #1)
> Created attachment 498240 [details] [diff] [review]

This does fix the problem, but needs improvement:

* JS_MaybeGC is getting called up to 300 times per second.  
  Subjectively it seems like the browser is slower to restart and
  restore multiple tabs, due to spending more time not at 100% cpu,
  although that could be due to network slowness.

  From a mobile perspective, continuous polling like this will be
  bad news for power management.

* Has no effect on Mac (according to Gregor)

Ideal solution IMO would be to have a timer that causes a call to
JS_MaybeGC every 2 to 5 minutes, so as to give a worst case backstop
without causing a lot of power management wakeups.
(In reply to comment #4)
> Ideal solution IMO would be to have a timer that causes a call to
> JS_MaybeGC every 2 to 5 minutes, so as to give a worst case backstop
> without causing a lot of power management wakeups.

Only if the application embedding has done activity.  If my process is otherwise sleeping waiting for input say, it's just broken if a SpiderMonkey thread wakes up to GC.

For example, Firefox could add a callback queued any time SpiderMonkey calls back into C++, which then queues a timer on the mainloop.
(In reply to comment #5)
> For example, Firefox could add a callback queued any time SpiderMonkey calls
> back into C++, which then queues a timer on the mainloop.

In GNOME 3 we call JS_MaybeGC a minute after receiving X11 events, for example.  Not perfect, but good enough.
Attached patch WiPSplinter Review
WiP: Call MaybeGC every 5 seconds on linux.
Attachment #498240 - Attachment is obsolete: true
Comment on attachment 498364 [details] [diff] [review]

>+// When Idle, call JS_MaybeGC every 5 seconds to prevent a growing GC heap.
>+#define MAYBE_GC_CALL_LIMIT PR_MillisecondsToInterval(5000)

I don't know about Mozilla style, but I prefer code to have unit suffices.  In this case, it would be e.g.

>     mLastNativeEventTime = PR_IntervalNow();
>+    if (!mLastMaybeGCCallTime)
>+      mLastMaybeGCCallTime = mLastNativeEventTime;
>+    if ((mayWait) && ((mLastNativeEventTime - mLastMaybeGCCallTime) > MAYBE_GC_CALL_LIMIT)) {
>+      mLastMaybeGCCallTime = mLastNativeEventTime;
>+      MaybeGC();

I think this section of code deserves a comment, maybe a link to this bug?
That's WiP.

The reason why it doesn't work on Mac is that the mayWait argument is always false, even if the browser is idle. On Linux, mayWait is true when the browser is idle.
Looks good.  Now we just need to fix this on OSX and Windows too :-)
The 300 calls happen when processing events anyway. The browser isn't waking up to call MaybeGC, its already awake and doing something, and we use the opportunity to do a MaybeGC. I am not saying we shouldn't throttle the calls, I am just saying this isn't a idle/wakeup isue.
Gregor, want to run SS and V8? The lack of GC in the DOM callback should reflect in the scores.
Again, removing that call from the DOM callback is likely to impact actual websites negatively unless we take other steps to resolve the issues it's dealing with....
bz, that call actually forces a GC, not just suggests a GC with maybeGC. Why are we doing that again?
See comment 3?  But worth double-checking the blame....  Perhaps this was working around broken js-side gc heuristics and those got fixed?
I don't think the operation callback is a good for what comment #3 describes, you would have to run for 10s or so to run into the callback. By then you generated a few GB of garbage.
Well, the code was added back when this was a branch callback.

But we certainly call that callback more often than every 10s, no?  The MaybeGC call is _before_ all the "how much time actually elapsed?" checks.

I agree that the current code is not perfect, btw; the question is whether it's handling cases of memory growth now that are common and would be unhandled with the attached patch.
Whiteboard: [cib-memory] → [cib-memory][softblocker]
I retried the test in comment 0 on Windows 7. and got rather different results (I guess a lot of patches landed since then, plus I think Brian revised his work before landing). Just to make sure I did the right thing, I:

  took a TM nightly of Jan 5 and loaded 10 copies of:

This brought memory usage up to only 155 MB, which rose a bit to 162 MB in a minute or so, but didn't rise at all during the next 3 minutes. So, I think we might have actually fixed the problem. Certainly it doesn't seem serious at this point.

Do renominate if I got it wrong and it is still serious. Or maybe we can even close if it's all working well now. I guess we could keep it open as a GC scheduling bug, but I hope the new GC project will obviate such things.
blocking2.0: betaN+ → -
I think this bug is more a very nice thing to have in order to avoid blog posts like "FF is leaking, memory consumption grows like crazy if I don't close it over night and FF is slow once I move the mouse after a very long time because we are performing a GC right when a user wants to use it after a long idle time....". 
Periodic MaybeGC calls would solve this problem.
Adding the actual "memory-overhead" to the trigger would also be an elegant solution for the high memory footprint after the v8 benchmark run problem.

I learned that a solution that works for all platforms is very hard to find because of the different implementations and testing is very time consuming.
Whiteboard: [cib-memory][softblocker] → [cib-memory]
Closed: 13 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.