Bug 617569 - High CPU load and memory leak with web workers

Status: RESOLVED WORKSFORME
Whiteboard: [cib-workers][softblocker][MemShrink:P2]
Keywords: testcase
Product: Core
Classification: Components
Component: JavaScript Engine
Version: Trunk
Platform: All / All
Importance: -- critical (7 votes)
Target Milestone: ---
Assigned To: Ben Turner (not reading bugmail, use the needinfo flag!)
URL: http://29a.ch/sandbox/2010/cornellbox/
Duplicates: 632745
Depends on: new-web-workers
Blocks: mlk-fx4-beta
Reported: 2010-12-08 03:44 PST by Igor Bukanov
Modified: 2011-07-31 11:36 PDT
CC: 36 users
Flags: .x+



Description Igor Bukanov 2010-12-08 03:44:20 PST
See bug 590533 comment 21.

For the test case see bug 590533 comment 6 and attachment 493718 (bzip tar archive).
Comment 1 Luke Wagner [:luke] 2011-01-07 14:48:15 PST
Twice now I have run the testcase on an OS X 64-bit debug build, watched it get up to 3GB, navigated away, and watched it stay at 3GB allocated but then eventually go back down to 200MB (though that takes around 10 minutes).  Does this correspond to any worker-related timers that you know of, Ben?
Comment 2 Luke Wagner [:luke] 2011-01-07 15:56:07 PST
Ben explained that workers get suspended when you navigate away, so it makes sense that they are holding the resources until they get kicked out of the cache.  Tools -> Clear History (Everything) causes everything to be released within about 30 seconds.  So that's not the leak.

What does seem to be a leak is that letting this run for a while uses up all memory and then crashes.
Comment 3 Luke Wagner [:luke] 2011-01-07 18:52:00 PST
I played with it some more; a few observations:

1. If I let the test run, it keeps growing in memory until it crashes at about 4.2GB from DOMWorkerErrorReporter (called by js_ReportOutOfMemory).

2. If I create a new tab or otherwise muck with the browser (while keeping the test running in the other tab) the GC kicks in and memory goes back down to a few hundred MB (and starts rising again).

3. After finally getting Shark to give me the profile I wanted, I can see that all four workers spend the majority of their time in JS_YieldRequest and other forms of blocking on GC.

4. All this allocation seems to be from stubs::CreateThis (presumably for the V3 vector objects that are created in the render loop).

So point 2 seems to indicate that GC is still getting somewhat blocked.

As for point 3, will per-compartment GC allow the threads not to block each other?
Comment 4 Igor Bukanov 2011-01-08 05:09:29 PST
(In reply to comment #3)
> So point 2 seems to indicate that GC is still getting somewhat blocked.

This is because the main thread is not running JS, so we have no chance to schedule the GC there. This suggests posting an event to the main thread when a worker wants a GC, but we need an extra callback to post that event promptly.

> 
> As for Point 3, will per-compartment GC allow the threads not to block each
> other?

No - the proposed implementation would block all the threads in the same way as the current GC does.
Comment 5 Andreas Gal :gal 2011-01-08 10:00:52 PST
Re comment 4: blocking all threads is a temporary implementation kludge we did to accelerate the FF4 schedule. It will be removed after FF4.
Comment 6 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-01-18 12:38:11 PST
So should we implement a workaround GC-scheduling runnable in the meantime?
Comment 7 Andreas Gal :gal 2011-01-18 12:43:12 PST
I don't understand comment 6. How does that help?
Comment 8 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-01-18 13:33:28 PST
Well, it sounds like memory usage climbs because lots of stuff is happening on workers where GC is not allowed. The main thread doesn't know that it needs to trigger a GC since it isn't running JS. To work around that we could modify the GC callback to post a "MaybeGC" runnable when it gets invoked off the main thread.
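For illustration, here is a minimal sketch of that kind of workaround, assuming roughly the 2011-era XPCOM/JSAPI idioms. GetMainThreadJSContext() is a hypothetical helper standing in for however the real patch would locate the main thread's JSContext, and none of this is the patch that eventually landed:

```cpp
#include "nsThreadUtils.h"  // nsRunnable, NS_DispatchToMainThread, NS_IsMainThread
#include "jsapi.h"          // JS_MaybeGC, JSGCCallback, JS_SetGCCallback

JSContext* GetMainThreadJSContext();  // hypothetical helper, not a real API

// Runs on the main thread and gives the engine a chance to GC there.
class MaybeGCRunnable : public nsRunnable {
public:
  NS_IMETHOD Run() {
    JS_MaybeGC(GetMainThreadJSContext());
    return NS_OK;
  }
};

// GC callback: when it fires on a worker thread, post a runnable so the
// main thread also gets a chance to schedule a GC. This would be
// installed with JS_SetGCCallback on the worker's context.
static JSBool WorkerGCCallback(JSContext* cx, JSGCStatus status) {
  if (status == JSGC_BEGIN && !NS_IsMainThread()) {
    NS_DispatchToMainThread(new MaybeGCRunnable());
  }
  return JS_TRUE;
}
```

The point is just that the worker-side callback is the hook: a worker cannot run the main thread's GC itself, so it posts an event and lets the main thread do it.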
Comment 9 chris hofmann 2011-02-18 19:40:06 PST
Softblockers that require a beta are likely no longer in scope for 4.0. Sounds like this needs to move to FF 4.0.x or be made a hardblocker.

I think I hit this bug, or something like it, 3 or 4 times a day (see bug 629703).  The only thing I can do is go get coffee and wait for the pegged CPU to go back down after 4 or 5 minutes.  It's pretty painful.

Anyone come up with other ideas in the last month?
Comment 10 Andreas Gal :gal 2011-02-18 21:03:53 PST
Not a hard blocker; workers are still reasonably rare, so .x sounds good to me. We should post a runnable with MaybeGC; that approach sounds pretty good to me.
Comment 11 chris hofmann 2011-02-19 08:48:02 PST
Is there an easy way for me to detect whether I've got a page loaded in one of many tabs that might have web workers running?
Comment 12 chris hofmann 2011-02-19 08:49:52 PST
This might also affect how much we talk about and evangelize web workers in the 4.0 rollout. CCing blizzard.
Comment 13 Andreas Gal :gal 2011-02-19 16:51:12 PST
I am actually out of blockers, so I will jump on this one and ask for approval. I can probably do something that only triggers with workers, which should be low risk.
Comment 14 Andreas Gal :gal 2011-02-19 20:32:29 PST
I ran this with a TM tip build. Fresh browser instance: 70MB in use. When I start the test case, memory use goes to 600MB and stays there. After navigating away, memory use drops to 378MB and stays flat. After navigating around a bit more, memory use drops to 120MB.

There are a couple of bugs here:
1. Is 600MB of memory use necessary for this test case?
2. Why do we only drop to 378MB, and why does it stay flat there until I navigate around more and close the tab?

Neither of these seems like a hard blocker.
Comment 15 Andreas Gal :gal 2011-02-19 20:51:06 PST
Alright, I can reproduce this. I think this is related to the chunk release code; this also explains one of smaug's bugs. Every time we GC, we age the chunks, and 3 GCs or so bring memory use down to 74MB.
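To make the aging behavior concrete, here is an illustrative sketch, not the actual SpiderMonkey code (Chunk, ReleaseToOS, and MAX_EMPTY_CHUNK_AGE are assumed names): empty GC chunks sit in a pool and are only returned to the OS after surviving several GCs, which is why a single GC cannot bring memory use all the way down.

```cpp
// Hypothetical stand-ins for the engine's real chunk bookkeeping.
struct Chunk {
  unsigned age = 0;       // number of GCs this chunk has sat empty
  Chunk* next = nullptr;  // next chunk in the empty-chunk pool
};

void ReleaseToOS(Chunk* chunk);  // assumed: unmaps the chunk

static const unsigned MAX_EMPTY_CHUNK_AGE = 3;  // assumed threshold

// Run once per GC over the pool of empty chunks: each pass ages every
// chunk, and only chunks that have stayed empty long enough get
// released. After ~3 GCs the whole pool has aged out.
Chunk* ExpireEmptyChunks(Chunk* pool) {
  Chunk** link = &pool;
  while (Chunk* chunk = *link) {
    if (++chunk->age > MAX_EMPTY_CHUNK_AGE) {
      *link = chunk->next;
      ReleaseToOS(chunk);
    } else {
      link = &chunk->next;
    }
  }
  return pool;
}
```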
Comment 16 Andreas Gal :gal 2011-02-19 22:42:45 PST
I have a patch. This is actually a dup of bug 631733 and I will fix it there. I am leaving this open for confirmation.

In related news, I noticed that GCs in a debug build take roughly a second for this test case, so that's another reason to leave this open. Our GC performance on this test case blows.
Comment 17 Andreas Gal :gal 2011-02-19 23:33:57 PST
I looked at the test case and the profile. The code really, really wants a generational collector: it creates a ridiculous amount of temporary object garbage. Background finalization might help us hide the bad pause times until we get a generational collector.
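To make the generational-collector point concrete, here is a conceptual sketch (assumed names, not SpiderMonkey code) of why a nursery suits this workload: short-lived temporaries like the per-frame vector objects are bump-allocated, and a minor collection reclaims the dead ones by resetting a pointer instead of paying for a full-heap mark/sweep.

```cpp
#include <cstddef>
#include <cstdint>

// Conceptual nursery for a generational GC (illustrative only).
class Nursery {
  uint8_t* base_;
  uint8_t* cur_;
  uint8_t* end_;

public:
  Nursery(uint8_t* buffer, size_t size)
      : base_(buffer), cur_(buffer), end_(buffer + size) {}

  // Bump allocation: a pointer increment per object, so churning
  // through millions of temporary vectors per frame stays cheap.
  void* allocate(size_t bytes) {
    if (cur_ + bytes > end_)
      return nullptr;  // nursery full: time for a minor GC
    void* result = cur_;
    cur_ += bytes;
    return result;
  }

  // Minor GC: survivors would be evacuated to the tenured heap
  // (omitted here); everything else dies in O(1) by resetting the
  // bump pointer, avoiding a full-heap mark/sweep pause.
  void resetAfterMinorGC() { cur_ = base_; }
};
```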
Comment 18 Scoobidiver (away) 2011-02-25 05:17:01 PST
*** Bug 632745 has been marked as a duplicate of this bug. ***
Comment 19 Alon Zakai (:azakai) 2011-05-02 14:43:34 PDT
Bug 654265 looks very similar to this, and has what I think is a simpler testcase. The leak there only happens when CPU load is very high (when the worker runs without pausing).
Comment 20 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-06-05 19:10:39 PDT
With bug 649537, this test page behaves quite normally: memory usage stays stable and there are no crashes.
Comment 21 Johnny Stenback (:jst, jst@mozilla.com) 2011-06-28 13:46:33 PDT
Ben, sounds like this can simply be closed once your new worker stuff lands.
Comment 22 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-31 11:36:00 PDT
Totally fixed by bug 649537.
