Closed Bug 599694 Opened 14 years ago Closed 12 years ago

large number of DOM elements slow down CC (leak maybe?)

Categories

(Core :: XPConnect, defect, P2)

Other Branch
defect

Tracking


RESOLVED WORKSFORME
mozilla2.0
Tracking Status
blocking2.0 --- .x+

People

(Reporter: gal, Assigned: mccr8)

References

Details

(Whiteboard: [MemShrink:P2][snappy])

Attachments

(3 files, 1 obsolete file)

Reported by sahab.yazdani+bugzilla@gmail.com

Generates 200 DIVs continuously in a timeout loop.

The following attachment will create tons of DOM element references.

Start a new instance of Firefox 4 Beta 6.
Navigate to the attachment.
Every 10 minutes, refresh the page.
Once the memory consumption of Firefox reaches around 350 MB, the UI thread
starts becoming unresponsive.
Initially the stalls are very small, but they start growing in length.

The example is fairly self-explanatory, but the numbers being output are the
amount of time it takes to generate 200 DIVs. Obviously the accuracy is
dependent on Firefox's internal time resolution.

Hope this attachment helps.
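
For reference, a rough sketch of what a test case like this might look like. This is not the actual attachment (which uses jQuery and whose details differ); the element IDs, the jquery.js path, and the structure are assumptions based on the description above:

<!-- hypothetical reconstruction, not the real attachment -->
<script src="jquery.js"></script>
<div id="log"></div>
<div id="container"></div>
<script>
var iteration = 0;
function tick() {
  var start = Date.now();
  // Throw away the previous batch and generate 200 fresh DIVs.
  $('#container').empty();
  for (var i = 0; i < 200; i++) {
    $('<div/>').text('div ' + i).appendTo('#container');
  }
  // The numbers being output: how long this batch took, in ms.
  $('#log').prepend('<div>' + (++iteration) + ': ' + (Date.now() - start) + ' ms</div>');
  setTimeout(tick, 0);
}
tick();
</script>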
I will request b+ for final for this to get it on the radar. We should see if this reproduces, and if so, what's up here. We have patches in flight for final that will help hide the latency of the pauses, but it sounds like there is an underlying leak here. We have fixed an observer leak recently, so this might not reproduce any more with a current beta.
blocking2.0: --- → ?
Priority: -- → P2
Target Milestone: --- → mozilla2.0
Attached file test case
Assignee: nobody → gal
Not sure I attached the test case right. Here is the original:

https://bug490122.bugzilla.mozilla.org/attachment.cgi?id=478609
Letting the testcase run for a while just hung my browser (shaver's windows box too).
So does this depend on jQuery? jQuery has had (or probably still has) problems
keeping objects alive way too long, which effectively means creating *huge*
DOM trees.
And IIRC, though I could be very wrong here, at some point some of that
jQuery behavior was for Gecko only, to work around a bug-in-Gecko-fixed-years-ago.
(In reply to comment #6)
> And IIRC, though I could be very wrong here, at some point some of that
> jQuery behavior was for Gecko only, to work around a
> bug-in-Gecko-fixed-years-ago.
So, if we know what is causing it, we can probably get it fixed.  It then just becomes an issue of getting people to update jQuery.
Attached file test case without jQuery (obsolete) —
The first test case was basically a reduced example of a common workflow at work when we test our web app. The DOM for the app is relatively complicated and over time creates, attaches, detaches, and removes sub-trees using jQuery, with the potential to leave references hanging around when there are bugs.

This second test case does the exact same thing functionally, but without jQuery.

Just a note as well: the numbers flowing by don't seem to grow any bigger when the UI stalls, but they do stop flowing by, so they are still a useful guide to whether the UI is actually doing anything or not.

Hope it helps.
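
For comparison, a minimal sketch of what the plain-DOM equivalent of that loop might look like (again hypothetical; the IDs and details are assumptions, not the actual attachment):

<!-- hypothetical plain-DOM version of the same loop -->
<div id="log"></div>
<div id="container"></div>
<script>
var iteration = 0;
var log = document.getElementById('log');
var container = document.getElementById('container');
function tick() {
  var start = Date.now();
  // Drop the previous batch, then create 200 fresh DIVs.
  while (container.firstChild) {
    container.removeChild(container.firstChild);
  }
  for (var i = 0; i < 200; i++) {
    var d = document.createElement('div');
    d.textContent = 'div ' + i;
    container.appendChild(d);
  }
  // Log how long this batch took, in ms.
  var entry = document.createElement('div');
  entry.textContent = (++iteration) + ': ' + (Date.now() - start) + ' ms';
  log.insertBefore(entry, log.firstChild);
  setTimeout(tick, 0);
}
tick();
</script>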
Attached file test case
I'm reattaching the test case from bug 490122 since the one Andreas attached doesn't seem to work, for whatever reason.
Attachment #478647 - Attachment is obsolete: true
Whoops, I obsoleted the non-jQuery test case accidentally. Sorry for the spam.
So just to make sure, the non-jQuery test still shows the problem, right?
I don't think so; at least, I wasn't patient enough to get it to happen with the non-jQuery version. I just added that case as a compare-and-contrast tool (and maybe a starting point to see the effects of other DOM manipulations). The jQuery version, in contrast, demonstrates it fairly quickly.

On my Win7 x64 system running FF4 beta 6 (32-bit), here are some observations from running the jQuery test case:

- Memory usage starts at about 51 MB(*)
- Run the test.
- The memory usage starts climbing.
- By the 500th iteration, the memory is at around 250 MB
- By the 1000th iteration, the memory is at around 300 MB
- By the 1200th iteration, I start noticing stalls, but the memory usage is fairly consistent at around 320 MB
- By the 1500th iteration, the memory is at around 380 MB
- I stop the test case by around the 2000th iteration, and the memory usage is at around 420 MB
- Refresh the page.
- Memory starts at 430 MB
- Stalls at around the 200th iteration, and memory usage shrinks down to 380 MB
- Stalls around the 500th iteration... memory usage is very consistently between 380 and 420 MB
- From here on in, nothing really interesting happens in terms of memory usage, but the stalls are there, so the test case becomes more of an observation tool of when the stalls happen.

Nothing formal yet, and it could be a placebo effect or what not, but the stalls do seem to become more periodic as time goes on... I'll try to see if this observation has any merit.

For the plain version, the memory starts climbing more slowly, but once I get to the 2000th iteration, I refresh the page. If the memory reaches around 400 MB, a GC (plus a freeze) happens and the memory drops back down to 250 MB. Once you refresh the page, the memory starts at whatever value it left off at, and the cycle repeats. On my Win7 x64 system with 4 GB of RAM, waiting until the 2000th iteration is typically enough to get FF to do a garbage collection (I'm assuming that's why the memory usage drops). Nothing else of interest really happens.

(*) I am reading the "Memory (Private Working Set)" column in the Windows Task Manager.

If you guys want me to focus on something in particular when taking readings, do let me know.
This test case is definitely pretty interesting. Shaver and I observed the test case running to a shape regen eventually, at which point the browser was GCing itself to death. Each GC took forever, and bringing up the slow script dialog triggered several GCs. I am not sure what happens in that state. Maybe we hit the memory ceiling (GC heap size) and everything goes crazy. This is all independent of the effect Sahab is describing. That seems to be a DOM leak of some sort that keeps things alive after reloading (the browser should go back to 51 MB-ish, not start around 380 MB). We need some thorough analysis here.
blocking2.0: ? → final+
-> JSEngine
Component: DOM → JavaScript Engine
QA Contact: general → general
Whiteboard: softblocker
JS engine? No way. This is CC and DOM and xpconnect.
Assignee: gal → nobody
Component: JavaScript Engine → XPConnect
QA Contact: general → xpconnect
** PRODUCT DRIVERS PLEASE NOTE **

This bug is one of 7 automatically changed from blocking2.0:final+ to blocking2.0:.x during the endgame of Firefox 4 for the following reasons:

 - it was marked as a soft blocking issue without a requirement for beta coverage
blocking2.0: final+ → .x+
Nicholas, this seems like the sort of things you've been looking at recently....
Blocks: mlk-fx4-beta
Whiteboard: softblocker → [softblocker][MemShrink]
Assignee: nobody → continuation
Assignee: continuation → nobody
Whiteboard: [softblocker][MemShrink] → [softblocker][MemShrink:P2]
Assignee: nobody → continuation
This should probably block/depend on some of our other "slow CC" bugs.
Whiteboard: [softblocker][MemShrink:P2] → [MemShrink:P2][snappy]
Hmm.  Yeah, I never really looked at it.  I should do that.  Maybe bug 702813 is related?  It may not be as much of a problem any more after some of the patches smaug landed.
At first glance, it sounds like bug 702813 based on the behavior and the description here in the bug.  Heap unclassified is about 50% or so.  Reloading the tab after running it for around 10 minutes caused this monster CC:

CC(T+84156.2) collected: 2152405 (2152405 waiting for GC), suspected: 1121071, duration: 13108 ms.

Generally, though, CC times weren't that bad.  Mostly around 300ms, with two around 800ms.

I tried again in a slightly fresher profile with nothing else open.  CC times are 11ms to 72ms after 10 minutes.  64% heap-unclassified.  The times the test case is printing have gone from 1 to 5, but that's not too bad.

After 16 minutes, memory usage is 1.8 GB.  CC times are around 11ms.  Browser is totally responsive.

After 20 minutes, memory usage is 2 GB.  CC times are around 20ms, with a few spiking up to 100-256ms.

Reloading the page after that caused a 25 second CC, but it freed 6.3 million things, so that doesn't seem too bad.  heap-unclassified dropped down.  JS stayed pretty high, but that was mostly gc-heap-decommitted, so that's just fragmentation.  Closing the tab entirely dropped memory down to 74 MB.

So, I could be wrong, but it sounds like this works now.  I suspect that some of the stuff that smaug landed to fix bug 702813 solved this issue.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME