Created attachment 514410 [details]
test case: betanews.com web page where leak is observed
Memory consumption increases indefinitely
Regression pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=94a51a3b64d4&tochange=336d5906cb0f
1. Create a new profile
2. Extract and open the attached test case
3. Open Task Manager and note the constant memory increase
I guess I should note regression dates:
Fine: Jan 29, 2011 nightly build
Leaks: Jane 30, 2011 nightly build
Created attachment 514425 [details]
test case: actual page exposing the leak
This is the page actually exposing the link. It was embedded as an ad, in an iframe, on the betanews.com page.
Prior to the regression, the page would cause memory to keep cycling within about a 1 MB range (going up but then going back down, and never trending upwards). Post regression, memory keeps going up for as long as the page is loaded.
(In reply to comment #2)
> This is the page actually exposing the link.
I meant "leak."
Unzip and launch the attachment.
Confirming the Issue on http://hg.mozilla.org/tracemonkey/rev/a8926981d452 too.
Memory Occupation is growing slowly but steady.
Created attachment 514730 [details]
test case: page content exposing the leak (cleaned up a bit)
Could this be fallout from bug 624549?
Don't think so bz. That bug changes scheduling. This is a real leak it seems.
Testing with a linux debug build. More in a bit.
This doesn't reproduce for me on a TM tip build. Memory use stays around 82MB, which is on the low end for a 64-bit build. We are GCing and CCing a lot. The page seems to be churning through a lot of allocations. This is probably fixed by http://hg.mozilla.org/tracemonkey/rev/482d728659f7 which is also on Mozilla central. It landed around the same time the bug was filed so I recommend re-testing. Closing as WFM. Please feel free to re-open if you can still reproduce this on a TM trunk build.
I don't understand how why this got resolved WFM. Nothing stated in comment 9 is verifiable by my testing. When I first noticed this bug, it was with an hourly with the referenced cset (http://hg.mozilla.org/tracemonkey/rev/482d728659f7) in it. Assuming it may have been a regression from that tm->mc merge, I went regression hunting and found the range stated in comment 0.
With today's nightly (Mozilla/5.0 (Windows NT 5.1; rv:2.0b13pre) Gecko/20110224 Firefox/4.0b13pre / http://hg.mozilla.org/mozilla-central/rev/881f4ab63c31), I'm still reproducing this. This is not WFM.
Can you please try a tracemonkey tip build:
If that still reproduces for you, I will need precise STRs and what platform you are using. I am not claiming the bug is gone. I am just saying I can't reproduce it.
bz can reproduce this with a TM tip build. He sees private bytes creep up. This doesn't seem to be XPCOM stuff. Memory creeped up from 78MB to 165MB. Manual opening/closing of tabs causes a GC and makes memory getting freed up again. The test case must be churning through memory which makes the JS heap grow. We need to know how long this grows to determine whether this is intended or not.
bz is saying about 10 minutes for 90MB. IU, what exactly are you seeing, and does memory go up constantly or does it eventually get released again? Or does the browser eventually run into OOM?
Alright, I have a setup that reproduces this. I can see a few CCs but no suspected pointers and nothing is collected, so this is all GC garbage. I am up to 156mb from 78mb without a GC. I will try to find out what happens if I let this run. I post again in an hour.
Thanks. I was about to post a response after some more testing in response to your comment 11, but here are some findings:
I'm currently running a TM build on Window XP SP3 in a VirtualBox VM. Started at 66.5 MB. 24 minutes later, it hit about 226 MB, dropped to 199 MB, and then started going back up. When it again hit 226 MB PB (248 MB VM) at 28 minutes, it dropped to 115 MB, initially, and in about another minute it went down to about 47 MB. The pattern (of trending upwards) then repeats. At 42 minutes, now back up to 149 MB and counting.
I suspect the first observed drop may have been because my WinXP VM guest OS hit the allocated memory limit of 384 MB.
Since it's late, I don't have time to run this on my real hardware (4 GB RAM) to see if memory consumption will increase to the point of an OOM situation. Since it didn't crash the VM it might not be catastrophic, but real hardware may behave differently.
That said, it's still a regression, as noted.
By the way, when each of the observed drops occurred, VM size also dropped by about the same proportion.
This tops out around 260mb for me and then goes down to 113mb. The page is allocating a ton of stuff and eventually hits the GC heap size limit and we GC. This is a change in the GC heuristics but so far I don't see anything extremely troublesome. This might be more painful for mobile though.
Would it make sense to reduce he heap size limit on mobile?
Yeah, definitely. gc_frequency 300 (I think thats % of last heap size per GC) is way too high.
A question: Why is it that only Firefox experiences all that memory allocation in the first place? Memory consumption in Chrome 11 is fairly steady -- barely cycling at all. Opera 11 much steadier (assuming it is actually running the test case at all). And IE 9 RC's behavior is similar to what Firefox's was prior to the regression.
Whether you see the increase depends on how often you run garbage collection. Chrome has a fast garbage collector that it runs often (but which doesn't collect everything; they also have a slower more exhaustive one that they run more rarely). We have a much slower collector for the moment, so are less aggressive about running it. That will certainly change as we implement a faster garbage collector.
Our GC heuristics are not ideal here, but we don't think we can block on this.
(In reply to comment #21)
> That will certainly change as we implement a faster garbage collector.
Is there a bug number for this?
Please don't renominate without a strong rationale just because you disagree. We are trying to ship FF4. FF5 is in 3 month. This is clearly not great behavior here and its a high priority for post FF4.
> Is there a bug number for this?
Andreas, that doesn't look like a serious renomination to me -- just Bugzilla being its normal buggy self.
This has the same regression range as bug 631733, is it not a duplicate?
I think this is a slightly different aspect of the heuristics. I would like to keep it open and track it separately.
OK, fair enough.
Re-nom'd without an explanation - guessing this was a mistaken flag-flip. Re-minusing.
By the way, an OOM-causing memory leak still exists in Minefield and the GC has no effect.
At work, I have Minefield on Windows 7 and always leave the browser open when I go home. Three days in a row now, I have returned to find Minefield either crashed (two times) or hung and needing to be closed (today; and checking Task Manager before closing showed firefox.exe with over 1.3 GB of RAM).
As I type this, memory is slowly climbing; however, because I have many tabs open, it will take a while for me to pinpoint which content is responsible. My first suspicion is Gmail, as that is executing scripts every 5-15 seconds.
I've tried replicating this issue with a Mozmill endurance test. I'm seeing memory usage increasing and being released regularly, and a very gradual increase in the amount of memory after a GC, however it's not of a particularly alarming rate or amount.
The following report was a test that ran for 5 hours. Mapped/allocated memory was recorded at 1 minute intervals.
Firefox 4.0b13pre (2.0b13pre, en-US, 20110310030427):
(In reply to comment #32)
> I've tried replicating this issue with a Mozmill endurance test. I'm seeing
> memory usage increasing and being released regularly, and a very gradual
> increase in the amount of memory after a GC, however it's not of a particularly
> alarming rate or amount.
The allocated size flattens out halfway, saw-toothing around 100MB. The mapped size continues to increase slowly, which suggests fragmentation is increasing over time.
Dave Hunt, it would be interesting to see that graph over a longer period, eg. 24 hours, if it's not too much trouble.
As suggested I ran the Mozmill endurance test for 24 hours. The results are not directly comparable to my previous results as this test was run in a virtual machine, and on a different Minefield build.
Firefox 4.0b13pre (2.0b13pre, en-US, 20110312030414):
Thanks for doing the longer run. That graph appears to refute the fragmentation hypothesis, as both allocated and mapped memory look like they continue to increase, albeit slowly.
I revisited this with my Mozmill test on Windows XP as mentioned in the initial report, and on a build where the issue was originally noticed. I ran two tests: 5hrs and 24hrs.
Firefox 4.0b11pre (2.0b11pre, en-US, 20110130030342)
Windows NT 5.1.2600 Service Pack 3 (x86)
Created attachment 535028 [details]
test case update
This is the same test case as before with a few important changes. I dont think I have altered anything important in reproducing.
1. Countdown always starts with 7 days left. Previous test case stopped working on timestamp 1298937540 (2011-02-28).
2. Added back in the stylesheet and image with numbers so we can see it working.
3. Made the main loop get called every 10ms instead of 1000ms so the results should be visible sooner.
I then ran it for 30 minutes in minefield 2011-02-20 where I saw a definite rise in memory usage. Then I also tested 4.0.1, latest nightly and minefield 2011-03-20 where the memory usage was basically flat zigzag the entire time.
My testing was on win7 and with my version of the test case so it may be slightly different. I suggest another run of Mozmill for confirmation in a more recent build.
(In reply to comment #37)
> I then ran it for 30 minutes in minefield 2011-02-20 where I saw a definite
> rise in memory usage. Then I also tested 4.0.1, latest nightly and minefield
> 2011-03-20 where the memory usage was basically flat zigzag the entire time.
Thanks for the data. So you're saying you think this problem is fixed, but you suggest a MozMill run to double-check?
Created attachment 535354 [details] [diff] [review]
Mozmill endurance test to replicate the issue
I've attached a Mozmill endurance test that replicates this issue. Let me know if you'd like me to run the tests, and on what platforms/binaries.
To run yourself:
1. hg clone http://hg.mozilla.org/qa/mozmill-tests
2. apply attached patch
3. hg clone http://hg.mozilla.org/qa/mozmill-automation
4. modify the following in testrun.py
replace: self.test_path = os.path.join('tests','endurance')
with: self.test_path = os.path.join('tests','endurance','adhoc','testBug636077')
replace: self.timeout = delay + 60
with: self.timeout = delay + 60 + 3600
where 3600 is how long your test will be running
5. From the location you cloned mozmill-automation, run: ./testrun_endurance.py --delay=60 --iterations=60 --logfile=endurance.log --repository=../mozmill-tests --report=http://davehunt.couchone.com/mozmill /Applications/Firefox\ 4.0.app/
The above will run for 1 hour. Change the iterations count to increase how long the test runs, however you will also need to increase the timeout mentioned on step 4.
Note: Change the repository parameter to point to the location of the patched mozmill-tests repository.
Note: Change the final parameter to the location of the Firefox binary.
Results will be reported and visible at: http://mozmill.blargon7.com/#/endurance/reports
IU, xtc4uall: it's very likely that bug 656120 has fixed this problem. Can you please try a nightly build (nightly.mozilla.org) and let us know if it has been fixed? Thanks.
IU's Testcases are invalid now due to the Betanews Promo being closed.
Att. https://bugzilla.mozilla.org/attachment.cgi?id=535028 works for me against Build http://hg.mozilla.org/tracemonkey/rev/a8926981d452, Firefox 4.0.1 and Mozilla/5.0 (Windows NT 5.1; rv:7.0a1) Gecko/20110621 Firefox/7.0a1 ID:20110621030803 as expected, i.e. Memory climbs up to a constant Peak and falls again in Cycles with a sawtooth manner.
That said I'm not sure if that Testcase exposed the same Leak at all.
Didn't run the Mozmill Test though.
xtc4uall: thanks for the data. I think we can declare victory here.