Bug 636077 - when idle, memory slowly increases on betanews.com and it takes a long time until we GC
Status: RESOLVED FIXED
Whiteboard: [mozmill-endurance][MemShrink:P2]
Keywords: mlk, regression, testcase
Product: Core
Classification: Components
Component: JavaScript Engine
Version: Trunk
Platform: x86 Windows XP
Importance: -- major with 2 votes
Assigned To: Andreas Gal :gal
URL: http://promo.fileforum.com
Depends on: 656120
Blocks: mlk-fx4-beta
 
Reported: 2011-02-22 19:50 PST by IU
Modified: 2011-06-22 02:55 PDT
CC: 26 users


Attachments
test case: betanews.com web page where leak is observed (13.70 KB, application/x-zip-compressed)
2011-02-22 19:50 PST, IU
test case: actual page exposing the leak (85.08 KB, application/x-zip-compressed)
2011-02-22 22:44 PST, IU
test case: page content exposing the leak (cleaned up a bit) (31.94 KB, application/x-zip-compressed)
2011-02-23 21:50 PST, IU
test case update (35.10 KB, application/octet-stream)
2011-05-25 04:32 PDT, Hugh Nougher [:Hughman]
Mozmill endurance test to replicate the issue (154.50 KB, patch)
2011-05-26 08:51 PDT, Dave Hunt (:davehunt)

Description IU 2011-02-22 19:50:11 PST
Created attachment 514410 [details]
test case: betanews.com web page where leak is observed

Memory consumption increases indefinitely

Regression pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=94a51a3b64d4&tochange=336d5906cb0f

STR:
1. Create a new profile 
2. Extract and open the attached test case
3. Open Task Manager and note the constant memory increase
Comment 1 IU 2011-02-22 19:54:08 PST
I guess I should note regression dates:

Fine:	Jan 29, 2011 nightly build
Leaks:	Jan 30, 2011 nightly build
Comment 2 IU 2011-02-22 22:44:20 PST
Created attachment 514425 [details]
test case: actual page exposing the leak

This is the page actually exposing the link.  It was embedded as an ad, in an iframe, on the betanews.com page.

Prior to the regression, the page would cause memory to keep cycling within about a 1 MB range (going up but then going back down, and never trending upwards).  Post regression, memory keeps going up for as long as the page is loaded.
Comment 3 IU 2011-02-22 22:46:58 PST
(In reply to comment #2)
> This is the page actually exposing the link.

I meant "leak."

Unzip and launch the attachment.
Comment 4 XtC4UaLL [:xtc4uall] 2011-02-23 03:17:04 PST
Confirming the Issue on http://hg.mozilla.org/tracemonkey/rev/a8926981d452 too.
Memory Occupation is growing slowly but steadily.
Comment 5 IU 2011-02-23 21:50:24 PST
Created attachment 514730 [details]
test case: page content exposing the leak (cleaned up a bit)
Comment 6 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-02-24 13:28:17 PST
Could this be fallout from bug 624549?
Comment 7 Andreas Gal :gal 2011-02-24 13:34:02 PST
Don't think so bz. That bug changes scheduling. This is a real leak it seems.
Comment 8 Andreas Gal :gal 2011-02-24 14:41:42 PST
Testing with a linux debug build. More in a bit.
Comment 9 Andreas Gal :gal 2011-02-24 14:57:19 PST
This doesn't reproduce for me on a TM tip build. Memory use stays around 82MB, which is on the low end for a 64-bit build. We are GCing and CCing a lot. The page seems to be churning through a lot of allocations. This is probably fixed by http://hg.mozilla.org/tracemonkey/rev/482d728659f7 which is also on Mozilla central. It landed around the same time the bug was filed so I recommend re-testing. Closing as WFM. Please feel free to re-open if you can still reproduce this on a TM trunk build.
Comment 10 IU 2011-02-24 19:26:00 PST
I don't understand why this got resolved WFM.  Nothing stated in comment 9 is verifiable by my testing.  When I first noticed this bug, it was with an hourly with the referenced cset (http://hg.mozilla.org/tracemonkey/rev/482d728659f7) in it.  Assuming it may have been a regression from that tm->mc merge, I went regression hunting and found the range stated in comment 0.

With today's nightly (Mozilla/5.0 (Windows NT 5.1; rv:2.0b13pre) Gecko/20110224 Firefox/4.0b13pre / http://hg.mozilla.org/mozilla-central/rev/881f4ab63c31), I'm still reproducing this.  This is not WFM.
Comment 11 Andreas Gal :gal 2011-02-24 19:30:28 PST
Can you please try a tracemonkey tip build:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/

If that still reproduces for you, I will need precise STRs and what platform you are using. I am not claiming the bug is gone. I am just saying I can't reproduce it.
Comment 12 Andreas Gal :gal 2011-02-24 20:06:02 PST
bz can reproduce this with a TM tip build. He sees private bytes creep up. This doesn't seem to be XPCOM stuff. Memory crept up from 78MB to 165MB. Manual opening/closing of tabs causes a GC and makes memory get freed up again. The test case must be churning through memory which makes the JS heap grow. We need to know how long this grows to determine whether this is intended or not.
Comment 13 Andreas Gal :gal 2011-02-24 20:07:19 PST
bz is saying about 10 minutes for 90MB. IU, what exactly are you seeing, and does memory go up constantly or does it eventually get released again? Or does the browser eventually run into OOM?
Comment 14 Andreas Gal :gal 2011-02-24 20:38:28 PST
Alright, I have a setup that reproduces this. I can see a few CCs but no suspected pointers and nothing is collected, so this is all GC garbage. I am up to 156mb from 78mb without a GC. I will try to find out what happens if I let this run. I'll post again in an hour.
Comment 15 IU 2011-02-24 21:11:19 PST
Thanks.  I was about to post a reply after some more testing in response to your comment 11, but here are some findings:

I'm currently running a TM build on Windows XP SP3 in a VirtualBox VM.  Started at 66.5 MB.  24 minutes later, it hit about 226 MB, dropped to 199 MB, and then started going back up.  When it again hit 226 MB PB (248 MB VM) at 28 minutes, it dropped to 115 MB, initially, and in about another minute it went down to about 47 MB.  The pattern (of trending upwards) then repeats.  At 42 minutes, it's now back up to 149 MB and counting.

I suspect the first observed drop may have been because my WinXP VM guest OS hit the allocated memory limit of 384 MB.

Since it's late, I don't have time to run this on my real hardware (4 GB RAM) to see if memory consumption will increase to the point of an OOM situation.  Since it didn't crash the VM it might not be catastrophic, but real hardware may behave differently.

That said, it's still a regression, as noted.
Comment 16 IU 2011-02-24 21:26:29 PST
By the way, when each of the observed drops occurred, VM size also dropped by about the same proportion.
Comment 17 Andreas Gal :gal 2011-02-24 21:38:30 PST
This tops out around 260mb for me and then goes down to 113mb. The page is allocating a ton of stuff and eventually hits the GC heap size limit and we GC. This is a change in the GC heuristics but so far I don't see anything extremely troublesome. This might be more painful for mobile though.
Comment 18 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-02-24 21:45:31 PST
Would it make sense to reduce the heap size limit on mobile?
Comment 19 Andreas Gal :gal 2011-02-24 22:21:23 PST
Yeah, definitely. gc_frequency 300 (I think that's % of last heap size per GC) is way too high.
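
A minimal model of the heuristic being discussed, for illustration only (this is not SpiderMonkey source; the 3.0 factor is just the 300 above read as a percentage, and the 78 MB baseline is the figure from comment 12): trigger a GC only once the heap reaches a multiple of its size after the previous collection. A page that steadily churns through short-lived allocations can then roughly triple its heap before anything is reclaimed, which matches the large sawtooth reported here.

def simulate_heap(alloc_per_tick_mb, ticks, post_gc_heap_mb=78.0, trigger_factor=3.0):
    # Toy model: the page allocates garbage every tick; a GC only runs once the
    # heap reaches trigger_factor times its post-GC size, then drops back to the
    # baseline.
    heap = post_gc_heap_mb
    peak = heap
    for _ in range(ticks):
        heap += alloc_per_tick_mb
        peak = max(peak, heap)
        if heap >= post_gc_heap_mb * trigger_factor:
            heap = post_gc_heap_mb
    return peak

# A steady 0.5 MB of garbage per tick peaks around 234 MB before each GC,
# in the same ballpark as the 78 MB -> ~260 MB swing described in comment 17.
print(simulate_heap(0.5, 1000))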
Comment 20 IU 2011-02-25 06:15:35 PST
A question: Why is it that only Firefox experiences all that memory allocation in the first place?  Memory consumption in Chrome 11 is fairly steady -- barely cycling at all.  Opera 11 is much steadier (assuming it is actually running the test case at all).  And IE 9 RC's behavior is similar to what Firefox's was prior to the regression.

So, the fact that the memory consumption increase is even happening at all may still be a bug in Firefox's JavaScript engine.  Going from 1 MB bounds to about 200 MB is pretty significant, and no other browser has this issue.
Comment 21 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-02-25 09:03:42 PST
Whether you see the increase depends on how often you run garbage collection.  Chrome has a fast garbage collector that it runs often (but which doesn't collect everything; they also have a slower, more exhaustive one that they run more rarely).  We have a much slower collector for the moment, so we are less aggressive about running it.  That will certainly change as we implement a faster garbage collector.
Comment 22 Robert Sayre 2011-02-25 11:37:14 PST
Our GC heuristics are not ideal here, but we don't think we can block on this.
Comment 23 IU 2011-02-25 13:25:31 PST
(In reply to comment #21)
> That will certainly change as we implement a faster garbage collector.

Is there a bug number for this?
Comment 24 Andreas Gal :gal 2011-02-25 13:28:14 PST
Please don't renominate without a strong rationale just because you disagree. We are trying to ship FF4. FF5 is in 3 months. This is clearly not great behavior here and it's a high priority for post-FF4.
Comment 25 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-02-25 14:00:36 PST
> Is there a bug number for this?

Bug 619558.
Comment 26 Jeff Walden [:Waldo] (remove +bmo to email) 2011-02-25 14:16:57 PST
Andreas, that doesn't look like a serious renomination to me -- just Bugzilla being its normal buggy self.
Comment 27 orangezilla 2011-02-25 16:16:02 PST
This has the same regression range as bug 631733, is it not a duplicate?
Comment 28 Andreas Gal :gal 2011-02-25 16:25:53 PST
I think this is a slightly different aspect of the heuristics. I would like to keep it open and track it separately.
Comment 29 orangezilla 2011-02-26 06:09:24 PST
OK, fair enough.
Comment 30 Johnathan Nightingale [:johnath] 2011-02-28 11:37:02 PST
Re-nom'd without an explanation - guessing this was a mistaken flag-flip. Re-minusing.
Comment 31 IU 2011-03-02 08:14:09 PST
By the way, an OOM-causing memory leak still exists in Minefield and the GC has no effect.

At work, I have Minefield on Windows 7 and always leave the browser open when I go home.  Three days in a row now, I have returned to find Minefield either crashed (two times) or hung and needing to be closed (today; and checking Task Manager before closing showed firefox.exe with over 1.3 GB of RAM).

As I type this, memory is slowly climbing; however, because I have many tabs open, it will take a while for me to pinpoint which content is responsible.  My first suspicion is Gmail, as that is executing scripts every 5-15 seconds.
Comment 32 Dave Hunt (:davehunt) 2011-03-11 07:33:35 PST
I've tried replicating this issue with a Mozmill endurance test. I'm seeing memory usage increasing and being released regularly, and a very gradual increase in the amount of memory after a GC; however, it's not at a particularly alarming rate or amount.

The following report was a test that ran for 5 hours. Mapped/allocated memory was recorded at 1 minute intervals.

Firefox 4.0b13pre (2.0b13pre, en-US, 20110310030427):
http://mozmill.blargon7.com/#/endurance/report/ce92dc10dba5720bf6ef64e42c0031aa
Comment 33 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-03-12 02:53:41 PST
(In reply to comment #32)
> I've tried replicating this issue with a Mozmill endurance test. I'm seeing
> memory usage increasing and being released regularly, and a very gradual
> increase in the amount of memory after a GC; however, it's not at a
> particularly alarming rate or amount.

The allocated size flattens out halfway, saw-toothing around 100MB.  The mapped size continues to increase slowly, which suggests fragmentation is increasing over time.

Dave Hunt, it would be interesting to see that graph over a longer period, eg. 24 hours, if it's not too much trouble.
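
One way to read the mapped-vs-allocated curves in these endurance reports, sketched below with invented numbers (nothing here is taken from the linked report): if allocated memory saw-tooths around a flat level while mapped memory keeps creeping up, the growing gap between the two is space the allocator has mapped but is not using, i.e. fragmentation.

def fragmentation_ratio(mapped_mb, allocated_mb):
    # Fraction of mapped memory that is not backing live allocations.
    return (mapped_mb - allocated_mb) / mapped_mb

# Hypothetical (mapped, allocated) samples over time; a rising ratio with a
# flat allocated size is the fragmentation signature described above.
samples = [(120, 100), (130, 100), (140, 100)]
for mapped, allocated in samples:
    print(f"mapped {mapped} MB, allocated {allocated} MB -> "
          f"{fragmentation_ratio(mapped, allocated):.0%} slack")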
Comment 34 Dave Hunt (:davehunt) 2011-03-14 03:01:35 PDT
As suggested I ran the Mozmill endurance test for 24 hours. The results are not directly comparable to my previous results as this test was run in a virtual machine, and on a different Minefield build.

Firefox 4.0b13pre (2.0b13pre, en-US, 20110312030414):
http://mozmill.blargon7.com/#/endurance/report/88b31ff45aa5a2c9f30fa1462d007881
Comment 35 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-03-15 01:37:37 PDT
Thanks for doing the longer run.  That graph appears to refute the fragmentation hypothesis, as both allocated and mapped memory look like they continue to increase, albeit slowly.
Comment 36 Dave Hunt (:davehunt) 2011-03-18 18:18:29 PDT
I revisited this with my Mozmill test on Windows XP as mentioned in the initial report, and on a build where the issue was originally noticed. I ran two tests: 5hrs and 24hrs.

Firefox 4.0b11pre (2.0b11pre, en-US, 20110130030342)
Windows NT 5.1.2600 Service Pack 3 (x86)

5 hours:
http://mozmill.blargon7.com/#/endurance/report/88b31ff45aa5a2c9f30fa1462d0087dd

24 hours:
http://mozmill.blargon7.com/#/endurance/report/88b31ff45aa5a2c9f30fa1462d00c6fd
Comment 37 Hugh Nougher [:Hughman] 2011-05-25 04:32:05 PDT
Created attachment 535028 [details]
test case update

This is the same test case as before with a few important changes. I don't think I have altered anything important for reproducing it.

1. Countdown always starts with 7 days left. Previous test case stopped working on timestamp 1298937540 (2011-02-28).
2. Added back in the stylesheet and image with numbers so we can see it working.
3. Made the main loop get called every 10ms instead of 1000ms so the results should be visible sooner.

I then ran it for 30 minutes in minefield 2011-02-20, where I saw a definite rise in memory usage. Then I also tested 4.0.1, the latest nightly, and minefield 2011-03-20, where the memory usage was basically a flat zigzag the entire time.

My testing was on win7 and with my version of the test case so it may be slightly different. I suggest another run of Mozmill for confirmation in a more recent build.
Comment 38 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-05-25 04:39:58 PDT
(In reply to comment #37)
> 
> I then ran it for 30 minutes in minefield 2011-02-20 where I saw a definite
> rise in memory usage. Then I also tested 4.0.1, latest nightly and minefield
> 2011-03-20 where the memory usage was basically flat zigzag the entire time.

Thanks for the data.  So you're saying you think this problem is fixed, but you suggest a MozMill run to double-check?
Comment 39 Dave Hunt (:davehunt) 2011-05-26 08:51:46 PDT
Created attachment 535354 [details] [diff] [review]
Mozmill endurance test to replicate the issue

I've attached a Mozmill endurance test that replicates this issue. Let me know if you'd like me to run the tests, and on what platforms/binaries.

To run yourself:
1. hg clone http://hg.mozilla.org/qa/mozmill-tests
2. apply attached patch
3. hg clone http://hg.mozilla.org/qa/mozmill-automation
4. modify the following in testrun.py
replace: self.test_path = os.path.join('tests','endurance')
with: self.test_path = os.path.join('tests','endurance','adhoc','testBug636077')
replace: self.timeout = delay + 60
with: self.timeout = delay + 60 + 3600
where 3600 is how long your test will be running
5. From the location you cloned mozmill-automation, run: ./testrun_endurance.py --delay=60 --iterations=60 --logfile=endurance.log --repository=../mozmill-tests --report=http://davehunt.couchone.com/mozmill /Applications/Firefox\ 4.0.app/

The above will run for 1 hour. Change the iterations count to increase how long the test runs; however, you will also need to increase the timeout mentioned in step 4.

Note: Change the repository parameter to point to the location of the patched mozmill-tests repository.

Note: Change the final parameter to the location of the Firefox binary.

Results will be reported and visible at: http://mozmill.blargon7.com/#/endurance/reports
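
For reference, the arithmetic behind steps 4 and 5 can be wrapped in a small standalone helper (hypothetical, not part of mozmill-automation): given a desired run length and the --delay value, it returns the --iterations count to pass on the command line and the timeout to hard-code in testrun.py.

def endurance_parameters(run_seconds, delay=60):
    # One iteration runs per delay interval, so a one-hour run at --delay=60
    # needs --iterations=60; the timeout mirrors the step-4 edit
    # (delay + 60 + run length).
    iterations = max(1, run_seconds // delay)
    timeout = delay + 60 + run_seconds
    return iterations, timeout

# One-hour run, as in the example command line above:
print(endurance_parameters(3600))   # -> (60, 3720)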
Comment 40 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-06-21 20:25:03 PDT
IU, xtc4uall: it's very likely that bug 656120 has fixed this problem.  Can you please try a nightly build (nightly.mozilla.org) and let us know if it has been fixed?  Thanks.
Comment 41 XtC4UaLL [:xtc4uall] 2011-06-22 01:48:51 PDT
IU's Testcases are invalid now due to the Betanews Promo being closed.

Att. https://bugzilla.mozilla.org/attachment.cgi?id=535028 works for me against Build http://hg.mozilla.org/tracemonkey/rev/a8926981d452, Firefox 4.0.1 and Mozilla/5.0 (Windows NT 5.1; rv:7.0a1) Gecko/20110621 Firefox/7.0a1 ID:20110621030803 as expected, i.e. Memory climbs up to a constant Peak and falls again in Cycles in a sawtooth manner.
That said I'm not sure if that Testcase exposed the same Leak at all.

Didn't run the Mozmill Test though.
Comment 42 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-06-22 02:55:59 PDT
xtc4uall: thanks for the data.  I think we can declare victory here.

Note You need to log in before you can comment on or make changes to this bug.