Closed Bug 728871

High memory usage spike detected in Mozmill Endurance tests on Feb 19 caused by opening new tabs

Categories

(Mozilla QA Graveyard :: Mozmill Tests, defect, major)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mihaelav, Unassigned)

References

Details

(Whiteboard: [mozmill-endurance][MemShrink])

Dave mentioned on IRC that the two tests that open new tabs are the cause of the increase.
Summary: High memory usage spike detected in Mozmill Endurance tests on Feb 19 → High memory usage spike detected in Mozmill Endurance tests on Feb 19 caused by opening new tabs
OS: Windows 7 → All
Hardware: x86 → All
The big tabbed browsing change that landed on the firefox-default branch is about:newtab, and I need to see whether it is causing the leaks.

For sure, about:newtab is loaded lazily (I fixed a test a few days ago that was failing because of this).

I will run those tests locally with 'about:blank' as the new tab page and compare the results.
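For reference, switching new tabs back to about:blank for a local run presumably comes down to a couple of prefs. This is only a sketch: the exact prefs used for the comparison run aren't recorded here, and the names below (browser.newtab.url, browser.newtabpage.enabled) are assumptions based on the Firefox 13/14 era.

// user.js sketch (assumed 2012-era pref names): load about:blank instead of
// about:newtab in new tabs for the comparison run.
user_pref("browser.newtab.url", "about:blank");    // URL opened in new tabs (assumed pref)
user_pref("browser.newtabpage.enabled", false);    // disable the new tab page grid (assumed pref)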
Let's wait and see tomorrow's results. Locally the memory usage is very low

http://mozmill-release.blargon7.com/#/endurance/report/55d601cc2aabfac28f59060a84add696

so I cannot compare anything at the moment.
Feb 20 results are still high (except on Ubuntu 10.10 x86, which dropped by about 30 MB).
CC-ing Tim to get an opinion on this. I tried to reproduce the leak locally on my machine, but it seems the bad results only show up on the dashboard. Could the use of VMs affect our metrics?

Please see the link in comment 3.
Whiteboard: [mozmill-endurance]
Vlad: Can you post your own results to the mozmill-crowd dashboard with the same iterations/entities (50/10) for builds from 20120218031156 and 20120219031223, and link to them from here?
Vlad, any update? We shouldn't forget about issues like this for nearly a whole week. Thanks.
The values have increased by almost 10% (and have stayed constant) since March 1 on:
- Linux Ubuntu 10.10 x86_64: 446->490 MB (max explicit memory), 340->360 MB (max resident memory)
- Win NT 5.1.2600 x86: 280->315 MB (max explicit memory), 449->481 MB (max resident memory)
- Mac OS X 10.6.8 x86_64: 650->710 MB (max explicit memory), 840->900 MB (max resident memory)
Mihaela: The new increase should be raised as a separate bug.

Vlad: Can you please provide your latest findings? This should be considered a high priority given the impact on resources that has been identified.
Severity: normal → major
(In reply to Dave Hunt (:davehunt) from comment #6)
> Vlad: Can you post your own results to the mozmill-crowd dashboard with the
> same iterations/entities (50/10) for builds from 20120218031156 and
> 20120219031223, and link to them from here?

Here are the results for the latest nightly build from http://hg.mozilla.org/mozilla-central/rev/433cfbd2a0da

http://mozmill-crowd.blargon7.com/#/endurance/report/e438d6e3916b2b636037d7744539861e 

I can see a slight memory increase in the tabbed browsing tests. I'll test further with the 20120218031156 build.
This is the pushlog we want for the regression range, but it contains no tabbed browsing changes whatsoever:

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=550779e6bab4%20&tochange=4d47329bb02e
I've pref'ed out 'about:newtab' (http://mozmill-crowd.blargon7.com/#/endurance/report/e438d6e3916b2b636037d7744539e2fe) and got better memory usage results for the 20120219031223 build
(http://hg.mozilla.org/mozilla-central/rev/4d47329bb02e).
Bingo! 

Memory leak reproduced on Mac:
http://mozmill-crowd.blargon7.com/#/endurance/report/e438d6e3916b2b636037d774453a0f07

We'll start the investigation from the top on Mac OS X tomorrow.
Great work! Let's find out what change caused this and raise a bug as soon as possible. Thanks guys!
Pref'ed out about:newtab on Mac; here are the results. I'm running it again:

http://mozmill-crowd.blargon7.com/#/endurance/report/e438d6e3916b2b636037d774458f39fb

Also, I've bumped into this - bug 723832. Could this be related?
I'm seeing that the memory regression is no longer appearing in our charts. Memory usage has dropped considerably since March 9th. Can you confirm this on your local copy, Vlad?
Indeed there are improvements 

http://mozmill-release.blargon7.com/#/endurance/report/e438d6e3916b2b636037d77445774254 

The build from March 9th is the last one with problems. I will run locally with the latest nightly and compare.
Confirmed better results locally with the latest nightly,
build http://hg.mozilla.org/mozilla-central/rev/5ec9524de1af 

Still, let's not close this bug just yet
Vlad, can you reproduce this locally (without running Endurance tests)?
I was able to reproduce the original memory leak and bisected. The changeset causing the leak is: http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=20478b673212

I will expand the bisect and continue... I can repeat this for the builds where the memory leak seemed to disappear.
(In reply to Dave Hunt (:davehunt) from comment #23)
> I was able to reproduce the original memory leak and bisected. The changeset
> causing the leak is:
> http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=20478b673212
> 
> I will expand the bisect and continue... I can repeat this for the builds
> where the memory leak seemed to disappear.

Thanks Dave. Can you file a Firefox bug and continue investigation there?
It would be very helpful to know all the changes to tabbed browsing since the 19th.
We know the major ones, but we need to know every new code snippet there.

Could someone from the dev team give a hand here? Just point me to the changes and I can do a try build locally to track down the cause.
Really, I don't think anything related to tabbed browsing has affected this. I imagine that the incremental GC changes on bug 641025 have caused it. The simplest task you could perform now is to disable iGC and run the tests again.
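For anyone reproducing this locally, disabling iGC should just be a pref flip. A sketch, assuming the javascript.options.mem.gc_incremental pref name from that era (double-check it in about:config):

// user.js sketch: turn off incremental GC for a comparison endurance run
// (pref name assumed from the 2012-era JS engine options).
user_pref("javascript.options.mem.gc_incremental", false);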
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #22)
> Vlad, can you reproduce this locally (without running Endurance tests)?

I don't really know how you can reproduce this without running the endurance tests; we need the API to give us memory information.

Are you suggesting the following? 

1. replace enduranceManager.loop with a JS loop statement like 'for' or 'forEach'
2. run the test 
3. gather memory information from the OS (terminal)

I think you want to explicitly rule out the idea of our endurance run somehow causing bad data, right?
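Concretely, I had something like this rough sketch in mind. The tab-opening helpers are placeholders for whatever the test module actually provides, and the 'explicit'/'resident' attributes are assumed to match the nsIMemoryReporterManager interface the endurance harness already queries:

// Rough sketch: drive the scenario with a plain JS loop (no enduranceManager)
// and log memory counters each iteration so they can be compared with the
// dashboard numbers. Assumes a chrome-privileged Mozmill scope.
const ITERATIONS = 50;
let memMgr = Cc["@mozilla.org/memory-reporter-manager;1"]
               .getService(Ci.nsIMemoryReporterManager);

for (let i = 0; i < ITERATIONS; i++) {
  openNewTab();  // placeholder for the test's tab-open step
  dump("iteration " + (i + 1) +
       ": explicit=" + memMgr.explicit +
       " resident=" + memMgr.resident + "\n");
}
closeAllTabs();  // placeholder cleanup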
(In reply to Henrik Skupin (:whimboo) from comment #26)
> Really, I don't think anything related to tabbed browsing has affected this.
> I imagine that the incremental GC changes on bug 641025 have caused it.
> The simplest task you could perform now is to disable iGC and run the tests
> again.

Henrik, thanks for the tip. On it!
Pref'ed out incremental GC for the latest nightly.
Good memory results:

http://mozmill-crowd.blargon7.com/#/endurance/report/e438d6e3916b2b636037d77445d1435b

Let's run it multiple times and draw some initial conclusions.
(In reply to Maniac Vlad Florin (:vladmaniac) from comment #29)
> Pref'ed out incremental GC for the latest nightly.
> Good memory results:
> 
> http://mozmill-crowd.blargon7.com/#/endurance/report/
> e438d6e3916b2b636037d77445d1435b
> 
> Let's run it multiple times and draw some initial conclusions.

Given this week's platform meeting notes, it's known that iGC causes memory leaks. That's why there are thoughts about turning it off by default again. CC'ing all relevant people to spread this information about the leak.
This was caused by bug 641025 (the landing of incremental GC), and should have been fixed by bug 730853 (which landed on 3/9). A similar regression was detected by areweslimyet (bug 731437). It would be nice to know what changeset caused it to go up again on the 14th.
Depends on: 730853
The basic problem from bug 730853 that we saw on areweslimyet is that the test opens pages so quickly that the cycle collector never ends up running, or something like that. Partly this is because the test is a little unrealistic, but it did expose a genuine problem in the code, so in bug 730853 Bill fixed things up so the CC isn't so easily overcome.
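For local runs, one way to check whether the growth is just deferred collection rather than a real leak is to force a GC and a cycle collection between iterations. A sketch only, assuming a chrome-privileged scope (as in the Mozmill tests) and the 2012-era Components.utils.forceGC() / nsIDOMWindowUtils.cycleCollect() entry points:

// Sketch: force a full GC plus a cycle collection between test iterations.
// API names are assumed from the interfaces of that era and may have changed.
function forceCollections(aWindow) {
  let utils = aWindow.QueryInterface(Ci.nsIInterfaceRequestor)
                     .getInterface(Ci.nsIDOMWindowUtils);
  Components.utils.forceGC();  // collect the JS heap
  utils.cycleCollect();        // run the cycle collector
}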
Also, it would be nice if you CC'd me or some other MemShrink person, and put the [MemShrink] tag on any regressions like this that you see in the future, as they happen. Even if it just turns out to be a problem with the test harness, I'd like to be informed of potential problems like this less than three weeks after they are found. Thanks!
Whiteboard: [mozmill-endurance] → [mozmill-endurance][MemShrink]
At this point, the most important thing is to figure out what happened on 3/14 that caused the problem to re-appear. Is it possible to get a regression interval?
Vlad, can you please give us a pushlog range for the re-appearance of the problem on 3/14? At least this would already be helpful to identify possible candidates.
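For reference, the pushlog can be filtered by date, so (assuming the daily builds line up with these dates) something like http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2012-03-13&enddate=2012-03-15 should cover the range in question; please double-check it against the exact build IDs from the reports.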
Ran the endurance tests with the latest nightly and I see no memory leaks now:

http://mozmill-crowd.blargon7.com/#/endurance/reports?branch=14.0&platform=Mac&from=2012-03-16&to=2012-03-16 

We might be facing something intermittent here
I can confirm through a bisect that the changeset http://hg.mozilla.org/mozilla-central/rev/2a8ceeb27f7c caused this spike in the endurance tests.

With bug 730853 landing on 3/9 our results immediately improved.

I am unable to see the issue recurring on our daily endurance tests. It does appear that there has been an increase in memory usage on builds since 3/13 but nowhere near the scale that the incremental GC had.

Vlad: You linked to a report of a Firefox 13 build from 3/14; however, our daily runs have this as a Firefox 14 build and do not show the memory leak. Can you confirm that you can consistently replicate this?
(In reply to Dave Hunt (:davehunt) from comment #37)
> I can confirm through a bisect that the changeset
> http://hg.mozilla.org/mozilla-central/rev/2a8ceeb27f7c caused this spike in
> the endurance tests.
> 
> With bug 730853 landing on 3/9 our results immediately improved.
> 
> I am unable to see the issue recurring on our daily endurance tests. It does
> appear that there has been an increase in memory usage on builds since 3/13
> but nowhere near the scale that the incremental GC had.
> 
> Vlad: You linked to a report of a Firefox 13 build from 3/14; however, our
> daily runs have this as a Firefox 14 build and do not show the memory leak.
> Can you confirm that you can consistently replicate this?

This is not reproducible consistently. As I said, we might be facing something intermittent.
I don't think this is intermittent. Looking closer at your recurrence, it appears to be built from http://hg.mozilla.org/mozilla-central/rev/4d47329bb02e, which is after incremental GC was introduced and before the fix. This is strange, given the build ID of 20120314124254...
(In reply to Dave Hunt (:davehunt) from comment #37)
> I can confirm through a bisect that the changeset
> http://hg.mozilla.org/mozilla-central/rev/2a8ceeb27f7c caused this spike in
> the endurance tests.
> 
> With bug 730853 landing on 3/9 our results immediately improved.
> 
> I am unable to see the issue recurring on our daily endurance tests. It does
> appear that there has been an increase in memory usage on builds since 3/13
> but nowhere near the scale that the incremental GC had.

Can we mark this bug as fixed?  Seems like the main issue has been addressed.  Maybe we should create a follow-up for the minor increase after 3/13?
Yes, the main issue here is resolved.
Status: NEW → RESOLVED
Resolution: --- → FIXED
Product: Mozilla QA → Mozilla QA Graveyard