Closed Bug 887791 Opened 11 years ago Closed 11 years ago

Higher memory consumption on Mac 10.7.5 and 10.8.4 since June the 22nd

Categories

(Core :: Graphics: Layers, defect)

x86
macOS
defect
Not set
normal

Tracking


VERIFIED FIXED
mozilla28
Tracking Status
firefox25 --- wontfix
firefox26 --- wontfix
firefox27 + verified
firefox28 --- verified

People

(Reporter: mihaelav, Assigned: mattwoodrow)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(7 files, 5 obsolete files)

294.66 KB, image/jpeg
Details
4.58 KB, patch
roc
: review+
Details | Diff | Splinter Review
2.58 KB, patch
roc
: review+
Details | Diff | Splinter Review
10.61 KB, patch
roc
: review+
Details | Diff | Splinter Review
4.83 KB, patch
nical
: review+
Details | Diff | Splinter Review
7.77 KB, patch
nical
: review+
Details | Diff | Splinter Review
3.76 KB, patch
nical
: review+
Details | Diff | Splinter Review
Mac OS X (10.7.5 and 10.8.4) results show higher memory consumption on the Nightly branch since June the 22nd: ~20MB for resident memory and ~10MB (Mac 10.8.4) and ~15MB (Mac 10.7.5) for explicit memory. The regression appeared between builds 20130621031051 (http://hg.mozilla.org/mozilla-central/rev/7ba8c86f1a56) and 20130622031042 (http://hg.mozilla.org/mozilla-central/rev/cea75ce9a559).

The tests that regressed most are Flash related: 
/testFlash_SWFVideoEmbed/test1.js - testFlashViaEmbedTag 	
/testFlash_SWFVideoIframe/test1.js - testFlashIframe 	
/testFlash_SWFVideoObject/test1.js  - testFlashObject 	
/testFlash_SWFVideoURL/test1.js - testFlashURL 	

Reports:
* Mac 10.7.5 21June: http://mozmill-ci.blargon7.com/#/endurance/report/91511d5536e962e0de7f57ed1e214e4c
* Mac 10.8.4 21June: http://mozmill-ci.blargon7.com/#/endurance/report/91511d5536e962e0de7f57ed1e205e8c
* Mac 10.7.5 22June: http://mozmill-ci.blargon7.com/#/endurance/report/91511d5536e962e0de7f57ed1e3fbfe7
* Mac 10.8.4 22June: http://mozmill-ci.blargon7.com/#/endurance/report/91511d5536e962e0de7f57ed1e3fa4e5

Charts:
http://mozmill-ci.blargon7.com/#/endurance/charts?branch=24.0&platform=Mac&from=2013-06-20&to=2013-06-27

Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=7ba8c86f1a56&tochange=cea75ce9a559

Note: The regression is reproducible locally (tested on Mac 10.7.5). Reports:
* June 21st build: http://mozmill-crowd.blargon7.com/#/endurance/report/91511d5536e962e0de7f57ed1ee66826
* June 22nd build: http://mozmill-crowd.blargon7.com/#/endurance/report/91511d5536e962e0de7f57ed1ed38cc2
Thanks Mihaela, this looks like a genuine regression! Please raise a bug against Firefox, and set this as a blocking bug. Then, try to reduce the regression range by testing against tinderbox builds. You can find details of how to do this here: https://quality.mozilla.org/docs/mozmill/investigating-memory-regressions/
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=7ba8c86f1a56&tochange=cea75ce9a559
The following bugs might be good candidates for introducing this regression:
* Bug 883333 seems to be a rather large refactor
* Bug 885158 is related to IPC channels
OS: Windows 8 → Mac OS X
That said, the pushlog contains well over 100 changes, so it'd be advisable to reduce the regression range further, as Dave has suggested.
The first bad revision is:
changeset:   135929:03fcdac57879
parent:      135928:03a709825eff
parent:      135870:7ba8c86f1a56
user:        Ryan VanderMeulen <ryanvm@gmail.com>
date:        Thu Jun 20 20:12:31 2013 -0400
summary:     Merge m-c to inbound.

I'm currently bisecting and building locally in order to narrow down the regression
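For reference, the bisect loop looks roughly as follows (a sketch only: the endpoints are the last good and first bad nightly changesets from comment 0, and the build plus endurance testrun between markings are done by hand):

hg bisect --reset
hg bisect --good 7ba8c86f1a56
hg bisect --bad cea75ce9a559
# hg updates to a changeset in the middle of the range; build it, run the
# endurance testrun against the resulting binary, then record the verdict:
hg bisect --good    # or: hg bisect --bad
# repeat build/test/mark until hg prints "The first bad revision is: ..."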
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #3)
> That said the pushlog contains well over 100 changes so it'd be advisable to
> reduce the regression range further as Dave has suggested.

Mihaela is on PTO, and for now I took responsibility for endurance triage and investigations.

I reduced those 100 changesets to about 10 between the last good and first bad builds, and I hope that by tomorrow I'll find the changeset causing this issue.

Sorry I'm not able to bisect faster, but an endurance testrun takes about 3 hours.
Can we narrow that down to a specific test or is the memory constantly growing through the whole testrun? If only some tests can cause this behavior you might want to trim the endurance testrun.
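A trimmed run could be as simple as invoking mozmill on each of the four Flash tests from comment 0 individually (a sketch only; --report can be dropped for a purely local run):

for t in testFlash_SWFVideoEmbed testFlash_SWFVideoIframe \
         testFlash_SWFVideoObject testFlash_SWFVideoURL; do
  mozmill -t mozmill-tests/tests/endurance/$t/test1.js \
          -b /Applications/FirefoxNightly.app/ \
          --report=http://mozmill-crowd.blargon7.com/db/
done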
Thanks Henrik. I created a new testrun with the 4 tests mentioned in Comment 0 that are causing this behavior.

Here is the changeset causing this issue:

author	Ehsan Akhgari <ehsan@mozilla.com>
Sun Jun 23 20:58:53 2013 -0400 (at Sun Jun 23 20:58:53 2013 -0400)
changeset 136213	76820c6dff7b
parent 136212	9b5816c7fa6e
child 136214	9661a68f0bff
pushlog:	76820c6dff7b

First bad:
http://mozmill-crowd.blargon7.com/#/endurance/report/180cf2548ef2865af3ae441d61be25f6
I don't see how http://hg.mozilla.org/mozilla-central/rev/76820c6dff7b could be the cause for this issue. Can you please check that again with a full endurance testrun?
Assignee: nobody → mihai.morar
Checked as requested in Comment 8:

Last Good:
author	Ryan VanderMeulen <ryanvm@gmail.com>
Fri Jun 14 22:23:21 2013 -0400 (at Fri Jun 14 22:23:21 2013 -0400)
changeset 135149	3d16d59c9317
parent 135148	05d9196b27a1
child 135150	7f8de495bf6d
pushlog:	3d16d59c9317

http://mozmill-crowd.blargon7.com/#/endurance/report/180cf2548ef2865af3ae441d61cb67d8

First bad:
author	Ehsan Akhgari <ehsan@mozilla.com>
Sun Jun 23 20:58:53 2013 -0400 (at Sun Jun 23 20:58:53 2013 -0400)
changeset 136213	76820c6dff7b
parent 136212	9b5816c7fa6e
child 136214	9661a68f0bff
pushlog:	76820c6dff7b

http://mozmill-crowd.blargon7.com/#/endurance/report/180cf2548ef2865af3ae441d61d40bc6
Ehsan, can you think of anything that could be causing this memory increase for our Flash tests? Could this really be the landing of one of the patches on bug 579517?
Flags: needinfo?(ehsan)
These changesets are eight days apart, and there are many changes between them: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3d16d59c9317&tochange=76820c6dff7b

We already established a much smaller regression range in comment 2, so please try to narrow this down further.
After discussion on IRC we established the regression showed up between these tinderbox builds:

last good: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64/1372029844/

first bad: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64/1372053243/

This makes the relevant pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=aef71cf41cec&tochange=76820c6dff7b

Please continue by bisecting and building Firefox. I'm cancelling the needinfo for now.
Flags: needinfo?(ehsan)
But even that doesn't match what we got in comment 2, and it has a time drift of nearly 2 days. So I doubt this is correct.
(In reply to Henrik Skupin (:whimboo) from comment #13)
> But even that doesn't match what we got in comment 2, and has a time drift
> of nearly 2 days. So I doubt, this is correct.

Good point, I missed that.
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=aef71cf41cec&tochange=76820c6dff7b

All changesets were good in the latest investigations I've done.

Here are the results:

http://mozmill-crowd.blargon7.com/#/endurance/reports?branch=24.0&platform=Mac&from=2013-07-26&to=2013-07-26

I'll continue next week by bisecting the pushlog from Comment 2. Dave, would this be a better approach?
(In reply to Mihai Morar, QA (:MihaiMorar) from comment #15)
> I'll continue next week to bisect following pushlog from Comment 2. Dave
> could this be a better solution?

Yes, in fact you should have done this instead. As Henrik pointed out in comment 13 there's doubt that the regression range you had provided was accurate as it disagreed with the initial nightly regression range. Please continue by bisecting the range from comment 2.
Just to rule it out: was anything done to the mozmill test farm that may have caused this?
(In reply to Tracy Walker [:tracy] from comment #17)
> Just to to rule it out; was anything done to the mozmill test farm that may
> have caused this?

None that I am aware of. Mihai: have you made any further progress here? It's important to move quickly on potential memory regressions. This was reported almost two months ago now.
Flags: needinfo?(mihai.morar)
Benjamin, this is the bug I mentioned in the stability meeting.
Mihai's on PTO now, so I investigated this further. I tested on Mac OSX 10.7.5 with the whole endurance suite.

When running the endurance suite with a Firefox 26 nightly, I reproduce this bug. When running it with the 06/22 Firefox 24 nightly, or with any build produced by bisecting the comment 2 pushlog, it doesn't reproduce.


Here are the results of all my test runs:
http://mozmill-crowd.blargon7.com/#/endurance/reports?branch=All&platform=All&from=2013-08-12&to=2013-08-20

Any ideas of what's happening here?
Flags: needinfo?(mihai.morar)
Please can you provide the results of your bisect? It was confirmed previously that the regression range is 7ba8c86f1a56 to cea75ce9a559, therefore it should just be a case of performing a bisect/build to determine which commit introduced the regression. It would be worth also building the above mentioned revisions to confirm that a local build can replicate the regression.
Flags: needinfo?(ioana.budnar)
(In reply to Dave Hunt (:davehunt) from comment #21)
> Please can you provide the results of your bisect? It was confirmed
> previously that the regression range is 7ba8c86f1a56 to cea75ce9a559,
> therefore it should just be a case of performing a bisect/build to determine
> which commit introduced the regression. It would be worth also building the
> above mentioned revisions to confirm that a local build can replicate the
> regression.

The results of my testing were posted in my above comment, but I will detail them here if that's what you need.

I ran the endurance test suite on a Firefox 26 build (the latest nightly at that point) to make sure I could reproduce the bug. It reproduced: http://mozmill-crowd.blargon7.com/#/endurance/report/b7ef1fb3d9703aeaf2c46e07d2b18652

Bisected the pushlog in comment 2, built Firefox, and tested the following commits. None reproduced this bug:
2b0640fa5b18 - http://mozmill-crowd.blargon7.com/#/endurance/report/b7ef1fb3d9703aeaf2c46e07d27b97b6
918fd1e35b39 - http://mozmill-crowd.blargon7.com/#/endurance/report/b7ef1fb3d9703aeaf2c46e07d288a733
a0b216d3be12 - http://mozmill-crowd.blargon7.com/#/endurance/report/b7ef1fb3d9703aeaf2c46e07d2abfea1
8ce7857353a6 - http://mozmill-crowd.blargon7.com/#/endurance/report/b7ef1fb3d9703aeaf2c46e07d2bb9495
0c5c910798b7 - http://mozmill-crowd.blargon7.com/#/endurance/report/b7ef1fb3d9703aeaf2c46e07d2c1780a
43c6fe6a9301 - http://mozmill-crowd.blargon7.com/#/endurance/report/b7ef1fb3d9703aeaf2c46e07d2f81226
25fba7984dc9 - http://mozmill-crowd.blargon7.com/#/endurance/report/b7ef1fb3d9703aeaf2c46e07d2fe1bb0
19a51caf7582 - http://mozmill-crowd.blargon7.com/#/endurance/report/e503b7b0a70a3839c66c64572e0adec9
c082decf4957 - http://mozmill-crowd.blargon7.com/#/endurance/report/e503b7b0a70a3839c66c64572e242221
808302fb793e - http://mozmill-crowd.blargon7.com/#/endurance/report/e503b7b0a70a3839c66c64572e6b6ffe
deb4d87d1684 - http://mozmill-crowd.blargon7.com/#/endurance/report/e503b7b0a70a3839c66c64572e6eee22

cea75ce9a559 - http://mozmill-crowd.blargon7.com/#/endurance/report/e503b7b0a70a3839c66c64572e7b7b65
Flags: needinfo?(ioana.budnar)
Thanks for the detail, Ioana; this is indeed much more useful. This implies either:

1. Your local build differs from the nightlies produced by release engineering
2. The original regression range is incorrect
3. You are unable to reproduce the original issue and are seeing a different regression in latest nightlies

You should be able to find out which (if any) of these are true. For item 1 you can directly compare a locally built Firefox with a nightly built from the same changeset. For items 2 and 3 you should be able to determine a regression range based on the regression you are able to reproduce.

Also, looking at your bisect, nothing before changeset 2b0640fa5b18 was tested. You probably need to at least compare 7ba8c86f1a56 with cea75ce9a559 to see if there is a memory regression (a sketch follows below); otherwise, how are you determining a good/bad build?
Flags: needinfo?(ioana.budnar)
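A minimal local check of the two boundary changesets mentioned above could look like the following (a sketch only; it assumes an existing mozilla-central clone with a working mozconfig and uses the standard client.mk build entry point):

hg update -r 7ba8c86f1a56 && make -f client.mk build    # expected good
# run the endurance testrun against this build and note the numbers
hg update -r cea75ce9a559 && make -f client.mk build    # expected bad
# run the same testrun again and compare against your own first result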
(In reply to Dave Hunt (:davehunt) from comment #23)
> Thanks for the detail Ioana, this is indeed much more useful. This implies
> either:
> 
> 1. Your local build differs from the nightlies produced by release
> engineering
> 2. The original regression range in incorrect
> 3. You are unable to reproduce the original issue and are seeing a different
> regression in latest nightlies
> 
> You should be able to find out which (if any) of these are true. For item 1
> you can directly compare a locally built Firefox with a nightly built from
> the same changeset. For items 2 and 3 you should be able to determine a
> regression range based on the regression you are able to reproduce.
> 
> Also, looking at your bisect, nothing before changeset 2b0640fa5b18 was
> tested. You probably need to at least compare 7ba8c86f1a56 with cea75ce9a559
> to see if there is a memory regression, otherwise how are you determining a
> good/bad build?

I already ran the test suite on the cea75ce9a559 Nightly, and I get the same results as for the one built locally.

I determined good/bad by memory consumption. On Fx 26, I got about the same as comment 0 specifies for the buggy build. On the other builds, I got about the same as was specified for the build without the bug ("about" means a difference of at most 2-3 MB).

I am currently working on determining if I'm in case 2 or 3 and will post the results here as soon as I have them.
Flags: needinfo?(ioana.budnar)
Unfortunately you can't compare your results directly with results gathered on another machine, as they will be dependent on many things (not least the amount of memory present/available). The best way to determine if you can replicate the regression is to compare against your own results.
Any updates on this? It's been over a week. We really need to react faster to memory regressions.
Flags: needinfo?(ioana.budnar)
We are tracking endurance results here:

https://wiki.mozilla.org/Endurace_WorkLog#Endurance_Results_Triage_Log

As the charts show, memory usage on Mac OS has decreased since 2013-08-29 and is still decreasing.
Thanks Mihai, but that does not get us any closer to understanding the regression this bug is about. The recent memory improvements could be unrelated.
I have replicated the regression using the two original nightly builds it was reported for, with a single iteration. The regression was much smaller, but we could try running with more entities, or fewer tests with more iterations, to see if we can make it more pronounced for confident regression hunting (a sketch of such a run follows below). My results are summarised below:

Build: 20130621031051
Avg resident: 75MB
Avg explicit: 139MB
http://mozmill-crowd.blargon7.com/#/endurance/report/3c3cd991de01b704f3f65783a1d95440

Build: 20130622031042
Avg resident: 76MB
Avg explicit: 142MB
http://mozmill-crowd.blargon7.com/#/endurance/report/3c3cd991de01b704f3f65783a1d9cbc3

Looking at these results, the flash tests are the ones that appear to have increased the most.
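A run with more iterations could use the mozmill-automation endurance driver (a sketch; the counts are only a suggestion, and the application path is whichever build is under test):

~/mozmill-automation/testrun_endurance.py --iterations=5 --entities=10 \
    --report=http://mozmill-crowd.blargon7.com/db /path/to/Nightly.app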
(In reply to Dave Hunt (:davehunt) from comment #25)
> Unfortunately you can't compare your results directly with results gathered
> on another machine, as they will be dependent on many things (not least the
> amount of memory present/available). The best way to determine if you can
> replicate the regression is to compare against your own results.

I got the same results for the 06/21 and the 06/22 builds, so I couldn't replicate the regression. What I found was indeed another bug. After investigating further, I got:

Last good: http://hg.mozilla.org/mozilla-central/rev/a81349b61b4e
First bad: http://hg.mozilla.org/mozilla-central/rev/d4f1b39a8575

testFlashIframe used 17 MB more resident memory, and testFlashObject 8 MB more, with the first bad build. The resident memory differences for the other tests are around 1-3 MB. All tests used more explicit memory than with the last good build:
http://mozmill-crowd.blargon7.com/#/endurance/report/3c3cd991de01b704f3f65783a1d848b5
http://mozmill-crowd.blargon7.com/#/endurance/report/3c3cd991de01b704f3f65783a1e92b6d
Flags: needinfo?(ioana.budnar)
(In reply to Ioana Budnar, QA [:ioana] from comment #30)
> Last good: http://hg.mozilla.org/mozilla-central/rev/a81349b61b4e
> First bad: http://hg.mozilla.org/mozilla-central/rev/d4f1b39a8575
> 
> testFlashIframe used 17mb more resident memory, and testFlashObject 8mb
> more, with the first bad build. The differences for resident memory for the
> other tests go around 1-3mb. All tests used more explicit memory than with

This might be unrelated to this bug and also an expected change given that the memory reporters have been changed for explicit memory. Nicholas might know more about it.

Otherwise, I would really appreciate it if we could get the real issue sorted out here. It has taken way too long so far.
(In reply to Ioana Budnar, QA [:ioana] from comment #30)
> testFlashIframe used 17mb more resident memory, and testFlashObject 8mb
> more, with the first bad build. The differences for resident memory for the
> other tests go around 1-3mb. All tests used more explicit memory than with
> the last good build:
> http://mozmill-crowd.blargon7.com/#/endurance/report/
> 3c3cd991de01b704f3f65783a1d848b5
> http://mozmill-crowd.blargon7.com/#/endurance/report/
> 3c3cd991de01b704f3f65783a1e92b6d

This is a different regression and should therefore be investigated in a separate bug. FWIW it's also likely to be the regression shown in our dashboard at: http://mozmill-daily.blargon7.com/#/endurance/charts?branch=25.0&platform=Mac&from=2013-07-29&to=2013-08-01

Mihai: Have you made any further progress here?
Flags: needinfo?(mihai.morar)
(In reply to Dave Hunt (:davehunt) from comment #32)
> This is a different regression and should therefore be investigated in a
> separate bug. FWIW it's also likely to be the regression shown in our
> dashboard at:
> http://mozmill-daily.blargon7.com/#/endurance/charts?branch=25.
> 0&platform=Mac&from=2013-07-29&to=2013-08-01

Now tracked in bug 912462.
I have been trying for 2 days to build Firefox but always get the same errors:

1)
 5:28.01   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/sysconfig.py", line 407, in _init_posix
 5:28.01     raise DistutilsPlatformError(my_msg)
 5:28.01 distutils.errors.DistutilsPlatformError: $MACOSX_DEPLOYMENT_TARGET mismatch: now "10.6" but "10.7" during configure
 5:28.02 make[7]: *** [libs-xpcshell-tests] Error 1
 5:28.02 make[6]: *** [libs] Error 2
 5:28.02 make[5]: *** [libs] Error 2
 5:28.02 make[4]: *** [libs_tier_platform] Error 2
 5:28.02 make[3]: *** [tier_platform] Error 2
 5:28.02 make[2]: *** [default] Error 2
 5:28.04 make[1]: *** [realbuild] Error 2
 5:28.04 make: *** [build] Error 2
 5:28.08 34 compiler warnings present.
(mozmill-env)sl77-213:mozilla-central svuser$ 

See: https://bugzilla.mozilla.org/show_bug.cgi?id=659881 (a possible workaround is sketched after this comment)

2)
(mozmill-env)sl77-213:mozilla-central-d8cce0fb9d9a svuser$ ~/mozmill-automation/testrun_endurance.py  --iterations=1 --entities=10 --report=http://mozmill-crowd.blargon7.com/db obj-x86_64-apple-darwin11.4.2/dist/Nightly.app
*** Platform: Mac OS X 10.7.5 64bit
*** Cloning repository to '/var/folders/ms/_pqw2rz91yn9lqh4yyj1xg7m0000gn/T/tmp0DS3qD.mozmill-tests'
requesting all changes
adding changesets
adding manifests
adding file changes
added 3386 changesets with 9970 changes to 780 files (+4 heads)
updating to branch default
412 files updated, 0 files merged, 0 files removed, 0 files unresolved
*** Application: Firefox 24.0a1
No option 'SourceRepository' in section: 'App'
*** Removing repository '/var/folders/ms/_pqw2rz91yn9lqh4yyj1xg7m0000gn/T/tmp0DS3qD.mozmill-tests'
Traceback (most recent call last):
  File "/Users/svuser/mozmill-automation/testrun_endurance.py", line 52, in <module>
    main()
  File "/Users/svuser/mozmill-automation/testrun_endurance.py", line 47, in main
    EnduranceTestRun().run()
  File "/Users/svuser/mozmill-automation/libs/testrun.py", line 439, in run
    raise self.last_exception
ConfigParser.NoOptionError: No option 'SourceRepository' in section: 'App'

I have reinstalled the entire OS and the required environments and tools, and followed all the steps from https://developer.mozilla.org/en-US/docs/Simple_Firefox_build, but I am not able to build the requested changesets. Something must have happened to the machine since I got back from PTO, because I have not been able to build any changesets since then.
Flags: needinfo?(mihai.morar)
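Regarding error 1 above: one possible workaround, offered only as a guess since it has not been verified on this machine, is to assume the mismatch means the build environment's MACOSX_DEPLOYMENT_TARGET no longer agrees with the value reported "during configure", so pinning the target and clobbering the stale objdir before rebuilding may help:

echo 'ac_add_options --enable-macos-target=10.7' >> .mozconfig   # align with the 10.7 value in the error
rm -rf obj-x86_64-apple-darwin11.4.2                             # clobber the objdir (path taken from the log above)
make -f client.mk build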
Mihai: Please raise these issues separately and keep this bug only for progress on the investigation of the memory regression.
(In reply to Dave Hunt (:davehunt) from comment #32)
> Mihai: Have you made any further progress here?

I reduced the regression range to about 15 changesets. I used 5 iterations and 10 entities because, with only 1 iteration, I was not able to tell whether a build is good or bad due to the small memory differences.

Last Good Changeset: 7ba8c86f1a56
Avg resident: 80MB
Avg explicit: 171MB
http://mozmill-crowd.blargon7.com/#/endurance/report/fb97b6210ae70da1b9ace67445689fca

First Bad Changeset: adfd8d9bfd0b
Avg resident: 89MB
Avg explicit: 181MB
http://mozmill-crowd.blargon7.com/#/endurance/report/fb97b6210ae70da1b9ace6744579bc5b

I will continue bisecting on Monday morning to find the changeset responsible. There are about 2-3 more changesets that have to be tested for that.
Just finished the bisection. Here are my final results:

The first bad revision is:
changeset:   135929:03fcdac57879
parent:      135928:03a709825eff
parent:      135870:7ba8c86f1a56
user:        Ryan VanderMeulen <ryanvm@gmail.com>
date:        Thu Jun 20 20:12:31 2013 -0400
summary:     Merge m-c to inbound.
Something looks weird here. If I bisect using the following range (good 7ba8c86f1a56, bad 03fcdac57879), I get the results mentioned in Comment 37, but please take a look at the following URL, which shows there are several changesets between my last good and first bad mentioned above.

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=7ba8c86f1a56&tochange=03fcdac57879
Flags: needinfo?(dave.hunt)
(In reply to Mihai Morar, QA (:MihaiMorar) from comment #38)
> Something look weird here. If I bisect using the following regression: good
> 7ba8c86f1a56, bad 03fcdac57879 I get the results mentioned in Comment 37 but
> please take a look at the following URL which says there several changesets
> between my last good and first bad mentioned above.
> 
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=7ba8c86f1a56&tochange=03fcdac57879

Why are you using 7ba8c86f1a56 as your fromchange value? Your bisect will have identified later changesets that were also good. Do you have the full history of your bisect?

I wouldn't expect http://hg.mozilla.org/mozilla-central/rev/03fcdac57879 to cause a regression as this is a merge from mozilla-central to mozilla-inbound.
Flags: needinfo?(dave.hunt)
(In reply to Dave Hunt (:davehunt) from comment #39)

> Why are you using 7ba8c86f1a56 as your fromchange value? 
Based on Comment 2 it looks like 7ba8c86f1a56 was the last known good build.

> Do you have the full history of your bisect?
I used the following range:

-good 7ba8c86f1a56 (Nightly 2013/06/21) - http://mozmill-crowd.blargon7.com/#/endurance/report/fb97b6210ae70da1b9ace67445689fca
-bad cea75ce9a559 (Nightly 2013/06/22) - http://mozmill-crowd.blargon7.com/#/endurance/report/fb97b6210ae70da1b9ace674456796b5

I tested the following changesets:

d8cce0fb9d9a - BAD http://mozmill-crowd.blargon7.com/#/endurance/report/fb97b6210ae70da1b9ace674456a6c5d

adfd8d9bfd0b - BAD http://mozmill-crowd.blargon7.com/#/endurance/report/fb97b6210ae70da1b9ace6744579bc5b

5eabc0ad7e33 - BAD http://mozmill-crowd.blargon7.com/#/endurance/report/fb97b6210ae70da1b9ace67445d73a4f

86ddaf01b346 - BAD http://mozmill-crowd.blargon7.com/#/endurance/report/fb97b6210ae70da1b9ace67445d9db92

16ddd7268abc - BAD http://mozmill-crowd.blargon7.com/#/endurance/report/fb97b6210ae70da1b9ace67445e1c5bf

03fcdac57879 - BAD http://mozmill-crowd.blargon7.com/#/endurance/report/fb97b6210ae70da1b9ace67445ea4141

After I tested changeset 03fcdac57879 I got these results:

The first bad revision is:
changeset:   135929:03fcdac57879
parent:      135928:03a709825eff
parent:      135870:7ba8c86f1a56
user:        Ryan VanderMeulen <ryanvm@gmail.com>
date:        Thu Jun 20 20:12:31 2013 -0400
summary:     Merge m-c to inbound.
(In reply to Mihai Morar, QA (:MihaiMorar) from comment #40)
> (In reply to Dave Hunt (:davehunt) from comment #39)
> 
> > Why are you using 7ba8c86f1a56 as your fromchange value? 
> Based on Comment 2 it looks like 7ba8c86f1a56 was last good build known.

Yes, but that was before you started bisecting.

Were you using hg bisect to automatically bisect? I don't understand why it stops at 03fcdac57879 when there are still several changesets to test and none have been found to be good. Please try a new bisect between the changesets 7ba8c86f1a56 and 03fcdac57879.
Attached image Print Screen
Yes, I am using hg bisect. I'm almost at the end; I'm pretty close. After receiving the results mentioned in Comment 40, I continued bisecting using the following range:

-good 7ba8c86f1a56 - http://mozmill-crowd.blargon7.com/#/endurance/report/fb97b6210ae70da1b9ace67445689fca
-bad 03a709825eff - http://mozmill-crowd.blargon7.com/#/endurance/report/26d5854914a68aa13b333f28157aacda

It picked 3b16a9c6fb6a, which was good - http://mozmill-crowd.blargon7.com/#/endurance/report/26d5854914a68aa13b333f28152f8f45 - and my current range is shown in the attachment.
Final results:

The first bad revision is:
changeset:   135911:b029e52490da
user:        Matt Woodrow <mwoodrow@mozilla.com>
date:        Fri Jun 21 09:32:04 2013 +1200
summary:     Bug 756601 - Enable OMTC by default on OSX 10.7 and 10.8. r=roc

last good changeset: 79d6fa05cff4 - http://mozmill-crowd.blargon7.com/#/endurance/report/26d5854914a68aa13b333f2815a1c941
first bad changeset: b029e52490da - http://mozmill-crowd.blargon7.com/#/endurance/report/dc3d4d2f66af31b9e75e291fa0152801
Great, thanks Mihai!

Robert, Matt: Is the memory increase expected as a result of this patch landing?
Flags: needinfo?(roc)
Flags: needinfo?(matt.woodrow)
I would expect some memory usage regression. I don't know if this much is reasonable.
Flags: needinfo?(roc)
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
Some higher memory usage is definitely expected, but I wouldn't expect Flash to cause any more than other pages.

How do I reproduce this locally?

I followed the instructions from https://developer.mozilla.org/en-US/docs/Mozmill_Tests and came up with:

mozmill -t mozmill-tests/tests/endurance/testFlash_SWFVideoEmbed/test1.js -b "/Applications/FirefoxNightly.app/" --report=http://mozmill-crowd.blargon7.com/db/

But this fails with the error: Sending results to 'http://mozmill-crowd.blargon7.com/db/' failed (This document requires the field tests_changeset).

I also tried the instructions here https://quality.mozilla.org/docs/mozmill/investigating-memory-regressions/ for running against nightly builds, but download.py no longer exists.
Flags: needinfo?(matt.woodrow)
Ok, I think I have a lead on this.

We don't do leak checking on MacIOSurface (the graphics surfaces allocated for plugins on OSX), and manually adding one shows us leaking them.

Getting the docs for running MozMill would still be useful for the future though :)
Assignee: mihai.morar → matt.woodrow
So, we're taking a refcounted object (MacIOSurface), adding a ref and wrapping it in a non-refcounted one (SharedTextureHandle).

We give that to an Image, which has no way of releasing the ref. If we never share the image with the compositor, then it leaks.

If we do share it with the compositor, then we're passing the raw pointer across IPDL (so it'll never work with e10s), plus we have complicated lifetime issues since both the Image* and the compositor's TextureHost have the same SharedTextureHandle and have to agree on who releases it.

Given this, I don't think that the SharedTextureHandle abstraction is worth the cost, so I'm attaching patches to have an explicit Image/TextureClient/TextureHost for MacIOSurface.

This lets us manage the lifetime sanely (using the refcount), and also share across IPDL using the ID number (which will work cross-process).

I can no longer reproduce any leaking MacIOSurface objects with my patch queue.
Attachment #823773 - Flags: review?(roc)
Note that this is just an intermediate step that preserves existing behaviour.
Attachment #823774 - Flags: review?(nical.bugzilla)
Attachment #823775 - Flags: review?(roc)
Attachment #823776 - Flags: review?(nical.bugzilla)
Attachment #823777 - Flags: review?(nical.bugzilla)
Attachment #823778 - Flags: review?(nical.bugzilla)
Decided that there's no need to have these as separate patches.
Attachment #823774 - Attachment is obsolete: true
Attachment #823778 - Attachment is obsolete: true
Attachment #823774 - Flags: review?(nical.bugzilla)
Attachment #823778 - Flags: review?(nical.bugzilla)
Attachment #823784 - Flags: review?(nical.bugzilla)
Comment on attachment 823776 [details] [diff] [review]
TextureClient implementation for MacIOSurface

Review of attachment 823776 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/TextureClientOGL.h
@@ +87,5 @@
> +  virtual TextureClientData* DropTextureData() MOZ_OVERRIDE
> +  {
> +    mSurface = nullptr;
> +    MarkInvalid();
> +    return nullptr;

Please add a comment here explaining why it is OK to clear your reference to the data without returning a TextureClientData to be deallocated later.

This breaks the semantics of TEXTURE_DEALLOCATE_CLIENT, because then we don't really deallocate on the client (unless the MacIOSurface is externally owned, but in that case it would be safer to return a TextureClientData that signals the owner when the TextureClientData is destroyed).

SharedTextureClientOGL also does that, but it's not great. If we want to do that here as well, there should be a comment explaining why it is safe.
Comment on attachment 823777 [details] [diff] [review]
TextureHost implementation for MacIOSurface

Review of attachment 823777 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/TextureHostOGL.h
@@ +422,5 @@
> +  virtual bool Lock() MOZ_OVERRIDE;
> +
> +  virtual gfx::SurfaceFormat GetFormat() const MOZ_OVERRIDE {
> +    MOZ_ASSERT(mTextureSource);
> +    return mTextureSource->GetFormat();

With all other TextureHosts we can call GetFormat without locking, which is not the case here (mTextureSource is null until the first call to Lock()).
The PrintInfo code calls GetFormat without locking the TextureHost (this isn't exercised on tbpl, but eventually someone uses it for debugging and it crashes; we already had a bug like that with GrallocTextureHostOGL).
So you should make it possible to call GetFormat() without locking the host.
Comment on attachment 823784 [details] [diff] [review]
Add support for MacIOSurfaceImage to ImageClient v2

Review of attachment 823784 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/ImageClient.cpp
@@ +238,5 @@
> +      return false;
> +    }
> +
> +    GetForwarder()->UseTexture(this, mFrontBuffer);
> +#endif

Please override MacIOSurfaceImage::GetTextureClient() and MacIOSurface::AsSharedImage() instead of adding this block
Attachment #823784 - Flags: review?(nical.bugzilla) → review-
(In reply to Matt Woodrow (:mattwoodrow) from comment #47)
> Getting the docs for running MozMill would still be useful for the future
> though :)

Thanks, I've updated the documentation at https://quality.mozilla.org/docs/mozmill/investigating-memory-regressions/
Attachment #823776 - Attachment is obsolete: true
Attachment #823776 - Flags: review?(nical.bugzilla)
Attachment #824153 - Flags: review?(nical.bugzilla)
Attachment #823777 - Attachment is obsolete: true
Attachment #823777 - Flags: review?(nical.bugzilla)
Attachment #824154 - Flags: review?(nical.bugzilla)
Attachment #823784 - Attachment is obsolete: true
Attachment #824156 - Flags: review?(nical.bugzilla)
Given that this bug has now been turned into an integration bug for the fix, Matt, would you mind moving this bug into the right component and updating the summary? Thanks.
Attachment #824153 - Flags: review?(nical.bugzilla) → review+
Attachment #824154 - Flags: review?(nical.bugzilla) → review+
Attachment #824156 - Flags: review?(nical.bugzilla) → review+
Matt, please nominate these for Aurora.  (Given the non-simple nature of the patches and the fact that we already shipped this regression in FF24, we probably shouldn't backport to Beta.)
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(matt.woodrow)
I no longer seem to be able to request approval; the flag option isn't there under patch details. Is this a recent Bugzilla change?
(In reply to Matt Woodrow (:mattwoodrow) from comment #69)
> I no longer seem to be able to request approval, the flag option isn't there
> under patch details. Is this a recent bugzilla change?

Byron, do you know?
Flags: needinfo?(glob)
(In reply to Matt Woodrow (:mattwoodrow) from comment #69)
> I no longer seem to be able to request approval, the flag option isn't there
> under patch details. Is this a recent bugzilla change?

I think that's because this is filed under Mozilla QA rather than Core. So just move it over to the appropriate component and you should be able to request approval.
Component: Mozmill Tests → Graphics: Layers
Product: Mozilla QA → Core
QA Contact: hskupin
Comment on attachment 823773 [details] [diff] [review]
Add MacIOSurfaceImage

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 756601
User impact if declined: Leaking memory when using plugins. Leak should be relative to the number of plugin instances used, not number of frames presented.
Testing completed (on m-c, etc.): Been on m-c for a few days.
Risk to taking this patch (and alternatives if risky): Big change, but no real alternative other than ignoring it.
String or IDL/UUID changes made by this patch: None.
Attachment #823773 - Flags: approval-mozilla-aurora?
Note that the above request is for all patches in this bug.

(In reply to Andrew McCreight [:mccr8] from comment #71)
> I think that's because this is filed under Mozilla QA rather than Core.  So
> just move it over to whatever appropriate component and you should be able
> to request approval.

That worked, thanks.
Flags: needinfo?(glob)
Target Milestone: --- → mozilla28
(In reply to Matt Woodrow (:mattwoodrow) from comment #72)
> Comment on attachment 823773 [details] [diff] [review]
> Add MacIOSurfaceImage
> 
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #): Bug 756601
> User impact if declined: Leaking memory when using plugins. Leak should be
> relative to the number of plugin instances used, not number of frames
> presented.
> Testing completed (on m-c, etc.): Been on m-c for a few days.
> Risk to taking this patch (and alternatives if risky): Big change, but no
> real alternative other than ignoring it.
> String or IDL/UUID changes made by this patch: None.

Would any fallout be obvious in the next couple of weeks while this bakes on Aurora? Do you recommend any targeted QA testing here? Would this be simple enough to back out if we found some serious regressions once we enter the beta cycle? Given your comment that the change is "big", I want to have a few options before uplifting in case this causes some major web regressions.
Fallout should be obvious, as any bugs would affect all Flash (and other plugin) content. I doubt we need any extra QA testing.

I don't imagine backing this out will be too difficult; we won't be doing much on top of it that would conflict.
Attachment #823773 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Keywords: verifyme
Depends on: 935778