Last Comment Bug 853390 - Significant spike in https://areweslimyet.com/ from image decoding refactor
: Significant spike in https://areweslimyet.com/ from image decoding refactor
Status: RESOLVED FIXED
[MemShrink]
:
Product: Core
Classification: Components
Component: General (show other bugs)
: unspecified
: x86_64 Linux
: -- normal with 2 votes (vote)
: mozilla22
Assigned To: Joe Drew (not getting mail)
:
Mentors:
Depends on:
Blocks: 716140
  Show dependency treegraph
 
Reported: 2013-03-21 04:07 PDT by Olli Pettay [:smaug] (high review load, please consider other reviewers)
Modified: 2013-10-29 12:19 PDT (History)
15 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
sm.bad.json (see comment 4) (575.24 KB, application/json)
2013-03-21 06:09 PDT, Justin Lebar (not reading bugmail)
no flags Details
sm.good.json (see comment 4) (493.65 KB, application/json)
2013-03-21 06:10 PDT, Justin Lebar (not reading bugmail)
no flags Details
fix (1.97 KB, patch)
2013-03-21 18:09 PDT, Joe Drew (not getting mail)
seth: review+
Details | Diff | Review
AWSY screenshot showing the problem is fixed (162.23 KB, image/png)
2013-03-24 15:58 PDT, Nicholas Nethercote [:njn] (on vacation until July 11)
no flags Details

Description Olli Pettay [:smaug] (high review load, please consider other reviewers) 2013-03-21 04:07:48 PDT
Something which landed March 17 or 18 seem to have caused higher memory usage.
Comment 1 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2013-03-21 04:11:24 PDT
If I read awsy correctly, Bug 716140 looks like the cause.
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c9c29cc97af0&tochange=1407288d820b
Comment 2 Justin Lebar (not reading bugmail) 2013-03-21 05:26:53 PDT
We should probably back bug 716140 out.
Comment 3 Justin Lebar (not reading bugmail) 2013-03-21 05:28:44 PDT
Maybe this is a compartment leak?  I'm not having much success diffing the awsy results...
Comment 4 Justin Lebar (not reading bugmail) 2013-03-21 06:07:18 PDT
Yeah, it looks like we have additional compartments alive.  I guess it's not right to call it a "leak" because the compartments seem to be collected in the end (purple line).

> $ diff <(grep 'compartment(http' sm.good.json | sort) <(grep 'compartment(http' sm.bad.json | sort)
> 0a1,29
> >                     "js-compartment(http:\\\\localhost:8053\\tp5\\douban.com\\www.douban.com\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8054\\tp5\\imgur.com\\imgur.com\\gallery\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8055\\tp5\\reddit.com\\www.reddit.com\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8055\\tp5\\reddit.com\\www.redditmedia.com\\ads\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8056\\tp5\\digg.com\\dads.new.digg.com\\view.html@kw=zone%253A5&kw=mozilla&kw=nice&kw=logo&kw=firefox&kw=mozzilla&kw=new&kw=proposal&kw=really&kw=browser&kw=check&kw=pagetype%253Apermalink&template=5.html)": {
> >                     "js-compartment(http:\\\\localhost:8056\\tp5\\digg.com\\digg.com\\news\\story\\New_logo_for_Mozilla_Firefox_browser.html)": {
> >                     "js-compartment(http:\\\\localhost:8056\\tp5\\digg.com\\digg.com\\news\\story\\New_logo_for_Mozilla_Firefox_browser.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8056\\tp5\\digg.com\\digg.com\\news\\story\\New_logo_for_Mozilla_Firefox_browser.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8057\\tp5\\filestube.com\\www.facebook.com\\plugins\\likebox.php@href=http%253A%252F%252Fwww.facebook.com%252Fpages%252FFilesTube-Media-Search-Engine%252F135577699745&width=292&connections=0&stream=false&header=true&height=62.html)": {
> >                     "js-compartment(http:\\\\localhost:8057\\tp5\\filestube.com\\www.filestube.com\\t\\the+vampire+diaries.html)": {
> >                     "js-compartment(http:\\\\localhost:8057\\tp5\\filestube.com\\www.filestube.com\\t\\the+vampire+diaries.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8058\\tp5\\dailymail.co.uk\\www.dailymail.co.uk\\ushome\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8058\\tp5\\dailymail.co.uk\\www.dailymail.co.uk\\ushome\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8059\\tp5\\whois.domaintools.com\\whois.domaintools.com\\mozilla.com.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\netspiderads3.indiatimes.com\\ads.dll\\getad@slotid=331.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\netspiderads3.indiatimes.com\\ads.dll\\getad@slotid=409.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\netspiderads3.indiatimes.com\\ads.dll\\getad@slotid=441.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\timesofindia.indiatimes.com\\glbldata.cms.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\alphaniftysensexframe1.cms.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\alphatbslform_new1.cms@type=travel.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\googlehome.cms.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\googleslink.cms.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\subscribe.cms.html)": {

I'll attach my sm.good.json and sm.bad.json files; they're the "TabsOpenSettled" sections from before and after bug 716140 landed.
Comment 5 Justin Lebar (not reading bugmail) 2013-03-21 06:09:28 PDT
Created attachment 727654 [details]
sm.bad.json (see comment 4)

TabsOpenSettled section from https://areweslimyet.com/data/d1f978369c50ff398618c6ce04d1dc33d6883d81.json.gz
Comment 6 Justin Lebar (not reading bugmail) 2013-03-21 06:10:34 PDT
Created attachment 727656 [details]
sm.good.json (see comment 4)

TabsOpenSettled section from https://areweslimyet.com/data/eb01f41e077b0dab04766d93164dbe2596bd3fec.json
Comment 7 IU 2013-03-21 06:50:17 PDT
(In reply to Justin Lebar [:jlebar] from comment #2)
> We should probably back bug 716140 out.

Why is this blocking 716140?  It's not even in the range.
Comment 8 Justin Lebar (not reading bugmail) 2013-03-21 06:54:30 PDT
Are you looking at the pushlog in comment 1?  That certainly has bug 716104 in it.  Or have a look in hg at the difference between the commits in comment 5 and comment 6.
Comment 9 IU 2013-03-21 07:25:16 PDT
Yes comment 1 has it, but comment 0 said it was due to a March 17 or 18 landing.
Comment 10 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2013-03-21 07:40:31 PDT
Apparently awsy gives odd dates occasionally unless you zoom in enough. So the dates in comment 0 are
bogus.
Comment 11 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2013-03-21 07:45:32 PDT
(or maybe I just managed to look at the wrong dates somehow.)
Comment 12 John Schoenick [:johns] 2013-03-21 08:39:53 PDT
(In reply to Olli Pettay [:smaug] from comment #10)
> Apparently awsy gives odd dates occasionally unless you zoom in enough. So
> the dates in comment 0 are
> bogus.

When zoomed out, each point represents many runs, and the tooltip is showing a midpoint of that timerange. I opened an issue[1] on the AWSY tracker to clarify this.

Clicking on the datapoints involved, it looks like the first bad tested revision was d1f978369c50ff398618c6ce04d1dc33d6883d81 and the last good tested revision was eb01f41e077b0dab04766d93164dbe2596bd3fec. AWSY only tests the tip of every m-i push by default, but I queued up tests of every revision in that range to see if it can narrow it down further.

Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=eb01f41e077b0dab04766d93164dbe2596bd3fec&tochange=d1f978369c50ff398618c6ce04d1dc33d6883d81

[1] https://github.com/Nephyrin/MozAreWeSlimYet/issues/31
Comment 13 Joe Drew (not getting mail) 2013-03-21 11:40:47 PDT
Before the commit, there were roughly 13 MB of images, but after, there were 18 MB. This tells me that we're not discarding them as opportunistically as we used to.
Comment 14 Joe Drew (not getting mail) 2013-03-21 12:24:31 PDT
Actually, these are in "used" images - these images shouldn't be used, since their web pages have gone away. This also explains why after 30s the CC/GC make the problem go away; there's some sort of reference cycle here not letting us destroy the images.
Comment 15 Joe Drew (not getting mail) 2013-03-21 13:46:38 PDT
Note: we're not multithreaded yet :)
Comment 16 Justin Lebar (not reading bugmail) 2013-03-21 17:27:28 PDT
(In reply to Joe Drew (:JOEDREW! \o/) from comment #14)
> Actually, these are in "used" images - these images shouldn't be used, since
> their web pages have gone away.

Except they haven't; see comment 4.
Comment 17 Joe Drew (not getting mail) 2013-03-21 18:08:12 PDT
Right; what I meant was "tabs".
Comment 18 Joe Drew (not getting mail) 2013-03-21 18:09:30 PDT
Created attachment 727997 [details] [diff] [review]
fix

This is something that might help, based on what fixed bug 820602. I tried pushing it to try, but I guess try is blowing up into thousands of tiny pieces right now
Comment 19 Joe Drew (not getting mail) 2013-03-21 21:12:07 PDT
Comment on attachment 727997 [details] [diff] [review]
fix

https://areweslimyet.com/?series=joe

The second point on the graph is the actual fix, this patch. The first was a stab in the dark and is probably wrong.

What this does is ensure that after every InitDecoder, we call FinishedSomeDecoding() to block onload in our observers. My theory is that we were letting onload fire while images were still decoding, which meant that after we closed all those tabs there were still objects busily working.

The ShutdownDecoder call that I removed was an artifact of earlier revisions of my patches, and unnecessary.
Comment 20 Nicholas Nethercote [:njn] (on vacation until July 11) 2013-03-21 22:03:43 PDT
I find it disturbing that image decoding tweaks can have such a large effect on memory consumption...
Comment 21 Joe Drew (not getting mail) 2013-03-22 06:19:51 PDT
In this case it doesn't actually affect real-world memory consumption, just our measurement of it. This patch makes the images finish decoding before onload fires; without the patch, onload and image decoding are not guaranteed to be related.

Basically, this patch makes us measure the real-world result in AWSY.
Comment 22 Joe Drew (not getting mail) 2013-03-22 06:29:44 PDT
Try push: https://tbpl.mozilla.org/?tree=Try&rev=f47a623a5260
Comment 23 John Schoenick [:johns] 2013-03-22 08:03:30 PDT
(In reply to Joe Drew (:JOEDREW! \o/) from comment #21)
> In this case it doesn't actually affect real-world memory consumption, just
> our measurement of it. This patch makes the images finish decoding before
> onload fires; without the patch, onload and image decoding are not
> guaranteed to be related.
> 
> Basically, this patch makes us measure the real-world result in AWSY.

AWSY, waits ten seconds between each tab finishing loading and opening the next one, which should be plenty of time to finish any image decoding, shouldn't it?
Comment 24 Joe Drew (not getting mail) 2013-03-22 09:51:50 PDT
Does it wait that same 10s before measuring when closing all tabs?
Comment 25 John Schoenick [:johns] 2013-03-22 10:04:05 PDT
(In reply to Joe Drew (:JOEDREW! \o/) from comment #24)
> Does it wait that same 10s before measuring when closing all tabs?

It does, you can see the basic logic here:
https://github.com/Nephyrin/MozAreWeSlimYet/blob/master/mozmill_endurance_test/test1.js#L202
Comment 26 Seth Fowler [:seth] [:s2h] 2013-03-22 14:48:10 PDT
Comment on attachment 727997 [details] [diff] [review]
fix

Review of attachment 727997 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/RasterImage.cpp
@@ +2809,5 @@
>    if (!mDecoder) {
>      rv = InitDecoder(/* aDoSizeDecode = */ false);
> +    CONTAINER_ENSURE_SUCCESS(rv);
> +
> +    rv = FinishedSomeDecoding();

Hopefully this won't ever immediately shut down the decoder... I'd suggest adding an assert that it didn't.

@@ +2911,5 @@
>  
> +  rv = FinishedSomeDecoding();
> +  CONTAINER_ENSURE_SUCCESS(rv);
> +
> +  if (mDecoder) {

This is a great improvement. It was never clear to me why we did things the other way, since if we took that branch IsDecodedFinish() is true, and that means FinishedSomeDecoding() will shut down the decoder.
Comment 27 Joe Drew (not getting mail) 2013-03-22 16:12:25 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/e8d39b112885
Comment 28 Joe Drew (not getting mail) 2013-03-23 15:59:34 PDT
https://hg.mozilla.org/mozilla-central/rev/e8d39b112885
Comment 29 Nicholas Nethercote [:njn] (on vacation until July 11) 2013-03-24 15:58:26 PDT
Created attachment 728789 [details]
AWSY screenshot showing the problem is fixed

AWSY confirms the fix.  Thanks for the quick response, Joe!

It's still not clear to me if this was a genuine problem or an artifact of how AWSY does its measurements.

Note You need to log in before you can comment on or make changes to this bug.