Significant spike in https://areweslimyet.com/ from image decoding refactor

RESOLVED FIXED in mozilla22

Status

()

Core
General
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: smaug, Assigned: Joe Drew (not getting mail))

Tracking

unspecified
mozilla22
x86_64
Linux
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink])

Attachments

(4 attachments)

(Reporter)

Description

4 years ago
Something which landed March 17 or 18 seem to have caused higher memory usage.
(Reporter)

Comment 1

4 years ago
If I read awsy correctly, Bug 716140 looks like the cause.
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c9c29cc97af0&tochange=1407288d820b
Whiteboard: [MemShrink]
We should probably back bug 716140 out.
Blocks: 716140
Maybe this is a compartment leak?  I'm not having much success diffing the awsy results...
Yeah, it looks like we have additional compartments alive.  I guess it's not right to call it a "leak" because the compartments seem to be collected in the end (purple line).

> $ diff <(grep 'compartment(http' sm.good.json | sort) <(grep 'compartment(http' sm.bad.json | sort)
> 0a1,29
> >                     "js-compartment(http:\\\\localhost:8053\\tp5\\douban.com\\www.douban.com\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8054\\tp5\\imgur.com\\imgur.com\\gallery\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8055\\tp5\\reddit.com\\www.reddit.com\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8055\\tp5\\reddit.com\\www.redditmedia.com\\ads\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8056\\tp5\\digg.com\\dads.new.digg.com\\view.html@kw=zone%253A5&kw=mozilla&kw=nice&kw=logo&kw=firefox&kw=mozzilla&kw=new&kw=proposal&kw=really&kw=browser&kw=check&kw=pagetype%253Apermalink&template=5.html)": {
> >                     "js-compartment(http:\\\\localhost:8056\\tp5\\digg.com\\digg.com\\news\\story\\New_logo_for_Mozilla_Firefox_browser.html)": {
> >                     "js-compartment(http:\\\\localhost:8056\\tp5\\digg.com\\digg.com\\news\\story\\New_logo_for_Mozilla_Firefox_browser.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8056\\tp5\\digg.com\\digg.com\\news\\story\\New_logo_for_Mozilla_Firefox_browser.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8057\\tp5\\filestube.com\\www.facebook.com\\plugins\\likebox.php@href=http%253A%252F%252Fwww.facebook.com%252Fpages%252FFilesTube-Media-Search-Engine%252F135577699745&width=292&connections=0&stream=false&header=true&height=62.html)": {
> >                     "js-compartment(http:\\\\localhost:8057\\tp5\\filestube.com\\www.filestube.com\\t\\the+vampire+diaries.html)": {
> >                     "js-compartment(http:\\\\localhost:8057\\tp5\\filestube.com\\www.filestube.com\\t\\the+vampire+diaries.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8058\\tp5\\dailymail.co.uk\\www.dailymail.co.uk\\ushome\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8058\\tp5\\dailymail.co.uk\\www.dailymail.co.uk\\ushome\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8059\\tp5\\whois.domaintools.com\\whois.domaintools.com\\mozilla.com.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\netspiderads3.indiatimes.com\\ads.dll\\getad@slotid=331.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\netspiderads3.indiatimes.com\\ads.dll\\getad@slotid=409.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\netspiderads3.indiatimes.com\\ads.dll\\getad@slotid=441.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\timesofindia.indiatimes.com\\glbldata.cms.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\alphaniftysensexframe1.cms.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\alphatbslform_new1.cms@type=travel.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\googlehome.cms.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\googleslink.cms.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\subscribe.cms.html)": {

I'll attach my sm.good.json and sm.bad.json files; they're the "TabsOpenSettled" sections from before and after bug 716140 landed.
Created attachment 727654 [details]
sm.bad.json (see comment 4)

TabsOpenSettled section from https://areweslimyet.com/data/d1f978369c50ff398618c6ce04d1dc33d6883d81.json.gz
Created attachment 727656 [details]
sm.good.json (see comment 4)

TabsOpenSettled section from https://areweslimyet.com/data/eb01f41e077b0dab04766d93164dbe2596bd3fec.json

Comment 7

4 years ago
(In reply to Justin Lebar [:jlebar] from comment #2)
> We should probably back bug 716140 out.

Why is this blocking 716140?  It's not even in the range.
Are you looking at the pushlog in comment 1?  That certainly has bug 716104 in it.  Or have a look in hg at the difference between the commits in comment 5 and comment 6.

Comment 9

4 years ago
Yes comment 1 has it, but comment 0 said it was due to a March 17 or 18 landing.
(Reporter)

Comment 10

4 years ago
Apparently awsy gives odd dates occasionally unless you zoom in enough. So the dates in comment 0 are
bogus.
(Reporter)

Comment 11

4 years ago
(or maybe I just managed to look at the wrong dates somehow.)
(In reply to Olli Pettay [:smaug] from comment #10)
> Apparently awsy gives odd dates occasionally unless you zoom in enough. So
> the dates in comment 0 are
> bogus.

When zoomed out, each point represents many runs, and the tooltip is showing a midpoint of that timerange. I opened an issue[1] on the AWSY tracker to clarify this.

Clicking on the datapoints involved, it looks like the first bad tested revision was d1f978369c50ff398618c6ce04d1dc33d6883d81 and the last good tested revision was eb01f41e077b0dab04766d93164dbe2596bd3fec. AWSY only tests the tip of every m-i push by default, but I queued up tests of every revision in that range to see if it can narrow it down further.

Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=eb01f41e077b0dab04766d93164dbe2596bd3fec&tochange=d1f978369c50ff398618c6ce04d1dc33d6883d81

[1] https://github.com/Nephyrin/MozAreWeSlimYet/issues/31
(Assignee)

Comment 13

4 years ago
Before the commit, there were roughly 13 MB of images, but after, there were 18 MB. This tells me that we're not discarding them as opportunistically as we used to.
(Assignee)

Comment 14

4 years ago
Actually, these are in "used" images - these images shouldn't be used, since their web pages have gone away. This also explains why after 30s the CC/GC make the problem go away; there's some sort of reference cycle here not letting us destroy the images.
Summary: Significant spike in https://areweslimyet.com/ → Significant spike in https://areweslimyet.com/ from multithreaded image decoding
(Assignee)

Comment 15

4 years ago
Note: we're not multithreaded yet :)
Summary: Significant spike in https://areweslimyet.com/ from multithreaded image decoding → Significant spike in https://areweslimyet.com/ from image decoding refactor
(In reply to Joe Drew (:JOEDREW! \o/) from comment #14)
> Actually, these are in "used" images - these images shouldn't be used, since
> their web pages have gone away.

Except they haven't; see comment 4.
(Assignee)

Comment 17

4 years ago
Right; what I meant was "tabs".
(Assignee)

Comment 18

4 years ago
Created attachment 727997 [details] [diff] [review]
fix

This is something that might help, based on what fixed bug 820602. I tried pushing it to try, but I guess try is blowing up into thousands of tiny pieces right now
Assignee: nobody → joe
(Assignee)

Comment 19

4 years ago
Comment on attachment 727997 [details] [diff] [review]
fix

https://areweslimyet.com/?series=joe

The second point on the graph is the actual fix, this patch. The first was a stab in the dark and is probably wrong.

What this does is ensure that after every InitDecoder, we call FinishedSomeDecoding() to block onload in our observers. My theory is that we were letting onload fire while images were still decoding, which meant that after we closed all those tabs there were still objects busily working.

The ShutdownDecoder call that I removed was an artifact of earlier revisions of my patches, and unnecessary.
Attachment #727997 - Attachment description: maybe fix? → fix
Attachment #727997 - Flags: review?(seth)
I find it disturbing that image decoding tweaks can have such a large effect on memory consumption...
(Assignee)

Comment 21

4 years ago
In this case it doesn't actually affect real-world memory consumption, just our measurement of it. This patch makes the images finish decoding before onload fires; without the patch, onload and image decoding are not guaranteed to be related.

Basically, this patch makes us measure the real-world result in AWSY.
(Assignee)

Comment 22

4 years ago
Try push: https://tbpl.mozilla.org/?tree=Try&rev=f47a623a5260
(In reply to Joe Drew (:JOEDREW! \o/) from comment #21)
> In this case it doesn't actually affect real-world memory consumption, just
> our measurement of it. This patch makes the images finish decoding before
> onload fires; without the patch, onload and image decoding are not
> guaranteed to be related.
> 
> Basically, this patch makes us measure the real-world result in AWSY.

AWSY, waits ten seconds between each tab finishing loading and opening the next one, which should be plenty of time to finish any image decoding, shouldn't it?
(Assignee)

Comment 24

4 years ago
Does it wait that same 10s before measuring when closing all tabs?
(In reply to Joe Drew (:JOEDREW! \o/) from comment #24)
> Does it wait that same 10s before measuring when closing all tabs?

It does, you can see the basic logic here:
https://github.com/Nephyrin/MozAreWeSlimYet/blob/master/mozmill_endurance_test/test1.js#L202
Comment on attachment 727997 [details] [diff] [review]
fix

Review of attachment 727997 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/RasterImage.cpp
@@ +2809,5 @@
>    if (!mDecoder) {
>      rv = InitDecoder(/* aDoSizeDecode = */ false);
> +    CONTAINER_ENSURE_SUCCESS(rv);
> +
> +    rv = FinishedSomeDecoding();

Hopefully this won't ever immediately shut down the decoder... I'd suggest adding an assert that it didn't.

@@ +2911,5 @@
>  
> +  rv = FinishedSomeDecoding();
> +  CONTAINER_ENSURE_SUCCESS(rv);
> +
> +  if (mDecoder) {

This is a great improvement. It was never clear to me why we did things the other way, since if we took that branch IsDecodedFinish() is true, and that means FinishedSomeDecoding() will shut down the decoder.
Attachment #727997 - Flags: review?(seth) → review+
(Assignee)

Comment 27

4 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/e8d39b112885
Target Milestone: --- → mozilla22
(Assignee)

Comment 28

4 years ago
https://hg.mozilla.org/mozilla-central/rev/e8d39b112885
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Created attachment 728789 [details]
AWSY screenshot showing the problem is fixed

AWSY confirms the fix.  Thanks for the quick response, Joe!

It's still not clear to me if this was a genuine problem or an artifact of how AWSY does its measurements.

Updated

4 years ago
Depends on: 932277

Updated

4 years ago
No longer depends on: 932277
You need to log in before you can comment on or make changes to this bug.