Closed Bug 853390 Opened 9 years ago Closed 9 years ago

Significant spike in https://areweslimyet.com/ from image decoding refactor

Categories

(Core :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla22

People

(Reporter: smaug, Assigned: joe)

References

Details

(Whiteboard: [MemShrink])

Attachments

(4 files)

Something which landed March 17 or 18 seem to have caused higher memory usage.
Whiteboard: [MemShrink]
We should probably back bug 716140 out.
Blocks: 716140
Maybe this is a compartment leak?  I'm not having much success diffing the awsy results...
Yeah, it looks like we have additional compartments alive.  I guess it's not right to call it a "leak" because the compartments seem to be collected in the end (purple line).

> $ diff <(grep 'compartment(http' sm.good.json | sort) <(grep 'compartment(http' sm.bad.json | sort)
> 0a1,29
> >                     "js-compartment(http:\\\\localhost:8053\\tp5\\douban.com\\www.douban.com\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8054\\tp5\\imgur.com\\imgur.com\\gallery\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8055\\tp5\\reddit.com\\www.reddit.com\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8055\\tp5\\reddit.com\\www.redditmedia.com\\ads\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8056\\tp5\\digg.com\\dads.new.digg.com\\view.html@kw=zone%253A5&kw=mozilla&kw=nice&kw=logo&kw=firefox&kw=mozzilla&kw=new&kw=proposal&kw=really&kw=browser&kw=check&kw=pagetype%253Apermalink&template=5.html)": {
> >                     "js-compartment(http:\\\\localhost:8056\\tp5\\digg.com\\digg.com\\news\\story\\New_logo_for_Mozilla_Firefox_browser.html)": {
> >                     "js-compartment(http:\\\\localhost:8056\\tp5\\digg.com\\digg.com\\news\\story\\New_logo_for_Mozilla_Firefox_browser.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8056\\tp5\\digg.com\\digg.com\\news\\story\\New_logo_for_Mozilla_Firefox_browser.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8057\\tp5\\filestube.com\\www.facebook.com\\plugins\\likebox.php@href=http%253A%252F%252Fwww.facebook.com%252Fpages%252FFilesTube-Media-Search-Engine%252F135577699745&width=292&connections=0&stream=false&header=true&height=62.html)": {
> >                     "js-compartment(http:\\\\localhost:8057\\tp5\\filestube.com\\www.filestube.com\\t\\the+vampire+diaries.html)": {
> >                     "js-compartment(http:\\\\localhost:8057\\tp5\\filestube.com\\www.filestube.com\\t\\the+vampire+diaries.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8058\\tp5\\dailymail.co.uk\\www.dailymail.co.uk\\ushome\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8058\\tp5\\dailymail.co.uk\\www.dailymail.co.uk\\ushome\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8059\\tp5\\whois.domaintools.com\\whois.domaintools.com\\mozilla.com.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\netspiderads3.indiatimes.com\\ads.dll\\getad@slotid=331.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\netspiderads3.indiatimes.com\\ads.dll\\getad@slotid=409.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\netspiderads3.indiatimes.com\\ads.dll\\getad@slotid=441.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\timesofindia.indiatimes.com\\glbldata.cms.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\alphaniftysensexframe1.cms.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\alphatbslform_new1.cms@type=travel.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\googlehome.cms.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\googleslink.cms.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\index.html, about:blank)": {
> >                     "js-compartment(http:\\\\localhost:8060\\tp5\\indiatimes.com\\www.indiatimes.com\\subscribe.cms.html)": {

I'll attach my sm.good.json and sm.bad.json files; they're the "TabsOpenSettled" sections from before and after bug 716140 landed.
(In reply to Justin Lebar [:jlebar] from comment #2)
> We should probably back bug 716140 out.

Why is this blocking 716140?  It's not even in the range.
Are you looking at the pushlog in comment 1?  That certainly has bug 716104 in it.  Or have a look in hg at the difference between the commits in comment 5 and comment 6.
Yes comment 1 has it, but comment 0 said it was due to a March 17 or 18 landing.
Apparently awsy gives odd dates occasionally unless you zoom in enough. So the dates in comment 0 are
bogus.
(or maybe I just managed to look at the wrong dates somehow.)
(In reply to Olli Pettay [:smaug] from comment #10)
> Apparently awsy gives odd dates occasionally unless you zoom in enough. So
> the dates in comment 0 are
> bogus.

When zoomed out, each point represents many runs, and the tooltip is showing a midpoint of that timerange. I opened an issue[1] on the AWSY tracker to clarify this.

Clicking on the datapoints involved, it looks like the first bad tested revision was d1f978369c50ff398618c6ce04d1dc33d6883d81 and the last good tested revision was eb01f41e077b0dab04766d93164dbe2596bd3fec. AWSY only tests the tip of every m-i push by default, but I queued up tests of every revision in that range to see if it can narrow it down further.

Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=eb01f41e077b0dab04766d93164dbe2596bd3fec&tochange=d1f978369c50ff398618c6ce04d1dc33d6883d81

[1] https://github.com/Nephyrin/MozAreWeSlimYet/issues/31
Before the commit, there were roughly 13 MB of images, but after, there were 18 MB. This tells me that we're not discarding them as opportunistically as we used to.
Actually, these are in "used" images - these images shouldn't be used, since their web pages have gone away. This also explains why after 30s the CC/GC make the problem go away; there's some sort of reference cycle here not letting us destroy the images.
Summary: Significant spike in https://areweslimyet.com/ → Significant spike in https://areweslimyet.com/ from multithreaded image decoding
Note: we're not multithreaded yet :)
Summary: Significant spike in https://areweslimyet.com/ from multithreaded image decoding → Significant spike in https://areweslimyet.com/ from image decoding refactor
(In reply to Joe Drew (:JOEDREW! \o/) from comment #14)
> Actually, these are in "used" images - these images shouldn't be used, since
> their web pages have gone away.

Except they haven't; see comment 4.
Right; what I meant was "tabs".
Attached patch fixSplinter Review
This is something that might help, based on what fixed bug 820602. I tried pushing it to try, but I guess try is blowing up into thousands of tiny pieces right now
Assignee: nobody → joe
Comment on attachment 727997 [details] [diff] [review]
fix

https://areweslimyet.com/?series=joe

The second point on the graph is the actual fix, this patch. The first was a stab in the dark and is probably wrong.

What this does is ensure that after every InitDecoder, we call FinishedSomeDecoding() to block onload in our observers. My theory is that we were letting onload fire while images were still decoding, which meant that after we closed all those tabs there were still objects busily working.

The ShutdownDecoder call that I removed was an artifact of earlier revisions of my patches, and unnecessary.
Attachment #727997 - Attachment description: maybe fix? → fix
Attachment #727997 - Flags: review?(seth)
I find it disturbing that image decoding tweaks can have such a large effect on memory consumption...
In this case it doesn't actually affect real-world memory consumption, just our measurement of it. This patch makes the images finish decoding before onload fires; without the patch, onload and image decoding are not guaranteed to be related.

Basically, this patch makes us measure the real-world result in AWSY.
(In reply to Joe Drew (:JOEDREW! \o/) from comment #21)
> In this case it doesn't actually affect real-world memory consumption, just
> our measurement of it. This patch makes the images finish decoding before
> onload fires; without the patch, onload and image decoding are not
> guaranteed to be related.
> 
> Basically, this patch makes us measure the real-world result in AWSY.

AWSY, waits ten seconds between each tab finishing loading and opening the next one, which should be plenty of time to finish any image decoding, shouldn't it?
Does it wait that same 10s before measuring when closing all tabs?
(In reply to Joe Drew (:JOEDREW! \o/) from comment #24)
> Does it wait that same 10s before measuring when closing all tabs?

It does, you can see the basic logic here:
https://github.com/Nephyrin/MozAreWeSlimYet/blob/master/mozmill_endurance_test/test1.js#L202
Comment on attachment 727997 [details] [diff] [review]
fix

Review of attachment 727997 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/RasterImage.cpp
@@ +2809,5 @@
>    if (!mDecoder) {
>      rv = InitDecoder(/* aDoSizeDecode = */ false);
> +    CONTAINER_ENSURE_SUCCESS(rv);
> +
> +    rv = FinishedSomeDecoding();

Hopefully this won't ever immediately shut down the decoder... I'd suggest adding an assert that it didn't.

@@ +2911,5 @@
>  
> +  rv = FinishedSomeDecoding();
> +  CONTAINER_ENSURE_SUCCESS(rv);
> +
> +  if (mDecoder) {

This is a great improvement. It was never clear to me why we did things the other way, since if we took that branch IsDecodedFinish() is true, and that means FinishedSomeDecoding() will shut down the decoder.
Attachment #727997 - Flags: review?(seth) → review+
https://hg.mozilla.org/mozilla-central/rev/e8d39b112885
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
AWSY confirms the fix.  Thanks for the quick response, Joe!

It's still not clear to me if this was a genuine problem or an artifact of how AWSY does its measurements.
Depends on: 932277
No longer depends on: 932277
You need to log in before you can comment on or make changes to this bug.