1664807 - 10.19 - 11.83% about_newtab_with_snippets responsiveness (linux64-shippable) regression on push cd82ed10e8e7c0fe36c13c387ef7c137127351f5 (Wed September 9 2020)

Alexandru Ionescu (needinfo me) [:alexandrui]

Reporter

Description

•

4 years ago

Perfherder has detected a talos performance regression from push cd82ed10e8e7c0fe36c13c387ef7c137127351f5. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

12% about_newtab_with_snippets responsiveness linux64-shippable opt e10s stylo 0.15 -> 0.17
10% about_newtab_with_snippets responsiveness linux64-shippable opt e10s stylo 0.15 -> 0.17

Improvements:

55% about_newtab_with_snippets linux64-shippable-qr opt e10s stylo 81.38 -> 36.24
53% about_newtab_with_snippets linux64-shippable opt e10s stylo 76.58 -> 35.79
50% about_newtab_with_snippets windows10-64-shippable-qr opt e10s stylo 74.14 -> 36.89
50% about_newtab_with_snippets macosx1014-64-shippable opt e10s stylo 79.65 -> 39.70
50% about_newtab_with_snippets windows7-32-shippable opt e10s stylo 68.70 -> 34.40
48% about_newtab_with_snippets windows10-64-shippable opt e10s stylo 73.68 -> 38.61

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(ckerschb)

Alexandru Ionescu (needinfo me) [:alexandrui]

Reporter

Updated

•

4 years ago

Component: Performance → Security: CAPS

Product: Testing → Core

Alexandru Ionescu (needinfo me) [:alexandrui]

Reporter

Comment 1

•

4 years ago

== Change summary for alert #26928 (as of Mon, 14 Sep 2020 11:08:55 GMT) ==

Regressions:

3% Images linux1804-64-shippable opt 5,410,321.11 -> 5,587,528.66
3% Images linux1804-64-shippable-qr opt tp6 8,339,721.97 -> 8,566,832.43
2% Images linux1804-64-shippable-qr opt tp6 8,385,428.69 -> 8,564,065.55

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=26928

Christoph Kerschbaumer [:ckerschb]

Comment 2

•

4 years ago

(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #1)

Regressions:
3% Images linux1804-64-shippable opt 5,410,321.11 -> 5,587,528.66
3% Images linux1804-64-shippable-qr opt tp6 8,339,721.97 -> 8,566,832.43
2% Images linux1804-64-shippable-qr opt tp6 8,385,428.69 -> 8,564,065.55

Huh, I really don't know how to tackle that. Interesting is also that our patch improved performance in a lot of cases (see comment 0). I don't know what an acceptable trade off is here.

Zombie, Mike, Kate, any suggestions on how to move forward?

Flags: needinfo?(tomica)

Flags: needinfo?(mconley)

Flags: needinfo?(khudson)

Flags: needinfo?(ckerschb)

Mike Conley (:mconley) (:⚙️)

Comment 3

•

4 years ago

Hey davehunt, can you remind me again what Images on tp6 measures? Is this the number of bytes of image data loaded or something?

Flags: needinfo?(mconley) → needinfo?(dave.hunt)

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 4

•

4 years ago

(In reply to Mike Conley (:mconley) (:⚙️) from comment #3)

Hey davehunt, can you remind me again what Images on tp6 measures? Is this the number of bytes of image data loaded or something?

According the the AWSY docs it's a subset of the "explicit" measurement that focuses on memory used to render images.

Flags: needinfo?(dave.hunt)

Mike Conley (:mconley) (:⚙️)

Comment 5

•

4 years ago

Oh, thanks! I'd forgotten about those docs.

This... confuses me. I have no idea how switching from resource:// to chrome:// would cause us to use more memory on images.

ckerschb, is chrome:// somehow more aggressive at loading than resource://? I thought they were equivalent, but could it be that chrome:// images load into memory with higher priority or something?

Flags: needinfo?(ckerschb)

Christoph Kerschbaumer [:ckerschb]

Comment 6

•

4 years ago

(In reply to Mike Conley (:mconley) (:⚙️) from comment #5)

ckerschb, is chrome:// somehow more aggressive at loading than resource://? I thought they were equivalent, but could it be that chrome:// images load into memory with higher priority or something?

I really don't know. I also thought we could more or less use them interchangeably from a load time point of view. Gijs is usually my go-to-person when it comes to our packaging system. Gijs, any thoughts?

Flags: needinfo?(ckerschb) → needinfo?(gijskruitbosch+bugs)

:Gijs (out for now; he/him)

Comment 7

•

4 years ago

(In reply to Christoph Kerschbaumer [:ckerschb] from comment #6)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #5)

ckerschb, is chrome:// somehow more aggressive at loading than resource://? I thought they were equivalent, but could it be that chrome:// images load into memory with higher priority or something?

I really don't know. I also thought we could more or less use them interchangeably from a load time point of view. Gijs is usually my go-to-person when it comes to our packaging system. Gijs, any thoughts?

This would be highly surprising.

It seems like the obvious way to try and figure this out would be to load the memory dumps that are available as artifacts for the tests into about:memory and figure out where the additional memory has gone. So, for instance, https://treeherder.mozilla.org/perf.html#/comparesubtest?framework=4&originalProject=autoland&originalSignature=2242047&newProject=autoland&newSignature=2242047&originalRevision=93f3b8aa5bc50a2e4d4e61c7ba0e839b681518a2&newRevision=614a90725debb09363f79ca62a20cab9e3a26fa4 (accessible via the "subtests" link on one of the reassigned regression links) shows that we're 20% (400kb) worse off at the Tabs closed [+30s, forced GC] stage.

So the "before" job is https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=93f3b8aa5bc50a2e4d4e61c7ba0e839b681518a2&searchStr=awsy%2Clinux&selectedTaskRun=MVvLWPhXQr6Xtx_hDgvLAQ.0, the "after" one is https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=614a90725debb09363f79ca62a20cab9e3a26fa4&searchStr=awsy%2Clinux&selectedTaskRun=Ct43tZUhQwa5MMtOPvcYCw.0 .

There are 3 memory reports each,

before: 1, 2, 3
after: 1, 2, 3

You should be able to load these pairwise into about:memory to do a "diff" to see what is going on (or, failing that, load them individually into separate tabs and compare manually; I dunno how flexible about:memory is when comparing diffs from different processes). I'd expect the images section to reveal which images are now loaded that weren't before (or something like that). Hopefully that can help pinpoint what part of the patch is causing issues.

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(ckerschb)

:Gijs (out for now; he/him)

Comment 8

•

4 years ago

(Oh, and my other suggestion would be asking :tn or :aosmond to see if they have ideas about how this change could have impacted images)

Christoph Kerschbaumer [:ckerschb]

Comment 9

•

4 years ago

Thanks for your input Gijs - I'll try to tackle those perf regressions obviously.

(In reply to :Gijs (he/him) from comment #8)

(Oh, and my other suggestion would be asking :tn or :aosmond to see if they have ideas about how this change could have impacted images)

In the mean time I'll also flag :tn and :aosmond if they have any input on why switching from resource:// to chrome:// would cause us to use more memory on images.

Flags: needinfo?(tnikkel)

Flags: needinfo?(ckerschb)

Flags: needinfo?(aosmond)

Christoph Kerschbaumer [:ckerschb]

Comment 10

•

4 years ago

Attached image activity_stream_1.png — Details

When comparing the two pushes (before and after 1) using about:memory I found that four images show up (see screenshot):

chrome://activity-stream/content/data/content/assets/glyph-highlights-16.svg
chrome://activity-stream/content/data/content/assets/glyph-topsites-16.svg
chrome://activity-stream/content/data/content/assets/glyph-search-16.svg
chrome://activity-stream/content/data/content/assets/glyph-arrowhead-down-12.svg

Those are all *.svgs - but I don't know what that tells us about the regression.

Timothy Nikkel (:tnikkel)

Comment 11

•

4 years ago

Images that are loaded from chrome:// use their own image cache, separate from all other images. This could definitely have an effect on memory usage. For example, if the chrome image cache is well below capacity and the non-chrome image cache is always at capacity this change might make both image caches operate close to capacity.

There are some other places that we look at the scheme, looking specifically for chrome and resource in /image ( https://searchfox.org/mozilla-central/search?q=schemeis&path=image&case=false&regexp=false ) but that's the one that seems most likely to be involved here.

Flags: needinfo?(tnikkel)

:Gijs (out for now; he/him)

Comment 12

•

4 years ago

Huh, OK. So chrome images are cached and resource: ones are... cached differently? (doublechecking this with :tn)

TBH, cutting loading times of about:home and about:newtab pretty much in half for a 2-3% memory increase on Linux (unsure what the effect was on other OSes, but I'm guessing smaller, so it didn't get us alerts) seems like a pretty good deal, so I'm inclined to say we just accept this change and move on... It's a little surprising that the URL change would have this effect, but if we wanted to investigate that further IMHO we could do that under less pressure than "hey please fix this perf regression" - it's possible we could have wins elsewhere if there are still places left where we use resource URIs for images.

Mike, would you agree?

Flags: needinfo?(tomica)

Flags: needinfo?(tnikkel)

Flags: needinfo?(mconley)

Flags: needinfo?(khudson)

Flags: needinfo?(aosmond)

Timothy Nikkel (:tnikkel)

Comment 13

•

4 years ago

(In reply to :Gijs (he/him) from comment #12)

Huh, OK. So chrome images are cached and resource: ones are... cached differently? (doublechecking this with :tn)

There are just two caches. The two caches operate under almost exactly the same rules, it's just that chrome uris go into one, and everything else goes into the other. Things like when we purge entries from the cache are dependant on how many other things are in the cache, so loading the exact same image from chrome:// or resource:// at the exact same time could have different behaviour.

Flags: needinfo?(tnikkel)

BugBot [:suhaib / :marco/ :calixte]

Comment 14

•

4 years ago

Set release status flags based on info from the regressing bug 1654258

status-firefox80: --- → unaffected

status-firefox81: --- → unaffected

status-firefox-esr68: --- → unaffected

status-firefox-esr78: --- → unaffected

Jim Mathies [:jimm]

Comment 15

•

4 years ago

Sounds like this is expected based on the changes we made. We need a call on whether we want to live with this or back out the changes.

Flags: needinfo?(ckerschb)

Mike Conley (:mconley) (:⚙️)

Comment 16

•

4 years ago

(In reply to :Gijs (he/him) from comment #12)

TBH, cutting loading times of about:home and about:newtab pretty much in half for a 2-3% memory increase on Linux (unsure what the effect was on other OSes, but I'm guessing smaller, so it didn't get us alerts) seems like a pretty good deal, so I'm inclined to say we just accept this change and move on...

I concur. I think we're buying a pretty nice speed improvement for the (supposedly Linux-only) memory hit. Let's take it.

Status: NEW → RESOLVED

Closed: 4 years ago

Flags: needinfo?(mconley)

Flags: needinfo?(ckerschb)

Resolution: --- → WONTFIX

BMO Automation

Updated

•

4 years ago

Has Regression Range: --- → yes