Open Bug 1548837 Opened 5 years ago Updated 2 years ago

14.69 - 30.82% raptor-tp6-docs-firefox / raptor-tp6-docs-firefox fcp (osx-10-10-shippable) regression on push f5f8345a5c873fc99459f53d2eb14247cbc5473a (Thu May 2 2019)

Categories

(Core :: Graphics: ImageLib, defect, P3)

Hardware: Unspecified
OS: macOS
Type: defect

People

(Reporter: ccoroiu, Unassigned)

References

(Regression)

Raptor has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=d9a454afe9bf4e6299021d96948f135ecbc827ca&tochange=c477a96a488995459baab0948a10cb7231d7cbe3

You are the author of one of the patches included in that push, and we need your help to address this regression.

Regressions:

31% raptor-tp6-docs-firefox fcp osx-10-10-shippable opt 1,373.96 -> 1,797.38
15% raptor-tp6-docs-firefox osx-10-10-shippable opt 1,387.57 -> 1,591.39

Improvements:

27% raptor-tp6-microsoft-firefox fcp linux64-shippable opt 481.75 -> 353.62
26% raptor-tp6-microsoft-firefox fcp linux64-shippable-qr opt 513.71 -> 381.62
26% raptor-tp6-microsoft-firefox fcp windows7-32-shippable opt 542.71 -> 404.25
24% raptor-tp6-microsoft-firefox fcp windows10-64-shippable opt 526.62 -> 402.88
23% raptor-tp6-microsoft-firefox fcp windows10-64-shippable-qr opt 546.04 -> 421.88
21% raptor-tp6-microsoft-firefox fcp osx-10-10-shippable opt 1,247.02 -> 982.75
20% raptor-tp6-yandex-firefox loadtime linux64-shippable-qr opt 287.08 -> 229.62
19% raptor-tp6-microsoft-firefox linux64-shippable-qr opt 1,024.67 -> 826.61
19% raptor-tp6-microsoft-firefox linux64-shippable opt 975.46 -> 792.91
19% raptor-tp6-yandex-firefox loadtime linux64-shippable opt 280.33 -> 228.33
18% raptor-tp6-yandex-firefox fcp windows10-64-shippable-qr opt 90.08 -> 74.25
17% raptor-tp6-imdb-firefox osx-10-10-shippable opt 326.45 -> 271.86
17% raptor-tp6-yandex-firefox windows7-32-shippable opt 122.80 -> 102.44
16% raptor-tp6-yandex-firefox linux64-shippable-qr opt 129.49 -> 108.66
14% raptor-tp6-yandex-firefox windows10-64-shippable opt 122.18 -> 104.63
13% raptor-tp6-yandex-firefox linux64-shippable opt 130.66 -> 113.22
13% raptor-tp6-yandex-firefox osx-10-10-shippable opt 252.07 -> 219.32
10% raptor-tp6-imdb-firefox fcp linux64-shippable-qr opt 167.67 -> 150.75
9% raptor-tp6-imdb-firefox fcp linux64-shippable opt 165.62 -> 151.08
5% raptor-tp6-sheets-firefox loadtime linux64-shippable-qr opt 753.46 -> 714.83
4% raptor-tp6-sheets-firefox loadtime linux64-shippable opt 729.25 -> 699.62

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=20773

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Raptor jobs in a pushlog format.

To learn more about the regressing test(s) or reproducing them, please see: https://wiki.mozilla.org/Performance_sheriffing/Raptor

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling

Component: General → ImageLib
Flags: needinfo?(igoldan)
Flags: needinfo?(ehsan)
Product: Testing → Core

The only reason I can imagine OS X regressing while the other platforms don't is that the patch makes us do more work on that platform. This is because the raptor-tp6-docs-firefox test just measures the load time of this page: https://docs.google.com/document/d/1US-07msg12slQtI_xchzYxcKlTs6Fp7WqIc6W5GK5M8/edit (see https://searchfox.org/mozilla-central/rev/b2015fdd464f598d645342614593d4ebda922d95/testing/raptor/raptor/tests/raptor-tp6-2.ini#21). If you load that page and watch the network connections it makes under DevTools, you'll see that no third-party tracking resources are loaded there.

mTopLevelBaseDomain should be google.com for the image subresources of the main doc and empty for the rest. I doubt the cost of copying the string objects would be the cause...

This OSX-only caller of imgILoader::LoadImage() looks interesting: https://searchfox.org/mozilla-central/rev/e7d9a8749303b39dadcc0e18ea0d60a570a68145/widget/cocoa/nsMenuItemIconX.mm#235. I doubt we have a ton of menu items with icons, but maybe the cost of hashing is the root cause? I think I may need to do some try pushes.
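
For intuition, here is a minimal, hypothetical C++ sketch of the kind of cache key under discussion: one that folds mTopLevelBaseDomain into the hash and the equality check, so every lookup (including the OSX-only menu-icon loads above) pays for hashing and comparing a second string. Everything other than the mTopLevelBaseDomain name is illustrative and not the actual ImageLib code:

    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <string>

    // Hypothetical sketch only; the real key in ImageLib differs in detail.
    // It shows how carrying mTopLevelBaseDomain adds a second string to hash
    // and compare on every cache lookup.
    struct ImageCacheKeySketch {
      std::string mSpec;                // image URI spec
      std::string mTopLevelBaseDomain;  // "google.com" for the doc's image
                                        // subresources, empty for chrome callers

      uint64_t Hash() const {
        uint64_t h = std::hash<std::string>{}(mSpec);
        return h ^ (std::hash<std::string>{}(mTopLevelBaseDomain) << 1);
      }

      bool operator==(const ImageCacheKeySketch& aOther) const {
        return mSpec == aOther.mSpec &&
               mTopLevelBaseDomain == aOther.mTopLevelBaseDomain;
      }
    };

    int main() {
      // A chrome caller like nsMenuItemIconX would hash a key roughly like
      // this once per menu item icon it loads.
      ImageCacheKeySketch key{"chrome://browser/skin/icon.png", ""};
      std::cout << key.Hash() << "\n";
    }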

Does anyone know how I can get a talos profile for these regressions? (I don't have access to a Mac machine myself.) Thanks!

Flags: needinfo?(ehsan)

(In reply to Ehsan Akhgari (:ehsan) from comment #1)
> Does anyone know how I can get a talos profile for these regressions? (I don't have access to a Mac machine myself.) Thanks!

I suppose you mean Gecko profiles? If yes, I can share that.

Flags: needinfo?(igoldan) → needinfo?(ehsan)

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #2)
> (In reply to Ehsan Akhgari (:ehsan) from comment #1)
> > Does anyone know how I can get a talos profile for these regressions? (I don't have access to a Mac machine myself.) Thanks!
>
> I suppose you mean Gecko profiles? If yes, I can share that.

Yes, that would be great. Thank you!

Flags: needinfo?(ehsan)

Having some Gecko profiles from Linux and Windows raptor runs which did not show a regression when this landed would be extremely helpful for comparison too!

(In reply to Ehsan Akhgari (:ehsan) from comment #4)
> Having some Gecko profiles from Linux and Windows raptor runs which did not show a regression when this landed would be extremely helpful for comparison too!

These are the Gecko profiles for the raptor-tp6-docs-firefox regressions from OSX:

before: https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FFpEGIqppTneDxAjN3xAS9w%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_raptor-tp6-docs-firefox.zip

after: https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.net%2Fv1%2Ftask%2FXLwteq9GSVCRpUSicJNtWw%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_raptor-tp6-docs-firefox.zip

Thanks, Ionuț.

Discoveries so far: this patch seems to win back the regression that I caused: https://hg.mozilla.org/try/rev/2dc297791558bb4b83de7b07dfae80a78a489cce. Reverting any of the other changes from the original patch has no performance impact; specifically, that means the changes to cache key computation and cache key comparison were not at fault. So the only possible explanation for why the above change makes any difference is that this code has some positive impact on OSX: https://searchfox.org/mozilla-central/rev/99a2a5a955960b0e58ceade1db1f7652d9db4ba1/image/imgLoader.cpp#1461, which would only make sense if clearing the image cache between page reloads more often makes things faster on OSX...

I'm going to push a couple of patches to the try server to test these hypotheses.

Breaking the image cache intentionally (https://hg.mozilla.org/try/rev/1ec3ab54dd5633a5814ac474ca57d8f8792800b3) wins back the performance regression: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=fff2f680db9f1f98c5232153c2e0a8d6db52de96&newProject=try&newRevision=2b0bcf9c9b24931aa81691aa56f83df8c3d0299a&framework=10.
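
For context, a minimal sketch of what "breaking the image cache intentionally" amounts to; the class and names below are my own illustration, not the actual try patch linked above. Lookups always report a miss, so each reload behaves as if nothing survived in the cache; if that alone recovers the score, the benchmark is effectively rewarding cache misses:

    #include <iostream>
    #include <string>
    #include <unordered_map>

    // Illustrative only, not the real imgLoader cache or the try patch above.
    class BrokenImageCacheSketch {
      std::unordered_map<std::string, std::string> mEntries;

     public:
      void Put(const std::string& aKey, const std::string& aEntry) {
        mEntries[aKey] = aEntry;
      }

      bool Lookup(const std::string& aKey, std::string* aOut) {
        (void)aKey;
        (void)aOut;
        return false;  // diagnostic change: force a miss even when the entry exists
      }
    };

    int main() {
      BrokenImageCacheSketch cache;
      cache.Put("https://docs.google.com/image.png", "decoded image");
      std::string out;
      std::cout << (cache.Lookup("https://docs.google.com/image.png", &out)
                        ? "hit" : "miss")
                << "\n";  // always prints "miss"
    }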

So I think this confirms that the reason behind this regression on OSX is that my patch in bug 1548349 made Firefox better at using its image cache, and as a result it now performs worse on this test.

Andrew, what's your recommendation on how to deal with this regression?

Flags: needinfo?(aosmond)

I believe whatever is causing the regression was already present and is unrelated to the patches in bug 1548349. Since we are hitting the cache more frequently, we will have changed the ordering of the imgRequestProxy notifications (e.g. they arrive sooner than they would have otherwise). That is my best guess as to why the benchmark changed. It would mean this is a pre-existing bug that is sensitive to timing, which makes it challenging to investigate. However, it also means that real-world performance is unlikely to be affected, so barring reports to the contrary from users, I would recommend we leave this for now and keep an eye on any related future regressions.
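
To make the timing point concrete, here is a rough, hypothetical model (not the real imgRequestProxy code) of how more cache hits can reorder notifications: on a hit the listener can be notified before the load call even returns, while a miss defers the notification to a later event-loop turn, so a page whose subresources suddenly start hitting the cache sees its load events arrive in a different order:

    #include <functional>
    #include <iostream>
    #include <queue>
    #include <string>
    #include <unordered_set>

    // Hypothetical model only: a cache hit notifies immediately, a miss queues
    // the notification for a later event-loop turn.
    std::queue<std::function<void()>> gMainThreadQueue;

    void LoadImageSketch(const std::string& aUri,
                         const std::unordered_set<std::string>& aCache,
                         std::function<void()> aOnLoad) {
      if (aCache.count(aUri)) {
        aOnLoad();  // hit: fires "sooner than it would have otherwise"
      } else {
        gMainThreadQueue.push(std::move(aOnLoad));  // miss: fires later
      }
    }

    int main() {
      std::unordered_set<std::string> cache{"https://docs.google.com/a.png"};
      LoadImageSketch("https://docs.google.com/a.png", cache,
                      [] { std::cout << "a.png notified during LoadImage\n"; });
      LoadImageSketch("https://docs.google.com/b.png", cache,
                      [] { std::cout << "b.png notified from the event loop\n"; });
      while (!gMainThreadQueue.empty()) {
        gMainThreadQueue.front()();
        gMainThreadQueue.pop();
      }
    }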

Flags: needinfo?(aosmond)
Priority: -- → P3

Thanks!

Not sure if we should leave the bug open or not, but I'm not planning on doing more here.

Keywords: perf-alert
Has Regression Range: --- → yes
Severity: normal → S3