Closed Bug 1528763 Opened 3 years ago Closed 3 years ago

4.49 - 6.7% raptor-tp6-wikia-firefox / raptor-tp6-wikia-firefox fcp / raptor-tp6-wikia-firefox loadtime (linux64, linux64-qr) regression on push 4548b735febd065afacda0473f9185710e9e5895

Categories

(Core :: DOM: Core & HTML, defect, P2)

x86_64
Linux
defect

Tracking


RESOLVED WONTFIX

People

(Reporter: igoldan, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression)

Raptor has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=79120d13e713114032263c0851455ec5f981d72c&tochange=4548b735febd065afacda0473f9185710e9e5895

You are the author of one of the patches included in that push, and we need your help to address this regression.

Regressions:

7% raptor-tp6-wikia-firefox fcp linux64-qr opt 176.08 -> 187.88
6% raptor-tp6-wikia-firefox fcp linux64 opt 177.02 -> 186.79
5% raptor-tp6-wikia-firefox linux64 pgo 155.84 -> 164.01
5% raptor-tp6-wikia-firefox linux64 opt 184.34 -> 193.70
5% raptor-tp6-wikia-firefox loadtime linux64 pgo 174.79 -> 182.83
4% raptor-tp6-wikia-firefox loadtime linux64 opt 188.06 -> 196.50

Improvements:

3% raptor-tp6-slides-firefox fcp linux64 pgo 797.08 -> 773.38
3% raptor-tp6-slides-firefox linux64 pgo 931.02 -> 906.75
2% raptor-tp6-slides-firefox fcp linux64 opt 846.17 -> 826.83
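For reference, the percentages in these lists are just the relative delta between the two reported values; a quick illustrative check in Python (not part of any Mozilla tooling):

```python
def relative_change_pct(before, after):
    # Relative delta between two Raptor scores (lower is better).
    # E.g. the 176.08 -> 187.88 fcp regression above is ~6.7%.
    return (after - before) / before * 100.0

print(round(relative_change_pct(176.08, 187.88), 1))  # -> 6.7
```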

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=19405

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Raptor jobs in a pushlog format.

To learn more about the regressing test(s) or reproducing them, please see: https://wiki.mozilla.org/Performance_sheriffing/Raptor

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling

Component: General → DOM
Product: Testing → Core
Flags: needinfo?(bzbarsky)

Hmm. I had done talos try runs with just bug 1527905 (https://treeherder.mozilla.org/#/jobs?repo=try&revision=64f1cbd052aadbac01f30067c688f70ae5934da8) and with both bug 1527905 and bug 1527742, plus two other patches on top (https://treeherder.mozilla.org/#/jobs?repo=try&revision=c835d307046155c11f91e05798ac50dd685579b8) and neither one seemed to show this problem...

I just took another look at the perfherder compares for those runs, at https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=a37037f00a30dec768c9d6e731f16c017a2bb861&framework=1&selectedTimeRange=172800 and https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=c835d307046155c11f91e05798ac50dd685579b8&framework=1&selectedTimeRange=172800 and these tests don't even show up there. I guess they're at https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=a37037f00a30dec768c9d6e731f16c017a2bb861&framework=10&selectedTimeRange=172800 and https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=c835d307046155c11f91e05798ac50dd685579b8&framework=10&selectedTimeRange=172800 respectively?

So a few questions to help guide my investigation and prevent this sort of thing in the future:

  1. Is there some way to get a "all the performance changes from this push" view instead of having to go through every item in the dropdown on perfherder?

  2. I don't even see "raptor-tp6-wikia-firefox fcp linux64" in the "raptor" section of those try pushes. Is that expected? I do see some other "fcp" tests and do see some other "wikia" tests.

  3. I tried doing some try pushes with "-t raptor-tp6-e10s", but that does not seem to run the tests in question. How do I actually run them on try, so I can figure out whether bug 1527905 or bug 1527742 (or both) is relevant here? I can presumably do a "-t all --rebuild-talos whatever" push, but that seems like a waste of resources if I just want to run the set of tests relevant here.

Flags: needinfo?(igoldan)

Oh, one other question: what do the "raptor-tp6-wikia-firefox" numbers mean? Are those milliseconds? Because the profiles are showing a much longer time period when the content process is busy (700+ms). Are there possibly profile markers at the start and end of the thing the test is measuring?

And also, do we have some way to "diff" profiles?

Flags: needinfo?(mstange)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #2)

Hmm. I had done talos try runs with just bug 1527905 (https://treeherder.mozilla.org/#/jobs?repo=try&revision=64f1cbd052aadbac01f30067c688f70ae5934da8) and with both bug 1527905 and bug 1527742, plus two other patches on top (https://treeherder.mozilla.org/#/jobs?repo=try&revision=c835d307046155c11f91e05798ac50dd685579b8) and neither one seemed to show this problem...

The Try pushes only show the regression overview, via the raptor-tp6-wikia-firefox geomean.
fcp and loadtime are subtests of it and can only be seen in the PERFHERDER_DATA artifact log, in the subtests section.

I just took another look at the perfherder compares for those runs, at https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=a37037f00a30dec768c9d6e731f16c017a2bb861&framework=1&selectedTimeRange=172800 and https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=c835d307046155c11f91e05798ac50dd685579b8&framework=1&selectedTimeRange=172800 and these tests don't even show up there. I guess they're at https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=a37037f00a30dec768c9d6e731f16c017a2bb861&framework=10&selectedTimeRange=172800 and https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=c835d307046155c11f91e05798ac50dd685579b8&framework=10&selectedTimeRange=172800 respectively?

You are right: these are Raptor regressions and to visualize them, you need to select the raptor item from the left dropdown. To see the subtests, just click on the subtests hyperlink next to the raptor-tp6-wikia-firefox geomean.
For example, these are the subtests for raptor-tp6-wikia-firefox on Linux 64bit OPT builds.

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #2)

So a few questions to help guide my investigation and prevent this sort of thing in the future:

  1. Is there some way to get a "all the performance changes from this push" view instead of having to go through every item in the dropdown on perfherder?

There's no other way to do this in Perfherder at the moment. I guess this is a feature request. But until then, there may be an alternative using ActiveData recipes. :ekyle, could you provide more details?

  2. I don't even see "raptor-tp6-wikia-firefox fcp linux64" in the "raptor" section of those try pushes. Is that expected? I do see some other "fcp" tests and do see some other "wikia" tests.

Yes, that's expected. I explained this in comment 5. This is a feature we want to add, as other developers have already requested it.

  3. I tried doing some try pushes with "-t raptor-tp6-e10s", but that does not seem to run the tests in question. How do I actually run them on try, so I can figure out whether bug 1527905 or bug 1527742 (or both) is relevant here? I can presumably do a "-t all --rebuild-talos whatever" push, but that seems like a waste of resources if I just want to run the set of tests relevant here.

I always use mach try fuzzy --no-artifact and look up tests based on "raptor", platform, and build type.
This command-line tool is pretty cool. It allows you to save/list/reuse presets, so you don't have to search for the same thing all over again across multiple pushes.

Flags: needinfo?(igoldan) → needinfo?(klahnakoski)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #4)

And also, do we have some way to "diff" profiles?

We have a way to import two threads from two profiles into one combined profile, but it doesn't do any diffing. This "side-by-side" view can be accessed by sharing the profiles you want to compare, selecting the threads and the ranges you care about in each profile, and then pasting the full URLs of each view into https://profiler.firefox.com/compare/ .

For example, I opened the two profiles from comment 1, went to ..._pagecycle_2.profile in each, went to the marker chart of the content process main thread, found the TTFI marker, double clicked on it, zoomed in on the range, uploaded each profile, and then pasted their URLs to get this:

https://profiler.firefox.com/compare/network-chart/?globalTrackOrder=0-1&profiles[]=https%3A%2F%2Fperf-html.io%2Fpublic%2Fe52144f733da4ddda115bfb57e418be09d8b7621%2Fmarker-chart%2F%3Ffile%3Dprofile_raptor-tp6-wikia-firefox%252Fraptor-tp6-wikia-firefox_pagecycle_2.profile%26globalTrackOrder%3D0-1-2-3-4%26hiddenGlobalTracks%3D1-2-3%26range%3D2.0078_2.4443%26thread%3D5%26v%3D3&profiles[]=https%3A%2F%2Fperf-html.io%2Fpublic%2F6a76d8510058ce2a2ba8f9495dfb2d0069bcffee%2Fmarker-chart%2F%3Ffile%3Dprofile_raptor-tp6-wikia-firefox%252Fraptor-tp6-wikia-firefox_pagecycle_3.profile%26globalTrackOrder%3D0-1-2-3-4%26hiddenGlobalTracks%3D1-2-3%26range%3D2.0088_2.4910%26thread%3D5%26v%3D3&thread=0&v=3

If you switch between the two network tracks, you can see that some loads started later in the slower profile.

We also have a prototype of a page load execution ordering compare view in https://github.com/firefox-devtools/profiler/pull/1702 . Here's that view for the profile above:

https://deploy-preview-1702--perf-html.netlify.com/compare/compare-profiles/?globalTrackOrder=0-1&profiles[]=https%3A%2F%2Fperf-html.io%2Fpublic%2Fe52144f733da4ddda115bfb57e418be09d8b7621%2Fmarker-chart%2F%3Ffile%3Dprofile_raptor-tp6-wikia-firefox%252Fraptor-tp6-wikia-firefox_pagecycle_2.profile%26globalTrackOrder%3D0-1-2-3-4%26hiddenGlobalTracks%3D1-2-3%26range%3D2.0078_2.4443%26thread%3D5%26v%3D3&profiles[]=https%3A%2F%2Fperf-html.io%2Fpublic%2F6a76d8510058ce2a2ba8f9495dfb2d0069bcffee%2Fmarker-chart%2F%3Ffile%3Dprofile_raptor-tp6-wikia-firefox%252Fraptor-tp6-wikia-firefox_pagecycle_3.profile%26globalTrackOrder%3D0-1-2-3-4%26hiddenGlobalTracks%3D1-2-3%26range%3D2.0088_2.4910%26thread%3D5%26v%3D3&thread=1&v=3

We do not have a call tree diff yet. The tracker issue for profile comparison is https://github.com/firefox-devtools/profiler/issues/470 .

Flags: needinfo?(mstange)

ActiveData is usually an hour behind Taskcluster; Perfherder data has less latency.

ActiveData Recipes has no recipe that will show all regressions between two revisions, but there is a recipe that will show a single subtest across all known suites (TP6 only). This recipe works across all branches.

Here is an example, comparing FCP from the regressed revision (abe81) with one from before (1820). It looks like 1820 did not run all the tests, so maybe a better baseline should have been chosen.

 python.exe adr/cli.py perf_tp6_compare --r1 1820 --r2 abe81 --test fcp


 ┌────────────────────────────────┬─────────┬──────────────┬──────────────┬────────┐
 │ Test                           │ Subtest │ 182057bee133 │ abe819688f32 │ Change │
 ├────────────────────────────────┼─────────┼──────────────┼──────────────┼────────┤
 │ raptor-tp6-amazon-firefox      │ fcp     │ None         │ 317          │        │
 │ raptor-tp6-apple-firefox       │ fcp     │ 295          │ 356          │ -21%   │
 │ raptor-tp6-bing-firefox        │ fcp     │ None         │ 99           │        │
 │ raptor-tp6-docs-firefox        │ fcp     │ 1106         │ 1160         │ -5%    │
 │ raptor-tp6-facebook-firefox    │ fcp     │ None         │ 362          │        │
 │ raptor-tp6-google-firefox      │ fcp     │ None         │ 150          │        │
 │ raptor-tp6-imdb-firefox        │ fcp     │ 174          │ 182          │ -4%    │
 │ raptor-tp6-imgur-firefox       │ fcp     │ 271          │ 301          │ -11%   │
 │ raptor-tp6-microsoft-firefox   │ fcp     │ 479          │ 651          │ -36%   │
 │ raptor-tp6-reddit-firefox      │ fcp     │ None         │ 384          │        │
 │ raptor-tp6-sheets-firefox      │ fcp     │ 272          │ 321          │ -18%   │
 │ raptor-tp6-slides-firefox      │ fcp     │ 775          │ 875          │ -13%   │
 │ raptor-tp6-wikia-firefox       │ fcp     │ 173          │ 188          │ -8%    │
 │ raptor-tp6-yahoo-news-firefox  │ fcp     │ None         │ 321          │        │
 │ raptor-tp6-yandex-firefox      │ fcp     │ None         │ 110          │        │
 │ raptor-tp6-youtube-firefox     │ fcp     │ None         │ 363          │        │
 │ raptor-tp6m-amazon-geckoview   │ fcp     │ None         │ 887          │        │
 │ raptor-tp6m-facebook-geckoview │ fcp     │ None         │ 1717         │        │
 │ raptor-tp6m-google-geckoview   │ fcp     │ None         │ 322          │        │
 │ raptor-tp6m-youtube-geckoview  │ fcp     │ None         │ 346          │        │
 └────────────────────────────────┴─────────┴──────────────┴──────────────┴────────┘
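Reading the table, the Change column appears to be the percentage difference relative to the first revision, with regressions shown as negative and the cell left blank when one run is missing; a rough sketch (modulo rounding, function name hypothetical):

```python
def change_pct(r1, r2):
    # Change column from the table above: negative means the value grew
    # (i.e. regressed) from r1 to r2; None means one run is missing.
    if r1 is None or r2 is None:
        return None  # rendered as a blank cell
    return -(r2 - r1) / r1 * 100.0

# wikia fcp: 173 -> 188 is roughly -8.7%, shown as -8% above
print(change_pct(173, 188))
```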

Here is an email on ADR:

All,

We have an ActiveData Recipe (ADR) that accepts two revisions and will
return the subtest values across all suites.

The active-data-recipes project is here:
https://github.com/mozilla/active-data-recipes

You should clone the repo instead of using pip, so you always have the
latest version:
git clone https://github.com/mozilla/active-data-recipes
cd active-data-recipes
python setup.py develop

Be sure to use Python 3.6 or later!
c:\python36\python.exe setup.py develop

The setup installs the ADR CLI:
c:\python36\Scripts\adr.exe

You can list all the recipes:
c:\python36\Scripts\adr.exe -l

We are interested in only perf_tp6_compare recipe:
c:\python36\Scripts\adr.exe perf_tp6_compare -h

Here is an example run; we suspect some facebook breakage between these
two revisions:
c:\python36\Scripts\adr.exe perf_tp6_compare --r1 9e919 --r2 0c6d

Have a good day!

Flags: needinfo?(klahnakoski)

There's no other way to do this in Perfherder at the moment.

OK. I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1529090 on that.

If you switch between the two network tracks, you can see that some loads started later in the slower profile.

Markus, thank you! Unfortunately, I see nothing in the profile that would tell me anything about the "why".

Here is an example, comparing FCP from the regressed revision (abe81) with one from before (1820)

Hmm. This is showing changes, like the one in raptor-tp6-apple-firefox, that weren't in the original report here...

Anyway, I tried doing a few try pushes to try to tease out what's going on here.

  1. A baseline without bug 1527742 or bug 1527905: https://treeherder.mozilla.org/#/jobs?repo=try&revision=938741114c2bea0e61c638b1f04922e02bb7a382

  2. (1) plus bug 1527742: https://treeherder.mozilla.org/#/jobs?repo=try&revision=34400270266b341b06b70e01b6d54d5e74b1eb7a

  3. (2) plus bug 1527905: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3cce2764177baf9b829271fc3fa9b46dbcc83a0d

The relevant perfherder compares should be:

The results don't match up to comment 0 all that well. For example, there are no particular changes in "tp6-slides" but there is a big win from bug 1527905 in "tp6-google-mail" which isn't mentioned in comment 0. The same win shows up in the "both bugs" compare too.

That said, the wikia regression does seem to show up. I assume "fcp" is "first contentful paint"? What's "dcf"?

Do you happen to know how the wikia test actually gets run? Is it loading the same page multiple times, or loading it after some other page? If the latter, which one?

Flags: needinfo?(igoldan)

I guess one question is whether we basically shifted some GC activity around or something, with the wins on some tests and losses on others...

Flags: needinfo?(bzbarsky)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #10)

That said, the wikia regression does seem to show up. I assume "fcp" is "first contentful paint"? What's "dcf"?

"dcf" stands for dom-content-flushed, according to the docs.

Do you happen to know how the wikia test actually gets run? Is it loading the same page multiple times, or loading it after some other page? If the latter, which one?

Robert, could you provide Boris with a summary on this? Maybe even add it to the Raptor wiki page.

Flags: needinfo?(igoldan) → needinfo?(rwood)

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #12)

Do you happen to know how the wikia test actually gets run? Is it loading the same page multiple times, or loading it after some other page? If the latter, which one?

For Raptor page-load tests, each page is loaded 25 times in a row; then the browser is shut down and restarted (with a new profile) before the next test page is started.

https://wiki.mozilla.org/Performance_sheriffing/Raptor#Page-Load_Tests

Flags: needinfo?(rwood)

...and the median of the 25 loads is what is shown in the reports
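So each reported value summarizes the 25 page-load cycles with their median; a toy illustration (numbers made up) of why that makes the summary robust to a single slow load:

```python
import statistics

def raptor_pageload_summary(cycle_values_ms):
    # Raptor-style summary: the median of the per-cycle measurements,
    # so one outlier cycle barely moves the reported number.
    return statistics.median(cycle_values_ms)

cycles = [188, 196, 190, 500, 187]  # one outlier load
print(raptor_pageload_summary(cycles))  # -> 190
```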

Priority: -- → P2

I haven't made any progress on figuring out why using zone-per-load regresses wikia specifically. That said, given that it improves some other tests, and lays the groundwork for bug 1523843, which is a significant performance improvement on various workloads we care about, I am tempted to just write off the regression here...

I don't know who, offhand... But see comment 11 for a likely explanation.

Flags: needinfo?(bzbarsky)
Component: DOM → DOM: Core & HTML
Keywords: stalled
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX