Closed Bug 1528763 Opened 3 years ago Closed 3 years ago

4.49 - 6.7% raptor-tp6-wikia-firefox / raptor-tp6-wikia-firefox fcp / raptor-tp6-wikia-firefox loadtime (linux64, linux64-qr) regression on push 4548b735febd065afacda0473f9185710e9e5895

Categories

(Core :: DOM: Core & HTML, defect, P2)

x86_64
Linux
defect

Tracking


RESOLVED WONTFIX

People

(Reporter: igoldan, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression)

Raptor has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=79120d13e713114032263c0851455ec5f981d72c&tochange=4548b735febd065afacda0473f9185710e9e5895

You are the author of one of the patches included in that push, and we need your help to address this regression.

Regressions:

7% raptor-tp6-wikia-firefox fcp linux64-qr opt 176.08 -> 187.88
6% raptor-tp6-wikia-firefox fcp linux64 opt 177.02 -> 186.79
5% raptor-tp6-wikia-firefox linux64 pgo 155.84 -> 164.01
5% raptor-tp6-wikia-firefox linux64 opt 184.34 -> 193.70
5% raptor-tp6-wikia-firefox loadtime linux64 pgo 174.79 -> 182.83
4% raptor-tp6-wikia-firefox loadtime linux64 opt 188.06 -> 196.50

Improvements:

3% raptor-tp6-slides-firefox fcp linux64 pgo 797.08 -> 773.38
3% raptor-tp6-slides-firefox linux64 pgo 931.02 -> 906.75
2% raptor-tp6-slides-firefox fcp linux64 opt 846.17 -> 826.83
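For reference, the percentages in these lists are just the relative delta between the two reported values; a quick illustrative check in Python (not part of any Mozilla tooling):

```python
def relative_change_pct(before, after):
    # Relative delta between two Raptor scores (lower is better).
    # E.g. the 176.08 -> 187.88 fcp regression above is ~6.7%.
    return (after - before) / before * 100.0

print(round(relative_change_pct(176.08, 187.88), 1))  # -> 6.7
```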

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=19405

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Raptor jobs in a pushlog format.

To learn more about the regressing test(s) or reproducing them, please see: https://wiki.mozilla.org/Performance_sheriffing/Raptor

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling

Component: General → DOM
Product: Testing → Core
Flags: needinfo?(bzbarsky)

Hmm. I had done talos try runs with just bug 1527905 (https://treeherder.mozilla.org/#/jobs?repo=try&revision=64f1cbd052aadbac01f30067c688f70ae5934da8) and with both bug 1527905 and bug 1527742, plus two other patches on top (https://treeherder.mozilla.org/#/jobs?repo=try&revision=c835d307046155c11f91e05798ac50dd685579b8) and neither one seemed to show this problem...

I just took another look at the perfherder compares for those runs, at https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=a37037f00a30dec768c9d6e731f16c017a2bb861&framework=1&selectedTimeRange=172800 and https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=c835d307046155c11f91e05798ac50dd685579b8&framework=1&selectedTimeRange=172800 and these tests don't even show up there. I guess they're at https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=a37037f00a30dec768c9d6e731f16c017a2bb861&framework=10&selectedTimeRange=172800 and https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=c835d307046155c11f91e05798ac50dd685579b8&framework=10&selectedTimeRange=172800 respectively?

So a few questions to help guide my investigation and prevent this sort of thing in the future:

  1. Is there some way to get a "all the performance changes from this push" view instead of having to go through every item in the dropdown on perfherder?

  2. I don't even see "raptor-tp6-wikia-firefox fcp linux64" in the "raptor" section of those try pushes. Is that expected? I do see some other "fcp" tests and do see some other "wikia" tests.

  3. I tried doing some try pushes with "-t raptor-tp6-e10s", but that does not seem to run the tests in question. How do I actually run them on try, so I can figure out whether bug 1527905 or bug 1527742 (or both) is relevant here? I can presumably do a "-t all --rebuild-talos whatever" push, but that seems like a waste of resources if I just want to run the set of tests relevant here.

Flags: needinfo?(igoldan)

Oh, one other question: what do the "raptor-tp6-wikia-firefox" numbers mean? Are those milliseconds? Because the profiles are showing a much longer time period when the content process is busy (700+ms). Are there possibly profile markers at the start and end of the thing the test is measuring?

And also, do we have some way to "diff" profiles?

Flags: needinfo?(mstange)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #2)

Hmm. I had done talos try runs with just bug 1527905 (https://treeherder.mozilla.org/#/jobs?repo=try&revision=64f1cbd052aadbac01f30067c688f70ae5934da8) and with both bug 1527905 and bug 1527742, plus two other patches on top (https://treeherder.mozilla.org/#/jobs?repo=try&revision=c835d307046155c11f91e05798ac50dd685579b8) and neither one seemed to show this problem...

The Try pushes only show the regression overview, via the raptor-tp6-wikia-firefox geomean.
fcp and loadtime are subtests of it and can only be seen in the PERFHERDER_DATA artifact log, in the subtests section.

I just took another look at the perfherder compares for those runs, at https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=a37037f00a30dec768c9d6e731f16c017a2bb861&framework=1&selectedTimeRange=172800 and https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=c835d307046155c11f91e05798ac50dd685579b8&framework=1&selectedTimeRange=172800 and these tests don't even show up there. I guess they're at https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=a37037f00a30dec768c9d6e731f16c017a2bb861&framework=10&selectedTimeRange=172800 and https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=c835d307046155c11f91e05798ac50dd685579b8&framework=10&selectedTimeRange=172800 respectively?

You are right: these are Raptor regressions and to visualize them, you need to select the raptor item from the left dropdown. To see the subtests, just click on the subtests hyperlink next to the raptor-tp6-wikia-firefox geomean.
For example, these are the subtests for raptor-tp6-wikia-firefox on Linux 64bit OPT builds.

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #2)

So a few questions to help guide my investigation and prevent this sort of thing in the future:

  1. Is there some way to get a "all the performance changes from this push" view instead of having to go through every item in the dropdown on perfherder?

There's no other way to do this in Perfherder at the moment. I guess this is a feature request. But until then, there may be an alternative using ActiveData recipes. :ekyle, could you provide more details?

  2. I don't even see "raptor-tp6-wikia-firefox fcp linux64" in the "raptor" section of those try pushes. Is that expected? I do see some other "fcp" tests and do see some other "wikia" tests.

Yes, that's expected. I explained this in comment 5. This is a feature we want to add, as other developers have already requested it.

  3. I tried doing some try pushes with "-t raptor-tp6-e10s", but that does not seem to run the tests in question. How do I actually run them on try, so I can figure out whether bug 1527905 or bug 1527742 (or both) is relevant here? I can presumably do a "-t all --rebuild-talos whatever" push, but that seems like a waste of resources if I just want to run the set of tests relevant here.

I always use mach try fuzzy --no-artifact and look up tests based on "raptor", platform, and build type.
This command-line tool is pretty cool. It allows you to save/list/reuse presets, so you don't have to search for the same thing all over again across multiple pushes.

Flags: needinfo?(igoldan) → needinfo?(klahnakoski)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #4)

And also, do we have some way to "diff" profiles?

We have a way to import two threads from two profiles into one combined profile, but it doesn't do any diffing. This "side-by-side" view can be accessed by sharing the profiles you want to compare, selecting the threads and the ranges you care about in each profile, and then pasting the full URLs of each view into https://profiler.firefox.com/compare/ .

For example, I opened the two profiles from comment 1, went to ..._pagecycle_2.profile in each, went to the marker chart of the content process main thread, found the TTFI marker, double clicked on it, zoomed in on the range, uploaded each profile, and then pasted their URLs to get this:

https://profiler.firefox.com/compare/network-chart/?globalTrackOrder=0-1&profiles[]=https%3A%2F%2Fperf-html.io%2Fpublic%2Fe52144f733da4ddda115bfb57e418be09d8b7621%2Fmarker-chart%2F%3Ffile%3Dprofile_raptor-tp6-wikia-firefox%252Fraptor-tp6-wikia-firefox_pagecycle_2.profile%26globalTrackOrder%3D0-1-2-3-4%26hiddenGlobalTracks%3D1-2-3%26range%3D2.0078_2.4443%26thread%3D5%26v%3D3&profiles[]=https%3A%2F%2Fperf-html.io%2Fpublic%2F6a76d8510058ce2a2ba8f9495dfb2d0069bcffee%2Fmarker-chart%2F%3Ffile%3Dprofile_raptor-tp6-wikia-firefox%252Fraptor-tp6-wikia-firefox_pagecycle_3.profile%26globalTrackOrder%3D0-1-2-3-4%26hiddenGlobalTracks%3D1-2-3%26range%3D2.0088_2.4910%26thread%3D5%26v%3D3&thread=0&v=3

If you switch between the two network tracks, you can see that some loads started later in the slower profile.

We also have a prototype of a page load execution ordering compare view in https://github.com/firefox-devtools/profiler/pull/1702 . Here's that view for the profile above:

https://deploy-preview-1702--perf-html.netlify.com/compare/compare-profiles/?globalTrackOrder=0-1&profiles[]=https%3A%2F%2Fperf-html.io%2Fpublic%2Fe52144f733da4ddda115bfb57e418be09d8b7621%2Fmarker-chart%2F%3Ffile%3Dprofile_raptor-tp6-wikia-firefox%252Fraptor-tp6-wikia-firefox_pagecycle_2.profile%26globalTrackOrder%3D0-1-2-3-4%26hiddenGlobalTracks%3D1-2-3%26range%3D2.0078_2.4443%26thread%3D5%26v%3D3&profiles[]=https%3A%2F%2Fperf-html.io%2Fpublic%2F6a76d8510058ce2a2ba8f9495dfb2d0069bcffee%2Fmarker-chart%2F%3Ffile%3Dprofile_raptor-tp6-wikia-firefox%252Fraptor-tp6-wikia-firefox_pagecycle_3.profile%26globalTrackOrder%3D0-1-2-3-4%26hiddenGlobalTracks%3D1-2-3%26range%3D2.0088_2.4910%26thread%3D5%26v%3D3&thread=1&v=3

We do not have a call tree diff yet. The tracker issue for profile comparison is https://github.com/firefox-devtools/profiler/issues/470 .

Flags: needinfo?(mstange)

ActiveData is usually an hour behind Taskcluster; Perfherder data has less latency.

ActiveData Recipes has no recipe that will show all regressions between two revisions, but there is a recipe that will show a single subtest across all known suites (TP6 only). This recipe works across all branches.

Here is an example, comparing FCP from the regressed revision (abe81) with one from before (1820). It looks like 1820 did not run all the tests, so maybe a better baseline should have been chosen.

 python.exe adr/cli.py perf_tp6_compare --r1 1820 --r2 abe81 --test fcp


 ┌────────────────────────────────┬─────────┬──────────────┬──────────────┬────────┐
 │ Test                           │ Subtest │ 182057bee133 │ abe819688f32 │ Change │
 ├────────────────────────────────┼─────────┼──────────────┼──────────────┼────────┤
 │ raptor-tp6-amazon-firefox      │ fcp     │ None         │ 317          │        │
 │ raptor-tp6-apple-firefox       │ fcp     │ 295          │ 356          │ -21%   │
 │ raptor-tp6-bing-firefox        │ fcp     │ None         │ 99           │        │
 │ raptor-tp6-docs-firefox        │ fcp     │ 1106         │ 1160         │ -5%    │
 │ raptor-tp6-facebook-firefox    │ fcp     │ None         │ 362          │        │
 │ raptor-tp6-google-firefox      │ fcp     │ None         │ 150          │        │
 │ raptor-tp6-imdb-firefox        │ fcp     │ 174          │ 182          │ -4%    │
 │ raptor-tp6-imgur-firefox       │ fcp     │ 271          │ 301          │ -11%   │
 │ raptor-tp6-microsoft-firefox   │ fcp     │ 479          │ 651          │ -36%   │
 │ raptor-tp6-reddit-firefox      │ fcp     │ None         │ 384          │        │
 │ raptor-tp6-sheets-firefox      │ fcp     │ 272          │ 321          │ -18%   │
 │ raptor-tp6-slides-firefox      │ fcp     │ 775          │ 875          │ -13%   │
 │ raptor-tp6-wikia-firefox       │ fcp     │ 173          │ 188          │ -8%    │
 │ raptor-tp6-yahoo-news-firefox  │ fcp     │ None         │ 321          │        │
 │ raptor-tp6-yandex-firefox      │ fcp     │ None         │ 110          │        │
 │ raptor-tp6-youtube-firefox     │ fcp     │ None         │ 363          │        │
 │ raptor-tp6m-amazon-geckoview   │ fcp     │ None         │ 887          │        │
 │ raptor-tp6m-facebook-geckoview │ fcp     │ None         │ 1717         │        │
 │ raptor-tp6m-google-geckoview   │ fcp     │ None         │ 322          │        │
 │ raptor-tp6m-youtube-geckoview  │ fcp     │ None         │ 346          │        │
 └────────────────────────────────┴─────────┴──────────────┴──────────────┴────────┘
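Reading the table, the Change column appears to be the percentage difference relative to the first revision, with regressions shown as negative and the cell left blank when one run is missing; a rough sketch (modulo rounding, function name hypothetical):

```python
def change_pct(r1, r2):
    # Change column from the table above: negative means the value grew
    # (i.e. regressed) from r1 to r2; None means one run is missing.
    if r1 is None or r2 is None:
        return None  # rendered as a blank cell
    return -(r2 - r1) / r1 * 100.0

# wikia fcp: 173 -> 188 is roughly -8.7%, shown as -8% above
print(change_pct(173, 188))
```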

Here is an email on ADR:

All,

We have an ActiveData Recipe (ADR) that accepts two revisions and will
return the subtest values across all suites.

The active-data-recipes project is here:
https://github.com/mozilla/active-data-recipes

You should clone the repo instead of using pip, so you always have the
latest version:
git clone https://github.com/mozilla/active-data-recipes
cd active-data-recipes
python setup.py develop

Be sure to use Python 3.6 or later!
c:\python36\python.exe setup.py develop

The setup installs the ADR CLI:
c:\python36\Scripts\adr.exe

You can list all the recipes:
c:\python36\Scripts\adr.exe -l

We are interested in only perf_tp6_compare recipe:
c:\python36\Scripts\adr.exe perf_tp6_compare -h

Here is an example run; we suspect some facebook breakage between these
two revisions:
c:\python36\Scripts\adr.exe perf_tp6_compare --r1 9e919 --r2 0c6d

Have a good day!

Flags: needinfo?(klahnakoski)

There's no other way to do this in Perfherder at the moment.

OK. I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1529090 on that.

If you switch between the two network tracks, you can see that some loads started later in the slower profile.

Markus, thank you! Unfortunately, I see nothing in the profile that would tell me anything about the "why".

Here is an example, comparing FCP from the regressed revision (abe81) with one from before (1820)

Hmm. This is showing changes, like the one in raptor-tp6-apple-firefox, that weren't in the original report here...

Anyway, I tried doing a few try pushes to try to tease out what's going on here.

  1. A baseline without bug 1527742 or bug 1527905: https://treeherder.mozilla.org/#/jobs?repo=try&revision=938741114c2bea0e61c638b1f04922e02bb7a382

  2. (1) plus bug 1527742: https://treeherder.mozilla.org/#/jobs?repo=try&revision=34400270266b341b06b70e01b6d54d5e74b1eb7a

  3. (2) plus bug 1527905: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3cce2764177baf9b829271fc3fa9b46dbcc83a0d

The relevant perfherder compares should be:

The results don't match up to comment 0 all that well. For example, there are no particular changes in "tp6-slides" but there is a big win from bug 1527905 in "tp6-google-mail" which isn't mentioned in comment 0. The same win shows up in the "both bugs" compare too.

That said, the wikia regression does seem to show up. I assume "fcp" is "first contentful paint"? What's "dcf"?

Do you happen to know how the wikia test actually gets run? Is it loading the same page multiple times, or loading it after some other page? If the latter, which one?

Flags: needinfo?(igoldan)

I guess one question is whether we basically shifted some GC activity around or something, with the wins on some tests and losses on others...

Flags: needinfo?(bzbarsky)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #10)

That said, the wikia regression does seem to show up. I assume "fcp" is "first contentful paint"? What's "dcf"?

"dcf" stands for dom-content-flushed, according to the docs.

Do you happen to know how the wikia test actually gets run? Is it loading the same page multiple times, or loading it after some other page? If the latter, which one?

Robert, could you provide Boris with a summary on this? Maybe even add it to the Raptor wiki page.

Flags: needinfo?(igoldan) → needinfo?(rwood)

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #12)

Do you happen to know how the wikia test actually gets run? Is it loading the same page multiple times, or loading it after some other page? If the latter, which one?

For Raptor page-load tests, each page is loaded 25 times in a row; then the browser is shut down and restarted (with a new profile) before the next test page is started.

https://wiki.mozilla.org/Performance_sheriffing/Raptor#Page-Load_Tests

Flags: needinfo?(rwood)

...and the median of the 25 loads is what is shown in the reports
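So each reported value summarizes the 25 page-load cycles with their median; a toy illustration (numbers made up) of why that makes the summary robust to a single slow load:

```python
import statistics

def raptor_pageload_summary(cycle_values_ms):
    # Raptor-style summary: the median of the per-cycle measurements,
    # so one outlier cycle barely moves the reported number.
    return statistics.median(cycle_values_ms)

cycles = [188, 196, 190, 500, 187]  # one outlier load
print(raptor_pageload_summary(cycles))  # -> 190
```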

Priority: -- → P2

I haven't made any progress on figuring out why using zone-per-load regresses wikia specifically. That said, given that it improves some other tests, and lays the groundwork for bug 1523843, which is a significant performance improvement on various workloads we care about, I am tempted to just write off the regression here...

I don't know who, offhand... But see comment 11 for a likely explanation.

Flags: needinfo?(bzbarsky)
Component: DOM → DOM: Core & HTML
Keywords: stalled
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX