5.13 - 13.97% tp5n main_startup_fileio / tp5n nonmain_startup_fileio / tp5n time_to_session_store_window_restored_ms (windows7-32) regression on push 346002ff619eef0712be55594102460b40f44db1 (Fri Mar 8 2019)

VERIFIED FIXED in Firefox 67

Status

defect
VERIFIED FIXED
5 months ago
4 months ago

People

(Reporter: Bebe, Assigned: glandium)

Tracking

(Blocks 1 bug, {perf, regression, talos-regression})

unspecified
mozilla68
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox66 unaffected, firefox67 fixed, firefox68 fixed)

Details

Attachments

(1 attachment)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=346002ff619eef0712be55594102460b40f44db1

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

14% tp5n nonmain_startup_fileio windows7-32 pgo e10s stylo 2,619,678.67 -> 2,985,549.00
9% tp5n time_to_session_store_window_restored_ms windows7-32 pgo e10s stylo 758.15 -> 825.84
5% tp5n main_startup_fileio windows7-32 pgo e10s stylo 97,353,382.17 -> 102,347,779.67

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=19823

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling

Product: Testing → Firefox Build System
Version: Version 3 → unspecified

14% tp5n nonmain_startup_fileio windows7-32 pgo e10s stylo 2,619,678.67 -> 2,985,549.00
5% tp5n main_startup_fileio windows7-32 pgo e10s stylo 97,353,382.17 -> 102,347,779.67

Those two are pretty much expected, because we turn implicit I/O into explicit I/O on purpose. That's the reason we have a whitelist of I/O in the talos xperf test in testing/talos/talos/xtalos/xperf_whitelist.json. The whitelist doesn't apply to tp5, which is why this appears as a regression.

9% tp5n time_to_session_store_window_restored_ms windows7-32 pgo e10s stylo 758.15 -> 825.84

This one, however, is interesting. And I think it goes along something I noticed while working on restoring the jar preloading in bug 1529894: We're preloading according to a log of a first-run, but in a second-run, we actually need less because other stuff comes from the startup cache in the profile. I've been thinking about adding another layer of ordering to account for that. But maybe a short term reasonable thing to do is to just remove jar readahead on Windows 7 (because it's actual explicit I/O) but keep it on Windows 8+ where it's a hint given to the OS, like on other platforms. Aaron, what do you think?

Flags: needinfo?(aklotz)
Assignee: nobody → mh+mozilla

(In reply to Mike Hommey [:glandium] from comment #1)

9% tp5n time_to_session_store_window_restored_ms windows7-32 pgo e10s stylo 758.15 -> 825.84

This one, however, is interesting. And I think it goes along something I noticed while working on restoring the jar preloading in bug 1529894: We're preloading according to a log of a first-run, but in a second-run, we actually need less because other stuff comes from the startup cache in the profile. I've been thinking about adding another layer of ordering to account for that.

I like that.

But maybe a short term reasonable thing to do is to just remove jar readahead on Windows 7 (because it's actual explicit I/O) but keep it on Windows 8+ where it's a hint given to the OS, like on other platforms. Aaron, what do you think?

It's pretty hard to argue with the time_to_session_store_window_restored_ms numbers, so sure.

Flags: needinfo?(aklotz)

Re-enabling the PGO jarlog, which was unexpectedly disabled since Firefox 56
showed a regression on Windows 7, due to the use of mozilla::ReadAhead,
which on Windows 7 does explicit I/O on the caller thread.

While there may be some benefit from doing so, evidence says the
opposite, which is presumably due to the amount of data being loaded not
being relevant in every case: the jarlog is gathered from a first-run,
which has a different jar-loading profile from subsequent runs of
Firefox.

While we may want to improve the situation later on, the immediate thing
we can do is stop doing this explicit read, while keeping the OS
readahead hints on other platforms, which don't imply explicit I/O.

All this does is effectively get us back to the same state as if jarlogs
were disabled like it was since Firefox 56, for Windows 7 only.

aFd not being used anymore, the code could be cleaned up a lot, but we
may reintroduce the readahead later, so keep the status quo for now.

Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/33b7a55cdc20
Do not readahead when PrefetchVirtualMemory is not available. r=aklotz
Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

Please nominate this for Beta uplift when you get a chance.

Flags: needinfo?(mh+mozilla)

Comment on attachment 9051224 [details]
Bug 1534209 - Do not readahead when PrefetchVirtualMemory is not available.

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: Bug 1529894
  • User impact if declined: Regression in startup time on Windows 7
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This removes part of some code that was modified in bug 1529894, and wasn't used before that since Firefox 56.
  • String changes made/needed: N/A
Flags: needinfo?(mh+mozilla)
Attachment #9051224 - Flags: approval-mozilla-beta?

Comment on attachment 9051224 [details]
Bug 1534209 - Do not readahead when PrefetchVirtualMemory is not available.

Fix for a perf regression in our build system already on Nightly, uplift approved for our next beta. Thanks.

Attachment #9051224 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

I confirm this got fixed! \0/

== Change summary for alert #19961 (as of Tue, 19 Mar 2019 16:54:57 GMT) ==

Improvements:

56% tp5n main_normal_fileio windows7-32 pgo e10s stylo 1,207,775.83 -> 534,550.33
9% tp5n nonmain_startup_fileio windows7-32 pgo e10s stylo 2,943,320.08 -> 2,674,200.08
5% tp5n time_to_session_store_window_restored_ms windows7-32 pgo e10s stylo 815.83 -> 773.52
5% tp5n main_startup_fileio windows7-32 pgo e10s stylo 102,560,394.67 -> 97,717,170.00

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=19961

Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.