Closed Bug 1429319 Opened 2 years ago Closed 2 years ago

2.05 - 36.55% Multiple platform_microbenchmark test (windows10-64, windows7-32) regressions on push 3ede11fe526eed5f34040399dfaed3af8f1e7c71 (Thu Jan 4 2018)

Categories

(Infrastructure & Operations :: SRE, defect)

Unspecified
Windows
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: igoldan, Unassigned)

Details

(Keywords: perf, regression)

We have detected a platform microbenchmarks regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=3ede11fe526eed5f34040399dfaed3af8f1e7c71

As you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 37%  Strings PerfStripWhitespace                 windows10-64  opt  116,022.00 -> 158,428.00
 13%  Stylo Gecko_nsCSSParser_ParseSheet_Bench    windows7-32   opt   73,231.21 ->  82,789.25
 12%  Stylo Servo_StyleSheet_FromUTF8Bytes_Bench  windows7-32   opt   71,707.29 ->  80,128.25
 11%  Stylo Servo_StyleSheet_FromUTF8Bytes_Bench  windows10-64  opt   63,095.21 ->  70,302.64
 11%  Stylo Gecko_nsCSSParser_ParseSheet_Bench    windows10-64  opt   60,969.83 ->  67,773.71
 10%  Strings PerfStripCRLF                       windows10-64  opt   83,435.83 ->  91,606.42
  2%  TestStandardURL NormalizePerf               windows7-32   opt   73,572.17 ->  75,078.83
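The percentages in the first column appear to be the magnitude of the relative change from the old value to the new one, |new - old| / old. Below is a minimal sketch reproducing them from the table, assuming that formula and assuming lower scores are better (the alert treats increases as regressions). The names `REGRESSIONS` and `pct_change` are illustrative, not part of Perfherder.

```python
# Old -> new values copied from the regression table (alert 11073).
# Assumption: lower is better for these benchmarks, so an increase is a regression.
REGRESSIONS = {
    ("Strings PerfStripWhitespace", "windows10-64"): (116_022.00, 158_428.00),
    ("Stylo Gecko_nsCSSParser_ParseSheet_Bench", "windows7-32"): (73_231.21, 82_789.25),
    ("Stylo Servo_StyleSheet_FromUTF8Bytes_Bench", "windows7-32"): (71_707.29, 80_128.25),
    ("Stylo Servo_StyleSheet_FromUTF8Bytes_Bench", "windows10-64"): (63_095.21, 70_302.64),
    ("Stylo Gecko_nsCSSParser_ParseSheet_Bench", "windows10-64"): (60_969.83, 67_773.71),
    ("Strings PerfStripCRLF", "windows10-64"): (83_435.83, 91_606.42),
    ("TestStandardURL NormalizePerf", "windows7-32"): (73_572.17, 75_078.83),
}


def pct_change(old: float, new: float) -> float:
    """Magnitude of the relative change from old to new, in percent (assumed formula)."""
    return abs(new - old) / old * 100.0


# Print each test's change with two decimals, e.g. the 36.55% for PerfStripWhitespace.
for (test, platform), (old, new) in REGRESSIONS.items():
    print(f"{pct_change(old, new):6.2f}%  {test} {platform}")
```

Rounded to whole percents these match the first column of the table, and the smallest and largest values (2.05% and 36.55%) are the range quoted in the bug summary.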


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=11073

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Automated_Performance_Testing_and_Sheriffing/Platform_Microbenchmarks
I have been investigating these regressions for about a week now. They are not related to any in-tree changes.
I retriggered all of the tests above on older changesets, some dating back to December 8th. The new values closely match the regressed levels above, so this looks like an infrastructure change.

The same behavior also shows up in the build time metrics, for which :gps suggested two possible causes:

* TaskCluster rolling out a new AMI (which is slower for some reason). grenade: did you roll out anything last week?
* Spectre and Meltdown patches being applied by AWS. We know the mitigations will make machines slower, but the impact is workload-dependent and it is unclear what it will be for Firefox builds.

I'm inclined to attribute these regressions to the causes :gps suggested.
Blocks: 1429325
Recent Windows AMI updates:

- gecko-t-win10-64-gpu: 04/01/2018
  https://github.com/mozilla-releng/OpenCloudConfig/commit/289627fcf95d9e9ec7cefb559f43f9e411626ed9
- gecko-3-b-win2012: 08/12/2017
  https://github.com/mozilla-releng/OpenCloudConfig/commit/4bb45fd6d9860d047d18a9d8f8953e016a0e0f55
- gecko-1-b-win2012 & gecko-1-b-win2012: 07/12/2017
  https://github.com/mozilla-releng/OpenCloudConfig/commit/9d038b8532b25ade976cdff47661f9e91960f7d9

Additionally, all Windows instances were updated to use the "high performance" power plan (see bug 1362613). This change would affect any instance booted after December 13th. No AMI update was required for this change, as instances would have picked up the config on boot. https://github.com/mozilla-releng/OpenCloudConfig/commit/d1e4a1e06989e46dda64522c050e2a5b1e2e3379
Component: Untriaged → Infrastructure: AWS
Product: Firefox → Infrastructure & Operations
QA Contact: cshields
I filed this bug under the Infrastructure & Operations :: Infrastructure: AWS component, based on :gps' comment [1] in a similar bug.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1429311#c2
And it appears the perf regressions have returned to previous baseline as of a few days ago. Good times.
(In reply to Gregory Szorc [:gps] from comment #4)
> And it appears the perf regressions have returned to previous baseline as of
> a few days ago. Good times.

Yes, it looks like it did. Marking this bug as resolved.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
These results confirm that the baselines have returned to normal:

== Change summary for alert #11160 (as of Fri, 12 Jan 2018 08:36:28 GMT) ==

Improvements:

 26%  Strings PerfStripWhitespace                 windows10-64  opt  160,245.67 -> 118,289.67
 11%  Stylo Servo_StyleSheet_FromUTF8Bytes_Bench  windows7-32   opt   79,586.58 ->  71,181.50
 10%  Stylo Gecko_nsCSSParser_ParseSheet_Bench    windows7-32   opt   81,710.83 ->  73,613.08
  8%  Strings PerfIsASCIIHundred                  windows7-32   opt    3,070.33 ->   2,816.15
  2%  TestStandardURL NormalizePerf               windows7-32   opt   75,047.38 ->  73,416.92
  2%  Strings PerfIsUTF8Example3                  windows7-32   opt    8,021.84 ->   7,860.23
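Because each alert measures change relative to its own starting value, a regression of about 37% that fully reverts shows up as only a ~26% improvement. A fairer way to confirm the baselines really returned is to compare the recovered values with the original pre-regression ones. Below is a minimal sketch using the four tests that appear in both alerts on the same platform. The 2% tolerance and the helper names are hypothetical choices for illustration, not Perfherder settings.

```python
# Pre-regression values (alert 11073) and post-recovery values (alert 11160)
# for the tests that appear in both alerts on the same platform.
BEFORE_AND_AFTER = {
    ("Strings PerfStripWhitespace", "windows10-64"): (116_022.00, 118_289.67),
    ("Stylo Servo_StyleSheet_FromUTF8Bytes_Bench", "windows7-32"): (71_707.29, 71_181.50),
    ("Stylo Gecko_nsCSSParser_ParseSheet_Bench", "windows7-32"): (73_231.21, 73_613.08),
    ("TestStandardURL NormalizePerf", "windows7-32"): (73_572.17, 73_416.92),
}

# Hypothetical tolerance for "back to baseline"; not a Perfherder threshold.
TOLERANCE_PCT = 2.0


def drift_pct(before: float, after: float) -> float:
    """Signed change of the recovered value relative to the original baseline, in percent."""
    return (after - before) / before * 100.0


def back_to_baseline(before: float, after: float, tol: float = TOLERANCE_PCT) -> bool:
    """True if the recovered value is within tol percent of the original baseline."""
    return abs(drift_pct(before, after)) <= tol


for (test, platform), (before, after) in BEFORE_AND_AFTER.items():
    d = drift_pct(before, after)
    print(f"{d:+6.2f}%  {test} {platform}  ok={back_to_baseline(before, after)}")
```

With these numbers every recovered value is within about 2% of its pre-regression value; PerfStripWhitespace on windows10-64 shows the largest remaining difference, at roughly +1.95%.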

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=11160