26.63% startup_about_home_paint_realworld_webextensions (macosx1014-64-shippable) regression on push 6faac02d6d1ffe5cf2023f70c38f90b50948eb65 (Mon October 21 2019)
Categories
(Infrastructure & Operations :: RelOps: Posix OS, defect)
Tracking
(Not tracked)
People
(Reporter: marauder, Unassigned)
Talos has detected a Firefox performance regression around October 21st.
I did several retriggers and backfills, and the regression kept showing up on earlier and earlier pushes.
When I saw that pattern, I decided to do 2 retriggers on two datapoints from the past:
2233060e1f08a - October 15th
09f5cd302da54 - October 11th
and the regression was there too.
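The reasoning above can be sketched as a small check: if old revisions that originally scored near the baseline come back slow when re-run today, the code did not change, so the environment is the likelier cause. This is only an illustration; the function name, the 5% threshold, and the sample re-run values are assumptions, not Perfherder behaviour or real measurements.

```python
def looks_like_infra_change(baseline, rerun_values, threshold_pct=5.0):
    """Return True if every re-run of an old revision is slower than
    the baseline by more than threshold_pct percent.

    The threshold is a hypothetical value chosen for illustration.
    """
    if not rerun_values:
        return False  # no re-runs, nothing to conclude
    return all(
        (value - baseline) / baseline * 100.0 > threshold_pct
        for value in rerun_values
    )


# Baseline taken from the alert; the re-run values are made up.
print(looks_like_infra_change(1833.46, [2310.0, 2325.0]))
print(looks_like_infra_change(1833.46, [1840.0, 2325.0]))
```

A code regression would leave the old revisions at their original scores, so the second call is the pattern one would expect in that case.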
Regressions:
27% startup_about_home_paint_realworld_webextensions macosx1014-64-shippable opt e10s stylo 1,833.46 -> 2,321.67
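For reference, the percentage in the title follows from the two values on the line above. A minimal sketch of the arithmetic (the function name is just for illustration; the values are the before and after scores from the alert, where a higher startup time is worse):

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


before, after = 1833.46, 2321.67
print(f"{percent_change(before, after):.2f}%")  # prints 26.63%
```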
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=23519
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test(s), please see: https://wiki.mozilla.org/TestEngineering/Performance/Talos
For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/TestEngineering/Performance/Talos/Running
Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/TestEngineering/Performance/Talos/RegressionBugsHandling
Reporter
Comment 1•5 years ago
I think this is an infra change. Can someone confirm this?
Thank you!
Comment 2•5 years ago
I've moved this over to the correct component for macOS infra, though from looking at the patch you cite it doesn't look like an infra issue. NI'ing the patch author to comment and potentially move this bug.
Comment 3•5 years ago
The patch in the push of comment 0 does not touch any release code. The code patched there is fuzzing-only and cannot be the source of your regression.
Comment 4•5 years ago
(In reply to Christian Holler (:decoder) from comment #3)
> The patch in the push of comment 0 does not touch any release code. The code patched there is fuzzing-only and cannot be the source of your regression.
The patch from comment 0 isn't relevant. As perf sheriffs, we're interested to know whether there were any infra changes made on OSX platforms around October 20-21.
Marian, please update comment 0 by deleting that commit URL (it's indeed confusing) and replacing it with the approximate date when the infra changes were first noticed in our graphs.
Reporter
Comment 5•5 years ago
I updated the first comment to better explain what is happening.
Comment 6
This is likely caused by my changes to the MacOS systems on Oct 21st.
I changed the log forwarding configuration to forward fewer log entries: entries below error level are now filtered out for all processes other than generic-worker, kernel, and sudo. I also added monitoring, which I disabled that afternoon (I checked across the machines this morning and confirmed the monitoring (telegraf) service is not running).
I'll try running these tests on a staging worker with the logging changed back to how it was previously.
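To make the logging rule described above concrete, here is a rough sketch of the filter as a predicate. It is not the actual ronin_puppet configuration; the function name, the severity table (standard syslog numbering, where lower means more severe), and the example calls are assumptions for illustration.

```python
# Standard syslog severities: a lower number is more severe.
SEVERITY = {
    "emergency": 0, "alert": 1, "critical": 2, "error": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}

# Processes whose entries are forwarded at every level.
ALWAYS_FORWARD = {"generic-worker", "kernel", "sudo"}


def should_forward(process: str, level: str) -> bool:
    """Hypothetical predicate mirroring the described filter."""
    if process in ALWAYS_FORWARD:
        return True
    # Everything else is forwarded only at error level or above.
    return SEVERITY[level] <= SEVERITY["error"]


print(should_forward("generic-worker", "debug"))  # True
print(should_forward("telegraf", "info"))         # False
print(should_forward("telegraf", "error"))        # True
```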
Comment 7•5 years ago
Dave, could you link the bug in question here? Or provide a link to the PR, so we can properly conclude this bug?
Comment 8
(In reply to Ionuț Goldan [:igoldan], Performance Sheriff from comment #7)
> Dave, could you link the bug in question here? Or provide a link to the PR, so we can properly conclude this bug?
Ionut, thanks! Here is the PR: https://github.com/mozilla-platform-ops/ronin_puppet/pull/126
and the bug is https://bugzilla.mozilla.org/show_bug.cgi?id=1585750
Comment 9
(In reply to Dave House [:dhouse] from comment #8)
> (In reply to Ionuț Goldan [:igoldan], Performance Sheriff from comment #7)
> > Dave, could you link the bug in question here? Or provide a link to the PR, so we can properly conclude this bug?
> Ionut, thanks! Here is the PR: https://github.com/mozilla-platform-ops/ronin_puppet/pull/126
> and the bug is https://bugzilla.mozilla.org/show_bug.cgi?id=1585750
That PR was for turning off the monitoring to fix the test failures (over-running time) on the 21st.
These are the changes applied that likely caused the problem:
logging change: https://github.com/mozilla-platform-ops/ronin_puppet/pull/125 (still active in prod)
monitoring service: https://github.com/mozilla-platform-ops/ronin_puppet/pull/118 (disabled by above pr)
Reporter
Comment 10•4 years ago
I'll close this as fixed.