Closed Bug 1131670 Opened 6 years ago Closed 6 years ago

5-10% linux* talos regressions seen around Feb 8th

Categories

(Testing :: Talos, defect)

All
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

We got a handful of Linux-related Talos alerts in glterrain, svg-asap, and tart/cart on all branches from this past weekend. Most of the alerts point to Sunday, Feb 8th. But when verifying that this is the real regression and that it is sustained, we had questions.

What we did is retrigger a bunch of jobs:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&filter-searchStr=linux talos svgr&startdate=2015-02-04&enddate=2015-02-08

The value for cart is <19.00 up until Feb 8th (commit was 12:50 PDT), where the value is >20.00 (usually above 21.00). After retriggering back to Feb 5th, all retriggered jobs report the new higher value.
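As a rough sanity check, the cart thresholds quoted above line up with the 5-10% range in the bug title. The sketch below is only illustrative: 19.00, 20.00 and 21.00 are the bounds mentioned in this comment, not actual measurements, and `percent_change` is a hypothetical helper name.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent (higher is worse for cart)."""
    return (new - old) / old * 100.0


# Bounds quoted in the comment, not real data points.
baseline = 19.00
for new_value in (20.00, 21.00):
    change = percent_change(baseline, new_value)
    print(f"{baseline:.2f} -> {new_value:.2f}: +{change:.1f}%")
```

Since the real baseline is somewhat below 19.00, the actual change would be at least this large.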

I see this on aurora and fx-team for other regressions as well; here are the alerts I see:
http://alertmanager.allizom.org:8080/alerts.html?rev=6896b086d022&showAll=1&testIndex=0&platIndex=0
http://alertmanager.allizom.org:8080/alerts.html?rev=85477b7fa1d6&showAll=1&testIndex=0&platIndex=0
http://alertmanager.allizom.org:8080/alerts.html?rev=1cfec4aaa453&showAll=1&testIndex=0&platIndex=0
http://alertmanager.allizom.org:8080/alerts.html?rev=009346e87007&showAll=1&testIndex=0&platIndex=0
http://alertmanager.allizom.org:8080/alerts.html?rev=ab375fe6ca9f&showAll=1&testIndex=0&platIndex=0

From chatting on IRC, this appears to be related (although a couple of days late) to a kernel upgrade. There are plans to downgrade it.
The Ubuntu kernel upgrade landed on the 5th (https://bugzilla.mozilla.org/show_bug.cgi?id=1113328#c15) but "kicked in" gradually over the weekend.

talos-linux64-ix-010: Start-Date: 2015-02-06  16:35:54
talos-linux64-ix-011: Start-Date: 2015-02-08  02:30:22
talos-linux64-ix-012: Start-Date: 2015-02-05  19:48:13
talos-linux64-ix-013: Start-Date: 2015-02-06  03:49:50
talos-linux64-ix-014: Start-Date: 2015-02-06  02:12:59
talos-linux64-ix-015: Start-Date: 2015-02-07  09:08:56
talos-linux64-ix-016: Start-Date: 2015-02-05  20:34:50
talos-linux64-ix-017: Start-Date: 2015-02-06  02:48:39
talos-linux64-ix-018: Start-Date: 2015-02-05  14:24:09
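To see how gradually the change rolled out, the list above can be sorted and grouped by day. This is a minimal sketch, assuming each `Start-Date` stamp marks when that machine picked up the upgraded kernel; the time zone is not stated in the list, and `parse_start_dates` is a hypothetical helper, not part of any Talos tooling.

```python
from collections import Counter
from datetime import datetime

# The per-machine lines exactly as posted in the comment above.
LOG = """\
talos-linux64-ix-010: Start-Date: 2015-02-06  16:35:54
talos-linux64-ix-011: Start-Date: 2015-02-08  02:30:22
talos-linux64-ix-012: Start-Date: 2015-02-05  19:48:13
talos-linux64-ix-013: Start-Date: 2015-02-06  03:49:50
talos-linux64-ix-014: Start-Date: 2015-02-06  02:12:59
talos-linux64-ix-015: Start-Date: 2015-02-07  09:08:56
talos-linux64-ix-016: Start-Date: 2015-02-05  20:34:50
talos-linux64-ix-017: Start-Date: 2015-02-06  02:48:39
talos-linux64-ix-018: Start-Date: 2015-02-05  14:24:09
"""


def parse_start_dates(text):
    """Map each host name to its Start-Date as a naive datetime."""
    starts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        host, _, stamp = line.partition(": Start-Date: ")
        # Collapse the double space between date and time before parsing.
        starts[host] = datetime.strptime(" ".join(stamp.split()), "%Y-%m-%d %H:%M:%S")
    return starts


starts = parse_start_dates(LOG)

# Hosts in the order they were upgraded.
for host, when in sorted(starts.items(), key=lambda item: item[1]):
    print(when, host)

# Number of machines upgraded on each calendar day.
per_day = Counter(when.date().isoformat() for when in starts.values())
print(dict(sorted(per_day.items())))

# Time between the first and the last machine in the list.
spread = max(starts.values()) - min(starts.values())
print("rollout spread:", spread)
```

For these nine machines the stamps span roughly two and a half days, which fits the alerts clustering around Feb 8th rather than Feb 5th.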

The switch back to 3.2.0 may revert the performance change -- it's hard to say. The switch is not to the original kernel version, but to a version patched to address the security vulnerability.
See Also: → 1113328
Joel, is this a reasonable explanation? Do we need to keep looking?
Flags: needinfo?(jmaher)
I think it explains it, and we should accept this change as the new normal! How can we be made aware of changes like this in the future?
Flags: needinfo?(jmaher)
The reversion to a 3.2.0 kernel was shipped this morning. When machines reboot, they'll pick up the 3.2.0-76-generic kernel. As dustin says, it's a newer revision than we had previously, to address the security issues. Do the machines that have picked up this kernel revert the regression as well?
Thanks for the heads up. I don't see any new regressions from changing the kernel, so I am going to close this as WONTFIX, and we will have documentation in this bug. Luckily this won't show up when we uplift, as this affected all branches at the same time.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX