Closed Bug 1425493 Opened 2 years ago Closed 2 months ago
Perfherder's alerting algorithm doesn't handle multi-modal data well
When we moved TC workers off AUFS, build jobs got minutes faster. This is clearly visible at the following link: https://treeherder.mozilla.org/perf.html#/graphs?timerange=5184000&series=autoland,1444817,1,2&series=autoland,1582195,1,2&series=autoland,1444828,1,2&series=autoland,1618745,1,2&zoom=1511268162556.8628,1513356286000,0,2982.022403331285

(Note: we rolled this out initially around November 28, reverted it a few days later, then rolled it out again (for good) on December 7. Due to bug 1424383, it took 1+ day for old workers to phase out and the new fast build times to be universal.)

The graph clearly shows the floor of build times dropping. However, we didn't receive a Perfherder alert for the change. This is likely because the data is noisy (due to the sccache hit rate in builds).

I'm not sure what could be done to fix this, but it would be nice to receive a Perfherder alert for "visually obvious" improvements like this. I would also posit that if we fail to detect an improvement like this, we would fail to detect a comparable regression too. That would be concerning for performance sheriffing.
Yeah, the t-test we use in Perfherder kind of fails for bimodal data (which that graph seems to be). Roberto wrote up a proposal for an alternative change-detection algorithm that might be more resilient to this problem, but it needs someone to implement it: https://robertovitillo.com/2016/01/10/detecting-talos-regressions/ The code for performance alerting is here: https://github.com/mozilla/treeherder/tree/master/treeherder/perfalert
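To make the failure mode concrete, here is a minimal sketch of why a mean-based t-test can miss a shift in the floor of bimodal data. It is not the Perfherder code and not the algorithm from the linked proposal. The build times, the window size of 12, and the helper names `welch_t` and `lower_mode_median` are all made up for illustration. The data assumes a fast mode (sccache hit) that drops from 1000 s to 800 s and a slow mode (sccache miss) that stays at 2500 s.

```python
import math
import statistics


def welch_t(before, after):
    """Absolute Welch's t statistic comparing the means of two samples."""
    mean_diff = abs(statistics.mean(before) - statistics.mean(after))
    std_err = math.sqrt(
        statistics.variance(before) / len(before)
        + statistics.variance(after) / len(after)
    )
    return mean_diff / std_err


def lower_mode_median(values):
    """Median of the lower half of the sample (a crude stand-in for the floor)."""
    lower_half = sorted(values)[: len(values) // 2]
    return statistics.median(lower_half)


# Hypothetical build times in seconds, alternating cache hit / cache miss.
before = [1000, 2500] * 6  # fast mode at 1000 s
after = [800, 2500] * 6    # fast mode drops to 800 s, slow mode unchanged

t_value = welch_t(before, after)
floor_before = lower_mode_median(before)
floor_after = lower_mode_median(after)
floor_change = (floor_after - floor_before) / floor_before

print(f"Welch t statistic: {t_value:.2f}")
print(f"floor before: {floor_before}, floor after: {floor_after}")
print(f"relative floor change: {floor_change:.0%}")
```

With these made-up numbers the t statistic stays well under 1, because the spread between the two modes dominates the variance, while the floor itself moves by 20%. Comparing the lower half is only one way to show the shift; any real fix would need to be evaluated against actual Perfherder series.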
Summary: Failed to receive Perfherder alert for build time improvements → Perfherder's alerting algorithm doesn't handle multi-modal data well
Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1581533