[meta] Provide plan for using Mann-Whitney-U alongside Student's t-test
Categories
(Tree Management :: Perfherder, task)
Tracking
(Not tracked)
People
(Reporter: igoldan, Unassigned)
References
(Depends on 2 open bugs, Blocks 1 open bug)
Details
(Keywords: meta)
All our tests use Student's t-test to generate performance alerts. This algorithm is generally inferior to Mann-Whitney-U (MWU), in that it cannot reliably alert on data that is noisy, multimodal and/or has outliers.
The downside of fully enabling MWU is that it will generate far more alerts than our perf sheriffs are able to investigate.
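To illustrate the sensitivity difference described above, here is a minimal pure-Python sketch. It is not Perfherder's implementation: the sample data is made up, `welch_t` and `mann_whitney_z` are hypothetical helper names, the t statistic is the Welch (unequal variance) variant, and the MWU z-score uses the normal approximation without a tie correction.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (unequal variances allowed)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def mann_whitney_z(a, b):
    """z-score of the Mann-Whitney U statistic for sample `a`.

    Uses the normal approximation and average ranks for ties,
    without a tie correction in the variance.
    """
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over the run of values tied with combined[i].
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_a, n_b = len(a), len(b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mu = n_a * n_b / 2
    sigma = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (u_a - mu) / sigma


# Made-up load times: a ~10% regression, with one large outlier in `before`.
before = [100, 101, 99, 102, 98, 100, 101, 99, 100, 500]
after = [110, 111, 109, 112, 108, 110, 111, 109, 110, 111]

print("t statistic:", welch_t(before, after))
print("MWU z-score:", mann_whitney_z(before, after))
```

With this data the single outlier inflates the variance enough that the t statistic stays well below 2 in magnitude, while the rank-based z-score is beyond the usual 1.96 cutoff. The same sensitivity is what would produce the extra alert volume mentioned above.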
Let's come up with a plan for integrating MWU as a new alerting strategy and enable it only for Fenix tests. We should be able to run either strategy, or both, with minimal changes.
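One possible shape for "either strategy or both", sketched under assumptions: the names `STRATEGIES_BY_FRAMEWORK`, `DEFAULT_STRATEGIES`, `strategies_for` and `run_strategies` are hypothetical and do not come from the Perfherder codebase, and running both strategies for Fenix is only one reading of the plan. The detectors are injected as plain callables so the selection logic stays independent of the statistics.

```python
from typing import Callable, Dict, Sequence, Tuple

# A detector takes (before, after) samples and says whether to alert.
Detector = Callable[[Sequence[float], Sequence[float]], bool]

# Hypothetical configuration: everything keeps the t-test by default,
# and only Fenix additionally opts in to MWU.
DEFAULT_STRATEGIES: Tuple[str, ...] = ("t_test",)
STRATEGIES_BY_FRAMEWORK: Dict[str, Tuple[str, ...]] = {
    "fenix": ("t_test", "mwu"),
}


def strategies_for(framework: str) -> Tuple[str, ...]:
    """Return the alerting strategies enabled for a framework."""
    return STRATEGIES_BY_FRAMEWORK.get(framework, DEFAULT_STRATEGIES)


def run_strategies(
    framework: str,
    before: Sequence[float],
    after: Sequence[float],
    detectors: Dict[str, Detector],
) -> Dict[str, bool]:
    """Run every enabled strategy and report which ones would alert."""
    return {
        name: detectors[name](before, after)
        for name in strategies_for(framework)
    }
```

Switching a framework between the t-test, MWU, or both would then be a one-line change to the mapping.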
Jira link: https://jira.mozilla.com/browse/FXP-1472
Comment 1•3 years ago
If this is successful, I am hopeful that it can be expanded to cover other tests including Desktop.
I don't yet know if Mann-Whitney-U would detect this change, but consider this change (a memory cache experiment):
18.84% improvement to mean loadtime over 45 runs, but not picked up as statistically significant.
Reporter
Comment 2•3 years ago
(In reply to Andrew Creskey [:acreskey] [he/him] from comment #1)
> If this is successful, I am hopeful that it can be expanded to cover other tests including Desktop.
I believe this is the intention: that each test (mobile, desktop) can have its own alert setup (either MWU or Student's t-test). I've rephrased the summary.
> I don't yet know if Mann-Whitney-U would detect this change, but consider this change (a memory cache experiment):
> 18.84% improvement to mean loadtime over 45 runs.
> But not picked up as statistically significant.
That looks like an outlier. According to :ekyle's report, MWU should be able to detect the change.