Use kmeans instead of gaussian filter in extra-summary-methods
Categories
(Testing :: Raptor, task, P1)
Tracking
(firefox113 fixed)
People
(Reporter: sparky, Assigned: sparky)
References
(Regressed 1 open bug)
Details
Attachments
(2 files)
There's a problem with our existing filtering method, which uses a gaussian filter. It's not exactly a bug, but it's not working as intended. When there are too many outliers that we want to remove, the standard deviation increases along with the mean, which makes it difficult to remove the outlier points.
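To make the problem concrete, here is a minimal sketch using the cold `loadtime` replicates from the PERFHERDER_DATA output in this bug. The two-standard-deviation cutoff is an assumption chosen for illustration; the threshold used by the existing gaussian filter isn't stated here.

```python
from statistics import mean, pstdev

# Cold-page "loadtime" replicates copied from the PERFHERDER_DATA in this bug.
LOADTIME = [
    51986, 3051, 3054, 1034, 51716, 42369, 2949, 1047, 1268, 597,
    3203, 507, 1064, 51343, 4225, 42624, 1260, 1057, 997, 51822,
    1307, 1081, 1020, 1085, 2906,
]

mu = mean(LOADTIME)
sigma = pstdev(LOADTIME)  # population standard deviation

# Distance of the most extreme point from the mean, in standard deviations.
max_z = max(abs(v - mu) / sigma for v in LOADTIME)

# Hypothetical cutoff: keep points within 2 standard deviations of the mean.
kept = [v for v in LOADTIME if abs(v - mu) <= 2 * sigma]

print(f"mean={mu:.1f} stdev={sigma:.1f} max z-score={max_z:.2f}")
print(f"kept {len(kept)} of {len(LOADTIME)} points")
```

Under that assumed cutoff nothing is filtered: the six slow runs pull the mean up and inflate the standard deviation enough that every point, including the outliers, stays within two standard deviations.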
Instead, given that we're almost always dealing with multi-modal data in some way, we should use a k-means filter that searches for 2 clusters. This way, we'll always be able to remove at least one of the offending modes, provided it doesn't take up too much of the data and the difference between the modes is large enough.
Example of this issue: https://treeherder.mozilla.org/jobs?repo=autoland&revision=d59b76766f0dca1e3e1fb4227a5110b1f4f58f11&group_state=expanded&selectedTaskRun=Oz-BqKwzQz2HyUzDD_Ls3A.0
Running locally using k-means:
$ ./mach raptor -t wikia --browsertime-existing-results "/home/sparky/Downloads/browsertime-results(49)/browsertime-results" --browsertime-visualmetrics --extra-summary-methods geomean --chimera
...
21:24:11 INFO - loadtime (geomean): Filtering out 6 data points found in minor_group of data with mean 48643.333333333336 vs. 1721.6842105263158 in major group
21:24:11 INFO - cpuTime (geomean): Filtering out 6 data points found in minor_group of data with mean 27647.166666666668 vs. 12903.631578947368 in major group
21:24:11 INFO - LastVisualChange (geomean): Filtering out 6 data points found in minor_group of data with mean 14460.0 vs. 4048.4210526315787 in major group
21:24:11 INFO - SpeedIndex (geomean): Filtering out 6 data points found in minor_group of data with mean 4631.333333333333 vs. 1225.7894736842106 in major group
21:24:11 INFO - PerceptualSpeedIndex (geomean): Filtering out 6 data points found in minor_group of data with mean 4416.5 vs. 1181.0526315789473 in major group
21:24:11 INFO - perftest-output Info: PERFHERDER_DATA: {"framework": {"name": "browsertime"}, "suites": [{"name": "wikia", "type": "pageload", "extraOptions": ["fission", "cold", "webrender"], "tags": ["fission", "cold", "webrender"], "lowerIsBetter": true, "unit": "ms", "alertThreshold": 2.0, "subtests": [{"name": "ContentfulSpeedIndex", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [2047, 1937, 2399, 1930, 1882, 1967, 2006, 2015, 1818, 1749, 1810, 1586, 1785, 2549, 1956, 1943, 1951, 1960, 1970, 2525, 1951, 1992, 1993, 2065, 2067], "value": 1960}, {"name": "ContentfulSpeedIndex (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [2047, 1937, 2399, 1930, 1882, 1967, 2006, 2015, 1818, 1749, 1810, 1586, 1785, 2549, 1956, 1943, 1951, 1960, 1970, 2525, 1951, 1992, 1993, 2065, 2067], "value": 1983.5}, {"name": "FirstVisualChange", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [400, 240, 240, 240, 240, 240, 240, 240, 200, 240, 240, 240, 240, 200, 320, 280, 240, 240, 240, 240, 240, 240, 240, 360, 240], "value": 240}, {"name": "FirstVisualChange (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [400, 240, 240, 240, 240, 240, 240, 240, 200, 240, 240, 240, 240, 200, 320, 280, 240, 240, 240, 240, 240, 240, 240, 360, 240], "value": 249.7}, {"name": "LastVisualChange", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [15160, 4320, 5200, 4080, 14720, 14960, 4160, 4160, 3840, 3600, 3840, 3200, 3720, 14400, 4160, 13920, 4000, 4080, 4080, 13600, 3960, 4120, 4040, 4200, 4160], "value": 4160}, {"name": "LastVisualChange (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [15160, 4320, 5200, 4080, 14720, 14960, 4160, 4160, 3840, 3600, 3840, 3200, 3720, 14400, 4160, 13920, 
4000, 4080, 4080, 13600, 3960, 4120, 4040, 4200, 4160], "value": 4032.0}, {"name": "PerceptualSpeedIndex", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [4500, 1383, 1524, 1186, 4345, 4394, 1184, 1196, 1054, 1019, 1139, 929, 1090, 4646, 1255, 4218, 1138, 1177, 1184, 4396, 1120, 1203, 1184, 1276, 1199], "value": 1196}, {"name": "PerceptualSpeedIndex (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [4500, 1383, 1524, 1186, 4345, 4394, 1184, 1196, 1054, 1019, 1139, 929, 1090, 4646, 1255, 4218, 1138, 1177, 1184, 4396, 1120, 1203, 1184, 1276, 1199], "value": 1174.7}, {"name": "SpeedIndex", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [4776, 1187, 1375, 1244, 4615, 4728, 1313, 1256, 1164, 1115, 1163, 1016, 1129, 4853, 1266, 4478, 1237, 1224, 1236, 4338, 1215, 1253, 1247, 1318, 1332], "value": 1253}, {"name": "SpeedIndex (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [4776, 1187, 1375, 1244, 4615, 4728, 1313, 1256, 1164, 1115, 1163, 1016, 1129, 4853, 1266, 4478, 1237, 1224, 1236, 4338, 1215, 1253, 1247, 1318, 1332], "value": 1222.9}, {"name": "cpuTime", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [28410, 13485, 18855, 12904, 28233, 26317, 11694, 13046, 11700, 10925, 13357, 11686, 13012, 27600, 14100, 27394, 11637, 12638, 13073, 27929, 11604, 13034, 12986, 13346, 12087], "value": 13046}, {"name": "cpuTime (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [28410, 13485, 18855, 12904, 28233, 26317, 11694, 13046, 11700, 10925, 13357, 11686, 13012, 27600, 14100, 27394, 11637, 12638, 13073, 27929, 11604, 13034, 12986, 13346, 12087], "value": 12816.7}, {"name": "dcf", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, 
"replicates": [565, 389, 383, 377, 380, 412, 409, 389, 490, 494, 477, 243, 433, 294, 363, 377, 415, 537, 499, 353, 401, 458, 388, 425, 416], "value": 405.0}, {"name": "dcf (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [565, 389, 383, 377, 380, 412, 409, 389, 490, 494, 477, 243, 433, 294, 363, 377, 415, 537, 499, 353, 401, 458, 388, 425, 416], "value": 408.6}, {"name": "fcp", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [362, 203, 142, 193, 181, 195, 205, 190, 188, 173, 193, 168, 191, 176, 184, 192, 193, 202, 195, 176, 189, 190, 196, 197, 199], "value": 191.5}, {"name": "fcp (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [362, 203, 142, 193, 181, 195, 205, 190, 188, 173, 193, 168, 191, 176, 184, 192, 193, 202, 195, 176, 189, 190, 196, 197, 199], "value": 192.4}, {"name": "fnbpaint", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [363, 204, 178, 194, 182, 196, 206, 191, 189, 174, 194, 169, 192, 177, 188, 193, 194, 203, 196, 176, 190, 191, 198, 198, 200], "value": 192.5}, {"name": "fnbpaint (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [363, 204, 178, 194, 182, 196, 206, 191, 189, 174, 194, 169, 192, 177, 188, 193, 194, 203, 196, 176, 190, 191, 198, 198, 200], "value": 195.3}, {"name": "loadtime", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [51986, 3051, 3054, 1034, 51716, 42369, 2949, 1047, 1268, 597, 3203, 507, 1064, 51343, 4225, 42624, 1260, 1057, 997, 51822, 1307, 1081, 1020, 1085, 2906], "value": 1287.5}, {"name": "loadtime (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [51986, 3051, 3054, 1034, 51716, 42369, 2949, 1047, 1268, 597, 3203, 507, 1064, 51343, 4225, 42624, 
1260, 1057, 997, 51822, 1307, 1081, 1020, 1085, 2906], "value": 1438.0}]}, {"name": "wikia", "type": "pageload", "extraOptions": ["fission", "webrender", "warm"], "tags": ["fission", "webrender", "warm"], "lowerIsBetter": true, "unit": "ms", "alertThreshold": 2.0, "subtests": [{"name": "ContentfulSpeedIndex", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [661, 600, 584, 644, 655, 712, 614, 624, 651, 568, 652, 620, 614, 669, 611, 596, 647, 662, 715, 654, 601, 611, 653, 650, 652], "value": 647}, {"name": "ContentfulSpeedIndex (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [661, 600, 584, 644, 655, 712, 614, 624, 651, 568, 652, 620, 614, 669, 611, 596, 647, 662, 715, 654, 601, 611, 653, 650, 652], "value": 635.8}, {"name": "FirstVisualChange", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [120, 80, 120, 160, 120, 160, 120, 120, 120, 120, 120, 160, 160, 160, 80, 120, 160, 160, 160, 160, 120, 120, 120, 120, 120], "value": 120}, {"name": "FirstVisualChange (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [120, 80, 120, 160, 120, 160, 120, 120, 120, 120, 120, 160, 160, 160, 80, 120, 160, 160, 160, 160, 120, 120, 120, 120, 120], "value": 128.9}, {"name": "LastVisualChange", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [1280, 1200, 1120, 1200, 1240, 1360, 1160, 1200, 1240, 1080, 1280, 1120, 1120, 1240, 1200, 1120, 1200, 1240, 1360, 1200, 1160, 1200, 1280, 1240, 1280], "value": 1200}, {"name": "LastVisualChange (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [1280, 1200, 1120, 1200, 1240, 1360, 1160, 1200, 1240, 1080, 1280, 1120, 1120, 1240, 1200, 1120, 1200, 1240, 1360, 1200, 1160, 1200, 1280, 1240, 1280], "value": 1210.8}, {"name": 
"PerceptualSpeedIndex", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [399, 346, 361, 411, 393, 444, 377, 379, 390, 354, 393, 397, 396, 419, 350, 365, 412, 416, 442, 414, 366, 372, 394, 390, 399], "value": 393}, {"name": "PerceptualSpeedIndex (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [399, 346, 361, 411, 393, 444, 377, 379, 390, 354, 393, 397, 396, 419, 350, 365, 412, 416, 442, 414, 366, 372, 394, 390, 399], "value": 390.3}, {"name": "SpeedIndex", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [429, 375, 387, 439, 423, 478, 405, 407, 421, 382, 424, 419, 421, 450, 383, 393, 442, 448, 475, 442, 398, 404, 427, 414, 431], "value": 421}, {"name": "SpeedIndex (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [429, 375, 387, 439, 423, 478, 405, 407, 421, 382, 424, 419, 421, 450, 383, 393, 442, 448, 475, 442, 398, 404, 427, 414, 431], "value": 419.9}, {"name": "cpuTime", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [8404, 9307, 9330, 8625, 9003, 8810, 8635, 8660, 8529, 9354, 8695, 9316, 9106, 8764, 8414, 9636, 8725, 8802, 8740, 9513, 9245, 8861, 8518, 8775, 8818], "value": 8802}, {"name": "cpuTime (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [8404, 9307, 9330, 8625, 9003, 8810, 8635, 8660, 8529, 9354, 8695, 9316, 9106, 8764, 8414, 9636, 8725, 8802, 8740, 9513, 9245, 8861, 8518, 8775, 8818], "value": 8896.9}, {"name": "dcf", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [141, 147, 146, 175, 139, 119, 153, 174, 196, 154, 178, 141, 136, 180, 151, 144, 142, 157, 167, 141, 145, 194, 152, 193, 148], "value": 151.5}, {"name": "dcf (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": 
"ms", "shouldAlert": false, "replicates": [141, 147, 146, 175, 139, 119, 153, 174, 196, 154, 178, 141, 136, 180, 151, 144, 142, 157, 167, 141, 145, 194, 152, 193, 148], "value": 155.3}, {"name": "fcp", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [74, 69, 63, 81, 71, 79, 64, 73, 73, 76, 67, 77, 73, 78, 77, 77, 76, 75, 76, 76, 79, 70, 77, 74, 63], "value": 75.5}, {"name": "fcp (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [74, 69, 63, 81, 71, 79, 64, 73, 73, 76, 67, 77, 73, 78, 77, 77, 76, 75, 76, 76, 79, 70, 77, 74, 63], "value": 73.4}, {"name": "fnbpaint", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [106, 104, 102, 118, 109, 118, 106, 107, 104, 108, 105, 109, 104, 108, 113, 113, 109, 108, 116, 107, 115, 107, 107, 104, 103], "value": 107.5}, {"name": "fnbpaint (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [106, 104, 102, 118, 109, 118, 106, 107, 104, 108, 105, 109, 104, 108, 113, 113, 109, 108, 116, 107, 115, 107, 107, 104, 103], "value": 108.3}, {"name": "loadtime", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [225, 218, 263, 262, 228, 236, 224, 266, 224, 272, 262, 270, 265, 221, 225, 301, 224, 271, 225, 226, 259, 223, 227, 227, 225], "value": 227.5}, {"name": "loadtime (geomean)", "lowerIsBetter": true, "alertThreshold": 2.0, "unit": "ms", "shouldAlert": false, "replicates": [225, 218, 263, 262, 228, 236, 224, 266, 224, 272, 262, 270, 265, 221, 225, 301, 224, 271, 225, 226, 259, 223, 227, 227, 225], "value": 241.7}]}], "application": {"name": "firefox", "version": "112.0a1"}}
Comment 1 (Assignee) • 2 years ago
This patch changes the filtering method from a gaussian filter to a k-means filter that should be better suited to our needs. See this bug comment: https://bugzilla.mozilla.org/show_bug.cgi?id=1821791#c0
With kmeans from scipy, we specify that it should search for 2 groups. From there, we check whether one of the groups comprises no more than 40% of the total data. If there is such a group, we check whether the group means differ by 200%. If they do, we throw out the group that has the fewest data points in it.
This fixes an issue where outliers that skewed the standard deviation and the mean too much could not be removed.
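The steps described above can be sketched roughly as follows. This is not the patch itself: to stay dependency-free it swaps scipy's kmeans for a small hand-written 1-D two-cluster Lloyd's iteration, and the names `two_means_1d`, `filter_minor_mode`, `max_fraction`, and `min_ratio` are hypothetical. It also assumes that "200%" means the larger group mean is at least twice the smaller one, a reading that is consistent with the cpuTime log line above (27647 vs. 12903) being filtered.

```python
from statistics import mean

# Cold-page "loadtime" replicates copied from the PERFHERDER_DATA in this bug.
LOADTIME = [
    51986, 3051, 3054, 1034, 51716, 42369, 2949, 1047, 1268, 597,
    3203, 507, 1064, 51343, 4225, 42624, 1260, 1057, 997, 51822,
    1307, 1081, 1020, 1085, 2906,
]

def two_means_1d(values, max_iter=100):
    """Split 1-D data into two clusters (stand-in for scipy's kmeans)."""
    # Start the two centroids at the extremes so the result is deterministic.
    lo, hi = float(min(values)), float(max(values))
    for _ in range(max_iter):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all points identical: nothing to split
            break
        new_lo, new_hi = mean(low), mean(high)
        if (new_lo, new_hi) == (lo, hi):  # centroids stopped moving
            break
        lo, hi = new_lo, new_hi
    return low, high

def filter_minor_mode(values, max_fraction=0.4, min_ratio=2.0):
    """Drop the smaller cluster if it is small enough and far enough away."""
    low, high = two_means_1d(values)
    if not low or not high:
        return list(values)
    minor, major = sorted((low, high), key=len)
    # The minor group may hold at most 40% of the data (assumed threshold).
    if len(minor) > max_fraction * len(values):
        return list(values)
    small, large = sorted((mean(minor), mean(major)))
    # Assumed reading of "200%": larger mean is at least 2x the smaller one.
    if small <= 0 or large / small < min_ratio:
        return list(values)
    return major

kept = filter_minor_mode(LOADTIME)
print(f"kept {len(kept)} of {len(LOADTIME)} points, mean {mean(kept):.2f}")
```

On the cold `loadtime` data this drops the six slow runs and keeps the 19-point major group, matching the "Filtering out 6 data points" log line in the bug description. Data without a well-separated minor mode is returned unchanged.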
Comment 2 (Assignee) • 2 years ago
Depends on D172320
Comment 4 (bugherder) • 2 years ago
https://hg.mozilla.org/mozilla-central/rev/f7da71e7b1f9
https://hg.mozilla.org/mozilla-central/rev/3363ac0afc0d