I am concerned about the new bloom test, specifically the alerts we are seeing from OSX. Here is the data for all the platforms:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=2592000&series=%5Bautoland,1d83527507421790c3e598b6fde76955bce2467d,1,1%5D&series=%5Bautoland,7595a193a0efa0d6b73e058f912a913862eaa9a1,1,1%5D&series=%5Bautoland,297a614b54fa9f991fb2cff58ea4db9cbc7b1bd2,1,1%5D

If you mute (uncheck) the series, you can see that win7 and win8 are acting normally. Linux just got started yesterday, so we don't have much data.

Now look at OSX:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=2592000&series=%5Bautoland,1d83527507421790c3e598b6fde76955bce2467d,1,1%5D

There were 8 alerts in the last week (4 regressions, 4 improvements). These are not related to code landing and backing out.

I feel this test is too sensitive on OSX. I would like to find a way to reduce the alerts we see here so we do not randomize developers. A few options:
1) increase the alert threshold to 5% (not sure how to do this for OSX specifically, but we can probably figure it out)
2) do not run the test on OSX
3) accept that OSX is problematic and put resources towards investigating these alerts
4) adjust the test

Honestly, options 1 or 2 are the most realistic.
:bholley, do you have thoughts here or ideas of who might have interest in helping figure out what is going on and what to do?
Bumping the threshold to 5% on all the perf reftests should be fine. The swings we're looking for with these tests are larger than that, and I don't want to waste anybody's time here.
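For anyone following along, here is a minimal sketch of what raising a percentage alert threshold does to small swings. It is not Perfherder's actual alerting code. The function names, the sample values, and the 2% starting threshold are all assumptions made for illustration.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def should_alert(before: float, after: float, threshold_pct: float) -> bool:
    """True when the magnitude of the change meets or exceeds the threshold.

    Hypothetical helper: regressions and improvements are treated alike,
    since only the size of the swing is compared against the threshold.
    """
    return abs(percent_change(before, after)) >= threshold_pct


# Hypothetical data: a 3% swing between two runs of the same test.
before, after = 100.0, 103.0

# Assumed lower threshold of 2%: the 3% swing would raise an alert.
print(should_alert(before, after, threshold_pct=2.0))

# Proposed threshold of 5%: the same swing is ignored.
print(should_alert(before, after, threshold_pct=5.0))
```

With a 5% threshold, swings of a few percent in either direction no longer alert, while the larger changes these tests are meant to catch still would.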
Created attachment 8861514: 5% alert threshold for bloom tests
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/4da0e269a156 new bloom test on osx is generating too many alerts. r=rwood