There is only one exception, the Windows xperf file I/O metrics; otherwise, IIRC, all other data points are comparable.
With that said, I suspect what is going on here is the default compare against the last 2 days on mozilla-central. What happens there is that values often change (infra, tooling, tests, browser), and the range of values on mozilla-central can be misleading compared with a single revision, which has a much smaller range of normal. For example, let's say a startup test improved 5%, but your try push doesn't have that fix; now it looks like you have a 5% regression (or something between 0 and 5%, depending on how the data points align).
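To make that "0 to 5%" range concrete, here is a small sketch with made-up numbers (the values and the 6-point window are illustrative, not real test data): the try push performs exactly like the old baseline, but the apparent regression depends on how many of the m-c points in the compare window landed after the improvement.

```python
# Hypothetical numbers illustrating the baseline-shift effect described above.
# Suppose a startup test (lower is better) improved 5% on mozilla-central,
# but the try push branched off before that fix landed.
baseline_before_fix = 1000.0  # ms, m-c average before the improvement
baseline_after_fix = 950.0    # ms, m-c average after the 5% improvement

# The try push performs exactly like the old baseline (no real regression):
try_result = 1000.0

# Compare view averages the recent m-c data, which mixes both baselines.
# Depending on how many of the 6 points land after the improvement, the
# apparent regression is anywhere from 0% to ~5%.
for points_after_fix in (0, 3, 6):
    points_before_fix = 6 - points_after_fix
    mc_average = (points_before_fix * baseline_before_fix
                  + points_after_fix * baseline_after_fix) / 6
    apparent_regression = (try_result - mc_average) / mc_average * 100
    print(f"{points_after_fix} post-fix points -> "
          f"{apparent_regression:+.2f}% apparent regression")
```

With 0 post-fix points the compare reports 0%; with all 6 it reports roughly +5.26% (50/950), even though the try push changed nothing.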
The reason there is a range on m-c is that the more data points we collect, the more accurate our detection of a regression/improvement becomes. Typically, having 6 data points on either side of a compare gives enough accuracy to detect almost all changes. We recommend this on try, but we cheat a bit in compare view when comparing against m-c: we use a ~24-hour window to collect 6 data points.
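As a rough illustration of why ~6 points per side is usually enough, here is a two-sample t statistic on made-up runs with a few percent of noise (this is a sketch of the general idea, not compare view's actual implementation; the numbers are hypothetical):

```python
from statistics import mean, stdev

def t_statistic(base, new):
    """Welch-style two-sample t statistic: how many combined standard
    errors separate the two means. Larger |t| = more confident the
    difference is real rather than noise."""
    var_base = stdev(base) ** 2
    var_new = stdev(new) ** 2
    standard_error = (var_base / len(base) + var_new / len(new)) ** 0.5
    return (mean(new) - mean(base)) / standard_error

# Hypothetical runs: ~1% noise and a real 5% regression on top.
base_runs = [1000, 995, 1010, 1005, 990, 1000]   # 6 baseline data points
try_runs = [1050, 1045, 1060, 1055, 1040, 1050]  # 6 try data points

t = t_statistic(base_runs, try_runs)
print(f"t = {t:.1f}")  # far above a typical significance cutoff of ~2
```

With only 1-2 runs per side the standard error blows up and the same 5% shift can be indistinguishable from noise; at 6 per side it stands out clearly.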
There are a few alternatives I see, given the current state of noisy tests:
- do a before/after try push
- retrigger m-c push to have more data points
- we could be smart here and compare against nightlies, which would at least give people a better representation; this would reduce the effect of the larger ranges over 2 days, but it would still have some issues.
- being smarter, we could analyze each signature or the generated alerts to hint that an improvement/regression might be a side effect of another change on the tree.
Does this help?