Open Bug 1515681 Opened 10 months ago Updated 8 months ago

Perfherder Compare should not allow (or highly warn) when comparing a try build against a mozilla-central build

Categories

(Tree Management :: Perfherder, defect, P3)


Tracking

(Not tracked)

People

(Reporter: jaws, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: good-first-bug)

Comparisons between Talos results from try and mozilla-central builds are not reliable due to differences in the hardware used by the two sets of build machines.

Perfherder Compare should either not allow these comparisons or put a big warning up front for anyone trying to use it in this way.

This has caused a number of engineers to get incorrect results and thus waste time, either trying to fix a nonexistent perf issue, or landing and then getting backed out due to an unexpected perf issue.
Blocks: 1520720
Keywords: good-first-bug
Priority: -- → P3

:jmaher Is there a reason why mozilla-central and try can't be reliably compared?

Flags: needinfo?(jmaher)

There is only one case, the Windows xperf fileIO metrics; otherwise, IIRC, all other data points are comparable.

With that said, I suspect what is going on here is the default compare against the last 2 days on mozilla-central. What happens is that values often change (infra, tooling, tests, browser), and the range of values on mozilla-central over that window can be misleading compared with a single revision, which has a much smaller range of normal. For example, let's say a startup test improved 5%, but your try push doesn't have that fix; now it looks like you have a 5% regression (or something between 0-5% depending on how the data points align).

The reason there is a range on m-c is that the more data points we can collect, the more accurate our detection of a regression/improvement becomes. Typically, having 6 data points on either side of a compare is enough accuracy to detect almost all changes. We recommend this on try, but we cheat a bit in Compare view when comparing against m-c, since we use a ~24 hour window to get 6 data points.
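A toy calculation (with made-up sample values, not real Talos data) illustrates the skew described above: if mozilla-central picks up an improvement that a try push's parent revision predates, the compare reports a regression even though the try patch changed nothing.

```python
from statistics import mean

# Hypothetical startup-time samples in ms (lower is better).
# mozilla-central picked up a ~5% startup improvement that the
# try push (based on an older parent revision) does not include.
central_samples = [190, 192, 189, 191, 190, 193]  # after the improvement
try_samples     = [200, 201, 199, 202, 200, 198]  # before the improvement

# Perfherder-style relative delta: positive means try looks slower.
delta_pct = (mean(try_samples) - mean(central_samples)) / mean(central_samples) * 100
print(f"Apparent regression: {delta_pct:+.1f}%")
# The try push appears several percent slower despite containing no change.
```

This is only a sketch of the effect, not Perfherder's actual comparison code (which also weighs variance and data-point counts).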

There are 2 alternatives I see to this given the current state of noisy tests:

  1. do a before/after try push
  2. retrigger m-c push to have more data points
  • we could be smart here and do this on nightlies to at least give people a better representation; this would reduce the effects of the larger ranges over 2 days, but it will still yield some issues.
  • being smarter, we could analyze each signature or the generated alerts to hint that an improvement/regression might be a side effect of another change on the tree.

Does this help?

Flags: needinfo?(jmaher)

Thanks Joel. It sounds like we'll always get the best results when comparing try pushes with/without the patch, and that a warning, as suggested by :jaws, would be a good idea.

I wonder if we could provide an option to automatically trigger a secondary try push without the patch(es) applied when performance tests are selected. This could make it much easier for developers who want to test the performance impact of a change.

Hello,
I would like to help with this bug.
Would an alert highlighting the unreliability be sufficient?

It does not break the current flow and still generates the comparison, but the user would have to acknowledge that the upcoming results might be unreliable.
The alert would also point to this bug for more information.

Hi Edouard, this part of the application is currently undergoing a conversion from Angular to React (see bug 1509216), so the work would be tricky to coordinate. Is there another good-first-bug you'd be interested in working on instead (one that is not for the Compare views/routes)?

Hello Sarah,

No problem. I will search for another one.
What will happen with this bug then?

Thank you.

This bug is marked as a P3, so it's not a high priority. You can check back in a few weeks to see if bug 1509216 is resolved/fixed, and if so (and this bug isn't assigned to someone else), you can ask to work on it.

Depends on: 1509216