Closed Bug 1613956 Opened 5 years ago Closed 4 years ago

address noisy build metrics to reduce workload for perf sheriffs

Categories

(Firefox Build System :: General, enhancement, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kmoir, Unassigned)

References

(Depends on 1 open bug)

Details

I had a meeting with Dave Hunt and his team today regarding build metric sheriffing.
tl;dr: The build metrics that the perf sheriffing team looks at are very noisy, and the bugs they open for the build team have a very low fix rate. The build size and sccache hit rate in particular are very noisy.
https://treeherder.mozilla.org/perf.html#/health?framework=2
Can we reduce the alert thresholds or make them more meaningful so that the team can sheriff them more effectively? Also, in the case of build times, they mentioned that it's difficult to determine the root cause of a build time alert, given that builds run on a number of different AWS instance types, which affects how long a build takes to run.

Notes from the meeting: https://docs.google.com/document/d/1zP4fYxbnChSEoNcswWLMj3SD4KdgfziAq4V1y-GXLK8/edit

A place to start would be to raise the threshold for alerts on the sccache hit rate, which seems to account for most of the nuisance alerts. I think this is just a matter of applying the fix from bug 1358098 to the hit rate.
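As a rough illustration of what raising the threshold amounts to, here is a minimal sketch of relative-change filtering. The 10% figure, the function name, and the standalone shape are all illustrative assumptions, not Perfherder's actual alert configuration:

```python
# Hypothetical sketch: suppress hit-rate alerts whose relative change is
# below a raised threshold. The 10% value is an assumption for
# illustration, not the threshold Perfherder actually uses.
HIT_RATE_ALERT_THRESHOLD = 0.10  # require a >=10% relative change

def should_alert(previous_rate, new_rate, threshold=HIT_RATE_ALERT_THRESHOLD):
    """Return True only if the hit rate moved by at least `threshold`."""
    if previous_rate == 0:
        return new_rate > 0
    relative_change = abs(new_rate - previous_rate) / previous_rate
    return relative_change >= threshold
```

Under a scheme like this, a small wobble (say 0.85 to 0.82, about 3.5%) no longer fires an alert, while a large regression (0.85 to 0.60, about 29%) still does.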

With respect to build time noise/instance type issues, I have burned a lot of time trying to track these down myself and completely agree it's an issue. I think a more complete solution here would be to implement a way to automatically aggregate/differentiate across instance types in Perfherder; I'm not sure how much work that would be. Related to this, some way to retrigger/backfill against a specific instance type would be very helpful.

Short of that we could move to mostly looking at the "plain" builds (builds with no automation steps or sccache) and ensure they only run on a single instance type to increase our confidence in the results. We still care about other build times, but the timings are already very noisy due to sccache, so I don't know how much we should try to depend on them.

Finally, if we can't figure out how to improve the tooling to make these amenable to sheriffs, there are still important regressions that get detected and fixed here (bug 1605110 comes to mind). I might suggest we move to a workflow where sheriffs file build metrics regressions and flag a build peer to investigate rather than attempting to find a specific regressing change. Most regressions in build time are going to end up needing to be assessed by a build peer anyway.

Depends on: 1614485
Type: task → enhancement
Priority: -- → P2

Glob, Chris is no longer working on this. Is anyone else following up on these problems? We've seen that the sheriffs are having trouble tracking build time regressions because of the amount of noise in build time metrics.

Flags: needinfo?(glob)

This will be raised at the next Build Peers meeting.

Flags: needinfo?(glob) → needinfo?(rstewart)

I brought this up at the build peers meeting on Tuesday. Here were the main takeaways from that conversation:

  • sccache inherently introduces noise into build time metrics when it is working correctly (in fact, if the metric is NOT noisy over an extended period of time, that's probably good evidence that sccache is not working correctly). Therefore build time regressions, especially where we intend to find a single regressing push, should primarily be reported for plain builds where sccache isn't a factor. This was pointed out in comment 1, but I don't see anything suggesting it was ever done.

  • There's a strong feeling that we should move plain builds over to running on a single instance type to keep that metric as stable as possible as well. Again, mentioned in comment 1, but not implemented yet to my knowledge.

  • The non-plain build time metrics are still meaningful inasmuch as they can point out trends over time, but the actual difference in build times between any pair of subsequent builds, or even across a small set of builds, is more or less meaningless due to the noise mentioned. If we wanted automated insights into this data, we would want something a little more holistic, like tracking average build times over a period of time and triggering when there's a sudden jump in the average or minimum; I don't know if infra is equipped to let us define this type of alert or if that would be a feature request. Alternatively, this sort of regression is extremely easy to spot by looking at a graph, so we could migrate to a less automated, more manual process. Either way this wouldn't allow you to automatically pinpoint a regressing change, which we're comfortable with (we have enough expertise to diagnose that sort of thing once the existence of a regression is pointed out).
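To make the "jump in the average" idea concrete, here is a minimal sketch of windowed comparison over a series of build times. This is not existing Perfherder functionality; the window size, the 15% jump threshold, and the function name are all assumptions for illustration:

```python
# Illustrative sketch: flag a sustained build time regression by comparing
# the mean of the most recent `window` builds against the mean of the
# `window` builds immediately before them, rather than alerting on any
# single noisy build-to-build difference.
from statistics import mean

def sustained_regression(times, window=12, jump=0.15):
    """Return True if the trailing window's mean build time exceeds the
    preceding window's mean by more than `jump` (15% by default)."""
    if len(times) < 2 * window:
        return False  # not enough history to compare two full windows
    before = mean(times[-2 * window:-window])
    after = mean(times[-window:])
    return (after - before) / before > jump
```

A step from ~100-minute builds to ~120-minute builds would trip this, while flat-but-noisy data would not, which is the property single-build alerting lacks.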

  • On our end, we suspect that if these alerts are such a persistent concern for the sheriffs and a lot of time is being wasted, then something about the process should probably be refined, because a build peer is usually going to want to evaluate these regressions anyway, and we don't expect much from the sheriffs besides the basics (like the exact alert that was triggered, a graph, and maybe a range of regressing pushes that has been narrowed as much as is convenient). Again, mentioned in comment 1 but never followed up on to my knowledge. Importantly, though, revamping the process isn't something we can do by ourselves, so I might benefit from syncing up with someone directly to clarify exactly what expectations we have of each other. The deliverable from that meeting may end up looking like a new playbook that gives the build peers more ownership of and responsibility for diagnosing and resolving regressions.

Flags: needinfo?(rstewart)

bebe: please sync up with Ricky on your build_metrics proposals.

Flags: needinfo?(fstrugariu)

Created a meeting for next week

Flags: needinfo?(fstrugariu)

I think we can close this bug as build metrics load was adjusted and everything is at a normal sheriffing load.

Flags: needinfo?(rstewart)
Flags: needinfo?(dave.hunt)
Flags: needinfo?(aionescu)

Yes, I believe so. Could you summarise the actions taken here :bebe?

Flags: needinfo?(dave.hunt) → needinfo?(fstrugariu)

+1

Flags: needinfo?(rstewart)

Agree

Flags: needinfo?(aionescu)

If I remember correctly, we disabled sheriffing on sccache builds and started to sheriff only "non-sccache builds".

Also, we agreed with Ricky that if we have any issues sheriffing build metrics or identifying the culprits, we can tag his team and open a conversation.

You can find more information on the research in this doc:
https://docs.google.com/document/d/1dyjBrX4PkauulHKHisjVE8GVMzJTegARm2lrQiMZ-8Y/edit#

Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(fstrugariu)
Resolution: --- → FIXED