Closed Bug 1619552 Opened 6 years ago Closed 6 years ago

generate alerts for mozilla-central perf data for tests that are marked as tier-2

Categories

(Tree Management :: Perfherder, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jmaher, Unassigned)

References

Details

I think a first step here is to have a hard coded list of tests that qualify. In this case the main driver is moving android and osx perf tests from tier-1 to tier-2, but we want to make sure we can generate alerts.
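A hard-coded list could be as simple as the following sketch (the test names here are placeholders for illustration, not the actual qualifying set):

```python
# Hypothetical hard-coded allowlist; these names are placeholders,
# not the real set of qualifying tier-2 tests.
TIER2_ALERT_TESTS = {
    "raptor-tp6-amazon-firefox",
    "raptor-tp6-google-firefox",
}

def qualifies_for_alerts(test_name):
    """Return True if this test should generate alerts despite being tier-2."""
    return test_name in TIER2_ALERT_TESTS
```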

Prior discussion was in bug 1436819.

code to enable alerts is here:
https://github.com/mozilla/treeherder/blob/0e181ede9a753a40996500e4dd7c01055779c174/treeherder/model/fixtures/repository.json#L28

  • side note: there are a lot of old repos in that file; at the very least mozilla-inbound should change status from active to onhold.

I see we use the attribute performance_alerts_enabled here:
https://github.com/mozilla/treeherder/blob/2cdc2df4996bbf96dfbd70c8579d793d38bb1fd2/treeherder/etl/perf.py#L204

the code looks like:

    if ((signature.should_alert or
            (signature.should_alert is None and suite.get('value') is None)) and
            datum_created and job.repository.performance_alerts_enabled):
        generate_alerts.apply_async(args=[signature.id],
                                    queue='generate_perf_alerts')
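Pulled out on its own, the condition reduces to something like this sketch (the function and parameter names are mine, not Treeherder's):

```python
def should_generate_alerts(signature_should_alert, suite_value,
                           datum_created, repo_alerts_enabled):
    """Mirror the ingestion-time check: alert when the signature opts in,
    or when neither the signature nor the suite says anything, provided
    new data was created and the repository has alerting enabled."""
    opted_in = (signature_should_alert or
                (signature_should_alert is None and suite_value is None))
    return bool(opted_in and datum_created and repo_alerts_enabled)
```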

at first glance modifying should_alert would work- we could modify this in-tree via the raptor harness:
https://searchfox.org/mozilla-central/source/testing/raptor/raptor/output.py#1373

my understanding is that we do not set suite.shouldAlert, so it is None and doesn't evaluate to True during ingestion. If we add code in raptor like:

    shouldAlert = False
    if forceAlert:
        shouldAlert = True

            suite = {
                "name": test["name"],
                "type": test["type"],
                "extraOptions": test["extra_options"],
                "lowerIsBetter": test["lower_is_better"],
                "unit": test["unit"],
                "alertThreshold": float(test["alert_threshold"]),
                # like suites, subtests are identified by names
                "subtests": {},
                "shouldAlert": shouldAlert,
            }

then I could see forceAlert being set via a command-line parameter when we launch the harness:
raptor --test <tp6...> --tier 2

I think in addition we need repo or branch and have this passed in from taskcluster as well. this would then look like:
raptor --test <tp6...> --tier 2 --branch mozilla-central

in parsing the command-line args, to avoid try server alerts and limit tier-2 alerts to mozilla-central only:

    if tier == 2 and branch == 'mozilla-central':
        forceAlert = True
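As a sketch, the argument handling might look like the following (the flag names and defaults are assumptions; the real raptor CLI may differ):

```python
import argparse

def parse_args(argv):
    # Hypothetical flags; the real raptor command line may differ.
    parser = argparse.ArgumentParser()
    parser.add_argument("--test", required=True)
    parser.add_argument("--tier", type=int, default=1)
    parser.add_argument("--branch", default=None)
    args = parser.parse_args(argv)
    # Avoid try server alerts: only force alerts for tier-2 jobs
    # running on mozilla-central.
    args.force_alert = (args.tier == 2 and args.branch == "mozilla-central")
    return args
```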

This might not be the ideal solution, but I imagine it could get a conversation started.

As we are looking to reduce bitbar capacity in the next 4 weeks, I would like to see this get in the queue for March.

:davehunt, is there more information I need to add here?

Flags: needinfo?(dave.hunt)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)

This might not be the ideal solution, but I imagine it could get a conversation started.

I was thinking that the test manifest would indicate the tier, and that Perfherder would maintain the logic for which repositories/tiers would generate alerts. I'd like to involve :igoldan in this discussion. Perhaps we could schedule a meeting to discuss the implementation.
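If the manifest carried the tier, reading it could look like this sketch (the 'tier' key and the ini-style layout are assumptions about how raptor manifests might express it):

```python
import configparser

def read_tier(manifest_text, section):
    # Hypothetical manifest layout; real raptor manifests may use a
    # different key. Defaults to tier 1 when the key is absent.
    parser = configparser.ConfigParser()
    parser.read_string(manifest_text)
    return parser.getint(section, "tier", fallback=1)
```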

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #1)

:davehunt, is there more information I need to add here?

I feel we're still early in working out what our perf test tiers should be, and would like to gather additional feedback before pushing ahead. We're currently challenged on one of our Perfherder related OKRs so it's unlikely that we'll be able to give this much attention before March.

What are we planning to do to help the sheriffs identify regressions that are downstream from autoland? I imagine these will be the majority of mozilla-central alerts, and there's a good chance that this change will require additional time from the sheriffs.

Flags: needinfo?(dave.hunt)

the risk here is that we have to act in March to reduce resources; otherwise this is a pointless exercise.

What concerns are there for comparing alerts to autoland? I would think we would just look at other platforms via Perfherder and, if there is a related change, associate the alert with the original. If there isn't, we would retrigger/backfill much as we normally do.

using the tier in the manifest makes sense; we would just need to find a way to get the Taskcluster configs to recognize and honor it. If we don't do that, then I would want to use a different word than 'tier', maybe 'level'?

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #3)

the risk here is that we have to act in March to reduce resources; otherwise this is a pointless exercise.

Are we reducing resources to a level that would immediately require this change, or can the reduced pool carry the load of autoland for a limited time? How has reducing the post startup settle time in bug 1602657 impacted our device load?

What concerns are there for comparing alerts to autoland? I would think we would just look at other platforms via Perfherder and, if there is a related change, associate the alert with the original. If there isn't, we would retrigger/backfill much as we normally do.

I'd like to get feedback from the sheriffs on this. When doing this with inbound/autoland, Perfherder would plot both of these on the graph so detecting downstream alerts was easier. Could we do something similar with mozilla-central alerts? The UX for Perfherder is not great, and I'm concerned that having to plot the same test from another platform/repository is problematic, especially if we're having to do it frequently.

using the tier in the manifest makes sense; we would just need to find a way to get the Taskcluster configs to recognize and honor it. If we don't do that, then I would want to use a different word than 'tier', maybe 'level'?

I do have some concern with 'tier' being confused with the existing job tiers. I'm okay with using 'rank' or 'level' if it makes it clearer. I don't think we should link the two though.

Flags: needinfo?(igoldan)

(In reply to Dave Hunt [:davehunt] [he/him] ⌚BST from comment #4)

[...]
I'd like to get feedback from the sheriffs on this. When doing this with inbound/autoland, Perfherder would plot both of these on the graph so detecting downstream alerts was easier. Could we do something similar with mozilla-central alerts? The UX for Perfherder is not great, and I'm concerned that having to plot the same test from another platform/repository is problematic, especially if we're having to do it frequently.

We discussed this last week. We decided not to alert on mozilla-central, and instead to run about 25 jobs on autoland for Android and OSX.
That way, the perf sheriffs keep the ability to backfill, and they could potentially be assisted by the backfill bot.

[...]
I do have some concern with 'tier' being confused with the existing job tiers. I'm okay with using 'rank' or 'level' if it makes it clearer. I don't think we should link the two though.

We already agreed on a distinctive name (the 'level' term), though we may need to rename it again.

Flags: needinfo?(igoldan)

We decided not to alert on mozilla-central but to run tier 2 perf tests at a lower frequency using SETA.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX