Open Bug 1635465 Opened 5 years ago Updated 2 years ago

Define a process to periodically assess the usefulness of platforms/configurations/suites

Categories

(Testing :: General, enhancement, P3)

Tracking

(Not tracked)

People

(Reporter: marco, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

We often go a long time without noticing that a platform/configuration/suite no longer offers any value, until some random event brings it up (e.g. somebody checks what's running and notices something that could be dropped).

When we do notice, it's usually hard to find a good point of contact to confirm that our intuition is correct, partly because documentation about these platforms/configurations/suites is lacking.

It would be nice if we had a process to handle this, that ensures:

  1. All platforms/configurations/suites are documented and have a point of contact;
  2. Periodically we (or, better, the point of contact) are asked to confirm that the platform/configuration/suite still offers value.
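
The two requirements above could be checked automatically. What follows is only a minimal sketch, assuming a hypothetical registry in which each platform/configuration/suite records an owner and the date its value was last confirmed. The `SuiteEntry` fields, the example names, and the 180-day interval are illustrative assumptions, not existing tooling:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed review interval; the bug does not specify one.
REVIEW_INTERVAL = timedelta(days=180)


@dataclass
class SuiteEntry:
    """Hypothetical registry record for one platform/configuration/suite."""
    name: str
    owner: Optional[str]            # point of contact, None if undocumented
    last_confirmed: Optional[date]  # when the owner last confirmed its value


def needs_attention(entries, today):
    """Return (name, reason) pairs for entries violating either requirement."""
    problems = []
    for entry in entries:
        if not entry.owner:
            # Requirement 1: every entry must have a point of contact.
            problems.append((entry.name, "no point of contact"))
        elif entry.last_confirmed is None or today - entry.last_confirmed > REVIEW_INTERVAL:
            # Requirement 2: the value must have been confirmed recently.
            problems.append((entry.name, "value not confirmed recently"))
    return problems


entries = [
    SuiteEntry("example-suite-a", "team-a", date(2020, 1, 15)),
    SuiteEntry("example-suite-b", None, None),
    SuiteEntry("example-suite-c", "team-c", date(2020, 5, 1)),
]
report = needs_attention(entries, today=date(2020, 8, 1))
for name, reason in report:
    print(f"{name}: {reason}")
```

A scheduled job could run a check like this and needinfo the owner of each flagged entry.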
See Also: → 1635826
Depends on: 1635826
Depends on: 1636400
Severity: -- → N/A
Priority: -- → P3

details from bug 1641966:
Some thoughts would flow like so:

developer submits patch, reviewbot runs taskgraph diff job and outputs net new jobs + estimated load
if any new jobs: require developer to include <details> about the new jobs
depending on the count of new jobs:

    5: require review from the infra team
    10: additionally require review from a manager
    50: require VP signature
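
The escalation above could be sketched as a small lookup. This assumes the thresholds are inclusive ("at least N new jobs") and cumulative, which the comment does not state explicitly; `required_approvals` is a hypothetical helper, not part of reviewbot:

```python
# Thresholds from the comment above, read as "at least N new jobs".
# Inclusive bounds and cumulative approvals are assumptions.
APPROVAL_THRESHOLDS = [
    (5, "infra team review"),
    (10, "manager review"),
    (50, "VP signature"),
]


def required_approvals(new_job_count):
    """Return what a patch adding `new_job_count` net new jobs would need."""
    if new_job_count <= 0:
        return []
    # Any new job at all requires the <details> write-up.
    approvals = ["<details> about the new jobs"]
    for threshold, approval in APPROVAL_THRESHOLDS:
        if new_job_count >= threshold:
            approvals.append(approval)
    return approvals


for count in (0, 3, 12, 50):
    print(count, required_approvals(count))
```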

for <details> I imagine:

business reason for this
justification for platforms/configs chosen
risk of not running this
when will this feature ship
what branches should this run on
owner (person(s), team)
expiration of new jobs
try run with proof of minimal intermittent failures
acknowledgement that you have selected the appropriate tier
** this means we need clear definitions of tiers and everything follows those rules
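
If the <details> were collected through a form, an automated check could reject submissions with missing fields. This is a rough sketch; the field names are hypothetical keys chosen to mirror the list above, not an existing Bugzilla form:

```python
# Hypothetical field names mirroring the <details> list above.
REQUIRED_DETAILS = [
    "business_reason",
    "platform_justification",
    "risk_of_not_running",
    "ship_date",
    "branches",
    "owner",
    "expiration",
    "try_run",
    "tier_acknowledged",
]


def missing_details(details):
    """Return the required fields that are absent or empty in `details`."""
    return [field for field in REQUIRED_DETAILS if not details.get(field)]


submission = {
    "business_reason": "example reason",
    "owner": "example-team",
    "branches": ["autoland"],
    "tier_acknowledged": True,
}
print(missing_details(submission))
```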

The more this can be automated (review bot job that fails when new jobs are added, Bugzilla form with <details>, notifications to the infra team for new tasks), the more likely this process is to succeed.

Documentation is one thing, but I believe we need a more automated process for when someone is affecting the taskgraph.

Version: Version 3 → unspecified
Depends on: 1787327

With bug 1787327 done, and with https://github.com/mozilla/code-review/issues/1475 almost done, I think we can close this.

Joel, do you think there's more we should do?

Flags: needinfo?(jmaher)

I am not sure where the documentation from bug 1635826 lives. I suspect it is out of date. Finding a way to keep it up to date, or to notify someone that action is needed when it goes out of date, would be enough for me to consider this all good.

Flags: needinfo?(jmaher)

All right, that's tracked in bug 1636400.
