Open
Bug 1501520
Opened 6 years ago
Updated 6 years ago
[tracking] make robustcheckout more reliable
Categories
(Developer Services :: Mercurial: robustcheckout, enhancement)
Tracking
(Not tracked)
NEW
People
(Reporter: jlund, Unassigned)
References
(Depends on 4 open bugs)
Details
Currently, hg robustcheckout failures account for some of the most frequent manual reruns in release automation. This puts a burden on releaseduty and delays releases.
This bug tracks examples of the various ways it fails.
Reporter
Updated•6 years ago
Reporter
Comment 1•6 years ago
@gps - any guidance or help on these would be greatly appreciated.
Flags: needinfo?(gps)
Comment 2•6 years ago
The "robust" in robustcheckout is supposed to mean something. The extension is a glorified wrapper around Mercurial internals that is supposed to retry (with intelligent backoffs) when intermittent network errors occur. To a large extent, we're successful in doing this.
But there are a handful of failures that still manage to creep in. We're essentially engaged in a game of whack-a-mole with failures.
It also doesn't help that robustcheckout.py is vendored into a few different repos. Sometimes we forget to update it everywhere. So e.g. TaskCluster Windows workers may not get all fixes as quickly as mozilla-central. And we may not uplift robustcheckout.py changes to e.g. mozilla-release.
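One way to catch that drift would be a small consistency check over the vendored copies; the paths below are hypothetical examples, not a definitive list of where robustcheckout.py lives:

import hashlib
import sys

# Hypothetical locations of vendored copies; actual paths vary by repo.
VENDORED_COPIES = [
    "mozilla-central/testing/mozharness/external_tools/robustcheckout.py",
    "version-control-tools/hgext/robustcheckout/__init__.py",
]

def sha256(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

digests = {path: sha256(path) for path in VENDORED_COPIES}
if len(set(digests.values())) > 1:
    for path, digest in sorted(digests.items()):
        print(digest[:12], path)
    sys.exit("vendored robustcheckout.py copies have diverged")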
I think we should treat all VCS checkout failures like we do any other failure in CI: prioritize fixing problems by the impact of their failures (read: count and disruption to release processes) and chase the long tail as long as we can justify it.
Do you have particular failures that are causing significant pain? Bug 1371378 had 1 failure last week and bug 1318173 had 3. These seem pretty low frequency...
Flags: needinfo?(gps)
Keywords: in-triage
Summary: [tracking] make robustcheckout more reliable for release tasks → [tracking] make robustcheckout more reliable
Reporter
Comment 3•6 years ago
(In reply to Gregory Szorc [:gps] from comment #2)
> Do you have particular failures that are causing significant pain? Bug
> 1371378 had 1 failure last week and bug 1318173 had 3. These seem pretty low
> frequency...
Seems like these examples haven't been hit in the last two betas. Perhaps there were improvements, or perhaps we were just lucky. Fine to ignore until it happens again.
Reporter
Comment 4•6 years ago
(In reply to Gregory Szorc [:gps] from comment #2)
> The "robust" in robustcheckout is supposed to mean something. The extension
> is a glorified wrapper around Mercurial internals that is supposed to retry
> (with intelligent backoffs) when intermittent network errors occur. To a
> large extent, we're successful in doing this.
To be clear, robustcheckout seems to be very successful at this, and it's awesome.
However, when we do have release automation failures, even if they are rare, they put extra operational pressure on Releng and delay the release from getting into QA's hands. Perhaps we could invest some time getting CIDuty to help rerun the intermittents, but that only sweeps the failures under the rug.
> I think we should treat all VCS checkout failures like we do any other failure in CI: prioritize fixing problems by the impact of their failures (read: count and disruption to release processes) and chase the long tail as long as we can justify it.
I've added some more failures we hit this past week that seem to be new: bug 1504346 and bug 1504345.
Release automation tasks are not often starred. Perhaps they should be, but that's out of scope here. We do have some historical data in our weekly postmortems, but no frequency metrics (yet): https://github.com/mozilla-releng/releasewarrior-data/tree/master/postmortems
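Until real metrics exist, a rough frequency count could be scraped from those postmortems. This sketch assumes a local clone of releasewarrior-data and that postmortems are Markdown files mentioning bugs as "bug NNNNNNN"; both are assumptions about the data layout, not a documented schema:

import collections
import pathlib
import re

postmortems = pathlib.Path("releasewarrior-data/postmortems")
bug_re = re.compile(r"[Bb]ug (\d{6,7})")

# Tally how often each bug number is mentioned across all postmortems.
counts = collections.Counter()
for path in postmortems.rglob("*.md"):
    counts.update(bug_re.findall(path.read_text(encoding="utf-8")))

for bug, n in counts.most_common(10):
    print("bug %s: %d mention(s)" % (bug, n))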