Bug 696056 (Closed) — Opened 8 years ago, Closed 8 years ago

Some release jobs did not re-trigger after HG failures

Categories

(Release Engineering :: Release Automation: Other, defect, P3)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Unassigned)

Details

(Whiteboard: [release-process-improvement][automation][mercurial][retry])

For instance:
* Linux64 had an hg timeout rather than an error and did not re-trigger [1]
* Linux64 xulrunner had a connection refused [2]
* Fennec source had not been clobbered from the previous run [3] (this was a re-triggered job)
* Android failed on mozharness multi-locale script [4]
* All 6 linux mobile repacks failed with a Bad Gateway [5]

All of these jobs required manual re-triggering rather than being retried automatically.

This is an unusual case, as HG was in an extremely bad state, but it shows that some steps go red (failure) rather than purple (infrastructure exception), even though failures like these are generally good candidates for an automatic retry.

[1]
command timed out: 3600 seconds without output, attempting to kill
elapsedTime=3600.006155
program finished with exit code -1

[2]
abort: error: Connection refused
elapsedTime=0.717556
program finished with exit code 255

[3] 
Process stderr:
abort: destination 'mozilla-beta' is not empty

program finished with exit code 1
elapsedTime=1870.255260

[4]
11:28:52    ERROR -  abort: HTTP Error 500: Internal Server Error
11:28:52    ERROR - Return code: 255
...
11:28:52    ERROR - CalledProcessError: Command '['hg', 'share', '-U', '/builds/hg-shared/releases/l10n/mozilla-beta/it', '/builds/slave/rel-m-beta-lnx-andrd-bld/mozilla-beta/it']' returned non-zero exit status 255
...
Traceback (most recent call last):
  File "mozharness/scripts/multil10n.py", line 52, in <module>
    multi_locale_build.run()
  File "/builds/slave/rel-m-beta-lnx-andrd-bld/mozharness/mozharness/base/script.py", line 509, in run
    self._possibly_run_method(method_name, error_if_missing=True)
  File "/builds/slave/rel-m-beta-lnx-andrd-bld/mozharness/mozharness/base/script.py", line 480, in _possibly_run_method
    return getattr(self, method_name)()
  File "/builds/slave/rel-m-beta-lnx-andrd-bld/mozharness/mozharness/l10n/multi_locale_build.py", line 189, in pull_locale_source
    tag_override=c.get('tag_override'))
  File "/builds/slave/rel-m-beta-lnx-andrd-bld/mozharness/mozharness/base/vcs/vcsbase.py", line 102, in vcs_checkout_repos
    self.vcs_checkout(**kwargs)
  File "/builds/slave/rel-m-beta-lnx-andrd-bld/mozharness/mozharness/base/vcs/vcsbase.py", line 87, in vcs_checkout
    raise VCSException, "No got_revision from ensure_repo_and_revision()"
mozharness.base.errors.VCSException: No got_revision from ensure_repo_and_revision()
program finished with exit code 1
elapsedTime=12.626796

[5]
abort: HTTP Error 502: Bad Gateway
program finished with exit code 255
elapsedTime=0.457380
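The failure signatures above suggest a pattern-based classifier could distinguish transient hg failures (worth retrying, i.e. turning the step purple) from genuine errors like the unclobbered destination in [3]. The following is a minimal sketch, not existing buildbotcustom/mozharness code; the patterns are taken directly from the logs in this bug.

```python
import re

# Hypothetical classifier: patterns drawn from failures [1], [2], [4], [5] above.
# Matching output like this should turn a step purple (RETRY) instead of
# red (FAILURE). The clobber error in [3] is deliberately NOT matched,
# since retrying without a clobber would fail the same way again.
TRANSIENT_HG_PATTERNS = [
    r"command timed out: \d+ seconds without output",   # [1] hg timeout
    r"abort: error: Connection refused",                # [2]
    r"abort: HTTP Error 50[023]",                       # [4] 500, [5] 502
    r"No got_revision from ensure_repo_and_revision",   # [4] mozharness VCSException
]

def is_retryable_hg_failure(log_text):
    """Return True if the log matches a known transient hg failure."""
    return any(re.search(p, log_text) for p in TRANSIENT_HG_PATTERNS)
```

A step wrapper could feed its captured stdio through this check and flip the result to RETRY on a match, leaving non-matching failures red for human attention.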
As I've mentioned elsewhere, hg clone jobs that *were* retrying automatically actually made things worse by sustaining and increasing the load.

While I'd probably make an exception for release jobs, I'm more in favor of improving our hg infrastructure to handle our required load.
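The load concern above is the classic retry-storm problem: immediate retries from many slaves hammer an already-struggling server. A common mitigation is jittered exponential backoff. This is an illustrative sketch only; `run_hg_with_backoff` and its parameters are hypothetical, not an existing buildbotcustom/mozharness API.

```python
import random
import subprocess
import time

def run_hg_with_backoff(cmd, attempts=5, base_delay=30, max_delay=600):
    """Run a command, retrying failures with jittered exponential backoff.

    Each retry waits a random amount between 0 and the capped exponential
    delay ("full jitter"), so retries from many slaves spread out in time
    instead of hitting the hg server in synchronized waves.
    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(attempts):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts - 1:
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            time.sleep(delay)
    return result  # last failing result, for the caller to report
```

Whether release jobs should get more attempts (or skip backoff entirely, as suggested above) is a policy choice layered on top of a helper like this.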
Priority: -- → P3
Summary: Some jobs did not re-trigger after HG failures → Some release jobs did not re-trigger after HG failures
Whiteboard: [release-process-improvement][automation][mercurial][retry]
Mass move of bugs to Release Automation component.
Component: Release Engineering → Release Engineering: Automation (Release Automation)
No longer blocks: hg-automation
Comment #0 talks about a bunch of different failures. I know that some of these are fixed, and that we're in a better place these days w.r.t. recovering from hg failures. Let's file any new issues that come up individually.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering