Closed
Bug 696056
Opened 13 years ago
Closed 13 years ago
Some release jobs did not re-trigger after HG failures
Categories
(Release Engineering :: Release Automation: Other, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: armenzg, Unassigned)
Details
(Whiteboard: [release-process-improvement][automation][mercurial][retry])
For instance:
* Linux64 had an hg timeout rather than an error and did not re-trigger [1]
* Linux64 xulrunner had a connection refused [2]
* Fennec source had not been clobbered from the previous run [3] (this was a re-triggered job)
* Android failed on mozharness multi-locale script [4]
* All 6 Linux mobile repacks failed with a Bad Gateway [5]
All of these jobs required manual re-triggering rather than being retried automatically.
This is an unusual case, since HG was performing extremely badly, but it shows that some of our steps go red rather than purple, and for failures like these a retry is generally the right outcome.
[1]
command timed out: 3600 seconds without output, attempting to kill
elapsedTime=3600.006155
program finished with exit code -1
[2]
abort: error: Connection refused
elapsedTime=0.717556
program finished with exit code 255
[3]
Process stderr:
abort: destination 'mozilla-beta' is not empty
program finished with exit code 1
elapsedTime=1870.255260
[4]
11:28:52 ERROR - abort: HTTP Error 500: Internal Server Error
11:28:52 ERROR - Return code: 255
...
11:28:52 ERROR - CalledProcessError: Command '['hg', 'share', '-U', '/builds/hg-shared/releases/l10n/mozilla-beta/it', '/builds/slave/rel-m-beta-lnx-andrd-bld/mozilla-beta/it']' returned non-zero exit status 255
...
Traceback (most recent call last):
  File "mozharness/scripts/multil10n.py", line 52, in <module>
    multi_locale_build.run()
  File "/builds/slave/rel-m-beta-lnx-andrd-bld/mozharness/mozharness/base/script.py", line 509, in run
    self._possibly_run_method(method_name, error_if_missing=True)
  File "/builds/slave/rel-m-beta-lnx-andrd-bld/mozharness/mozharness/base/script.py", line 480, in _possibly_run_method
    return getattr(self, method_name)()
  File "/builds/slave/rel-m-beta-lnx-andrd-bld/mozharness/mozharness/l10n/multi_locale_build.py", line 189, in pull_locale_source
    tag_override=c.get('tag_override'))
  File "/builds/slave/rel-m-beta-lnx-andrd-bld/mozharness/mozharness/base/vcs/vcsbase.py", line 102, in vcs_checkout_repos
    self.vcs_checkout(**kwargs)
  File "/builds/slave/rel-m-beta-lnx-andrd-bld/mozharness/mozharness/base/vcs/vcsbase.py", line 87, in vcs_checkout
    raise VCSException, "No got_revision from ensure_repo_and_revision()"
mozharness.base.errors.VCSException: No got_revision from ensure_repo_and_revision()
program finished with exit code 1
elapsedTime=12.626796
[5]
abort: HTTP Error 502: Bad Gateway
program finished with exit code 255
elapsedTime=0.457380
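The excerpts above suggest what a step would need to recognize in order to go purple (retry) instead of red. The following is a minimal sketch, assuming a hypothetical `should_retry(exit_code, output)` hook that inspects a failed step's output; it is not how buildbot or mozharness actually decide to retry, and the patterns are taken only from the logs quoted in this bug. Note that [3] is deliberately not treated as transient: a non-empty destination needs a clobber, so retrying as-is would likely fail again.

```python
import re

# Hypothetical patterns, drawn from the log excerpts in this bug, that
# indicate a transient hg/network problem rather than a real build failure.
TRANSIENT_PATTERNS = [
    re.compile(r"command timed out: \d+ seconds without output"),  # [1]
    re.compile(r"abort: error: Connection refused"),                # [2]
    re.compile(r"abort: HTTP Error 5\d\d"),                         # [4], [5]
]


def should_retry(exit_code, output):
    """Return True if a failed step looks like a transient hg failure.

    A sketch only: a successful step (exit code 0) is never retried, and
    anything that does not match a known transient pattern (for example a
    non-empty destination directory) is left as a plain failure.
    """
    if exit_code == 0:
        return False
    return any(p.search(output) for p in TRANSIENT_PATTERNS)


print(should_retry(255, "abort: HTTP Error 502: Bad Gateway"))
print(should_retry(1, "abort: destination 'mozilla-beta' is not empty"))
```

Matching on output rather than on exit code alone matters here: [4] exited with code 1 from the mozharness script even though the underlying cause was an HTTP 500 from hg.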
Comment 1•13 years ago
As I've mentioned elsewhere, hg clone jobs that *were* retrying automatically actually made things worse by sustaining and increasing the load.
While I'd probably make an exception for release jobs, I'm more in favor of improving our hg infrastructure to handle our required load.
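One common way to keep automatic retries from sustaining load on a struggling server is to cap the number of attempts and spread them out with exponential backoff plus random jitter. The sketch below illustrates that general technique; `retry_with_backoff` and its parameters are hypothetical and are not what Release Engineering implemented for this bug.

```python
import random
import time


def retry_with_backoff(func, max_attempts=3, base_delay=60.0, max_delay=600.0,
                       is_retryable=lambda exc: True, sleep=time.sleep):
    """Call func(), retrying a bounded number of times with backoff.

    The delay ceiling grows exponentially (base_delay, 2 * base_delay, ...)
    up to max_delay, and the actual wait is drawn uniformly from
    [0, ceiling] ("full jitter") so that many jobs failing at the same
    moment do not all hit the server again in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt or on errors deemed permanent.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

`is_retryable` is a hypothetical hook for deciding which errors are transient; the `sleep` parameter exists only so the waiting can be replaced in tests.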
Priority: -- → P3
Summary: Some jobs did not re-trigger after HG failures → Some release jobs did not re-trigger after HG failures
Whiteboard: [release-process-improvement][automation][mercurial][retry]
Updated•13 years ago
Blocks: hg-automation
Comment 2•13 years ago
Mass move of bugs to Release Automation component.
Component: Release Engineering → Release Engineering: Automation (Release Automation)
Updated•13 years ago
No longer blocks: hg-automation
Comment 3•13 years ago
Comment #0 talks about a bunch of different failures. I know that some of these are fixed, and that we're in a better place these days w.r.t. recovering from hg failures. Let's file any new issues that come up individually.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering