Bug 1284253 (Closed): opened 8 years ago, closed 6 years ago

Set RETRY status on release l10n timeouts

Categories: Release Engineering :: Release Automation: Other (defect, P4)
Tracking: Not tracked; RESOLVED FIXED
People: Reporter: rail; Unassigned

Details

We should set RETRY status on timeouts like this one:
https://tools.taskcluster.net/task-inspector/#9BfL484uSuCjRGDsAhR2lw/0
http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-beta-l10n/release-mozilla-beta_firefox_win32_l10n_repack-bm71-build1-build13.txt.gz

19:14:17     INFO - 2016-06-30 19:14:17,989 - Copying ../../dist//firefox-48.0.te.win32.checksums.asc to cache c:\\builds\\moz2_slave\\rel-m-beta_fx_w32_l10n_rpk-000\\build\\signing_cache\gpg\9f0d5c763587feef9af4fd237902906befdcb057
19:14:17     INFO - c:/builds/moz2_slave/rel-m-beta_fx_w32_l10n_rpk-000/build/mozilla-beta/obj-l10n/_virtualenv/Scripts/python.exe -u c:/builds/moz2_slave/rel-m-beta_fx_w32_l10n_rpk-000/build/mozilla-beta/build/upload.py --base-path ../../dist \
19:14:17     INFO - 	--package 'firefox-48.0.te.win32.zip' \
19:14:17     INFO - 	--properties-file ../../dist/mach_build_properties.json \
19:14:17     INFO - 	'../../dist/firefox-48.0.te.win32.zip' '../../dist/install/sea/firefox-48.0.te.win32.installer.exe' '../../dist/update/firefox-48.0.te.win32.complete.mar' '../../dist/win32/xpi/firefox-48.0.te.langpack.xpi'                         '../../dist/install/sea/firefox-48.0.te.win32.installer-stub.exe' \
19:14:17     INFO - 	'../../dist//firefox-48.0.te.win32.checksums' '../../dist//firefox-48.0.te.win32.checksums'.asc

command timed out: 1800 seconds without output running ['c:/mozilla-build/python27/python', '-u', 'scripts/scripts/desktop_l10n.py', '--branch-config', 'single_locale/mozilla-beta.py', '--platform-config', 'single_locale/win32.py', '--environment-config', 'single_locale/production.py', '--balrog-config', 'balrog/production.py'], attempting to kill
SIGKILL failed to kill process
using fake rc=-1
program finished with exit code -1

remoteFailed: [Failure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: exceptions.RuntimeError: SIGKILL failed to kill process
]

Jordan, do you think that https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/building/buildbase.py#76 is the right place to add another rule for "command timed out: \d+ seconds without output running"?
Flags: needinfo?(jlund)
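The kind of rule asked about above could look like the following minimal Python sketch: a list of (regex, status) pairs checked against each log line, where a match reports the run as RETRY instead of FAILURE. The status constants, `RETRY_RULES`, and `classify()` are hypothetical illustrations, not the contents of buildbase.py; only the regex is taken from the comment.

```python
import re

# Hypothetical status names; mozharness defines its own TBPL_* constants.
SUCCESS, RETRY, FAILURE = "SUCCESS", "RETRY", "FAILURE"

# Each rule pairs a compiled regex with the status to report when it matches.
RETRY_RULES = [
    (re.compile(r"command timed out: \d+ seconds without output running"), RETRY),
]


def classify(log_lines, exit_code):
    """Return RETRY if any rule matches a log line, otherwise SUCCESS or
    FAILURE based on the exit code (hypothetical helper)."""
    for line in log_lines:
        for pattern, status in RETRY_RULES:
            if pattern.search(line):
                return status
    return SUCCESS if exit_code == 0 else FAILURE
```

Fed the timeout line from the log above with the fake `rc=-1`, this returns RETRY rather than FAILURE. Whether the script that applies such rules ever sees that line is a separate question: in the log it is reported about the whole `desktop_l10n.py` run, outside the INFO-prefixed script output.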
(In reply to Rail Aliiev [:rail] from comment #1)
> Jordan, do you think that
> https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/
> mozilla/building/buildbase.py#76 is the right place to add another rule for
> "command timed out: \d+ seconds without output running"?


How about retrying within mozharness (mh)? If this is a specific mh network-based subprocess call that is not retried automatically, retrying that call seems like the fix. It should also save us a lot of time, since it would retry uploading only that one locale's artifacts rather than rerunning the entire l10n task.

Essentially: pass an `output_timeout` shorter than 1800 s here[1] and bubble it up to here[2], then wrap the self._make() call with self.retry() here[3].

[1] https://dxr.mozilla.org/mozilla-central/rev/f378a56b25ce2a2997b263c1857629f3f18d7400/testing/mozharness/scripts/desktop_l10n.py#855
[2] https://dxr.mozilla.org/mozilla-central/rev/f378a56b25ce2a2997b263c1857629f3f18d7400/testing/mozharness/mozharness/base/script.py#1103
[3] https://dxr.mozilla.org/mozilla-central/rev/f378a56b25ce2a2997b263c1857629f3f18d7400/testing/mozharness/scripts/desktop_l10n.py#793
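The suggestion above can be sketched in standalone Python, under stated assumptions. The sketch runs one step with an output timeout shorter than the 1800-second limit seen in the log, turns a silent hang into an exception, and retries only that step. `run_with_output_timeout()`, `OutputTimeoutError`, and `retry()` are hypothetical stand-ins. They do not reproduce mozharness's own command-running or `self.retry()` signatures.

```python
import queue
import subprocess
import threading
import time


class OutputTimeoutError(Exception):
    """Raised when a command stays silent for longer than output_timeout."""


def run_with_output_timeout(cmd, output_timeout):
    """Run cmd and return its exit code, killing it after output_timeout
    seconds without a new line of output (hypothetical helper)."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = queue.Queue()

    def pump():
        # Forward each output line to the queue; None marks end of stream.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            line = lines.get(timeout=output_timeout)
        except queue.Empty:
            proc.kill()
            proc.wait()
            raise OutputTimeoutError(
                f"{output_timeout} seconds without output running {cmd!r}"
            )
        if line is None:
            return proc.wait()
        print(line, end="")  # echo output so the log keeps streaming


def retry(action, attempts=3, sleeptime=0.0, retry_exceptions=(OutputTimeoutError,)):
    """Call action() until it returns, retrying only on retry_exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller fail the job
            time.sleep(sleeptime)
```

For example, `retry(lambda: run_with_output_timeout(["make", "upload"], 600))` gives a single locale's upload three attempts. Here `["make", "upload"]` and the 600-second value are placeholders, not the actual target or timeout used by desktop_l10n.py. Retrying only on `OutputTimeoutError`, and not on nonzero exit codes, keeps genuine build errors from being masked as transient hangs.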
Flags: needinfo?(jlund)
Priority: -- → P4

Release l10n is now on TaskCluster (TC) as of Firefox 59.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED