Closed Bug 793642 Opened 12 years ago Closed 12 years ago

hg "abort: HTTP Error 500: Internal Server Error" should RETRY for mozharness builds/steps too

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

Type: defect
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: mozilla)

References

(Blocks 1 open bug)

Details

(Whiteboard: [mozharness][sheriff-want])

Attachments

(2 files, 3 obsolete files)

This job should have been a retry, rather than:
results: failure (2)

https://tbpl.mozilla.org/php/getParsedLog.php?id=15474942&tree=Firefox
{
========= Started 'mock_mozilla -v ...' failed (results: 2, elapsed: 1 mins, 19 secs) (at 2012-09-24 03:35:21.332596) =========
mock_mozilla -v -r mozilla-centos6-i386 --cwd /builds/slave/m-cen-andrd-ntly --unpriv --shell '/usr/bin/env HG_SHARE_BASE_DIR="/builds/hg-shared" JARSIGNER="/builds/slave/m-cen-andrd-ntly/tools/release/signing/mozpass.py" MOZ_CRASHREPORTER_NO_REPORT="1" IS_NIGHTLY="yes" SYMBOL_SERVER_HOST="symbols1.dmz.phx1.mozilla.com" CCACHE_DIR="/builds/ccache" POST_SYMBOL_UPLOAD_CMD="/usr/local/bin/post-symbol-upload.py" MOZ_SYMBOLS_EXTRA_BUILDID="mozilla-central" SYMBOL_SERVER_SSH_KEY="/home/mock_mozilla/.ssh/ffxbld_dsa" PATH="/tools/jdk6/bin:/opt/local/bin:/tools/python/bin:/tools/buildbot/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/" MOZ_UPDATE_CHANNEL="nightly" CCACHE_BASEDIR="/builds/slave/m-cen-andrd-ntly" TINDERBOX_OUTPUT="1" CCACHE_COMPRESS="1" SYMBOL_SERVER_PATH="/mnt/netapp/breakpad/symbols_mob/" MOZ_OBJDIR="obj-firefox" LC_ALL="C" SYMBOL_SERVER_USER="ffxbld" JAVA_HOME="/tools/jdk6" DISPLAY=":2" CCACHE_UMASK="002" python mozharness/scripts/multil10n.py --config-file multi_locale/mozilla-central_android.json --merge-locales --only-pull-locale-source --only-add-locales --only-package-multi'
 in dir /builds/slave/m-cen-andrd-ntly (timeout 1200 secs)
 watching logfiles {}
 argv: ['mock_mozilla', '-v', '-r', 'mozilla-centos6-i386', '--cwd', '/builds/slave/m-cen-andrd-ntly', '--unpriv', '--shell', '/usr/bin/env HG_SHARE_BASE_DIR="/builds/hg-shared" JARSIGNER="/builds/slave/m-cen-andrd-ntly/tools/release/signing/mozpass.py" MOZ_CRASHREPORTER_NO_REPORT="1" IS_NIGHTLY="yes" SYMBOL_SERVER_HOST="symbols1.dmz.phx1.mozilla.com" CCACHE_DIR="/builds/ccache" POST_SYMBOL_UPLOAD_CMD="/usr/local/bin/post-symbol-upload.py" MOZ_SYMBOLS_EXTRA_BUILDID="mozilla-central" SYMBOL_SERVER_SSH_KEY="/home/mock_mozilla/.ssh/ffxbld_dsa" PATH="/tools/jdk6/bin:/opt/local/bin:/tools/python/bin:/tools/buildbot/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/" MOZ_UPDATE_CHANNEL="nightly" CCACHE_BASEDIR="/builds/slave/m-cen-andrd-ntly" TINDERBOX_OUTPUT="1" CCACHE_COMPRESS="1" SYMBOL_SERVER_PATH="/mnt/netapp/breakpad/symbols_mob/" MOZ_OBJDIR="obj-firefox" LC_ALL="C" SYMBOL_SERVER_USER="ffxbld" JAVA_HOME="/tools/jdk6" DISPLAY=":2" CCACHE_UMASK="002" python mozharness/scripts/multil10n.py --config-file multi_locale/mozilla-central_android.json --merge-locales --only-pull-locale-source --only-add-locales --only-package-multi']
...
...
06:36:29     INFO - Setting /builds/slave/m-cen-andrd-ntly/l10n-central/it to http://hg.mozilla.org/l10n-central/it.
06:36:29     INFO - Cloning http://hg.mozilla.org/l10n-central/it to /builds/slave/m-cen-andrd-ntly/l10n-central/it.
06:36:29     INFO - Running command: ['hg', '--config', 'ui.merge=internal:merge', 'clone', u'http://hg.mozilla.org/l10n-central/it', u'/builds/slave/m-cen-andrd-ntly/l10n-central/it']
06:36:29     INFO - Copy/paste: hg --config ui.merge=internal:merge clone http://hg.mozilla.org/l10n-central/it /builds/slave/m-cen-andrd-ntly/l10n-central/it
06:36:40    ERROR -  abort: HTTP Error 500: Internal Server Error
06:36:40    ERROR - Return code: 255
06:36:40     INFO - Updating /builds/slave/m-cen-andrd-ntly/l10n-central/it.
06:36:40    ERROR - Can't run command ['hg', '--config', 'ui.merge=internal:merge', 'branch'] in non-existent directory /builds/slave/m-cen-andrd-ntly/l10n-central/it!
06:36:40    ERROR - Can't run command ['hg', '--config', 'ui.merge=internal:merge', 'update', '-C'] in non-existent directory /builds/slave/m-cen-andrd-ntly/l10n-central/it!
06:36:40    FATAL - Unable to update /builds/slave/m-cen-andrd-ntly/l10n-central/it!
06:36:40    FATAL - Exiting -1
}
I suspect this isn't directly related to mock, but is instead because these clones happen inside the mozharness script. I'm not sure where the best place to catch this is. Perhaps adding some error parsing to this step would help:
http://hg.mozilla.org/build/buildbotcustom/file/0bc63f4376bc/process/factory.py#l1717
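One way to act on that suggestion is to scan the step's output for the hg abort message and report RETRY instead of FAILURE. This is a minimal sketch, not buildbotcustom's actual code: `evaluate_step()` and the locally defined result constants are hypothetical stand-ins (buildbot defines its own result constants), and only the exact 500 message from the log above is treated as retryable.

```python
import re

# Local stand-ins for buildbot-style result codes (assumption; FAILURE=2
# matches "results: failure (2)" in the log above).
SUCCESS, FAILURE, RETRY = 0, 2, 5

# Output patterns treated as transient server errors. Only the message
# seen in this bug is listed; widening the list is a judgment call.
RETRYABLE_PATTERNS = [
    re.compile(r"abort: HTTP Error 500: Internal Server Error"),
]


def evaluate_step(returncode, log_text):
    """Hypothetical helper: map a finished step to SUCCESS, RETRY or FAILURE.

    It only looks at the step's return code and its captured output.
    """
    if returncode == 0:
        return SUCCESS
    if any(p.search(log_text) for p in RETRYABLE_PATTERNS):
        return RETRY
    return FAILURE


log = "06:36:40    ERROR -  abort: HTTP Error 500: Internal Server Error\n"
print(evaluate_step(2, log))                 # 5 (RETRY)
print(evaluate_step(2, "some other error"))  # 2 (FAILURE)
```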
No longer blocks: 772446
Summary: hg "abort: HTTP Error 500: Internal Server Error" should RETRY for mock too → hg "abort: HTTP Error 500: Internal Server Error" should RETRY for mozharness builds/steps too
Oops, the patch is in bug 793641. Dup?
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Oops, misread bug. Retry.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Whiteboard: [mozharness]
Switching to the build/tools version of hgtool.py should fix this.
Catlee has a patch to do that that may land shortly.
Assignee: nobody → aki
Blocks: 770960
Blocks: 793022
https://tbpl.mozilla.org/php/getParsedLog.php?id=16819412&tree=Mozilla-Inbound#error0
Whiteboard: [mozharness] → [mozharness][sheriff-want]
Attached patch retry part i (obsolete) — Splinter Review
This is what I've got so far.
It seems to work OK, except that we call fatal() inside MercurialVCS, which makes us halt on the first try.

I need to either override fatal() or raise a catchable exception, or something along those lines.
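The "raise a catchable exception" option can be sketched as a checkout step that raises on failure instead of calling fatal(), wrapped in a generic retry loop. `VCSError`, `retry()` and `flaky_clone()` are hypothetical names used for illustration only; they are not MercurialVCS's actual API.

```python
import time


class VCSError(Exception):
    """Hypothetical error raised by a checkout attempt instead of fatal()."""


def retry(action, attempts=3, sleeptime=0, retry_exceptions=(VCSError,)):
    """Call action() up to `attempts` times.

    Only exceptions in retry_exceptions trigger another attempt; after the
    last attempt the exception propagates, so the caller can still fatal()
    once, with the real error.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_exceptions:
            if attempt == attempts:
                raise
            time.sleep(sleeptime)


# Simulated clone that hits the HTTP 500 twice before succeeding.
calls = {"n": 0}


def flaky_clone():
    calls["n"] += 1
    if calls["n"] < 3:
        raise VCSError("abort: HTTP Error 500: Internal Server Error")
    return "cloned"


print(retry(flaky_clone, attempts=3))  # cloned (on the third call)
```

Restricting retries to a specific exception type keeps genuine errors (a bad config, a typo in a repo path) from being retried pointlessly.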
Attached patch hg retry: works (obsolete) — Splinter Review
This works.
I need to make sure that everything either calls VCSScript.pull() or passes num_retries to the vcs_checkout_repos() call.
Attachment #679932 - Attachment is obsolete: true
Testing a b2g unagi nightly atm, and that's gotten to the point of compilation.
I used a users/asasaki_mozilla.com/nonexistent repo to verify retry.

http://hg.mozilla.org/users/asasaki_mozilla.com/mozharness/file/f0e46c8029f8/configs/single_locale/mozilla-central_android.py#l26

Let me know if you're swamped and want me to move the review request (r?).
Attachment #679937 - Attachment is obsolete: true
Attachment #680252 - Flags: review?(catlee)
Comment on attachment 680252 [details] [diff] [review]
hg retry: also update all direct vcs_checkout*() calls

Review of attachment 680252 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozharness/base/vcs/vcsbase.py
@@ +102,5 @@
> +            if self.config.get('repos'):
> +                repos = self.config['repos']
> +            else:
> +                self.info("Pull has nothing to do!")
> +                return

could be written as

repos = repos or self.config.get('repos')
if not repos:
    ...
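Applied to the method under review, the suggested idiom might look like the sketch below. The class name, the `num_retries` default and the `vcs_checkout_repos()` stub are assumptions made to keep the example self-contained; the real method lives in mozharness/base/vcs/vcsbase.py and differs in detail.

```python
class VCSScriptSketch:
    """Hypothetical stand-in for a mozharness script with a pull() action."""

    def __init__(self, config):
        self.config = config
        self.messages = []
        self.checked_out = []

    def info(self, msg):
        self.messages.append(msg)

    def vcs_checkout_repos(self, repos, num_retries=3):
        # Stub: records what would be checked out instead of running hg.
        for repo in repos:
            self.checked_out.append((repo["repo"], num_retries))

    def pull(self, repos=None, num_retries=3):
        # Reviewer's suggestion: fall back to the config only when no
        # repos were passed in.
        repos = repos or self.config.get('repos')
        if not repos:
            self.info("Pull has nothing to do!")
            return
        self.vcs_checkout_repos(repos, num_retries=num_retries)


script = VCSScriptSketch({'repos': [{'repo': 'http://hg.mozilla.org/l10n-central/it'}]})
script.pull()
print(script.checked_out)  # [('http://hg.mozilla.org/l10n-central/it', 3)]
```

One subtlety of `or`: an explicitly passed empty list also falls back to the config value, which is usually what a pull action wants.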
Attachment #680252 - Flags: review?(catlee) → review+
Comment on attachment 680252 [details] [diff] [review]
hg retry: also update all direct vcs_checkout*() calls

http://hg.mozilla.org/build/mozharness/rev/90dfbcb12d53
Attachment #680252 - Flags: checked-in+
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Reopening because the retry should be in vcs_checkout() instead of vcs_checkout_repos().

Backed out the above changeset from scripts/b2g_build.py only:
http://hg.mozilla.org/build/mozharness/rev/88450f8489f0
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
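The reason for moving the retry can be illustrated with a small sketch. If the loop lives in the per-repository vcs_checkout(), every caller gets retries, including scripts such as b2g_build.py that call vcs_checkout() directly, and a transient failure only repeats the clone that failed. The function bodies, the `clone` callback, `CheckoutError` and the `num_retries` default are hypothetical; only the two function names come from this bug.

```python
class CheckoutError(Exception):
    """Hypothetical error raised by a single failed clone attempt."""


def vcs_checkout(repo, clone, num_retries=3):
    """Check out one repo, retrying transient failures (sketch)."""
    for attempt in range(1, num_retries + 1):
        try:
            return clone(repo)
        except CheckoutError:
            if attempt == num_retries:
                raise


def vcs_checkout_repos(repos, clone, num_retries=3):
    """Check out several repos; the retrying happens per repo in vcs_checkout()."""
    return [vcs_checkout(r, clone, num_retries=num_retries) for r in repos]


# Simulate l10n-central/it failing once with an HTTP 500.
attempts = {}


def fake_clone(repo):
    attempts[repo] = attempts.get(repo, 0) + 1
    if repo.endswith("/it") and attempts[repo] == 1:
        raise CheckoutError("abort: HTTP Error 500: Internal Server Error")
    return repo


repos = ["http://hg.mozilla.org/l10n-central/de",
         "http://hg.mozilla.org/l10n-central/it"]
vcs_checkout_repos(repos, fake_clone)
print(attempts)  # de cloned once, it cloned twice
```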
Needs staging.
Current state: every mozharness script except b2g_build.py should be retrying hg operations.
This got b2g_build.py retrying in staging, and still retries for mobile_l10n.

http://dev-master01.build.mozilla.org:8052/builders/b2g_mozilla-central_unagi_nightly/builds/5/steps/run_script/logs/stdio
Attachment #680726 - Attachment is obsolete: true
Attachment #680865 - Flags: review?(catlee)
Blocks: 808536
Blocks: 778688
No longer blocks: 808536
Removing bug 793022 from the Blocks: list since we're retrying for all desktop unittest jobs already.
No longer blocks: 793022
Attachment #680865 - Flags: review?(catlee) → review+
Comment on attachment 680865 [details] [diff] [review]
hg retry 2: move retry logic to vcs_checkout()

http://hg.mozilla.org/build/mozharness/rev/1b351aca6e6a
Attachment #680865 - Flags: checked-in+
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → Mozharness