Closed Bug 808814 Opened 12 years ago Closed 12 years ago

mozharness download-and-extract should detect, retry, and report download errors

Categories

(Release Engineering :: Applications: MozharnessCore, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mozilla, Assigned: mozilla)

References

Details

(Whiteboard: [mozharness][unittest])

Attachments

(1 file, 2 obsolete files)

13:45:21     INFO - #####
13:45:21     INFO - ##### Running download-and-extract step.
13:45:21     INFO - #####
13:45:21     INFO - Downloading http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-macosx64/1352141866/firefox-19.0a1.en-US.mac.tests.zip
13:45:25     INFO - mkdir: /Users/cltbld/talos-slave/test/build
13:51:26     INFO - mkdir: /Users/cltbld/talos-slave/test/build/tests
13:51:26     INFO - Running command: ['unzip', '-o', '/Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip', 'bin/*', 'certs/*', 'modules/*', 'mozbase/*', 'mochitest/*'] in /Users/cltbld/talos-slave/test/build/tests
13:51:26     INFO - Copy/paste: unzip -o /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip bin/* certs/* modules/* mozbase/* mochitest/*
13:51:26     INFO -  Archive:  /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip
13:51:26     INFO -    End-of-central-directory signature not found.  Either this file is not
13:51:26     INFO -    a zipfile, or it constitutes one disk of a multi-part archive.  In the
13:51:26     INFO -    latter case the central directory and zipfile comment will be found on
13:51:26     INFO -    the last disk(s) of this archive.
13:51:26     INFO -  unzip:  cannot find zipfile directory in one of /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip or
13:51:26     INFO -          /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip.zip, and cannot find /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip.ZIP, period.
13:51:26    ERROR - Return code: 9
13:51:26     INFO - Downloading http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-macosx64/1352141866/firefox-19.0a1.en-US.mac.dmg
13:55:16     INFO - Setting buildbot property build_url to http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-macosx64/1352141866/firefox-19.0a1.en-US.mac.dmg
13:55:16     INFO - mkdir: /Users/cltbld/talos-slave/test/properties
13:55:16     INFO - Writing buildbot properties ['build_url'] to /Users/cltbld/talos-slave/test/properties/build_url
13:55:16     INFO - Writing to file /Users/cltbld/talos-slave/test/properties/build_url
13:55:16     INFO - Contents:
13:55:16     INFO -  build_url:http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-macosx64/1352141866/firefox-19.0a1.en-US.mac.dmg
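The unzip failure above ("End-of-central-directory signature not found") means the downloaded tests zip was truncated or was not a zip at all. The step still went on to the next download instead of stopping. One way to catch this before extracting is to validate the archive with Python's standard `zipfile` module. This is a minimal sketch. `check_zip` is a hypothetical helper name, not part of mozharness.

```python
import zipfile
import zlib


def check_zip(path):
    """Return None if `path` looks like a complete zip archive, else an error string.

    Hypothetical helper. zipfile.is_zipfile() looks for the
    end-of-central-directory record, which is exactly what unzip could not
    find above. testzip() then reads every member and checks its CRC,
    returning the first bad member name (or None).
    """
    if not zipfile.is_zipfile(path):
        return "%s is not a valid zip (truncated, or not a zip at all)" % path
    try:
        with zipfile.ZipFile(path) as zf:
            bad_member = zf.testzip()
    except (zipfile.BadZipFile, OSError, EOFError, zlib.error) as e:
        return "%s could not be read: %s" % (path, e)
    if bad_member is not None:
        return "%s: bad CRC in member %s" % (path, bad_member)
    return None
```

A non-None result would then be treated as a failed download, to be retried or reported, rather than being handed to `unzip`.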
Rather more terse but still unhelpful, this one from downloading a build (50-50 odds whether it hit a period of ftp.m.o returning 500/503s, or one of the busted-DNS periods we're having now, where ftp.m.o can't be resolved for a few seconds):

https://tbpl.mozilla.org/php/getParsedLog.php?id=16876494&tree=Cedar

17:07:05     INFO - Downloading http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-win32/1352414251/firefox-19.0a1.en-US.win32.zip
17:07:26    FATAL - URL Error: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-win32/1352414251/firefox-19.0a1.en-US.win32.zip
17:07:26    FATAL - Exiting -1
I think the HTTP error gives a status code, and the URL error tells you there's a URL issue (possibly DNS?).
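The HTTP-vs-URL distinction above maps onto Python's `urllib.error` classes. `HTTPError` carries the server's status code, while a plain `URLError` means no HTTP response arrived at all, for example because the hostname did not resolve. Because `HTTPError` is a subclass of `URLError`, it has to be checked first. This is a minimal sketch assuming Python 3's `urllib`. The function name and message format are illustrative only and do not come from the patch.

```python
import urllib.error


def describe_download_error(exc, url):
    """Turn a urllib exception into a log message that says *why* it failed.

    Hypothetical helper. HTTPError subclasses URLError, so it is checked first.
    """
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered with an error status, e.g. 500/503 from ftp.m.o.
        return "HTTP Error %d: %s" % (exc.code, url)
    if isinstance(exc, urllib.error.URLError):
        # No HTTP response at all; exc.reason often wraps a socket.gaierror
        # when the hostname can't be resolved.
        return "URL Error (%s): %s" % (exc.reason, url)
    return "Unexpected error %r: %s" % (exc, url)
```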

We could potentially do a DNS check on the server's hostname after the retries fail.
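A DNS check like the one proposed above only needs the hostname from the download URL. The patch's log below shells out to `nslookup`. This sketch swaps in the system resolver via `socket.getaddrinfo` instead. Note that `getaddrinfo` also consults local sources such as `/etc/hosts`, so it is not an exact equivalent of `nslookup`. `hostname_resolves` is a hypothetical name.

```python
import socket
from urllib.parse import urlparse


def hostname_resolves(url):
    """Return True if the host in `url` resolves, False otherwise.

    Hypothetical helper, meant to run only after download retries fail, to
    separate "bad hostname / DNS is busted" from server-side errors.
    """
    host = urlparse(url).hostname
    if not host:
        # Not a URL with a network location; nothing to look up.
        return False
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    return True
```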
Assignee: nobody → aki
This is my latest test result, from a bogus sendchange:

12:36:06     INFO - Downloading http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip to /home/cltbld/talos-slave/test/build/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:06  WARNING - Try 1: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:06     INFO - Sleeping 5 seconds...
12:36:11  WARNING - Try 2: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:11     INFO - Sleeping 10 seconds...
12:36:21  WARNING - Try 3: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:21     INFO - Sleeping 15 seconds...
12:36:36  WARNING - Try 4: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:36     INFO - Sleeping 20 seconds...
12:36:56  WARNING - Try 5: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:56     INFO - Sleeping 25 seconds...
12:37:21     INFO - Running command: ['nslookup', 'ftpyadda.mozilla.org']
12:37:21     INFO - Copy/paste: nslookup ftpyadda.mozilla.org
12:37:22     INFO -  Server:		10.12.48.19
12:37:22     INFO -  Address:	10.12.48.19#53
12:37:22    ERROR -  ** server can't find ftpyadda.mozilla.org: NXDOMAIN
12:37:22    ERROR -  Either ftpyadda.mozilla.org is an invalid hostname, or DNS is busted.
12:37:22     INFO - Return code: 0
12:37:22    FATAL - Try 6: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:37:22    FATAL - Exiting -1
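The log above shows the control flow of the retry. Tries 1 to 5 fail with WARNINGs, and the sleep grows by 5 seconds each time (5, 10, 15, 20, 25). `nslookup` runs before the last try, and try 6 is FATAL. The following sketch reproduces that flow. `fetch`, `diagnose`, `log` and `sleep` are injected callables chosen for illustration and testing. They are assumptions, not mozharness's actual API.

```python
import time


def download_with_retries(fetch, url, attempts=6, sleep_step=5,
                          diagnose=None, log=print, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times, sleeping longer after each failure.

    Hypothetical sketch. Failed tries are logged as WARNINGs, the sleep after
    try k is k * sleep_step seconds, an optional diagnose(url) hook (e.g. a
    DNS check) runs before the final try, and the final failure is FATAL.
    """
    for attempt in range(1, attempts + 1):
        if attempt == attempts and diagnose is not None:
            diagnose(url)
        try:
            return fetch(url)
        except OSError as e:  # urllib.error.URLError subclasses OSError
            if attempt == attempts:
                log("FATAL - Try %d: %s: %s" % (attempt, e, url))
                raise
            log("WARNING - Try %d: %s: %s" % (attempt, e, url))
            delay = attempt * sleep_step
            log("INFO - Sleeping %d seconds..." % delay)
            sleep(delay)
```

Injecting `sleep` keeps the schedule testable without actually waiting. A real script would pass its own logging methods as `log`.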


We'll be sleeping longer (multiples of 20 seconds, atm) outside of staging.
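The sleep schedule is linear. In the staging log above, the sleep after failed try k is 5k seconds, so the five waits add up to 75 seconds. This matches the 12:36:06 to 12:37:21 span between try 1 and the `nslookup`. The comment does not say whether the try count changes outside of staging. Assuming it stays at six tries with only the step raised to 20 seconds, the total sleep time (excluding the time spent in the tries themselves) works out as:

```latex
T(n, s) = \sum_{k=1}^{n-1} k\,s = s\,\frac{n(n-1)}{2}, \qquad
T(6, 5) = 75\ \text{s}, \qquad
T(6, 20) = 300\ \text{s} = 5\ \text{min}.
```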

Q: should we then set RETRY in buildbot? Or is failing this many download attempts reason to go red?
Flags: needinfo?
This patch is ready for review, unless we want to add a buildbot RETRY status at the end.
Attachment #681350 - Attachment is obsolete: true
Flags: needinfo?
Flags: needinfo?
(In reply to Aki Sasaki [:aki] from comment #6)
> Created attachment 681645
> download retry with nslookup, also tear out vestiges of noop
> 
> This patch is ready for review, unless we want to add a buildbot RETRY
> status at the end.

Let's say no buildbot RETRY for now, and see how often it hits us in production :-)
Flags: needinfo?
Now with tooltool retry, which I tested by putting in a bogus tooltool server in the b2g emulator configs in staging.
Attachment #681645 - Attachment is obsolete: true
Attachment #681661 - Flags: review?(rail)
Blocks: 812149
Comment on attachment 681661
download retry with nslookup, tooltool retry, also tear out vestiges of noop

Review of attachment 681661:
-----------------------------------------------------------------

LGTM. I think it would be great to factor out the retry logic, or use util.retry from tools.
Attachment #681661 - Flags: review?(rail) → review+
Yeah, I was thinking that we could pass a method, frequency/count, error_level, error_msg, etc. to a helper retry method. I'm deferring that to a future bug atm, though.
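The factored-out helper described above (pass in a method, a count and frequency, an error level and an error message) could look roughly like this. This is a hypothetical sketch. It is not the signature of `util.retry` in tools, nor of any helper mozharness later gained. The linear `attempt * sleeptime` backoff mirrors the download retry in this bug.

```python
import logging
import time


def retry(action, attempts=5, sleeptime=20, error_level=logging.ERROR,
          error_message="%(action)s failed after %(attempts)d tries",
          retry_exceptions=(Exception,), args=(), kwargs=None,
          sleep=time.sleep, logger=logging.getLogger(__name__)):
    """Hypothetical generic retry helper.

    Calls action(*args, **kwargs) until it returns without raising one of
    `retry_exceptions`, sleeping attempt * sleeptime seconds between tries.
    After the last failure it logs `error_message` at `error_level` and
    re-raises the last exception. Other exceptions propagate immediately.
    """
    kwargs = kwargs or {}
    for attempt in range(1, attempts + 1):
        try:
            return action(*args, **kwargs)
        except retry_exceptions as e:
            if attempt == attempts:
                logger.log(error_level, error_message % {
                    "action": getattr(action, "__name__", repr(action)),
                    "attempts": attempts})
                raise
            logger.warning("Try %d: %s", attempt, e)
            sleep(attempt * sleeptime)
```

With a helper like this, the download and tooltool paths would each supply only their own action and `retry_exceptions`, for example `retry(download_file, args=(url,), retry_exceptions=(OSError,))` with a hypothetical `download_file`.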
Comment on attachment 681661
download retry with nslookup, tooltool retry, also tear out vestiges of noop

http://hg.mozilla.org/build/mozharness/rev/8854e241ce97

Thanks Rail!
Attachment #681661 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → Mozharness