Closed Bug 811279 Opened 13 years ago Closed 13 years ago

Download buildstep should cause job to RETRY if wget fails

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: ewong)

References

(Blocks 1 open bug)

Details

(Keywords: sheriffing-P1)

Attachments

(1 file, 7 obsolete files)

This job should have been buildbot result = RETRY rather than FAILURE. https://tbpl.mozilla.org/php/getParsedLog.php?id=16985789&tree=Mozilla-Inbound { wget: unable to resolve host address `ftp.mozilla.org' program finished with exit code 1 elapsedTime=15.048786 ========= Finished download failed (results: 2, elapsed: 15 secs) (at 2012-11-13 02:06:40.925600) ========= }
Attachment #683032 - Flags: feedback?(bmo)
Comment on attachment 683032 [details] [diff] [review] Change download buildstep to RETRY instead of fail. Review of attachment 683032 [details] [diff] [review]: ----------------------------------------------------------------- We need to do something similar to what the tegra verify.py step does: { log_eval_func=rc_eval_func({0: SUCCESS, None: RETRY}) } ::: steps/misc.py @@ +154,5 @@ > return self.finished(FAILURE) > > > class DownloadFile(ShellCommand): > + haltOnFailure = False This needs to remain True, since we do want to halt the job at this step. @@ +185,5 @@ > else: > url = self.url > except Exception, e: > self.addCompleteLog("errors", "Automation Error: %s" % str(e)) > + return self.finished(RETRY) This isn't the failure mode we are hitting, so I think it's best to not turn exceptions into RETRY, or we may wallpaper over later different problems.
Attachment #683032 - Flags: feedback?(bmo) → feedback-
(In reply to Ed Morley [:edmorley UTC+0] from comment #5) > Comment on attachment 683032 [details] [diff] [review] > Change download buildstep to RETRY instead of fail. > > Review of attachment 683032 [details] [diff] [review]: > ----------------------------------------------------------------- > > We need to do something similar to what the tegra verify.py step does: > { > log_eval_func=rc_eval_func({0: SUCCESS, None: RETRY}) > } > > @@ +185,5 @@ > > else: > > url = self.url > > except Exception, e: > > self.addCompleteLog("errors", "Automation Error: %s" % str(e)) > > + return self.finished(RETRY) > > This isn't the failure mode we are hitting, so I think it's best to not turn > exceptions into RETRY, or we may wallpaper over later different problems. While ed is correct that this is the wrong place to add that, we can't do like the tegras here, we have our own eval function explicitly defined: http://mxr.mozilla.org/build/source/buildbotcustom/steps/misc.py#204 I think we'd need to not set Retry for any DownloadFile with a "haltOnFailure=False" though, since if we don't consider the job failed, retrying the whole job could be bad.
(In reply to Justin Wood (:Callek) from comment #6) > While ed is correct that this is the wrong place to add that, we can't do > like the tegras here, we have our own eval function explicitly defined: > > http://mxr.mozilla.org/build/source/buildbotcustom/steps/misc.py#204 Ah, so we need something more like... 206 if SUCCESS != superResult: 207 return RETRY Right? > I think we'd need to not set Retry for any DownloadFile with a > "haltOnFailure=False" though, since if we don't consider the job failed, > retrying the whole job could be bad. This one doesn't use haltOnFailure=False.
ewong, are you happy to put together a patch using the approach in comment 7, or would you prefer me to?
Assignee: nobody → ewong
Status: NEW → ASSIGNED
Attachment #683032 - Attachment is obsolete: true
Attachment #683858 - Flags: feedback?(bmo)
(In reply to Ed Morley [:edmorley UTC+0] from comment #8) > ewong, are you happy to put together a patch using the approach in comment > 7, or would you prefer me to? I'd like to try this.
Comment on attachment 683858 [details] [diff] [review] Change Download step to return RETRY instead of superResult. (v2) LGTM :-)
Attachment #683858 - Flags: review?(bhearsum)
Attachment #683858 - Flags: feedback?(bmo)
Attachment #683858 - Flags: feedback+
Comment on attachment 683858 [details] [diff] [review] Change Download step to return RETRY instead of superResult. (v2) Review of attachment 683858 [details] [diff] [review]: ----------------------------------------------------------------- Actually, I think we should adjust the wget command instead. Setting RETRY will cause the job to instantly get retried, and because this happens so early in the job, we could end killing an already suffering machine. wget has --tries which I think we should set to at least 5. It also has --waitretry=SECONDS, which should be set to at least 60s, maybe even 120...
Attachment #683858 - Flags: review?(bhearsum) → review-
Attachment #683858 - Attachment is obsolete: true
Attachment #684566 - Flags: review?(bhearsum)
Comment on attachment 684566 [details] [diff] [review] Add '--tries=' and '--waitretry' parameters to wget. (v3) Review of attachment 684566 [details] [diff] [review]: ----------------------------------------------------------------- wrong patch
Attachment #684566 - Flags: review?(bhearsum) → review-
...if the --tries/--waitretry version of the patch gets accepted, we should forward-dupe Bug 800288 here.
Attachment #684566 - Attachment is obsolete: true
Attachment #684573 - Flags: review?(bhearsum)
Attachment #684573 - Attachment is obsolete: true
Attachment #684573 - Flags: review?(bhearsum)
Attachment #684576 - Flags: review?(bhearsum)
Attachment #684576 - Attachment is obsolete: true
Attachment #684576 - Flags: review?(bhearsum)
Attachment #684590 - Flags: review?(bhearsum)
Comment on attachment 684590 [details] [diff] [review] Add '--tries=' and '--waitretry' parameters to wget. (v4) Review of attachment 684590 [details] [diff] [review]: ----------------------------------------------------------------- ::: steps/misc.py @@ +196,5 @@ > if self.filename_property: > self.setProperty(self.filename_property, > os.path.basename(renderedUrl), "DownloadFile") > > + self.wget_args += ["--tries=%d" % self.retries, "--waitretry=%d" % self.waitRetry]; No reason to do this at runtime - put it in the constructor instead. r=me with that change. I looked around various machines and AFAICT all of our wget's support this. I also noticed that wget doesn't retry on DNS failure, 404, or connection refused... So, I still think we should do this, because it handles time outs, mid-download drops, etc. - and is faster in those cases than retrying the whole job.
Attachment #684590 - Flags: review?(bhearsum) → review+
10:57 < bhearsum> edmorley: do you think a fix for https://bugzilla.mozilla.org/show_bug.cgi?id=811279 that doesn't retry upon dns failure, 404, or connection refused would be useful? 10:58 < edmorley> what does that leave? 10:58 < bhearsum> failures when the connection drops mid-transfer, for one 10:58 < bhearsum> time outs 10:59 < edmorley> ah yeah 10:59 < bhearsum> i was looking at wget a bit more and it turns out it doesn't retry in the three cases i mentioned 10:59 < bhearsum> maybe we should use wget's retry, because it will be faster for cases it does handle....and then also set RETRY 10:59 < bhearsum> (for the other cases) 10:59 < edmorley> I imagine it would still be useful, yeah 11:00 < bhearsum> yeah, ok 11:00 < edmorley> and also until the ftp.m.o dns issues bug is solved, might be best not to obscure them 11:05 < bhearsum> that's true too
by 'adding to the constructor', this is what you mean, right?
Attachment #684590 - Attachment is obsolete: true
Attachment #686458 - Flags: review?(bhearsum)
Comment on attachment 686458 [details] [diff] [review] Add '--tries=' and '--waitretry' parameters to wget. (v5) Review of attachment 686458 [details] [diff] [review]: ----------------------------------------------------------------- ::: steps/misc.py @@ +173,5 @@ > if wget_args: > self.wget_args = wget_args > else: > self.wget_args = ['--progress=dot:mega'] > + self.wget_args += ["--tries=%d" % self.retries, "--waitretry=%d" % self.waitRetry]; typo, trailing ;
Attachment #686458 - Flags: feedback+
Attachment #686458 - Attachment is obsolete: true
Attachment #686458 - Flags: review?(bhearsum)
Attachment #686471 - Flags: review?(bhearsum)
Attachment #686471 - Flags: feedback+
Blocks: 800288
Comment on attachment 686471 [details] [diff] [review] Add '--tries=' and '--waitretry' parameters to wget. (v6) Review of attachment 686471 [details] [diff] [review]: ----------------------------------------------------------------- lgtm, feel free to land on the default branch any time.
Attachment #686471 - Flags: review?(bhearsum) → review+
In production since 9:00PT
Keywords: sheriffing-P1
Whiteboard: [sheriff-want]
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: