Closed
Bug 811279
Opened 13 years ago
Closed 13 years ago
Download buildstep should cause job to RETRY if wget fails
Categories
(Release Engineering :: General, defect)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: ewong)
References
(Blocks 1 open bug)
Details
(Keywords: sheriffing-P1)
Attachments
(1 file, 7 obsolete files)
1.53 KB,
patch
|
bhearsum
:
review+
ewong
:
feedback+
|
Details | Diff | Splinter Review |
This job should have been buildbot result = RETRY rather than FAILURE.
https://tbpl.mozilla.org/php/getParsedLog.php?id=16985789&tree=Mozilla-Inbound
{
wget: unable to resolve host address `ftp.mozilla.org'
program finished with exit code 1
elapsedTime=15.048786
========= Finished download failed (results: 2, elapsed: 15 secs) (at 2012-11-13 02:06:40.925600) =========
}
![]() |
Reporter | |
Comment 1•13 years ago
|
||
![]() |
Reporter | |
Comment 2•13 years ago
|
||
![]() |
Reporter | |
Comment 3•13 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=17103614&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=17103625&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=17103575&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=17103606&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=17103632&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=17103586&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=17103597&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=17103619&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=17103615&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=17103613&tree=Mozilla-Inbound
![]() |
Assignee | |
Comment 4•13 years ago
|
||
Attachment #683032 -
Flags: feedback?(bmo)
![]() |
Reporter | |
Comment 5•13 years ago
|
||
Comment on attachment 683032 [details] [diff] [review]
Change download buildstep to RETRY instead of fail.
Review of attachment 683032 [details] [diff] [review]:
-----------------------------------------------------------------
We need to do something similar to what the tegra verify.py step does:
{
log_eval_func=rc_eval_func({0: SUCCESS, None: RETRY})
}
::: steps/misc.py
@@ +154,5 @@
> return self.finished(FAILURE)
>
>
> class DownloadFile(ShellCommand):
> + haltOnFailure = False
This needs to remain True, since we do want to halt the job at this step.
@@ +185,5 @@
> else:
> url = self.url
> except Exception, e:
> self.addCompleteLog("errors", "Automation Error: %s" % str(e))
> + return self.finished(RETRY)
This isn't the failure mode we are hitting, so I think it's best to not turn exceptions into RETRY, or we may wallpaper over later different problems.
Attachment #683032 -
Flags: feedback?(bmo) → feedback-
Comment 6•13 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #5)
> Comment on attachment 683032 [details] [diff] [review]
> Change download buildstep to RETRY instead of fail.
>
> Review of attachment 683032 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> We need to do something similar to what the tegra verify.py step does:
> {
> log_eval_func=rc_eval_func({0: SUCCESS, None: RETRY})
> }
>
> @@ +185,5 @@
> > else:
> > url = self.url
> > except Exception, e:
> > self.addCompleteLog("errors", "Automation Error: %s" % str(e))
> > + return self.finished(RETRY)
>
> This isn't the failure mode we are hitting, so I think it's best to not turn
> exceptions into RETRY, or we may wallpaper over later different problems.
While ed is correct that this is the wrong place to add that, we can't do like the tegras here, we have our own eval function explicitly defined:
http://mxr.mozilla.org/build/source/buildbotcustom/steps/misc.py#204
I think we'd need to not set Retry for any DownloadFile with a "haltOnFailure=False" though, since if we don't consider the job failed, retrying the whole job could be bad.
![]() |
Reporter | |
Comment 7•13 years ago
|
||
(In reply to Justin Wood (:Callek) from comment #6)
> While ed is correct that this is the wrong place to add that, we can't do
> like the tegras here, we have our own eval function explicitly defined:
>
> http://mxr.mozilla.org/build/source/buildbotcustom/steps/misc.py#204
Ah, so we need something more like...
206 if SUCCESS != superResult:
207 return RETRY
Right?
> I think we'd need to not set Retry for any DownloadFile with a
> "haltOnFailure=False" though, since if we don't consider the job failed,
> retrying the whole job could be bad.
This one doesn't use haltOnFailure=False.
![]() |
Reporter | |
Comment 8•13 years ago
|
||
ewong, are you happy to put together a patch using the approach in comment 7, or would you prefer me to?
![]() |
Assignee | |
Updated•13 years ago
|
Assignee: nobody → ewong
Status: NEW → ASSIGNED
![]() |
Assignee | |
Comment 9•13 years ago
|
||
Attachment #683032 -
Attachment is obsolete: true
Attachment #683858 -
Flags: feedback?(bmo)
![]() |
Assignee | |
Comment 10•13 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #8)
> ewong, are you happy to put together a patch using the approach in comment
> 7, or would you prefer me to?
I'd like to try this.
![]() |
Reporter | |
Comment 11•13 years ago
|
||
Comment on attachment 683858 [details] [diff] [review]
Change Download step to return RETRY instead of superResult. (v2)
LGTM :-)
Attachment #683858 -
Flags: review?(bhearsum)
Attachment #683858 -
Flags: feedback?(bmo)
Attachment #683858 -
Flags: feedback+
Comment 12•13 years ago
|
||
Comment on attachment 683858 [details] [diff] [review]
Change Download step to return RETRY instead of superResult. (v2)
Review of attachment 683858 [details] [diff] [review]:
-----------------------------------------------------------------
Actually, I think we should adjust the wget command instead. Setting RETRY will cause the job to instantly get retried, and because this happens so early in the job, we could end killing an already suffering machine. wget has --tries which I think we should set to at least 5. It also has --waitretry=SECONDS, which should be set to at least 60s, maybe even 120...
Attachment #683858 -
Flags: review?(bhearsum) → review-
![]() |
Assignee | |
Comment 13•13 years ago
|
||
Attachment #683858 -
Attachment is obsolete: true
Attachment #684566 -
Flags: review?(bhearsum)
Comment 14•13 years ago
|
||
Comment on attachment 684566 [details] [diff] [review]
Add '--tries=' and '--waitretry' parameters to wget. (v3)
Review of attachment 684566 [details] [diff] [review]:
-----------------------------------------------------------------
wrong patch
Attachment #684566 -
Flags: review?(bhearsum) → review-
Comment 15•13 years ago
|
||
...if the --tries/--waitretry version of the patch gets accepted, we should forward-dupe Bug 800288 here.
![]() |
Assignee | |
Comment 16•13 years ago
|
||
Attachment #684566 -
Attachment is obsolete: true
Attachment #684573 -
Flags: review?(bhearsum)
![]() |
Assignee | |
Comment 17•13 years ago
|
||
Attachment #684573 -
Attachment is obsolete: true
Attachment #684573 -
Flags: review?(bhearsum)
Attachment #684576 -
Flags: review?(bhearsum)
![]() |
Assignee | |
Comment 18•13 years ago
|
||
Attachment #684576 -
Attachment is obsolete: true
Attachment #684576 -
Flags: review?(bhearsum)
Attachment #684590 -
Flags: review?(bhearsum)
Comment 19•13 years ago
|
||
Comment on attachment 684590 [details] [diff] [review]
Add '--tries=' and '--waitretry' parameters to wget. (v4)
Review of attachment 684590 [details] [diff] [review]:
-----------------------------------------------------------------
::: steps/misc.py
@@ +196,5 @@
> if self.filename_property:
> self.setProperty(self.filename_property,
> os.path.basename(renderedUrl), "DownloadFile")
>
> + self.wget_args += ["--tries=%d" % self.retries, "--waitretry=%d" % self.waitRetry];
No reason to do this at runtime - put it in the constructor instead. r=me with that change.
I looked around various machines and AFAICT all of our wget's support this. I also noticed that wget doesn't retry on DNS failure, 404, or connection refused...
So, I still think we should do this, because it handles time outs, mid-download drops, etc. - and is faster in those cases than retrying the whole job.
Attachment #684590 -
Flags: review?(bhearsum) → review+
Comment 20•13 years ago
|
||
10:57 < bhearsum> edmorley: do you think a fix for https://bugzilla.mozilla.org/show_bug.cgi?id=811279 that doesn't retry upon dns failure, 404, or connection refused
would be useful?
10:58 < edmorley> what does that leave?
10:58 < bhearsum> failures when the connection drops mid-transfer, for one
10:58 < bhearsum> time outs
10:59 < edmorley> ah yeah
10:59 < bhearsum> i was looking at wget a bit more and it turns out it doesn't retry in the three cases i mentioned
10:59 < bhearsum> maybe we should use wget's retry, because it will be faster for cases it does handle....and then also set RETRY
10:59 < bhearsum> (for the other cases)
10:59 < edmorley> I imagine it would still be useful, yeah
11:00 < bhearsum> yeah, ok
11:00 < edmorley> and also until the ftp.m.o dns issues bug is solved, might be best not to obscure them
11:05 < bhearsum> that's true too
![]() |
Assignee | |
Comment 21•13 years ago
|
||
by 'adding to the constructor', this is what you mean, right?
Attachment #684590 -
Attachment is obsolete: true
Attachment #686458 -
Flags: review?(bhearsum)
Comment 22•13 years ago
|
||
Comment on attachment 686458 [details] [diff] [review]
Add '--tries=' and '--waitretry' parameters to wget. (v5)
Review of attachment 686458 [details] [diff] [review]:
-----------------------------------------------------------------
::: steps/misc.py
@@ +173,5 @@
> if wget_args:
> self.wget_args = wget_args
> else:
> self.wget_args = ['--progress=dot:mega']
> + self.wget_args += ["--tries=%d" % self.retries, "--waitretry=%d" % self.waitRetry];
typo, trailing ;
Attachment #686458 -
Flags: feedback+
![]() |
Assignee | |
Comment 23•13 years ago
|
||
Attachment #686458 -
Attachment is obsolete: true
Attachment #686458 -
Flags: review?(bhearsum)
Attachment #686471 -
Flags: review?(bhearsum)
Attachment #686471 -
Flags: feedback+
Comment 24•13 years ago
|
||
Comment on attachment 686471 [details] [diff] [review]
Add '--tries=' and '--waitretry' parameters to wget. (v6)
Review of attachment 686471 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm, feel free to land on the default branch any time.
Attachment #686471 -
Flags: review?(bhearsum) → review+
![]() |
Assignee | |
Comment 25•13 years ago
|
||
Pushed to buildbotcustom:
http://hg.mozilla.org/build/buildbotcustom/rev/046eaf31039d
![]() |
||
Comment 26•13 years ago
|
||
In production since 9:00PT
![]() |
Reporter | |
Updated•13 years ago
|
Keywords: sheriffing-P1
Whiteboard: [sheriff-want]
![]() |
Assignee | |
Updated•13 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Product: mozilla.org → Release Engineering
Updated•7 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•