Closed
Bug 808536
Opened 12 years ago
Closed 12 years ago
ScriptFactory hg steps should RETRY on "abort: HTTP Error 500: Internal Server Error"
Categories
(Release Engineering :: General, defect)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: mozilla)
References
(Blocks 1 open bug)
Details
(Whiteboard: [mozharness][sheriff-want])
Attachments
(1 file, 2 obsolete files)
1003 bytes, patch | bhearsum: review+ | mozilla: checked-in+
e.g. Rev3 Fedora 12 mozilla-inbound debug test marionette on 2012-11-05 04:24:03 PST for push 31784b0d6334
slave: talos-r3-fed-075
https://tbpl.mozilla.org/php/getParsedLog.php?id=16750774&tree=Mozilla-Inbound

{
========= Started 'hg clone ...' failed (results: 2, elapsed: 1 secs) (at 2012-11-05 04:24:04.634905) =========
hg clone http://hg.mozilla.org/build/mozharness scripts
 in dir /home/cltbld/talos-slave/test/. (timeout 1200 secs)
 watching logfiles {}
 argv: ['hg', 'clone', 'http://hg.mozilla.org/build/mozharness', 'scripts']
 environment:
  CVS_RSH=ssh
  DISPLAY=:0.0
  G_BROKEN_FILENAMES=1
  HISTCONTROL=ignoreboth
  HISTSIZE=1000
  HOME=/home/cltbld
  HOSTNAME=talos-r3-fed-075.build.mozilla.org
  LANG=en_US.UTF-8
  LESSOPEN=|/usr/bin/lesspipe.sh %s
  LOGNAME=cltbld
  MAIL=/var/spool/mail/cltbld
  PATH=/home/cltbld/bin:/tools/buildbot-0.8.0/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
  PWD=/home/cltbld/talos-slave/test
  SHELL=/bin/bash
  SHLVL=1
  SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
  TERM=xterm
  USER=cltbld
  _=/home/cltbld/bin/python
 using PTY: False
abort: HTTP Error 500: Internal Server Error
program finished with exit code 255
elapsedTime=1.282653
========= Finished 'hg clone ...' failed (results: 2, elapsed: 1 secs) (at 2012-11-05 04:24:05.955716) =========
}
Comment 1•12 years ago
found in triage.
Component: Release Engineering → Release Engineering: Automation (General)
QA Contact: catlee
Reporter
Updated•12 years ago
Whiteboard: [mozharness] → [mozharness][sheriff-want]
Reporter
Comment 2•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=16986671&tree=Mozilla-Inbound#error0
Reporter
Updated•12 years ago
Reporter
Comment 3•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=16986673&tree=Mozilla-Inbound#error0
Reporter
Comment 4•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=16986918&tree=Mozilla-Inbound#error0
Assignee
Updated•12 years ago
Summary: hg clone http://hg.mozilla.org/build/mozharness should RETRY on "abort: HTTP Error 500: Internal Server Error" → ScriptFactory hg steps should RETRY on "abort: HTTP Error 500: Internal Server Error"
Assignee
Comment 5•12 years ago
Not as simple as s,ShellCommand,RetryingShellCommand, :
[11:27] <catlee> we don't always have retry.py
[11:27] <catlee> esp. if the script repo isn't tools
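As catlee points out, RetryingShellCommand relies on retry.py, which lives in build/tools, so it can't be assumed to be present when the script repo is something else. The idea it provides, rerunning a failed command a few times before giving up, can be sketched in plain Python; the `retry` helper below is hypothetical and does not reproduce tools/retry.py's actual interface.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 5, sleeptime: float = 0.0) -> T:
    """Call `action` until it returns without raising, up to `attempts` times.

    Any exception counts as a failure; after the last attempt it is re-raised.
    (Hypothetical helper, not the tools/retry.py interface.)
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(sleeptime)  # back off before the next try
    raise AssertionError("unreachable")
```

The catch is the one raised in the comment: a wrapper like this has to already exist on the slave before it can protect the very first clone.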
Reporter
Comment 6•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=17023592&tree=Mozilla-Inbound
Reporter
Comment 7•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=17023393&tree=Mozilla-Inbound
Reporter
Comment 8•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=17023129&tree=Mozilla-Inbound
Reporter
Comment 9•12 years ago
Can we at least make buildbot RETRY using https://hg.mozilla.org/build/buildbotcustom/file/tip/status/errors.py like we do for other "HTTP error .*"?
Reporter
Comment 10•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=17023505&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=17023387&tree=Mozilla-Aurora
Assignee
Comment 11•12 years ago
Assignee
Comment 12•12 years ago
Q:
[11:11] <aki> is there a reason we don't use the Mercurial step in ScriptFactory?

A: Because we pass 'branch' in as a property to ScriptFactory, and the Mercurial step appends it to the repo path, even when you actually want to check out build/tools or build/mozharness and want the Mercurial step to ignore the 'branch' property. If we want to use the Mercurial step, we need to stop passing in 'branch' and perhaps pass in 'branch_name' or something similar. That means anything using ScriptFactory needs to stop relying on the property being named 'branch' and look for 'branch_name' instead. I imagine that won't be a small change; I'm looking to see how large and ugly it might actually be.
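The problem here comes from building the repository URL by appending the 'branch' property to a base URL, which is how a URL like build/toolsmozilla-central ends up being cloned. A minimal sketch of that behaviour and of why renaming the property avoids it; `repo_url_for` is hypothetical and models the behaviour described in this comment, not the Mercurial step's actual code.

```python
def repo_url_for(base_url: str, properties: dict) -> str:
    """Append the 'branch' property (if any) to `base_url`.

    Hypothetical model of the behaviour described above, not buildbot's code.
    """
    branch = properties.get("branch")
    return base_url + branch if branch else base_url


if __name__ == "__main__":
    base = "http://hg.mozilla.org/build/tools"
    # 'branch' is meant for the script, but gets appended to the repo URL.
    print(repo_url_for(base, {"branch": "mozilla-central"}))
    # -> http://hg.mozilla.org/build/toolsmozilla-central
    # Renamed to 'branch_name', the property is ignored and the URL is intact.
    print(repo_url_for(base, {"branch_name": "mozilla-central"}))
    # -> http://hg.mozilla.org/build/tools
```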
Assignee
Comment 13•12 years ago
( /usr/local/bin/hg clone --verbose --noupdate http://hg.mozilla.org/build/toolsmozilla-central scripts )

Also, I don't know if this will work on the test slaves, many of which do not have hg in their PATH. We might be able to get around that via env manipulation.
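One reading of "env manipulation" is to give the step an environment whose PATH includes the directory containing hg. A minimal sketch, assuming hg lives in /usr/local/bin as in the command above; `env_with_dir_on_path` is a hypothetical helper, not part of buildbot or buildbotcustom.

```python
import os
from typing import Dict, Mapping


def env_with_dir_on_path(env: Mapping[str, str], directory: str) -> Dict[str, str]:
    """Return a copy of `env` in which `directory` is on PATH.

    If `directory` is missing it is prepended; the input mapping is not modified.
    (Hypothetical helper for illustration.)
    """
    new_env = dict(env)
    parts = [p for p in new_env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    new_env["PATH"] = os.pathsep.join(parts)
    return new_env


if __name__ == "__main__":
    slave_env = {"PATH": os.pathsep.join(["/usr/bin", "/bin"])}
    print(env_with_dir_on_path(slave_env, "/usr/local/bin")["PATH"])
```

Returning a new dict rather than mutating the input keeps one step's PATH change from leaking into a shared environment.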
Assignee
Comment 14•12 years ago
(it is retrying the build/toolsmozilla-central clone, however.)
Assignee
Comment 15•12 years ago
1. This is the current plan for renaming branch -> branch_name and using the Mercurial step:
 X write test patch to try Mercurial step
 X try Mercurial step
 _ write test patch to rename branch to branch_name
 _ test patch to rename branch to branch_name -- if yes, proceed, if not, ditch Mercurial step and replan
 _ large scale testing for anything that uses ScriptFactory
 _ write buildbot-configs patch to add branch_name to properties
 _ write (script-side) patch(es) to use branch_name instead of branch
 _ write patch to remove branch from properties
 _ clean up patch to use Mercurial step
 _ possibly more testing
 _ roll out branch_name property
 _ (wait for reconfig)
 _ roll out branch_name scripts
 _ remove branch from properties
 _ (wait for reconfig)
 _ roll out Mercurial step
 _ (wait for reconfig)
 _ deal with any fallout not found in testing

2. Alternative that isn't as beneficial, but is faster, simpler, and less likely to bork everything:
 _ Write a ScriptFactoryMercurialCloneCommand a la http://hg.mozilla.org/build/buildbotcustom/file/b03160f50ca5/steps/source.py#l7 , except instead of wrapping with retry.py, just set buildbot RETRY status and bail out.
 _ test
 _ roll out
 _ (wait for reconfig)

Trying approach #2.
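Plan 1 depends on a staged rollout in which, for a while, a script may receive 'branch', 'branch_name', or both. A script-side lookup that prefers the new name and falls back to the old one would keep working across each reconfig; `get_branch` is a hypothetical helper, not existing mozharness code.

```python
from typing import Mapping, Optional


def get_branch(properties: Mapping[str, str]) -> Optional[str]:
    """Return the branch from build properties during the rename.

    Prefers the new 'branch_name' property and falls back to the legacy
    'branch'; returns None if neither is set. (Hypothetical helper.)
    """
    if "branch_name" in properties:
        return properties["branch_name"]
    return properties.get("branch")


if __name__ == "__main__":
    print(get_branch({"branch": "mozilla-inbound"}))  # before the rename
    print(get_branch({"branch_name": "mozilla-inbound"}))  # after the rename
```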
Assignee
Comment 16•12 years ago
And I don't even have to do that: http://hg.mozilla.org/build/buildbotcustom/file/b03160f50ca5/process/factory.py#l462 already uses MercurialCloneCommand to clone build/tools, even though retry.py is only available once build/tools has been cloned. To get around that, it sets retry=False, which means we fall back to using only the log_eval_func of hg_errors ( http://hg.mozilla.org/build/buildbotcustom/file/b03160f50ca5/status/errors.py#l12 ), which already sets RETRY.
 _ change ShellCommand to MercurialCloneCommand(retry=False) for the two hg steps in ScriptFactory
 _ test
 _ roll out
 _ (wait for reconfig)
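With retry=False, the step's result comes from matching its output against the hg_errors patterns rather than from rerunning the command. A minimal sketch of that kind of log evaluation; the pattern list and `evaluate_hg_log` are hypothetical and not copied from buildbotcustom/status/errors.py, and the numeric result codes are assumed to follow buildbot 0.8's ordering (consistent with the failed clone in the description logging "results: 2").

```python
import re
from typing import Iterable, Tuple

# Build result codes, assumed to mirror buildbot 0.8's ordering.
SUCCESS, WARNINGS, FAILURE, SKIPPED, EXCEPTION, RETRY = range(6)

# Hypothetical (pattern, result) pairs; the real hg_errors list may differ.
HG_ERRORS = (
    (re.compile(r"abort: HTTP Error 5\d\d"), RETRY),
)


def evaluate_hg_log(exit_code: int, output: str,
                    patterns: Iterable[Tuple[re.Pattern, int]] = HG_ERRORS) -> int:
    """Map a finished hg command to a build result.

    The first pattern that matches the output decides the result; if none
    match, a non-zero exit code is FAILURE and zero is SUCCESS.
    """
    for regex, result in patterns:
        if regex.search(output):
            return result
    return FAILURE if exit_code != 0 else SUCCESS


if __name__ == "__main__":
    print(evaluate_hg_log(255, "abort: HTTP Error 500: Internal Server Error"))  # 5 (RETRY)
    print(evaluate_hg_log(255, "abort: repository not found!"))  # 2 (FAILURE)
```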
Assignee
Comment 17•12 years ago
Sending r? to :bhearsum since he added http://hg.mozilla.org/build/buildbotcustom/annotate/b03160f50ca5/process/factory.py#l462 for bug 613953.
Assignee: nobody → aki
Attachment #681619 -
Attachment is obsolete: true
Attachment #681747 -
Flags: review?(bhearsum)
Assignee
Comment 18•12 years ago
I was able to test this by:
 a) setting mozharness_repo_path to something invalid ("users/asasaki_mozilla.com/nonexistent");
 b) adding re.compile('404') to the list of RETRY errors in buildbotcustom/status/errors.py;
 c) kicking off an android l10n nightly and watching it retry over and over until I removed the 404 from the RETRY list and reconfiged, after which it went red.
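Step b) works because, with retry=False, the result depends only on which patterns are in the RETRY list: a pattern matching the broken clone turns a FAILURE into a RETRY, and removing it restores the FAILURE. A self-contained sketch of that toggle; `classify` and both pattern lists are hypothetical, the result codes are assumed to follow buildbot 0.8's numbering, and the 404 output line is an assumed example rather than a quote from the test run.

```python
import re

SUCCESS, FAILURE, RETRY = 0, 2, 5  # assumed to match buildbot 0.8's numbering


def classify(exit_code, output, retry_patterns):
    """Return RETRY if any pattern matches, else FAILURE/SUCCESS by exit code."""
    if any(p.search(output) for p in retry_patterns):
        return RETRY
    return FAILURE if exit_code != 0 else SUCCESS


base_patterns = [re.compile(r"abort: HTTP Error 5\d\d")]
# Step b): temporarily treat 404s as retryable as well.
test_patterns = base_patterns + [re.compile("404")]

if __name__ == "__main__":
    # Assumed output from cloning the deliberately invalid repo path.
    output = "abort: HTTP Error 404: Not Found"
    print(classify(255, output, test_patterns))  # 5 (RETRY): keeps retrying
    print(classify(255, output, base_patterns))  # 2 (FAILURE): goes red
```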
Reporter
Comment 19•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=17064124&tree=Firefox
Reporter
Comment 20•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=17064112&tree=Mozilla-Inbound
Comment 21•12 years ago
Comment on attachment 681747
use MercurialCloneCommand(retry=False)

Review of attachment 681747:
-----------------------------------------------------------------

::: process/factory.py
@@ +6225,5 @@
> workdir=".",
> + haltOnFailure=True,
> + retry=False,
> + ))
> + self.addStep(MercurialCloneCommand(

What's the reasoning behind using MercurialCloneCommand instead of ShellCommand? There's no failure mode here that's worth retrying AFAIK.
Assignee
Comment 22•12 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #21)
> Comment on attachment 681747
> use MercurialCloneCommand(retry=False)
>
> Review of attachment 681747:
> -----------------------------------------------------------------
>
> ::: process/factory.py
> @@ +6225,5 @@
> > workdir=".",
> > + haltOnFailure=True,
> > + retry=False,
> > + ))
> > + self.addStep(MercurialCloneCommand(
>
> What's the reasoning behind using MercurialCloneCommand instead of
> ShellCommand? There's no failure mode here that's worth retrying AFAIK.

For the update? I can change that back easily if you want.
Assignee
Comment 23•12 years ago
Attachment #681747 -
Attachment is obsolete: true
Attachment #681747 -
Flags: review?(bhearsum)
Attachment #682068 -
Flags: review?(bhearsum)
Updated•12 years ago
Attachment #682068 -
Flags: review?(bhearsum) → review+
Assignee
Comment 24•12 years ago
Comment on attachment 682068
only MercurialCloneCommand the clone

Thanks Ben!
http://hg.mozilla.org/build/buildbotcustom/rev/97dd45bbc94a pending reconfig
Attachment #682068 -
Flags: checked-in+
Assignee
Comment 25•12 years ago
In production.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•6 years ago
Component: General Automation → General