Closed
Bug 613953
Opened 14 years ago
Closed 14 years ago
Build / repack uploads should be retried if they fail
Categories
(Release Engineering :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: bhearsum)
References
Details
(Whiteboard: [automation])
Attachments
(2 files, 3 obsolete files)
34.41 KB, patch | catlee: review+ | bhearsum: checked-in+
554 bytes, patch | jhford: review+ | bhearsum: checked-in+
Regular and release builds and repacks often hit problems uploading to stage (bug 610399).
Instead of dying, we should re-try the upload a few times.
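A minimal sketch of the requested behaviour, for illustration only. This is not the retry.py from the tools repository; the function name, its parameters, and the injectable `runner` argument are all hypothetical. It runs a command until it exits 0 or the attempts run out, sleeping between attempts.

```python
import subprocess
import time


def run_with_retries(cmd, retries=5, sleeptime=1, runner=subprocess.call):
    """Run cmd until it exits 0 or the attempts are used up.

    Returns 0 on success, otherwise the exit code of the last attempt.
    `runner` is a hypothetical hook so the loop can be exercised without
    spawning a real process.
    """
    rc = 1  # reported if retries is 0 and nothing was ever run
    for attempt in range(1, retries + 1):
        rc = runner(cmd)
        if rc == 0:
            return 0
        if attempt < retries:
            # Wait before the next attempt, but not after the final one.
            time.sleep(sleeptime)
    return rc
```

The exit code of the last attempt is passed through, so the caller can still tell a failed upload from a successful one.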
Reporter | ||
Updated•14 years ago
|
Assignee | ||
Comment 1•14 years ago
|
||
Planning to fix this this quarter.
Assignee: nobody → bhearsum
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•14 years ago
|
||
The repack part of this was fixed in bug 613970.
Depends on: 613970
Assignee | ||
Comment 3•14 years ago
|
||
This patch adds retry.py to a bunch of places, including (most) uploads. Still doing a bunch of testing on it.
Assignee | ||
Comment 4•14 years ago
|
||
Because of the upload_errors handling I added in bug 661401, this got a bit more complicated. When we have to retry at least once but succeed in the end, log_eval_func still catches the error messages from the failed attempts, and the overall status gets set to RETRY. To work around that, all of the Retrying* steps always succeed if the return code is 0, which puts the onus on retry.py to exit correctly.
I tested this by using 'iptables -A OUTPUT -d 10.2.71.82 -j REJECT' on mv-moz2-linux-ix-slave01 to cause quick Connection Refused messages. I tested the succeeds-on-first-attempt, succeeds-on-subsequent-attempt, and never-succeeds scenarios. The first two resulted in SUCCESS, the latter in RETRY.
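The status rule described above can be sketched roughly as follows. This is an illustration, not the buildbotcustom code: the constants are stand-in strings rather than Buildbot's real status values, and the function name is hypothetical.

```python
# Stand-in values; the real steps use Buildbot's own status constants.
SUCCESS = "SUCCESS"
RETRY = "RETRY"


def evaluate_retrying_step(return_code, log_text):
    """Decide the step status from the exit code alone.

    log_text is deliberately ignored: it may contain error messages from
    attempts that failed before a later attempt succeeded.
    """
    if return_code == 0:
        return SUCCESS
    return RETRY
```

Under this rule the succeeds-on-first-attempt and succeeds-on-subsequent-attempt cases both map to SUCCESS, and the never-succeeds case maps to RETRY, which matches the three scenarios tested.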
Attachment #537775 -
Attachment is obsolete: true
Attachment #538049 -
Flags: review?(catlee)
Reporter | ||
Updated•14 years ago
|
Attachment #538049 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 5•14 years ago
|
||
Comment on attachment 538049
refined version
Landed on the default branch of buildbotcustom.
Attachment #538049 -
Flags: checked-in+
Assignee | ||
Comment 6•14 years ago
|
||
This made it to production.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 7•14 years ago
|
||
Backed out because the POSIX path to retry.py broke the 1.9.2 win32 builds.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 8•14 years ago
|
||
I can't find any errors like the one you describe on http://tbpl.mozilla.org/?tree=Firefox3.6. Can you point me at the specific issue?
Assignee | ||
Updated•14 years ago
|
Attachment #538049 -
Flags: checked-in+ → checked-in-
Assignee | ||
Comment 9•14 years ago
|
||
Found it: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1307745554.1307746489.759.gz&fulltext=1
python /e/builds/moz2_slave/192-w32-unittest/tools/buildfarm/utils/retry.py -s 1 -r 5 python 192-w32-unittest/tools/clobberer/clobberer.py -s tools -t 168 http://build.mozilla.org/clobberer/index.php mozilla-1.9.2 WINNT 5.2 mozilla-1.9.2 unit test 192-w32-unittest mw32-ix-slave14 http://buildbot-master08.build.scj1.mozilla.com:8001/
in dir e:\builds\moz2_slave\192-w32-unittest\.. (timeout 3600 secs)
watching logfiles {}
argv: ['python', '/e/builds/moz2_slave/192-w32-unittest/tools/buildfarm/utils/retry.py', '-s', '1', '-r', '5', 'python', '192-w32-unittest/tools/clobberer/clobberer.py', '-s', 'tools', '-t', '168', 'http://build.mozilla.org/clobberer/index.php', 'mozilla-1.9.2', 'WINNT 5.2 mozilla-1.9.2 unit test', '192-w32-unittest', 'mw32-ix-slave14', u'http://buildbot-master08.build.scj1.mozilla.com:8001/']
python: can't open file '/e/builds/moz2_slave/192-w32-unittest/tools/buildfarm/utils/retry.py': [Errno 2] No such file or directory
program finished with exit code 2
elapsedTime=0.110000
Assignee | ||
Comment 10•14 years ago
|
||
Same as before, except I've added the "pwd -W" toolsdir fix in UnittestBuildFactory, because it uses MercurialCloneCommand (which uses retry.py), and I've turned MozillaClobberer back into a ShellCommand, because toolsdir isn't set properly on Windows in MozillaBuildFactory. I ran some 1.9.2 and m-c builds in staging, including unittests - they worked fine.
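To illustrate the path problem from the log in comment 9: the native Windows python cannot open the MSYS-style path /e/builds/..., while a drive-letter form such as e:/builds/... (which, as far as I understand it, is what "pwd -W" prints in an MSYS shell) can be opened. The helper below is a hypothetical sketch of that conversion and is not part of the patch.

```python
import re


def msys_to_windows_path(path):
    """Turn an MSYS-style '/e/builds/...' path into 'e:/builds/...'.

    Only paths whose first component is a single letter are treated as
    drive paths; anything else is returned unchanged.
    """
    match = re.match(r"^/([A-Za-z])(/.*)?$", path)
    if not match:
        return path
    drive = match.group(1)
    rest = match.group(2) or "/"
    return "%s:%s" % (drive, rest)
```

For example, the retry.py path that failed in the log would become e:/builds/moz2_slave/192-w32-unittest/tools/buildfarm/utils/retry.py.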
Attachment #538926 -
Flags: review?(catlee)
Reporter | ||
Updated•14 years ago
|
Attachment #538926 -
Flags: review?(catlee) → review+
Assignee | ||
Updated•14 years ago
|
Attachment #538049 -
Attachment is obsolete: true
Assignee | ||
Comment 11•14 years ago
|
||
Comment on attachment 538926
full patch, with fix for UnittestBuildFactory/MozillaClobber
This is on the default branch again, heading to production later today.
Attachment #538926 -
Flags: checked-in+
Assignee | ||
Comment 12•14 years ago
|
||
Haven't seen any additional fallout.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Comment 13•14 years ago
|
||
This is causing errors with make upload in scratchbox commands.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 14•14 years ago
|
||
This patch should fix the root problem. I haven't had a chance to run test_masters yet, but I have tested that the list comprehension is valid:
>>> [str(x) for x in [1,2,3,'john']]
['1', '2', '3', 'john']
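A sketch of why such a conversion can matter. The bug does not spell out the exact failure, so this rests on an assumption: that a non-string value (for example a numeric timeout) ended up in an argument list that was later joined into a single command string. The function names and the sample argument list are hypothetical.

```python
def stringify_command(args):
    """Return a copy of args with every element converted to str."""
    return [str(x) for x in args]


def join_command(args):
    """Join an argument list into one string, tolerating non-string items.

    A bare " ".join(args) raises TypeError as soon as args contains an int.
    """
    return " ".join(stringify_command(args))


# Hypothetical argument list with integers where strings are expected.
command = ["retry.py", "-s", 1, "-r", 5, "make", "upload"]
print(join_command(command))
```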
Attachment #539246 -
Flags: review?(bhearsum)
Assignee | ||
Comment 15•14 years ago
|
||
Comment on attachment 539246
fix scratchboxcommand fallout
I don't want to take this as a bustage fix. I'm attaching a safer fix.
Assignee | ||
Comment 16•14 years ago
|
||
Attachment #539248 -
Flags: review?(jhford)
Comment 17•14 years ago
|
||
Comment on attachment 539248
work around limitations by forcing timeout to be a string
r+ for bustage
Attachment #539248 -
Flags: review?(jhford) → review+
Assignee | ||
Comment 18•14 years ago
|
||
Comment on attachment 539246
fix scratchboxcommand fallout
We didn't end up using this patch.
Attachment #539246 -
Attachment is obsolete: true
Attachment #539246 -
Flags: review?(bhearsum)
Assignee | ||
Comment 19•14 years ago
|
||
Comment on attachment 539248
work around limitations by forcing timeout to be a string
This was landed a few days ago.
Attachment #539248 -
Flags: checked-in+
Assignee | ||
Comment 20•14 years ago
|
||
This is all done, again. bug 664211
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Release Engineering