Closed
Bug 627310
Opened 13 years ago
Closed 10 years ago
log uploader needs to handle missing files gracefully
Categories
(Release Engineering :: General, defect, P3)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: nthomas, Unassigned)
References
Details
(Whiteboard: [buildmasters][logs])
Suppose a test run hangs, or otherwise takes longer than 24 hours to finish. When the job does end, we try to upload the log into tinderbox-builds/, but by then the destination directory has been deleted. log_uploader.py returns exit status 1, which causes an exception on the master (CalledProcessError is raised). If we keep tinderbox-builds around for longer, this may be a non-issue.
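The failure path described above is ordinary subprocess behaviour: check_call raises CalledProcessError whenever the child exits non-zero. The sketch below assumes the master runs the uploader through something like check_call (the actual buildbotcustom call site isn't shown in this bug). run_uploader is a hypothetical helper, and the "-c" child merely stands in for log_uploader.py exiting with status 1. It reports the failure as a single line instead of letting a full traceback through.

```python
import subprocess
import sys


def run_uploader(argv):
    """Run an uploader command and return its exit status instead of raising.

    Hypothetical helper: argv stands in for the real log_uploader.py
    command line; it is not the actual buildbotcustom code.
    """
    try:
        subprocess.check_call(argv)
    except subprocess.CalledProcessError as e:
        # check_call raises whenever the child exits non-zero; write one
        # line rather than a multi-line traceback.
        sys.stderr.write("log upload failed: %r exited with status %d\n"
                         % (e.cmd, e.returncode))
        return e.returncode
    return 0


# A child that exits with status 1 stands in for a failing log_uploader.py.
status = run_uploader([sys.executable, "-c", "import sys; sys.exit(1)"])
```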
Updated•13 years ago
Assignee: nobody → catlee
Hardware: x86 → All
Comment 1•13 years ago
I don't understand how this happens... Doesn't post_upload.py ensure the directories are created?
Whiteboard: [buildmasters] → [buildmasters][logs]
Reporter
Comment 2•13 years ago
Perhaps I misunderstood the issue, and the log is actually disappearing on the master side after a job has been stuck for 4 days. Unfortunately I can't think of a recent example of this, because slaveduty has been keeping the slaves under control.
Comment 4•13 years ago
I think it would be helpful to turn these exceptions into single-line errors, either in the log uploader or in the exception watcher, so that it's easier to see other errors through this noise. Longer term, fixing the root causes of hung jobs, and making clean-up smart enough not to delete logs for running jobs, would reduce the frequency of these. I've filed bug 641809 on the latter.
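Making clean-up smart enough to leave running jobs alone could look roughly like the sketch below. It assumes per-build files in a builder directory are named with the build number, optionally followed by "-" and a suffix (as in 105-log-get_buildername-output from the traceback in comment 5), and that the caller already knows which build numbers are in progress. prune_old_logs and its parameters are hypothetical; this is not the actual master maintenance script.

```python
import os
import time


def prune_old_logs(builder_dir, max_age_seconds, in_progress, now=None):
    """Delete old per-build files in builder_dir, skipping in-progress builds.

    Hypothetical sketch: assumes file names start with the build number
    ("105" or "105-log-..."); in_progress is a set of build-number strings
    supplied by the caller. Returns the sorted list of deleted file names.
    """
    if now is None:
        now = time.time()
    deleted = []
    for name in sorted(os.listdir(builder_dir)):
        path = os.path.join(builder_dir, name)
        if not os.path.isfile(path):
            continue
        build_number = name.split("-", 1)[0]
        if build_number in in_progress:
            # A running job's early step logs can be old; never delete them.
            continue
        if now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            deleted.append(name)
    return deleted
```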
Updated•13 years ago
Assignee: catlee → nobody
Reporter
Comment 5•13 years ago
Bug 691179 cleaned up a bunch of long-running jobs which buildbot had failed to terminate promptly. Between the builds starting and the intervention, the master cleanup job had come along and removed the logs, and we hit errors like this when trying to upload the log:
Running [u'/builds/buildbot/build1/bin/python', u'/builds/buildbot/build1/lib/python2.6/site-packages/buildbotcustom/bin/log_uploader.py', u'stage.mozilla.org', u'-u', u'ffxbld', u'-i', u'/home/cltbld/.ssh/ffxbld_dsa', u'-b', u'mozilla-central', u'-p', u'linux64-debug', u'--product', u'firefox', u'/builds/buildbot/build1/master/mozilla-central-linux64-debug', u'105']
Traceback (most recent call last):
  File "/builds/buildbot/build1/lib/python2.6/site-packages/buildbotcustom/bin/log_uploader.py", line 210, in <module>
    logfile = formatLog(local_tmpdir, build)
  File "/builds/buildbot/build1/lib/python2.6/site-packages/buildbotcustom/bin/log_uploader.py", line 142, in formatLog
    data = log.getTextWithHeaders()
  File "/builds/buildbot/build1/lib/python2.6/site-packages/buildbot-0.8.2_hg_aeaa057e9df6_production_0.8-py2.6.egg/buildbot/status/builder.py", line 341, in getTextWithHeaders
    return "".join(self.getChunks(onlyText=True))
  File "/builds/buildbot/build1/lib/python2.6/site-packages/buildbot-0.8.2_hg_aeaa057e9df6_production_0.8-py2.6.egg/buildbot/status/builder.py", line 356, in getChunks
    f = self.getFile()
  File "/builds/buildbot/build1/lib/python2.6/site-packages/buildbot-0.8.2_hg_aeaa057e9df6_production_0.8-py2.6.egg/buildbot/status/builder.py", line 334, in getFile
    return open(self.getFilename(), "r")
IOError: [Errno 2] No such file or directory: '/builds/buildbot/build1/master/mozilla-central-linux64-debug/105-log-get_buildername-output'
Either the uploader could avoid crashing out when a file for a build step can't be found, or the maintenance script could load the appropriate buildbot file to find out which jobs are in progress (and leave those alone).
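For the first option, the traceback shows the crash comes from log.getTextWithHeaders() raising IOError with errno 2 (ENOENT) once a step's log file has been removed. A minimal sketch of tolerating that, assuming the uploader can wrap each step's text retrieval in a callable; safe_log_text is a hypothetical helper, not part of buildbot or buildbotcustom. It only swallows "No such file or directory" and re-raises anything else, so genuine permission or disk errors still surface.

```python
import errno


def safe_log_text(get_text, log_name):
    """Call get_text(); if its backing file is gone, return a placeholder.

    Hypothetical helper: get_text stands in for something like buildbot's
    log.getTextWithHeaders; log_name is just a label for the placeholder.
    """
    try:
        return get_text()
    except (IOError, OSError) as e:
        if e.errno != errno.ENOENT:
            raise  # only tolerate "No such file or directory"
        return "[log %s missing on master: %s]\n" % (log_name, e.filename)
```

Inside a formatLog-like function, data = log.getTextWithHeaders() would then become data = safe_log_text(log.getTextWithHeaders, name), where name is whatever label the uploader already has for that log (hypothetical wiring).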
Summary: log uploader needs to handle missing directory gracefully → log uploader needs to handle files gracefully
Reporter
Updated•13 years ago
Summary: log uploader needs to handle files gracefully → log uploader needs to handle missing files gracefully
Reporter
Comment 6•13 years ago
And in the case of try builds, we send 6 emails about a buildbot exception to the poor patch author before giving up. Presumably we are retrying in that situation.
Assignee
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INCOMPLETE
Reporter
Comment 8•10 years ago
We hit this again a couple of days ago, but it's pretty infrequent so I'll leave it be.