Closed Bug 627310 Opened 13 years ago Closed 10 years ago

log uploader needs to handle missing files gracefully

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: nthomas, Unassigned)

Details

(Whiteboard: [buildmasters][logs])

Suppose a test run hangs, or otherwise takes longer than 24 hours to complete. When the job does end, we try to upload the log into tinderbox-builds/, but by then the destination directory has been deleted. log_uploader.py returns exit status 1, which causes an exception on the master (raise CalledProcessError).

If we keep tinderbox-builds for a longer period this may be a non-issue.
Assignee: nobody → catlee
Hardware: x86 → All
I don't understand how this happens. Doesn't post_upload.py ensure that directories are created?
Whiteboard: [buildmasters] → [buildmasters][logs]
Perhaps I misunderstood the issue, and the log is disappearing on the master side after a job has been stuck for 4 days. Unfortunately I can't think of a recent example of this, because slaveduty has been keeping the slaves under control.
I think it would be helpful to collapse these exceptions into single lines, either in the log uploader or in the exception watcher, so that it's easier to see other errors through the noise.
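
To illustrate the single-line idea, here is a minimal sketch using the standard library's traceback module. The function name one_line_summary is hypothetical and is not part of log_uploader.py or the exception watcher; it only shows how an exception could be reduced to one line instead of a full traceback.

```python
import traceback


def one_line_summary(exc):
    """Collapse an exception into a single line such as 'ValueError: boom'.

    Hypothetical helper for illustration; not existing buildbotcustom code.
    """
    # format_exception_only returns the final "Type: message" line(s)
    # without the stack frames that make the full traceback so noisy.
    lines = traceback.format_exception_only(type(exc), exc)
    return " ".join(line.strip() for line in lines)
```

A caller would wrap the upload in try/except and log one_line_summary(e) rather than letting the traceback propagate.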

Longer term, fixing the root causes of hung jobs, and making clean-up smart enough not to delete logs for running jobs, would reduce the frequency of these. I've filed bug 641809 on the latter.
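
As a rough sketch of what "smart enough" clean-up could look like: the function below selects old files in a builder directory but skips any file whose name starts with the number of a build that is still running. It assumes log files are named like 105-log-get_buildername-output (build number first), and that the caller can supply the set of running build numbers; how to obtain that set from buildbot is not shown and is left as an assumption. The name stale_log_files and its parameters are hypothetical.

```python
import os
import time


def stale_log_files(builder_dir, running_builds, max_age_seconds, now=None):
    """Return paths of old files that are safe to delete.

    running_builds: set of build numbers still in progress (assumed to be
    provided by the caller; this sketch does not query buildbot).
    """
    now = time.time() if now is None else now
    stale = []
    for name in sorted(os.listdir(builder_dir)):
        path = os.path.join(builder_dir, name)
        if not os.path.isfile(path):
            continue
        # Assumed naming scheme: "<build number>-log-<step>-<stream>".
        build_number = name.split("-", 1)[0]
        if build_number.isdigit() and int(build_number) in running_builds:
            continue  # leave logs of in-progress builds alone
        if now - os.path.getmtime(path) > max_age_seconds:
            stale.append(path)
    return stale
```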
Assignee: catlee → nobody
Bug 691179 cleaned up a bunch of long-running jobs which buildbot had failed to terminate promptly. Between the builds starting and the intervention, the master cleanup job had come along and removed the logs, and we hit errors like this when trying to upload the log:

Running [u'/builds/buildbot/build1/bin/python', u'/builds/buildbot/build1/lib/python2.6/site-packages/buildbotcustom/bin/log_uploader.py', u'stage.mozilla.org', u'-u', u'ffxbld', u'-i', u'/home/cltbld/.ssh/ffxbld_dsa', u'-b', u'mozilla-central', u'-p', u'linux64-debug', u'--product', u'firefox', u'/builds/buildbot/build1/master/mozilla-central-linux64-debug', u'105']
Traceback (most recent call last):
  File "/builds/buildbot/build1/lib/python2.6/site-packages/buildbotcustom/bin/log_uploader.py", line 210, in <module>
    logfile = formatLog(local_tmpdir, build)
  File "/builds/buildbot/build1/lib/python2.6/site-packages/buildbotcustom/bin/log_uploader.py", line 142, in formatLog
    data = log.getTextWithHeaders()
  File "/builds/buildbot/build1/lib/python2.6/site-packages/buildbot-0.8.2_hg_aeaa057e9df6_production_0.8-py2.6.egg/buildbot/status/builder.py", line 341, in getTextWithHeaders
    return "".join(self.getChunks(onlyText=True))
  File "/builds/buildbot/build1/lib/python2.6/site-packages/buildbot-0.8.2_hg_aeaa057e9df6_production_0.8-py2.6.egg/buildbot/status/builder.py", line 356, in getChunks
    f = self.getFile()
  File "/builds/buildbot/build1/lib/python2.6/site-packages/buildbot-0.8.2_hg_aeaa057e9df6_production_0.8-py2.6.egg/buildbot/status/builder.py", line 334, in getFile
    return open(self.getFilename(), "r")
IOError: [Errno 2] No such file or directory: '/builds/buildbot/build1/master/mozilla-central-linux64-debug/105-log-get_buildername-output'

The uploader could skip a build step whose log file can't be found instead of crashing, or the maintenance script could load the appropriate buildbot file to find out which jobs are in progress (and leave those alone).
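
For the first option, a minimal sketch of tolerating missing step logs might look like the following. It is not the actual formatLog code in log_uploader.py: read_step_logs, its parameters, and the warning text are hypothetical, and it reads files directly rather than going through buildbot's log objects. The point is only that an ENOENT error is reported in one line and skipped, while other I/O errors still propagate.

```python
import errno
import sys


def read_step_logs(paths, warn=sys.stderr.write):
    """Read each step log, skipping files that no longer exist.

    Returns (chunks, missing): the text of the logs that were found and
    the list of paths that were missing. Hypothetical helper for
    illustration only.
    """
    chunks = []
    missing = []
    for path in paths:
        try:
            with open(path, "r") as f:
                chunks.append(f.read())
        except (IOError, OSError) as e:
            if e.errno != errno.ENOENT:
                raise  # permissions, disk errors, etc. are still fatal
            missing.append(path)
            warn("log_uploader: skipping missing step log %s\n" % path)
    return chunks, missing
```

With something like this, the uploader could still upload the steps it found and exit with status 0, so the master would not raise CalledProcessError for a log that was cleaned up mid-job.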
Summary: log uploader needs to handle missing directory gracefully → log uploader needs to handle files gracefully
Summary: log uploader needs to handle files gracefully → log uploader needs to handle missing files gracefully
And in the case of try builds, we send 6 emails about a buildbot exception to the poor patch author before giving up. Presumably we are retrying in that situation.
Product: mozilla.org → Release Engineering
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INCOMPLETE
We hit this again a couple of days ago, but it's pretty infrequent so I'll leave it be.