Bug 869758
Opened 11 years ago
Closed 11 years ago
command_runner doesn't always restart cleanly
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Unassigned)
Details
Sometimes dies with an exception like this:

Traceback (most recent call last):
  File "tools/buildbot-helpers/command_runner.py", line 199, in <module>
    main()
  File "tools/buildbot-helpers/command_runner.py", line 196, in main
    runner.loop()
  File "tools/buildbot-helpers/command_runner.py", line 112, in loop
    self.monitor()
  File "tools/buildbot-helpers/command_runner.py", line 100, in monitor
    self.q.remove(job.item_id)
  File "/builds/buildbot/queue/tools/lib/python/mozilla_buildtools/queuedir.py", line 191, in remove
    os.unlink(os.path.join(self.cur_dir, item_id))
OSError: [Errno 2] No such file or directory: '/dev/shm/queue/commands/cur/1367978378-0-22524RDEZrh'
Comment 1 • 11 years ago (Reporter)
I suspect this is caused by running it with -j4, not by restarting it.
Comment 2 • 11 years ago (Reporter)
http://hg.mozilla.org/build/tools/rev/b339c1d70d4f seems to fix it.

With -j1, we would end up in this block of code when waiting for a job to finish:
http://hg.mozilla.org/build/tools/file/b339c1d70d4f/buildbot-helpers/command_runner.py#l114
No problems there, a nice simple busy loop.

If -j > 1, then we get into this part of the code while waiting for jobs to finish:
http://hg.mozilla.org/build/tools/file/b339c1d70d4f/buildbot-helpers/command_runner.py#l124
Without pyinotify, we would wait up to 1000s, or until a new job came along to wake us up. That meant we could end up waiting more than 5 minutes, which is enough time for the job files to be cleaned up by various processes, so the later remove() failed because the file was already gone.

Now we wait only 1 second, so we can go back and touch all the job files we have active.
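For anyone reading this later, here is a rough sketch of the idea behind the fix, not the actual command_runner.py code. It assumes the wake-up signal can be modelled with a threading.Event and that the cleanup processes look at file mtimes; the names touch_all, wait_with_heartbeat, wakeup, finished and max_wait are all made up for illustration.

```python
import os
import threading


def touch_all(paths):
    """Refresh the mtime of every active job file (hypothetical helper)."""
    for path in paths:
        try:
            os.utime(path, None)  # set atime/mtime to the current time
        except FileNotFoundError:
            # Assumption: a file that is already gone is simply skipped here.
            pass


def wait_with_heartbeat(wakeup, job_paths, finished, max_wait=1.0):
    """Wait until finished() is true, never sleeping longer than max_wait.

    wakeup is a threading.Event standing in for "a new job arrived".
    After every wait, whether it timed out or was woken early, the
    active job files are touched so that age-based cleanup leaves them
    alone. Returns the number of touch passes performed.
    """
    passes = 0
    while not finished():
        wakeup.wait(max_wait)  # returns early if the event is set
        wakeup.clear()
        touch_all(job_paths)
        passes += 1
    return passes
```

With max_wait=1000 and no wake-up source, a single wait can outlast a 5 minute cleanup window; with max_wait=1.0 the files are refreshed roughly once a second for as long as jobs are running.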
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated • 11 years ago (Assignee)
Product: mozilla.org → Release Engineering
Updated • 6 years ago (Assignee)
Component: General Automation → General