sometimes dies with an exception like this:

Traceback (most recent call last):
  File "tools/buildbot-helpers/command_runner.py", line 199, in <module>
    main()
  File "tools/buildbot-helpers/command_runner.py", line 196, in main
    runner.loop()
  File "tools/buildbot-helpers/command_runner.py", line 112, in loop
    self.monitor()
  File "tools/buildbot-helpers/command_runner.py", line 100, in monitor
    self.q.remove(job.item_id)
  File "/builds/buildbot/queue/tools/lib/python/mozilla_buildtools/queuedir.py", line 191, in remove
    os.unlink(os.path.join(self.cur_dir, item_id))
OSError: [Errno 2] No such file or directory: '/dev/shm/queue/commands/cur/1367978378-0-22524RDEZrh'
I suspect this is due to running it with -j4, not due to restarting it.
http://hg.mozilla.org/build/tools/rev/b339c1d70d4f seems to fix it.

The problem only shows up with -j > 1. With -j1, we would end up in this block of code when waiting for a job to finish:
http://hg.mozilla.org/build/tools/file/b339c1d70d4f/buildbot-helpers/command_runner.py#l114
No problems there, a nice simple busy loop.

If -j > 1, then we get into this part of the code while waiting for jobs to finish:
http://hg.mozilla.org/build/tools/file/b339c1d70d4f/buildbot-helpers/command_runner.py#l124
Without pyinotify, we would wait up to 1000s, or until a new job came along to wake us up. We could end up waiting more than 5 minutes, which is enough time for the job files to be cleaned up by various processes, so the later unlink in remove() fails with ENOENT. Now we wait only 1 second, so we can go back and touch all the job files we have active.
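
For anyone reading this later, here is a rough sketch of the idea behind the fix, not the actual command_runner.py code. It assumes jobs are tracked as a dict of item_id to subprocess.Popen, and the names touch_items, wait_for_jobs and poll_interval are made up for illustration. The point is just that the wait loop wakes up at a short, bounded interval and refreshes the mtime of every active job file, so nothing that cleans up old files sees them as stale.

```python
import os
import subprocess
import sys
import tempfile
import time


def touch_items(cur_dir, item_ids):
    """Refresh the mtime of each active job file (hypothetical helper).

    Returns the list of item ids whose files were actually touched.
    """
    touched = []
    for item_id in item_ids:
        path = os.path.join(cur_dir, item_id)
        try:
            os.utime(path, None)  # set atime/mtime to the current time
            touched.append(item_id)
        except FileNotFoundError:
            # The file is already gone; there is nothing to refresh.
            pass
    return touched


def wait_for_jobs(procs, cur_dir, poll_interval=1.0):
    """Wait for all jobs, waking up at least every poll_interval seconds.

    procs maps item_id -> subprocess.Popen. This layout is an assumption
    for the sketch, not how command_runner.py stores its jobs.
    """
    active = dict(procs)
    while active:
        # Keep the job files looking fresh while their jobs are running.
        touch_items(cur_dir, active)
        for item_id, proc in list(active.items()):
            if proc.poll() is not None:
                # Job finished: remove its file, tolerating a missing one.
                try:
                    os.unlink(os.path.join(cur_dir, item_id))
                except FileNotFoundError:
                    pass
                del active[item_id]
        if active:
            # Short bounded sleep instead of a long blocking wait.
            time.sleep(poll_interval)
```

Tolerating a missing file in the unlink is an extra safety net in this sketch; the committed fix, as described above, is the shorter wait plus touching the active files.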
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General