Closed Bug 869758 Opened 11 years ago Closed 11 years ago

command_runner doesn't always restart cleanly

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Unassigned)

Details

command_runner sometimes dies with an exception like this:
Traceback (most recent call last):
  File "tools/buildbot-helpers/command_runner.py", line 199, in <module>
    main()
  File "tools/buildbot-helpers/command_runner.py", line 196, in main
    runner.loop()
  File "tools/buildbot-helpers/command_runner.py", line 112, in loop
    self.monitor()
  File "tools/buildbot-helpers/command_runner.py", line 100, in monitor
    self.q.remove(job.item_id)
  File "/builds/buildbot/queue/tools/lib/python/mozilla_buildtools/queuedir.py", line 191, in remove
    os.unlink(os.path.join(self.cur_dir, item_id))
OSError: [Errno 2] No such file or directory: '/dev/shm/queue/commands/cur/1367978378-0-22524RDEZrh'
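The last frame of the traceback shows the failure directly. `remove()` unlinks the item's file from `cur_dir`, and if some other process has already deleted that file, `os.unlink` raises `OSError` with errno `ENOENT`. Below is a minimal reproduction. It assumes a stripped-down stand-in for the queue: the class name `MiniQueue` and the helper `reproduce()` are hypothetical, and only the `os.unlink(os.path.join(self.cur_dir, item_id))` call mirrors the traceback.

```python
import errno
import os
import tempfile


class MiniQueue:
    """Hypothetical stand-in for the queuedir queue; only remove() mirrors the traceback."""

    def __init__(self, cur_dir):
        self.cur_dir = cur_dir

    def remove(self, item_id):
        # Same call as queuedir.py line 191 in the traceback.
        os.unlink(os.path.join(self.cur_dir, item_id))


def reproduce():
    """Return the errno raised when remove() runs after the file is already gone."""
    with tempfile.TemporaryDirectory() as cur_dir:
        item_id = "1367978378-0-22524RDEZrh"
        path = os.path.join(cur_dir, item_id)
        open(path, "w").close()  # the job file, as it exists while the job runs

        q = MiniQueue(cur_dir)
        os.unlink(path)  # simulate another process cleaning up the file first
        try:
            q.remove(item_id)
        except OSError as e:
            return e.errno
    return None


if __name__ == "__main__":
    print(errno.errorcode[reproduce()])  # ENOENT
```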
I suspect this is due to running it with -j4, not to restarting it.
http://hg.mozilla.org/build/tools/rev/b339c1d70d4f seems to fix it.

The problem was that with -j1, we would end up in this block of code when waiting for a job to finish:
http://hg.mozilla.org/build/tools/file/b339c1d70d4f/buildbot-helpers/command_runner.py#l114

No problems there; it's a nice, simple busy loop.

If -j > 1, then we get into this part of the code while waiting for jobs to finish:
http://hg.mozilla.org/build/tools/file/b339c1d70d4f/buildbot-helpers/command_runner.py#l124

Without pyinotify, we would wait there for up to 1000 s, or until a new job came along to wake us up. That meant we could go more than 5 minutes without touching the active job files, which is long enough for various cleanup processes to delete them.
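The timing argument can be checked with a small simulation on a fake clock. This is a sketch under stated assumptions, not the real cleanup logic: it assumes a cleanup threshold of 300 s (the "5 minutes" above), a hypothetical cleaner that runs every 60 s, and a runner that touches its job file only when its wait returns, which with no new jobs arriving means every `wait_timeout` seconds. The names `MAX_AGE` and `job_file_survives` are made up for illustration.

```python
MAX_AGE = 300  # assumed: cleaners delete job files untouched for more than 5 minutes


def job_file_survives(wait_timeout, job_duration, cleaner_period=60):
    """Simulate one job; return True if its file still exists when the job ends.

    The runner touches the file only when its wait returns (every
    wait_timeout seconds, since no new job arrives to wake it early).
    A hypothetical cleaner runs every cleaner_period seconds and deletes
    the file if it has gone untouched for more than MAX_AGE seconds.
    """
    last_touch = 0  # the file is fresh when the job starts
    for now in range(1, job_duration + 1):
        if now % wait_timeout == 0:
            last_touch = now  # runner wakes up and touches its active job files
        if now % cleaner_period == 0 and now - last_touch > MAX_AGE:
            return False  # file cleaned up; a later q.remove() raises ENOENT
    return True


if __name__ == "__main__":
    print(job_file_survives(wait_timeout=1000, job_duration=600))  # False
    print(job_file_survives(wait_timeout=1, job_duration=600))     # True
```

Under these assumptions only jobs that outlive the cleanup threshold are hit, which fits the failure being intermittent rather than constant.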

Now we wait only 1 second, so we can go back and touch all of our active job files.
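The shape of the fix can be sketched as a wait loop that never sleeps longer than one short interval before refreshing the mtime of every active job file. This is a hypothetical illustration, not the code from the changeset: `wait_for_work` and `touch_active_jobs` are made-up names, and a `threading.Event` stands in for whatever non-pyinotify mechanism signals "a new job arrived".

```python
import os
import tempfile
import threading
import time

WAIT_TIMEOUT = 1.0  # seconds; the wait interval used after the fix


def touch_active_jobs(cur_dir, active_item_ids):
    """Refresh the mtime of each active job file so cleanup treats it as live."""
    for item_id in active_item_ids:
        os.utime(os.path.join(cur_dir, item_id), None)  # None sets the times to now


def wait_for_work(new_item, cur_dir, active_item_ids, deadline,
                  wait_timeout=WAIT_TIMEOUT):
    """Hypothetical wait loop that wakes at least every wait_timeout seconds.

    new_item is a threading.Event standing in for "a new job arrived".
    Returns True if woken by a new item, False once the monotonic
    deadline passes; active job files are touched after every timeout.
    """
    while time.monotonic() < deadline:
        if new_item.wait(wait_timeout):
            return True
        touch_active_jobs(cur_dir, active_item_ids)
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cur_dir:
        path = os.path.join(cur_dir, "job-1")
        open(path, "w").close()
        os.utime(path, (0, 0))  # pretend the file has not been touched in ages
        wait_for_work(threading.Event(), cur_dir, ["job-1"],
                      time.monotonic() + 2.5)
        print(time.time() - os.path.getmtime(path) < 5)  # True
```

Waking every second costs a little extra work, but it bounds how stale an active job file can get to roughly the wait interval, far below the 5-minute cleanup window.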
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General