Closed Bug 734764 Opened 12 years ago Closed 12 years ago

Queues need to be created with mode 755 so that nagios can check status

Categories

(Release Engineering :: General, defect, P2)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: catlee)

Details

Attachments

(1 file)

Bug 734728 ended up rebooting four buildbot masters in SCL1, which had nagios alerts like this afterwards:
 buildbot-master04.build.scl1:Command Queue is UNKNOWN: Unhandled exception

It turned out that /dev/shm/queue and subdirs had mode 700 on cltbld:cltbld, so nagios can't come along and read the files there. A 'chmod -R 755 /dev/shm/queue' fixes it up, but we should create the dirs with the right mode.
At face value you'd expect this to have already fixed it:
 http://hg.mozilla.org/build/tools/rev/d9e1c085fa02

catlee said he'd take a look Monday.
Assignee: nobody → catlee
Priority: -- → P2
Very strange... all the tools checkouts I can find on this machine are up to date. The only thing out of place I could find was an old lib/python/buildtools/queuedir.pyc reference. Maybe some script was picking that up instead of lib/python/mozilla_buildtools/queuedir.py.

I've removed the old .pyc files and the empty directory.
bm04 was just rebooted, and this happened again.

/dev/shm/queue is 0700, as are /dev/shm/queue/{commands,pulse}. The subdirectories of {commands,pulse} are 0755 though.
One way to reproduce this is with a umask of 077: os.makedirs(dirname, 0755) then results in 0700 permissions on those directories, because the umask bits are cleared from the requested mode. I'm not sure whether this is actually what is happening, but one change we could make is to explicitly set the umask of buildbot and the queue consumers to 022.
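A minimal sketch of that reproduction, assuming a POSIX system and Python 3 (so the mode is written 0o755 rather than 0755). The helper name makedirs_under_umask and the temporary directory are made up for illustration; this is not the queuedir code itself.

```python
import os
import tempfile


def makedirs_under_umask(umask, mode=0o755):
    """Create a directory with os.makedirs while `umask` is in effect.

    Returns the permission bits the new directory actually ended up with.
    """
    base = tempfile.mkdtemp()              # scratch area, stands in for /dev/shm
    target = os.path.join(base, "queue")   # hypothetical queue directory
    old = os.umask(umask)                  # os.umask returns the previous value
    try:
        os.makedirs(target, mode)
    finally:
        os.umask(old)                      # always restore the caller's umask
    return os.stat(target).st_mode & 0o777


# The requested mode is ANDed with the complement of the umask.
print(oct(makedirs_under_umask(0o077)))  # 0o700: group/other bits stripped
print(oct(makedirs_under_umask(0o022)))  # 0o755: what nagios needs
```

With umask 077 the group and other bits never make it onto the directory, which matches the 0700 seen on /dev/shm/queue.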
The problem here is that buildbot is running with umask 077, and it starts before the queue processors do, so the directories get created with too-restrictive permissions.
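Besides setting the umask to 022, another option is to make directory creation independent of whatever umask the first process happens to have. The following is only a sketch of that idea, assuming POSIX and Python 3; makedirs_with_mode is a hypothetical helper and is not taken from the attached patch.

```python
import os
import tempfile


def makedirs_with_mode(path, mode=0o755):
    """Create `path` and any missing parents, then chmod each new directory.

    os.chmod is not affected by the umask, so the directories created here
    end up with exactly `mode`. Directories that already exist are left alone.
    """
    missing = []
    current = os.path.abspath(path)
    while not os.path.isdir(current):
        missing.append(current)
        current = os.path.dirname(current)
    for directory in reversed(missing):    # create parents before children
        os.mkdir(directory)
        os.chmod(directory, mode)


# Demo under the restrictive umask from this bug.
old_umask = os.umask(0o077)
try:
    base = tempfile.mkdtemp()              # scratch area, stands in for /dev/shm
    leaf = os.path.join(base, "queue", "commands")
    makedirs_with_mode(leaf)
finally:
    os.umask(old_umask)

queue_mode = os.stat(os.path.join(base, "queue")).st_mode & 0o777
leaf_mode = os.stat(leaf).st_mode & 0o777
print(oct(queue_mode), oct(leaf_mode))     # 0o755 0o755
```

This only helps for directories the helper creates itself; anything already sitting at 0700 still needs the one-off chmod -R described above.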
Attachment #609394 - Flags: review?(rail)
Attachment #609394 - Flags: review?(rail) → review+
Attachment #609394 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General