Closed Bug 555988 Opened 14 years ago Closed 14 years ago

re-imaged try slaves (mac, linux) aren't working properly with puppet

Categories

(mozilla.org Graveyard :: Server Operations, task)

Platform: x86 All
Type: task
Priority: Not set
Severity: major

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: lsblakk, Assigned: jabba)

References

Details

(Whiteboard: [waiting on releng])

moz2-linux-slave51 has no buildbot installation on it, and neither does the mac slave (moz2-darwin9-slave68).


From /var/log/system.log on moz2-darwin9-slave68:

Mar 30 08:08:21 moz2-darwin9-slave68 com.apple.launchd[86] (buildbot-tac.firstrun.com): Ignored this key: UserName
Mar 30 08:08:21 moz2-darwin9-slave68 com.apple.launchd[86] (buildbot.start.slave): Ignored this key: UserName
Mar 30 08:08:21 moz2-darwin9-slave68 ARDAgent [96]: ********ARDAgent Launched********
Mar 30 08:08:21 moz2-darwin9-slave68 ARDAgent [96]: ********ARDAgent Ready********
Mar 30 08:08:22 moz2-darwin9-slave68 buildbot-tac.firstrun.com[91]: /builds/slave/buildbot.tac already exists, not doing anything
Mar 30 08:08:22 moz2-darwin9-slave68 buildbot-tac.firstrun.com[91]: /Users/cltbld/.buildbot.tac.control says not to run, not doing anything
Mar 30 08:08:22 moz2-darwin9-slave68 com.apple.launchd[86] (buildbot.start.slave[92]): posix_spawnp("/tools/buildbot/bin/buildbot", ...): No such file or directory
Mar 30 08:08:22 moz2-darwin9-slave68 com.apple.launchd[86] (buildbot.start.slave[92]): Exited with exit code: 1
Mar 30 08:08:23 moz2-darwin9-slave68 /System/Library/CoreServices/coreservicesd[59]: SFLSharePointsEntry::CreateDSRecord: dsCreateRecordAndOpen(Administrator's Public Folder) returned -14135
Mar 30 08:08:23 moz2-darwin9-slave68 /System/Library/CoreServices/coreservicesd[59]: SFLSharePointsEntry::CreateDSRecord: dsCreateRecordAndOpen(cltbld's Public Folder) returned -14135
Mar 30 08:08:42 moz2-darwin9-slave68 org.nagios.nrpe[128]: launchproxy[128]: /usr/local/nagios/sbin/nrpe: Connection from: 10.2.71.20 on port: 56378
Mar 30 08:09:07 moz2-darwin9-slave68 org.nagios.nrpe[128]: launchproxy[128]: /usr/local/nagios/sbin/nrpe: Connection from: 10.2.71.20 on port: 56556
Mar 30 08:09:18 moz2-darwin9-slave68 kernel[0]: AppleYukon2: 00000000,00000001 sk98osx sky2 -  - sk98osx_sky2::replaceOrCopyPacket tried N times
Mar 30 08:09:29 moz2-darwin9-slave68 org.nagios.nrpe[128]: launchproxy[128]: /usr/local/nagios/sbin/nrpe: Connection from: 10.2.71.20 on port: 56693
Mar 30 08:10:55 moz2-darwin9-slave68 org.nagios.nrpe[223]: launchproxy[223]: /usr/local/nagios/sbin/nrpe: Connection from: 10.2.71.20 on port: 57193
Mar 30 08:14:08 moz2-darwin9-slave68 org.nagios.nrpe[235]: launchproxy[235]: /usr/local/nagios/sbin/nrpe: Connection from: 10.2.71.20 on port: 46060
Mar 30 08:14:32 moz2-darwin9-slave68 org.nagios.nrpe[235]: launchproxy[235]: /usr/local/nagios/sbin/nrpe: Connection from: 10.2.71.20 on port: 46204
Mar 30 08:15:04 moz2-darwin9-slave68 login[246]: USER_PROCESS: 246 ttys000
Mar 30 08:15:54 moz2-darwin9-slave68 org.nagios.nrpe[258]: launchproxy[258]: /usr/local/nagios/sbin/nrpe: Connection from: 10.2.71.20 on port: 46658
Mar 30 08:18:21 moz2-darwin9-slave68 com.apple.launchd[86] (buildbot.start.slave[272]): posix_spawnp("/tools/buildbot/bin/buildbot", ...): No such file or directory
Mar 30 08:18:21 moz2-darwin9-slave68 com.apple.launchd[86] (buildbot.start.slave[272]): Exited with exit code: 1
Mar 30 08:18:44 moz2-darwin9-slave68 org.nagios.nrpe[276]: launchproxy[276]: /usr/local/nagios/sbin/nrpe: Connection from: 10.2.71.20 on port: 49158
Mar 30 08:19:07 moz2-darwin9-slave68 org.nagios.nrpe[276]: launchproxy[276]: /usr/local/nagios/sbin/nrpe: Connection from: 10.2.71.20 on port: 49345
Mar 30 08:19:32 moz2-darwin9-slave68 org.nagios.nrpe[276]: launchproxy[276]: /usr/local/nagios/sbin/nrpe: Connection from: 10.2.71.20 on port: 49493
Mar 30 08:20:09 moz2-darwin9-slave68 sshd[290]: USER_PROCESS: 294 ttys001

Log excerpt from moz2-linux-slave51 (/var/log/messages):

Mar 30 08:19:09 moz2-linux-slave51 puppetd[2514]: (//Node[moz2-linux-slave51.build.mozilla.org]/staging-buildslave/buildbot/File[/etc/default/buildbot]) Failed to retrieve current state of resource: No specified source was found from /N/centos5/etc/default/buildbot
Mar 30 08:19:09 moz2-linux-slave51 puppetd[2514]: (//Node[moz2-linux-slave51.build.mozilla.org]/staging-buildslave/buildbot/Service[buildbot]) Dependency file[/etc/init.d/buildbot] has 1 failures
Mar 30 08:19:09 moz2-linux-slave51 puppetd[2514]: (//Node[moz2-linux-slave51.build.mozilla.org]/staging-buildslave/buildbot/Service[buildbot]) Dependency file[/etc/init.d/buildbot-tac] has 1 failures
Mar 30 08:19:09 moz2-linux-slave51 puppetd[2514]: (//Node[moz2-linux-slave51.build.mozilla.org]/staging-buildslave/buildbot/Service[buildbot]) Dependency file[/etc/default/buildbot] has 1 failures
Mar 30 08:19:09 moz2-linux-slave51 puppetd[2514]: (//Node[moz2-linux-slave51.build.mozilla.org]/staging-buildslave/buildbot/Service[buildbot]) Skipping because of failed dependencies

The problem on the mac slave is that it can't mount /N to get at the files to deploy.
Should this be an IT bug, as previous issues with mounting /N were traced to firewall issues?
Can somebody please verify that the firewall is set up correctly for these slaves?
Assignee: nobody → server-ops
Component: Release Engineering → Server Operations
QA Contact: release → mrz
Derek, can you check the firewall to see if these hosts have the access they need?
Assignee: server-ops → dmoore
Hey there, any news on whether these hosts are connecting properly through the firewalls?
Were these slaves moved out of the sandbox network? (Used to be sm-try slaves)
It's been a couple of weeks - still can't use these slaves since they are not synching properly with puppet.

These slaves were originally in the sandbox network and should not be in build.mozilla.org - please confirm that they are out of the sandbox and check any firewall settings for them to be able to mount /N for puppet access.
Severity: normal → major
Correction:
> These slaves were originally in the sandbox network and should NOW be in
> build.mozilla.org
A couple of weeks ago I had issues with darwin slaves that were not able to mount the shared drive.
IT, please see this comment from bug 555790 to check whether the same thing is happening here:

(In reply to comment #6)
> Fixed by updating static routes on the NFS server to reflect the new Castro
> build netmask.
Assignee: dmoore → server-ops
Assignee: server-ops → dmoore
(In reply to comment #9)
> It's been a couple of weeks - still can't use these slaves since they are not
> synching properly with puppet.
> 
> These slaves were originally in the sandbox network and should not be in
> build.mozilla.org - please confirm that they are out of the sandbox and check
> any firewall settings for them to be able to mount /N for puppet access.

Which hosts are you talking about?

You can actually figure it out on your own - anything in 10.2.76.0/24 is "sandbox".  Anything in 10.2.71.0/24 or 10.2.90.0/23 is the colo build network.  Anything in 10.250.48.0/22 is the castro build network. 

"build.mozilla.org" is ambiguous since hosts under that domain could live in one of three different networks.
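The subnet rule of thumb in the previous comment can be checked mechanically rather than by eye. The sketch below uses Python's standard `ipaddress` module and encodes only the ranges listed in that comment; the function name `classify` and the `NETWORKS` table are hypothetical helpers for illustration, and the ranges are assumed to reflect the network layout at the time of this bug, not necessarily today.

```python
import ipaddress

# Ranges as stated in the comment above (assumed accurate at the time of this bug).
NETWORKS = {
    "sandbox": [ipaddress.ip_network("10.2.76.0/24")],
    "colo build": [
        ipaddress.ip_network("10.2.71.0/24"),
        ipaddress.ip_network("10.2.90.0/23"),  # covers 10.2.90.0 through 10.2.91.255
    ],
    "castro build": [ipaddress.ip_network("10.250.48.0/22")],  # 10.250.48.0 through 10.250.51.255
}


def classify(addr: str) -> str:
    """Hypothetical helper: return which build network an IPv4 address falls in."""
    ip = ipaddress.ip_address(addr)
    for name, nets in NETWORKS.items():
        if any(ip in net for net in nets):
            return name
    return "unknown"


if __name__ == "__main__":
    # 10.2.71.20 is the nagios host seen connecting in the darwin slave's log.
    print(classify("10.2.71.20"))
```

Looking up a slave's address (for example from DNS) and passing it to `classify` answers the "sandbox or not" question directly, which sidesteps the ambiguity of the `build.mozilla.org` domain name.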
Assignee: dmoore → jdow
Whiteboard: [waiting on releng]
Depends on: 560460
No response in a while, closing.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Yes, this was fixed by bug 560460.
Status: RESOLVED → VERIFIED
Product: mozilla.org → mozilla.org Graveyard