talos-r4-snow-001
talos-r4-snow-002
talos-r4-snow-003

talos-r4-snow-002:~ cltbld$ /usr/local/bin/screenresolution set 1600x1200x32
2011-12-23 13:51:53.181 screenresolution[355:903] starting screenresolution argv=/usr/local/bin/screenresolution set 1600x1200x32
2011-12-23 13:51:53.189 screenresolution[355:903] Error: mode 1600x1200x32 not available on display 0

err: Could not find server : getaddrinfo: nodename nor servname provided, or not known
err: //Node[talos-r4-snow-001]/talos_osx_rev4/File[/Users/cltbld/.bash_profile]: Failed to retrieve current state of resource: Could not find server staging-puppet.build.mozilla.org Could not describe /staging/darwin10-i386/test/Users/cltbld/.bash_profile: Could not find server staging-puppet.build.mozilla.org at /etc/puppet/manifests/os/talos_osx_rev4.pp:74
This is happening because all of the slaves are hammering the puppet master at the same time. Ben determined this while trying to restart some Linux slaves. We are killing puppetd on the slaves so that they can move on to the buildbot step.
Assignee: nobody → bear
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Priority: -- → P3
Resolution: --- → FIXED
These slaves have actually been giving me trouble for days. Perhaps snow-001 is related to what you mention. They are staging slaves, and if possible I would like jhford to give me a hand looking at them.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Oh, sorry. I saw that error and jumped at the fact that it looked like what Ben was fighting. /me takes his fingers off of the keyboard for tonight
Actually, they weren't failing because of master load. Somehow, the staging puppet launch daemon for 10.6 on staging-puppet was pointing the slaves at scl-production-puppet. I have fixed this issue, but the real solution is to track our /N/ files in a repository. I have deployed watch_puppet.py to staging-puppet.build.mozilla.org, but it's only emailing me for now.

talos-r4-snow-001: reset bad keys, fixed launch daemon to sync to staging-puppet.
talos-r4-snow-002: not registering dongle, see bug 700672 for background. I scheduled a reboot using instructions from bug 700672#c80.
talos-r4-snow-003: reset bad keys, fixed launch daemon to sync to staging-puppet. On reboot, it started to demonstrate dongle issues, so I did the delayed reboot.

Because these slaves are supposed to attach to the preproduction master, which is down, they aren't connecting to buildbot. They are, however, properly syncing with puppet now and should be fine to use for staging runs by locking to a master and rebooting.
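For anyone wanting to catch this kind of mismatch earlier, here is a rough sketch of a check that reads a launch daemon plist and reports which puppet server it passes to puppetd. It assumes the daemon passes the server with puppetd's --server option in ProgramArguments, which may not match how our plists are actually written. The Label and the puppetd path in the sample plist are made up for illustration; only the scl-production-puppet hostname comes from this bug.

```python
import plistlib


def puppet_server_from_plist(plist_bytes):
    """Return the value passed to --server in ProgramArguments, or None.

    Assumption: the launch daemon starts puppetd with an explicit
    "--server <host>" or "--server=<host>" argument.
    """
    plist = plistlib.loads(plist_bytes)
    args = plist.get("ProgramArguments", [])
    for i, arg in enumerate(args):
        if arg == "--server" and i + 1 < len(args):
            return args[i + 1]
        if arg.startswith("--server="):
            return arg.split("=", 1)[1]
    return None


def points_at(plist_bytes, expected_server):
    """True if the plist's puppet server matches the expected one."""
    return puppet_server_from_plist(plist_bytes) == expected_server


# Hypothetical plist contents: Label and binary path are illustrative only.
EXAMPLE_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>example.puppet</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/puppetd</string>
        <string>--server</string>
        <string>scl-production-puppet</string>
    </array>
</dict>
</plist>
"""

if __name__ == "__main__":
    expected = "staging-puppet.build.mozilla.org"
    found = puppet_server_from_plist(EXAMPLE_PLIST)
    if not points_at(EXAMPLE_PLIST, expected):
        print("launch daemon points at %s, expected %s" % (found, expected))
```

Something like this could run on the staging master against each served plist, but that is a suggestion, not what watch_puppet.py does.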
Assignee: bear → jhford
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago → 6 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering