Closed Bug 858040 Opened 11 years ago Closed 11 years ago

reimaged bld-lion-r5 machines can't run buildbot

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P2)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 760093

People

(Reporter: bhearsum, Assigned: coop)

Not sure if this is a 100% reproducible problem or not, but we've seen it on bld-lion-r5-018 and 052:
Apr  3 07:06:20 bld-lion-r5-052 org.mozilla.build.buildslave[467]: Error sending notice to nagios (ignored)
Apr  3 07:06:20 bld-lion-r5-052 org.mozilla.build.buildslave[467]: dyld: Library not loaded: @executable_path/../.Python
Apr  3 07:06:20 bld-lion-r5-052 org.mozilla.build.buildslave[467]:   Referenced from: /tools/buildbot-0.8.4-pre-moz2/bin/python
Apr  3 07:06:20 bld-lion-r5-052 org.mozilla.build.buildslave[467]:   Reason: image not found
Apr  3 07:06:20 bld-lion-r5-052 com.apple.launchd.peruser.501[179] (org.mozilla.build.buildslave[467]): Exited with code: 251
Apr  3 07:06:20 bld-lion-r5-052 ReportCrash[469]: Saved crash report for python[468] version ??? (???) to /Users/cltbld/Library/Logs/DiagnosticReports/python_2013-04-03-070620_bld-lion-r5-052.crash

http://debugfix.com/2011/11/dyld-library-loaded-executable_path-python/ says:
"You’re trying to use a virtual environment created on a different computer, or you’ve upgraded / reformated your machine and you’re using the virtenv from your backup of the old machine."

It also says the solution is to recreate the virtualenv. That doesn't sound right; it makes me think the machine got the wrong image or something similar. The reimaging happened over 2 months ago, so let's try again.
Just noticed that a very fresh slave's /tools/buildbot-0.8.4-pre-moz2/bin/python works fine, but it stops working after being puppetized. Possible fallout from bug 602908?
This is an order-of-operations problem with the old puppet WRT the python and buildbot install. My past experience with these lion machines indicates that if you remove the /tools/buildbot* dirs (and corresponding puppet refs on disk) after the newer version of python is installed and then re-run puppet, the buildbot dirs will be re-created with the new python and will work properly.

I can quickly resurrect the affected machines and will have to see what the best path forward is here. I don't want to waste too much effort fixing the old puppet, but at the very least I'll write a script that will return a given slave to a working state.
Assignee: nobody → coop
Status: NEW → ASSIGNED
Priority: -- → P2
bld-lion-r5-052 is back taking jobs again. Here's the basic script for fixing this on other machines (if required) while I investigate the puppet options:

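#!/bin/bash
# Run as root.
# Remove puppet's on-disk install markers so it reinstalls the buildbot packages.
rm -rf /var/db/.puppet_pkgdmg_installed_buildbot*
# Remove the buildbot dirs that were created against the old python.
rm -rf /tools/buildbot*
# Re-run puppet once in the foreground so the buildbot dirs are re-created.
/usr/bin/puppetd --onetime --no-daemonize --logdest console --server scl3-production-puppet.srv.releng.scl3.mozilla.com
# reboot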
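
Before running the fix on another slave, it may help to confirm that the slave is actually affected. The following is only a rough sketch and is not part of the script above: the function names interpreter_works and needs_buildbot_fix and the BUILDBOT_PYTHON variable are made up for illustration, and the default path is simply the one that appears in the log in this bug. It assumes that an interpreter hitting the dyld "Library not loaded" error exits non-zero before running any code.

```shell
#!/bin/bash
# Sketch only: report whether a virtualenv interpreter can start at all.
# BUILDBOT_PYTHON is a hypothetical variable; its default is the path from
# the log in this bug. Override it for other buildbot versions.
BUILDBOT_PYTHON="${BUILDBOT_PYTHON:-/tools/buildbot-0.8.4-pre-moz2/bin/python}"

# interpreter_works PATH
# Returns 0 if PATH is executable and can run a trivial statement,
# 1 otherwise (missing file, or the process dies before running code).
interpreter_works() {
    local py="$1"
    [ -x "$py" ] || return 1
    "$py" -c 'import sys' >/dev/null 2>&1
}

# needs_buildbot_fix [PATH]
# Prints a one-line verdict; returns 0 when the fix looks necessary.
needs_buildbot_fix() {
    local py="${1:-$BUILDBOT_PYTHON}"
    if interpreter_works "$py"; then
        echo "ok: $py starts"
        return 1
    fi
    echo "broken: $py does not start"
    return 0
}
```

With these functions sourced, something like `needs_buildbot_fix && echo "run the fix script"` would limit the rm/puppetd steps to machines where the interpreter fails to start.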
(In reply to Ben Hearsum [:bhearsum] from comment #4)
> Can this get documented somewhere on
> https://wiki.mozilla.org/ReleaseEngineering/How_To/
> Set_Up_a_Freshly_Imaged_Slave?

Done.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Re-imaged machines come up with this issue.
They cannot start buildbot until it gets fixed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Armen Zambrano G. [:armenzg] from comment #6)
> Re-imaged machines come up with this issue.
> They cannot start buildbot until it gets fixed.

Can we just take a new image with the correct buildbot setup?
These will shortly be managed by PuppetAgain.
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Product: mozilla.org → Release Engineering
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard