Closed
Bug 643506
Opened 14 years ago
Closed 13 years ago
Move 7 automation and tools machines to new Addons Testing Secure buildbot pool
Categories
(Release Engineering :: General, defect, P3)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: cmtalbert, Unassigned)
References
Details
(Whiteboard: [buildslaves][slaveduty])
In order to find a way forward until new slaves are purchased for replacement, we are donating 7 machines to the addons testing buildbot pool. These machines will form a secure pool where AMO folks can wire "push button" functionality into their site to initiate talos tests at will on these slaves for a particular addon. For example, user X goes to his addon and clicks the test button. A buildbot master (from bug 617762) will then initiate talos tests on these slaves.

The 7 minis we are donating are listed below and can all be found in AFK, atop the black cabinet. Please take only these 7:
* tools-r3-fed-002
* tools-r3-fed64-002
* tools-r3-snow-002
* tools-r3-leopard-002
* tools-r3-xp-002
* tools-r3-w7-002
* tools-r3-w764-002

These machines are not active and will be shut down a few minutes after this bug is filed.
Sent email to bhearsum and zandr with usernames/passwords for these machines.
Updated•14 years ago
Assignee: server-ops → server-ops-releng
Component: Server Operations → Server Operations: RelEng
QA Contact: mrz → zandr
Comment 2•14 years ago
These should be re-imaged and renamed as follows:
addon-r3-fed-001
addon-r3-fed-002
addon-r3-snow-001
addon-r3-snow-002
addon-r3-w7-001
addon-r3-w7-002
addon-r3-w7-003

This gives a reasonable (but small) number of machines per platform. We would like these to live on an isolated network with talos-addon-master1. ctalbert tells me that the machines are still just powered down and have not been touched since this bug was originally filed.
Updated•14 years ago
Updated•14 years ago
Assignee: server-ops-releng → zandr
Status: NEW → UNCONFIRMED
Ever confirmed: false
Comment 3•14 years ago
These machines will do initial baking in the releng vpn - please talk to dustin for details.
Comment 4•14 years ago
tools-r3-xp-002 -> addon-r3-fed-001 Port 6 reimage done
tools-r3-w7-002 -> addon-r3-fed-002 Port 12 reimage done
tools-r3-w764-002 -> addon-r3-snow-001 Port 13 booted into deploystudio
tools-r3-fed64-002 -> addon-r3-snow-002 Port 16 booted into deploystudio
tools-r3-fed-002 -> addon-r3-w7-001 Port 17 booted into deploystudio
tools-r3-leopard-002 -> addon-r3-w7-002 Port 18 booted into deploystudio
tools-r3-snow-002 -> addon-r3-w7-003 Port 19 booted into deploystudio
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment 5•14 years ago
These have been added as addon-r3-<os>-<#>.build.scl1.mozilla.com and imaged appropriately. Over to Releng for baking.
Assignee: zandr → nobody
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
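For reference, the rename plan from comment 4 and the naming scheme from comment 5 can be captured in a few lines of Python. This is only a sketch: the hostnames and the build.scl1.mozilla.com suffix are copied from the comments above, while the names RENAMES, DOMAIN, fqdn and slaves_per_platform are invented here for illustration and are not part of any releng tooling.

```python
# Old tools-* names mapped to new addon-* names, copied from comment 4.
RENAMES = {
    "tools-r3-xp-002": "addon-r3-fed-001",
    "tools-r3-w7-002": "addon-r3-fed-002",
    "tools-r3-w764-002": "addon-r3-snow-001",
    "tools-r3-fed64-002": "addon-r3-snow-002",
    "tools-r3-fed-002": "addon-r3-w7-001",
    "tools-r3-leopard-002": "addon-r3-w7-002",
    "tools-r3-snow-002": "addon-r3-w7-003",
}

# Domain suffix taken from comment 5.
DOMAIN = "build.scl1.mozilla.com"


def fqdn(short_name: str) -> str:
    """Return the fully qualified name for a short slave hostname."""
    return f"{short_name}.{DOMAIN}"


def slaves_per_platform(names):
    """Count slaves per platform, reading the platform from addon-r3-<os>-<#>."""
    counts = {}
    for name in names:
        platform = name.split("-")[2]  # third field is the <os> part
        counts[platform] = counts.get(platform, 0) + 1
    return counts


if __name__ == "__main__":
    for old, new in RENAMES.items():
        print(f"{old} -> {fqdn(new)}")
    print(slaves_per_platform(RENAMES.values()))
```

Counting by platform is a quick way to confirm that the plan matches the split requested in comment 2 (two fed, two snow, three w7).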
Updated•14 years ago
Whiteboard: [buildslaves][slaveduty]
Updated•14 years ago
Priority: -- → P3
(In reply to comment #5)
> These have been added as addon-r3-<os>-<#>.build.scl1.mozilla.com and imaged
> appropriately.
>
> Over to Releng for baking.

As I am fairly new to this whole OPSI/Puppet stuff, I'm curious: How long does something need to be in the network to achieve "fully baked"?
Comment 7•14 years ago
Not long - it's almost trivial for puppet, but OPSI always takes some nontrivial TLC (aka, poking with a sharp stick) before it starts working. I'll work on it today or tomorrow, depending on what comes up.
Comment 8•14 years ago
These two are done:
addon-r3-fed-001
addon-r3-fed-002

Each has its root, cltbld, and vnc passwords set to something different from the default - I'll give it to you on request in IRC. These are ready to point at a buildmaster by simply adding a buildbot.tac in ~/talos-slave. It's currently running runslave.py, which tries to access the slave allocator and nagios on every boot, but this won't hurt anything. There are no SSH keys on these machines. Hostnames are set to the above names, unqualified, which I don't expect to cause problems.

Sadly, these two:
addon-r3-snow-001
addon-r3-snow-002
have the wrong version of Mac OS X on them. I thought we had updated the refimage for the new version, but apparently not. With Nick and Bear's help, I found the updater from Apple, but the update failed. So, updating the refimage is now a blocker for this bug. The -snow-* slaves are halted awaiting re-re-imaging. Nothing's ever easy.

I'll work on getting the w7 slaves to talk to OPSI tomorrow.
Depends on: 655199
Comment 9•14 years ago
Please ensure that you scrub out puppet/opsi before you hand back, so that those talos slaves don't block on reboot waiting on those resources.
Comment 10•14 years ago
Ah, there are only W7 systems, so OPSI is not involved - a pleasure for everyone, let me tell you. So
addon-r3-w7-001
addon-r3-w7-002
addon-r3-w7-003
are finished, as well. I shut down -001 and then thought better of it, since I may have forgotten something that I'll need to touch up.

For reference, since I assume you'll want to change the password I've assigned, the instructions for changing passwords are all here: https://intranet.mozilla.org/Build:Farm:Password_Maintenance (for many you need to read the script).

So we're just waiting for the re-re-image of the snow machines.
Comment 11•14 years ago
From the last few comments here, Dustin is obviously doing the work, so pushing to him to avoid confusion in other groups.

(In reply to comment #10)
> So we're just waiting for the re-re-image of the snow machines.

Dustin, can you add the bug# tracking the re-re-imaging as a dep.bug so we can all follow along?
Assignee: nobody → dustin
Comment 12•14 years ago
The reimage will be on this bug, but the refimage snapshot that's blocking it is bug 656042
Updated•14 years ago
Assignee: dustin → zandr
Comment 13•14 years ago
Which network are these machines now located on? They should be isolated on a network with tools-addon-master1 to protect the rest of our infrastructure.
Comment 14•14 years ago
They're still on the build network - waiting on bug 656042 to re-image the snow machines (and a few dozen others..)
Comment 15•14 years ago
w7-001 is powered back up, and snow-00[12] have been imaged. I'm going to file a new bug for the network infra required to isolate these machines on a new VLAN.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 16•14 years ago
Go ahead with the network move. I'll make sure the snow machines are puppeted while you work on that.
Assignee: zandr → dustin
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 17•14 years ago
Hmm,

dustin@lorentz ~ $ ssh cltbld@addon-r3-snow-001.build.scl1.mozilla.com
ssh: connect to host addon-r3-snow-001.build.scl1.mozilla.com port 22: Connection refused
dustin@lorentz ~ $ ssh cltbld@addon-r3-snow-002.build.scl1.mozilla.com
ssh: connect to host addon-r3-snow-002.build.scl1.mozilla.com port 22: Connection refused
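A "Connection refused" like the one above means the host answered but nothing was listening on port 22, which is different from a host that is down or unroutable. The following Python sketch shows one way to tell the two cases apart without an SSH client. The function name ssh_port_status and its return strings are made up for this illustration; only the standard-library socket module is used.

```python
import socket


def ssh_port_status(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', or 'unreachable'."""
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host replied, but no service is listening on this port.
        return "refused"
    except OSError:
        # Covers timeouts, DNS failures, and routing errors.
        return "unreachable"
```

Calling ssh_port_status("addon-r3-snow-001.build.scl1.mozilla.com") from a machine on the same network would be the equivalent of the manual check above. A "refused" result suggests the machine is up with sshd not running, which would be consistent with an image problem rather than a network problem.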
Comment 18•14 years ago
So let me contradict myself in comment #16 - don't move these to the new network yet, until I get access to the snow-leopard machines and make sure they're configured correctly. And please help me with that :)
Comment 19•14 years ago
(In reply to comment #18)
> And please help me with that :)

Sigh. There's an issue with our new SL image. Will resolve tomorrow.
Comment 20•13 years ago
Does it make sense to split the snow-leopard slaves out of this bug, and move forward with the rest of this?
Comment 21•13 years ago
If you like, but I have a new SL image and will hit these this afternoon. Comment 19 was written before the win64 firedrills.
Comment 22•13 years ago
Given that there are network changes to make, better to do that once than try to piece it out. Let's make these the first SL slaves to get reimaged, though.
Comment 23•13 years ago
Reimaging of the SL machines is complete, and the hostnames have been updated.
Comment 24•13 years ago
Hooray! The snow machines are now fully puppeted as well.

For the record, on the fedora systems, I edited run-puppet-and-buildslave.sh to remove the puppet runs:

if false; then ## commented out for addons testing
.. puppet stuff
fi
.. start buildbot

On the snow-leopard systems, I removed /Library/LaunchDaemons/com.reductivelabs.puppet.plist and

--- /Library/LaunchAgents/org.mozilla.build.buildslave.plist.old 2011-06-06 19:49:48.000000000 -0700
+++ /Library/LaunchAgents/org.mozilla.build.buildslave.plist 2011-06-06 19:50:02.000000000 -0700
@@ -29,14 +29,9 @@
 <string>/usr/bin/python</string>
 <string>/usr/local/bin/runslave.py</string>
 </array>
- <!-- do not run immediately when loaded -->
+ <!-- run immediately when loaded -->
 <key>RunAtLoad</key>
- <false/>
- <!-- but run when puppet (which is running as root) touches this file -->
- <key>WatchPaths</key>
- <array>
- <string>/var/puppet/run/puppet.finished</string>
- </array>
+ <true/>
 <key>WorkingDirectory</key>
 <string>/Users/cltbld</string>
 </dict>

Over to zandr for the new netops bug.
Assignee: dustin → zandr
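The launchd change in comment 24 was made by hand. As a rough illustration, the same edit could be scripted with Python's standard plistlib module. This is a sketch, not what was actually run on the slaves: the function name run_without_puppet is hypothetical, and the demo plist contains only the keys visible in the diff above. Note also that plistlib.dumps sorts keys by default and drops XML comments, so the output would not be byte-identical to a hand-edited file.

```python
import plistlib


def run_without_puppet(plist_bytes: bytes) -> bytes:
    """Return the plist with RunAtLoad enabled and the puppet WatchPaths trigger removed."""
    plist = plistlib.loads(plist_bytes)
    plist["RunAtLoad"] = True      # start the buildslave as soon as the job loads
    plist.pop("WatchPaths", None)  # stop waiting for puppet to touch its marker file
    return plistlib.dumps(plist)


if __name__ == "__main__":
    # Minimal stand-in for the real plist; only keys shown in the diff are included.
    original = plistlib.dumps({
        "RunAtLoad": False,
        "WatchPaths": ["/var/puppet/run/puppet.finished"],
        "WorkingDirectory": "/Users/cltbld",
    })
    print(run_without_puppet(original).decode("utf-8"))
```

Removing WatchPaths together with setting RunAtLoad matters here: with puppet gone, nothing would ever touch /var/puppet/run/puppet.finished, so a job that still waited on that path would never start.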
Comment 25•13 years ago
As a note, we don't currently have a master to connect these slaves to - so we are not blocked by having this sit for a while longer. See bug 617762, bug 659512.
Comment 27•13 years ago
zandr -- just the network move for these slaves. I'll leave you to interface with netops there, as I don't have the details of the new vlan/network.
Comment 28•13 years ago
My master has appeared, so now I'm interested in the status of these slaves. Are they still in netops limbo?
Comment 29•13 years ago
Dustin - the w7 machines are requesting an activation key. I thought that was built into the image. Otherwise, do you have keys for them?
Comment 30•13 years ago
I'll take care of activating them. Might not happen today, if not I'll hit them first thing tomorrow.
Comment 31•13 years ago
(alice, I suspect they needed to be re-activated after moving to the new network)
Comment 34•13 years ago
addon-r3-w7-001 and 002 have been activated. I sawed off the limb I was sitting on, so 003 will take a site visit.
Comment 35•13 years ago
moving so I can set colo-trip
Assignee: zandr → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
Updated•13 years ago
colo-trip: --- → scl1
Comment 36•13 years ago
and -003 is activated.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 13 years ago
Resolution: --- → FIXED
Comment 37•13 years ago
These are up and working now.
Component: Server Operations: RelEng → Release Engineering
Updated•11 years ago
Product: mozilla.org → Release Engineering