Closed Bug 688838 Opened 14 years ago Closed 14 years ago

Repurpose 16 OS X 10.5 mac minis for Thunderbird builds (8 10.5/8 10.6)

Categories

(Infrastructure & Operations :: RelOps: General, task, P3)

x86
macOS
task

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jhopkins, Assigned: arich)

References

Details

(Whiteboard: [hardware])

Please repurpose eight OS X 10.5 mac minis for use as Thunderbird build slaves as soon as possible. I have a clonezilla image that we can hopefully use to image these machines.
Blocks: 688554
Armen: can you figure out where best to pull these slaves from, please? They can be any rev3 minis, i.e. they don't have to be running 10.5 currently, since they'll be re-imaged anyway. If Amy would prefer these minis to be contiguous or in a particular colo, she should weigh in on that. We can reassign this bug over to relops once we've decided which slaves to repurpose, have stopped buildbot on them, and have tagged them as going over to Thunderbird in slavealloc.
Assignee: nobody → armenzg
OS: Linux → Mac OS X
Priority: -- → P3
Whiteboard: [hardware]
I have disabled in slavealloc the following:
* moz2-darwin9-slave064
* moz2-darwin9-slave065
* moz2-darwin9-slave066
* moz2-darwin9-slave067
* moz2-darwin9-slave069
* moz2-darwin9-slave070
* moz2-darwin9-slave071
* moz2-darwin9-slave072
* moz2-darwin9-slave073
* moz2-darwin9-slave074
* moz2-darwin9-slave075

NOTE: there is no 68

We will have to remove them from nagios, slavealloc, buildbot-configs and puppet. Feel free to do this at any time.
Assignee: armenzg → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
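Three of the slaves in that list turned out not to exist in DNS or nagios (see the comments below); a quick resolver check along these lines would catch that up front. This is a minimal sketch only, and the internal domain name used here is an assumption for illustration, not something confirmed in this bug.

# Minimal sketch: verify that each candidate slave actually resolves in DNS
# before handing the list over for reimaging. The domain suffix is an
# assumed internal zone for illustration only.
import socket

SLAVES = ["moz2-darwin9-slave%03d" % n for n in range(64, 76) if n != 68]
DOMAIN = "build.mozilla.org"  # assumption

for host in SLAVES:
    fqdn = "%s.%s" % (host, DOMAIN)
    try:
        addr = socket.gethostbyname(fqdn)
        print("%-28s -> %s" % (host, addr))
    except socket.gaierror:
        print("%-28s -> NOT FOUND in DNS" % host)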
Our new minis are being moved to the MPT colo for simplicity, so these older minis should do the same.
These minis are already in sjc1 and so is the rest of the thunderbird infrastructure. Right now the thunderbird infra is self contained, and as far as I understand it we do not route their vlans across the rest of the sjc1 datacenter. When I talked to gozer earlier (also cced on this bug), he said that there's room in the thunderbird rack and on the switch for more mini servers. It would merely be a question of making sure there's enough power there (I've cced mrz for this purpose). If there is not sufficient power in that rack, we can work to try to route the thunderbird vlans across the datacenter, but this would be more complicated. There is also still a question of how to get the thunderbird OS image onto these minis, and perhaps jhopkins can be of some help there.
I spoke with dmoore, and he says we have space and power in the rack right across from the tbird rack if we do not have space in the rack itself. We can make the hardware side of things work.
Please update here when we have a timeframe for when the minis will be ready.
Releng folks: I believe these are good to move now (there are 11 listed, not 8, btw), correct? I'll take care of removing them from nagios.
Ah, I also have no record of the following 3 machines in dns or nagios, so maybe that's where the spurious machines above 8 came in:
* moz2-darwin9-slave073
* moz2-darwin9-slave074
* moz2-darwin9-slave075
I've removed the following from nagios:
* moz2-darwin9-slave64
* moz2-darwin9-slave65
* moz2-darwin9-slave66
* moz2-darwin9-slave67
* moz2-darwin9-slave69
* moz2-darwin9-slave70
* moz2-darwin9-slave71
* moz2-darwin9-slave72

I've left moz2-darwin9-slave68 in nagios. The following don't exist:
* moz2-darwin9-slave73
* moz2-darwin9-slave74
* moz2-darwin9-slave75
Depends on: 689244
Severity: normal → critical
Please move these and hand-run networking to the momo router. Please also reimage half of these with 10.5 and half with 10.6 using deploystudio (we can walk someone through the reimage).
I have set aside another 8 machines (coop & arr are on the loop):
* moz2-darwin9-slave55 - still building
* moz2-darwin9-slave56
* moz2-darwin9-slave57
* moz2-darwin9-slave58
* moz2-darwin9-slave59
* moz2-darwin9-slave60 - still building
* moz2-darwin9-slave61 - still building
* moz2-darwin9-slave62 - still building
* moz2-darwin9-slave63

I have disabled all of them in slavealloc, added a note, and gracefully shut all of them down. Except for the 4 machines with builds still in progress, everything else can go to be reimaged.
To be clear, this new batch should also be split 50/50 between 10.5 and 10.6. Sorry for the last minute addition.
FYI you listed 9 machines, not 8. Should that be:
* moz2-darwin9-slave56
* moz2-darwin9-slave57
* moz2-darwin9-slave58
* moz2-darwin9-slave59
* moz2-darwin9-slave60 - still building
* moz2-darwin9-slave61 - still building
* moz2-darwin9-slave62 - still building
* moz2-darwin9-slave63
?
moz2-darwin9-slave59 doesn't exist, i.e. it doesn't appear in slavealloc or nagios AFAICT.
The full list of machines should be:
* moz2-darwin9-slave55
* moz2-darwin9-slave56
* moz2-darwin9-slave57
* moz2-darwin9-slave58
* moz2-darwin9-slave60
* moz2-darwin9-slave61
* moz2-darwin9-slave62
* moz2-darwin9-slave63
* moz2-darwin9-slave64
* moz2-darwin9-slave65
* moz2-darwin9-slave66
* moz2-darwin9-slave67
* moz2-darwin9-slave69
* moz2-darwin9-slave70
* moz2-darwin9-slave71
* moz2-darwin9-slave72
The first 8 in this bug imaged without error and are racked in/near the momo rack. At this point, we have a power adapter issue which we'll be working to solve tomorrow morning. We'll also reimage the other 8 machines tomorrow morning and move them in/next to the momo rack. Once that's done, we'll cable them up to the momo switch, and the configuration changes that gozer made to momo dns/dhcp should allow them to boot up on the tbird build network.

Note that they'll be configured just like firefox build slave minis, so the passwords will be the same, and they will be trying to talk to the releng puppet server. jhopkins is going to get the pws from the releng team and work on getting the tbird builds working on these machines after they're up.

A huge thanks to everyone in IT (mrz, phong, jabba, dmoore, ravi, dustin, gozer, et al.) who's scrambled to make this happen at the last minute.
Assignee: server-ops-releng → arich
Summary: Repurpose 8 OS X 10.5 mac minis for Thunderbird builds → Repurpose 16 OS X 10.5 mac minis for Thunderbird builds (8 10.5/8 10.6)
yes, thanks from the vancouver & TB teams as well.
Internal DNS names/IPs have been allocated for these:

; Network 10.200.80.0
tb2-darwin9-slave55 IN A 55
tb2-darwin9-slave56 IN A 56
tb2-darwin9-slave57 IN A 57
tb2-darwin9-slave58 IN A 58
tb2-darwin9-slave60 IN A 60
tb2-darwin9-slave61 IN A 61
tb2-darwin9-slave62 IN A 62
tb2-darwin9-slave63 IN A 63
tb2-darwin9-slave64 IN A 64
tb2-darwin9-slave65 IN A 65
tb2-darwin9-slave66 IN A 66
tb2-darwin9-slave67 IN A 67
tb2-darwin9-slave69 IN A 69
tb2-darwin9-slave70 IN A 70
tb2-darwin9-slave71 IN A 71
tb2-darwin9-slave72 IN A 72

I kept the names the same (s/moz2/tb2/) to keep things simple.
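For reference, a minimal sketch of how these records can be generated mechanically from the slave numbers, on the assumption that each address's last octet matches the slave number (10.200.80.NN, which the reachability check later in this bug is consistent with). This is illustrative only, not the tool actually used.

# Minimal sketch: emit A records for the repurposed slaves, renaming
# moz2-* to tb2-* and assuming the last octet equals the slave number.
SLAVE_NUMBERS = [55, 56, 57, 58, 60, 61, 62, 63, 64, 65, 66, 67, 69, 70, 71, 72]
NETWORK = "10.200.80"  # from the zone comment above

for n in SLAVE_NUMBERS:
    print("tb2-darwin9-slave%d\tIN A\t%s.%d" % (n, NETWORK, n))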
Networking is ready for them, just hook them up in order on momo-core3 (HP Procurve), starting at port 5.

tb2-darwin9-slave55  5
tb2-darwin9-slave56  6
tb2-darwin9-slave57  7
tb2-darwin9-slave58  8
tb2-darwin9-slave60  9
tb2-darwin9-slave61  10
tb2-darwin9-slave62  11
tb2-darwin9-slave63  12
tb2-darwin9-slave64  13
tb2-darwin9-slave65  14
tb2-darwin9-slave66  15
tb2-darwin9-slave67  16
tb2-darwin9-slave69  17
tb2-darwin9-slave70  18
tb2-darwin9-slave71  19
tb2-darwin9-slave72  20

They should get everything via DHCP.
(In reply to Amy Rich [:arich] from comment #17)
> A huge thanks to everyone in IT (mrz, phong, jabba, dmoore, ravi, dustin,
> gozer, et al.) who's scrambled to make this happen at the last minute.

Seriously! Well executed :D
(In reply to Amy Rich [:arich] from comment #17)
> The first 8 in this bug imaged without error and are racked in/near the momo
> rack. At this point, we have a power adapter issue which we'll be working
> to solve tomorrow morning. We'll also reimage the other 8 machines tomorrow
> morning and move them in/next to the momo rack. Once that's done, we'll
> cable them up to the momo switch, and the configuration changes that gozer
> made to momo dns/dhcp should allow them to boot up on the tbird build
> network.

Based on comment #20, does IT have all the information from gozer now to get these networked this morning? jhopkins already has login info for these machines, so he's ready to start as soon as these machines are up.

> A huge thanks to everyone in IT (mrz, phong, jabba, dmoore, ravi, dustin,
> gozer, et al.) who's scrambled to make this happen at the last minute.

Yes, thanks to everyone who has helped out on this firedrill.
(In reply to Chris Cooper [:coop] from comment #22)
> Based on comment #20, does IT have all the information from gozer now to get
> these networked this morning? jhopkins already has login info for these
> machines, so he's ready to start as soon as these machines are up.

Yes, that's underway now.

As a reminder, inventory will need to be updated when the dust settles. I'm happy to help with that - let's just make sure we have the necessary info (switchports, rack locations).
Here are the MACs:

host moz2-darwin9-slave55 { hardware ethernet 00:16:cb:af:a1:c0
host moz2-darwin9-slave56 { hardware ethernet 00:16:cb:af:a2:06
host moz2-darwin9-slave57 { hardware ethernet 00:16:cb:af:a1:ec
host moz2-darwin9-slave58 { hardware ethernet 00:16:cb:b0:75:66
host moz2-darwin9-slave60 { hardware ethernet 00:16:cb:af:5b:90
host moz2-darwin9-slave61 { hardware ethernet 00:16:cb:af:40:39
host moz2-darwin9-slave62 { hardware ethernet 00:16:cb:af:24:cd
host moz2-darwin9-slave63 { hardware ethernet 00:16:cb:af:6c:04
host moz2-darwin9-slave64 { hardware ethernet 00:16:cb:af:24:7f
host moz2-darwin9-slave65 { hardware ethernet 00:16:cb:af:71:72
host moz2-darwin9-slave66 { hardware ethernet 00:16:cb:af:9d:d5
host moz2-darwin9-slave67 { hardware ethernet 00:16:cb:af:9d:83
host moz2-darwin9-slave69 { hardware ethernet 00:16:CB:AE:AF:08
host moz2-darwin9-slave70 { hardware ethernet 00:16:CB:AE:26:FF
host moz2-darwin9-slave71 { hardware ethernet 00:1F:F3:46:C6:CD
host moz2-darwin9-slave72 { hardware ethernet 00:16:CB:AF:6B:FF
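The host declarations above are truncated (no closing brace); a complete ISC dhcpd entry would also carry a fixed-address for a static lease. The following minimal sketch regenerates full entries from the MAC list, with the fixed-address values being an assumption (10.200.80.<slave number>) and not necessarily what the momo dhcpd actually uses.

# Minimal sketch: emit complete ISC dhcpd host blocks from the MAC list above.
# The fixed-address lines are an assumption; the real momo dhcpd config may differ.
MACS = {
    55: "00:16:cb:af:a1:c0", 56: "00:16:cb:af:a2:06", 57: "00:16:cb:af:a1:ec",
    58: "00:16:cb:b0:75:66", 60: "00:16:cb:af:5b:90", 61: "00:16:cb:af:40:39",
    62: "00:16:cb:af:24:cd", 63: "00:16:cb:af:6c:04", 64: "00:16:cb:af:24:7f",
    65: "00:16:cb:af:71:72", 66: "00:16:cb:af:9d:d5", 67: "00:16:cb:af:9d:83",
    69: "00:16:cb:ae:af:08", 70: "00:16:cb:ae:26:ff", 71: "00:1f:f3:46:c6:cd",
    72: "00:16:cb:af:6b:ff",
}

for n in sorted(MACS):
    print("host moz2-darwin9-slave%d {" % n)
    print("    hardware ethernet %s;" % MACS[n])
    print("    fixed-address 10.200.80.%d;  # assumption" % n)
    print("}")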
As a note, the stickers on these say "tb-dar.." not "tb2-dar..". I'll add a note in inventory to that effect to head off future confusion. If we can note rack position here, that would be great. Adam, when you find these on the switches, if you can add switchport info, that'd also be great. If you don't get a chance, I can go back later to find them, too.
We have two of the above build slaves (one 10.5, one 10.6) networked, configured with minimal changes, and running tryserver builds at the moment. Might be a bit of tweaking yet but things are looking pretty good so far.
Here are the OS layouts. Sorry it's kinda random:

tb-darwin9-slave55  Leopard 10.5
tb-darwin9-slave56  Leopard 10.5
tb-darwin9-slave57  Leopard 10.5
tb-darwin9-slave58  Leopard 10.5
tb-darwin9-slave60  Snow Leopard 10.6
tb-darwin9-slave61  Snow Leopard 10.6
tb-darwin9-slave62  Snow Leopard 10.6
tb-darwin9-slave63  Snow Leopard 10.6
tb-darwin9-slave64  Snow Leopard 10.6
tb-darwin9-slave65  Leopard 10.5
tb-darwin9-slave66  Snow Leopard 10.6
tb-darwin9-slave67  Leopard 10.5
tb-darwin9-slave69  Snow Leopard 10.6
tb-darwin9-slave70  Leopard 10.5
tb-darwin9-slave71  Snow Leopard 10.6
tb-darwin9-slave72  Leopard 10.5

All are done and I verified the OS booted up. The hostnames above match the stickers on the minis. I did not set hostnames in the OS on any of them.
jabba and adam finished the network setup on these, so they're all reachable from within the tbird network now. Over to gozer/jhopkins for more build magic.
Port information:

moz2-darwin9-slave55 0016cb-afa1c0 - asx103-07b:37
moz2-darwin9-slave56 0016cb-afa206 - asx103-07b:35
moz2-darwin9-slave57 0016cb-afa1ec - asx103-07b:33
moz2-darwin9-slave58 0016cb-b07566 - asx103-07a:10
moz2-darwin9-slave60 0016cb-af5b90 - asx103-07a:3
moz2-darwin9-slave61 0016cb-af4039 - asx103-07a:5
moz2-darwin9-slave62 0016cb-af24cd - asx103-07a:7
moz2-darwin9-slave63 0016cb-af6c04 - asx103-07a:9
moz2-darwin9-slave64 0016cb-af247f - ?????????????
moz2-darwin9-slave65 0016cb-af7172 - asx103-07a:48
moz2-darwin9-slave66 0016cb-af9dd5 - asx103-07a:43
moz2-darwin9-slave67 0016cb-af9d83 - asx103-07a:45
moz2-darwin9-slave69 0016cb-aeaf08 - asx103-07a:42
moz2-darwin9-slave70 0016cb-ae26ff - asx103-07a:46
moz2-darwin9-slave71 001ff3-46c6cd - asx103-07a:44
moz2-darwin9-slave72 0016cb-af6bff - asx103-07a:47
Inventory updated/verified.
(In reply to Adam Newman from comment #29)
> Port information:
> moz2-darwin9-slave64 0016cb-af247f - ?????????????

This one is plugged into the momo procurve switch, port number 5, down on the 14th floor in the momo rack.
(In reply to Justin Dow [:jabba] from comment #31)
> (In reply to Adam Newman from comment #29)
> > Port information:
> > moz2-darwin9-slave64 0016cb-af247f - ?????????????
>
> This one is plugged into the momo procurve switch, port number 5, down on
> the 14th floor in the momo rack.

Is the plan to leave that one there?
All hosts are up and on our network, yay!

Host tb2-darwin9-slave55.sj.mozillamessaging.com (10.200.80.55) appears to be up.
Host tb2-darwin9-slave56.sj.mozillamessaging.com (10.200.80.56) appears to be up.
Host tb2-darwin9-slave57.sj.mozillamessaging.com (10.200.80.57) appears to be up.
Host tb2-darwin9-slave58.sj.mozillamessaging.com (10.200.80.58) appears to be up.
Host tb2-darwin9-slave60.sj.mozillamessaging.com (10.200.80.60) appears to be up.
Host tb2-darwin9-slave61.sj.mozillamessaging.com (10.200.80.61) appears to be up.
Host tb2-darwin9-slave62.sj.mozillamessaging.com (10.200.80.62) appears to be up.
Host tb2-darwin9-slave63.sj.mozillamessaging.com (10.200.80.63) appears to be up.
Host tb2-darwin9-slave64.sj.mozillamessaging.com (10.200.80.64) appears to be up.
Host tb2-darwin9-slave65.sj.mozillamessaging.com (10.200.80.65) appears to be up.
Host tb2-darwin9-slave66.sj.mozillamessaging.com (10.200.80.66) appears to be up.
Host tb2-darwin9-slave67.sj.mozillamessaging.com (10.200.80.67) appears to be up.
Host tb2-darwin9-slave69.sj.mozillamessaging.com (10.200.80.69) appears to be up.
Host tb2-darwin9-slave70.sj.mozillamessaging.com (10.200.80.70) appears to be up.
Host tb2-darwin9-slave71.sj.mozillamessaging.com (10.200.80.71) appears to be up.
Host tb2-darwin9-slave72.sj.mozillamessaging.com (10.200.80.72) appears to be up.

There is a small inconsistency in naming: the 10.6 ones should be called darwin10, but I'll fix that at a later time.

slave64 and slave65 are currently connected to the try master and running their first builds now. If these turn green, we can move these minis into production and start moving the other ones in as well. It's a bit of a manual process, but not too bad.

NOTE: We are changing passwords and nuking firefox keys from there
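The output above looks like a ping sweep; a minimal sketch of an equivalent check is below. This is an illustration, not the exact command that was run.

# Minimal sketch: check that every repurposed slave answers a single ping.
import subprocess

SLAVE_NUMBERS = [55, 56, 57, 58, 60, 61, 62, 63, 64, 65, 66, 67, 69, 70, 71, 72]
DOMAIN = "sj.mozillamessaging.com"

for n in SLAVE_NUMBERS:
    host = "tb2-darwin9-slave%d.%s" % (n, DOMAIN)
    rc = subprocess.call(["ping", "-c", "1", host],
                         stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    print("%-45s %s" % (host, "up" if rc == 0 else "DOWN"))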
Inventory updated with new names and a new location for slave64. As adam noted, the switchports were correct.
(In reply to Philippe M. Chiasson (:gozer) from comment #33)
> All hosts are up and on our network, yay!
...
> slave64 and slave65 are currently connected to the try master and running
> their first builds now. If these turn green, we can move these minis into
> production and start moving the other ones in as well.
...

more sweet progress, thanks. I'll keep my fingers crossed for green builds! :-)
I don't think gozer put his last update in before he was off for the night, but we had a green build on the 10.5 machine:

http://build.mozillamessaging.com/buildbot/try/builders/OS%20X%2010.5.2%20try%20build/builds/262

He was going to push more try builds into the queue to give them some exercise. If all goes well, he was expecting to be able to put them into the production pool in the morning. I didn't see any comment from him on the 10.6 build before he left for the evening.
Status of the most recent tryserver build on each:

55 - ok - 10.5
56 - ok - 10.5
57 - ok - 10.5
58 - ok - 10.5
65 - ok - 10.5
67 - ok - 10.5
72 - ok - 10.5
60 - fail - 10.6
62 - fail - 10.6
64 - fail - 10.6
66 - fail - 10.6
69 - fail - 10.6
70 - fail - 10.5
61 - no builds - 10.6
63 - no builds - 10.6
71 - no builds - 10.6
I've stopped the buildbot client on the systems listed above that are marked "ok" so that new builds will test the others.

Most of the others were failing on yasm being out of date. Turns out there is a puppet recipe to upgrade it to 1.1.0. Got the .dmg from rail and installed it on all out of date build slaves.

slave64 was having an auto-login issue so buildbot didn't start automatically. Fixed.

slave70 is still broken - it needs a reinstall or fix of mercurial. /opt/local/bin/hg is missing. Copying that file from a couple of other systems didn't work. I've stopped the buildbot client on this build slave.

I pushed enough changes to get builds running on all the "failed" or "no builds" slaves above, except slave70. They all seem to have made it to the compile stage so far.

Since yasm was out of date, we should do an audit of the required package versions on these systems to make sure there aren't other older dependencies that need upgrading.
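Along the lines of the audit suggested above, a minimal sketch of a cross-slave version check. The ssh user, the command list, and the tool set are assumptions for illustration; the authoritative list of required packages and versions should come from the releng refplatform documentation.

# Minimal sketch: report a few toolchain versions across the new slaves so
# they can be diffed against a known-good reference. The ssh user ("cltbld")
# and the CHECKS table are assumptions, not values confirmed by this bug.
import subprocess

SLAVES = ["tb2-darwin9-slave%d.sj.mozillamessaging.com" % n
          for n in [55, 56, 57, 58, 60, 61, 62, 63, 64, 65, 66, 67, 69, 70, 71, 72]]
CHECKS = {
    "yasm": "yasm --version | head -1",
    "hg": "hg --version | head -1",
    "python": "python -V 2>&1",
}

for slave in SLAVES:
    for tool, cmd in sorted(CHECKS.items()):
        try:
            out = subprocess.check_output(
                ["ssh", "-o", "ConnectTimeout=5", "cltbld@" + slave, cmd],
                stderr=subprocess.STDOUT).decode().strip()
        except subprocess.CalledProcessError as e:
            out = "ERROR: %s" % e.output.decode().strip()
        print("%-45s %-8s %s" % (slave, tool, out))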
(In reply to John Hopkins (:jhopkins) from comment #38)
> slave70 is still broken - it needs a reinstall or fix of mercurial.
> /opt/local/bin/hg is missing. Copying that file from a couple of other
> systems didn't work. I've stopped the buildbot client on this build slave.

slave70 mercurial fixed, not sure what went on there, had to rsync /tools/ from another 10.6 host.
Per quick phone call w/jhopkins just now:

(In reply to John Hopkins (:jhopkins) from comment #37)
> Status of the most recent tryserver build on each:
>
> 55 - ok - 10.5
> 56 - ok - 10.5
> 57 - ok - 10.5
> 58 - ok - 10.5
> 65 - ok - 10.5
> 67 - ok - 10.5
> 72 - ok - 10.5

These are looking good. We should be able to make a "go/nogo" decision on moving these into Thunderbird production ~1pm PDT today.

> 60 - fail - 10.6
> 62 - fail - 10.6
> 64 - fail - 10.6
> 66 - fail - 10.6
> 69 - fail - 10.6
> 70 - fail - 10.5
> 61 - no builds - 10.6
> 63 - no builds - 10.6
> 71 - no builds - 10.6

These machines still being investigated.
gozer:

(In reply to John Hopkins (:jhopkins) from comment #38)
> I've stopped the buildbot client on the systems listed above that are marked
> "ok" so that new builds will test the others.
>
> Most of the others were failing on yasm being out of date. Turns out there
> is a puppet recipe to upgrade it to 1.1.0. Got the .dmg from rail and
> installed it on all out of date build slaves.
>
> slave64 was having an auto-login issue so buildbot didn't start
> automatically. Fixed.
>
> slave70 is still broken - it needs a reinstall or fix of mercurial.
> /opt/local/bin/hg is missing. Copying that file from a couple of other
> systems didn't work. I've stopped the buildbot client on this build slave.
>
> I pushed enough changes to get builds running on all the "failed" or "no
> builds" slaves above, except slave70. They all seem to have made it to the
> compile stage so far.
>
> Since yasm was out of date, we should do an audit of the required package
> versions on these systems to make sure there aren't other older dependencies
> that need upgrading.

(In reply to Philippe M. Chiasson (:gozer) from comment #39)
> (In reply to John Hopkins (:jhopkins) from comment #38)
> > slave70 is still broken - it needs a reinstall or fix of mercurial.
> > /opt/local/bin/hg is missing. Copying that file from a couple of other
> > systems didn't work. I've stopped the buildbot client on this build slave.
>
> slave70 mercurial fixed, not sure what went on there, had to rsync /tools/
> from another 10.6 host.

It's great to see these issues identified and fixed. However, I'm curious - what happened in the imaging/setup process that allowed these machines to be imaged incorrectly like this? What changes to the imaging process do we need to make before we are confident that we can image machines successfully next time?
On 10.5, we are happy and have just moved 7 slaves to the production pool, leaving the other one in try:

production:
tb2-darwin9-slave55
tb2-darwin9-slave56
tb2-darwin9-slave57
tb2-darwin9-slave58
tb2-darwin9-slave65
tb2-darwin9-slave67
tb2-darwin9-slave70

try:
tb2-darwin9-slave72
(In reply to John O'Duinn [:joduinn] from comment #41)
> gozer:
> [...]
>
> (In reply to Philippe M. Chiasson (:gozer) from comment #39)
> > (In reply to John Hopkins (:jhopkins) from comment #38)
> > > slave70 is still broken - it needs a reinstall or fix of mercurial.
> > > /opt/local/bin/hg is missing. Copying that file from a couple of other
> > > systems didn't work. I've stopped the buildbot client on this build slave.
> >
> > slave70 mercurial fixed, not sure what went on there, had to rsync /tools/
> > from another 10.6 host.
>
> It's great to see these issues identified and fixed. However, I'm curious -
> what happened in the imaging/setup process that allowed these machines to be
> imaged incorrectly like this?

Except for the mercurial strangeness, it wasn't an imaging problem. Just differences between TB's refplatform and this one.

> What changes to the imaging process do we need to
> make before we are confident that we can image machines successfully next
> time?

Not sure what happened on slave70, could have been operator error. But what I know is that there was a mercurial symlink under /tools that was pointing to a versioned /tools/hg-n.m.o directory that wasn't there on that box, but was on the others.
(In reply to Philippe M. Chiasson (:gozer) from comment #32)
> (In reply to Justin Dow [:jabba] from comment #31)
> > (In reply to Adam Newman from comment #29)
> > > Port information:
> ...
> >
> > This one is plugged into the momo procurve switch, port number 5, down on
> > the 14th floor in the momo rack.
>
> Is the plan to leave that one there?

Just to clarify, that one *needs* to stay there so I can stick it in the calendar VLAN later.
I've done a software version compare (YVR minis vs. replacement minis) and gone over it with gozer. The only potential 10.6 version issue is that Xcode is version 3.2.2 on the YVR minis but 3.2.1 on the replacement minis. I'll ask standard8 whether this is a concern. If it is, we'll need an Xcode 3.2.2 .dmg to install from.

As gozer mentioned above, we've deployed the 10.5 minis to production tests, however we are holding off on allowing these to do comm-1.9.2 builds until they've soaked for a while. Once the 10.6 minis are ok'd (re: xcode version above) and have green builds, we'll move these to comm-central builds for a while to let them soak, then move them to release builds once we're absolutely happy with them (they will not be used for 7.0.1 builds, so that QA can focus on the product changes, without concern for build environment changes).

gozer is working through a virtualenv fix and a test build is running right now.
Okay, found out why builds are a bit slow on these: they all have only 1GB of RAM, so swappiness ensues... But they are building fine so far!
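For reference, a minimal sketch for confirming installed RAM and current swap usage on one of these OS X slaves. These are generic, read-only diagnostics and not commands taken from this bug.

# Minimal sketch: report physical RAM and swap usage on an OS X host using
# stock sysctl keys. Run it locally on the slave (or via ssh).
import subprocess

def sysctl(name):
    return subprocess.check_output(["sysctl", "-n", name]).decode().strip()

ram_gb = int(sysctl("hw.memsize")) / (1024.0 ** 3)
print("physical RAM: %.1f GB" % ram_gb)
print("swap usage:   %s" % sysctl("vm.swapusage"))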
The 10.5 builders aren't quite right, they are failing some unit tests:

http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTrunk/1317290353.1317291834.6648.gz
http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTrunk/1317246832.1317248141.16426.gz
http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTrunk/1317240584.1317241901.1839.gz#err2

(note the test_smtpPasswordFailure3.js is a random orange).

The older minis in Vancouver are passing those tests just fine. And now that I look at the results - this looks like a umask issue on those builders, which I believe we've had before.
(In reply to Mark Banner (:standard8) from comment #47)
> The 10.5 builders aren't quite right, they are failing some unit tests:
>
> http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTrunk/1317290353.1317291834.6648.gz
> http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTrunk/1317246832.1317248141.16426.gz
> http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTrunk/1317240584.1317241901.1839.gz#err2
>
> (note the test_smtpPasswordFailure3.js is a random orange).
>
> The older minis in Vancouver are passing those tests just fine.
>
> And now that I look at the results - this looks like a umask issue on those
> builders, which I believe we've had before.

Right on, that was the problem. Umask fixed (umask = 002) on all tb2-darwin* slaves.

Also, mozmill (and friends) was missing for older (not in tree) mozmill runs, so fixed that too.
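For anyone hitting this class of failure again, a minimal sketch of why the umask matters here: under 022 the build user's newly created files lose group write permission, which is the difference the failing tests tripped over; 002 keeps group write. This demonstration is illustrative only.

# Minimal sketch: show the permission bits a newly created file gets under
# umask 022 versus 002; the group-write bit is the relevant difference.
import os
import stat
import tempfile

for mask in (0o022, 0o002):
    old = os.umask(mask)
    path = os.path.join(tempfile.gettempdir(), "umask-demo-%03o" % mask)
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)  # mode is filtered by the umask
    os.close(fd)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    print("umask %03o -> new file mode %03o (group write: %s)"
          % (mask, mode, bool(mode & stat.S_IWGRP)))
    os.remove(path)
    os.umask(old)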
tb2-darwin9-slave70:

http://tinderbox.mozilla.org/showlog.cgi?log=Thunderbird-Release-Release/1317309283.1317309289.15663.gz

abort: couldn't find mercurial libraries in [/opt/local/lib/python2.5/site-packages
/opt/local/bin
/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python25.zip
/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5
/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/plat-darwin
/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/plat-mac
/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/plat-mac/lib-scriptpackages
/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python
/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-tk
/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload
/Library/Python/2.5/site-packages
/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/PyObjC]
(check your install and PYTHONPATH)

There's still something not quite right there.
(In reply to Mark Banner (:standard8) from comment #49)
> tb2-darwin9-slave70:
>
> http://tinderbox.mozilla.org/showlog.cgi?log=Thunderbird-Release-Release/1317309283.1317309289.15663.gz

That slave hadn't quite picked up the new PATH right, fixed now.

> There's still something not quite right there.
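A minimal diagnostic sketch for this class of failure: run it with the same interpreter the hg wrapper uses and it reports whether the mercurial package is importable and from where. It is a generic check, not the fix that was applied here.

# Minimal sketch: check whether the mercurial libraries are importable by this
# interpreter, mirroring the "couldn't find mercurial libraries" abort above.
import sys

print("python: %s" % sys.executable)
print("sys.path entries:")
for p in sys.path:
    print("  %s" % p)

try:
    import mercurial
    print("mercurial found at: %s" % mercurial.__file__)
except ImportError:
    print("mercurial NOT importable; check the /tools symlinks and PYTHONPATH")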
Depends on: 690415
As a gentle reminder, please update inventory to match the darwin10 machines' new hostnames.
(In reply to Dustin J. Mitchell [:dustin] from comment #51)
> As a gentle reminder, please update inventory to match the darwin10
> machines' new hostnames.

Inventory gently updated. Noted that the labels say "tb-..." and not "tb2-...".
After the RAM upgrades, the 10.6 try slaves have been producing green builds! Last step is moving them into the production pool.
(In reply to Philippe M. Chiasson (:gozer) from comment #48)
> Also, mozmill (and friends) was missing for older (not in tree) mozmill
> runs, so fixed that too.

I was looking at the comm-release and comm-beta mozmill tests for those Macs today - all the new macs are failing running the legacy mozmill installation, so this bit isn't quite right yet.
(In reply to Mark Banner (:standard8) from comment #54)
> (In reply to Philippe M. Chiasson (:gozer) from comment #48)
> > Also, mozmill (and friends) was missing for older (not in tree) mozmill
> > runs, so fixed that too.
>
> I was looking at the comm-release and comm-beta mozmill tests for those Macs
> today - all the new macs are failing running the legacy mozmill
> installation, so this bit isn't quite right yet.

I mistakenly installed the wrong version of mozmill (1.5.x) on these slaves; rolling it back fixed the problem.
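To keep this from regressing, a minimal check that the installed mozmill matches the expected legacy version, assuming mozmill is installed as a normal Python package. The "1.4.2" string is a placeholder assumption, not a version confirmed in this bug; substitute whatever the Vancouver reference slaves actually run.

# Minimal sketch: assert that the installed mozmill matches the expected
# legacy version. EXPECTED is a placeholder assumption.
import pkg_resources

EXPECTED = "1.4.2"  # assumption: use the version on the reference slaves

try:
    installed = pkg_resources.get_distribution("mozmill").version
except pkg_resources.DistributionNotFound:
    raise SystemExit("mozmill is not installed")

if installed != EXPECTED:
    raise SystemExit("mozmill %s installed, expected %s" % (installed, EXPECTED))
print("mozmill %s OK" % installed)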
These Mac Minis are on their way from Vancouver to MPT:

2 x OSX 10.6 minis via FedEx air - ETA 10:30am Friday Sept. 30
6 x OSX 10.5 minis via FedEx ground - ETA 3 business days
I think the work that this bug was intended for is now complete, and I can close it. Agreed? (gozer, jhopkins, coop, joduinn)?
gozer: +1
(In reply to Amy Rich [:arich] [:arr] from comment #57)
> I think the work that this bug was intended for is now complete, and I can
> close it. Agreed? (gozer, jhopkins, coop, joduinn)?

Agreed
per irc w/gozer, jhopkins:

1) the corrected/rolled-back mozmill is now giving us green builds. standard8, if you're still seeing any problems, let us know.
2) the 10.5 machines in SJ are running in production, alongside the machines in Vancouver.
3) the 10.6 machines are still being worked on. jhopkins/gozer to give status on what (if anything) is left to do before powering off the machines in the Vancouver office.

Amy: Unclear at this time if there is anything left for IT to do here. I'd like to keep this open to track whatever work remains.
Status is being tracked in bug 688230. The 16 minis have been successfully repurposed so I am closing this bug; confirmed with joduinn.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Blocks: 690866
Blocks: 690837
Blocks: 700737
Blocks: 717720
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations