Closed Bug 567147 Opened 14 years ago Closed 14 years ago

re-image production try slaves and move out of sandbox and into build network

Product/Component: mozilla.org Graveyard :: Server Operations
Type: task
Hardware: x86
OS: All
Priority: Not set
Severity: critical
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: lsblakk
Assigned: phong

Here's the list of win32 slaves.  They have been scheduled for 10hrs of downtime in nagios, please update nagios when they are back online.

try-w32-slave01
try-w32-slave02
try-w32-slave03
try-w32-slave04
try-w32-slave06
try-w32-slave07
try-w32-slave08
try-w32-slave09
try-w32-slave10
try-w32-slave11
try-w32-slave15
try-w32-slave21
try-w32-slave22
try-w32-slave23
try-w32-slave24
try-w32-slave25 
try-w32-slave26
try-w32-slave27
try-w32-slave28
try-w32-slave29
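
For reference, a downtime window like the 10 hours mentioned above can be entered in bulk through Nagios's external command interface. The following is only a sketch under assumptions: the author name and comment text are placeholders, the function name `downtime_cmds` is made up for illustration, and nothing in this bug says the downtime was actually scheduled this way.

```shell
#!/bin/sh
# Print one SCHEDULE_HOST_DOWNTIME external command per host.
# Usage: downtime_cmds START_EPOCH HOURS HOST...
# AUTHOR and COMMENT are placeholder values, not taken from the bug.
AUTHOR="releng"
COMMENT="re-image, bug 567147"

downtime_cmds() {
    start=$1
    hours=$2
    shift 2
    duration=$((hours * 3600))
    end=$((start + duration))
    for host in "$@"; do
        # Fields: host;start;end;fixed;trigger_id;duration;author;comment
        # fixed=1 means the window runs exactly from start to end.
        printf '[%s] SCHEDULE_HOST_DOWNTIME;%s;%s;%s;1;0;%s;%s;%s\n' \
            "$start" "$host" "$start" "$end" "$duration" "$AUTHOR" "$COMMENT"
    done
}

# Example: 10 hours starting now for two of the win32 slaves.
downtime_cmds "$(date +%s)" 10 try-w32-slave01 try-w32-slave02
```

The printed lines would then be appended to the file named by the `command_file` directive in `nagios.cfg`; its location depends on the installation. For a flexible window, the `fixed` field would be 0 instead of 1.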
Assignee: server-ops → phong
Try slaves are currently idle and can be turned off - also scheduled for 48 hours flexible nagios downtime

try-linux-slave02
try-linux-slave03
try-linux-slave04 
try-linux-slave07
try-linux-slave09
try-linux-slave11
try-linux-slave12
try-linux-slave17
try-linux-slave18
cancel re-imaging of:

try-linux-slave02
try-linux-slave03

We will keep those two in the sandbox on the current try server for now so they can carry on with Maemo builds.
Here is the list of mac slaves to be re-imaged and moved to build:

try-mac-slave02
try-mac-slave03
try-mac-slave04
try-mac-slave06
try-mac-slave07 
try-mac-slave10
try-mac-slave12
try-mac-slave13
try-mac-slave14
try-mac-slave15
try-mac-slave17 
try-mac-slave18
try-mac-slave19

they have a 96 hour nagios downtime scheduled.
try-w32-slave12
try-w32-slave13
try-w32-slave14
try-w32-slave16
try-w32-slave17
try-w32-slave18
try-w32-slave19
try-w32-slave20

Those all have 80 GB build drives that need to be reduced to 30 GB.
(In reply to comment #1)
> Try slaves are currently idle and can be turned off - also scheduled for 48
> hours flexible nagios downtime
> 
> try-linux-slave02
> try-linux-slave03
> try-linux-slave04 
> try-linux-slave07
> try-linux-slave09
> try-linux-slave11
> try-linux-slave12
> try-linux-slave17
> try-linux-slave18

These are done.  I will come back and update nagios, but they should be online and ready.
(In reply to comment #0)
> Here's the list of win32 slaves.  They have been scheduled for 10hrs of
> downtime in nagios, please update nagios when they are back online.
> 
> try-w32-slave01
> try-w32-slave02
> try-w32-slave03
> try-w32-slave04
> try-w32-slave06
> try-w32-slave07
> try-w32-slave08
> try-w32-slave09
> try-w32-slave10
> try-w32-slave11
> try-w32-slave15
> try-w32-slave21

Those are done.
try-w32-slave22
try-w32-slave23
try-w32-slave24
try-w32-slave25

should be online now.
all linux and w32 slaves are done.
Flags: colo-trip+
win32 and linux slaves have been attached to the try master.
phong - try-linux-slave17 says it's the linux64 ref platform - any ideas?
i probably messed up and used the wrong template.  I will recreate it in the morning.
(In reply to comment #3)
> Here is the list of mac slaves to be re-imaged and moved to build:
> 
> try-mac-slave02
> try-mac-slave03
> try-mac-slave04
> try-mac-slave06
> try-mac-slave07 
> try-mac-slave10
> try-mac-slave12
> try-mac-slave13
> try-mac-slave14
> try-mac-slave15
> try-mac-slave17 
> try-mac-slave18
> try-mac-slave19
> 
> they have a 96 hour nagios downtime scheduled.

try-mac-slave13 was already done from a previous batch.  The rest have been re-imaged and updated in nagios.  I will bring these to MPT and rack them.
all done.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Looks like we ended up leaving try-linux-slave02/03 untouched to provide Maemo builds on the old try server. Please restore the DNS (and DHCP?) entries for them. ESX says they currently have
  try-linux-slave02.m.o   10.2.74.46
  try-linux-slave03.m.o   10.2.74.249

Nagios also thinks try-linux-slave03 is failing PING but none of the other tests.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
try-linux-slave02 had the correct IP of 10.2.76.46.
try-linux-slave03 got added back to DHCP and its IP should be 10.2.76.65
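
A mismatch like this one can be caught by comparing whatever address a machine reports against a small table of expected addresses. This is a minimal sketch, assuming the two addresses in the comment above are the intended ones; the function name `check_ip` and the table layout are made up for illustration.

```shell
#!/bin/sh
# Expected addresses, copied from the comment above.
expected='try-linux-slave02 10.2.76.46
try-linux-slave03 10.2.76.65'

# check_ip HOST ACTUAL_IP: compare ACTUAL_IP against the expected table.
# Returns non-zero on a mismatch or if HOST is not in the table.
check_ip() {
    want=$(printf '%s\n' "$expected" | awk -v h="$1" '$1 == h { print $2 }')
    if [ -n "$want" ] && [ "$want" = "$2" ]; then
        echo "$1 ok ($2)"
    else
        echo "$1 MISMATCH: have $2, want $want"
        return 1
    fi
}

# The addresses ESX reported earlier in this bug:
check_ip try-linux-slave02 10.2.74.46 || true
check_ip try-linux-slave03 10.2.74.249 || true
```

On a Linux host the second argument could come from a live lookup, for example `getent hosts try-linux-slave03 | awk '{ print $1 }'`, provided the name resolves from where the check runs.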
Status: REOPENED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
try-mac-slave18 thinks that it's try-mac-slave16 - I reset the hostname but it didn't stick; on reboot it came back as 16.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
raising to critical as this is causing slaves to fail out, and hurting our wait times
Severity: normal → critical
scutil --set HostName try-mac-slave18.mozilla.org
Status: REOPENED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
i meant try-mac-slave18.build.mozilla.org.
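
The thread only shows `HostName` being set. macOS also keeps `LocalHostName` and `ComputerName`, which `scutil --set` accepts as well. The sketch below sets all three from one fully qualified name. Whether a stale value in one of the other two is what made the old name return after reboot is an assumption, not something this bug confirms, and the function name `set_mac_hostname` is hypothetical. By default it only prints the commands; set `RUN=1` to execute them.

```shell
#!/bin/sh
# set_mac_hostname FQDN
# Prints the scutil calls needed to set all three macOS name preferences.
# With RUN=1 in the environment the calls are executed via sudo instead.
set_mac_hostname() {
    fqdn=$1
    short=${fqdn%%.*}    # strip everything from the first dot onward
    for key in HostName LocalHostName ComputerName; do
        case $key in
            HostName) value=$fqdn ;;   # fully qualified name
            *)        value=$short ;;  # short name for the other two
        esac
        if [ "${RUN:-0}" = 1 ]; then
            sudo scutil --set "$key" "$value"
        else
            echo "scutil --set $key $value"
        fi
    done
}

# Dry run for the slave discussed above:
set_mac_hostname try-mac-slave18.build.mozilla.org
```

After a real run, `scutil --get HostName` should echo the new name back; checking again after a reboot would show whether it stuck.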
Something's still missing here on some of the machines. For example, try-linux-slave02 can't connect to production-puppet:80.

Can you double check the linux and mac try machines and ensure they have the same access to production-puppet that their moz2-* counterparts do?
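
A quick way to compare access from a try machine and its moz2-* counterpart is to probe the port from each. This is a minimal sketch, assuming `nc` (netcat) is installed on the slaves; the function name `check_port` is made up for illustration, and a missing `nc` is reported the same way as an unreachable port.

```shell
#!/bin/sh
# check_port HOST PORT
# Attempts a TCP connection with a 5 second timeout and reports the result.
# Returns non-zero if the connection fails (or if nc is not available).
check_port() {
    host=$1
    port=$2
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
        echo "$host:$port reachable"
    else
        echo "$host:$port NOT reachable"
        return 1
    fi
}

# Run on the slave being checked, for example:
#   check_port production-puppet 80
```

Running the same call on a slave that is known to work gives a baseline, so a failure on the try machine points at network access rather than at the puppet server itself.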
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
See comment #2.  Those 2 are still in the sandbox network.  That's to be expected.
Status: REOPENED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
(In reply to comment #21)
> See comment #2.  Those 2 are still in the sandbox network.  That's to be
> expected.

They still need access to production-puppet:80.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Do we just need this for the 2 slaves left or for everything in that VLAN?
(In reply to comment #23)
> Do we just need this for the 2 slaves left or for everything in that VLAN?

What else is left there?
bug 571006 filed for this.
access to puppet granted in bug 571006
Status: REOPENED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard