bring up slaves from 3/17 IX repair trip

RESOLVED FIXED

Status

Infrastructure & Operations
RelOps
7 years ago
5 years ago

People

(Reporter: dustin, Assigned: zandr)

(Reporter)

Description

7 years ago
The following slaves came back from IX yesterday, and should be racked up now (from bug 596366 comment 102):

linux-ix-slave01:   bug 624371     A1-16072    4620   scl1
linux-ix-slave06:   bug 624210     A1-16077    4625   scl1
linux-ix-slave13:   bug 619624     A1-16084    4632   scl1
linux-ix-slave16:   comment 8      A1-16087    4635   scl1
linux-ix-slave17:   comment 55     A1-16088    4636   scl1
linux-ix-slave33:   bug 620124     A1-16163    4674   scl1
linux-ix-slave34:   comment 58     A1-16164    4675   scl1
linux-ix-slave35:   bug 620124     A1-16165    4676   scl1
linux-ix-slave42:   bug 624207     A1-16172    4773   scl1

linux64-ix-slave04: comment 78     A1-16176    4777   scl1
linux64-ix-slave10: comment 78     A1-16182    4783   scl1
linux64-ix-slave11: comment 78     A1-16183    4784   scl1
linux64-ix-slave12: comment 83     A1-16184    4785   scl1
linux64-ix-slave13: comment 83     A1-16185    4786   scl1
linux64-ix-slave16: comment 78     A1-16188    4789   scl1
linux64-ix-slave40: comment 102    A1-16212    4813   scl1
linux64-ix-slave41: comment 101    A1-16213    4814   scl1

mv-moz2-linux-ix-slave12:          A1-14132    3121   mtv1

w32-ix-slave07:                    A1-16053    4601   mtv1
w32-ix-slave08:     bug 635416#c31 A1-16054    4602   mtv1
w32-ix-slave23:     comment 51     A1-16069    4617   scl1
w32-ix-slave41:     bug 615744     A1-16104    4705   scl1

w64-ix-slave02:     bug 638814     A1-16107    4708   scl1
w64-ix-slave06:     bug 639628#c22 A1-16111    4712   scl1
w64-ix-slave07:     comment 78     A1-16112    4713   scl1
w64-ix-slave11:     comment 78     A1-16116    4717   scl1

This bug will track bringing them back into production, including adding them to slavealloc.
(Reporter)

Comment 1

7 years ago
It looks like these aren't re-imaged yet.  I don't know how to do that via IPMI, so I guess that's step one for someone else - zandr?
I am working on reimaging the linux-ix-slave* machines now, except the ones that are not responding via IPMI (bug 651178).

The w64 reimages are waiting on the new ref image in bug 645024.

The linux64 images are waiting on the new ref image in bug 648342.
Assignee: server-ops-releng → arich
Depends on: 645024, 648342, 651178
The following servers are not yet back from IX:

** linux-ix-slave01
** linux-ix-slave17
** linux-ix-slave33
** linux64-ix-slave10
** linux64-ix-slave16
** linux64-ix-slave41
** w32-ix-slave23
** w32-ix-slave41
** w64-ix-slave07

The following hosts have been reimaged and have had their hostnames set.  They have not been added back into puppet, per Dustin.

Built from linux-ix-ref-20110204:

linux-ix-slave06
linux-ix-slave16
linux-ix-slave35
linux-ix-slave42


linux-ix-slave13 is installing slowly and had block errors the first time I tried to image it.  It should be done by morning, and I'll check on it then.

linux-ix-slave34 is having issues getting to the boot menu from IPMI and needs physical intervention: bug 651306

Built from linux64-ix-ref-20110419:

linux64-ix-slave04
linux64-ix-slave11
linux64-ix-slave13
linux64-ix-slave40

linux64-ix-slave12 is still in the process of installing (quite slowly) and should be finished by morning.

The following hosts are on hold pending a new win64 image:

w64-ix-slave02          10.12.48.154
w64-ix-slave06          10.12.48.158
w64-ix-slave11          10.12.48.163

The following host is back in MTV waiting to be powered back on:

mv-moz2-linux-ix-slave12


I'd like some naming clarification on the following hosts before I image them, since they switched datacenters:

w32-ix-slave07
w32-ix-slave08

Should the records be for w32-ix-slaveNN or win32-ix-slaveNN?  The new ones were set up as w32, but it appears that the records for the old ones are win32.
Status: NEW → ASSIGNED
Depends on: 651306
Depends on: 651325
linux64-ix-slave12 is up as well.
linux-ix-slave13 failed after the second reimage, so I've opened up a hardware bug for it.
(Reporter)

Comment 5

7 years ago
w32-ix-slave07
w32-ix-slave08

should be in scl1.build.mozilla.org, and should keep the same names.  Any records matching /win32-ix-slave.*/ are an error and should be deleted.
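
As a rough illustration of the cleanup rule above, the following Python sketch splits a list of record names into ones to keep and ones to delete using the quoted pattern. The function name `split_records` and the sample record names are hypothetical; how the real records are stored and removed is not described in this bug.

```python
import re

# Pattern quoted in the comment above: any record matching it is an error.
STALE_RE = re.compile(r"win32-ix-slave.*")


def split_records(names):
    """Return (keep, delete) lists, preserving the input order."""
    keep, delete = [], []
    for name in names:
        # search() mirrors an unanchored /.../ match.
        if STALE_RE.search(name):
            delete.append(name)
        else:
            keep.append(name)
    return keep, delete


if __name__ == "__main__":
    # Hypothetical record names, for illustration only.
    records = [
        "w32-ix-slave07",
        "win32-ix-slave07",
        "w32-ix-slave08",
        "win32-ix-slave08",
    ]
    keep, delete = split_records(records)
    print("keep:  ", keep)
    print("delete:", delete)
```

Reviewing the `delete` list by hand before removing anything is probably wise, since the pattern is deliberately broad.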
(Reporter)

Comment 6

7 years ago
I have bad news:

[root@linux-ix-slave06 ~]# hdparm -tT /dev/sda
 Timing cached reads:   29336 MB in  1.99 seconds = 14738.16 MB/sec
 Timing buffered disk reads:  148 MB in  3.00 seconds =  49.28 MB/sec

[root@linux-ix-slave16 ~]# hdparm -tT /dev/sda
 Timing cached reads:   29336 MB in  1.99 seconds = 14736.58 MB/sec
 Timing buffered disk reads:  214 MB in  3.02 seconds =  70.97 MB/sec

[root@linux-ix-slave42 ~]# hdparm -tT /dev/sda
 Timing cached reads:   29336 MB in  1.99 seconds = 14736.70 MB/sec
 Timing buffered disk reads:  186 MB in  3.01 seconds =  61.86 MB/sec

[root@linux64-ix-slave04 ~]# hdparm -tT /dev/sda
 Timing cached reads:   23900 MB in  2.00 seconds = 11964.91 MB/sec
 Timing buffered disk reads:  258 MB in  3.00 seconds =  85.91 MB/sec

[root@linux64-ix-slave11 ~]# hdparm -tT /dev/sda
 Timing cached reads:   23964 MB in  2.00 seconds = 11995.92 MB/sec
 Timing buffered disk reads:  266 MB in  3.01 seconds =  88.41 MB/sec

[root@linux64-ix-slave12 ~]# hdparm -tT /dev/sda
 Timing cached reads:   21128 MB in  2.00 seconds = 10576.90 MB/sec
 Timing buffered disk reads:   46 MB in  3.07 seconds =  14.97 MB/sec

[root@linux64-ix-slave13 ~]# hdparm -tT /dev/sda
 Timing cached reads:   23908 MB in  2.00 seconds = 11969.06 MB/sec
 Timing buffered disk reads:  242 MB in  3.01 seconds =  80.44 MB/sec

[root@linux64-ix-slave40 ~]# hdparm -tT /dev/sda
 Timing cached reads:   23924 MB in  2.00 seconds = 11977.16 MB/sec
 Timing buffered disk reads:  162 MB in  3.00 seconds =  53.96 MB/sec


All but linux64-ix-slave12 are idle (booted in multiuser mode, but nothing running).  All of them have a good bit of variance across multiple runs, but I only saw *one* check over 90MB/s, on linux64-ix-slave11.  If 90MB/s is our send-it-back-to-IX threshold, then all of these systems need to go back.  Zandr, what do you think?
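
The comparison against the 90MB/s figure can be sketched in Python as below. This is only an illustration: it parses a pasted transcript in the format shown above, and the names `buffered_rates`, `below_threshold`, and `THRESHOLD_MB_S` are made up for the example. The threshold itself is the tentative value from this comment, not a confirmed policy.

```python
import re

# Tentative send-it-back threshold from the comment above (an assumption).
THRESHOLD_MB_S = 90.0

# Matches a prompt line such as "[root@linux-ix-slave06 ~]# hdparm -tT /dev/sda".
PROMPT_RE = re.compile(r"^\[root@(?P<host>[\w-]+) ")
# Matches "Timing buffered disk reads:  148 MB in  3.00 seconds =  49.28 MB/sec".
BUFFERED_RE = re.compile(r"Timing buffered disk reads:.*=\s*(?P<rate>[\d.]+) MB/sec")


def buffered_rates(transcript):
    """Map each host in the transcript to its buffered disk read rate (MB/s)."""
    rates = {}
    host = None
    for line in transcript.splitlines():
        prompt = PROMPT_RE.match(line.strip())
        if prompt:
            host = prompt.group("host")
            continue
        reading = BUFFERED_RE.search(line)
        if reading and host is not None:
            # A later run for the same host overwrites the earlier one.
            rates[host] = float(reading.group("rate"))
    return rates


def below_threshold(rates, threshold=THRESHOLD_MB_S):
    """Return the sorted host names whose rate is under the threshold."""
    return sorted(host for host, rate in rates.items() if rate < threshold)


# Two of the measurements from this comment, copied verbatim.
SAMPLE = """\
[root@linux-ix-slave06 ~]# hdparm -tT /dev/sda
 Timing cached reads:   29336 MB in  1.99 seconds = 14738.16 MB/sec
 Timing buffered disk reads:  148 MB in  3.00 seconds =  49.28 MB/sec

[root@linux64-ix-slave12 ~]# hdparm -tT /dev/sda
 Timing cached reads:   21128 MB in  2.00 seconds = 10576.90 MB/sec
 Timing buffered disk reads:   46 MB in  3.07 seconds =  14.97 MB/sec
"""

if __name__ == "__main__":
    rates = buffered_rates(SAMPLE)
    for host in below_threshold(rates):
        print(f"{host}: {rates[host]:.2f} MB/s")
```

Because the readings vary from run to run, a single transcript like this is only a snapshot; averaging several runs per host would give a steadier picture.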
(Assignee)

Comment 7

7 years ago
(In reply to comment #6)

> Zandr, what do you think?

I think I'm going to email this comment to iX and see what they have to say.
(Reporter)

Comment 8

7 years ago
Idle measurement of linux64-ix-slave12:
[root@linux64-ix-slave12 ~]# hdparm -tT /dev/sda
 Timing cached reads:   21432 MB in  2.00 seconds = 10727.27 MB/sec
 Timing buffered disk reads:   36 MB in  3.10 seconds =  11.62 MB/sec
(In reply to comment #5)
> w32-ix-slave07
> w32-ix-slave08

I've reimaged these servers.  I didn't see any post-imaging steps to take, so they're just as they booted up.
(Assignee)

Comment 10

7 years ago
I stopped by iXsystems last night and chatted with them a bit about what we've been seeing here.

They're going to package up their burn-in script so we can run the same tests they do for production qualification.

Stay tuned.
FYI w32-ix-slave08 seems to autologin as the cltbld user, but w32-ix-slave07 tries to autologin as Administrator and fails.
linux-ix-slave34 has now been reimaged as well, and it kernel panics saying that it cannot find /dev/root.

The only machine back from IX that has not yet had a reimage attempted is mv-moz2-linux-ix-slave12, which is still waiting to be powered on.
Assignee: arich → zandr
(Reporter)

Comment 13

7 years ago
I don't think there's anything left to do here - these systems will all get batched and sent to iX as part of bug 655304.
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations