650335 - bring up slaves from 3/17 IX repair trip

Reporter

Description

14 years ago

The following slaves came back from IX yesterday, and should be racked up now (from bug 596366 comment 102):

linux-ix-slave01: bug 624371 A1-16072 4620 scl1
linux-ix-slave06: bug 624210 A1-16077 4625 scl1
linux-ix-slave13: bug 619624 A1-16084 4632 scl1
linux-ix-slave16: comment 8 A1-16087 4635 scl1
linux-ix-slave17: comment 55 A1-16088 4636 scl1
linux-ix-slave33: bug 620124 A1-16163 4674 scl1
linux-ix-slave34: comment 58 A1-16164 4675 scl1
linux-ix-slave35: bug 620124 A1-16165 4676 scl1
linux-ix-slave42: bug 624207 A1-16172 4773 scl1
linux64-ix-slave04: comment 78 A1-16176 4777 scl1
linux64-ix-slave10: comment 78 A1-16182 4783 scl1
linux64-ix-slave11: comment 78 A1-16183 4784 scl1
linux64-ix-slave12: comment 83 A1-16184 4785 scl1
linux64-ix-slave13: comment 83 A1-16185 4786 scl1
linux64-ix-slave16: comment 78 A1-16188 4789 scl1
linux64-ix-slave40: comment 102 A1-16212 4813 scl1
linux64-ix-slave41: comment 101 A1-16213 4814 scl1
mv-moz2-linux-ix-slave12: A1-14132 3121 mtv1
w32-ix-slave07: A1-16053 4601 mtv1
w32-ix-slave08: bug 635416#c31 A1-16054 4602 mtv1
w32-ix-slave23: comment 51 A1-16069 4617 scl1
w32-ix-slave41: bug 615744 A1-16104 4705 scl1
w64-ix-slave02: bug 638814 A1-16107 4708 scl1
w64-ix-slave06: bug 639628#c22 A1-16111 4712 scl1
w64-ix-slave07: comment 78 A1-16112 4713 scl1
w64-ix-slave11: comment 78 A1-16116 4717 scl1

This bug will track bringing them back into production, including adding them to slavealloc.

Dustin J. Mitchell [:dustin] (he/him)

Reporter

Comment 1

14 years ago

It looks like these aren't re-imaged yet. I don't know how to do that via IPMI, so I guess that's step one for someone else - zandr?

Amy Rich [:arr] [:arich]

Comment 2

14 years ago

I am working on reimaging the linux-ix-slave* machines now, minus the ones that are not responding via IPMI (bug 651178). The w64 reimages are waiting on the new ref image in bug 645024. The linux64 reimages are waiting on the new ref image in bug 648342.

Assignee: server-ops-releng → arich

Depends on: 645024, 648342, 651178

Amy Rich [:arr] [:arich]

Comment 3

14 years ago

The following servers are not yet back from IX:
** linux-ix-slave01
** linux-ix-slave17
** linux-ix-slave33
** linux64-ix-slave10
** linux64-ix-slave16
** linux64-ix-slave41
** w32-ix-slave23
** w32-ix-slave41
** w64-ix-slave07

The following hosts that have been reimaged have also had their hostname set. They have not been re-added to puppet, per Dustin.

Built from linux-ix-ref-20110204:
linux-ix-slave06
linux-ix-slave16
linux-ix-slave35
linux-ix-slave42

linux-ix-slave13 is installing slowly and had block errors the first time I tried to image it. It should be done by morning, and I'll check on it then.

linux-ix-slave34 is having issues getting to the boot menu from IPMI and needs physical intervention: bug 651306

Built from linux64-ix-ref-20110419:
linux64-ix-slave04
linux64-ix-slave11
linux64-ix-slave13
linux64-ix-slave40

linux64-ix-slave12 is still in the process of installing (quite slowly) and should be finished by morning.

The following hosts are on hold pending a new win64 image:
w64-ix-slave02 10.12.48.154
w64-ix-slave06 10.12.48.158
w64-ix-slave11 10.12.48.163

The following host is back in MTV waiting to be powered back on:
mv-moz2-linux-ix-slave12

I'd like some naming clarification on the following hosts before I image them, since they switched datacenters:
w32-ix-slave07
w32-ix-slave08

Should the records be for w32-ix-slaveNN or win32-ix-slaveNN? The new ones were set up as w32, but it appears that the records for the old ones are win32.

Status: NEW → ASSIGNED

Depends on: 651306

Amy Rich [:arr] [:arich]

Updated

14 years ago

Depends on: 651325

Amy Rich [:arr] [:arich]

Comment 4

14 years ago

linux64-ix-slave12 is up as well. linux-ix-slave13 failed after the second reimage, so I've opened up a hardware bug for it.

Dustin J. Mitchell [:dustin] (he/him)

Reporter

Comment 5

14 years ago

w32-ix-slave07 and w32-ix-slave08 should be in scl1.build.mozilla.org, and should keep the same names. Any records matching /win32-ix-slave.*/ are an error and should be deleted.
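The cleanup rule above can be sketched as a simple filter. This is a hypothetical helper, not existing releng tooling, and it assumes the record names are available as a plain list of hostnames; the sample list is illustrative only, not an export of the real DNS zone. `re.search` is used so the pattern matches anywhere in the name, like an unanchored /.../ regex.

```python
import re

# Rule from comment 5: any record matching /win32-ix-slave.*/ is an error.
# re.search mirrors an unanchored /.../ match anywhere in the name.
STALE_RECORD = re.compile(r"win32-ix-slave.*")


def split_records(names):
    """Split record names into (keep, delete) lists according to the rule above."""
    keep, delete = [], []
    for name in names:
        if STALE_RECORD.search(name):
            delete.append(name)
        else:
            keep.append(name)
    return keep, delete


if __name__ == "__main__":
    # Hypothetical record list for illustration only.
    records = ["w32-ix-slave07", "w32-ix-slave08", "win32-ix-slave07", "win32-ix-slave08"]
    keep, delete = split_records(records)
    print("keep:", keep)
    print("delete:", delete)
```

Note that "w32-ix-slave07" does not contain the substring "win32", so the new-style names are kept while the old win32 records are flagged for deletion.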

Dustin J. Mitchell [:dustin] (he/him)

Reporter

Comment 6

14 years ago

I have bad news:

[root@linux-ix-slave06 ~]# hdparm -tT /dev/sda
Timing cached reads: 29336 MB in 1.99 seconds = 14738.16 MB/sec
Timing buffered disk reads: 148 MB in 3.00 seconds = 49.28 MB/sec

[root@linux-ix-slave16 ~]# hdparm -tT /dev/sda
Timing cached reads: 29336 MB in 1.99 seconds = 14736.58 MB/sec
Timing buffered disk reads: 214 MB in 3.02 seconds = 70.97 MB/sec

[root@linux-ix-slave42 ~]# hdparm -tT /dev/sda
Timing cached reads: 29336 MB in 1.99 seconds = 14736.70 MB/sec
Timing buffered disk reads: 186 MB in 3.01 seconds = 61.86 MB/sec

[root@linux64-ix-slave04 ~]# hdparm -tT /dev/sda
Timing cached reads: 23900 MB in 2.00 seconds = 11964.91 MB/sec
Timing buffered disk reads: 258 MB in 3.00 seconds = 85.91 MB/sec

[root@linux64-ix-slave11 ~]# hdparm -tT /dev/sda
Timing cached reads: 23964 MB in 2.00 seconds = 11995.92 MB/sec
Timing buffered disk reads: 266 MB in 3.01 seconds = 88.41 MB/sec

[root@linux64-ix-slave12 ~]# hdparm -tT /dev/sda
Timing cached reads: 21128 MB in 2.00 seconds = 10576.90 MB/sec
Timing buffered disk reads: 46 MB in 3.07 seconds = 14.97 MB/sec

[root@linux64-ix-slave13 ~]# hdparm -tT /dev/sda
Timing cached reads: 23908 MB in 2.00 seconds = 11969.06 MB/sec
Timing buffered disk reads: 242 MB in 3.01 seconds = 80.44 MB/sec

[root@linux64-ix-slave40 ~]# hdparm -tT /dev/sda
Timing cached reads: 23924 MB in 2.00 seconds = 11977.16 MB/sec
Timing buffered disk reads: 162 MB in 3.00 seconds = 53.96 MB/sec

All but linux64-ix-slave12 are idle (booted in multiuser mode, but nothing running). All of them have a good bit of variance across multiple runs, but I only saw *one* check over 90MB/s, on linux64-ix-slave11. If 90MB/s is our send-it-back-to-IX threshold, then all of these systems need to go back. Zandr, what do you think?
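The threshold check proposed above can be sketched as follows. This is a hypothetical helper, not part of any existing releng tooling; it assumes the `hdparm -tT` output format quoted in this comment and applies the 90 MB/s cut-off to the buffered disk read figure only. It uses the MB/sec value that hdparm reports rather than recomputing MB divided by seconds, since the printed seconds are rounded.

```python
import re

# Threshold floated in comment 6, applied to buffered (not cached) reads, in MB/sec.
THRESHOLD_MB_S = 90.0

# Matches e.g. "Timing buffered disk reads: 148 MB in 3.00 seconds = 49.28 MB/sec";
# \s* tolerates the extra padding hdparm puts between fields.
BUFFERED_RE = re.compile(
    r"Timing buffered disk reads:\s*(\d+)\s*MB in\s*([\d.]+)\s*seconds\s*=\s*([\d.]+)\s*MB/sec"
)


def buffered_throughput(hdparm_output):
    """Return the buffered-read rate (MB/sec) reported in `hdparm -tT` output."""
    match = BUFFERED_RE.search(hdparm_output)
    if match is None:
        raise ValueError("no 'Timing buffered disk reads' line found")
    return float(match.group(3))


def needs_return(hdparm_output, threshold=THRESHOLD_MB_S):
    """True if the reported buffered-read rate falls below the threshold."""
    return buffered_throughput(hdparm_output) < threshold


if __name__ == "__main__":
    # Two of the measurements quoted in this comment.
    samples = {
        "linux64-ix-slave11": "Timing cached reads: 23964 MB in 2.00 seconds = 11995.92 MB/sec\n"
        "Timing buffered disk reads: 266 MB in 3.01 seconds = 88.41 MB/sec",
        "linux64-ix-slave12": "Timing cached reads: 21128 MB in 2.00 seconds = 10576.90 MB/sec\n"
        "Timing buffered disk reads: 46 MB in 3.07 seconds = 14.97 MB/sec",
    }
    for host, output in samples.items():
        print(host, buffered_throughput(output), needs_return(output))
```

Given the run-to-run variance noted above, one might feed this the best of several runs per host instead of a single run; that policy is a judgment call the comment leaves open.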

Zandr Milewski [:zandr]

Assignee

Comment 7

14 years ago

(In reply to comment #6)
> Zandr, what do you think?

I think I'm going to email this comment to iX and see what they have to say.

Dustin J. Mitchell [:dustin] (he/him)

Reporter

Comment 8

14 years ago

Idle measurement of linux64-ix-slave12:

[root@linux64-ix-slave12 ~]# hdparm -tT /dev/sda
Timing cached reads: 21432 MB in 2.00 seconds = 10727.27 MB/sec
Timing buffered disk reads: 36 MB in 3.10 seconds = 11.62 MB/sec

Amy Rich [:arr] [:arich]

Comment 9

14 years ago

(In reply to comment #5)
> w32-ix-slave07
> w32-ix-slave08

I've reimaged these servers. I didn't see any post-imaging steps to take, so they're just as they booted up.

Zandr Milewski [:zandr]

Assignee

Comment 10

14 years ago

I stopped by iX systems last night and chatted with them a bit about what we've been seeing here. They're going to package up their burn-in script so we can run the same tests they do for production qualification. Stay tuned.

Amy Rich [:arr] [:arich]

Comment 11

14 years ago

FYI w32-ix-slave08 seems to autologin the cltbld user, but w32-ix-slave07 tries to autologin Administrator and fails.

Amy Rich [:arr] [:arich]

Comment 12

14 years ago

linux-ix-slave34 has now been reimaged as well, and it kernel panics saying that it cannot find /dev/root. The only machine back from IX for which a reimage has not yet been attempted is mv-moz2-linux-ix-slave12, which is still waiting to be powered on.

Amy Rich [:arr] [:arich]

Updated

14 years ago

Assignee: arich → zandr

Dustin J. Mitchell [:dustin] (he/him)

Reporter

Comment 13

14 years ago

I don't think there's anything left to do here - these systems will all get batched and sent to iX as part of bug 655304.

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

11 years ago

Component: Server Operations: RelEng → RelOps

Product: mozilla.org → Infrastructure & Operations

Categories

(Infrastructure & Operations :: RelOps: General, task)

Tracking

(Not tracked)

People

(Reporter: dustin, Assigned: zandr)

Security

(public)
