Closed
Bug 1217493
Opened 9 years ago
Closed 9 years ago
reimage 30 linux64 hosts as w7 hosts
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jmaher, Assigned: arich)
Attachments
(1 file)
298.41 KB,
image/jpeg
We disabled all the talos hosts for linux32, so we now have 99 machines for linux64, and talos does not need 99 machines. Let's start with 30 machines and allocate them as we see fit between the Windows pools. This should help us reduce the load even more (and help me get Windows talos results faster!)
Comment 1•9 years ago
Why doesn't it need 99 machines, and how will we tell whether or not 66 is the right number? It looks like yesterday we started 85% of the jobs in less than 15 minutes. What percentage of the jobs do you want to take up to half an hour before they even start running?
Reporter
Comment 2•9 years ago
This is more of a juggling act: we have really long backlogs on Windows and almost no wait on linux64. It appears we run graphics as well as talos on there. Let's look at the data and see what the historic wait times are for this hardware pool. Coop was going to look at historical wait times.
Flags: needinfo?(coop)
Comment 3•9 years ago
Amy asked me to add a bit more information about which jobs run on these L64 machines:
* talos [1]
* Android x86 S4 test jobs [2]

[1] http://hg.mozilla.org/build/buildbot-configs/file/default/mozilla-tests/config.py#l141
[2] http://hg.mozilla.org/build/buildbot-configs/file/default/mozilla-tests/mobile_config.py#l123
Comment 4•9 years ago
And also let's split it like this:
* Win7: 13 machines
* Win8: 17 machines (a bit more backlogged atm)

That would bring us to:
* Win7: 202 -> 215 (1 disabled atm)
* Win8: 194 -> 211 (7 disabled atm)
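The pool totals above follow from simple addition. A minimal sketch of that arithmetic, using only the counts quoted in this comment (the variable names are illustrative, not taken from any releng tooling):

```python
# Pool sizes before the move and the proposed split, as quoted in comment 4.
current = {"Win7": 202, "Win8": 194}
added = {"Win7": 13, "Win8": 17}

# Size of each pool once the reimaged hosts join it.
proposed = {pool: current[pool] + added[pool] for pool in current}

# The split should account for exactly the 30 hosts being reimaged.
total_added = sum(added.values())

print(proposed)
print(total_added)
```

Note that this split was only a proposal; the later comments settle on putting all 30 hosts into the w7 pool.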
Comment 5•9 years ago
talos-linux64-ix-070 through talos-linux64-ix-099 are available for reimaging.
Assignee
Comment 6•9 years ago
(In reply to Armen Zambrano Gasparnian [:armenzg] from comment #4)

I think the allocation distribution is still under discussion since we might target one OS (likely w7) to beef up so we can start turning on some e10s tests. jgriffin was going to talk to the e10s team about options.
Comment 7•9 years ago
yeah I'd like to put them all in the win7 pool; post coming shortly to mozilla.tools!
Assignee
Comment 8•9 years ago
Okay, I'll get started on turning these into w7 machines. I'll need some help from dcops/netops for the VLAN work before I can reinstall them.
Assignee
Comment 9•9 years ago
These will be:
t-w732-ix-204.wintest.releng.scl3.mozilla.com
t-w732-ix-205.wintest.releng.scl3.mozilla.com
t-w732-ix-206.wintest.releng.scl3.mozilla.com
t-w732-ix-207.wintest.releng.scl3.mozilla.com
t-w732-ix-208.wintest.releng.scl3.mozilla.com
t-w732-ix-209.wintest.releng.scl3.mozilla.com
t-w732-ix-210.wintest.releng.scl3.mozilla.com
t-w732-ix-211.wintest.releng.scl3.mozilla.com
t-w732-ix-212.wintest.releng.scl3.mozilla.com
t-w732-ix-213.wintest.releng.scl3.mozilla.com
t-w732-ix-214.wintest.releng.scl3.mozilla.com
t-w732-ix-215.wintest.releng.scl3.mozilla.com
t-w732-ix-216.wintest.releng.scl3.mozilla.com
t-w732-ix-217.wintest.releng.scl3.mozilla.com
t-w732-ix-218.wintest.releng.scl3.mozilla.com
t-w732-ix-219.wintest.releng.scl3.mozilla.com
t-w732-ix-220.wintest.releng.scl3.mozilla.com
t-w732-ix-221.wintest.releng.scl3.mozilla.com
t-w732-ix-222.wintest.releng.scl3.mozilla.com
t-w732-ix-223.wintest.releng.scl3.mozilla.com
t-w732-ix-224.wintest.releng.scl3.mozilla.com
t-w732-ix-225.wintest.releng.scl3.mozilla.com
t-w732-ix-226.wintest.releng.scl3.mozilla.com
t-w732-ix-227.wintest.releng.scl3.mozilla.com
t-w732-ix-228.wintest.releng.scl3.mozilla.com
t-w732-ix-229.wintest.releng.scl3.mozilla.com
t-w732-ix-230.wintest.releng.scl3.mozilla.com
t-w732-ix-231.wintest.releng.scl3.mozilla.com
t-w732-ix-232.wintest.releng.scl3.mozilla.com
t-w732-ix-233.wintest.releng.scl3.mozilla.com
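The hostnames above are a contiguous numeric range, so the list can be regenerated rather than typed by hand. A minimal sketch, assuming the naming pattern seen in this comment; the helper name `w7_hostnames` is hypothetical and not part of any existing tool:

```python
# Domain suffix as it appears in the hostnames listed in comment 9.
DOMAIN = "wintest.releng.scl3.mozilla.com"


def w7_hostnames(first=204, last=233):
    """Return t-w732-ix-NNN hostnames for first..last, inclusive."""
    return [f"t-w732-ix-{n}.{DOMAIN}" for n in range(first, last + 1)]


hosts = w7_hostnames()
for host in hosts:
    print(host)
```

Generating the list this way makes it easy to confirm that the range covers exactly the 30 reimaged machines.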
Assignee: relops → arich
Summary: reimage 30 linux64 hosts as w7 and w8 hosts → reimage 30 linux64 hosts as w7 hosts
Assignee
Comment 10•9 years ago
updated in nagios as well.
Assignee
Comment 11•9 years ago
I got some of these to install, but others don't seem to be making it through the process, and I'm not sure how to debug.

Q, could you take a look at the following to make sure that they have everything installed properly:
204 205 208 210 212 213 214 218 219 220 226 227 228 229 231 232 233

And take a look at the following to see why they never complete:
206 207 209 211 215 216 217 221 222 223 224 225 230
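As a sanity check on the two lists in this comment, the sketch below confirms that together they cover every host from 204 to 233 exactly once. The set names are illustrative; the numbers are copied from the comment:

```python
# Hosts that installed and need verification, per comment 11.
installed = {204, 205, 208, 210, 212, 213, 214, 218, 219, 220,
             226, 227, 228, 229, 231, 232, 233}

# Hosts whose install never completes, per comment 11.
stuck = {206, 207, 209, 211, 215, 216, 217, 221, 222, 223, 224, 225, 230}

# The full range of reimaged hosts, 204 through 233 inclusive.
expected = set(range(204, 234))

overlap = installed & stuck               # hosts listed in both groups
missing = expected - (installed | stuck)  # hosts listed in neither group

print(len(installed), len(stuck), sorted(overlap), sorted(missing))
```

An empty overlap and an empty missing set mean the two groups partition the 30 hosts cleanly.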
Flags: needinfo?(q)
Comment 12•9 years ago
Finally found something useful. These machines are BSODing with a stop error of STOP: 0x00000019 when trying to reboot. The description of this error is "The issue occurs because the Hdaudio.sys driver tries to process the audio information when the uninitialized internal firmware memory within the video card is set in a certain manner." This error, combined with a vague log entry of "invalid gpu version", seems to be coming from something in the firmware on the video cards.
Flags: needinfo?(q)
Assignee
Comment 14•9 years ago
The following should be ready to go into the pool, according to Q:
t-w732-ix-204.wintest.releng.scl3.mozilla.com
t-w732-ix-205.wintest.releng.scl3.mozilla.com
t-w732-ix-208.wintest.releng.scl3.mozilla.com
t-w732-ix-210.wintest.releng.scl3.mozilla.com
t-w732-ix-212.wintest.releng.scl3.mozilla.com
t-w732-ix-213.wintest.releng.scl3.mozilla.com
t-w732-ix-214.wintest.releng.scl3.mozilla.com
t-w732-ix-218.wintest.releng.scl3.mozilla.com
t-w732-ix-219.wintest.releng.scl3.mozilla.com
t-w732-ix-220.wintest.releng.scl3.mozilla.com
t-w732-ix-226.wintest.releng.scl3.mozilla.com
t-w732-ix-227.wintest.releng.scl3.mozilla.com
t-w732-ix-228.wintest.releng.scl3.mozilla.com
t-w732-ix-229.wintest.releng.scl3.mozilla.com
t-w732-ix-231.wintest.releng.scl3.mozilla.com
t-w732-ix-232.wintest.releng.scl3.mozilla.com
t-w732-ix-233.wintest.releng.scl3.mozilla.com
Comment 15•9 years ago
(In reply to Joel Maher (:jmaher) from comment #2)
> this is more of a juggling act- we have really long backlogs on windows, and
> almost no wait on linux64. It appears we run graphics as well as talos on
> there. Lets look at data and see what the historic wait times are for this
> hardware pool. Coop was going to look at historical wait times.

I've done this now.

I still have reservations about this approach, but we are really stuck for options here. My biggest worry is that we're cannibalizing one of the few platforms where we currently *don't* have massive wait times for gfx performance testing.

Going through the historical data, we've gone over 2000 test jobs for ubuntu64_hw 23 times. Sometimes that causes us to miss our 95% commitment, but mostly it doesn't. It really depends on whether the requests all come in at once or not, i.e. is the load "bursty" or not.

Does this get worse when we take away a third of the pool? Likely, but we have few alternatives.
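The wait-time question discussed in this thread comes down to what fraction of jobs start within some threshold. A minimal sketch of that calculation, assuming a 15-minute threshold as in comment 1 and a 95% target as mentioned above; the function name and the sample waits are made up for illustration and are not real data from this pool:

```python
def fraction_started_within(wait_minutes, threshold=15):
    """Return the fraction of jobs whose wait was under `threshold` minutes."""
    if not wait_minutes:
        return 0.0
    on_time = sum(1 for wait in wait_minutes if wait < threshold)
    return on_time / len(wait_minutes)


# Hypothetical per-job wait times in minutes, purely illustrative.
sample_waits = [2, 5, 9, 14, 16, 31, 4, 7, 12, 45]

fraction = fraction_started_within(sample_waits)

# Assumed target: 95% of jobs start within the threshold.
meets_commitment = fraction >= 0.95

print(f"{fraction:.0%} started within 15 minutes; meets target: {meets_commitment}")
```

Running the same calculation over the real historical waits, before and after shrinking the pool, would show how much a bursty day hurts once a third of the machines are gone.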
Flags: needinfo?(coop)
Assignee
Comment 16•9 years ago
Q: can you verify that the following installed correctly? All but 224 and 230 had their graphics cards swapped out (we were using non-pingable as an indicator of BSOD issues with the graphics card, possibly firmware incompatibilities):
t-w732-ix-206.wintest.releng.scl3.mozilla.com
t-w732-ix-207.wintest.releng.scl3.mozilla.com
t-w732-ix-209.wintest.releng.scl3.mozilla.com
t-w732-ix-211.wintest.releng.scl3.mozilla.com
t-w732-ix-215.wintest.releng.scl3.mozilla.com
t-w732-ix-216.wintest.releng.scl3.mozilla.com
t-w732-ix-217.wintest.releng.scl3.mozilla.com
t-w732-ix-221.wintest.releng.scl3.mozilla.com
t-w732-ix-222.wintest.releng.scl3.mozilla.com
t-w732-ix-224.wintest.releng.scl3.mozilla.com
t-w732-ix-230.wintest.releng.scl3.mozilla.com

sal replaced the graphics cards in the following and kicked off a reimage, but they hadn't finished yet. There's a good chance they'll also be done by the time you get to this, so please give them a look, too.
t-w732-ix-223.wintest.releng.scl3.mozilla.com
t-w732-ix-225.wintest.releng.scl3.mozilla.com
Flags: needinfo?(q)
Comment 17•9 years ago
Checked:
t-w732-ix-206.wintest.releng.scl3.mozilla.com
t-w732-ix-207.wintest.releng.scl3.mozilla.com
t-w732-ix-209.wintest.releng.scl3.mozilla.com
t-w732-ix-211.wintest.releng.scl3.mozilla.com
t-w732-ix-215.wintest.releng.scl3.mozilla.com
t-w732-ix-216.wintest.releng.scl3.mozilla.com
t-w732-ix-217.wintest.releng.scl3.mozilla.com

All look good. They need to be enabled and rebooted to be added to the pool.
Flags: needinfo?(q)
Comment 18•9 years ago
221, 222, and 225 are all good now.
Assignee
Comment 20•9 years ago
I've rebooted and added all but 223 and 230 into the pool.
Comment 21•9 years ago
223 says there is a monitor attached. Has stuck past one reboot. Doing it again just in case.
Comment 22•9 years ago
230 was offline. Brought it back up and checking now. In other news, tests are looking green on the boxes that already went into the pool.
Flags: needinfo?(q)
Comment 23•9 years ago
223 is back up and looks good without a reported monitor. I am adding it to the slave pool and rebooting.
Comment 24•9 years ago
All machines are up and in the slave pool.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 25•9 years ago
Thanks Q and Amy!