Closed
Bug 624124
Opened 14 years ago
Closed 13 years ago
Please re-image talos-r3-w7-048
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: coop, Assigned: zandr)
References
Details
(Whiteboard: [reimage])
bjacob just finished with talos-r3-w7-048 in bug 623278, so we should re-image this slave to get it back to a known state.
Updated•14 years ago
Assignee: server-ops → zandr
Comment 1•14 years ago
zandr: mind if I hold on to this slave for a little bit? I promise I will give it back! :)
Assignee
Comment 2•14 years ago
armen: WFM, assign the bug back to me when you're done.
Assignee: zandr → armenzg
Comment 3•13 years ago
Armen, can we close this? You can reopen or file a new one when you are done.
Comment 4•13 years ago
Please go ahead and reimage. I am done with this machine. Thanks!

PS: Wait times for win7 machines are quite bad because of two things:
* several machines are down for re-imaging (since we loaned several of them)
* win7 jobs take longer

I will have to check how many are out of action, but we will have to see how to improve our re-imaging turnaround.
Assignee: armenzg → zandr
Updated•13 years ago
Component: Server Operations → Server Operations: RelEng
QA Contact: mrz → zandr
Assignee
Comment 5•13 years ago
(In reply to comment #4)
> I will have to check how many are out of action but we will have to see how
> to improve our re-imaging turn around.

There are exactly two ways to do this:
1) Stop using Minis for OSes other than Mac OS.
2) Hire more minions.

The former scales much better than the latter.
Comment 6•13 years ago
Please ignore the previous comment. There are not that many machines waiting for re-imaging (sometimes releng takes a long time to file the bug after the loan is over). Your turnaround is good. My apologies, zandr/IT.

There were more than 15 w7 slaves out of action:
* 1 caught correctly by nagios
* a few slaves with buildbot not running, hence PINGable (I have a solution to tackle this)
* a few that were running buildbot but "hung", running a job for days (I have another solution to tackle this as well)

My apologies again; it was a mistake to say that, and I spoke incorrectly. See bug 627070 if you are curious about what was going on.
Assignee
Comment 7•13 years ago
Though, I did miss this bug last night while I was at the colo. :D I was just working from bug 620948 and missed this one and a couple of others. I will make a short stop there again before Monday.
Comment 8•13 years ago
No worries. Shall we have a reimages tracking bug and add dependencies like this one to it? I wonder if having a single point would also help us see, over time, how many machines we reimage. Not sure it has too much value. It seems that we get more w7 reimages since devs book them more often. Anyway, just thinking out loud.

Have a good weekend,
Armen
Whiteboard: [reimage]
Comment 9•13 years ago
https://spreadsheets.google.com/ccc?key=0AqefQEn4Wp2ydFVjSkMwM1ZlS28xdVRaVDNHUEpLaEE&hl=en is the current best tracker, but it does not have historical information.
Assignee
Comment 10•13 years ago
I have some partially formed thoughts about leveraging nagios (which already has the alert and ack history) to do all of this tracking. It depends on some additional work in nagios to make it sane, but I'd be happy to chat about it with anyone about to embark on creating a different system. :D Otherwise, stay tuned.
Comment 11•13 years ago
nagios scans logfiles for its history, right? Isn't that why the historical queries are so slow?
Assignee
Comment 12•13 years ago
Reimaged, needs setup.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations