Closed Bug 624124 Opened 14 years ago Closed 13 years ago

Please re-image talos-r3-w7-048

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: coop, Assigned: zandr)

References

Details

(Whiteboard: [reimage])

Chris Cooper [:coop] (he/him)

Reporter

Description

•

14 years ago

bjacob just finished with talos-r3-w7-048 in bug 623278, so we should re-image this slave to get it back to a known state.

Chris Cooper [:coop] (he/him)

Reporter

Updated

•

14 years ago

Blocks: 623274

Corey Shields [:cshields]

Updated

•

14 years ago

Assignee: server-ops → zandr

Armen [:armenzg]

Comment 1

•

14 years ago

zandr, mind if I hold on to this slave for a little bit?

I promise I will give it back! :)

Zandr Milewski [:zandr]

Assignee

Comment 2

•

14 years ago

armen: WFM, assign the bug back to me when you're done.

Assignee: zandr → armenzg

Corey Shields [:cshields]

Comment 3

•

13 years ago

Armen, can we close this? You can reopen or file a new one when you are done.

Armen [:armenzg]

Comment 4

•

13 years ago

Please go ahead and reimage.
I am done with this machine.

Thanks!

PS: Wait times for win7 machines are quite bad because of two things:
* several machines are down to re-image (since we loaned several of them)
* win7 jobs take longer

I will have to check how many are out of action but we will have to see how to improve our re-imaging turn around.

Assignee: armenzg → zandr

Corey Shields [:cshields]

Updated

•

13 years ago

Component: Server Operations → Server Operations: RelEng

QA Contact: mrz → zandr

Zandr Milewski [:zandr]

Assignee

Comment 5

•

13 years ago

(In reply to comment #4)
 
> I will have to check how many are out of action but we will have to see how to
> improve our re-imaging turn around.

There are exactly two ways to do this:

1) Stop using Minis for OSes other than Mac OS.
2) Hire more minions.

The former scales much better than the latter.

Armen [:armenzg]

Comment 6

•

13 years ago

Please ignore the previous.

There are not that many machines waiting for re-imaging (sometimes we in releng take a long time to file the bug after the loan is over). Your turnaround is good. My apologies, zandr/IT.

There were more than 15 w7 slaves out of action.
* 1 caught correctly by nagios
* a few slaves where buildbot was not running although the machine was still PINGable. I have a solution to tackle this
* a few of them were running buildbot but "hung", running a single job for days. I have another solution to tackle this as well

My apologies again; it was a mistake to say that. I spoke incorrectly.

See bug 627070 if you are curious about what was going on.
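
The three failure modes listed above (down, PINGable but without buildbot, and running buildbot but hung on one job) can be told apart with a simple check. The following is only a rough sketch of that idea, not the solution referred to in this comment: the `SlaveStatus` record, the `classify` function, the host names, and the six-hour threshold are all hypothetical, and how the status data would actually be collected is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SlaveStatus:
    """Hypothetical snapshot of one slave; the fields are assumptions."""
    name: str
    pingable: bool
    buildbot_running: bool
    job_started: Optional[datetime]  # None when the slave is idle


# Assumed threshold: a job running longer than this is treated as hung.
MAX_JOB_AGE = timedelta(hours=6)


def classify(status: SlaveStatus, now: datetime,
             max_job_age: timedelta = MAX_JOB_AGE) -> str:
    """Return 'down', 'no-buildbot', 'hung', or 'ok' for one slave."""
    if not status.pingable:
        # The case a plain ping check already catches.
        return "down"
    if not status.buildbot_running:
        # Machine answers pings, but buildbot is not running on it.
        return "no-buildbot"
    if status.job_started is not None and now - status.job_started > max_job_age:
        # Buildbot is up, but the current job has run far too long.
        return "hung"
    return "ok"


def problem_slaves(statuses, now):
    """Map slave name to its state, for every slave that is not 'ok'."""
    states = {s.name: classify(s, now) for s in statuses}
    return {name: state for name, state in states.items() if state != "ok"}
```

A ping-only check would report just the first case, which would fit with only one of the slaves here having been caught by nagios.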

Zandr Milewski [:zandr]

Assignee

Comment 7

•

13 years ago

Though, I did miss this bug last night while I was at the colo. :D

I was just working from bug 620948, and missed this and a couple of others. Will make a short stop there again before Monday.

Armen [:armenzg]

Comment 8

•

13 years ago

No worries.

Shall we have a tracking bug for reimages and add bugs like this one to it as dependencies?

I wonder if having a single point will also help us see over time how many machines we reimage? Not sure if it has much value.

It seems that we get more w7 reimages since devs book them more often.

Anyway, just thinking out loud.

Have a good weekend,
Armen

Whiteboard: [reimage]

Dustin J. Mitchell [:dustin] (he/him)

Comment 9

•

13 years ago

https://spreadsheets.google.com/ccc?key=0AqefQEn4Wp2ydFVjSkMwM1ZlS28xdVRaVDNHUEpLaEE&hl=en is the current best tracker, but it does not have historical information.

Zandr Milewski [:zandr]

Assignee

Comment 10

•

13 years ago

I have some partially formed thoughts about leveraging nagios (which already has the alert and ack history) to do all of this tracking.

Depends on some additional work in nagios to make it sane, but I'd be happy to chat about this with anyone about to embark on creating a different system. :D

Otherwise, stay tuned.

Dustin J. Mitchell [:dustin] (he/him)

Comment 11

•

13 years ago

nagios scans logfiles for its history, right?  Isn't that why the historical queries are so slow?

Zandr Milewski [:zandr]

Assignee

Comment 12

•

13 years ago

Reimaged, needs setup.

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

11 years ago

Component: Server Operations: RelEng → RelOps

Product: mozilla.org → Infrastructure & Operations

Categories

(Infrastructure & Operations :: RelOps: General, task)
