Closed Bug 1234261 (t-w732-ix-195) Opened 9 years ago Closed 8 years ago

t-w732-ix-195 problem tracking

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: philor, Unassigned)

Details

(Whiteboard: [buildduty][buildslaves][capacity])

Phil Ringnalda (:philor)

Reporter

Description

•

9 years ago

On a rampage of failing the test runs when it manages to stay connected through s whole one, which it rarely does. Disabled.

Vlad Ciobancai [:vladC] [:vciobancai]

Comment 1

•

9 years ago

Re-imaged and enabled in slavealloc

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Phil Ringnalda (:philor)

Reporter

Comment 2

•

8 years ago

Heh. Didn't realize it had been that long that Q and grenade were running AWS slaves with various names which lied and pretended that they were this slave.

If something fails and claims to be this slave, you can bet it isn't actually this slave, which you can usually, at least so far, determine by looking in the log for the spew of env vars for a computername like T-W7-AWS-BASE.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Phil Ringnalda (:philor)

Reporter

Comment 3

•

8 years ago

Or, alternate fun possibility, that reimage at the time when 195 was both hardware and a lying AWS instance might have reimaged the hardware slave to think that its computername was T-W7-AWS-BASE, though probably not.

Vlad Ciobancai [:vladC] [:vciobancai]

Comment 4

•

8 years ago

:philor should we try to re-image again the slave ?

Vlad Ciobancai [:vladC] [:vciobancai]

Updated

•

8 years ago

Flags: needinfo?(philringnalda)

Phil Ringnalda (:philor)

Reporter

Comment 5

•

8 years ago

Maybe?

The only information I have is that while t-w732-ix-195 is enabled in slavealloc, something with the slavename t-w732-ix-195 and the env var "computername" t-w7-aws-base takes jobs and fails talos jobs by trying to tell graphserver that its name is t-w7-aws-base, and since I disabled t-w732-ix-195 in slavealloc I haven't seen another instance of that.

I can imagine that part of the setup to have t-w7-aws-* lie and claim to be t-w732-ix-195 resulted in a reimage of the actual t-w732-ix-195 being broken and that that has been reverted and another reimage would fix it; I can imagine that it hasn't been reverted and another reimage won't fix it; I can imagine that rather than it being the actual t-w732-ix-195 which was failing jobs it was a t-w7-aws instance which is obeying the disabling of t-w732-ix-195 in slavealloc; I can imagine that it was a t-w7-aws instance but rather than obeying the disabling it just happened to have been terminated around the time I disabled t-w732-ix-195.

Flags: needinfo?(philringnalda) → needinfo?(rthijssen)

Rob Thijssen [:grenade (EET/UTC+0300)]

Comment 6

•

8 years ago

I have looked in the EC2 instance list and seen that there is an instance sharing the name "t-w732-ix-195". The instance ID is i-8454df32 and has a moz-owner tag of q@mozilla.com. The instance state was 'stopped' at the time that I checked but I would guess that if it were started, it would create the sort of problems described above. I believe that the instance is probably being used to create the base image that we will later use to spawn golden and spot images. It would probably benefit from having its name changed to a name that doesn't exist in slave-alloc or buildbot-configs but as it isn't my instance, I don't want to make that change, in case there are circumstances or reasons I haven't considered for the name to remain as is.

Flags: needinfo?(rthijssen)

Alin Selagea [:aselagea]

Comment 7

•

8 years ago

@Q: would it be possible to change the name of the instance to one that doesn't exist in slavealloc or buildbot-configs?

Thanks.

Flags: needinfo?(q)

Comment 8

•

8 years ago

Alin,  I wiped out these instances and made sure that host name is out of the testing loop. I am making sure that the real 195 is working today.

Flags: needinfo?(q)

Comment 9

•

8 years ago

195 is taking jobs back in scl3 and things look good so far.

Alin Selagea [:aselagea]

Comment 10

•

8 years ago

Mostly green jobs at the moment. Marking as resolved for now.

Status: REOPENED → RESOLVED

Closed: 9 years ago → 8 years ago

Resolution: --- → FIXED

BMO Automation

Updated

•

6 years ago

Product: Release Engineering → Infrastructure & Operations

BMO Automation

Updated

•

4 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

t-w732-ix-195 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

Tracking

(Not tracked)

People

(Reporter: philor, Unassigned)

References

Details

(Whiteboard: [buildduty][buildslaves][capacity])

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Updated

Updated