talos-r3-w7-038 isn't processing minidumps properly / is crashing extremely frequently during tests

RESOLVED FIXED

Status

Infrastructure & Operations
Buildduty
--
major
RESOLVED FIXED
6 years ago
16 days ago

People

(Reporter: emorley, Assigned: billm)

Tracking

Details

(Whiteboard: [buildduty])

(Reporter)

Description

6 years ago
Bug 768156, bug 767906 & bug 768483 lack proper stacks & are all from the same slave (talos-r3-w7-038).
(Reporter)

Updated

6 years ago
Summary: talos-r3-w7-038 isn't processing minidumps properly → talos-r3-w7-038 isn't processing minidumps properly / is crashing extremely frequently during tests
(Reporter)

Updated

6 years ago
Blocks: 767908, 767932, 767937, 767970, 768006, 768036, 768160, 768397, 768402, 768415, 768418, 768425, 768433, 768434, 768439, 768453, 768486
(Reporter)

Comment 5

6 years ago
Please may we remove this slave from production?
Severity: major → critical
Actually, we see users who have crashes like this, and we can never solve them. Could we try to figure out what's wrong with this slave so we can help our users?
(Reporter)

Comment 7

6 years ago
Armen, please may you give billm access to the slave after taking it out of production. Thanks! :-)

Updated

6 years ago
Component: Release Engineering → Release Engineering: Machine Management
QA Contact: release → armenzg
Whiteboard: [buildduty]
I disabled the slave on slavealloc and added a note.

billm, would you be able to look into what is wrong with this slave?
(Reporter)

Updated

6 years ago
Blocks: 768424
(Reporter)

Updated

6 years ago
Blocks: 768009
(Reporter)

Updated

6 years ago
Blocks: 767925
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #8)
> I disabled the slave on slavealloc and added a note.
> 
> billm, would you be able to look into what is wrong with this slave?

Yes, I'd like to try. At the very least, it would be interesting to run a memory test. It may just be a hardware malfunction. Can you please send me a login?
I asked IT to get you access.
You should have the machine ready tomorrow morning.
Assignee: nobody → armenzg
I have sent the credentials to Bill.
Assignee: armenzg → wmccloskey

Comment 12

6 years ago
Lowering severity from critical to major, since the slave is now in a developer's hands to work with.
Severity: critical → major
Bill: any update on the status of this slave? Found anything? Are you done with it?
I ran a memory test and didn't find anything. I tried running a few tests and none of them failed.

However, I don't think we can add this slave back to the pool. It will likely just cause more failures.

Updated

6 years ago
Depends on: 776924
(In reply to Bill McCloskey (:billm) from comment #14)
> I ran a memory test and didn't find anything. I tried running a few tests
> and none of them failed.
> 
> However, I don't think we can add this slave back to the pool. It will
> likely just cause more failures.

OK, thanks for trying. I'll get IT to re-educate the machine in bug 776924.
This slave has been fixed, at least in theory. If this recurs, please re-open and we'll just decommission this slave.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering

Updated

16 days ago
Product: Release Engineering → Infrastructure & Operations