Reboot requests requiring investigation|reimages (scl1)

RESOLVED FIXED

Status

Release Engineering
General
P2
normal
RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: MaRu, Assigned: coop)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [buildduty][buildslaves])

(Reporter)

Description

6 years ago
talos-r3-fed64-005
No video. Rebooted and still no video after the reboot. Tried a hard reboot and now have video, but it reports: Cannot mount root filesystem, boot has failed, sleeping forever. Will reimage in hopes of bringing it back to life.

talos-r3-fed64-018
Hangs at Initializing network drop monitor service, rebooted with same results

talos-r3-fed64-035
Was stuck at a green screen, after reboot no video display

talos-r3-w7-045
Throwing errors with the Python script; needs debugging or a reimage

talos-r3-fed-045
No display even after reboot, unable to ping
(Reporter)

Updated

6 years ago
colo-trip: --- → scl1
Assignee: server-ops-releng → mlarrain
It sounds like a reimage is the next step for each of these?

Can you txt me a screenshot from 045?
(Reporter)

Comment 2

6 years ago
talos-r3-fed-048          Boot has failed, sleeping forever
Assignee: mlarrain → server-ops-releng
(Reporter)

Comment 3

6 years ago
talos-r3-fed64-055         Boot has failed
(Reporter)

Comment 4

6 years ago
talos-r3-w7-045 has been reimaged
(In reply to Matthew Larrain[:digipengi] from comment #4)
> talos-r3-w7-045 has been reimaged

But not successfully. Pilot error the first time 'round, and then it fell off the net when I tried to reimage it the second time. Next person in scl1 please boot it into DeployStudio again.
fed64-053 and fed-045 still don't respond to ping after a reboot. 
Should investigate these with a crash cart.
Assignee: server-ops-releng → mlarrain
(Reporter)

Comment 7

6 years ago
talos-r4-snow-047               Not powering on

talos-r3-fed-045.build.scl1     No display|rebooted|Needs reimage

talos-r3-fed64-028.build.scl1   Stuck at Initializing network drop monitor service
talos-r3-fed64-031.build.scl1   Boot has failed, sleeping forever
talos-r3-fed64-034.build.scl1   Stuck at Initializing network drop monitor service
talos-r3-fed64-048.build.scl1   Stuck at Initializing network drop monitor service
talos-r3-fed64-049.build.scl1   Stuck at Initializing network drop monitor service
talos-r3-fed64-053.build.scl1   No display|rebooted|Needs reimage
(Reporter)

Comment 8

6 years ago
Reimaged talos-r3-fed-045.build.scl1. It needs its hostname and anything else done to it that releng requires.
talos-r3-w7-045 has been reimaged.
(Reporter)

Comment 10

6 years ago
talos-r3-fed64-053 Needs releng to finish hostname change
talos-r3-fed64-049 Needs releng to finish hostname change
talos-r3-fed64-048 reimaged, stuck at Initializing network drop monitor service
talos-r3-fed64-034 reimaged, stuck at Initializing network drop monitor service
talos-r3-fed64-031 Needs releng to finish hostname change
talos-r3-fed64-030 Needs releng to finish hostname change
talos-r3-fed64-028 reimaged, stuck at Initializing network drop monitor service

For the three still experiencing issues, dustin has filed new bug 699250.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Reporter)

Comment 11

6 years ago
Oops, didn't mean to close this, as releng still needs to get the hostnames for the working boxes updated.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: mlarrain → nobody
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
(Assignee)

Updated

6 years ago
OS: Mac OS X → All
Priority: -- → P3
Hardware: x86 → All
Whiteboard: [buildduty][buildslaves]

Updated

6 years ago
Priority: P3 → --
(Assignee)

Updated

6 years ago
Assignee: nobody → coop
Status: REOPENED → NEW
Priority: -- → P3
(Assignee)

Comment 12

6 years ago
(In reply to Matthew Larrain[:digipengi] from comment #10)
> talos-r3-fed64-053 Needs releng to finish hostname change
> talos-r3-fed64-049 Needs releng to finish hostname change
> talos-r3-fed64-048 reimaged, stuck at Initializing network drop monitor service
> talos-r3-fed64-034 reimaged, stuck at Initializing network drop monitor service
> talos-r3-fed64-031 Needs releng to finish hostname change
> talos-r3-fed64-030 Needs releng to finish hostname change
> talos-r3-fed64-028 reimaged, stuck at Initializing network drop monitor service

To refresh this list, it's only the following machines that need releng intervention:

talos-r3-fed64-030
talos-r3-fed64-031
talos-r3-fed64-053

All the others are being handled in bug 699250.
Status: NEW → ASSIGNED
Priority: P3 → P2

Comment 13

6 years ago
(In reply to Matthew Larrain[:digipengi] from comment #0)
> talos-r3-fed64-035
> Was stuck at a green screen, after reboot no video display
> 
What happened with this slave? It has not taken a job for 35 days.
(Assignee)

Comment 14

6 years ago
Amy: is DNS up-to-date for the 3 slaves listed in comment #12? I'm having a devil of a time getting them to connect to the puppet masters -> getting "dnsdomainname: Unknown host" every time.
coop: did you change the hostname on the machines that were reimaged before you tried to puppet them (see comment 10)?
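A quick way to narrow down the "Unknown host" symptom above is to check which slave names resolve at all. The following is a minimal sketch only, not the procedure used in this bug: the function name `unresolved_hosts` is hypothetical, and the `.build.scl1` name form is assumed from comment 7 (the real fully qualified names may differ).

```python
import socket

# Slaves listed in comment #12. The ".build.scl1" suffix follows the naming
# seen in comment 7 and is an assumption; adjust to the real FQDNs.
SLAVES = [
    "talos-r3-fed64-030.build.scl1",
    "talos-r3-fed64-031.build.scl1",
    "talos-r3-fed64-053.build.scl1",
]


def unresolved_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames that fail to resolve.

    `resolve` is injectable so the check can be exercised without
    touching real DNS; by default it uses the system resolver.
    """
    missing = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(name)
    return missing
```

Calling `unresolved_hosts(SLAVES)` from a machine on the same network returns the names that still have no DNS entry. An empty list suggests DNS is fine and the problem is more likely the hostname set on the slave itself.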
(Assignee)

Updated

6 years ago
Duplicate of this bug: 685151
(Assignee)

Comment 17

6 years ago
(In reply to Chris Cooper [:coop] from comment #14)
> Amy: is DNS up-to-date for the 3 slaves listed in comment #12? I'm having a
> devil of a time getting them to connect to the puppet masters -> getting
> "dnsdomainname: Unknown host" every time.

After many iterations, I got these machines syncing with the production puppet master. They're all back in service now.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED

Comment 18

6 years ago
(In reply to Matthew Larrain[:digipengi] from comment #8)
> Reimaged talos-r3-fed-045.build.scl1. It needs its hostname and anything
> else done to it that releng requires.

I just fixed the hostname for this one, but it needs watching to make sure it gets synced up and takes jobs. Added a slave-alloc note.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 19

6 years ago
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #18) 
> I just fixed the hostname for this one, but it needs watching to make sure
> it gets synced up and takes jobs. Added a slave-alloc note.

Did the puppet dance with this one. Fixed.
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering