Closed Bug 614821 Opened 14 years ago Closed 14 years ago

reboots 20101125

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
macOS
task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: zandr)

References

Details

(Whiteboard: [needs SCL visit])

talos-r3-w7-012.build - unreachable
talos-r3-w7-040.build - host unknown
Blocks: 562459
linux-ix-slave12 -- down and unreachable via IPMI.
mw32-ix-slave22 - down and unreachable by ipmi
talos-r3-fed-012
talos-r3-fed-027
talos-r3-fed64-030
talos-r3-fed64-049
talos-r3-xp-012
talos-r3-fed64-053
talos-r3-fed64-047.build
talos-r3-fed64-043.build
talos-r3-fed64-013.build
talos-r3-fed64-001.build
talos-r3-fed-040.build
talos-r3-fed-036.build
talos-r3-fed-027.build
talos-r3-fed-024.build
talos-r3-fed-022.build
talos-r3-fed-030.build
talos-r3-xp-052.build
talos-r3-w7-009.build
Flags: colo-trip+
Whiteboard: [needs SCL visit]
talos-r3-fed-009.build
talos-r3-w7-036.build
jlaz/jabba? Phong's outta town. jlaz, punt around as needed.

Thanks guys!
Assignee: server-ops → jlazaro
talos-r3-w7-011.build
talos-r3-fed-044.build
talos-r3-fed-037.build

Bumping severity because we've lost 20% of our Fedora 32-bit pool.
Severity: normal → critical
talos-r3-fed64-016.build

Any ETA on these?
Copying from previous reboot bug.

Needs re-image (mount errors):

fed-012
fed-022
fed-024
fed-036
fed-040

fed64-053
Is there any way we can get these done today? The 32-bit Fedora wait times are getting really bad.
Severity: critical → blocker
Assignee: jlazaro → server-ops
It would be really nice to figure out the issue (hw clock issue) that causes the Fedora minis to need to be re-imaged on a regular basis. Re-imaging isn't as quick as rebooting.
Assignee: server-ops → zandr
What happened to cause so many Fedora hosts to need manual reboots?

What happened to require so many reimages?
w7-011: rebooted
fed-044: rebooted
fed-037: rebooted
fed-009: rebooted
w7-009: rebooted
xp-052: offline for power reasons
fed-012: rebooted
fed-027: rebooted
fed64-030: on my desk in MV
fed64-049: offline for power reasons
xp-012: offline for power reasons
fed64-053: offline for power reasons
fed64-016: rebooted
fed64-047: MIA, possibly in MV
fed64-043: offline for power reasons
fed64-013: rebooted
fed64-001: rebooted
fed-040: rebooted
fed-036: rebooted
fed-027: rebooted
fed-024: rebooted
fed-022: rebooted
fed-030: rebooted
w7-036: rebooted
fed64-053: offline for power reasons
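
For anyone tallying a status list like the one above, here is a minimal sketch in Python that groups `host: status` lines by status. The function name `tally_statuses` and the sample lines are illustrative assumptions, not part of any tooling mentioned in this bug.

```python
from collections import defaultdict


def tally_statuses(lines):
    """Group 'host: status' lines by status (hypothetical helper).

    Lines without a colon are skipped, and a host repeated under the
    same status is only counted once.
    """
    groups = defaultdict(list)
    for line in lines:
        host, sep, status = line.partition(":")
        if not sep:
            continue  # not a 'host: status' line
        host, status = host.strip(), status.strip()
        if host not in groups[status]:
            groups[status].append(host)
    return dict(groups)


# A few lines copied from the list above, including one repeated entry.
sample = [
    "fed-044: rebooted",
    "xp-052: offline for power reasons",
    "fed-027: rebooted",
    "fed-027: rebooted",
    "fed64-030: on my desk in mv",
]

for status, hosts in tally_statuses(sample).items():
    print(f"{status}: {len(hosts)} ({', '.join(hosts)})")
```

Grouping by the literal status text keeps one-off notes such as "on my desk in mv" visible as their own bucket.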
(In reply to comment #16)

Still can't ping:

> fed-044: rebooted
> fed-012: rebooted
> fed64-016: rebooted
> fed64-001: rebooted
> fed-040: rebooted
> fed-036: rebooted
> fed-024: rebooted
> fed-022: rebooted
(In reply to comment #16)
These seem to be online and connected
> w7-011: rebooted
> fed-037: rebooted
> fed-009: rebooted
> fed64-013: rebooted
> fed-027: rebooted

Online, needs puppet cleanup:
> fed-027: rebooted

Also not pingable:
> w7-036: rebooted
> w7-009: rebooted
> fed-030: rebooted
fed-012: reimaged
fed-022: reimaged
fed-024: reimaged
fed-036: reimaged
fed-040: pulled. Has a CD stuck in the drive that it won't boot from.
>linux-ix-slave12 -- down and unreachable via IPMI.
>mw32-ix-slave22 - down and unreachable by ipmi

Bounced around 18:00 PDT.
Can we reboot these?

talos-r3-w7-052.build  7d 7h 25m 50s
talos-r3-w7-036.build  6d 22h 13m 44s
talos-r3-w7-032.build  0d 18h 6m 56s
talos-r3-w7-012.build  16d 14h 55m 21s
talos-r3-w7-009.build  11d 15h 34m 56s
talos-r3-w7-008.build  1d 14h 57m 51s

That's ~10% of our win7 capacity.
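
The thread does not say what the durations in the list above measure. Whatever they are, a minimal Python sketch like the following could order such a list so the longest-outstanding hosts come first. The names `parse_duration` and `longest_first` are hypothetical helpers, and the `Nd Nh Nm Ns` format is assumed from the lines as pasted.

```python
import re
from datetime import timedelta

# Matches durations written as in the list above, e.g. "16d 14h 55m 21s".
_DURATION = re.compile(r"(\d+)d\s+(\d+)h\s+(\d+)m\s+(\d+)s")


def parse_duration(text):
    """Convert an 'Nd Nh Nm Ns' string into a timedelta (hypothetical helper)."""
    match = _DURATION.search(text)
    if match is None:
        raise ValueError(f"no duration found in {text!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def longest_first(rows):
    """Return (host, duration) pairs sorted with the longest duration first."""
    parsed = []
    for row in rows:
        host, rest = row.split(None, 1)  # hostname, then the duration text
        parsed.append((host, parse_duration(rest)))
    return sorted(parsed, key=lambda pair: pair[1], reverse=True)


rows = [
    "talos-r3-w7-052.build  7d 7h 25m 50s",
    "talos-r3-w7-032.build  0d 18h 6m 56s",
    "talos-r3-w7-012.build  16d 14h 55m 21s",
]

for host, duration in longest_first(rows):
    print(host, duration)
```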
(In reply to comment #25)
> Can we reboot these?
> 
> talos-r3-w7-052.build  7d 7h 25m 50s
> talos-r3-w7-036.build  6d 22h 13m 44s
> talos-r3-w7-032.build  0d 18h 6m 56s
> talos-r3-w7-012.build  16d 14h 55m 21s
> talos-r3-w7-009.build  11d 15h 34m 56s
> talos-r3-w7-008.build  1d 14h 57m 51s
> 
> That's ~10% of our win7 capacity.

Will swing by scl1 on the way home tonight.

-Z
This could wait until Monday/Tuesday, as there are no pending jobs and at this time of day people won't be pushing like mad.

Your call.

Have a good weekend.
Given the all-hands next week, I'm not certain I'll be able to get down there then. It's not really out of my way tonight.
(In reply to comment #25)
> Can we reboot these?
> 
> talos-r3-w7-052.build  7d 7h 25m 50s
> talos-r3-w7-036.build  6d 22h 13m 44s
> talos-r3-w7-032.build  0d 18h 6m 56s
> talos-r3-w7-012.build  16d 14h 55m 21s
> talos-r3-w7-009.build  11d 15h 34m 56s
> talos-r3-w7-008.build  1d 14h 57m 51s

Rebooted, responding to ping, lots of ports open.

Two of these we'd pulled power on, but I think we can get away with it for now.
This looks like the list of machines in Nagios that need rebooting; sorry for any dups.

linux-ix-slave31.build.scl1
linux-ix-slave32.build.scl1
linux-ix-slave33.build.scl1
linux-ix-slave35.build.scl1
linux-ix-slave38.build.scl1
linux-ix-slave42.build.scl1
mv-moz2-linux-ix-slave05.build
talos-r3-fed-012.build
talos-r3-fed-029.build
talos-r3-fed-033.build
talos-r3-fed-036.build
talos-r3-fed-038.build
talos-r3-fed-041.build
talos-r3-fed64-021.build
talos-r3-fed64-027.build
talos-r3-fed64-044.build
talos-r3-fed64-055.build
talos-r3-snow-004.build
All SCL machines got a reboot today, closing this. Filed bug 620041 to track remaining down minis.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Alias: reboots
Product: mozilla.org → mozilla.org Graveyard