Closed Bug 666490 Opened 14 years ago Closed 14 years ago

reboot requests (scl1)

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: zandr)

Big haul today:
talos-r3-fed-001
talos-r3-fed-004
talos-r3-fed-021
talos-r3-fed-026
talos-r3-fed-031
talos-r3-fed-036
talos-r3-fed-048
talos-r3-fed-063
talos-r3-fed64-003
talos-r3-fed64-020
talos-r3-fed64-024
talos-r3-fed64-036
talos-r3-fed64-041
talos-r3-fed64-049
talos-r3-fed64-056

21:26 <@zandr> I think that we had a DHCP event about 10:20, and possibly another one at 11:10am today.
21:27 <@zandr> I'm going to bet that the giant pile of dead fedora machines are all dead_fish. I'll gather some data over in the fix-dead_fish bug. I think NetworkManager is the problem.
I think that the origin of this particular failure cascade was the firewall work this morning. vlan48 and vlan75 are different zones, so there were times when the clients couldn't reach the DHCP servers. The machines *should* be able to recover from this, but apparently can't.
talos-r3-fed64-056: gray screen

talos-r3-fed-001
talos-r3-fed-004
talos-r3-fed-021
talos-r3-fed-026
talos-r3-fed-031
talos-r3-fed-036
talos-r3-fed-048
talos-r3-fed-063
talos-r3-fed64-003
talos-r3-fed64-020
talos-r3-fed64-024
talos-r3-fed64-036
talos-r3-fed64-041
talos-r3-fed64-049
All dead_fish.
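For anyone cross-checking these results against the fix-dead_fish bug, here is a minimal sketch that tallies the triage above by platform and failure state. The hostnames and states are copied from this comment. The `TRIAGE` dict layout and the `platform` and `tally` helpers are illustrative names, not part of any RelOps tooling.

```python
from collections import Counter

# Hosts reported as dead_fish in the triage above.
DEAD_FISH = """
talos-r3-fed-001 talos-r3-fed-004 talos-r3-fed-021 talos-r3-fed-026
talos-r3-fed-031 talos-r3-fed-036 talos-r3-fed-048 talos-r3-fed-063
talos-r3-fed64-003 talos-r3-fed64-020 talos-r3-fed64-024 talos-r3-fed64-036
talos-r3-fed64-041 talos-r3-fed64-049
""".split()

# Hypothetical layout: hostname -> observed state at the console.
TRIAGE = {"talos-r3-fed64-056": "gray screen"}
TRIAGE.update({host: "dead_fish" for host in DEAD_FISH})


def platform(host: str) -> str:
    """Return the platform field of a slave name, e.g. 'fed64' from 'talos-r3-fed64-056'."""
    return host.split("-")[2]


def tally(triage: dict[str, str]) -> Counter:
    """Count hosts per (platform, state) pair."""
    return Counter((platform(host), state) for host, state in triage.items())


if __name__ == "__main__":
    for (plat, state), count in sorted(tally(TRIAGE).items()):
        print(f"{plat:6} {state:12} {count}")
```

Run against the list above, this accounts for all 15 requested reboots: 8 fed and 6 fed64 hosts in dead_fish, plus the one fed64 host with a gray screen.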
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Alias: reboots-scl1
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
See Also: → 664629
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations