production-prometheus-vm02 is down again

VERIFIED FIXED

Status

P2
major
VERIFIED FIXED
10 years ago
5 years ago

People

(Reporter: tonymec, Assigned: nthomas)

Tracking

({regression})

Firefox Tracking Flags

(Not tracked)

Details

- Looks like the fix for bug 471003 didn't hold.
- Since bug 471098 has been resolved WONTFIX, the underlying problem won't be investigated, which means the tinderbox will have to be restarted by hand whenever it goes on strike.
(Assignee)

Comment 1

10 years ago
Given all the network issues recently, a reboot of this box seems like a good idea (uptime was 99 days). It's busy running fsck on the / partition, which was also on my list of quick checks.
Assignee: server-ops → nthomas
Component: Server Operations: Tinderbox Maintenance → Release Engineering: Maintenance
Priority: -- → P2
QA Contact: mrz → release
(Assignee)

Comment 2

10 years ago
From twistd.log:
  2008/12/31 05:52 PST [Broker,client] lost remote
  <lots of retry attempts to connect to master>
so this box never recovered its network connection when DHCP/DNS went out (bug 471679).
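
For checking other boxes for the same symptom, a minimal sketch of scanning a twistd.log for these disconnects follows. It assumes the lines look like the excerpt above (date, hour:minute, timezone, bracketed source, message); the function name `find_disconnects` is hypothetical and not part of buildbot or Twisted.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpt above:
#   2008/12/31 05:52 PST [Broker,client] lost remote
# An optional ":SS" seconds field is tolerated but ignored.
LINE_RE = re.compile(
    r"^\s*(\d{4}/\d{2}/\d{2} \d{2}:\d{2})(?::\d{2})? (\w+) \[([^\]]*)\] (.*)$"
)

def find_disconnects(lines):
    """Return (timestamp, timezone) for every 'lost remote' line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and "lost remote" in m.group(4):
            when = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M")
            events.append((when, m.group(2)))
    return events
```

Lines that don't match the assumed shape (such as the retry attempts) are skipped rather than treated as errors.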

Disk check was clean; buildbot reconnected to the master on boot and it's building a nightly now.
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED

Comment 3

10 years ago
Ah, so this could be due to the Tinderbox and DHCP/DNS maintenance on the night before New Year's Eve -- except that this box produced a nightly (which I'm still using at the mo') and even one hourly after it on 31 December; it only really went on strike on 2008-12-31 05:23:09 PST (or thereabouts). See bug 471679 comment #25 where I posted a copy of Reed's maintenance warning from mozilla.dev.planning.

I see the build's yellow box on the mozilla1.8 tinderbox page. When (and if ;-) ) the build finishes, I'll try to download it.
(Assignee)

Comment 4

10 years ago
The timestamp of failure isn't a very good measure here, if it was the DHCP problem (because IP and DNS info was being issued with a 7 day expiry, and a random start date).
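
One way to read this, as a rough sketch: if a box keeps working until its current lease runs out, the moment it drops off depends on when that lease happened to start, not on when the DHCP service broke. The code below assumes exactly that simplified model (no renewal timers), and the function names are hypothetical.

```python
from datetime import datetime, timedelta

LEASE_LENGTH = timedelta(days=7)  # the "7 day expiry" mentioned above

def lease_expiry(lease_start, lease_length=LEASE_LENGTH):
    """When a lease issued at lease_start stops being valid."""
    return lease_start + lease_length

def earliest_lease_start(failure_time, lease_length=LEASE_LENGTH):
    """Earliest issue time of a lease that expired at failure_time."""
    return failure_time - lease_length

# Two hypothetical boxes with different (random) lease start dates
# fall over at different times, even in the same outage.
box_a = lease_expiry(datetime(2008, 12, 24, 5, 52))
box_b = lease_expiry(datetime(2008, 12, 26, 18, 0))
```

Under that model a failure at 2008/12/31 05:52 only says the lease was issued no earlier than 2008/12/24 05:52, so the DHCP problem could have started well before the box went quiet.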

Comment 5

10 years ago
(In reply to comment #4)
> The timestamp of failure isn't a very good measure here, if it was the DHCP
> problem (because IP and DNS info was being issued with a 7 day expiry, and a
> random start date).

ah, I see.

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.21pre) Gecko/20090101 BonEcho/2.0.0.21pre - Build ID: 2009010114

=> VERIFIED.
Status: RESOLVED → VERIFIED

Updated

10 years ago
Component: Release Engineering: Maintenance → Release Engineering
Product: mozilla.org → Release Engineering