Closed Bug 471003 Opened 17 years ago Closed 17 years ago

production-prometheus-vm02 is down

Categories

(Infrastructure & Operations :: RelOps: General, task)

Platform: x86, Linux
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: tonymec, Assigned: bhearsum)

Details

Production-prometheus-vm02 is down, and has been since 2008-12-22 15:21:45. This means no Fx2 builds for Linux. (Other boxen are still building Fx2 for W32 & Mac.) All builds have been removed from http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/production-prometheus-vm02-mozilla1.8/ including those built after the latest nightly (on Monday 22).
I just logged into the VI console and this VM is up and running.
Running, I don't know, but is it building? (Do you do anything to restart it?) I don't see it (yet?) on the waterfall, nor do I (yet?) see any of its "tinderbox-builds" on the FTP site.
s/(Do/(Did/
These probably weren't restarted after the 2.0.0.20 release. I'll have a look.
Assignee: server-ops → bhearsum
Looks like it got hung for some reason. I just restarted the slave - it should be ok now.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
(In reply to comment #5)
> Looks like it got hung for some reason. I just restarted the slave - it should
> be ok now.
For a few days in succession recently, Fx2-Linux nightlies (but not tinderbox-builds) got hung ("building" forever) and had to be restarted. I wonder if there isn't something more serious lurking in there. I see the tinderbox building (it has been for almost 2 hours; I hope that's normal for a Linux nightly). I'll VERIFY this bug after the build finishes and I get a new working nightly dated today from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla1.8/
I wasn't aware of that. Please do re-open if it hangs again and someone can investigate this further.
(In reply to comment #7)
> I wasn't aware of that. Please do re-open if it hangs again and someone can
> investigate this further.
How long do you think we should let the build proceed before deciding it is hung? I wouldn't want to interrupt it while there's still hope to see the build terminate properly.
That box has been "building" for over 2½ hours by now. What do you think, Ben? Going back a few days shows that successful nightly builds' duration on that box was between 1 and 1½ hours, sometimes immediately preceded by a yellow cell (a "building" cycle with no apparent termination) whose length I didn't include. OTOH -- is this a dep build or a clobber build? If dep, maybe unusually many source files have been changed while the box was sleeping and need to be recompiled?
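The rule of thumb in the comment above (compare the running time against recent successful builds) could be sketched roughly as follows. This is only an illustration, not anything the build infrastructure actually runs: the function name `looks_hung` and the `slack` factor of 1.5 are assumptions made up for the example, and the durations are the approximate figures quoted above.

```python
from datetime import timedelta


def looks_hung(elapsed, past_durations, slack=1.5):
    """Guess whether a build is hung.

    elapsed        -- how long the current build has been running
    past_durations -- durations of recent successful builds
    slack          -- hypothetical safety factor (an assumption, not a
                      documented threshold) applied to the longest past build
    """
    if not past_durations:
        # Without history there is no baseline, so make no claim.
        return False
    threshold = max(past_durations) * slack
    return elapsed > threshold


# Recent successful nightlies took between 1 and 1.5 hours.
history = [timedelta(hours=1), timedelta(hours=1, minutes=30)]

# The current build has been "building" for over 2.5 hours.
print(looks_hung(timedelta(hours=2, minutes=30), history))
```

With these numbers the threshold is 2 hours 15 minutes, so the 2½-hour build would be flagged. A dep build after a long pause, or a clobber build, may legitimately run longer than the history suggests, so a flag like this is a prompt to look at the box rather than a reason to kill the build.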
Yeah, there's definitely something up here. It has been unable to finish checking out code. I'll investigate further.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Actually, that's a lie. I caught it *right* when it started checking out new code for the next run. The first one this morning (started around 4am PST) was a delayed nightly. The one which is currently building is a dep build. Does that match up with what you're seeing?
Well, well, well... Suddenly the yellow cell at tinderbox.m.o. has been replaced by two green ones, much shorter together than the yellow one used to be; and I see a nightly build dated today 05:32 on the FTP site, and a tinderbox-build dated 05:43. I wonder what happened, that the t.m.o. page didn't register the end of the build until you prodded it (and it doesn't show any "build in progress" for that box ATM either).
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
OK, that nightly loads correctly AFAICT so I'm marking this bug VERIFIED. You may (or may not) want to file followup bugs about the hangs which have been happening this past week and/or the apparent lack of communication between the building box and whatever is generating the Mozilla1.8 tinderbox.m.o page. I'm going into town for a couple of hours now; maybe I'll REOPEN this bug if I don't see any new tinderbox-builds on the FTP site after I come back. Is there anything else I should do to make sure that you (or someone) get the message?
Status: RESOLVED → VERIFIED
OK. Looks like this bug can remain VERIFIED. :-) But I've opened bug 471098 about the numerous recent hangs and put you (Ben) on the CC list.
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations