Closed Bug 480322 Opened 15 years ago Closed 15 years ago

Linux unittest box having massive timeouts, crashing -- needs attention probably

Categories

(Release Engineering :: General, defect, P2)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dholbert, Assigned: bhearsum)

The linux unittest box just went wonky for two consecutive cycles.

First cycle:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1235645906.1235650452.6081.gz
Description:
 - Massive timeouts in reftests
 - Crash during crashtests (after loading 323495-1.html)
 - Hang while starting mochitests
 - Timeout in mochichrome test test_domstorage.xul

Second cycle:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1235653106.1235659476.530.gz
Description:
 - "configure: line 2911: echo: write error: Broken pipe"
 - Segfault in crashtests after loading 354458-1.html
 - Timeout in mochitests (300sec) after passing one of the tests in test_bug467672-2.html
 - Mochichrome failures in test_bug418874.xul

The only code change between these two orange builds and the previous green build was the addition of two crashtests, which both passed, so I don't think that triggered the problem.  That changeset is: http://hg.mozilla.org/mozilla-central/rev/d3ce1f2c44bb
Summary: Linux unittest box having massive timeouts, crashing -- needs reboot? → Linux unittest box having massive timeouts, crashing -- needs attention probably
For the record, there are 16 different slaves that do these builds. If there's a machine problem it's affecting most or all of them. I'm still investigating this a bit and won't be rebooting anything just yet.
Assignee: server-ops → nobody
Component: Server Operations: Tinderbox Maintenance → Release Engineering
Priority: -- → P2
QA Contact: mrz → release
So, the two latest oranges are on moz2-linux-slave18 and moz2-linux-slave07 respectively. slave18 is freshly cloned, and this was its first m-c unittest build. slave07 has been around a while, and did a dep build. Given that, I can pretty safely say that clobbering won't help.

I logged on to both of the machines and there weren't any hung processes. Xvfb and metacity were both started properly - everything looked fine there.

I also checked out the load on the ESX hosts - it was fine.

I'd like to see another run on one of these machines with the same revision before doing any rebooting. If it's an intermittent test failure this should help determine that. If we get the same results again we can reboot one of them and see if that helps.

Both are building right now. As soon as one becomes free, I'll pull it out of the pool and run the tests manually on it, since there's no way to force a run on a specific slave.
(In reply to comment #2)
> slave18 is freshly cloned, and this was it's first m-c unittest
> build.

Interesting -- that's the one that had the "configure: line 2911: echo: write error: Broken pipe".  Could that be related to it being freshly cloned?  (something not getting set up correctly, maybe?)

In any case, thanks for looking into this!
(In reply to comment #3)
> (In reply to comment #2)
> > slave18 is freshly cloned, and this was it's first m-c unittest
> > build.
> 
> Interesting -- that's the one that had the "configure: line 2911: echo: write
> error: Broken pipe".  Could that be related to it being freshly cloned? 
> (something not getting set up correctly, maybe?)

It's possible. However, it's done dep and leak test builds and not had that problem - so I dunno!
Status: NEW → ASSIGNED
(In reply to comment #0)
>  - Crash during crashtests (after loading 323495-1.html)
[ snip ]
>  - Segfault in crashtests after loading 354458-1.html

Bug 480300 covers this sporadic-crash-during-crashtests issue.
moz2-linux-slave07 became idle. I pulled it out of the pool and am running through the tests manually right now.
I just repro'ed the crashtest crash on slave07 after about 5 runs. Unfortunately, I forgot to set XPCOM_DEBUG_BREAK, so I'm trying to repro it again now.
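For anyone trying the same kind of repro, here is a rough sketch of looping a test command until it fails, with the debug variable set up front so it isn't forgotten. This is not the script used on slave07: `run_until_failure` is a hypothetical helper, the command in the example is a placeholder rather than the real crashtest invocation, and the `stack` value for XPCOM_DEBUG_BREAK is only an assumption about which setting is wanted.

```python
import os
import subprocess
import sys


def run_until_failure(cmd, max_runs=10, extra_env=None):
    """Run cmd repeatedly until it exits non-zero.

    Returns (run_number, returncode) for the first failing run,
    or None if every run up to max_runs exited with 0.
    """
    # Copy the current environment and layer any extra variables on top.
    env = dict(os.environ)
    env.update(extra_env or {})
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd, env=env)
        if result.returncode != 0:
            return attempt, result.returncode
    return None


if __name__ == "__main__":
    # Placeholder command: swap in the real crashtest invocation.
    placeholder_cmd = [sys.executable, "-c", "import sys; sys.exit(0)"]
    outcome = run_until_failure(
        placeholder_cmd,
        max_runs=5,
        # Assumed value; pick whichever XPCOM_DEBUG_BREAK setting you need.
        extra_env={"XPCOM_DEBUG_BREAK": "stack"},
    )
    print("first failure:", outcome)
```

A negative return code from `subprocess.run` on Linux means the process was killed by a signal, so a segfault would show up as -11 rather than as a normal exit status.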
I suspect the mochitest timeouts are caused by the crashtest crash filed in https://bugzilla.mozilla.org/show_bug.cgi?id=480300. I'll keep this bug open until that fix lands though, to be sure. Please let me know if you see a mochitest timeout which isn't preceded by a crashtest crash.
Assignee: nobody → bhearsum
Depends on: 480300
I'm pretty certain this has been fixed since http://hg.mozilla.org/mozilla-central/rev/87f92525dccf.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering