431745 - qm-centos5-02 is intermittently failing test_sleep_wake.js

:Gavin Sharp [email: gavin@gavinsharp.com]

Reporter

Description

•

17 years ago

qm-centos5-02 is failing, but qm-centos5-01 and qm-centos5-03 are green, so it seems likely that whatever's wrong is specific to that machine. Can it be restarted?

Dave Miller [:justdave]

Comment 1

•

17 years ago

This machine is not on the Tier 1 support document. http://wiki.mozilla.org/Buildbot/IT_Support_Document

Assignee: server-ops → nobody

Component: Server Operations: Tinderbox Maintenance → Release Engineering

QA Contact: justin → release

Phil Ringnalda (:philor)

Updated

•

17 years ago

Depends on: 431784

Phil Ringnalda (:philor)

Comment 2

•

17 years ago

(In reply to comment #1)
> This machine is not on the Tier 1 support document.

Filed bug 431784 on that documentation bug.

Rob Campbell [:rc] (:robcee)

Updated

•

17 years ago

OS: Windows Vista → Linux

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 3

•

17 years ago

qm-centos-02 and qm-centos-03 were newly added, see bug#425791 for details. The idea was that so long as two-of-three-identical-machines were consistent, it reduced the overall need for tier1 pager support. I note that qm-centos-01 has also gone orange. We're investigating if this is really a unit test machine problem impacting multiple machines or if any code landings could be causing this.
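The two-of-three idea can be sketched as a simple majority check across identical machines. This is only a minimal Python illustration, not something Buildbot or Tinderbox is known to run; the function name `find_outliers` and the "green"/"orange" status strings are assumptions made for the example.

```python
from collections import Counter


def find_outliers(results):
    """Return (majority_status, outlier_machines) for identical machines.

    `results` maps a machine name to its latest status string.
    Ties are not handled specially; with three machines and two
    possible statuses a tie cannot occur.
    """
    counts = Counter(results.values())
    majority_status, _ = counts.most_common(1)[0]
    outliers = sorted(name for name, status in results.items()
                      if status != majority_status)
    return majority_status, outliers


if __name__ == "__main__":
    # Hypothetical snapshot matching the situation described in comment 0.
    latest = {
        "qm-centos5-01": "green",
        "qm-centos5-02": "orange",
        "qm-centos5-03": "green",
    }
    print(find_outliers(latest))
```

A single outlier among otherwise consistent machines points at the machine rather than at a code landing, which is the reasoning used in this bug.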

Component: Release Engineering → Release Engineering: Maintenance

Priority: -- → P1

:Gavin Sharp [email: gavin@gavinsharp.com]

Reporter

Comment 4

•

17 years ago

(In reply to comment #3)
> I note that qm-centos-01 has also gone orange. We're investigating if this is
> really a unit test machine problem impacting multiple machines or if any code
> landings could be causing this.

qm-centos5-01 wasn't orange when I filed this bug, and hasn't been consistently orange like qm-centos5-02 (it has been sporadically orange, but that's somewhat normal for unit test machines). Given comment 0, and the fact that it's been consistently orange for days, can we just reboot the machine? It's extremely unlikely that the test would be failing on only one of the 3 identical machines this consistently due to a code problem.

Chris Cooper [:coop] (he/him)

Updated

•

17 years ago

Assignee: nobody → ccooper

Chris Cooper [:coop] (he/him)

Comment 5

•

17 years ago

Rebooting now.

Status: NEW → ASSIGNED

Chris Cooper [:coop] (he/him)

Comment 6

•

17 years ago

Slave restarted.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 7

•

17 years ago

Unfortunately, rebooting didn't seem to help.

:Gavin Sharp [email: gavin@gavinsharp.com]

Reporter

Comment 8

•

17 years ago

I've hidden the box from the waterfall, since its perma-orange state is misleading people into thinking they can't check in. The other two machines are both green.

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 9

•

17 years ago

Adding to the fun was nthomas's discovery that some drives were read-only this morning; see bug#432012. The machine had already been failing out with the "make check" errors, but this adds to the fun. To get around that, we've just restarted the VM. We also discovered that qm-centos5-02 was configured with low RAM, so we bumped it up from 512 to 1024 while we were rebooting.

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 10

•

17 years ago

Taking this bug, as coop is on leave. I note that the most recent runs on qm-centos5-02 are all failing out during "make -f client.mk checkout" with:

/bin/sh: mozilla/.mozconfig.out: Read-only file system
Adding client.mk options from /builds/slave/trunk_centos5_2/mozilla/.mozconfig:
    MOZ_CO_PROJECT=browser
    MOZ_OBJDIR=$(TOPSRCDIR)/objdir
    MOZ_CO_MODULE=mozilla/testing/tools
rm: cannot remove `.mozconfig.out': Read-only file system
make: *** [checkout] Error 1
program finished with exit code 2
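A "Read-only file system" error like the one above usually means the filesystem is mounted (or was remounted) read-only. As a rough sketch, assuming a Linux slave where `/proc/mounts` is available, the following Python lists mount points whose options include `ro`. The function name `read_only_mounts` is hypothetical and this is not part of any existing build tooling.

```python
import os


def read_only_mounts(mounts_text):
    """Return mount points that are mounted read-only.

    `mounts_text` is text in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3]
        # Compare whole options so "errors=remount-ro" does not match.
        if "ro" in options.split(","):
            found.append(mount_point)
    return found


if __name__ == "__main__":
    # Only meaningful on Linux; other systems have no /proc/mounts.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as handle:
            print(read_only_mounts(handle.read()))
```

Running a check like this before the checkout step would report the affected mount up front instead of failing partway through `make`.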

Assignee: ccooper → joduinn

Status: ASSIGNED → NEW

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 11

•

17 years ago

Updating the kernel fixed the read-only drive; see details in https://bugzilla.mozilla.org/show_bug.cgi?id=407796#c64. The next run passed green, so I've added the machine back onto tinderbox and closed this bug. Please reopen if this happens again.

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED

Nick Thomas [:nthomas] (UTC+12)

Comment 12

•

17 years ago

Unfortunately it started failing in make check on the second and subsequent runs:

../../../../_tests/xpcshell-simple/test_dm/unit/test_sleep_wake.js: command timed out: 2400 seconds without output, killing pid 2743

Reopening, and re-hidden from Firefox tree.
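The "2400 seconds without output" message comes from a no-output watchdog: the command is killed when it stays silent for too long, not when its total runtime exceeds a limit. The Python below is a simplified sketch of that idea, not Buildbot's actual implementation; the function name `run_with_output_timeout` is an assumption made for illustration.

```python
import subprocess
import sys
import threading


def run_with_output_timeout(cmd, timeout):
    """Run cmd and kill it if it prints nothing for `timeout` seconds.

    Returns (return_code, output_lines, timed_out).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    timed_out = threading.Event()

    def kill():
        timed_out.set()
        proc.kill()

    def start_timer():
        timer = threading.Timer(timeout, kill)
        timer.daemon = True
        timer.start()
        return timer

    timer = start_timer()
    lines = []
    for line in proc.stdout:
        # Any line of output resets the countdown.
        timer.cancel()
        lines.append(line.rstrip("\n"))
        timer = start_timer()
    proc.wait()
    timer.cancel()
    return proc.returncode, lines, timed_out.is_set()


if __name__ == "__main__":
    # Hypothetical hung command: sleeps without printing anything.
    hung = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_output_timeout(hung, 2))
```

A test that hangs silently, as test_sleep_wake.js appears to here, trips this kind of watchdog even though the process itself never exits with an error.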

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 13

•

17 years ago

Ugh. On the buildbot waterfall, I see this has been running clean since at least 15:43 this afternoon, which is as far back as the waterfall page goes. I'm still tracking back, looking for the builds that failed out. I won't put this back on tinderbox until I find the breaking build and see what happened.

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 14

•

17 years ago

Changing summary to match new symptoms, as previous problem now fixed.

Summary: qm-centos5-02 is failing |make check| → qm-centos5-02 is intermittently failing test_sleep_wake.js

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Updated

•

17 years ago

Assignee: joduinn → rcampbell

Status: REOPENED → NEW

Rob Campbell [:rc] (:robcee)

Comment 15

•

17 years ago

Currently this VM is failing a different check:

../../../../_tests/xpcshell-simple/test_dm/unit/test_resume.js: /builds/slave/trunk_centos5_2/mozilla/tools/test-harness/xpcshell-simple/test_all.sh: line 111: 18021 Segmentation fault (core dumped)

That's from the most recent run.

Rob Campbell [:rc] (:robcee)

Comment 16

•

17 years ago

From an IRC conversation this morning:

08:11 < nthomas> all three of qm-centos5-01,02,03 are VM's. 02 is on netapp-b-vmware, the other two on netapp-d-fcal1

This is certainly one difference between the VMs. Could it account for these failures? I have no idea.

Rob Campbell [:rc] (:robcee)

Comment 17

•

17 years ago

Since this was filed, we've seen this failure on a few other machines, notably qm-centos5-moz2-01. Has anyone looked at the test code at all?

debugging patch (17 years ago, Gavin Sharp, 4.98 KB, patch; sdwilsh: review+)
v1.0 (17 years ago, Shawn Wilsher :sdwilsh, 5.96 KB, patch; Gavin: review+)
v1.1 (17 years ago, Shawn Wilsher :sdwilsh, 6.92 KB, patch)
branch version (17 years ago, Shawn Wilsher :sdwilsh, 2.64 KB, patch)