debug crashtests: 11 tests, and then whole suite, time out reliably

RESOLVED FIXED

Status

defect
RESOLVED FIXED
10 years ago

People

(Reporter: dbaron, Assigned: dbaron)

Tracking

Trunk

Firefox Tracking Flags

(Not tracked)

Details

The debug unit test machines that run crashtests are timing out reliably.  I looked at 1 Linux run, 2 Mac runs, and 1 Windows run, and they all had exactly the same set of test failures, all due to "timed out waiting for reftest-wait to be removed (after onload fired)":

layout/base/crashtests/500467-1.html
layout/forms/crashtests/366537-1.xhtml
layout/forms/crashtests/367587-1.html
layout/forms/crashtests/370703-1.html
layout/forms/crashtests/370940-1.html
layout/forms/crashtests/373586-1.xhtml
layout/generic/crashtests/225868-1.html
layout/generic/crashtests/307979-1.html
layout/generic/crashtests/324318-1.html
layout/generic/crashtests/334105-1.xhtml
layout/generic/crashtests/337883-1.html

I *don't* see the problem locally (although my local run does hang on layout/generic/crashtests/438509-1.html, which we might want to mark skip-if(isDebugBuild), since it's a performance test).
Now this seems to have stopped happening on Linux (two cycles in a row), though it was reliable on all platforms before.
It's intermittent on Linux, but still reliable on Mac and Windows.
Summary: debug crashtests: 10 tests, and then whole suite, time out reliably → debug crashtests: 11 tests, and then whole suite, time out reliably
The Linux debug everythingelse log for the above changeset is:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256313337.1256320715.28178.gz&fulltext=1
and it looks like none of the script in the testcases was executed at all.

The regular Linux everythingelse log is:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256309297.1256313655.9057.gz&fulltext=1
and it showed what I expected:
DEBUGGING BUG 523934: in doIt
DEBUGGING BUG 523934: timeout set
DEBUGGING BUG 523934: in AttrModifiedListener
DEBUGGING BUG 523934: AttrModifiedListener: set timeout
DEBUGGING BUG 523934: in AttrModifiedListenerContinuation
DEBUGGING BUG 523934: in FinishWaitingForTestEnd
DEBUGGING BUG 523934: AttrModifiedListenerContinuation: done waiting

and the Linux opt everythingelse log is:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256309037.1256312221.25095.gz&fulltext=1

So it seems like something is preventing the script from being executed in the first place.

(Going to check the Mac and Windows logs shortly.)
The Mac debug everythingelse log is here:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256308882.1256313397.6210.gz&fulltext=1
(it failed because of the output size limit, but only after logging some of the required data)

That cycle actually didn't have this problem:

DEBUGGING BUG 523934: in doIt
DEBUGGING BUG 523934: timeout set
DEBUGGING BUG 523934: in AttrModifiedListener
DEBUGGING BUG 523934: AttrModifiedListener: set timeout
DEBUGGING BUG 523934: in AttrModifiedListenerContinuation
DEBUGGING BUG 523934: in FinishWaitingForTestEnd
DEBUGGING BUG 523934: AttrModifiedListenerContinuation: done waiting
REFTEST TEST-PASS | file:///builds/slave/mozilla-central-macosx-debug-unittest-everythingelse/build/reftest/tests/layout/base/crashtests/500467-1.html | (LOAD ONLY)

...

DEBUGGING BUG 523934: in boom
DEBUGGING BUG 523934: in AttrModifiedListener
DEBUGGING BUG 523934: AttrModifiedListener: set timeout
DEBUGGING BUG 523934: removed class attribute
DEBUGGING BUG 523934: in AttrModifiedListenerContinuation
DEBUGGING BUG 523934: in FinishWaitingForTestEnd
DEBUGGING BUG 523934: AttrModifiedListenerContinuation: done waiting
REFTEST TEST-PASS | file:///builds/slave/mozilla-central-macosx-debug-unittest-everythingelse/build/reftest/tests/layout/forms/crashtests/366537-1.xhtml | (LOAD ONLY)



It looks like Windows debug everythingelse didn't test that changeset (or hasn't yet).
(In reply to comment #4)
> The Linux debug everythingelse log for the above changeset is:
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1256313337.1256320715.28178.gz&fulltext=1
> and it looks like none of the script in the testcases was executed at all.

Actually, that's not true.  It looks like what happened is:

DEBUGGING BUG 523934: in doIt
[... lots of printing from a cycle collection ...]
REFTEST TEST-UNEXPECTED-FAIL | file:///builds/moz2_slave/mozilla-central-linux-debug-unittest-everythingelse/build/reftest/tests/layout/base/crashtests/500467-1.html | timed out waiting for reftest-wait to be removed (after onload fired)

But there was actually nothing printed for the second test, nor was the second dump in the first test ever hit.


It seems like maybe having a cycle collection mid-script is putting things in a bad state?
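
For comparing logs like the ones quoted above, it can help to group the "DEBUGGING BUG 523934" markers by the REFTEST result line that follows them, so it is easy to see how far each test's script got before its result was reported. The following is only a rough sketch: the function name `group_markers` is hypothetical, and the regular expression assumes result lines look exactly like the ones pasted in this bug.

```python
import re

# Prefix used by the temporary dump() statements quoted in this bug.
DEBUG_PREFIX = "DEBUGGING BUG 523934: "

# Assumed shape of a result line, based only on the lines quoted above:
#   REFTEST TEST-PASS | <url> | (LOAD ONLY)
#   REFTEST TEST-UNEXPECTED-FAIL | <url> | timed out waiting ...
RESULT_RE = re.compile(r"^REFTEST (TEST-[A-Z-]+) \| (\S+) \| (.*)$")


def group_markers(log_lines):
    """Return one (status, url, markers) tuple per REFTEST result line.

    `markers` holds the debug messages seen since the previous result
    line; all other log output (e.g. cycle collector noise) is ignored.
    """
    results = []
    pending = []
    for raw in log_lines:
        line = raw.rstrip("\n")
        if line.startswith(DEBUG_PREFIX):
            pending.append(line[len(DEBUG_PREFIX):])
            continue
        match = RESULT_RE.match(line)
        if match:
            status, url, _detail = match.groups()
            results.append((status, url, pending))
            pending = []
    return results


if __name__ == "__main__":
    # Lines copied from the Linux debug log excerpt in this bug.
    sample = [
        "DEBUGGING BUG 523934: in doIt",
        "[... lots of printing from a cycle collection ...]",
        "REFTEST TEST-UNEXPECTED-FAIL | file:///builds/moz2_slave/"
        "mozilla-central-linux-debug-unittest-everythingelse/build/reftest/"
        "tests/layout/base/crashtests/500467-1.html | timed out waiting for "
        "reftest-wait to be removed (after onload fired)",
    ]
    for status, url, markers in group_markers(sample):
        print(status, url.rsplit("/", 1)[-1], markers)
```

A test whose marker list stops at "in doIt" (as in the Linux debug log) stands out immediately next to one that reaches "done waiting".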
I tried to reproduce the problem on Linux by downloading the packages and running the same commands that the debug unit test box runs, and I didn't see the problem.

I tried the same on Windows, but I couldn't get the executables to run ("Bad file number").
[2009-10-27 11:53:07] <bc> dbaron: mochitest doesn't set dom.max_script_run_time or dom.max_chrome_script_run_time do they?
[2009-10-27 11:53:43] <ted> i think mochitest does, yes
[2009-10-27 11:54:24] <bc> i thought i ran into it during a valgrind run and when looking didn't see it. 
[2009-10-27 11:54:37] <bc> automation.py does, but mochitest doesn't do the same thing.
[2009-10-27 11:54:38] <ted> i don't think reftest does (or i'm not sure)
[2009-10-27 11:54:47] <bhearsum> mochitest uses automation.py, doesn't it?
[2009-10-27 11:54:55] <ctalbert> yes
[2009-10-27 11:54:58] <ted> bc: i think you have your test suites backwards
[2009-10-27 11:54:58] <bc> only for crash checking iirc
[2009-10-27 11:55:10] <bc> beer?
[2009-10-27 11:55:24] <bhearsum> always
[2009-10-27 11:55:42] <ted> http://mxr.mozilla.org/mozilla-central/source/build/automation.py.in#237
[2009-10-27 11:55:50] <ted> mochitest uses that
[2009-10-27 11:55:58] <ted> http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/runreftest.py#62
[2009-10-27 11:56:02] <ted> reftest doesn't set that pref
[2009-10-27 11:56:20] <ted> dbaron: we could try setting that pref in the reftest profile and see if that fixes crashtest
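
For reference, ted's suggestion above amounts to adding the two script-timeout prefs to the profile the reftest harness creates. The following is a minimal sketch of that idea, not the actual change to runreftest.py: the helper name `append_prefs` is hypothetical, and it assumes the profile reads prefs from a user.js file and that a value of 0 disables the slow-script timeout.

```python
import os
import tempfile

# The two prefs named in the IRC discussion above.
# Assumption: 0 means "no limit", i.e. the slow-script check never fires.
SCRIPT_TIMEOUT_PREFS = {
    "dom.max_script_run_time": 0,
    "dom.max_chrome_script_run_time": 0,
}


def append_prefs(profile_dir, prefs):
    """Append user_pref() lines to user.js in profile_dir; return its path."""
    path = os.path.join(profile_dir, "user.js")
    with open(path, "a") as prefs_file:
        for name, value in sorted(prefs.items()):
            prefs_file.write('user_pref("%s", %d);\n' % (name, value))
    return path


if __name__ == "__main__":
    # Demonstrate against a throwaway directory instead of a real profile.
    profile_dir = tempfile.mkdtemp()
    with open(append_prefs(profile_dir, SCRIPT_TIMEOUT_PREFS)) as prefs_file:
        print(prefs_file.read())
```

Appending rather than overwriting keeps any prefs the harness has already written to the profile.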
I think this fixed it.  I've unhidden Linux debug everythingelse, but there are still other issues uncovered on Windows and Mac.  (Mac seems to be getting less and less stable.)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Assignee: nobody → dbaron