Closed Bug 527614 Opened 14 years ago Closed 13 years ago

Intermittent timeout in test_value_cloning.html

Categories

(Testing :: Mochitest, defect, P4)

defect

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.9.3a1

People

(Reporter: philor, Assigned: dbaron)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

This is absolutely *not* the first time it timed out, so there's no reason to believe it's related to anything pushed today {insert rant about how people who are getting paid to do it don't watch the tree or file on intermittent failures either before or after they push}.

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1257814288.1257818050.5171.gz
Linux mozilla-central debug test mochitests-4/5 on 2009/11/09 16:51:28  

55332 INFO TEST-PASS | /tests/layout/style/test/test_value_cloning.html | serialization should match when cloning overflow: inherit
55333 INFO TEST-PASS | /tests/layout/style/test/test_value_cloning.html | computed values should match when cloning overflow: inherit
++DOMWINDOW == 88 (0xd76db88) [serial = 3203] [outer = 0xf01aa30]
55334 INFO TEST-PASS | /tests/layout/style/test/test_value_cloning.html | serialization should match when cloning overflow: -moz-initial
55335 INFO TEST-PASS | /tests/layout/style/test/test_value_cloning.html | computed values should match when cloning overflow: -moz-initial
55336 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_value_cloning.html | Test timed out.
Whiteboard: [orange]
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1257813545.1257817024.25931.gz
Linux mozilla-central debug test mochitests-4/5 on 2009/11/09 16:39:05
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1257869682.1257873178.25097.gz
Linux mozilla-central debug test mochitests-4/5 on 2009/11/10 08:14:42

David - you landed this test in late October and judging from philor's comment 0 it's been intermittent basically since landing. Can you take a look?
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1257971721.1257975409.22759.gz
Linux mozilla-central debug test mochitests-4/5 on 2009/11/11 12:35:21
s: moz2-linux-slave15
Blocks: 438871
I think this is just the slaves being too slow (sometimes running on hosts that have more or less other load) or having clock skew; this test does a good bit of work and takes a decent amount of time.  Given the way we run slaves in VMs, I don't think there's an answer to the question of "how much work is it ok for a test to do before it completes", since even very short tests can occasionally time out, the frequency is just higher for longer tests.

I could probably split this test into two or five or twenty (at considerably cost in complexity of the test), which would probably reduce the frequency of it timing out by some amount, but I don't have any reason to believe it would cut that frequency to zero.

Is that worth it?  Do we have any statement of what level of performance we can expect from our test VMs and thus the amount of work that's ok to do in a single test?
(In reply to comment #4)
> I think this is just the slaves being too slow (sometimes running on hosts that
> have more or less other load) or having clock skew; this test does a good bit
> of work and takes a decent amount of time.  Given the way we run slaves in VMs,
> I don't think there's an answer to the question of "how much work is it ok for
> a test to do before it completes", since even very short tests can occasionally
> time out, the frequency is just higher for longer tests.
> 
> I could probably split this test into two or five or twenty (at considerably
> cost in complexity of the test), which would probably reduce the frequency of
> it timing out by some amount, but I don't have any reason to believe it would
> cut that frequency to zero.
> 
> Is that worth it?  Do we have any statement of what level of performance we can
> expect from our test VMs and thus the amount of work that's ok to do in a
> single test?

Let's ask some people!
1) So far, this bug lists failures on linux VMs. Has this ever failed on Win32 VMs?

2) Not sure if splitting up a test suite like that would really help. Before you look at complex split-up work, can we try a simpler experiment first? 

How about re-running this test several times in a row on the same build? We could trigger this 'n' times in a row, on the same build, when you are standing by. If it passes on some VMs and fails on others, we could pull the failing VMs out of production for you to debug on, and we could compare load/usage stats between successful and failing VMs. 

It wont take long to do this experiment, and its possible that it would uncover a race/timing condition in the test, or in the code-being-tested... which splitting up the test would not address.
And to answer a couple of the specific questions:
* Our VMs do not have clock skew problems
* The loading of our ESX hosts hasn't changed in the past few weeks. We appear to be slightly overloaded on a few of them, and OK on the rest.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1258027034.1258029138.8929.gz
WINNT 5.2 mozilla-central debug test mochitests-4/5 on 2009/11/12 03:57:14
OS: Linux → All
Hardware: x86 → All
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1258046724.1258049509.19278.gz
Linux mozilla-central debug test mochitests-4/5 on 2009/11/12 09:25:24  
s: moz2-linux-slave16
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1258072891.1258076182.32541.gz
WINNT 5.2 mozilla-central debug test mochitests-4/5 on 2009/11/12 16:41:31  
56288 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_value_cloning.html | Test timed out.
I think the thing to do here is add a mechanism for a test to say that it needs a longer timeout.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1258145311.1258148625.19896.gz
Linux mozilla-central debug test mochitests-4/5 on 2009/11/13 12:48:31
s: moz2-linux-slave41
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1258492776.1258496946.2384.gz
WINNT 5.2 mozilla-central debug test mochitests-4/5 on 2009/11/17 13:19:36
"s: moz2-win32-slave01"
55436 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_value_cloning.html | Test timed out.
Assignee: nobody → dbaron
Status: NEW → ASSIGNED
Priority: -- → P4
Comment on attachment 412313 [details] [diff] [review]
patch to allow tests to request a longer timeout (as a multiple of the normal one)

>diff --git a/testing/mochitest/tests/SimpleTest/SimpleTest.js b/testing/mochitest/tests/SimpleTest/SimpleTest.js
>--- a/testing/mochitest/tests/SimpleTest/SimpleTest.js
>+++ b/testing/mochitest/tests/SimpleTest/SimpleTest.js
>+SimpleTest.requestLongerTimeout = function (factor) {
>+    if (parentRunner) {
>+        parentRunner.requestLongerTimeout(factor);
>+    }
>+}
>+

Could you add a short comment above this giving a little more detail about what it does?

r=me with that change.
Attachment #412313 - Flags: review?(ted.mielczarek) → review+
Component: Style System (CSS) → Mochitest
Product: Core → Testing
QA Contact: style-system → mochitest
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1258592371.1258595017.30664.gz
Linux mozilla-central debug test mochitests-4/5 on 2009/11/18 16:59:31
s: moz2-linux-slave25
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1258590035.1258593778.16560.gz
Linux mozilla-central debug test mochitests-4/5 on 2009/11/18 16:20:35
s: moz2-linux-slave14
http://hg.mozilla.org/mozilla-central/rev/3b7b772a2f98

(landed despite all the orange)
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Blocks: 529860
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1258656136.1258662075.20691.gz
Linux mozilla-central debug test mochitests-4/5 on 2009/11/19 10:42:16
s: moz2-linux-slave01
55957 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_value_cloning.html | Test timed out.

(one changeset before the fix above landed)
I guess a dedicated test for this new feature could be added.

This feature may be wanted on m-1.9.2 (and m-1.9.1) too...
Flags: in-testsuite?
Target Milestone: --- → mozilla1.9.3a1
Depends on: 558705
Perhaps the requestLongerTimeout(3) should be increased? See also bug 558705?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Do the stacks in those logs show us at the event loop or in the middle of something?

Any idea why this would have started happening again today?  I'd suggest that maybe it shouldn't be the same bug.
Was that spate of occurrences fixed by
http://hg.mozilla.org/mozilla-central/rev/69f2d0457750 ?
I think it was.

In any case, if this starts again, please file a new bug.
Status: REOPENED → RESOLVED
Closed: 14 years ago13 years ago
Resolution: --- → FIXED
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.