Closed Bug 614474 Opened 9 years ago Closed 9 years ago

[Windows 2003] Intermittent "ERROR TEST-UNEXPECTED-FAIL | /tests/dom/indexedDB/test/test_bfcache.html | Test timed out."

Categories

(Core :: Storage: IndexedDB, defect, major)

x86
Windows Server 2003
defect
Not set
major

Tracking

()

VERIFIED FIXED
mozilla2.0b11

People

(Reporter: khuey, Assigned: sgautherie)

References

()

Details

(Keywords: assertion, intermittent-failure, Whiteboard: [See comments 57 and 137-142] )

Attachments

(1 file, 2 obsolete files)

Blocks: 438871
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1290826011.1290829974.10424.gz
WINNT 5.2 comm-central-trunk debug test mochitests-2/5 on 2010/11/26 18:46:51
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1291148235.1291150786.19416.gz&fulltext=1
WINNT 5.2 tryserver debug test mochitests-2/5 on 2010/11/30 12:17:15
{
607 INFO TEST-START | /tests/dom/indexedDB/test/test_bfcache.html
...
608 INFO TEST-PASS | /tests/dom/indexedDB/test/test_bfcache.html | don't double-enable bfcache - "undefined" should equal "undefined"
609 INFO TEST-PASS | /tests/dom/indexedDB/test/test_bfcache.html | set up database successfully - "go" should equal "go"
...
610 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/indexedDB/test/test_bfcache.html | Test timed out.
611 INFO TEST-END | /tests/dom/indexedDB/test/test_bfcache.html | finished in 308095ms
612 INFO TEST-START | /tests/dom/indexedDB/test/test_clear.html
###!!! ASSERTION: Didn't have this database in our list!: 'Error', file e:/builds/moz2_slave/tryserver-win32-debug/build/dom/indexedDB/IndexedDatabaseManager.cpp, line 585
...
###!!! ASSERTION: XPConnect is being called on a scope without a 'Components' property!: 'Error', file e:/builds/moz2_slave/tryserver-win32-debug/build/js/src/xpconnect/src/xpcwrappednativescope.cpp, line 779
...
JavaScript error: http://mochi.test:8888/tests/dom/indexedDB/test/bfcache_iframe2.html, line 18: res is undefined
...
JavaScript error: http://mochi.test:8888/tests/dom/indexedDB/test/bfcache_iframe2.html, line 21: parent is null
613 INFO TEST-PASS | /tests/dom/indexedDB/test/test_clear.html | got first key - 1 should not equal undefined
}

Assertion(s) + log leaking into the next test.
Severity: normal → major
Keywords: assertion
Whiteboard: [orange] → [See comment 17 log] [orange]
Version: unspecified → Trunk
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1291419724.1291423874.2588.gz
WINNT 5.2 comm-central-trunk debug test mochitests-2/5 on 2010/12/03 15:42:04
When, months from now, someone notices that this hasn't happened for months and closes it: you're closing it because of bug 614956, this was just timing out on Win2K3 debug on the old slaves, because they were brutally slow.
Depends on: 565307, 614956
OS: Windows 7 → Windows Server 2003
Assignee: nobody → philringnalda
Assignee: philringnalda → nobody
Summary: 606 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/indexedDB/test/test_bfcache.html | Test timed out. → ERROR TEST-UNEXPECTED-FAIL | /tests/dom/indexedDB/test/test_bfcache.html | Test timed out.
Blocks: 404077
status2.0: --- → ?
Why should this block?  It's just timing out on the slow slaves.  That's not really interesting.
Er, nevermind, read the status flag as blocking.
(In reply to comment #17)
> Assertion(s) + log leaking into the next test.

Let's assume this happens only because the test is actually still running when the timeout harness triggers :-/


(In reply to comment #94)
> It's just timing out on the slow slaves.  That's not really interesting.

Maybe not "interesting", but quite annoying given it happens really often!
Assignee: nobody → sgautherie.bz
Status: NEW → ASSIGNED
Bug 580875 used '5', but I don't know why, then let's start with '2'...
Attachment #498400 - Flags: review?(bent.mozilla)
Attachment #498400 - Flags: review?(Olli.Pettay)
Attachment #498400 - Flags: review?(bent.mozilla)
Attachment #498400 - Flags: review?(Olli.Pettay)
Attachment #498400 - Flags: review+
Comment on attachment 498400 [details] [diff] [review]
(Av1) Add |SimpleTest.requestLongerTimeout(2);|
[Backed out: Comment 140]

http://hg.mozilla.org/mozilla-central/rev/e0c3acc44e9e
Attachment #498400 - Attachment description: (Av1) Add |SimpleTest.requestLongerTimeout(2);| → (Av1) Add |SimpleTest.requestLongerTimeout(2);| [Checked in: Comment 122]
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status2.0: ? → ---
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [See comment 17 log] [orange] → [See comments 17 and 57] [orange]
Target Milestone: --- → mozilla2.0b9
(In reply to comment #123)
> WINNT 5.2 mozilla-central debug test mochitests-2/5 on 2010/12/21 22:18:45

That was the build before the check-in, so just ftr.
I think we should reopen and back out the timeout patch. It's not a timeout issue.
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1293048775.1293050959.12579.gz&fulltext=1
WINNT 5.2 comm-central-trunk debug test mochitests-2/5 on 2010/12/22 12:12:55

*****

(In reply to comment #126)
> It's not a timeout issue.

What let you think that? What do you think the issue could be?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I'm ready to go up to 5 (per comment 103), 1 step at a time...
Attachment #499426 - Flags: review?(jonas)
When the problem is that the timeout is too short, for tests like this which have several bits of output you will see the output continuing up to the timeout, and the timeout hitting at a fairly random point in the output, and the passing cases taking a significant fraction of the existing timeout on the same slaves.

In this case, you see instead that the failure is always 

set up database successfully - "go" should equal "go"
open a domwindow
nothing's happening so GC
timeout

and that the passing cases for the affected win32- slaves only take 3 or 4 seconds, not 20 or 25 seconds.
(In reply to comment #131)

> When the problem is that the timeout is too short, for tests like this which

I'm not sure what you're point is in this comment, as you wrote exactly the opposite in comment 57.

> have several bits of output you will see the output continuing up to the
> timeout, and the timeout hitting at a fairly random point in the output, and

Why random only, if for example there is a lengthy/backend operation going on at this step of the test?

> the passing cases taking a significant fraction of the existing timeout on the
> same slaves.

Why not, if we assume a random overload of the test machine...
As the simplest example (which this does not appear to be), picture doing something async, and then trying to use the result of that on a 5ms setTimeout. If a fast machine has done it in 5ms, you're fine, if a slow machine has not, then increasing the overall test timeout from 300 seconds to 600 seconds will not help you.

I did say something misleading in comment 131, but it was "20 or 25 seconds" because I'm too used to browser-chrome, where it actually does help to get a longer timeout than the default 30 seconds.

In this case, all you've done is switch from the mochitest harness timing it out after 300 seconds to automation timing it out after 330 seconds with no output. The test output is trying to help you here: before your patch it was "Test timed out." and if you look at the log, "finished in 304687ms" and after your patch it was (other than the TraceMonkey one, where your patch isn't merged) "application timed out after 330 seconds with no output" and "command timed out: 1200 seconds without output."

You've turned it from a single failure in a single test to aborting the suite, and increasing the mochitest timeout from 600 seconds to 900 seconds isn't going to change that, because it will continue to have no output for 330 seconds.
["Mid-air collision detected!"...]

Let me check the failures after comment 122 check-in.

Here are examples of the 3 cases:

0) Passing case:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1293159526.1293162314.14220.gz&fulltext=1
WINNT 5.2 mozilla-central debug test mochitests-2/5 on 2010/12/23 18:58:46
{
...
615 INFO TEST-END | /tests/dom/indexedDB/test/test_bfcache.html | finished in 1574ms
...
622 INFO TEST-END | /tests/dom/indexedDB/test/test_clear.html | finished in 1999ms
}

and ftr,
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1293160227.1293163035.16464.gz&fulltext=1
Rev3 WINNT 6.1 mozilla-central debug test mochitests-2/5 on 2010/12/23 19:10:27
{
607 INFO TEST-START | /tests/dom/indexedDB/test/test_bfcache.html
...
609 INFO TEST-PASS | /tests/dom/indexedDB/test/test_bfcache.html | set up database successfully - "go" should equal "go"
++DOMWINDOW == 32 (1311B470) [serial = 1168] [outer = 131200C0]
610 INFO TEST-PASS | /tests/dom/indexedDB/test/test_bfcache.html | version was set correctly - "1.0" should equal "1.0"
...
615 INFO TEST-END | /tests/dom/indexedDB/test/test_bfcache.html | finished in 5111ms

616 INFO TEST-START | /tests/dom/indexedDB/test/test_clear.html
...
622 INFO TEST-END | /tests/dom/indexedDB/test/test_clear.html | finished in 5556ms
}


1) Timeout in test harness:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1290583608.1290586659.18320.gz
WINNT 5.2 mozilla-central debug test mochitests-2/5 on 2010/11/23 23:26:48
{
605 INFO TEST-PASS | /tests/dom/indexedDB/test/test_bfcache.html | set up database successfully - "go" should equal "go"
++DOMWINDOW == 22 (09120038) [serial = 1168] [outer = 0AF97E60]
WARNING: NS_ENSURE_SUCCESS(rv, 0) failed with result 0x80004005: file e:/builds/moz2_slave/mozilla-central-win32-debug/build/content/base/src/nsContentUtils.cpp, line 2923
--DOMWINDOW == 21 (09B9DEC8) [serial = 1151] [outer = 00000000] [url = about:blank]
...
WARNING: 1 sort operation has occurred for the SQL statement '0xa0c8aa0'.  See https://developer.mozilla.org/En/Storage/Warnings details.: file e:/builds/moz2_slave/mozilla-central-win32-debug/build/storage/src/mozStoragePrivateHelpers.cpp, line 138
606 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/indexedDB/test/test_bfcache.html | Test timed out.
607 INFO TEST-END | /tests/dom/indexedDB/test/test_bfcache.html | finished in 328945ms

608 INFO TEST-START | /tests/dom/indexedDB/test/test_clear.html
--DOMWINDOW == 15 (0991CF98) [serial = 1163] [outer = 06B3B6B8] [url = http://mochi.test:8888/tests/dom/indexedDB/test/test_bad_keypath.html]
++DOMWINDOW == 16 (09121BE8) [serial = 1169] [outer = 06B3B6B8]
###!!! ASSERTION: Didn't have this database in our list!: 'Error', file e:/builds/moz2_slave/mozilla-central-win32-debug/build/dom/indexedDB/IndexedDatabaseManager.cpp, line 585
...
###!!! ASSERTION: XPConnect is being called on a scope without a 'Components' property!: 'Error', file e:/builds/moz2_slave/mozilla-central-win32-debug/build/js/src/xpconnect/src/xpcwrappednativescope.cpp, line 779
...
JavaScript error: http://mochi.test:8888/tests/dom/indexedDB/test/bfcache_iframe2.html, line 18: res is undefined
...
JavaScript error: http://mochi.test:8888/tests/dom/indexedDB/test/bfcache_iframe2.html, line 21: parent is null
609 INFO TEST-PASS | /tests/dom/indexedDB/test/test_clear.html | got first key - 1 should not equal undefined
WARNING: Copying all results for cursor snapshot, do something smarter!: file e:/builds/moz2_slave/mozilla-central-win32-debug/build/dom/indexedDB/IDBObjectStore.cpp, line 1909
...
614 INFO TEST-END | /tests/dom/indexedDB/test/test_clear.html | finished in 3572ms
...
(maybe more errors yet)
}
which is what I already mentioned in my comment 17.


1) Timeout in builbot:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1293160014.1293162264.13659.gz&fulltext=1
WINNT 5.2 mozilla-central debug test mochitests-2/5 on 2010/12/23 19:06:54
{
609 INFO TEST-PASS | /tests/dom/indexedDB/test/test_bfcache.html | set up database successfully - "go" should equal "go"
++DOMWINDOW == 61 (0A1F48C8) [serial = 1168] [outer = 0552B580]
WARNING: NS_ENSURE_SUCCESS(rv, 0) failed with result 0x80004005: file e:/builds/moz2_slave/cen-w32-dbg/build/content/base/src/nsContentUtils.cpp, line 2899
--DOMWINDOW == 60 (07480DA8) [serial = 1106] [outer = 00000000] [url = about:blank]
...
WARNING: An event was posted to a thread that will never run it (rejected): file e:/builds/moz2_slave/cen-w32-dbg/build/xpcom/threads/nsThread.cpp, line 386
WARNING: leaking reference to nsTimerImpl: file e:/builds/moz2_slave/cen-w32-dbg/build/xpcom/threads/nsTimerImpl.cpp, line 491
WARNING: 1 sort operation has occurred for the SQL statement '0x9f2a000'.  See https://developer.mozilla.org/En/Storage/Warnings details.: file e:/builds/moz2_slave/cen-w32-dbg/build/storage/src/mozStoragePrivateHelpers.cpp, line 138
TEST-UNEXPECTED-FAIL | /tests/dom/indexedDB/test/test_bfcache.html | application timed out after 330 seconds with no output
command timed out: 1200 seconds without output
}


So, I'm increasing the timeout factor per my comment 102.

Yet, if all these errors and "leakings" are the cause and not the consequence of the test failure, then the test and/or backend need more actual care than it got up to now.
(In reply to comment #136)

> As the simplest example (which this does not appear to be), picture doing
> something async, and then trying to use the result of that on a 5ms setTimeout.

I concur that such a test is fragile: it should use an event/whatever, or at least explicitly report/fail when the result is unavailable after the expected delay ... not time out "forever".

Nb: This is exactly what my "lengthy/backend operation" was referring to in my comment 135.

> If a fast machine has done it in 5ms, you're fine, if a slow machine has not,
> then increasing the overall test timeout from 300 seconds to 600 seconds will
> not help you.

I concur.

> In this case, all you've done is switch from the mochitest harness timing it
> out after 300 seconds to automation timing it out after 330 seconds with no
> output.

Agreed: 300 < 330 < 600 < 1200 :-/

This just/at_least confirms the "no output for 5mn(30s)" symptom :-|

> You've turned it from a single failure in a single test to aborting the suite,
> and increasing the mochitest timeout from 600 seconds to 900 seconds isn't
> going to change that, because it will continue to have no output for 330
> seconds.

Agreed, thanks for answering my comment 129 question wrt to this point:
I just wanted to be explicit about why I should back out.

NB: Actually, the test now fails randomly in harness or automation. Yet, I agree it's a regression.
Comment on attachment 499426 [details] [diff] [review]
(Bv1) Increase timeout factor to 3 from 2

This would not make a difference, per previous comments.
Attachment #499426 - Attachment is obsolete: true
Attachment #499426 - Flags: review?(jonas)
Comment on attachment 498400 [details] [diff] [review]
(Av1) Add |SimpleTest.requestLongerTimeout(2);|
[Backed out: Comment 140]

http://hg.mozilla.org/mozilla-central/rev/8d588d1f53c2
Backout "(Av1) Add |SimpleTest.requestLongerTimeout(2);|" which didn't really help.
Attachment #498400 - Attachment description: (Av1) Add |SimpleTest.requestLongerTimeout(2);| [Checked in: Comment 122] → (Av1) Add |SimpleTest.requestLongerTimeout(2);| [Backed out: Comment 140]
Attachment #498400 - Attachment is obsolete: true
Test code is
{
44       is(event.data, "go", "set up database successfully");
45 
46       iframe.src = "bfcache_iframe2.html";
47       res = JSON.parse((yield).data);
48       is(res.version, "1.0", "version was set correctly");
}

Guess: is the failure happening at line 47 because line 46 (async exec) has not completed yet?
Should I temporarily add dumps to narrow down where the failure happens?
...
Status: REOPENED → ASSIGNED
status2.0: --- → ?
Flags: in-testsuite+
Whiteboard: [See comments 17 and 57] [orange] → [See comments 57 and 137] [orange]
Target Milestone: mozilla2.0b9 → ---
(In reply to comment #141)

> Guess: is the failure happening at line 47 because line 46 (async exec) has not
> completed yet?

So, line 47 |yield| waits for message from
http://mxr.mozilla.org/mozilla-central/source/dom/indexedDB/test/bfcache_iframe2.html?force=1
20     trans.oncomplete = function() {
21       parent.postMessage(JSON.stringify(res), "http://mochi.test:8888");
22     }

I can reproduce a +/- similar timeout/failure by commenting out line 21.

In addition, per comment 137 "bfcache_iframe2.html, line 18: res is undefined", it seems line 17 isn't called in the first place.
17     trans.objectStore("mystore").get(42).onsuccess = function(e) {
18       res.value = JSON.stringify(e.result);
19     }

> Should I temporarily add dumps to narrow down where the failure happens?
> ...

And/or is there some kind of debug mode that could be activated on indexedDB for this test/frame?
Keywords: helpwanted
Whiteboard: [See comments 57 and 137] [orange] → [See comments 57 and 137-142] [orange]
Jonas, I hear you wrote this code and this test, I think you'd be a more appropriate owner here...
Assignee: sgautherie.bz → jonas
Can we resolve this as WFM now that Win2k3 is dead?
A fun question. SeaMonkey can't currently manage to run M2, but if it did, on the Win2K3 VMs that it still uses, I presume this would still happen to them. Are we required to keep this bug open, even though the reporter was only interested in the failures on Firefox? Is it actually a good idea to keep it open, with hundreds of uninteresting comments and digressions down blind alleys? Is the assignee actually an appropriate owner, considering the only place it will ever be seen again?
(In reply to comment #166)
> Can we resolve this as WFM now that Win2k3 is dead?

Is it?


(In reply to comment #167)

> A fun question. SeaMonkey can't currently manage to run M2, but if it did, on
> the Win2K3 VMs that it still uses, I presume this would still happen to them.

SeaMonkey does run mochitests-2, which is still (near) perma-orange due to this bug:
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1295154514.1295158405.19245.gz
WINNT 5.2 comm-central-trunk debug test mochitests-2/5 on 2011/01/15 21:08:34
{
517 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/indexedDB/test/test_bfcache.html | Test timed out.
}

> Are we required to keep this bug open, even though the reporter was only
> interested in the failures on Firefox?

As this bug is in Core, SM should be considered just as Firefox was, I think.

> Is it actually a good idea to keep it
> open, with hundreds of uninteresting comments and digressions down blind
> alleys? Is the assignee actually an appropriate owner, considering the only
> place it will ever be seen again?

If people agree, I would suggest to turn this test into a todo when just taking too long (or run on Win2k3 or SeaMonkey or whatever condition that would fix SM tinderboxes) :-|
That is, if the test can't be "fixed" not to time out.
I think we should just disable the test when running on Windows 2k3.  This is a kludge, since the real issue is painfully slow VMs, but it's good enough IMO.  Since the test will still be running on Firefox builders on Windows, I don't think the loss of test coverage on SeaMonkey will be a significant problem.
Per comment 169.

NB: I don't think we can check whether it's a debug build, can we?
Attachment #504278 - Flags: review?(jonas)
Blocks: 629996
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1296504130.1296508109.14534.gz
WINNT 5.2 comm-central-trunk debug test mochitests-2/5 on 2011/01/31 12:02:10
Comment on attachment 504278 [details] [diff] [review]
(Cv1) Disable this test on (all) Windows 2003 as a workaround
[Checked in: Comment 172]

http://hg.mozilla.org/mozilla-central/rev/1c6e36066e73
Attachment #504278 - Attachment description: (Cv1) Disable this test on (all) Windows 2003 as a workaround → (Cv1) Disable this test on (all) Windows 2003 as a workaround [Checked in: Comment 172]
Assignee: jonas → sgautherie.bz
Status: ASSIGNED → RESOLVED
Closed: 9 years ago9 years ago
status2.0: ? → ---
Flags: in-testsuite+
Keywords: helpwanted
Resolution: --- → FIXED
Summary: ERROR TEST-UNEXPECTED-FAIL | /tests/dom/indexedDB/test/test_bfcache.html | Test timed out. → [Windows 2003] Intermittent "ERROR TEST-UNEXPECTED-FAIL | /tests/dom/indexedDB/test/test_bfcache.html | Test timed out."
Target Milestone: --- → mozilla2.0b11
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1296530254.1296536385.24202.gz&fulltext=1
WINNT 5.2 comm-central-trunk debug test mochitests-2/5 on 2011/01/31 19:17:34
{
515 INFO TEST-KNOWN-FAIL | /tests/dom/indexedDB/test/test_bfcache.html | Test disabled on (too slow debug) Windows 2003 (tinderboxes)
}

V.Fixed
Status: RESOLVED → VERIFIED
Component: Document Navigation → DOM: IndexedDB
Target Milestone: mozilla2.0b11 → ---
Version: Trunk → unspecified
Target Milestone: --- → mozilla2.0b11
Version: unspecified → Trunk
Whiteboard: [See comments 57 and 137-142] [orange] → [See comments 57 and 137-142]
You need to log in before you can comment on or make changes to this bug.