Closed Bug 782641 Opened 12 years ago Closed 11 years ago

Intermittent test_nodb_pluschanges.js | test failed (with xpcshell return code: 0) | Timeout - See following stack | head.js | exception thrown from do_timeout callback: 2147500036

Categories

(Firefox :: Search, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: emorley, Assigned: Yoric)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file, 3 obsolete files)

Rev3 WINNT 6.1 fx-team pgo test xpcshell on 2012-08-13 17:16:12 PDT for push 84324c323762

slave: talos-r3-w7-039

https://tbpl.mozilla.org/php/getParsedLog.php?id=14355004&tree=Fx-Team

{
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/components/search/tests/xpcshell/test_nodb_pluschanges.js | Forcing flush
*** Search: SRCH_SVC_saveSortedEngineList: starting
*** Search: SRCH_SVC_saveSortedEngineList: something to do
*** Search: epsCommit: start
*** Search: epsCommit: (re)setting timer
*** Search: Lazy_go: starting
*** Search: Lazy_go: creating timer
*** Search: SRCH_SVC_saveSortedEngineList: done
*** Search: _buildCache: Writing to cache file.
*** Search: Lazy_flush: starting
*** Search: epsWriteCommit: start

TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/components/search/tests/xpcshell/test_nodb_pluschanges.js | Commit complete

TEST-PASS | c:/talos-slave/test/build/xpcshell/tests/toolkit/components/search/tests/xpcshell/test_nodb_pluschanges.js | [null : 66] true == true

TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/components/search/tests/xpcshell/test_nodb_pluschanges.js | Parsing metadata
*** Search: epsWriteCommit: done 0
*** Search: epsWriteCommit: done 0

TEST-UNEXPECTED-FAIL | c:/talos-slave/test/build/xpcshell/tests/toolkit/components/search/tests/xpcshell/test_nodb_pluschanges.js | Timeout - See following stack:
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: do_throw :: line 451
JS frame :: c:/talos-slave/test/build/xpcshell/tests/toolkit/components/search/tests/xpcshell/test_nodb_pluschanges.js :: <TOP_LEVEL> :: line 99
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: <TOP_LEVEL> :: line 121

TEST-INFO | (xpcshell/head.js) | exiting test

TEST-UNEXPECTED-FAIL | c:\talos-slave\test\build\xpcshell\head.js | exception thrown from do_timeout callback: 2147500036 - See following stack:
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: do_throw :: line 451
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: <TOP_LEVEL> :: line 123
}
Since this is reported only under Windows, I suspect that this is due to a file being improperly closed.
Whiteboard: [orange]
Self-reviewing a trivial typo fix.
Not 100% sure it will fix the current bug, but it's worth a try.
Assignee: nobody → dteller
Attachment #731189 - Flags: review+
Alternative fix, equally trivial, as suggested by gavin.
Attachment #731189 - Attachment is obsolete: true
Attachment #731233 - Flags: review+
After more analysis, I believe that the fix will not prevent the intermittent test failures.

I have the strong impression that this is yet another case of Windows not removing files when it claims to have removed them, and rather producing "Access denied" errors when one attempts to do anything that could touch that file. The problem would appear in |writeCommit|.
https://hg.mozilla.org/mozilla-central/rev/3f725a32ba89
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 22
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: Firefox 22 → ---
I'll see if an active wait can ensure that we always wait long enough that Windows flushes its buffers.
Ok, here's a version of the test suite that spins until the file is effectively removed. Something of a hack, but it seems to do the trick.
Attachment #731233 - Attachment is obsolete: true
Attachment #750424 - Flags: review?(gavin.sharp)
Can you explain what the problem is exactly, and why this fixes it? I don't understand based on earlier comments here.
Flags: needinfo?(dteller)
Windows has an interesting implementation of file removal. After the system call has returned, there is a brief interval during which the file *name* is unavailable, i.e. any system call involving this file name will fail. There is no system call to inform clients that the file name has become available again.

In the testsuite, we cleanup some files that may be leftover from other tests, then proceed to call code that immediately recreates these files. This causes the behavior mentioned above.

The fix simply loops until the file name is available before proceeding.
Flags: needinfo?(dteller)
(In reply to David Rajchenbach Teller [:Yoric] from comment #27)
> In the testsuite, we cleanup some files that may be leftover from other
> tests, then proceed to call code that immediately recreates these files.
> This causes the behavior mentioned above.
> 
> The fix simply loops until the file name is available before proceeding.

How does calling open("I do not exist") do this? Shouldn't be you be calling open() on the filename that we depend on being available instead?
(sorry, I meant "read", not "open")
Might be related to me confusing two patches for distinct bugs :)
Attachment #750424 - Attachment is obsolete: true
Attachment #750424 - Flags: review?(gavin.sharp)
Attachment #750701 - Flags: review?(gavin.sharp)
Comment on attachment 750701 [details] [diff] [review]
Hacking around Windows file system oddities

This makes much more sense!

IIRC this problem has come up a lot before, I forget how it was solved in other cases. A while(true)exists() loop seems dangerous - if the removal fails for some reason, this could iloop. I guess polling less frequently and adding a callback would require more extensive test changes...
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #31)
> Comment on attachment 750701 [details] [diff] [review]
> Hacking around Windows file system oddities
> 
> This makes much more sense!
> 
> IIRC this problem has come up a lot before, I forget how it was solved in
> other cases. A while(true)exists() loop seems dangerous - if the removal
> fails for some reason, this could iloop.

Well, I assumed that we would timeout. I can add a 1second timeout, if necessary.

> I guess polling less frequently and
> adding a callback would require more extensive test changes...

It would. If you deem it necessary, though, I can rewrite the tests to make them somewhat async.
Hmm, interestingly this problem seems to no longer occur. Did we asyncify something that makes this problem less likely to occur in practice now?
Comment on attachment 750701 [details] [diff] [review]
Hacking around Windows file system oddities

I guess this is fine as a test-hack, but let's not land it unless we need to.
Attachment #750701 - Flags: review?(gavin.sharp) → review+
Let's WONTFIX this for the moment, then.
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → WONTFIX
Resolution: WONTFIX → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: