Closed Bug 911249 Opened 6 years ago Closed 6 years ago

Intermittent Windows xpcshell "timed out after 1000 seconds of no output" (with "Can't trigger Breakpad, just killing process" and "mozprocess timed out")

Categories

(Testing :: XPCShell Harness, defect)

x86
Windows XP
defect
Not set

Tracking

(firefox24 unaffected, firefox25 unaffected, firefox26 fixed)

RESOLVED FIXED
mozilla26

People

(Reporter: emorley, Assigned: mihneadb)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files)

Windows XP 32-bit mozilla-inbound opt test xpcshell on 2013-08-29 21:38:52 PDT for push 14619f24a8a8

slave: t-xp32-ix-011

https://tbpl.mozilla.org/php/getParsedLog.php?id=27209828&tree=Mozilla-Inbound

{
21:44:37     INFO -  TEST-PASS | C:\slave\test\build\tests\xpcshell\tests\toolkit\mozapps\update\test\unit\test_bug595059.js | test passed (time: 6660.000ms)
21:44:37     INFO -  TEST-PASS | C:\slave\test\build\tests\xpcshell\tests\toolkit\mozapps\update\test\unit\test_bug794211.js | test passed (time: 6675.000ms)
21:44:37     INFO -  TEST-PASS | C:\slave\test\build\tests\xpcshell\tests\toolkit\mozapps\update\test\unit\test_0190_rmrfdirFileInUse_xp_win_complete.js | test passed (time: 6690.000ms)
21:44:37     INFO -  TEST-PASS | C:\slave\test\build\tests\xpcshell\tests\tools\profiler\tests\test_shared_library.js | test passed (time: 6751.000ms)
21:44:37     INFO -  TEST-PASS | C:\slave\test\build\tests\xpcshell\tests\toolkit\mozapps\update\test_timermanager\unit\test_0010_timermanager.js | test passed (time: 6767.000ms)
21:44:37     INFO -  TEST-PASS | C:\slave\test\build\tests\xpcshell\tests\toolkit\mozapps\update\test\unit\test_0191_rmrfdirFileInUse_xp_win_partial.js | test passed (time: 6766.000ms)
21:44:37     INFO -  TEST-PASS | C:\slave\test\build\tests\xpcshell\tests\toolkit\mozapps\update\test\unit\test_0189_fileInUse_xp_win_partial.js | test passed (time: 7209.000ms)
21:44:37     INFO -  TEST-INFO | Failed to remove directory: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpn90wyt. Waiting.
21:44:37     INFO -  TEST-INFO | Failed to remove directory: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmp4n7my7. Waiting.
21:44:37     INFO -  TEST-INFO | Failed to remove directory: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpsp1ypt. Waiting.
21:44:38     INFO -  TEST-INFO | Failed to remove directory: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpn90wyt. Waiting.
21:44:38     INFO -  TEST-INFO | Failed to remove directory: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmp4n7my7. Waiting.
21:44:38     INFO -  TEST-INFO | Failed to remove directory: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpsp1ypt. Waiting.
21:44:39     INFO -  TEST-INFO | Failed to remove directory: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpn90wyt. Waiting.
21:44:39     INFO -  TEST-INFO | Failed to remove directory: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmp4n7my7. Waiting.
21:44:39     INFO -  TEST-INFO | Failed to remove directory: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpsp1ypt. Waiting.
21:44:40     INFO -  TEST-INFO | Failed to remove directory: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpn90wyt. Waiting.
21:44:40     INFO -  TEST-INFO | Failed to remove directory: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmp4n7my7. Waiting.
21:44:40     INFO -  TEST-INFO | Failed to remove directory: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpsp1ypt. Waiting.
21:44:45     INFO -  TEST-PASS | C:\slave\test\build\tests\xpcshell\tests\toolkit\mozapps\extensions\test\xpcshell\test_update_strictcompat.js | test passed (time: 30801.000ms)
21:44:45     INFO -  TEST-PASS | C:\slave\test\build\tests\xpcshell\tests\toolkit\mozapps\extensions\test\xpcshell\test_update.js | test passed (time: 30816.000ms)
21:48:43     INFO -  Can't trigger Breakpad, just killing process
22:05:23     INFO - mozprocess timed out
22:05:23    ERROR - timed out after 1000 seconds of no output
22:05:23    ERROR - Return code: 572
}
Bug 890026 *might* fix this.
Depends on: 890026
It seems like Windows does not let us kill those processes via os.kill. The same thing probably
happens with the ctypes version, but there we don't get the exception and it just hangs.
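A minimal Python sketch of the defensive kill described above, assuming the harness kills the hung xpcshell process with `os.kill` (the helper name `try_kill` is hypothetical, not the harness's actual code). On Windows, `os.kill` with a signal other than `CTRL_C_EVENT`/`CTRL_BREAK_EVENT` goes through `TerminateProcess` and raises `OSError` if that fails, so the sketch reports the failure instead of assuming the process is gone.

```python
import os
import signal
import sys


def try_kill(pid, sig=signal.SIGTERM):
    """Hypothetical helper: try to kill `pid`, return True if the OS accepted it.

    On Windows, os.kill() with SIGTERM calls TerminateProcess(); if the OS
    refuses (e.g. access denied), an OSError is raised. Catching it tells the
    caller the process may still be alive, rather than crashing the harness
    or letting it assume the process is dead.
    """
    try:
        os.kill(pid, sig)
        return True
    except OSError as e:
        sys.stderr.write("failed to kill pid %d: %s\n" % (pid, e))
        return False
```

A caller that gets `False` back should avoid then blocking on that process, which is exactly the hang this bug is about.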
Attachment #798109 - Flags: review?(ted)
Assignee: nobody → mihneadb
Status: NEW → ASSIGNED
Attachment #798109 - Flags: review?(ted) → review+
https://hg.mozilla.org/mozilla-central/rev/cde2e1a9a49c
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
Still hitting this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Ted, maybe you have some insight. This log [1] shows that the new logic kicks in; however, at some point the thread blocks again.

One way to fix this would be to add a separate "test is done" variable, distinct from thread.isAlive, and to set it to True whenever the test actually finishes or before we try to kill the process. That way, the harness will not block. I'll post a follow-up patch so you can see what I mean.

[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=27268228&tree=Mozilla-Inbound#error0
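A rough Python sketch of the "test is done" idea above, assuming one thread per test and a harness that polls a `done` attribute instead of relying on `is_alive()` or `join()`. The class and function names are hypothetical; the actual patch may be structured differently.

```python
import threading
import time


class TestThread(threading.Thread):
    """Runs one test; `done` is tracked separately from is_alive()."""

    def __init__(self, name, work):
        super().__init__(name=name, daemon=True)
        self.work = work
        self.done = False

    def run(self):
        try:
            self.work()
        finally:
            # Normal completion (or an exception) marks the test as finished.
            self.done = True

    def give_up(self):
        # Called right before trying to kill a hung process: the harness stops
        # waiting on this test even if the thread itself stays blocked.
        self.done = True


def wait_for_tests(threads, poll_interval=0.01):
    """Return once every test is marked done; never join a blocked thread."""
    while not all(t.done for t in threads):
        time.sleep(poll_interval)
```

Because the threads are daemons, a thread that stays blocked forever does not keep the interpreter alive once the harness is finished.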
Flags: needinfo?(ted)
(In reply to Mihnea Dobrescu-Balaur (:mihneadb) from comment #19)
> Created attachment 798316
> Don't block XPCShell test harness on hangs caused by os.kill on Windows
> 
> https://tbpl.mozilla.org/?tree=Try&rev=43c8999bb87b

Did a bunch of retriggers, looking good. I expect this to get rid of the intermittent.
Attachment #798316 - Flags: review?(ted) → review+
Flags: needinfo?(ted)
Check-in needed for the 2nd patch. Thanks.
Keywords: checkin-needed
Or not.
Whiteboard: [leave open]
Even though we added the done attribute (which we still need), we kept trying to
join blocked threads. This patch fixes that and ensures that if a thread blocks
(which basically means the test timed out on Windows), the corresponding test is
run in isolation at the end of the run.
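The approach in this patch, sketched in Python under simplifying assumptions: tests are plain callables run on daemon threads, "passing" just means the callable returned, and `run_with_isolation` is a hypothetical name. The point is that a thread still alive after its join timeout is recorded and skipped rather than joined again, and its test is rerun on its own after the parallel phase.

```python
import threading


def _start(name, fn):
    t = threading.Thread(target=fn, name=name, daemon=True)
    t.start()
    return t


def run_with_isolation(tests, timeout):
    """tests: dict of name -> callable. Returns dict of name -> outcome string."""
    results = {}
    blocked = []

    # Parallel phase: start everything, then wait at most `timeout` per thread.
    threads = {name: _start(name, fn) for name, fn in tests.items()}
    for name, t in threads.items():
        t.join(timeout)
        if t.is_alive():
            # Blocked (on Windows: the test timed out). Don't join it again;
            # queue the test to be rerun in isolation at the end.
            blocked.append(name)
        else:
            results[name] = "passed"

    # Isolation phase: rerun each blocked test by itself, one at a time.
    for name in blocked:
        t = _start(name + "-isolated", tests[name])
        t.join(timeout)
        results[name] = "timed out" if t.is_alive() else "passed in isolation"
    return results
```

In the real harness the rerun would be a fresh xpcshell process with its own timeout; the thread here only stands in for that.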
Attachment #799091 - Flags: review?(ted)
Attachment #799091 - Flags: review?(ted) → review+
(Note for sheriff: only the last patch needs to get checked in.)
https://hg.mozilla.org/mozilla-central/rev/cb07c1c976f0
Status: REOPENED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
It doesn't look like WindowsError is defined anywhere?  Maybe we should just except Exception, e and print the exception?
(In reply to Jeff Hammel [:jhammel] from comment #58)
> It doesn't look like WindowsError is defined anywhere?  Maybe we should just
> except Exception, e and print the exception?

We went with OSError.
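A short Python 3 check of why `OSError` is the portable choice. `WindowsError` only exists on Windows (in Python 3.3+ it is an alias of `OSError`; in Python 2, which the `except Exception, e` syntax above belongs to, it is a Windows-only subclass of `OSError`), so naming it in an `except` clause raises `NameError` on other platforms as soon as an exception reaches that clause. The helper names are illustrative only.

```python
import builtins


def windows_error_class():
    """Return WindowsError if this interpreter defines it, else None."""
    # Defined only on Windows; in Python 3.3+ it is the same object as OSError.
    return getattr(builtins, "WindowsError", None)


def caught_by_oserror(exc):
    """Return True if `except OSError` would catch `exc`."""
    try:
        raise exc
    except OSError:
        return True
    except Exception:
        return False
```

Since every OS-level failure from `os.kill` is an `OSError` (or a subclass of it), `except OSError` works the same way on every platform.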