Closed Bug 580525 Opened 14 years ago Closed 14 years ago

xpcshell: "TimeoutError: command timed out: 1200 seconds without output" in test_simple_wrap.js, test_resumable_channel_wrap.js, test_redirect_failure_wrap.js, test_redirect_passing_wrap.js, test_redirect-caching_passing_wrap.js, etc.)

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: cjones, Assigned: jduell.mcbugs)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files)

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Release/1279677104.1279678587.12873.gz#err0
Rev3 Fedora 12 mozilla-central-release opt test xpcshell on 2010/07/20 18:51:44  
s: talos-r3-fed-021
buildbot.slave.commands.TimeoutError: command timed out: 1200 seconds without output, killing pid 1957

I tried to reproduce locally on a debug/x86-64 build was unable.  This might be related to bug 576206 or (its likely dup) bug 579542, and/or it might have been fixed by the recent landing of bug 572980 on e10s.
Should add for clarity

TEST-INFO | /home/cltbld/talos-slave/mozilla-central-release-fedora-opt-u-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_progress_wrap.js | running test ...
TEST-PASS | /home/cltbld/talos-slave/mozilla-central-release-fedora-opt-u-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_progress_wrap.js | test passed
TEST-INFO | /home/cltbld/talos-slave/mozilla-central-release-fedora-opt-u-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_simple_wrap.js | running test ...

command timed out: 1200 seconds without output, killing pid 1957
process killed by signal 9
program finished with exit code -1


This was during unit_ipc so doesn't affect ff4 (though it does affect fennec2!).
Most likely the same cause as bug 579542.
Seeing this on windows 7 too.
OS: Linux → All
Summary: Linux opt xpcshell: "TimeoutError: command timed out: 1200 seconds without output" after "test_simple_wrap.js | running test" → opt xpcshell: "TimeoutError: command timed out: 1200 seconds without output" after "test_simple_wrap.js | running test"
Seeing this in other _wrap tests, probably the same root cause.

http://tinderbox.mozilla.org/showlog.cgi?log=Electrolysis/1281766895.1281768471.6489.gz

TEST-INFO | /home/cltbld/talos-slave/electrolysis_fedora_test-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_reentrancy_wrap.js | running test ...
TEST-PASS | /home/cltbld/talos-slave/electrolysis_fedora_test-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_reentrancy_wrap.js | test passed
TEST-INFO | /home/cltbld/talos-slave/electrolysis_fedora_test-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_resumable_channel_wrap.js | running test ...

command timed out: 1200 seconds without output, killing pid 1984
After test_progress_wrap, the test afterwards (in this case test_redirect-caching_canceled_wrap) fails on opt xpcshell Linux:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281773774.1281775388.7956.gz

TEST-PASS | /home/cltbld/talos-slave/mozilla-central_fedora_test-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_progress_wrap.js | test passed
TEST-INFO | /home/cltbld/talos-slave/mozilla-central_fedora_test-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_redirect-caching_canceled_wrap.js | running test ...

command timed out: 1200 seconds without output, killing pid 2013
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281875454.1281877243.19357.gz
WINNT 5.2 mozilla-central opt test xpcshell on 2010/08/15 05:30:54  

TEST-INFO | e:\builds\moz2_slave\mozilla-central-win32-opt-unittest-xpcshell\build\xpcshell\tests\test_necko\unit_ipc\test_resumable_channel_wrap.js | running test ...

command timed out: 1200 seconds without output
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281876333.1281877909.22850.gz
Rev3 Fedora 12 mozilla-central opt test xpcshell on 2010/08/15 05:45:33

TEST-INFO | /home/cltbld/talos-slave/mozilla-central_fedora_test-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_resumable_channel_wrap.js | running test ...

command timed out: 1200 seconds without output, killing pid 2003
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281879426.1281881159.6161.gz
Rev3 Fedora 12x64 mozilla-central debug test xpcshell

TEST-INFO | /home/cltbld/talos-slave/mozilla-central_fedora64-debug_test-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_resumable_channel_wrap.js | running test ...

command timed out: 1200 seconds without output, killing pid 2057
Summary: opt xpcshell: "TimeoutError: command timed out: 1200 seconds without output" after "test_simple_wrap.js | running test" → xpcshell: "TimeoutError: command timed out: 1200 seconds without output" after "test_simple_wrap.js | running test" (also test_resumable_channel_wrap.js , etc.)
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281890583.1281892320.20124.gz
Rev3 Fedora 12x64 mozilla-central debug test xpcshell on 2010/08/15 09:43:03
s: talos-r3-fed64-018

TEST-INFO | /home/cltbld/talos-slave/mozilla-central_fedora64-debug_test-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_resumable_channel_wrap.js | running test ...

command timed out: 1200 seconds without output, killing pid 2006
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281577547.1281579296.13945.gz
WINNT 5.2 mozilla-central opt test xpcshell on 2010/08/11 18:45:47
s: mw32-ix-slave11

TEST-INFO |
e:\builds\moz2_slave\mozilla-central-win32-opt-unittest-xpcshell\build\xpcshell\tests\test_necko\unit_ipc\test_redirect_failure_wrap.js
| running test ...

command timed out: 1200 seconds without output
Blocks: 438871
Hardware: x86 → All
Summary: xpcshell: "TimeoutError: command timed out: 1200 seconds without output" after "test_simple_wrap.js | running test" (also test_resumable_channel_wrap.js , etc.) → xpcshell: "TimeoutError: command timed out: 1200 seconds without output" after "test_simple_wrap.js | running test" (also test_resumable_channel_wrap.js, test_redirect_failure_wrap.js, etc.)
Whiteboard: [orange]
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281599138.1281600683.340.gz
Rev3 Fedora 12 mozilla-central opt test xpcshell on 2010/08/12 00:45:38
s: talos-r3-fed-004

TEST-INFO |
/home/cltbld/talos-slave/mozilla-central_fedora_test-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_redirect-caching_passing_wrap.js
| running test ...

command timed out: 1200 seconds without output, killing pid 2012

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281645966.1281647512.20687.gz
Rev3 Fedora 12x64 mozilla-central opt test xpcshell on 2010/08/12 13:46:06
s: talos-r3-fed64-048

TEST-INFO | /home/cltbld/talos-slave/mozilla-central_fedora64_test-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_redirect-caching_passing_wrap.js | running test ...

command timed out: 1200 seconds without output, killing pid 2044

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281795957.1281797777.2538.gz
WINNT 5.2 mozilla-central opt test xpcshell on 2010/08/14 07:25:57
s: mw32-ix-slave10

TEST-INFO | e:\builds\moz2_slave\mozilla-central-win32-opt-unittest-xpcshell\build\xpcshell\tests\test_necko\unit_ipc\test_redirect_passing_wrap.js | running test ...

command timed out: 1200 seconds without output

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281792606.1281794414.20569.gz
Rev3 Fedora 12 mozilla-central debug test xpcshell on 2010/08/14 06:30:06
s: talos-r3-fed-013

TEST-INFO | /home/cltbld/talos-slave/mozilla-central_fedora-debug_test-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_redirect_passing_wrap.js | running test ...

command timed out: 1200 seconds without output, killing pid 1957
Summary: xpcshell: "TimeoutError: command timed out: 1200 seconds without output" after "test_simple_wrap.js | running test" (also test_resumable_channel_wrap.js, test_redirect_failure_wrap.js, etc.) → etc.)http://tinderbox.moz xpcshell: "TimeoutError: command timed out: 1200 seconds without output" in test_simple_wrap.js, test_resumable_channel_wrap.js, test_redirect_failure_wrap.js, test_redirect_passing_wrap.js, test_redirect-caching_passing_wrap.js
Summary: etc.)http://tinderbox.moz xpcshell: "TimeoutError: command timed out: 1200 seconds without output" in test_simple_wrap.js, test_resumable_channel_wrap.js, test_redirect_failure_wrap.js, test_redirect_passing_wrap.js, test_redirect-caching_passing_wrap.js → xpcshell: "TimeoutError: command timed out: 1200 seconds without output" in test_simple_wrap.js, test_resumable_channel_wrap.js, test_redirect_failure_wrap.js, test_redirect_passing_wrap.js, test_redirect-caching_passing_wrap.js, etc.)
About out of summary space, with several left - maybe we could rename the files to things like test_r-c_p_w.js?
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281896456.1281898250.13338.gz
WINNT 5.2 mozilla-central opt test xpcshell on 2010/08/15 11:20:56
s: mw32-ix-slave04

TEST-INFO | e:\builds\moz2_slave\mozilla-central-win32-opt-unittest-xpcshell\build\xpcshell\tests\test_necko\unit_ipc\test_event_sink_wrap.js | running test ...

command timed out: 1200 seconds without output
(In reply to comment #15)
> About out of summary space, with several left - maybe we could rename the files
> to things like test_r-c_p_w.js?

Good if you want no one know what the tests are about.
-> me
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
TEST-INFO | /home/cltbld/talos-slave/mozilla-central_fedora64_test-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_redirect-caching_passing_wrap.js | running test ...

command timed out: 1200 seconds without output, killing pid 1997
process killed by signal 9
program finished with exit code -1
elapsedTime=1467.370283
TinderboxPrint: xpcshell<br/><em class="testfail">T-FAIL</em>
buildbot.slave.commands.TimeoutError: command timed out: 1200 seconds without output, killing pid 1997

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1281947814.1281949364.23234.gz
Blocks: 586743
Blocks: fennecko
I'm absolutely unable to reproduce on my windows debug build.  Going to try this on mac optimized build.

Anyone, feel free to take this if there are any toughs or efforts to move this forward.
I can reproduce this locally, Linux 64 debug:

TEST-INFO | /home/dwitte/builds/mozilla/obj-fennec-debug/_tests/xpcshell/test_necko/unit_ipc/test_redirect-caching_canceled_wrap.js | running test ...

(gdb) bt 20
#0  0x000000373540b3bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5bed7e16c7 in PR_WaitCondVar (cvar=0x21d3780, timeout=4294967295)
    at /home/dwitte/builds/mozilla/electrolysis/nsprpub/pr/src/pthreads/ptsynch.c:417
#2  0x00007f5bed7e1e59 in PR_Wait (mon=0x21d3cd0, timeout=4294967295)
    at /home/dwitte/builds/mozilla/electrolysis/nsprpub/pr/src/pthreads/ptsynch.c:614
#3  0x00007f5beead23fa in nsAutoMonitor::Wait (this=0x7fff51ecb010, interval=4294967295)
    at ../../../dist/include/nsAutoLock.h:346
#4  0x00007f5beff86067 in nsEventQueue::GetEvent (this=0x21d3bc8, mayWait=1, result=0x7fff51ecb0c0)
    at /home/dwitte/builds/mozilla/electrolysis/xpcom/threads/nsEventQueue.cpp:85
#5  0x00007f5beff88d22 in nsThread::nsChainedEventQueue::GetEvent (this=0x21d3bb8, mayWait=1, event=
    0x7fff51ecb0c0) at /home/dwitte/builds/mozilla/electrolysis/xpcom/threads/nsThread.h:112
#6  0x00007f5beff88387 in nsThread::ProcessNextEvent (this=0x21d3b80, mayWait=1, result=0x7fff51ecb338)
    at /home/dwitte/builds/mozilla/electrolysis/xpcom/threads/nsThread.cpp:530
#7  0x00007f5beffa4ab0 in NS_InvokeByIndex_P (that=0x21d3b80, methodIndex=8, paramCount=2, params=
    0x7fff51ecb320)
    at /home/dwitte/builds/mozilla/electrolysis/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_x86_64_unix.cpp:208
#8  0x00007f5bef811fbc in CallMethodHelper::Invoke (this=0x7fff51ecb2e0)
    at /home/dwitte/builds/mozilla/electrolysis/js/src/xpconnect/src/xpcwrappednative.cpp:3073
#9  0x00007f5bef80ffbb in CallMethodHelper::Call (this=0x7fff51ecb2e0)
    at /home/dwitte/builds/mozilla/electrolysis/js/src/xpconnect/src/xpcwrappednative.cpp:2340
#10 0x00007f5bef80bfde in XPCWrappedNative::CallMethod (ccx=..., mode=XPCWrappedNative::CALL_METHOD)
    at /home/dwitte/builds/mozilla/electrolysis/js/src/xpconnect/src/xpcwrappednative.cpp:2304
#11 0x00007f5bef81c99e in XPC_WN_CallMethod (cx=0x24ca1c0, obj=0x7f5bdb924ea0, argc=1, argv=0x7fff51ecb698, 
    vp=0x7fff51ecb6a0)
    at /home/dwitte/builds/mozilla/electrolysis/js/src/xpconnect/src/xpcwrappednativejsops.cpp:1738

So we're just spinning in the event loop. Honza, let me know if you want me to give you an account on my machine, or if you want me to debug it...
FWIW I'm just running 'while true; do make SOLO_FILE=test_redirect-caching_canceled_wrap.js check-one; done' and waiting. After a dozen or so runs through, it hangs.
Dan, please produce and submit a log from your hang run with: NSPR_LOG_MODULES=timestamp,nsHttp:5,DocLoader:5,URILoader:5

Thanks!
One place I'm seeing the spin is after an asyncOpen:

http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/test_redirect-caching_canceled.js#34

The asyncOpen gets executed but we never get an onStartRequest.

I haven't debugged the guts of the HttpChannels yet to see what's going on, I'll look tomorrow.
Attached file parent log
Log from parent. (Note the 'OnMsgPruneDeadConnections' bits at the end are getting appended long after the hang and aren't relevant.)
Attached file child log
Child log.

For this pair of logs, the hang is occurring after onStartRequest: http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/head_channels.js#67 completes, but then onDataAvailable is never called. I don't know if this is the same bug as onStartRequest not getting called...
Just to be sure, aren't there any IPC failures/ASSERTION/whatever errors in the log of the test (test_redirect-caching_canceled.log)?
I don't have a test_redirect-caching_canceled.log (do I have to enable xpcshell logging?), but there aren't any IPC errors. There are a couple of threadmanager assertions, I think, but those happen early on and are present even in passing tests.
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1282044859.1282046519.23393.gz
Rev3 Fedora 12 tracemonkey opt test xpcshell on 2010/08/17 04:34:19
s: talos-r3-fed-006

TEST-INFO | /home/cltbld/talos-slave/tracemonkey_fedora_test-xpcshell/build/xpcshell/tests/test_necko/unit_ipc/test_simple_wrap.js | running test ...
command timed out: 1200 seconds without output, killing pid 2016
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1282035957.1282051297.18807.gz
Rev3 WINNT 6.1 tracemonkey opt test xpcshell on 2010/08/17 02:05:57
s: talos-r3-w7-017

TEST-INFO | c:\talos-slave\tracemonkey_win7_test-xpcshell\build\xpcshell\tests\test_necko\unit_ipc\test_redirect_passing_wrap.js | running test ...
command timed out: 1200 seconds without output
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282063002.1282064799.15325.gz
WINNT 5.2 mozilla-central opt test xpcshell on 2010/08/17 09:36:42
s: mw32-ix-slave20

TEST-INFO | e:\builds\moz2_slave\mozilla-central-win32-opt-unittest-xpcshell\build\xpcshell\tests\test_necko\unit_ipc\test_event_sink_wrap.js | running test ...
command timed out: 1200 seconds without output
(In reply to comment #27)
> For this pair of logs, the hang is occurring after onStartRequest:
> http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/head_channels.js#67
> completes, but then onDataAvailable is never called. I don't know if this is
> the same bug as onStartRequest not getting called...

The test exactly does:
- first round: request for http://localhost:4444/redirect/something
- caches the 301 response
- redirect observer DOESN'T CANCEL
- redirects to and loads http://localhost:4444/content
- second round: request for http://localhost:4444/redirect/something
- uses the previously cached 301 response
- redirect observer DOESN'T CANCEL
- redirects to and loads http://localhost:4444/content
- third round: request for http://localhost:4444/redirect/something
- uses the previously cached 301 response
- redirect observer _DOES_ CANCEL
- canceled before redirects to http://localhost:4444/content
- must only provide OnStartRequest and OnStopRequest(failure-code)

The HttpChannelParent(Listener) in round 3 gets both the OnStart/OnStop callbacks, as expected, but the child only gets RecvOnStartRequest and never gets RecvOnStopRequest.

There is a lot of places missing logging, so I cannot say if this is the queuing code that hangs or it is an IPC problem.

I suspect the former, but now it's just a guess.  Going to try to use the while true; done approach to reproduce with more logs added.
Depends on: 588216
Assignee: honzab.moz → nobody
Status: ASSIGNED → NEW
This patch served me well when we were debugging similar missing msg problems in xpcshell.   Printfs when the IPDL IPC thread receives the msg.   May tell you whether the issue is that msg is not sent (no printf), or received but not dispatched (printf)
At least this isn't hanging our win7 boxes forever now...
Yeah, just failing :)

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282146132.1282150463.13588.gz
Rev3 WINNT 6.1 mozilla-central opt test xpcshell on 2010/08/18 08:42:12  
test_redirect-caching_passing_wrap.js
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282160910.1282162743.21015.gz
WINNT 5.2 mozilla-central opt test xpcshell on 2010/08/18 12:48:30  
test_event_sink_wrap.js
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282158380.1282162439.18029.gz
Rev3 WINNT 6.1 mozilla-central opt test xpcshell on 2010/08/18 12:06:20  
test_xmlhttprequest_wrap.js
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282162049.1282165340.2416.gz
Rev3 WINNT 6.1 mozilla-central opt test xpcshell on 2010/08/18 13:07:29  
test_resumable_channel_wrap.js
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1282146348.1282151461.18405.gz
Rev3 WINNT 6.1 tracemonkey opt test xpcshell on 2010/08/18 08:45:48  
test_redirect-caching_passing_wrap.js
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282170076.1282171658.3236.gz
Rev3 Fedora 12x64 mozilla-central opt test xpcshell on 2010/08/18 15:21:16  
test_event_sink_wrap.js
No longer blocks: 586743
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282172921.1282176191.24841.gz
Rev3 WINNT 6.1 mozilla-central opt test xpcshell on 2010/08/18 16:08:41  
test_channel_close_wrap.js
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1282178442.1282181716.19104.gz
Rev3 WINNT 6.1 tracemonkey opt test xpcshell on 2010/08/18 17:40:42  
test_redirect-caching_canceled_wrap.js
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1282181084.1282184249.30345.gz
Rev3 WINNT 6.1 tracemonkey opt test xpcshell on 2010/08/18 18:24:44  
test_channel_close_wrap.js
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282187327.1282190550.26467.gz
Rev3 WINNT 6.1 mozilla-central opt test xpcshell on 2010/08/18 20:08:47  
test_redirect_canceled_wrap.js
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282202870.1282206110.26538.gz
Rev3 WINNT 6.1 mozilla-central opt test xpcshell on 2010/08/19 00:27:50
test_redirect-caching_passing_wrap.js
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282222524.1282225887.6080.gz
Rev3 WINNT 6.1 mozilla-central opt test xpcshell [testfailed]

TEST-INFO | c:\talos-slave\mozilla-central_win7_test-xpcshell\build\xpcshell\tests\test_necko\unit_ipc\test_resumable_channel_wrap.js | running test ...

command timed out: 1200 seconds without output, killing pid 2628
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282225948.1282227475.13304.gz
Rev3 Fedora 12x64 mozilla-central opt test xpcshell on 2010/08/19 06:52:28  
test_reentrancy_wrap.js
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282210708.1282213925.2757.gz
Rev3 WINNT 6.1 mozilla-central opt test xpcshell on 2010/08/19 02:38:28  
test_resumable_channel_wrap.js
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282238012.1282241294.12644.gz
Rev3 WINNT 6.1 mozilla-central opt test xpcshell on 2010/08/19 10:13:32  
test_redirect_passing_wrap.js
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1282252442.1282254207.8333.gz
Rev3 Fedora 12x64 mozilla-central debug test xpcshell on 2010/08/19 14:14:02  
test_resumable_channel_wrap.js
Please reopen if this appears in builds after http://hg.mozilla.org/mozilla-central/rev/69c6ce104f45 .
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Assignee: nobody → jduell.mcbugs
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.