Last Comment Bug 738914 - Intermittent test_websocket.html | Exited with code -1073741819 from a crash [@ nsHttpPipeline::IsDone()]
: Intermittent test_websocket.html | Exited with code -1073741819 from a crash ...
Status: RESOLVED FIXED
: intermittent-failure
Product: Core
Classification: Components
Component: Networking: WebSockets (show other bugs)
: Trunk
: x86_64 Windows 7
: -- normal (vote)
: mozilla15
Assigned To: Patrick McManus [:mcmanus]
:
: Patrick McManus [:mcmanus]
Mentors:
Depends on:
Blocks: 438871 739762
  Show dependency treegraph
 
Reported: 2012-03-24 05:02 PDT by Marco Bonardo [::mak]
Modified: 2012-11-25 19:31 PST (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
affected
affected
fixed


Attachments
patch 0 (13.33 KB, patch)
2012-04-10 18:33 PDT, Patrick McManus [:mcmanus]
jduell.mcbugs: review+
akeybl: approval‑mozilla‑aurora-
Details | Diff | Splinter Review

Description Marco Bonardo [::mak] 2012-03-24 05:02:16 PDT
could be related to recent changes to http pipelining

https://tbpl.mozilla.org/php/getParsedLog.php?id=10344326&tree=Mozilla-Inbound

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
INFO | automation.py | Application ran for: 0:06:54.891000
INFO | automation.py | Reading PID log: c:\users\cltbld\appdata\local\temp\tmphuue0opidlog
==> process 2936 launched child process 3804
INFO | automation.py | Checking for orphan process with PID: 3804
Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32-debug/1332579810/firefox-14.0a1.en-US.win32.crashreporter-symbols.zip
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
Crash dump filename: c:\users\cltbld\appdata\local\temp\tmpizwgjc\minidumps\cbc89f78-dbcc-4ea2-895c-2e6e7569b1be.dmp
Operating system: Windows NT
                  6.1.7600 
CPU: x86
     GenuineIntel family 6 model 23 stepping 10
     2 CPUs

Crash reason:  EXCEPTION_ACCESS_VIOLATION_READ
Crash address: 0xffffffffddddddf9

Thread 0 (crashed)
 0  xul.dll!nsHttpPipeline::IsDone() [nsHttpPipeline.cpp:9094735bcb46 : 634 + 0xd]
    eip = 0x6941f9c7   esp = 0x002bd6a0   ebp = 0x002bd6c0   ebx = 0x00000001
    esi = 0x14828804   edi = 0x00000000   eax = 0x1916d6d0   ecx = 0x1916d6d0
    edx = 0xdddddddd   efl = 0x00210297
    Found by: given as instruction pointer in context
 1  xul.dll!nsHttpConnection::TakeTransport(nsISocketTransport * *,nsIAsyncInputStream * *,nsIAsyncOutputStream * *) [nsHttpConnection.cpp:9094735bcb46 : 929 + 0x12]
    eip = 0x693fb0b9   esp = 0x002bd6b0   ebp = 0x002bd6c0
    Found by: stack scanning
 2  xul.dll!nsHttpChannel::OnStopRequest(nsIRequest *,nsISupports *,unsigned int) [nsHttpChannel.cpp:9094735bcb46 : 4488 + 0x52]
    eip = 0x6941cfc0   esp = 0x002bd6c8   ebp = 0x002bd704
    Found by: call frame info
 3  xul.dll!nsInputStreamPump::OnStateStop() [nsInputStreamPump.cpp:9094735bcb46 : 583 + 0x18]
    eip = 0x6938b36a   esp = 0x002bd70c   ebp = 0x002bd72c
    Found by: call frame info
 4  xul.dll!nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream *) [nsInputStreamPump.cpp:9094735bcb46 : 405 + 0x7]
    eip = 0x6938b414   esp = 0x002bd734   ebp = 0x002bd740
    Found by: call frame info
 5  xul.dll!nsInputStreamReadyEvent::Run() [nsStreamUtils.cpp:9094735bcb46 : 114 + 0x10]
    eip = 0x6a03a08d   esp = 0x002bd748   ebp = 0x002bd758
    Found by: call frame info
 6  xul.dll!nsThread::ProcessNextEvent(bool,bool *) [nsThread.cpp:9094735bcb46 : 656 + 0xd]
    eip = 0x6a04eb78   esp = 0x002bd760   ebp = 0x002bd788
    Found by: call frame info
 7  xul.dll!NS_ProcessNextEvent_P(nsIThread *,bool) [nsThreadUtils.cpp:9094735bcb46 : 245 + 0xc]
    eip = 0x6a0116a7   esp = 0x002bd790   ebp = 0x002bd79c
    Found by: call frame info
 8  xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) [MessagePump.cpp:9094735bcb46 : 134 + 0x9]
    eip = 0x69f305b8   esp = 0x002bd7a4   ebp = 0x002bd7c8
Comment 1 Patrick McManus [:mcmanus] 2012-03-24 10:28:09 PDT
I'm out of the office this week so don't panic if you don't see an immediate patch - but this is firmly on my radar.
Comment 2 Treeherder Robot 2012-03-28 15:24:44 PDT
jfkthame
https://tbpl.mozilla.org/php/getParsedLog.php?id=10449733&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound debug test mochitests-1/5 on 2012-03-28 14:24:51

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2624 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 4 Treeherder Robot 2012-03-29 05:10:14 PDT
edmorley
https://tbpl.mozilla.org/php/getParsedLog.php?id=10463801&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound debug test mochitests-1/5 on 2012-03-29 00:59:44

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 3724 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 5 Treeherder Robot 2012-03-30 10:39:40 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=10506232&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound debug test mochitests-1/5 on 2012-03-30 10:23:17

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 420 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 6 Treeherder Robot 2012-03-31 00:01:57 PDT
peterv
https://tbpl.mozilla.org/php/getParsedLog.php?id=10524181&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound debug test mochitests-1/5 on 2012-03-30 23:08:43

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 1676 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 7 Treeherder Robot 2012-04-01 05:45:03 PDT
Ms2ger%gmail.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=10546213&tree=Mozilla-Inbound
Rev3 WINNT 5.1 mozilla-inbound debug test mochitests-1/5 on 2012-04-01 02:17:45

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 1496 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 8 Treeherder Robot 2012-04-02 11:36:53 PDT
Ms2ger%gmail.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=10565133&tree=Try
Rev3 WINNT 5.1 try debug test mochitests-1/5 on 2012-04-02 07:18:50

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 3536 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 9 Treeherder Robot 2012-04-05 15:48:53 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=10678335&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound debug test mochitests-1/5 on 2012-04-05 15:28:11

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 840 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 10 Treeherder Robot 2012-04-09 08:10:33 PDT
Ms2ger%gmail.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=10744741&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound debug test mochitests-1/5 on 2012-04-09 04:15:34

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 3336 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 11 Treeherder Robot 2012-04-09 08:10:33 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=10744741&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound debug test mochitests-1/5 on 2012-04-09 04:15:34

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 3336 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 12 Patrick McManus [:mcmanus] 2012-04-10 18:04:21 PDT
Thread 0 (crashed)
 0  xul.dll!nsHttpPipeline::IsDone() [nsHttpPipeline.cpp:9094735bcb46 : 634 + 0xd]
    eip = 0x6941f9c7   esp = 0x002bd6a0   ebp = 0x002bd6c0   ebx = 0x00000001
    esi = 0x14828804   edi = 0x00000000   eax = 0x1916d6d0   ecx = 0x1916d6d0
    edx = 0xdddddddd   efl = 0x00210297
    Found by: given as instruction pointer in context
 1  xul.dll!nsHttpConnection::TakeTransport(nsISocketTransport * *,nsIAsyncInputStream * *,nsIAsyncOutputStream * *) [nsHttpConnection.cpp:9094735bcb46 : 929 + 0x12]
    eip = 0x693fb0b9   esp = 0x002bd6b0   ebp = 0x002bd6c0
    Found by: stack scanning
 2  xul.dll!nsHttpChannel::OnStopRequest(nsIRequest *,nsISupports *,unsigned int) [nsHttpChannel.cpp:9094735bcb46 : 4488 + 0x52]
    eip = 0x6941cfc0   esp = 0x002bd6c8   ebp = 0x002bd704
    Found by: call frame info

Thread 0 is the main thread. That obviously is going to have some problems with the nshttppipeline implementation of nsAHttpTransaction. Its not clear to me that this is safe even with the normal mTransaction && mTransaciton->IsDone() check nsHttpConnection does.. that could be racy if an EOF or something arrived on the network thread.

To get this stable this code should ping pong onto the network thread for the taketransport() check and then back to the main thread for OnTransportAvailable().. that's sub-optimal but this is in the high-latency handshake portion of the program anyhow - it won't impact actual message latency in the normal states. We can get it stable and then later figure out how to streamline the handshale.

using the pipeline datastructure makes this more obvious, but imo this is a long standing bug that may need porting.
Comment 13 Patrick McManus [:mcmanus] 2012-04-10 18:33:40 PDT
Created attachment 613839 [details] [diff] [review]
patch 0

move nsHttpConnection::TakeTransport() to the socket thread, while leaving OnTransportAvailable on the main thread.

Also, don't call content/nsWebSocket.cpp::OnStart() until the IO handlers are fully installed (we need both OnStartRequest and OnTransportAvailable to have been executed, not just OSR).. not doing this means that when firefox writes a message before the IO handlers are installed it might never go out - this was really obviously triggerable with the threading changes mentioned above, but it was a pre-existing problem.
Comment 14 Treeherder Robot 2012-04-11 05:49:07 PDT
ryanvm%gmail.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=10804691&tree=Mozilla-Inbound
Rev3 WINNT 5.1 mozilla-inbound debug test mochitests-1/5 on 2012-04-11 01:49:37

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 3532 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 15 Treeherder Robot 2012-04-11 17:36:10 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=10826618&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound debug test mochitests-1/5 on 2012-04-11 17:02:00

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2268 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 16 Treeherder Robot 2012-04-11 19:33:55 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=10829861&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound debug test mochitests-1/5 on 2012-04-11 19:04:17

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 3384 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 17 Treeherder Robot 2012-04-11 19:35:12 PDT
bas.schouten%live.nl
https://tbpl.mozilla.org/php/getParsedLog.php?id=10829861&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound debug test mochitests-1/5 on 2012-04-11 19:04:17

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 3384 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 18 Treeherder Robot 2012-04-13 10:54:27 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=10879776&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound debug test mochitests-1/5 on 2012-04-13 10:38:02

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 1640 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 19 Treeherder Robot 2012-04-13 12:06:18 PDT
jwalker%mozilla.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=10882415&tree=Try
Rev3 WINNT 6.1 try debug test mochitests-1/5 on 2012-04-13 11:50:52

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 552 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 20 Treeherder Robot 2012-04-13 15:12:48 PDT
ttaubert
https://tbpl.mozilla.org/php/getParsedLog.php?id=10887839&tree=Fx-Team
Rev3 WINNT 6.1 fx-team debug test mochitests-1/5 on 2012-04-13 14:32:28

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 1360 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 21 Treeherder Robot 2012-04-16 18:39:56 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=10956853&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound debug test mochitests-1/5 on 2012-04-16 18:04:38

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2120 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 22 Treeherder Robot 2012-04-17 04:47:54 PDT
jfkthame
https://tbpl.mozilla.org/php/getParsedLog.php?id=10969154&tree=Mozilla-Inbound
Rev3 WINNT 5.1 mozilla-inbound debug test mochitests-1/5 on 2012-04-17 03:16:46

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 1280 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 23 Treeherder Robot 2012-04-17 05:34:02 PDT
past
https://tbpl.mozilla.org/php/getParsedLog.php?id=10972028&tree=Fx-Team
Rev3 WINNT 5.1 fx-team debug test mochitests-1/5 on 2012-04-17 05:08:08

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 1388 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 24 Treeherder Robot 2012-04-18 08:36:45 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=11007270&tree=Birch
Rev3 WINNT 6.1 birch debug test mochitests-1/5 on 2012-04-18 08:05:26
slave: talos-r3-w7-063

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2704 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 25 Treeherder Robot 2012-04-22 15:43:14 PDT
ryanvm%gmail.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=11112285&tree=Birch
Rev3 WINNT 6.1 birch debug test mochitests-1/5 on 2012-04-22 14:58:37
slave: talos-r3-w7-055

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 1288 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 26 Treeherder Robot 2012-04-23 16:42:06 PDT
ryanvm%gmail.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=11136722&tree=Mozilla-Inbound
Rev3 WINNT 5.1 mozilla-inbound debug test mochitests-1/5 on 2012-04-23 16:25:22
slave: talos-r3-xp-026

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 3428 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 27 Jason Duell [:jduell] (needinfo me) 2012-04-25 16:27:35 PDT
Comment on attachment 613839 [details] [diff] [review]
patch 0

Review of attachment 613839 [details] [diff] [review]:
-----------------------------------------------------------------

So the reason IsDone was segfaulting was re-entrant usage of mResponseQ/mRequestQ from both main/socket threads, correct?

Is there a reason we also need to move the call to mListener->OnStart() from WebSocketChannel::OnStartRequest to WebSocketChannel::StartWebsocketData?  The move seems ok, and it's intuitively sensible to wait for both OnTransportAvailable and OnStartRequest to be called before calling OnStart.  The only issue I could think of is error handling:  if for some reason TakeTransport fails (is it possible in practice in this control path?) you don't wind up calling OnTransportAvailable, so you'd now never call OnStart.  Not that this appears to make much difference: we don't seem to have error handling for failed TakeTransport either before or after this patch.  I'm ok with fixing that in another bug if it's indeed an issue.

All the code to move TakeTransport to the socket thread looks solid.

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +387,5 @@
> +public:
> +nsCompleteUpgradeData(nsAHttpConnection *aConn,
> +                      nsIHttpUpgradeListener *aListener)
> +    : mConn(aConn)
> +        , mUpgradeListener(aListener) {}

weird indenting.  Just put both inits on same line? Or at least same indent level.
Comment 28 Treeherder Robot 2012-04-25 17:03:32 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=11208554&tree=Mozilla-Aurora
Rev3 WINNT 6.1 mozilla-aurora debug test mochitests-1/5 on 2012-04-25 16:45:19
slave: talos-r3-w7-064

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 1952 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 29 Patrick McManus [:mcmanus] 2012-04-25 18:51:12 PDT
(In reply to Jason Duell (:jduell) from comment #27)
> Comment on attachment 613839 [details] [diff] [review]
> patch 0
> 
> Review of attachment 613839 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> So the reason IsDone was segfaulting was re-entrant usage of
> mResponseQ/mRequestQ from both main/socket threads, correct?
> 

yes.

but it really exposed a fundamental problem that we aren't seeing reports of - the transaction could just as easily be closed on the socket thread due to an EOF while IsDone() is running on the main thread.. so this isn't so much a regression in 14 as much as 14 bringing that issue to the surface.

> Is there a reason we also need to move the call to mListener->OnStart() from
> WebSocketChannel::OnStartRequest to WebSocketChannel::StartWebsocketData? 

without that change the content code can submit a message before the http upgrade is complete (i.e. before the transport object can do real websocket I/O) and OnOutputStreamReady() doesn't send it because mSocketOut is still not setup.. the change I made seemed as you say to be the intuitive way to fix that - an alternative would have been to manually call OnOutputStreamReady out of StartWebsocketData to clear the queue, but in general pushing the buffering back towards the source-generator is good policy.

> The move seems ok, and it's intuitively sensible to wait for both
> OnTransportAvailable and OnStartRequest to be called before calling OnStart.
> The only issue I could think of is error handling:  if for some reason
> TakeTransport fails (is it possible in practice in this control path?) you
> don't wind up calling OnTransportAvailable, so you'd now never call OnStart.
> Not that this appears to make much difference: we don't seem to have error
> handling for failed TakeTransport either before or after this patch.  I'm ok
> with fixing that in another bug if it's indeed an issue.
>

yes, that's orthogonal if its a problem at all. I don't think you have to call OTA (though I'd have to think about it), but I'm not sure the right start/stop gets sent to the content class.
Comment 30 Patrick McManus [:mcmanus] 2012-04-25 19:19:48 PDT
ff15
https://hg.mozilla.org/integration/mozilla-inbound/rev/92d22954cbd1
Comment 31 Ed Morley [:emorley] 2012-04-26 10:36:09 PDT
https://hg.mozilla.org/mozilla-central/rev/92d22954cbd1
Comment 32 Patrick McManus [:mcmanus] 2012-04-30 08:13:20 PDT
Comment on attachment 613839 [details] [diff] [review]
patch 0

[Approval Request Comment]
Regression caused by (bug #): 447866
User impact if declined: crash risk
Testing completed (on m-c, etc.): baked on m-c, has cured an orange there
Risk to taking this patch (and alternatives if risky):
String changes made by this patch: none
Comment 33 Alex Keybl [:akeybl] 2012-04-30 10:41:40 PDT
(In reply to Patrick McManus [:mcmanus] from comment #32)
> Risk to taking this patch (and alternatives if risky):

What risk does this patch to nsHttpChannel.cpp carry? I'd only fix an intermittent orange on Aurora if it had very low risk.
Comment 34 Patrick McManus [:mcmanus] 2012-04-30 13:49:12 PDT
(In reply to Alex Keybl [:akeybl] from comment #33)
> (In reply to Patrick McManus [:mcmanus] from comment #32)
> > Risk to taking this patch (and alternatives if risky):
> 
> What risk does this patch to nsHttpChannel.cpp carry? I'd only fix an
> intermittent orange on Aurora if it had very low risk.

I never know how to answer this in a quantifiable way. So the question is always easier to ask than answer.

It's a moderate amount of platfrom C++ code. Mostly a rearrangement onto other execution threads of existing code though. It has test coverage and performs better with that same coverage on m-c than on aurora. The code path is pretty central to websockets, so we can be sure it is being exercised outside of tests too.

The aurora code can crash due to a race condition. This is a long standing issue, but a non websockets checkin on 14 made it worse by adding to the list of things that could race dangerously, so the fix for 13 and 14 would be the same - but the risk of triggering the bug is larger on 14. That's why I am not proposing beta?
Comment 35 Alex Keybl [:akeybl] 2012-04-30 13:54:25 PDT
(In reply to Patrick McManus [:mcmanus] from comment #34)
> The aurora code can crash due to a race condition. This is a long standing
> issue, but a non websockets checkin on 14 made it worse by adding to the
> list of things that could race dangerously, so the fix for 13 and 14 would
> be the same - but the risk of triggering the bug is larger on 14. That's why
> I am not proposing beta?

I'm trying to avoid taking moderate amounts of code change on any branch without data that suggests this will be a common/significant problem for our users. Without this fix, would we expect the crash signature [@ nsHttpPipeline::IsDone()] to be apparent in our crash data, or is that just one way the bug is apparent?
Comment 36 Patrick McManus [:mcmanus] 2012-04-30 14:00:08 PDT
(In reply to Alex Keybl [:akeybl] from comment #35)
> 
> I'm trying to avoid taking moderate amounts of code change on any branch
> without data that suggests this will be a common/significant problem for our
> users. Without this fix, would we expect the crash signature [@
> nsHttpPipeline::IsDone()] to be apparent in our crash data, or is that just
> one way the bug is apparent?

nsHttpPipeline::IsDone() at the top will likely be the most common - they should all contain nsHttpConnection::TakeTransport() somewhere in the stack.
Comment 37 Alex Keybl [:akeybl] 2012-04-30 14:48:27 PDT
From looking at crash-stats, I don't believe this is prevalent enough to take any  code change to resolve for FF14. If related crashes do rise, we'll reconsider taking the change.
Comment 38 Treeherder Robot 2012-05-03 12:36:10 PDT
mbrubeck
https://tbpl.mozilla.org/php/getParsedLog.php?id=11435590&tree=Mozilla-Aurora
Rev3 WINNT 6.1 mozilla-aurora debug test mochitests-1/5 on 2012-05-03 12:05:11
slave: talos-r3-w7-009

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 1896 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 39 Treeherder Robot 2012-06-04 19:56:41 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=12371676&tree=Mozilla-Beta
Rev3 WINNT 5.1 mozilla-beta debug test mochitests-1/5 on 2012-06-04 19:25:22
slave: talos-r3-xp-023

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 3940 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 40 Jason Duell [:jduell] (needinfo me) 2012-06-05 11:01:27 PDT
Patrick--any idea of recent changes that might be causing this to fail again/still?
Comment 41 Patrick McManus [:mcmanus] 2012-06-05 11:10:42 PDT
(In reply to Jason Duell (:jduell) from comment #40)
> Patrick--any idea of recent changes that might be causing this to fail
> again/still?

what are you referring to?

This is wontfix for FF13 and FF14 (see alex comment 37). The crash in 38 is FF14 and comment 39 is either ff 13 or 14 depending on the build time.
Comment 42 Jason Duell [:jduell] (needinfo me) 2012-06-05 11:16:21 PDT
Ah, right.  Resume your normal programming...
Comment 43 Treeherder Robot 2012-06-05 15:58:11 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=12396264&tree=Mozilla-Beta
Rev3 WINNT 6.1 mozilla-beta debug test mochitests-1/5 on 2012-06-05 15:40:14
slave: talos-r3-w7-077

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 3588 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 44 Treeherder Robot 2012-06-08 16:52:11 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=12503569&tree=Mozilla-Beta
Rev3 WINNT 6.1 mozilla-beta debug test mochitests-1/5 on 2012-06-08 16:25:21
slave: talos-r3-w7-048

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 3996 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 45 Treeherder Robot 2012-06-13 23:16:40 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=12649762&tree=Mozilla-Beta
Rev3 WINNT 6.1 mozilla-beta debug test mochitests-1/5 on 2012-06-13 22:49:38
slave: talos-r3-w7-065

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 1024 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 46 Treeherder Robot 2012-06-22 15:17:06 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=12912767&tree=Mozilla-Beta
Rev3 WINNT 5.1 mozilla-beta debug test mochitests-1/5 on 2012-06-22 14:49:49
slave: talos-r3-xp-017

29717 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_bug435425.html | Wrong responseText - got Bad request
TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code -1073741819 during test run
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 1384 | automationutils.processLeakLog() | missing output line for total leaks!

Note You need to log in before you can comment on or make changes to this bug.