Closed Bug 712215 Opened 8 years ago Closed 7 years ago

Intermittent test_websocket.html | Exited with code 1 during test run from a crash [@ nsHttpTransaction::SetConnection]

Categories

(Core :: Networking: WebSockets, defect)

x86
macOS
defect
Not set

RESOLVED WORKSFORME

People

(Reporter: philor, Unassigned)

Details

(Keywords: intermittent-failure)

https://tbpl.mozilla.org/php/getParsedLog.php?id=8046735&tree=Mozilla-Inbound
Rev3 MacOSX Leopard 10.5.8 mozilla-inbound debug test mochitests-1/5 on 2011-12-19 18:10:07 PST for push 4a81c89bb466

39131 INFO TEST-PASS | /tests/content/base/test/test_websocket.html | Created websocket for test 19

TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited with code 1 during test run
INFO | automation.py | Application ran for: 0:04:11.402961
INFO | automation.py | Reading PID log: /var/folders/Xr/Xr--yJnSEY0U11ET5NZuMU+++TM/-Tmp-/tmpHHRwZspidlog
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application crashed (minidump found)
Crash dump filename: /var/folders/Xr/Xr--yJnSEY0U11ET5NZuMU+++TM/-Tmp-/tmpykGXLs/minidumps/C136D690-5E8D-4EEA-BA1A-B9B09E068D64.dmp
Operating system: Mac OS X
                  10.5.8 9L31a
CPU: x86
     GenuineIntel family 6 model 23 stepping 10
     2 CPUs

Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash address: 0x57e58959

Thread 0 (crashed)
 0  XUL!nsHttpTransaction::SetConnection [nsISupportsUtils.h : 94 + 0x5]
    eip = 0x03328e84   esp = 0xbfffcda0   ebp = 0xbfffcdb8   ebx = 0x03318104
    esi = 0x295745c0   edi = 0x0331dea0   eax = 0x57e58955   ecx = 0x00000000
    edx = 0x00000000   efl = 0x00210206
    Found by: given as instruction pointer in context
 1  XUL!nsHttpConnectionMgr::nsConnectionHandle::~nsConnectionHandle [nsHttpConnectionMgr.cpp:4a81c89bb466 : 1705 + 0xd]
    eip = 0x03318186   esp = 0xbfffcdc0   ebp = 0xbfffce08   ebx = 0x03318104
    esi = 0x0331dea0   edi = 0x00000000
    Found by: call frame info
 2  XUL!nsHttpConnectionMgr::nsConnectionHandle::Release [nsHttpConnectionMgr.cpp:4a81c89bb466 : 1709 + 0xd]
    eip = 0x03314b7d   esp = 0xbfffce10   ebp = 0xbfffce48   ebx = 0x03314b34
    esi = 0x00000000   edi = 0x2ab63310
    Found by: call frame info
 3  XUL!nsHttpChannel::OnStopRequest [nsAutoPtr.h : 907 + 0xa]
    eip = 0x03359198   esp = 0xbfffce50   ebp = 0xbfffcef8   ebx = 0x03358f61
    esi = 0x276e6000   edi = 0x804b000d
    Found by: call frame info
 4  XUL!nsInputStreamPump::OnStateStop [nsInputStreamPump.cpp:4a81c89bb466 : 580 + 0x19]
    eip = 0x032249ac   esp = 0xbfffcf00   ebp = 0xbfffcf48   ebx = 0x0322476e
    esi = 0x0011f4e0   edi = 0x2d77e4e0
    Found by: call frame info
 5  XUL!nsInputStreamPump::OnInputStreamReady [nsInputStreamPump.cpp:4a81c89bb466 : 405 + 0x7]
    eip = 0x03225148   esp = 0xbfffcf50   ebp = 0xbfffcfa8   ebx = 0x03224f8e
    esi = 0x00000003   edi = 0x2d77e4e0
    Found by: call frame info
 6  XUL!nsInputStreamReadyEvent::Run [nsStreamUtils.cpp:4a81c89bb466 : 114 + 0xb]
    eip = 0x04f5a664   esp = 0xbfffcfb0   ebp = 0xbfffcff8   ebx = 0x04f5a63e
    esi = 0x2d77e4e4   edi = 0x0fd0ed00
    Found by: call frame info
This is clearly new with the websockets changes that went in around the 19th.

The stack doesn't make a heck of a lot of sense, but my best guess is that it involves the error path in nsHttpChannel:

                 mUpgradeProtocolCallback->OnTransportAvailable(socketTransport,
                                                                socketIn,
                                                                socketOut);

I don't have a problem with the error path, but basically it should never be taken, so I'm trying to figure out how we got there (if we did at all; it's just a hunch).

..

Anyhow, I put in an assert on that error path to see whether we would hit the assert instead of the crash, but despite running this test 125 times on try, it passed every time. I only ran it on OS X debug, which is the only place with crashes in this bug. See https://tbpl.mozilla.org/?tree=Try&rev=0c3c561a4b41
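
For illustration only, here is a minimal sketch of the "instrument the path that should never run" idea. It is not the patch that was pushed to try. Every name in it (Status, kOk, kFailure, NoteUnexpectedErrorPath, HandOffTransport, and the two globals) is hypothetical and stands in for the real nsHttpChannel code and Gecko's assertion macros.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical status type and values; these are not Gecko's nsresult codes.
using Status = unsigned int;
constexpr Status kOk = 0;
constexpr Status kFailure = 1;

// Counts how often the supposedly unreachable error path actually ran.
int gUnexpectedErrorPathHits = 0;

// When true, abort immediately so the failure shows up as a recognizable
// assertion-style stop instead of a later, confusing crash.
bool gAbortOnUnexpectedPath = false;

void NoteUnexpectedErrorPath(const char* where, Status rv) {
    ++gUnexpectedErrorPathHits;
    std::fprintf(stderr, "unexpected error path at %s (rv=%u)\n", where, rv);
    if (gAbortOnUnexpectedPath) {
        std::abort();
    }
}

// Models the hand-off of the upgraded transport to a callback. The callback
// stands in for OnTransportAvailable and returns a Status.
template <typename Callback>
Status HandOffTransport(Callback&& onTransportAvailable) {
    Status rv = onTransportAvailable();
    if (rv != kOk) {
        // This branch "should never be taken"; record it if it is.
        NoteUnexpectedErrorPath("HandOffTransport", rv);
    }
    return rv;
}
```

With gAbortOnUnexpectedPath set to true, a run that reaches the branch stops at the instrumentation point, which is what distinguishes "we took the error path" from "we crashed for some other reason".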
re: comment 5,

Yeah, the stack trace doesn't make much sense.  Both this and bug 718565 have nsConnectionHandle::~nsConnectionHandle as a possible culprit, but there's not much in that function to go on:

  nsHttpConnectionMgr::nsConnectionHandle::~nsConnectionHandle()
  {
      if (mConn) {
          gHttpHandler->ReclaimConnection(mConn);
          NS_RELEASE(mConn);
      }    
  }

The test is crashing in the middle of test_websocket.html, so it seems very unlikely that gHttpHandler is null (I suppose we could add a check for it, but we're basically hosed in general if gHttpHandler is null during regular operation).
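
As a rough sketch of what such a check could look like, here is a self-contained model of the destructor with a guard on the global handler. The types and names (Connection, Handler, gHandler, ConnectionHandle) are hypothetical stand-ins rather than the real Gecko classes, and the manual AddRef/Release only approximates what NS_RELEASE does.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for a refcounted connection object.
struct Connection {
    int refCount = 1;
    void AddRef() { ++refCount; }
    void Release() {
        if (--refCount == 0) {
            delete this;
        }
    }
};

// Hypothetical stand-in for the global HTTP handler.
struct Handler {
    int reclaimed = 0;
    void ReclaimConnection(Connection*) { ++reclaimed; }
};

Handler* gHandler = nullptr;  // models gHttpHandler

class ConnectionHandle {
public:
    explicit ConnectionHandle(Connection* conn) : mConn(conn) {
        if (mConn) {
            mConn->AddRef();  // the handle owns one reference
        }
    }

    ~ConnectionHandle() {
        if (!mConn) {
            return;
        }
        if (gHandler) {
            gHandler->ReclaimConnection(mConn);
        } else {
            // The extra check: report the bad state instead of dereferencing null.
            std::fprintf(stderr, "handler is null during handle teardown\n");
        }
        mConn->Release();  // drop the handle's reference
        mConn = nullptr;
    }

private:
    Connection* mConn;
};
```

A guard like this only covers a null handler. It would not help if mConn itself points at freed or corrupted memory.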

> This is clearly new with the websockets changes that went in around the 19th.

The group of patches I landed (final WS spec support, etc.) were mostly on the 20th, so they're after the first instance of this (on Dec 19th).  The only WS patches we landed immediately before this were HSTS support (bug 664284) and binary message support (bug 676439). Since this crash is with the initial HTTP upgrade, only HSTS is a reasonable culprit (but note that the test seems to fail before test-41, which is the only code that exercises the redirect logic AFAICT).

Given that this is HTTP code that's crashing, I also looked at changes to the HTTP code shortly before the 19th.  It looks like this might be SPDY-related, given that the bug only seems to happen every 3-4 days or less on average:

changeset:   83000:b4d859a1e338
user:        Patrick McManus <mcmanus@ducksong.com>
date:        Mon Dec 19 16:36:26 2011 -0500
summary:     bug 712021 - spdy cancel stream should remove from pending queue r=honzab

changeset:   82868:a28ebec62c3a
user:        Patrick McManus <mcmanus@ducksong.com>
date:        Sat Dec 17 20:05:56 2011 -0500
summary:     bug 710310 spdy google docs large file save intermittent failure r=honzab

changeset:   82852:dfc239e7dcf4
user:        Geoff Brown <gbrown@mozilla.com>
date:        Sat Dec 17 11:30:29 2011 +0100
summary:     Bug 648429: HTTP cache: compress all compressible files; r=michal.novotny

changeset:   82500:cf0b31ff2b6d
user:        Patrick McManus <mcmanus@ducksong.com>
date:        Tue Dec 13 10:55:50 2011 -0500
summary:     bug 528288 - reland spdy after libxul weightloss a=khuey CLOSED TREE

At this point I'm a bit stumped.  I don't know the HTTP transaction/connection code well, so this could be a big time sink for me.  This is a crash, so we ought to fix it, but OTOH it doesn't seem easily exploitable (doesn't happen predictably or often AFAICT).  I'm open to ideas for proceeding, but am moving on to other WS bugs for now.
This has not appeared in 90 days; dup'd up with some other unknown bug.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Whiteboard: [orange]