Intermittent browser_social_errorPage.js | application crashed [@ mozilla::net::SpdySession3::ReadTimeoutTick(unsigned int)]

RESOLVED FIXED in mozilla24

Status

()

defect
--
critical
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: emorley, Assigned: mcmanus)

Tracking

({crash, intermittent-failure})

Trunk
mozilla24
x86
Windows 8
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [spdy], crash signature)

Attachments

(1 attachment, 1 obsolete attachment)

WINNT 6.2 mozilla-inbound debug test mochitest-browser-chrome on 2013-06-10 18:46:17 PDT for push 9786c829bf3c

slave: t-w864-ix-049

https://tbpl.mozilla.org/php/getParsedLog.php?id=23991583&tree=Mozilla-Inbound

{
19:00:37     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/social/browser_social_errorPage.js | is on social error page
19:00:37     INFO -  --DOMWINDOW == 68 (1FDB9E10) [serial = 2489] [outer = 1FDBA7B0] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 67 (1F87ABB0) [serial = 2478] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 66 (1F879AD8) [serial = 2476] [outer = 00000000] [url = https://example.com/browser/browser/base/content/test/social/social_chat.html]
19:00:37     INFO -  --DOMWINDOW == 65 (1F875C48) [serial = 2473] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 64 (1F8759E0) [serial = 2471] [outer = 00000000] [url = https://example.com/browser/browser/base/content/test/social/social_chat.html?id=2]
19:00:37     INFO -  --DOMWINDOW == 63 (1F878060) [serial = 2468] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 62 (1F874DD8) [serial = 2466] [outer = 00000000] [url = https://example.com/browser/browser/base/content/test/social/social_chat.html?id=1]
19:00:37     INFO -  --DOMWINDOW == 61 (1F877928) [serial = 2463] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 60 (1F876D20) [serial = 2461] [outer = 00000000] [url = https://example.com/browser/browser/base/content/test/social/social_chat.html?id=2]
19:00:37     INFO -  --DOMWINDOW == 59 (1FA709E0) [serial = 2458] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 58 (1FA6EF68) [serial = 2456] [outer = 00000000] [url = https://example.com/browser/browser/base/content/test/social/social_chat.html?id=1]
19:00:37     INFO -  --DOMWINDOW == 57 (1FA71F88) [serial = 2453] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 56 (1FA75E18) [serial = 2451] [outer = 00000000] [url = https://example.com/browser/browser/base/content/test/social/social_chat.html?id=2]
19:00:37     INFO -  --DOMWINDOW == 55 (1FA75210) [serial = 2448] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 54 (1FA73A00) [serial = 2446] [outer = 00000000] [url = https://example.com/browser/browser/base/content/test/social/social_chat.html?id=1]
19:00:37     INFO -  --DOMWINDOW == 53 (1612D970) [serial = 2443] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 52 (1612D708) [serial = 2441] [outer = 00000000] [url = https://example.com/browser/browser/base/content/test/social/social_chat.html]
19:00:37     INFO -  --DOMWINDOW == 51 (16130258) [serial = 2438] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 50 (1612EF18) [serial = 2436] [outer = 00000000] [url = https://example.com/browser/browser/base/content/test/social/social_chat.html?id=1]
19:00:37     INFO -  --DOMWINDOW == 49 (1612D4A0) [serial = 2433] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 48 (1612EA48) [serial = 2431] [outer = 00000000] [url = https://example.com/browser/browser/base/content/test/social/social_chat.html]
19:00:37     INFO -  --DOMWINDOW == 47 (13485568) [serial = 2428] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 46 (13486640) [serial = 2426] [outer = 00000000] [url = https://example.com/browser/browser/base/content/test/social/social_chat.html]
19:00:37     INFO -  --DOMWINDOW == 45 (0E523998) [serial = 2419] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 44 (1FDBC490) [serial = 2392] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 43 (1FA743A0) [serial = 2371] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 42 (0E523730) [serial = 2417] [outer = 00000000] [url = about:blank]
19:00:37     INFO -  --DOMWINDOW == 41 (1FA6FB70) [serial = 2233] [outer = 00000000] [url = https://example.com/browser/browser/base/content/test/social/social_sidebar.html]
19:00:37     INFO -  --DOMWINDOW == 40 (1FDBA548) [serial = 2483] [outer = 00000000] [url = about:neterror?e=netOffline&u=https%3A//example.com/browser/browser/base/content/test/social/social_sidebar.html&c=UTF-8&d=Firefox%20is%20currently%20in%20offline%20mode%20and%20can%27t%20browse%20the%20Web.]
19:00:37     INFO -  --DOMWINDOW == 39 (13486D78) [serial = 2423] [outer = 00000000] [url = https://example.com/browser/browser/base/content/test/social/social_sidebar.html]
19:00:37     INFO -  NOTE: child process received `Goodbye', closing down
19:00:37     INFO -  WARNING: pipe error: 109: file e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 286
19:00:37     INFO -  NPP_Destroy
19:00:37     INFO -  NPP_Destroy
19:00:37     INFO -  NPP_Destroy
19:00:37     INFO -  NPP_Destroy
19:00:37     INFO -  NPP_Destroy
19:00:37     INFO -  NPP_Destroy
19:00:37     INFO -  NPP_Destroy
19:00:37     INFO -  NPP_Destroy
19:00:37     INFO -  NPP_Destroy
19:00:37     INFO -  nsStringStats
19:00:37     INFO -   => mAllocCount:            147
19:00:37     INFO -   => mReallocCount:            1
19:00:37     INFO -   => mFreeCount:             147
19:00:37     INFO -   => mShareCount:            157
19:00:37     INFO -   => mAdoptCount:              0
19:00:37     INFO -   => mAdoptFreeCount:          0
19:00:37     INFO -  ++DOMWINDOW == 40 (1FDBAA18) [serial = 2492] [outer = 1FDBA7B0]
19:00:37     INFO -  ++DOMWINDOW == 41 (1FDB6450) [serial = 2493] [outer = 1FDBA7B0]
19:00:37     INFO -  TEST-KNOWN-FAIL | chrome://mochitests/content/browser/browser/base/content/test/social/browser_social_errorPage.js | Bug 833207 - should be no callback when error page loads.
19:00:37     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/social/browser_social_errorPage.js | is on social error page
19:00:38     INFO -  WARNING: shutting down early because of crash!: file e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/dom/plugins/ipc/PluginModuleChild.cpp, line 703
19:00:38     INFO -  WARNING: plugin process _exit()ing: file e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/dom/plugins/ipc/PluginModuleChild.cpp, line 668
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  NPP_Destroy
19:00:38     INFO -  nsStringStats
19:00:38     INFO -   => mAllocCount:            367
19:00:38     INFO -   => mReallocCount:            1
19:00:38     INFO -   => mFreeCount:             280  --  LEAKED 87 !!!
19:00:38     INFO -   => mShareCount:            165
19:00:38     INFO -   => mAdoptCount:              0
19:00:38     INFO -   => mAdoptFreeCount:          0
19:00:38  WARNING -  TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/social/browser_social_errorPage.js | Exited with code -1073741819 during test run
19:00:38     INFO -  INFO | automation.py | Application ran for: 0:07:04.227000
19:00:38     INFO -  INFO | zombiecheck | Reading PID log: c:\users\cltbld~1.t-w\appdata\local\temp\tmpmz7x1wpidlog
19:00:38     INFO -  ==> process 3844 launched child process 1628
19:00:38     INFO -  ==> process 3844 launched child process 3128
19:00:38     INFO -  ==> process 3844 launched child process 1600
19:00:38     INFO -  ==> process 3844 launched child process 3344
19:00:38     INFO -  ==> process 3844 launched child process 3028
19:00:38     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 1628
19:00:38     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 3128
19:00:38     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 1600
19:00:38     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 3344
19:00:38     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 3028
19:00:44  WARNING -  PROCESS-CRASH | chrome://mochitests/content/browser/browser/base/content/test/social/browser_social_errorPage.js | application crashed [@ mozilla::net::SpdySession3::ReadTimeoutTick(unsigned int)]
19:00:44     INFO -  Crash dump filename: c:\users\cltbld~1.t-w\appdata\local\temp\tmptnbc9g\minidumps\0d5f3790-84b5-4eb3-a995-c8b9612fd9a0.dmp
19:00:44     INFO -  Operating system: Windows NT
19:00:44     INFO -                    6.2.9200
19:00:44     INFO -  CPU: x86
19:00:44     INFO -       GenuineIntel family 6 model 30 stepping 5
19:00:44     INFO -       8 CPUs
19:00:44     INFO -  Crash reason:  EXCEPTION_ACCESS_VIOLATION_READ
19:00:44     INFO -  Crash address: 0xffffffffdddddddd
19:00:44     INFO -  Thread 6 (crashed)
19:00:44     INFO -   0  xul.dll!mozilla::net::SpdySession3::ReadTimeoutTick(unsigned int) [SpdySession3.cpp:9786c829bf3c : 279 + 0x6]
19:00:44     INFO -      eip = 0x72820649   esp = 0x03a6f488   ebp = 0x03a6f4c8   ebx = 0x266b0ae0
19:00:44     INFO -      esi = 0x000e9b0f   edi = 0x00000000   eax = 0xdddddddd   ecx = 0x007e6148
19:00:44     INFO -      edx = 0x00000000   efl = 0x00010246
19:00:44     INFO -      Found by: given as instruction pointer in context
19:00:44     INFO -   1  xul.dll!nsHttpConnection::ReadTimeoutTick(unsigned int) [nsHttpConnection.cpp:9786c829bf3c : 933 + 0xe]
19:00:44     INFO -      eip = 0x7283caee   esp = 0x03a6f4d0   ebp = 0x03a6f4e0
19:00:44     INFO -      Found by: call frame info
19:00:44     INFO -   2  xul.dll!nsHttpConnectionMgr::TimeoutTickCB(nsACString_internal const &,nsAutoPtr<nsHttpConnectionMgr::nsConnectionEntry> &,void *) [nsHttpConnectionMgr.cpp:9786c829bf3c : 2397 + 0x17]
19:00:44     INFO -      eip = 0x72845c4a   esp = 0x03a6f4e8   ebp = 0x03a6f52c
19:00:44     INFO -      Found by: call frame info
19:00:44     INFO -   3  xul.dll!nsBaseHashtable<nsCStringHashKey,nsAutoPtr<nsHttpConnectionMgr::nsConnectionEntry>,nsHttpConnectionMgr::nsConnectionEntry *>::s_EnumStub(PLDHashTable *,PLDHashEntryHdr *,unsigned int,void *) [nsBaseHashtable.h:9786c829bf3c : 454 + 0x12]
19:00:44     INFO -      eip = 0x72809472   esp = 0x03a6f534   ebp = 0x03a6f540
19:00:44     INFO -      Found by: call frame info
19:00:44     INFO -   4  xul.dll!PL_DHashTableEnumerate [pldhash.cpp:9786c829bf3c : 714 + 0xa]
19:00:44     INFO -      eip = 0x73910f0e   esp = 0x03a6f548   ebp = 0x03a6f574
19:00:44     INFO -      Found by: call frame info
19:00:44     INFO -   5  xul.dll!nsBaseHashtable<nsCStringHashKey,nsAutoPtr<nsHttpConnectionMgr::nsConnectionEntry>,nsHttpConnectionMgr::nsConnectionEntry *>::Enumerate(PLDHashOperator (*)(nsACString_internal const &,nsAutoPtr<nsHttpConnectionMgr::nsConnectionEntry> &,void *),void *) [nsBaseHashtable.h:9786c829bf3c : 223 + 0xe]
19:00:44     INFO -      eip = 0x7280979f   esp = 0x03a6f57c   ebp = 0x03a6f594
19:00:44     INFO -      Found by: call frame info
19:00:44     INFO -   6  xul.dll!nsHttpConnectionMgr::TimeoutTick() [nsHttpConnectionMgr.cpp:9786c829bf3c : 2381 + 0xd]
19:00:44     INFO -      eip = 0x7284603b   esp = 0x03a6f59c   ebp = 0x03a6f5a8
19:00:44     INFO -      Found by: call frame info
19:00:44     INFO -   7  xul.dll!nsHttpConnectionMgr::Observe(nsISupports *,char const *,wchar_t const *) [nsHttpConnectionMgr.cpp:9786c829bf3c : 254 + 0x4]
19:00:44     INFO -      eip = 0x7284613f   esp = 0x03a6f5b0   ebp = 0x03a6f5b4
19:00:44     INFO -      Found by: call frame info
19:00:44     INFO -   8  xul.dll!nsTimerImpl::Fire() [nsTimerImpl.cpp:9786c829bf3c : 551 + 0xd]
19:00:44     INFO -      eip = 0x7395898e   esp = 0x03a6f5bc   ebp = 0x03a6f644
19:00:44     INFO -      Found by: call frame info
19:00:44     INFO -   9  xul.dll!nsTimerEvent::Run() [nsTimerImpl.cpp:9786c829bf3c : 627 + 0xe]
19:00:44     INFO -      eip = 0x73958af9   esp = 0x03a6f64c   ebp = 0x03a6f674
19:00:44     INFO -      Found by: call frame info
19:00:44     INFO -  10  xul.dll!nsThread::ProcessNextEvent(bool,bool *) [nsThread.cpp:9786c829bf3c : 626 + 0xd]
19:00:44     INFO -      eip = 0x739554c4   esp = 0x03a6f67c   ebp = 0x03a6f6cc
19:00:44     INFO -      Found by: call frame info
}
in addition to whatever the bug is here, this report is a sign that the mochitest run is contacting an external server somewhere.. I say that because this is on a connection that negotiated spdy and, afaik, there aren't any spdy servers in our local mochitest servers. (we have xpcshell coverage).
can we setup a spdy server during the test run?  maybe set some prefs to 404 it or point to one we instantiate on localhost?  we do this for http, ssltunnel, websockets already.
(In reply to Joel Maher (:jmaher) from comment #2)
> can we setup a spdy server during the test run?  maybe set some prefs to 404
> it or point to one we instantiate on localhost?  we do this for http,
> ssltunnel, websockets already.

I was unclear - we have spdy test coverage in xpcshell - so while we can do what you suggest it isn't needed right now.

my unclear point was that this trace indicates this mochitest run is contacting a server off our network, which is something I believe we prohibit by policy. It doesn't excuse the bug of course.
(In reply to Joel Maher (:jmaher) from comment #4)
> I see references to wikipedia:
> https://mxr.mozilla.org/mozilla-central/source/browser/base/content/test/
> social/social_worker.js#99
> 
> https://mxr.mozilla.org/mozilla-central/source/browser/base/content/test/
> social/browser_social_toolbar.js#42
> 
> not sure if we do anything with profileURL in our social framework or these
> tests.

We can change those urls, however I could not find any test that is opening the user profile, so those urls wouldn't be hit.
Flags: needinfo?(mixedpuppy)
the external access is likely not in the social test.. the stack here is in a timeout callback so it probably happened considerably earlier in the mochitest.

tomorrow I can make a try build that will show where it happens when we run mochi.
Whiteboard: [spdy]
The test that contacts www.google.com in browser-chrome is browser_keywordSearch_postData.js

it does this by clicking on the search box, which triggers a speculative connect as a side effect. It doesn't ever submit the request to google. If we really wanted to ban off-host access in mochitest we could disable the speculative connect via a pref, but in this case the off-host access has nothing to do with whether the test passes or fails, so I'm inclined to just let it go. In any event - that's a different decision than this bug.

crash-stats actually shows this stack trace at the level of about .5 per day (across all channels) going back 4 weeks.. roughly none of it on nightly. So we can say it isn't related to the new spdy push code, its just a latent issue. Maybe brought to light by a fairly recent change in browser_keywordSearch_postData.js or maybe that's just a coincidence given that its only shown up on tbpl once.
No longer blocks: 790388
This looks like a use after free, but its deep into the ReadTimeoutTick function.. so the object wasn't freed before the method was invoked.

Each case appears to be right after a ping has been sent and resumerecv() has been called. This is even true across channels as code has been changed in this function.

resumerecv calls socketIn->AsyncWait().. AsyncWait, despite its name, can actually call onInputStreamReady synchronously when there is data already sitting in its buffer.. that's not what this function is expecting.. and that data could certainly be some other spdy frame like GOAWAY that causes the session to be deleted.. and when resumerecv comes back off the stack its suddenly in use after free.
I think the ->asyncwait()->oninputstreamready() only happens to a network socket on an error condition where the socket has previously recorded an error during a previous IO operation. The normal case is just that the poll flags get or'd with POLL_READ.. the latter is what both the spdy and normal http code is assuming.

frankly I don't think 10 year old legacy HTTP code can withstand the synchronous call the way it uses ResumeRecv() any better than the spdy code.. I think the only difference is that the IO operation causing the error in spdy is a write (the ping) that isn't caught synchronously (it expects the read path to handle the errors) where in normal http it is coincidentally handled straight up.

The expeditious thing to do here is just have nsSocketTransport stop making direct callbacks on this edge case. It can just PostEvent(MSG_INPUT_PENDING) and the right thing should happen.
I can reproduce this just after the resumerecv if I manually set nsSocketTransport2::mCondition to NS_ERROR_ABORT after writing the ping.. so its pretty clear some rare network event correlates to that.

even better, I can confirm the fix. We don't really have the infrastructure to get a server hangup to meet the race condition for an automated test.
Assignee: nobody → mcmanus
Attachment #762093 - Attachment is obsolete: true
Attachment #762095 - Flags: review?(hurley) → review+
https://hg.mozilla.org/mozilla-central/rev/0eca15d14c2a
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
You need to log in before you can comment on or make changes to this bug.