Closed Bug 829120 Opened 11 years ago Closed 11 years ago

ABORT: AddStream duplicate transaction pointer: '!mStreamTransactionHash.Get(aHttpTransaction)'

Categories

(Core :: Networking: HTTP, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla22
Tracking Status
firefox20 + fixed
firefox21 + fixed
firefox22 + fixed

People

(Reporter: dbaron, Assigned: mcmanus)

References

Details

(Keywords: crash, Whiteboard: [spdy])

Attachments

(2 files)

I run debug builds as my regular browser.  The most common crash I've hit over the past few weeks is this abort:

###!!! ABORT: AddStream duplicate transaction pointer: '!mStreamTransactionHash.Get(aHttpTransaction)', file /home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/SpdySession2.cpp, line 305

I think I first saw it on Christmas day (I have a note that I saw it then); I just saw it again

I think I've hit it more than once doing a session restore on a large session, but I've also hit it other times.

My current tree (in which I just saw it) is based on https://hg.mozilla.org/mozilla-central/rev/b801161d887c0c7526038b671dc124eb9170b6e4 plus local patches.

The stack was:
###!!! ABORT: AddStream duplicate transaction pointer: '!mStreamTransactionHash.Get(aHttpTransaction)', file /home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/SpdySession2.cpp, line 305
nsHttpConnection::AddTransaction(nsAHttpTransaction*, int) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/nsHttpConnection.cpp:449)
nsHttpConnection::Activate(nsAHttpTransaction*, unsigned int, int) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/nsHttpConnection.cpp:310)
nsHttpConnectionMgr::DispatchTransaction(nsHttpConnectionMgr::nsConnectionEntry*, nsHttpTransaction*, nsHttpConnection*) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/nsHttpConnectionMgr.cpp:1600)
nsHttpConnectionMgr::ProcessSpdyPendingQ(nsHttpConnectionMgr::nsConnectionEntry*) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/nsHttpConnectionMgr.cpp:1875)
nsHttpConnectionMgr::ProcessSpdyPendingQCB(nsACString_internal const&, nsAutoPtr<nsHttpConnectionMgr::nsConnectionEntry>&, void*) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/nsHttpConnectionMgr.cpp:1895)
PL_DHashTableEnumerate (/home/dbaron/builds/ssd/mozilla-central/obj/firefox-debugopt/xpcom/build/pldhash.cpp:718)
nsBaseHashtable<nsCStringHashKey, nsAutoPtr<nsHttpConnectionMgr::nsConnectionEntry>, nsHttpConnectionMgr::nsConnectionEntry*>::Enumerate(PLDHashOperator (*)(nsACString_internal const&, nsAutoPtr<nsHttpConnectionMgr::nsConnectionEntry>&, void*), void*) (/home/dbaron/builds/ssd/mozilla-central/obj/firefox-debugopt/netwerk/protocol/http/../../../dist/include/nsBaseHashtable.h:224)
nsHttpConnection::OnSocketWritable() (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/nsHttpConnection.cpp:1248)
nsHttpConnection::OnOutputStreamReady(nsIAsyncOutputStream*) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/nsHttpConnection.cpp:1541)
nsHttpConnection::Activate(nsAHttpTransaction*, unsigned int, int) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/nsHttpConnection.cpp:350)
nsHttpConnectionMgr::DispatchAbstractTransaction(nsHttpConnectionMgr::nsConnectionEntry*, nsAHttpTransaction*, unsigned int, nsHttpConnection*, int) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/nsHttpConnectionMgr.cpp:1676)
nsHttpConnectionMgr::DispatchTransaction(nsHttpConnectionMgr::nsConnectionEntry*, nsHttpTransaction*, nsHttpConnection*) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/nsHttpConnectionMgr.cpp:1619)
nsHttpConnectionMgr::TryDispatchTransaction(nsHttpConnectionMgr::nsConnectionEntry*, bool, nsHttpTransaction*) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/nsHttpConnectionMgr.cpp:1537)
nsHttpConnectionMgr::ProcessPendingQForEntry(nsHttpConnectionMgr::nsConnectionEntry*, bool) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/nsHttpConnectionMgr.cpp:950)
nsHttpConnectionMgr::ProcessAllTransactionsCB(nsACString_internal const&, nsAutoPtr<nsHttpConnectionMgr::nsConnectionEntry>&, void*) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/nsHttpConnectionMgr.cpp:750)
PL_DHashTableEnumerate (/home/dbaron/builds/ssd/mozilla-central/obj/firefox-debugopt/xpcom/build/pldhash.cpp:718)
nsBaseHashtable<nsCStringHashKey, nsAutoPtr<nsHttpConnectionMgr::nsConnectionEntry>, nsHttpConnectionMgr::nsConnectionEntry*>::Enumerate(PLDHashOperator (*)(nsACString_internal const&, nsAutoPtr<nsHttpConnectionMgr::nsConnectionEntry>&, void*), void*) (/home/dbaron/builds/ssd/mozilla-central/obj/firefox-debugopt/netwerk/protocol/http/../../../dist/include/nsBaseHashtable.h:224)
nsHttpConnectionMgr::nsConnEvent::Run() (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/protocol/http/nsHttpConnectionMgr.h:549)
nsThread::ProcessNextEvent(bool, bool*) (/home/dbaron/builds/ssd/mozilla-central/mozilla/xpcom/threads/nsThread.cpp:627)
NS_ProcessNextEvent_P(nsIThread*, bool) (/home/dbaron/builds/ssd/mozilla-central/obj/firefox-debugopt/xpcom/build/nsThreadUtils.cpp:238)
nsSocketTransportService::Run() (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/base/src/nsSocketTransportService2.cpp:649)
nsThread::ProcessNextEvent(bool, bool*) (/home/dbaron/builds/ssd/mozilla-central/mozilla/xpcom/threads/nsThread.cpp:627)
NS_ProcessNextEvent_P(nsIThread*, bool) (/home/dbaron/builds/ssd/mozilla-central/obj/firefox-debugopt/xpcom/build/nsThreadUtils.cpp:238)
nsThread::ThreadFunc(void*) (/home/dbaron/builds/ssd/mozilla-central/mozilla/xpcom/threads/nsThread.cpp:264)
_pt_root (/home/dbaron/builds/ssd/mozilla-central/mozilla/nsprpub/pr/src/pthreads/ptthread.c:159)
start_thread (/build/buildd/eglibc-2.15/nptl/pthread_create.c:308)
clone (/build/buildd/eglibc-2.15/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:114)
Blocks: 819044
Hit it again today, while loading multiple tabs from https://www.united.com/
And again today, on bugzilla and/or hg.mozilla.org.

Is there anything I should look at in the debugger when I next hit this (assuming I notice that a background thread has crashed and networking is broken before the 300s sleep ends, rather than just assuming my network connection is being flaky), or any debugging code I could add to my tree, that would give you more info here?

Fatal asserts that fire frequently aren't acceptable; we actually want our debug builds to be usable.  (I admit the frequency of this one is borderline, though I think it would easily be the #1 topcrash if it affected non-DEBUG builds.)
Flags: needinfo?(mcmanus)
hi David, this one is on my radar - so I should have better answers soon. even if those answers are what to look for when you reproduce. I've just had a long run of afk distractions that is finally coming to an end, so when I dig out of the really serious stuff I can look at this.

I have an inkling it might be the root cause of why another patch was backed out.

Do you think a timeout might be involved in your reproductions? (e.g. 3 minutes since a particular tab was used... or maybe after a sleep?)
Flags: needinfo?(mcmanus)
...gathered rather aimlessly

(censored out my google cookie)
Bugzilla seems to have cut off the first few lines of that attachment (eek!), which were:

(gdb) inf thr
  Id   Target Id         Frame
David, thanks for this.

among other things it shows ::Activate() being called re-entrantly with the same this object. That's surprising. I'm not sure if its totally wrong yet, but its surprising for sure and something to probe at.

Also for my debug notes: at crash time there is 1 stream in the session transaciton hash but 0 in the streamID hash.. so the double add being complained about must be occurring in that smallish window (as they are cleaned up together).
frame 22 shows processpendingqforentry calling trydispatchtransaction() at line 1007. It is dispatching a transaction onto a newly formed connection that will speak ssl. The connection will turn out to be spdy, but we have only negotiated TCP at this point and haven't done NPN.

The return code from trydispatchtransaction() will be used to remove the entry from the pending queue (or not, if it does not dispatch it)... this is done on line 1015.

Unfortunately before returning to frame 22 the stack manages to iterate the pending transaction list again and re-dispatches the transaction  that frame 22 is working on.. thus this assert.

It does this because after spdy is negotiated via NPN that result is reported to the connection manager (reportspdyconnection()). We do that quickly so that any spdy coalescing maps can be established with the new SPDY connection. It then synchronously calls ProcessAllSpdyPendingQ() because its possible (likely even) that there are pending transactions that can now be dispatched thanks to the new connection and mapping.

Without the ABORT of a debug build, this second activate results in removal and closing of the transaction that is still on the stack in lower places.. including OnSocketWritable which is the crash in 787158. It's not exactly a lock for that bug, but it is consistent with it.

I think that can be fixed by simply posting the processallspdypendingq() logic back to the same thread rather than running it synchronously on the stack. That code that might be on the stack certainly does not expect to be re-entrant nor is there an urgency to bypass the normal socket thread work queue.

David, a million thank yous for hanging with this bug.
(In reply to Patrick McManus [:mcmanus] from comment #7)
> I think that can be fixed by simply posting the processallspdypendingq()
> logic back to the same thread rather than running it synchronously on the
> stack. That code that might be on the stack certainly does not expect to be
> re-entrant nor is there an urgency to bypass the normal socket thread work
> queue.

I agree, maybe after this is fixed do an audit of the whole http conn manager for such issues and add posts where appropriate?

> 
> David, a million thank yous for hanging with this bug.

Yes, I join also my thanks :)
Is this regression from bug 708415 or from bug 528288 or even something else?
(In reply to Honza Bambas (:mayhemer) from comment #9)
> Is this regression from bug 708415 or from bug 528288 or even something else?

seems to be here since 528288 (i.e. original spdy code)
Attached patch patch 0Splinter Review
honza if you could prioritize this review that would be great. Alex would like to try some reasonable things on beta that might be relevant to 787158 and I think this fits the bill.
Assignee: nobody → mcmanus
Status: NEW → ASSIGNED
Attachment #723774 - Flags: review?(honzab.moz)
Comment on attachment 723774 [details] [diff] [review]
patch 0

this may be of help with 787158 which is interested in beta fixes asap, so I'll nom it before it lands. Obviously any approval is contingent on it landing on m-c.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): original spdy code
User impact if declined: race condition leading to instability.
Testing completed (on m-c, etc.): just hand testing so far
Risk to taking this patch (and alternatives if risky): change is very minimal
String or UUID changes made by this patch: none
Attachment #723774 - Flags: approval-mozilla-beta?
Attachment #723774 - Flags: approval-mozilla-aurora?
Blocks: 787158
Comment on attachment 723774 [details] [diff] [review]
patch 0

Review of attachment 723774 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab based on comment 7.

Crossing fingers!
Attachment #723774 - Flags: review?(honzab.moz) → review+
Comment on attachment 723774 [details] [diff] [review]
patch 0

As mentioned in comment 12 - approval is contingent on landing on m-c but doing approvals now to speed up the landing process in the morning.
Attachment #723774 - Flags: approval-mozilla-beta?
Attachment #723774 - Flags: approval-mozilla-beta+
Attachment #723774 - Flags: approval-mozilla-aurora?
Attachment #723774 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/mozilla-central/rev/aa68f61aed74
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: