Last Comment Bug 653530 - HTTP Transaction multiply dispatched
: HTTP Transaction multiply dispatched
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: unspecified
: x86_64 Linux
: -- normal (vote)
: ---
Assigned To: Patrick McManus [:mcmanus]
:
: Patrick McManus [:mcmanus]
Mentors:
Depends on:
Blocks: 640003 651635
  Show dependency treegraph
 
Reported: 2011-04-28 12:16 PDT by Patrick McManus [:mcmanus]
Modified: 2012-03-29 02:06 PDT (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
affected


Attachments
double dispatch bug fix v1 (6.29 KB, patch)
2011-04-28 12:21 PDT, Patrick McManus [:mcmanus]
honzab.moz: review+
Details | Diff | Splinter Review
stack showing single-thread deadlock on socket transport thread (10.83 KB, text/plain; charset=UTF-8)
2011-04-28 16:35 PDT, David Baron :dbaron: ⌚️UTC-10
no flags Details
double dispatch bug fix v2 (6.45 KB, patch)
2011-04-29 12:27 PDT, Patrick McManus [:mcmanus]
mcmanus: review+
asa: approval‑mozilla‑aurora-
Details | Diff | Splinter Review

Description Patrick McManus [:mcmanus] 2011-04-28 12:16:58 PDT
A transaction can be on the pendingQ after generating a nsHalfOpen. There is a bug that allows the same transaction to generate another nsHalfOpen without expiring the syn retry-timer. The same transaction will get dispatched to both of these connections when they complete, to ill effect.

As part of feature / bug 632948 non-dispatched transactions stay on this queue intentionally. The benefit of leaving it on the queue instead of binding it strictly to the nsHalfOpen is that an idle persistent connection might open up before the half-open is resolved and the transaction can be dispatched there instead with lower latency. That's awesome - and in my network traces I see it happen all the time.

Trivially fix this by making sure pendingQ transactions do not create new connection objects more than once per transaction.

right now this bug is on mozilla-central and aurora-5
Comment 1 Patrick McManus [:mcmanus] 2011-04-28 12:21:55 PDT
Created attachment 528923 [details] [diff] [review]
double dispatch bug fix v1
Comment 2 Patrick McManus [:mcmanus] 2011-04-28 15:43:05 PDT
(In reply to comment #0)

> As part of feature / bug 632948

correction: feature / bug 623948
Comment 3 David Baron :dbaron: ⌚️UTC-10 2011-04-28 16:35:23 PDT
Created attachment 528991 [details]
stack showing single-thread deadlock on socket transport thread

I hit this single-thread deadlock on the socket transport thread (beware the stack is in an debug+optimized build).  Note that frame #4 is trying to acquire the lock held on the same thread by frame #35.

mcmanus suggests on IRC (#developers) that this is a symptom of this bug:

[2011-04-28 15:41:38] <dbaron> so... how would I debug a currently running browser that doesn't seem to want to make any network connections?
...
[2011-04-28 15:52:18] <mcmanus> dbaron - I'm just about to break off for the day, but maybe bug 653530 would be of interest (patch attached).. 
[2011-04-28 15:53:10] <dbaron> mcmanus, how would I detect being in a bad state there?
[2011-04-28 15:54:25] <mcmanus> its ugly.. find the "ent" in nshttpconnectionmgr::getConnection() and see how long the "halfopen" array is.. it should basically be 0 except during an active tcp handshake
[2011-04-28 15:55:56] <mcmanus> dbaron - moz-central?
[2011-04-28 15:56:05] <dbaron> mcmanus, yes... but getconnection isn't getting called
[2011-04-28 15:56:38] <dbaron> mcmanus, I've learned so far that I am getting to nsHttpChannel::AsyncOpen and I'm not getting to nsHttpConnectionMgr::GetConnection
[2011-04-28 15:57:11] <mcmanus> do you get nsHttpConnectionMgr::ProcessNewTransaction?
[2011-04-28 15:57:50] <mcmanus> I'm guessing no. hmm.
...
[2011-04-28 16:01:58] <dbaron> I do hit nsHttpConnectionMgr::AddTransaction
[2011-04-28 16:03:38] <dbaron> but that doesn't get through to ProcessNewTransaction
[2011-04-28 16:12:45] <dbaron> oh, PSM looks like it did a same-thread deadlock on thread 17
...
[2011-04-28 16:25:09] <mcmanus> dbaron - you've got processpendingqforentry on that stack, before it recurses weirdly. and that is the root function of the bug I mentioned earlier.. it is probably related.
[2011-04-28 16:26:37] <mcmanus> dbaron - yeah, its almost certainly that. I hadn't seen it manifest itself like that. Anyhow hopefully honzab will review the patch tomorrow and it can get checked in.
[2011-04-28 16:28:53] <dbaron> mcmanus, so not related to the fact that I have ocsp.require set?
[2011-04-28 16:29:20] <mcmanus> no.. just a bug in http connection handling.
[2011-04-28 16:30:05] <mcmanus> the bug requires a pretty quick close/open connection cycle - so maybe the ocsp code is the driver there. . or maybe not.
[2011-04-28 16:30:27] <dbaron> mcmanus, anyway, I'll stick the stack on the bug
[2011-04-28 16:30:40] <mcmanus> thanks
Comment 4 Patrick McManus [:mcmanus] 2011-04-28 18:20:53 PDT
Comment on attachment 528923 [details] [diff] [review]
double dispatch bug fix v1

That stack is related but a little different.
Comment 5 Patrick McManus [:mcmanus] 2011-04-28 18:44:43 PDT
Comment on attachment 528923 [details] [diff] [review]
double dispatch bug fix v1

The stack is confirmed to be the same issue, the duplicate dispatch results in the same sockettransport object being present in 2 different httpconnection objects on that stack, which is the bug and deadlock.
Comment 6 Honza Bambas (:mayhemer) 2011-04-29 11:33:06 PDT
Comment on attachment 528923 [details] [diff] [review]
double dispatch bug fix v1

>+++ b/netwerk/protocol/http/nsHttpConnectionMgr.cpp
>@@ -556,17 +556,29 @@ nsHttpConnectionMgr::ProcessPendingQForE
>+            // if the pending transaction is already half open we want to
>+            // consider scheduling it on a recently released idle pconn,
>+            // but we do not want to start another open cycle for it

Rephrase like this please (also first letter capital and period at the end):

"When this transaction has already established a half-open connection, we want to prevent any duplicate half-open connections be establish and bind to this transaction.  Allow only use of an idle persistent connection (if found) for transactions referred by a half-open connection.  See bug 653530 for more details."

In general, please avoid usage of abbrevs (e.g. as "pconn") everywhere in the code.  It is confusing for those that read the code for the first time.  We have already confusion with "trans" being both transaction and transport, so no need to introduce more.

>+++ b/netwerk/protocol/http/nsHttpConnectionMgr.h
>     void     GetConnection(nsConnectionEntry *, nsHttpTransaction *,
>-                           nsHttpConnection **);
>+                           PRBool, nsHttpConnection **);

Maybe use 'bool' instead of 'PRBool' ?  Also in the other code.


Good catch.

r=honzab with the updates.
Comment 7 Patrick McManus [:mcmanus] 2011-04-29 12:27:54 PDT
Created attachment 529172 [details] [diff] [review]
double dispatch bug fix v2

update based on review.. carry forward r=honzab
Comment 8 Patrick McManus [:mcmanus] 2011-04-29 14:13:25 PDT
http://hg.mozilla.org/mozilla-central/rev/9b12c34434f4
Comment 9 :Ms2ger (⌚ UTC+1/+2) 2011-04-30 07:58:57 PDT
Comment on attachment 529172 [details] [diff] [review]
double dispatch bug fix v2

>diff --git a/netwerk/protocol/http/nsHttpConnectionMgr.cpp b/netwerk/protocol/http/nsHttpConnectionMgr.cpp
>--- a/netwerk/protocol/http/nsHttpConnectionMgr.cpp
>+++ b/netwerk/protocol/http/nsHttpConnectionMgr.cpp
>@@ -556,17 +556,31 @@ nsHttpConnectionMgr::ProcessPendingQForE
>+            for (PRInt32 j = 0; j < ((PRInt32) ent->mHalfOpens.Length()); j++) {

What's wrong with PRUint32 j? Please fix.
Comment 10 Honza Bambas (:mayhemer) 2011-05-02 05:11:29 PDT
> Comment on attachment 529172 [details] [diff] [review] [review]
> What's wrong with PRUint32 j? Please fix.

This seems to be on many other places.  Rather fix this all in a small side bug.
Comment 11 Johnathan Nightingale [:johnath] 2011-05-04 12:13:16 PDT
Comment on attachment 529172 [details] [diff] [review]
double dispatch bug fix v2

Approving this for aurora, but sayre is going to find patrick to talk about whether we should just kick syn retry out of this release, since this is the second issue we've discovered
Comment 12 Patrick McManus [:mcmanus] 2011-05-05 19:08:13 PDT
I'm going to back out of aurora-5 and not land this there. Though it will stay active on mozilla-central.
Comment 13 Asa Dotzler [:asa] 2011-05-11 11:36:21 PDT
Comment on attachment 529172 [details] [diff] [review]
double dispatch bug fix v2

removing approval since this isn't intended to land.

Note You need to log in before you can comment on or make changes to this bug.