Closed Bug 653530 Opened 13 years ago Closed 13 years ago

HTTP Transaction multiply dispatched


(Core :: Networking: HTTP, defect)

Not set



Tracking Status
firefox5 - affected


(Reporter: mcmanus, Assigned: mcmanus)




(2 files, 1 obsolete file)

A transaction can be on the pendingQ after generating a nsHalfOpen. There is a bug that allows the same transaction to generate another nsHalfOpen without expiring the syn retry-timer. The same transaction will get dispatched to both of these connections when they complete, to ill effect.

As part of feature / bug 632948 non-dispatched transactions stay on this queue intentionally. The benefit of leaving it on the queue instead of binding it strictly to the nsHalfOpen is that an idle persistent connection might open up before the half-open is resolved and the transaction can be dispatched there instead with lower latency. That's awesome - and in my network traces I see it happen all the time.

Trivially fix this by making sure pendingQ transactions do not create new connection objects more than once per transaction.

right now this bug is on mozilla-central and aurora-5
Attached patch double dispatch bug fix v1 (obsolete) — Splinter Review
Assignee: nobody → mcmanus
Attachment #528923 - Flags: review?(honzab.moz)
(In reply to comment #0)

> As part of feature / bug 632948

correction: feature / bug 623948
I hit this single-thread deadlock on the socket transport thread (beware the stack is in an debug+optimized build).  Note that frame #4 is trying to acquire the lock held on the same thread by frame #35.

mcmanus suggests on IRC (#developers) that this is a symptom of this bug:

[2011-04-28 15:41:38] <dbaron> so... how would I debug a currently running browser that doesn't seem to want to make any network connections?
[2011-04-28 15:52:18] <mcmanus> dbaron - I'm just about to break off for the day, but maybe bug 653530 would be of interest (patch attached).. 
[2011-04-28 15:53:10] <dbaron> mcmanus, how would I detect being in a bad state there?
[2011-04-28 15:54:25] <mcmanus> its ugly.. find the "ent" in nshttpconnectionmgr::getConnection() and see how long the "halfopen" array is.. it should basically be 0 except during an active tcp handshake
[2011-04-28 15:55:56] <mcmanus> dbaron - moz-central?
[2011-04-28 15:56:05] <dbaron> mcmanus, yes... but getconnection isn't getting called
[2011-04-28 15:56:38] <dbaron> mcmanus, I've learned so far that I am getting to nsHttpChannel::AsyncOpen and I'm not getting to nsHttpConnectionMgr::GetConnection
[2011-04-28 15:57:11] <mcmanus> do you get nsHttpConnectionMgr::ProcessNewTransaction?
[2011-04-28 15:57:50] <mcmanus> I'm guessing no. hmm.
[2011-04-28 16:01:58] <dbaron> I do hit nsHttpConnectionMgr::AddTransaction
[2011-04-28 16:03:38] <dbaron> but that doesn't get through to ProcessNewTransaction
[2011-04-28 16:12:45] <dbaron> oh, PSM looks like it did a same-thread deadlock on thread 17
[2011-04-28 16:25:09] <mcmanus> dbaron - you've got processpendingqforentry on that stack, before it recurses weirdly. and that is the root function of the bug I mentioned earlier.. it is probably related.
[2011-04-28 16:26:37] <mcmanus> dbaron - yeah, its almost certainly that. I hadn't seen it manifest itself like that. Anyhow hopefully honzab will review the patch tomorrow and it can get checked in.
[2011-04-28 16:28:53] <dbaron> mcmanus, so not related to the fact that I have ocsp.require set?
[2011-04-28 16:29:20] <mcmanus> no.. just a bug in http connection handling.
[2011-04-28 16:30:05] <mcmanus> the bug requires a pretty quick close/open connection cycle - so maybe the ocsp code is the driver there. . or maybe not.
[2011-04-28 16:30:27] <dbaron> mcmanus, anyway, I'll stick the stack on the bug
[2011-04-28 16:30:40] <mcmanus> thanks
Comment on attachment 528923 [details] [diff] [review]
double dispatch bug fix v1

That stack is related but a little different.
Attachment #528923 - Flags: review?(honzab.moz) → review-
Attachment #528923 - Flags: review-
Comment on attachment 528923 [details] [diff] [review]
double dispatch bug fix v1

The stack is confirmed to be the same issue, the duplicate dispatch results in the same sockettransport object being present in 2 different httpconnection objects on that stack, which is the bug and deadlock.
Attachment #528923 - Flags: review?(honzab.moz)
Comment on attachment 528923 [details] [diff] [review]
double dispatch bug fix v1

>+++ b/netwerk/protocol/http/nsHttpConnectionMgr.cpp
>@@ -556,17 +556,29 @@ nsHttpConnectionMgr::ProcessPendingQForE
>+            // if the pending transaction is already half open we want to
>+            // consider scheduling it on a recently released idle pconn,
>+            // but we do not want to start another open cycle for it

Rephrase like this please (also first letter capital and period at the end):

"When this transaction has already established a half-open connection, we want to prevent any duplicate half-open connections be establish and bind to this transaction.  Allow only use of an idle persistent connection (if found) for transactions referred by a half-open connection.  See bug 653530 for more details."

In general, please avoid usage of abbrevs (e.g. as "pconn") everywhere in the code.  It is confusing for those that read the code for the first time.  We have already confusion with "trans" being both transaction and transport, so no need to introduce more.

>+++ b/netwerk/protocol/http/nsHttpConnectionMgr.h
>     void     GetConnection(nsConnectionEntry *, nsHttpTransaction *,
>-                           nsHttpConnection **);
>+                           PRBool, nsHttpConnection **);

Maybe use 'bool' instead of 'PRBool' ?  Also in the other code.

Good catch.

r=honzab with the updates.
Attachment #528923 - Flags: review?(honzab.moz) → review+
update based on review.. carry forward r=honzab
Attachment #528923 - Attachment is obsolete: true
Attachment #529172 - Flags: review+
Closed: 13 years ago
Resolution: --- → FIXED
Attachment #529172 - Flags: approval-mozilla-aurora?
Comment on attachment 529172 [details] [diff] [review]
double dispatch bug fix v2

>diff --git a/netwerk/protocol/http/nsHttpConnectionMgr.cpp b/netwerk/protocol/http/nsHttpConnectionMgr.cpp
>--- a/netwerk/protocol/http/nsHttpConnectionMgr.cpp
>+++ b/netwerk/protocol/http/nsHttpConnectionMgr.cpp
>@@ -556,17 +556,31 @@ nsHttpConnectionMgr::ProcessPendingQForE
>+            for (PRInt32 j = 0; j < ((PRInt32) ent->mHalfOpens.Length()); j++) {

What's wrong with PRUint32 j? Please fix.
> Comment on attachment 529172 [details] [diff] [review] [review]
> What's wrong with PRUint32 j? Please fix.

This seems to be on many other places.  Rather fix this all in a small side bug.
Comment on attachment 529172 [details] [diff] [review]
double dispatch bug fix v2

Approving this for aurora, but sayre is going to find patrick to talk about whether we should just kick syn retry out of this release, since this is the second issue we've discovered
Attachment #529172 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I'm going to back out of aurora-5 and not land this there. Though it will stay active on mozilla-central.
Comment on attachment 529172 [details] [diff] [review]
double dispatch bug fix v2

removing approval since this isn't intended to land.
Attachment #529172 - Flags: approval-mozilla-aurora+ → approval-mozilla-aurora-
You need to log in before you can comment on or make changes to this bug.