760608 - Preconnecting first connection with NullTransaction to non-SPDY HTTPS servers broken, hang for 30 seconds

Reporter

Description

•

12 years ago

Attached file example log — Details

This is here more for a record and potential backout/disable-only bug, since it will be fixed with bug 715905.

I found this out during work on allowing HTTPS testing in xpcshell.

Problem is that NullTransaction gets stuck on waiting for a response.  But we didn't send any request, so nothing can arrive from the server.

Transaction timeout logic kills this transaction in 30 seconds and the original pending transaction gets then restarted.

I believe I've hit this bug few times my self during daily usage wondering what's the hack is going on.


I vote for disabling of using NullTransaction for SSL pre-negotiation.


The log attached is an example of using a fresh profile with just following changes:
- disabled forgery and attack detection
- HTTP cache was emptied before start
- blank start page
- added an SSL exception for my local IIS testing server's certificate

I was connecting my local IIS server as https://127.0.0.1/
I experience the same hang with ssltunnel

To reproduce this bug opening cache entry must be slower then creation of the connection.  On some mobile devices in WiFi I believe this can happen quit often.

This is not reproducible when I'm connecting https://localhost (both IIS and ssltunnel) since the first address necko tries is ::1 and it fails.  This is changing the timing and no NullTransaction is then created.

Patrick McManus [:mcmanus]

Assignee

Comment 1

•

12 years ago

I think this is a pipeline integration bug. I'm not sure what you're seeing that makes it look ssl on http/1 only.. ?

here's the key part:

2012-06-01 17:39:11.771000 UTC - 5760[6ec1d48]: nsHttpConnection::EnsureNPNComplete cc26440 negotiated to ''
2012-06-01 17:39:11.771000 UTC - 5760[6ec1d48]: nsHttpConnection::EnsureNPNComplete setting complete to true
2012-06-01 17:39:11.771000 UTC - 5760[6ec1d48]:   writing transaction request stream

NPN and the SSL handshake go ok ('' - plain http). and then it calls readsegments..

2012-06-01 17:39:11.771000 UTC - 5760[6ec1d48]: nsHttpPipeline::ReadSegments [this=cb320b8 count=32768]
2012-06-01 17:39:11.771000 UTC - 5760[6ec1d48]:   ReadSegments returned [rv=0 read=0 sock-cond=0]

nulltransaction::readsegments is supposed to return BASE_STREAM_CLOSED and set isDone on itself - which will result in the transaction being completed.. but the pipeline layer has turned that into rv=NS_OK somehow.

I would guess this pipeline code is the issue
      // return EOF if send buffer is empty
        if (avail == 0) {
            *countRead = 0;
            return NS_OK;
        }

Honza Bambas (:mayhemer)

Reporter

Updated

•

12 years ago

Blocks: 466524

Honza Bambas (:mayhemer)

Reporter

Comment 2

•

12 years ago

Patrick, do you think you can provide a simple and stable fix for this?

Patrick McManus [:mcmanus]

Assignee

Comment 3

•

12 years ago

(In reply to Honza Bambas (:mayhemer) from comment #2)
> Patrick, do you think you can provide a simple and stable fix for this?

assuming my interpretation of the log in comment 1 is correct - definitely.

Honza Bambas (:mayhemer)

Reporter

Updated

•

12 years ago

status-firefox15: --- → affected

tracking-firefox15: --- → ?

Honza Bambas (:mayhemer)

Reporter

Comment 4

•

12 years ago

(In reply to Patrick McManus [:mcmanus] from comment #1)
> I'm not sure what you're seeing
> that makes it look ssl on http/1 only.. ?

Non-ssl connections are not armed with NullTransaction.

SPDY-sends out some data very soon, that moves the negotiation forward again, I think.  I've simply never seen case of a hang when there were SPDY and also NullTransaction involved on a connection.  It may also indicate the bug is actually in the pipelining code, since pipelines are not present when running SPDY.

Boris Zbarsky [:bzbarsky]

Updated

•

12 years ago

Blocks: 758899

Honza Bambas (:mayhemer)

Reporter

Comment 5

•

12 years ago

Attached patch v1 — Details — Splinter Review

This is a simple patch disabling SSL pre-negotiation with NullTransaction.

Assignee: nobody → honzab.moz

Status: NEW → ASSIGNED

Attachment #629799 - Flags: review?(mcmanus)

Patrick McManus [:mcmanus]

Assignee

Comment 6

•

12 years ago

(In reply to Honza Bambas (:mayhemer) from comment #5)
> Created attachment 629799 [details] [diff] [review]
> v1
> 
> This is a simple patch disabling SSL pre-negotiation with NullTransaction.

I would hope we don't need that - I'm testing a very simple fix now. SSL is a big part of this feature, especially when combined with the search box logic. And we're only on nightly (with it likely slipping into week 1 of aurora).. so lots of time to verify the

Honza Bambas (:mayhemer)

Reporter

Comment 7

•

12 years ago

We really should fix this quickly and safely before tomorrow.

Patrick McManus [:mcmanus]

Assignee

Comment 8

•

12 years ago

Attached patch patch 0 — Details — Splinter Review

Attachment #629832 - Flags: review?(honzab.moz)

Patrick McManus [:mcmanus]

Assignee

Comment 9

•

12 years ago

(In reply to Patrick McManus [:mcmanus] from comment #8)
> Created attachment 629832 [details] [diff] [review]
> patch 0

@try too.

took me a little while to repro - for whatever reason https://www.mozilla.com was writing to the connection soon after handshake (even though it doesn't speak spdy), triggering OnSocketReadable so the bug wasn't evident. I found another SSL server that didn't do that and could clearly see the problem by introducing a short sleep in nsHttpChannel immediately after the SpeculativeLookup call. (so the main thread slept for 300ms while neck did its thing - like a slow cache lookup).

I could then, finally, confirm the patch did what was intended and fixed the issue.

Patrick McManus [:mcmanus]

Assignee

Comment 10

•

12 years ago

(In reply to Honza Bambas (:mayhemer) from comment #4)

> 
> SPDY-sends out some data very soon, that moves the negotiation forward
> again, I think.

I didn't think of that. You're seeing the settings frame from the server that it normally sends first thing after the 3whs. (it's not a required element, but everybody does it.)

Honza Bambas (:mayhemer)

Reporter

Comment 11

•

12 years ago

Comment on attachment 629832 [details] [diff] [review]
patch 0

Review of attachment 629832 [details] [diff] [review]:
-----------------------------------------------------------------

I have to confirm this fixes this issue.

I'm leaving this open for now.  Since I'm not able to test this patch very well, I'm not much keen to land this.

::: netwerk/protocol/http/NullHttpTransaction.cpp
@@ +91,5 @@
>  
>  PRUint32
>  NullHttpTransaction::Available()
>  {
> +  mIsDone = true;

You are just putting code where it fits to you.  Call to Available() changing state of an object is quit strange.

::: netwerk/protocol/http/nsHttpPipeline.cpp
@@ +613,5 @@
>          // return EOF if send buffer is empty
>          if (avail == 0) {
>              *countRead = 0;
> +            if (IsDone())
> +                return NS_BASE_STREAM_CLOSED;

This change worries me since we have absolutely no tests for keep-alive and pipelined connections (my fault).  This is a core change hard to test what trouble it may cause.

Honza Bambas (:mayhemer)

Reporter

Comment 12

•

12 years ago

Comment on attachment 629832 [details] [diff] [review]
patch 0

Testing with pipelining on and aggressive using google image search as source of large parallelism this works.

Josh, I'd like to let you potentially veto based on whether we want to disable a never released feature of preconnecting ssl session while opening a cache entry (my patch) OR to allow potentially dangerous change at core level that may influence user even w/o pipelining on (Pat's patch).

I more tend to put this in now, however I hate only manual testing.

Attachment #629832 - Flags: superreview?(joshmoz)

Attachment #629832 - Flags: review?(honzab.moz)

Attachment #629832 - Flags: review+

Patrick McManus [:mcmanus]

Assignee

Comment 13

•

12 years ago

> 
> ::: netwerk/protocol/http/NullHttpTransaction.cpp
> @@ +91,5 @@
> >  
> >  PRUint32
> >  NullHttpTransaction::Available()
> >  {
> > +  mIsDone = true;
> 
> You are just putting code where it fits to you.  Call to Available()
> changing state of an object is quit strange.
> 

not true!

you might recall that until a couple weeks ago with the happy-eyeball change, null-transaction was *always* done. (there wasn't even a state variable). The change was made so that it only marked itself done based on I/O happening - so that the TCP retry stuff could determine whether or not the handshake was still in progress. This change is consistent with that - nsHttpPipeline uses nsAHttpTransaction::Available() as a I/O function. (in lieu of ReadSegments basically).

as a testing comment - the only user of nsAHttpTransaction::Available is nsHttpPipeline

Patrick McManus [:mcmanus]

Assignee

Comment 14

•

12 years ago

I'm going to look into another more conservative approach. Honza, you might be more comfortable with that if it bears fruit.

Josh Aas

Comment 15

•

12 years ago

Comment on attachment 629832 [details] [diff] [review]
patch 0

Review of attachment 629832 [details] [diff] [review]:
-----------------------------------------------------------------

This is a request for comment on a Necko-specific issue that I'm not qualified to assess. If you're OK with either option and you need a tie-breaker opinion I think that should be Patrick's.

Attachment #629832 - Flags: superreview?(joshmoz)

Patrick McManus [:mcmanus]

Assignee

Comment 16

•

12 years ago

Attached patch alternative 1 - v0 (obsolete) — Details — Splinter Review

Here is an alternative approach - do not use nsHttpPipeline for CLASS_SOLO transactions. NullHttpTransaction's are SOLO.

SOLO's are never reused until the connection is quiescent, so this doesn't remove any functionality. (specifically it is not a return to the old bursty behavior.. transactions eligible to pipeline are not SOLO just because they don't have a pipeline partner at the time of scheduling).. SOLO is for things like audio/video, and XHR, etc..

I liked the property of driving everything through nsHttpPipeline, but it was leading to hacks like the other version of this patch. So I'm ok with the split.

The code for this patch is still small. It also contains an addition of the Keep-Alive flag to the NullTransaction caps in case the in originating transaction was missing it. It doesn't make much sense without it and I saw that happen in at least one of my tests.

Attachment #629957 - Flags: review?(honzab.moz)

Patrick McManus [:mcmanus]

Assignee

Comment 17

•

12 years ago

Comment on attachment 629957 [details] [diff] [review]
alternative 1 - v0

I need to dig into a test fail before resetting r?

Attachment #629957 - Flags: review?(honzab.moz)

Honza Bambas (:mayhemer)

Reporter

Comment 18

•

12 years ago

(In reply to Patrick McManus [:mcmanus] from comment #13)
> not true!
> 
> you might recall that until a couple weeks ago with the happy-eyeball
> change, null-transaction was *always* done. (there wasn't even a state
> variable). The change was made so that it only marked itself done based on
> I/O happening - so that the TCP retry stuff could determine whether or not
> the handshake was still in progress. 

Yes, the IO happening means to call Read or WriteSegments to me.

> This change is consistent with that -
> nsHttpPipeline uses nsAHttpTransaction::Available() as a I/O function. (in
> lieu of ReadSegments basically).

Hmm.. calling Available doesn't mean to do actual IO.  Anyone can call this method to just check the state.

> 
> as a testing comment - the only user of nsAHttpTransaction::Available is
> nsHttpPipeline

Yes, for now.  When we introduce a change that may call Available sooner then actual Read/Write is done then we may regress the IPv6 fallback patch that introduced mIsDone.  That is simply not correct to me.

(In reply to Patrick McManus [:mcmanus] from comment #16)
> Created attachment 629957 [details] [diff] [review]
> alternative 1 - v0
> 
> Here is an alternative approach - do not use nsHttpPipeline for CLASS_SOLO
> transactions. NullHttpTransaction's are SOLO.

Thanks!  That seems to me to be the correct approach, I like it.

Honza Bambas (:mayhemer)

Reporter

Updated

•

12 years ago

Assignee: honzab.moz → mcmanus

Bill Gianopoulos [:WG9s]

Comment 19

•

12 years ago

(In reply to Patrick McManus [:mcmanus] from comment #16)
> Created attachment 629957 [details] [diff] [review]
> alternative 1 - v0

I can report that I have verified that this patch corrects the issue I reported in Bug 758899.

I have builds available at http://www.wg9s.com/mozilla/firefox that include this patch.

So far, Win32 and Linux 32-bit.  Android (Native and XUL) and Linux 64-bit are still in progress.

Patrick McManus [:mcmanus]

Assignee

Comment 20

•

12 years ago

Attached patch alternative 1 - v1 — Details — Splinter Review

it turns out we do need to disable the SSL handshake via NullHttpTransaction in the presence of a HTTP proxy because NullHttpTransaction does not know how to drive the CONNECT sequence. (all HTTP parsing is done in http transaction, and the CONNECT tunnel is a HTTP transaction of sorts).

on one hand this is unfortunate as SSL is end to end even in the presence of the proxy and a potentially high lantecny operation.. otoh explicit proxy users are a distinct minority so most folks will still get the optimizaton.

rather than teaching nullhttptransaction about how to do tunnels we can just expect that this is part of bug 715905.

@try again

Attachment #629957 - Attachment is obsolete: true

Attachment #630179 - Flags: review?(honzab.moz)

Bill Gianopoulos [:WG9s]

Comment 21

•

12 years ago

I will be doing my windows build over with this (since it is the only one from which I can test the issue from bug 758899).  I will report back with results.

Patrick McManus [:mcmanus]

Assignee

Comment 22

•

12 years ago

(In reply to Honza Bambas (:mayhemer) from comment #18)

> Hmm.. calling Available doesn't mean to do actual IO.  Anyone can call this
> method to just check the state.
> 

I don't really buy it - it's not an idl or some header with a inter-module scope. We can define it how we want to for the most value. And it seems like I/O to me in the same sense that PEEK is clearly I/O even though its non-destructive.

But I/O is probably the wrong thing anyhow.. mDone was really introduced to draw a line between the TCP/SSL handshake and the other transaction logic.. probly should have just introduced a settr at that point rather than implicitly from I/O functions.

likely all moot anyhow as I suspect we're going with the no-nshttptransaction-on-solo patch anyhow :)

Patrick McManus [:mcmanus]

Assignee

Updated

•

12 years ago

Depends on: 760912

Patrick McManus [:mcmanus]

Assignee

Comment 23

•

12 years ago

(In reply to Bill Gianopoulos [:WG9s] from comment #21)
> I will be doing my windows build over with this (since it is the only one
> from which I can test the issue from bug 758899).  I will report back with
> results.

thanks bill

Blocks: 760912

No longer depends on: 760912

Patrick McManus [:mcmanus]

Assignee

Comment 24

•

12 years ago

try is looking fine

Honza Bambas (:mayhemer)

Reporter

Comment 25

•

12 years ago

Comment on attachment 630179 [details] [diff] [review]
alternative 1 - v1

Review of attachment 630179 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.  This looks good to me.  Tested for my case.

r=honzab

Attachment #630179 - Flags: review?(honzab.moz) → review+

Bill Gianopoulos [:WG9s]

Comment 26

•

12 years ago

(In reply to Patrick McManus [:mcmanus] from comment #20)
> Created attachment 630179 [details] [diff] [review]
> alternative 1 - v1
This patch has been tested and fixes bug 758899 also.

Patrick McManus [:mcmanus]

Assignee

Comment 27

•

12 years ago

we'll let this bake for a few days on m-c and then nom for aurora if all goes well.

https://hg.mozilla.org/integration/mozilla-inbound/rev/3b1f2455579a

status-firefox16: --- → fixed

Target Milestone: --- → mozilla16

Ed Morley [:emorley]

Comment 30

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/3b1f2455579a

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Bill Gianopoulos [:WG9s]

Comment 31

•

12 years ago

Verified fixed with the latest nightly.

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/16.0 Firefox/16.0a1
http://hg.mozilla.org/mozilla-central/rev/7e4c2abb9fc9

Status: RESOLVED → VERIFIED

Patrick McManus [:mcmanus]

Assignee

Comment 32

•

12 years ago

Comment on attachment 630179 [details] [diff] [review]
alternative 1 - v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 729123
User impact if declined: occasional connection hang with ssl site. reload fixes it.
Testing completed (on m-c, etc.): on m-c for 5 days and verified fixed by non-developer
Risk to taking this patch (and alternatives if risky): low - it fundamentally reverts one particular code path to old behavior
String or UUID changes made by this patch: none 

bug impacts only nightly (fixed already) and aurora.

Attachment #630179 - Flags: approval-mozilla-aurora?

Alex Keybl [:akeybl]

Updated

•

12 years ago

Attachment #630179 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Patrick McManus [:mcmanus]

Assignee

Comment 33

•

12 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/384eece25973

status-firefox15: affected → fixed

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

12 years ago

tracking-firefox15: ? → ---

Patrick McManus [:mcmanus]

Assignee

Updated

•

12 years ago

Attachment #629799 - Flags: review?(mcmanus)

Honza Bambas (:mayhemer)

Reporter

Updated

•

12 years ago

Whiteboard: [http-conn]

example log 12 years ago Honza Bambas (:mayhemer) 125.25 KB, text/plain		Details
v1 12 years ago Honza Bambas (:mayhemer) 2.09 KB, patch		Details \| Diff \| Splinter Review
patch 0 12 years ago Patrick McManus [:mcmanus] 1.37 KB, patch	mayhemer : review+	Details \| Diff \| Splinter Review
alternative 1 - v0 12 years ago Patrick McManus [:mcmanus] 3.83 KB, patch		Details \| Diff \| Splinter Review
alternative 1 - v1 12 years ago Patrick McManus [:mcmanus] 5.23 KB, patch	mayhemer : review+ akeybl : approval-mozilla-aurora+	Details \| Diff \| Splinter Review