Preconnecting first connection with NullTransaction to non-SPDY HTTPS servers broken, hang for 30 seconds

VERIFIED FIXED in Firefox 15

Status

()

Core
Networking: HTTP
--
major
VERIFIED FIXED
5 years ago
5 years ago

People

(Reporter: mayhemer, Assigned: mcmanus)

Tracking

(Blocks: 1 bug)

Trunk
mozilla16
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox15 fixed, firefox16 fixed)

Details

(Whiteboard: [http-conn])

Attachments

(4 attachments, 1 obsolete attachment)

(Reporter)

Description

5 years ago
Created attachment 629260 [details]
example log

This is here more for a record and potential backout/disable-only bug, since it will be fixed with bug 715905.

I found this out during work on allowing HTTPS testing in xpcshell.

Problem is that NullTransaction gets stuck on waiting for a response.  But we didn't send any request, so nothing can arrive from the server.

Transaction timeout logic kills this transaction in 30 seconds and the original pending transaction gets then restarted.

I believe I've hit this bug few times my self during daily usage wondering what's the hack is going on.


I vote for disabling of using NullTransaction for SSL pre-negotiation.


The log attached is an example of using a fresh profile with just following changes:
- disabled forgery and attack detection
- HTTP cache was emptied before start
- blank start page
- added an SSL exception for my local IIS testing server's certificate

I was connecting my local IIS server as https://127.0.0.1/
I experience the same hang with ssltunnel

To reproduce this bug opening cache entry must be slower then creation of the connection.  On some mobile devices in WiFi I believe this can happen quit often.

This is not reproducible when I'm connecting https://localhost (both IIS and ssltunnel) since the first address necko tries is ::1 and it fails.  This is changing the timing and no NullTransaction is then created.
(Assignee)

Comment 1

5 years ago
I think this is a pipeline integration bug. I'm not sure what you're seeing that makes it look ssl on http/1 only.. ?

here's the key part:

2012-06-01 17:39:11.771000 UTC - 5760[6ec1d48]: nsHttpConnection::EnsureNPNComplete cc26440 negotiated to ''
2012-06-01 17:39:11.771000 UTC - 5760[6ec1d48]: nsHttpConnection::EnsureNPNComplete setting complete to true
2012-06-01 17:39:11.771000 UTC - 5760[6ec1d48]:   writing transaction request stream

NPN and the SSL handshake go ok ('' - plain http). and then it calls readsegments..

2012-06-01 17:39:11.771000 UTC - 5760[6ec1d48]: nsHttpPipeline::ReadSegments [this=cb320b8 count=32768]
2012-06-01 17:39:11.771000 UTC - 5760[6ec1d48]:   ReadSegments returned [rv=0 read=0 sock-cond=0]

nulltransaction::readsegments is supposed to return BASE_STREAM_CLOSED and set isDone on itself - which will result in the transaction being completed.. but the pipeline layer has turned that into rv=NS_OK somehow.

I would guess this pipeline code is the issue
      // return EOF if send buffer is empty
        if (avail == 0) {
            *countRead = 0;
            return NS_OK;
        }
(Reporter)

Updated

5 years ago
Blocks: 466524
(Reporter)

Comment 2

5 years ago
Patrick, do you think you can provide a simple and stable fix for this?
(Assignee)

Comment 3

5 years ago
(In reply to Honza Bambas (:mayhemer) from comment #2)
> Patrick, do you think you can provide a simple and stable fix for this?

assuming my interpretation of the log in comment 1 is correct - definitely.
(Reporter)

Updated

5 years ago
status-firefox15: --- → affected
tracking-firefox15: --- → ?
(Reporter)

Comment 4

5 years ago
(In reply to Patrick McManus [:mcmanus] from comment #1)
> I'm not sure what you're seeing
> that makes it look ssl on http/1 only.. ?

Non-ssl connections are not armed with NullTransaction.

SPDY-sends out some data very soon, that moves the negotiation forward again, I think.  I've simply never seen case of a hang when there were SPDY and also NullTransaction involved on a connection.  It may also indicate the bug is actually in the pipelining code, since pipelines are not present when running SPDY.

Updated

5 years ago
Blocks: 758899
(Reporter)

Comment 5

5 years ago
Created attachment 629799 [details] [diff] [review]
v1

This is a simple patch disabling SSL pre-negotiation with NullTransaction.
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Attachment #629799 - Flags: review?(mcmanus)
(Assignee)

Comment 6

5 years ago
(In reply to Honza Bambas (:mayhemer) from comment #5)
> Created attachment 629799 [details] [diff] [review]
> v1
> 
> This is a simple patch disabling SSL pre-negotiation with NullTransaction.

I would hope we don't need that - I'm testing a very simple fix now. SSL is a big part of this feature, especially when combined with the search box logic. And we're only on nightly (with it likely slipping into week 1 of aurora).. so lots of time to verify the
(Reporter)

Comment 7

5 years ago
We really should fix this quickly and safely before tomorrow.
(Assignee)

Comment 8

5 years ago
Created attachment 629832 [details] [diff] [review]
patch 0
Attachment #629832 - Flags: review?(honzab.moz)
(Assignee)

Comment 9

5 years ago
(In reply to Patrick McManus [:mcmanus] from comment #8)
> Created attachment 629832 [details] [diff] [review]
> patch 0

@try too.

took me a little while to repro - for whatever reason https://www.mozilla.com was writing to the connection soon after handshake (even though it doesn't speak spdy), triggering OnSocketReadable so the bug wasn't evident. I found another SSL server that didn't do that and could clearly see the problem by introducing a short sleep in nsHttpChannel immediately after the SpeculativeLookup call. (so the main thread slept for 300ms while neck did its thing - like a slow cache lookup).

I could then, finally, confirm the patch did what was intended and fixed the issue.
(Assignee)

Comment 10

5 years ago
(In reply to Honza Bambas (:mayhemer) from comment #4)

> 
> SPDY-sends out some data very soon, that moves the negotiation forward
> again, I think.

I didn't think of that. You're seeing the settings frame from the server that it normally sends first thing after the 3whs. (it's not a required element, but everybody does it.)
(Reporter)

Comment 11

5 years ago
Comment on attachment 629832 [details] [diff] [review]
patch 0

Review of attachment 629832 [details] [diff] [review]:
-----------------------------------------------------------------

I have to confirm this fixes this issue.

I'm leaving this open for now.  Since I'm not able to test this patch very well, I'm not much keen to land this.

::: netwerk/protocol/http/NullHttpTransaction.cpp
@@ +91,5 @@
>  
>  PRUint32
>  NullHttpTransaction::Available()
>  {
> +  mIsDone = true;

You are just putting code where it fits to you.  Call to Available() changing state of an object is quit strange.

::: netwerk/protocol/http/nsHttpPipeline.cpp
@@ +613,5 @@
>          // return EOF if send buffer is empty
>          if (avail == 0) {
>              *countRead = 0;
> +            if (IsDone())
> +                return NS_BASE_STREAM_CLOSED;

This change worries me since we have absolutely no tests for keep-alive and pipelined connections (my fault).  This is a core change hard to test what trouble it may cause.
(Reporter)

Comment 12

5 years ago
Comment on attachment 629832 [details] [diff] [review]
patch 0

Testing with pipelining on and aggressive using google image search as source of large parallelism this works.

Josh, I'd like to let you potentially veto based on whether we want to disable a never released feature of preconnecting ssl session while opening a cache entry (my patch) OR to allow potentially dangerous change at core level that may influence user even w/o pipelining on (Pat's patch).

I more tend to put this in now, however I hate only manual testing.
Attachment #629832 - Flags: superreview?(joshmoz)
Attachment #629832 - Flags: review?(honzab.moz)
Attachment #629832 - Flags: review+
(Assignee)

Comment 13

5 years ago
> 
> ::: netwerk/protocol/http/NullHttpTransaction.cpp
> @@ +91,5 @@
> >  
> >  PRUint32
> >  NullHttpTransaction::Available()
> >  {
> > +  mIsDone = true;
> 
> You are just putting code where it fits to you.  Call to Available()
> changing state of an object is quit strange.
> 

not true!

you might recall that until a couple weeks ago with the happy-eyeball change, null-transaction was *always* done. (there wasn't even a state variable). The change was made so that it only marked itself done based on I/O happening - so that the TCP retry stuff could determine whether or not the handshake was still in progress. This change is consistent with that - nsHttpPipeline uses nsAHttpTransaction::Available() as a I/O function. (in lieu of ReadSegments basically).

as a testing comment - the only user of nsAHttpTransaction::Available is nsHttpPipeline
(Assignee)

Comment 14

5 years ago
I'm going to look into another more conservative approach. Honza, you might be more comfortable with that if it bears fruit.

Comment 15

5 years ago
Comment on attachment 629832 [details] [diff] [review]
patch 0

Review of attachment 629832 [details] [diff] [review]:
-----------------------------------------------------------------

This is a request for comment on a Necko-specific issue that I'm not qualified to assess. If you're OK with either option and you need a tie-breaker opinion I think that should be Patrick's.
Attachment #629832 - Flags: superreview?(joshmoz)
(Assignee)

Comment 16

5 years ago
Created attachment 629957 [details] [diff] [review]
alternative 1 - v0

Here is an alternative approach - do not use nsHttpPipeline for CLASS_SOLO transactions. NullHttpTransaction's are SOLO.

SOLO's are never reused until the connection is quiescent, so this doesn't remove any functionality. (specifically it is not a return to the old bursty behavior.. transactions eligible to pipeline are not SOLO just because they don't have a pipeline partner at the time of scheduling).. SOLO is for things like audio/video, and XHR, etc..

I liked the property of driving everything through nsHttpPipeline, but it was leading to hacks like the other version of this patch. So I'm ok with the split.

The code for this patch is still small. It also contains an addition of the Keep-Alive flag to the NullTransaction caps in case the in originating transaction was missing it. It doesn't make much sense without it and I saw that happen in at least one of my tests.
Attachment #629957 - Flags: review?(honzab.moz)
(Assignee)

Comment 17

5 years ago
Comment on attachment 629957 [details] [diff] [review]
alternative 1 - v0

I need to dig into a test fail before resetting r?
Attachment #629957 - Flags: review?(honzab.moz)
(Reporter)

Comment 18

5 years ago
(In reply to Patrick McManus [:mcmanus] from comment #13)
> not true!
> 
> you might recall that until a couple weeks ago with the happy-eyeball
> change, null-transaction was *always* done. (there wasn't even a state
> variable). The change was made so that it only marked itself done based on
> I/O happening - so that the TCP retry stuff could determine whether or not
> the handshake was still in progress. 

Yes, the IO happening means to call Read or WriteSegments to me.

> This change is consistent with that -
> nsHttpPipeline uses nsAHttpTransaction::Available() as a I/O function. (in
> lieu of ReadSegments basically).

Hmm.. calling Available doesn't mean to do actual IO.  Anyone can call this method to just check the state.

> 
> as a testing comment - the only user of nsAHttpTransaction::Available is
> nsHttpPipeline

Yes, for now.  When we introduce a change that may call Available sooner then actual Read/Write is done then we may regress the IPv6 fallback patch that introduced mIsDone.  That is simply not correct to me.

(In reply to Patrick McManus [:mcmanus] from comment #16)
> Created attachment 629957 [details] [diff] [review]
> alternative 1 - v0
> 
> Here is an alternative approach - do not use nsHttpPipeline for CLASS_SOLO
> transactions. NullHttpTransaction's are SOLO.

Thanks!  That seems to me to be the correct approach, I like it.
(Reporter)

Updated

5 years ago
Assignee: honzab.moz → mcmanus
(In reply to Patrick McManus [:mcmanus] from comment #16)
> Created attachment 629957 [details] [diff] [review]
> alternative 1 - v0

I can report that I have verified that this patch corrects the issue I reported in Bug 758899.

I have builds available at http://www.wg9s.com/mozilla/firefox that include this patch.

So far, Win32 and Linux 32-bit.  Android (Native and XUL) and Linux 64-bit are still in progress.
(Assignee)

Comment 20

5 years ago
Created attachment 630179 [details] [diff] [review]
alternative 1 - v1

it turns out we do need to disable the SSL handshake via NullHttpTransaction in the presence of a HTTP proxy because NullHttpTransaction does not know how to drive the CONNECT sequence. (all HTTP parsing is done in http transaction, and the CONNECT tunnel is a HTTP transaction of sorts).

on one hand this is unfortunate as SSL is end to end even in the presence of the proxy and a potentially high lantecny operation.. otoh explicit proxy users are a distinct minority so most folks will still get the optimizaton.

rather than teaching nullhttptransaction about how to do tunnels we can just expect that this is part of bug 715905.

@try again
Attachment #629957 - Attachment is obsolete: true
Attachment #630179 - Flags: review?(honzab.moz)
I will be doing my windows build over with this (since it is the only one from which I can test the issue from bug 758899).  I will report back with results.
(Assignee)

Comment 22

5 years ago
(In reply to Honza Bambas (:mayhemer) from comment #18)

> Hmm.. calling Available doesn't mean to do actual IO.  Anyone can call this
> method to just check the state.
> 

I don't really buy it - it's not an idl or some header with a inter-module scope. We can define it how we want to for the most value. And it seems like I/O to me in the same sense that PEEK is clearly I/O even though its non-destructive.

But I/O is probably the wrong thing anyhow.. mDone was really introduced to draw a line between the TCP/SSL handshake and the other transaction logic.. probly should have just introduced a settr at that point rather than implicitly from I/O functions.

likely all moot anyhow as I suspect we're going with the no-nshttptransaction-on-solo patch anyhow :)
(Assignee)

Updated

5 years ago
Depends on: 760912
(Assignee)

Comment 23

5 years ago
(In reply to Bill Gianopoulos [:WG9s] from comment #21)
> I will be doing my windows build over with this (since it is the only one
> from which I can test the issue from bug 758899).  I will report back with
> results.

thanks bill
Blocks: 760912
No longer depends on: 760912
(Assignee)

Comment 24

5 years ago
try is looking fine
(Reporter)

Comment 25

5 years ago
Comment on attachment 630179 [details] [diff] [review]
alternative 1 - v1

Review of attachment 630179 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.  This looks good to me.  Tested for my case.

r=honzab
Attachment #630179 - Flags: review?(honzab.moz) → review+
(In reply to Patrick McManus [:mcmanus] from comment #20)
> Created attachment 630179 [details] [diff] [review]
> alternative 1 - v1
This patch has been tested and fixes bug 758899 also.
(Assignee)

Comment 27

5 years ago
we'll let this bake for a few days on m-c and then nom for aurora if all goes well.

https://hg.mozilla.org/integration/mozilla-inbound/rev/3b1f2455579a
status-firefox16: --- → fixed
Target Milestone: --- → mozilla16
(Assignee)

Updated

5 years ago
Duplicate of this bug: 760912
(Assignee)

Updated

5 years ago
Duplicate of this bug: 758899
https://hg.mozilla.org/mozilla-central/rev/3b1f2455579a
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Verified fixed with the latest nightly.

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/16.0 Firefox/16.0a1
http://hg.mozilla.org/mozilla-central/rev/7e4c2abb9fc9
Status: RESOLVED → VERIFIED
(Assignee)

Comment 32

5 years ago
Comment on attachment 630179 [details] [diff] [review]
alternative 1 - v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 729123
User impact if declined: occasional connection hang with ssl site. reload fixes it.
Testing completed (on m-c, etc.): on m-c for 5 days and verified fixed by non-developer
Risk to taking this patch (and alternatives if risky): low - it fundamentally reverts one particular code path to old behavior
String or UUID changes made by this patch: none 

bug impacts only nightly (fixed already) and aurora.
Attachment #630179 - Flags: approval-mozilla-aurora?

Updated

5 years ago
Attachment #630179 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Comment 33

5 years ago
https://hg.mozilla.org/releases/mozilla-aurora/rev/384eece25973
status-firefox15: affected → fixed
tracking-firefox15: ? → ---
(Assignee)

Updated

5 years ago
Attachment #629799 - Flags: review?(mcmanus)
(Reporter)

Updated

5 years ago
Whiteboard: [http-conn]
You need to log in before you can comment on or make changes to this bug.