Last Comment Bug 752648 - TLS intolerance detection and SSL fallback are broken
: TLS intolerance detection and SSL fallback are broken
Status: RESOLVED FIXED
: regression
Product: Core
Classification: Components
Component: Networking (show other bugs)
: Trunk
: All All
: -- major (vote)
: mozilla14
Assigned To: Patrick McManus [:mcmanus]
:
Mentors:
https://solo1.nordea.fi/nsp/engine
Depends on:
Blocks: 447866
  Show dependency treegraph
 
Reported: 2012-05-07 13:07 PDT by Gian-Carlo Pascutto [:gcp]
Modified: 2012-05-17 06:47 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed
fixed


Attachments
pach 0 (2.32 KB, patch)
2012-05-08 08:59 PDT, Patrick McManus [:mcmanus]
honzab.moz: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
patch version 2 r0 (9.19 KB, patch)
2012-05-10 19:10 PDT, Patrick McManus [:mcmanus]
honzab.moz: review+
Details | Diff | Splinter Review

Description Gian-Carlo Pascutto [:gcp] 2012-05-07 13:07:07 PDT
Split from bug 698203, starting from comment 19 and later

https://solo1.nordea.fi/nsp/engine

This will fail with a Connection Reset error on the first attempt. Subsequent atttemps will work.

Bisection: https://bugzilla.mozilla.org/show_bug.cgi?id=698203#c24
Comment 1 Patrick McManus [:mcmanus] 2012-05-07 13:19:36 PDT
one bug where we fallback too often, one where we don't fallback soon enough. And I didn't even change code in psm :) I need a goldilocks consult.

I'll look at this tomorrow.
Comment 2 Honza Bambas (:mayhemer) 2012-05-07 13:27:10 PDT
 	xul.dll!nsSSLIOLayerHelpers::addIntolerantSite(const nsCString & str={...})  Line 1270	C++
 	xul.dll!nsSSLIOLayerHelpers::rememberPossibleTLSProblemSite(nsNSSSocketInfo * socketInfo=0x095f66e8)  Line 681 + 0x9 bytes	C++
 	xul.dll!`anonymous namespace'::checkHandshake(int bytesTransfered=-1, bool wasReading=false, PRFileDesc * ssl_layer_fd=0x0d41c898, nsNSSSocketInfo * socketInfo=0x095f66e8)  Line 921 + 0x9 bytes	C++
 	xul.dll!PSMSend(PRFileDesc * fd=0x0d41c898, const void * buf=0x5a0c542f, int amount=0, int flags=0, unsigned int timeout=4294967295)  Line 1158 + 0x13 bytes	C++
 	xul.dll!nsSSLIOLayerWrite(PRFileDesc * fd=0x0d41c898, const void * buf=0x5a0c542f, int amount=0)  Line 1170 + 0x15 bytes	C++
 	nspr4.dll!PR_Write(PRFileDesc * fd=0x0d41c898, const void * buf=0x5a0c542f, int amount=0)  Line 146 + 0x16 bytes	C
>	xul.dll!nsSocketOutputStream::Write(const char * buf=0x5a0c542f, unsigned int count=0, unsigned int * countWritten=0x0723fa48)  Line 587 + 0x12 bytes	C++
 	xul.dll!nsHttpConnection::EnsureNPNComplete()  Line 310 + 0x33 bytes	C++
 	xul.dll!nsHttpConnection::OnSocketWritable()  Line 1230 + 0x8 bytes	C++
 	xul.dll!nsHttpConnection::OnOutputStreamReady(nsIAsyncOutputStream * out=0x095f5660)  Line 1530 + 0xb bytes	C++
 	xul.dll!nsSocketOutputStream::OnSocketReady(unsigned int condition=0)  Line 526	C++
 	xul.dll!nsSocketTransport::OnSocketReady(PRFileDesc * fd=0x0d41c898, short outFlags=2)  Line 1571	C++
 	xul.dll!nsSocketTransportService::DoPollIteration(bool wait=true)  Line 764 + 0x21 bytes	C++
 	xul.dll!nsSocketTransportService::Run()  Line 650	C++
 	xul.dll!nsThread::ProcessNextEvent(bool mayWait=true, bool * result=0x0723fc3f)  Line 656 + 0x19 bytes	C++


Error is SSL_ERROR_NO_CYPHER_OVERLAP.  When I remove breakpoint from that line on top of the stack, I can reproduce.  But with the breakpoint (just stay few secs in the debugger) I cannot, so it's some timing issue.
Comment 3 Honza Bambas (:mayhemer) 2012-05-07 13:48:42 PDT
Here [1] mSentData is true.  The rest is false.  So we don't call Restart().  The "timing problem" is that simply connReused turns to true since the connection hangs for more then 950 ms.

> if (!mReceivedData && (!mSentData || connReused || mPipelinePosition))

Problem is that pipeline is prebuffering the data before it actually writes.

This is another reason to implement bug 715905, by the way.  We can detect TLS intolerance very soon and restart the connection it self and not to be dependent on complex and sensitive transaction restart logic.


[1] http://hg.mozilla.org/mozilla-central/annotate/8899c0604bd1/netwerk/protocol/http/nsHttpTransaction.cpp#l728
Comment 4 Alice0775 White 2012-05-07 18:05:01 PDT
Last good f8656bbf0818
First bad 3dd62d76cc6d
Triggered by3dd62d76cc6d	Patrick McManus — bug 447866 http pipelining is bursty r=honzab
Comment 5 Patrick McManus [:mcmanus] 2012-05-08 08:59:33 PDT
Created attachment 622000 [details] [diff] [review]
pach 0

This moves the failed-ssl detection earlier, pretty much any point before the ssl handshake is complete, and avoids calling the transaction i/o functions before declaring failure. That makes restart work as expected.
Comment 6 Honza Bambas (:mayhemer) 2012-05-09 06:29:29 PDT
Comment on attachment 622000 [details] [diff] [review]
pach 0

Review of attachment 622000 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab
Comment 7 Patrick McManus [:mcmanus] 2012-05-09 06:59:37 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/47ab726a2a25
Comment 8 Patrick McManus [:mcmanus] 2012-05-09 07:08:14 PDT
Comment on attachment 622000 [details] [diff] [review]
pach 0

[Approval Request Comment]
Regression caused by (bug #): 447866 (ff14)

User impact if declined: Connection reset on first connection attempt to host that does not support TLSv1 (which is uncommon). Future attempts in the same session will work successfully with SSLv3 fallback. The fix makes the SSLv3 fallback work correctly on the first attempt too.

Testing completed (on m-c, etc.): on m-c. URL in bug report verified

Risk to taking this patch (and alternatives if risky):risk is low - its basically an error check at an earlier point in time. The code path that is changed is well exercised by all SSL connections, so if there is a problem it would likely be noticed quickly.

alternative would be to live with the ephemeral error with SSLv3 compatibility for FF14. "Try Again" works.

String changes made by this patch: none
Comment 9 Alex Keybl [:akeybl] 2012-05-09 16:13:44 PDT
Comment on attachment 622000 [details] [diff] [review]
pach 0

[Triage Comment]
Approved for Aurora 14.
Comment 10 Ed Morley [:emorley] 2012-05-10 07:37:50 PDT
https://hg.mozilla.org/mozilla-central/rev/47ab726a2a25
Comment 11 Patrick McManus [:mcmanus] 2012-05-10 09:48:34 PDT
this has somehow created a race for the untrusted cert dialog screen. (with the patch sometimes you get the screen, sometimes you get a connection reset error). We do have tests for that, but it never seems to trigger the error on the CI.

I can see it consistently e.g. with https://smtp.ducksong.com/ (which should show the error).

I'll revert this from m-c.
Comment 12 Patrick McManus [:mcmanus] 2012-05-10 10:13:52 PDT
backout
https://hg.mozilla.org/integration/mozilla-inbound/rev/339096e909d8
Comment 13 Patrick McManus [:mcmanus] 2012-05-10 11:53:02 PDT
here's a new version that just addresses the problem rather than trying to centralize connection establishment like the last version.

https://tbpl.mozilla.org/?tree=Try&rev=09bff948aa04

it is simpler yet - it just uses transport SENDING_TO events as the nsHttpTransaction::SentData indication rather than completion of ReadSegments(). That lets us know when the http data really gets written out rather than just buffered in a higher layer which is what's really important for the idempotence safety check being done with the value.

I made sure that sending a CONNECT does not generate a SENDING_TO for the transaction. (if it did, the we wouldn't get the Restart() for TLS intolerance when tunneling through a proxy).
Comment 14 Joe Drew (not getting mail) 2012-05-10 18:38:50 PDT
Merged backout: https://hg.mozilla.org/mozilla-central/rev/339096e909d8
Comment 15 Patrick McManus [:mcmanus] 2012-05-10 19:09:38 PDT
(In reply to Patrick McManus [:mcmanus] from comment #13)
>
> I made sure that sending a CONNECT does not generate a SENDING_TO for the
> transaction. (if it did, the we wouldn't get the Restart() for TLS
> intolerance when tunneling through a proxy).

unfortunately after further thinking and testing that approach was racy and insufficient.

To summarize:
 * we need to let the pipeline class buffer for efficient packet filling
 * its important to have the transaction i/o (as opposed to finding errors through ensureNPN()), in order to get the appropriate cert error dialogs. I'm sure this can be fixed in a bigger feature, but not in this bug which needs to be ported to ff14.
 * due to the vagaries of the way CONNECT is implemented, its hard to have transaction::mSentData truly reflect what we need for the restart logic

nsHttpTransaction can query nsHttpConnection to find out if the connection object has written any plain text bytes beyond the connect sequence. That will help it sort out the buffering issues at the beginning of the connection. (other situations aren't as important because we liberally restart on pconns anyhow). I've spent most of the day trying to avoid this conclusion, but it seems to be necessary.

https://hg.mozilla.org/try/rev/eda4401cd81e

As I had to add code to each nsAHttpConnection implementation I created a forwarding object macro.. that will actually let use remove (in a later patch) redundant code in those classes that just does forwarding and replace it with a macro in the header.
Comment 16 Patrick McManus [:mcmanus] 2012-05-10 19:10:52 PDT
Created attachment 623007 [details] [diff] [review]
patch version 2 r0
Comment 17 Honza Bambas (:mayhemer) 2012-05-11 03:52:03 PDT
Comment on attachment 623007 [details] [diff] [review]
patch version 2 r0

Patrick, this patch looks reasonable to me.  Do you want to submit it for review?
Comment 18 Patrick McManus [:mcmanus] 2012-05-11 05:23:33 PDT
Comment on attachment 623007 [details] [diff] [review]
patch version 2 r0

this looks good on try
Comment 19 Honza Bambas (:mayhemer) 2012-05-14 09:39:40 PDT
Comment on attachment 623007 [details] [diff] [review]
patch version 2 r0

Review of attachment 623007 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab

Good workaround.
Comment 20 Patrick McManus [:mcmanus] 2012-05-14 11:51:24 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/90ef8e055139
Comment 21 Ed Morley [:emorley] 2012-05-15 06:51:20 PDT
https://hg.mozilla.org/mozilla-central/rev/90ef8e055139
Comment 22 Patrick McManus [:mcmanus] 2012-05-17 06:47:30 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/bac25b8ea4e7

Note You need to log in before you can comment on or make changes to this bug.