first attempt fails with proxy and TLS intolerant server

VERIFIED DUPLICATE of bug 87902

Status

Core Graveyard
Security: UI
P2
normal
VERIFIED DUPLICATE of bug 87902
17 years ago
a year ago

People

(Reporter: Kai Engert, Assigned: kaie)

Tracking

1.0 Branch
Future
x86
Linux

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

17 years ago
I used Mozilla, Linux build from August 17th.

Configure a proxy for https. Make sure, your SSL options enable all of SSL2,
SSL3, TLS.

Actual behaviour: Go to https://www.delphion.com . Page does not load on first
attempt. Press stop. Try to access the page again. Now it works.

Expected behaviour: Page loads immediately.

It seems, TLS step up is currently not working correctly. This website is known
to be intolerant with TLS. On the first connection attemt, the browser remembers
the behaviour of such sites. In later connections, it avoid using TLS, and uses
an older protocl version immediately. That's the reason why it works on the
second attempt, it already knows that TLS will not work.

If you want to reproduce this bug, make sure you restart the browser every time
you try it.
(Reporter)

Comment 1

17 years ago
I can see this on both Windows and Linux.

And this isn't a new bug. I checked with Milestone 0.9.3, it has the same
behaviour as todays build, both Linux and Windows. 

The trick to reproduce, it only does not work in the first connection attempt in
a new browser session.

Here is what happens when I try to access https://www.delphion.com with https
proxy enabled:

==================================
nsSSLIOLayerAddToSocket
  proxyhost true => setfirstwrite(false)

http proxy connection beginning
  ssl iolayer write called
  will not block, because firstwrite is still false
  sending out request in clear

ssl io layer read reads clear proxy answer

tlsstepup called
  triggers write with zero bytes -> iolayerwrite
  firstwrite still false
  still nonblocking, write returns -1, wouldblock
  now setting firstwrite to true

httptransaction wants to send request over secure channel
  ssliolayerwrite
  firstwrite is true
  blocking call used
  byteswritten -1
  tls is on
  setting byteswritten to 0, which is supposed to make necko retry
  setting tlsintolerant, adding site to list
  clearing firstwrite flag (false)
  setting back to nonblocking
  
no other event arrives at ssl io layer

pressing stop in browser still does not trigger any events in ssl io layer
==============================

While the above happens, the browser's status bar shows "connecting" all the time.
Status: NEW → ASSIGNED
(Reporter)

Comment 2

17 years ago
This problem appears only on certain combinations of Proxies and SSL sites. I
have a system and a proxy, where I'm always able to reproduce this bug.

I'm not sure how important this bug is or how many users will see this problem.

From the above sequence of action it seems like a race. If you have a very fast
connection to the proxy, the sequence of events can be different and cause the
hang, because no more actions are triggered.

I was able to produce a patch that always works for me, all combinations of SSL
options enabled, with or without proxy.

Should this patch go into PSM 2.1 / 0.9.4 ?
(Reporter)

Comment 3

17 years ago
Created attachment 47291 [details] [diff] [review]
Suggested fix

Comment 4

17 years ago
kai: i like your solution, but the indentation and brace style is not consistent
with the rest of the HTTP files.  fix that and r=darin on the http portions.
(Reporter)

Comment 5

17 years ago
CC'ing nelsonb. Nelson, what do you think about this patch?

Comment 6

17 years ago
kai: it occurs to me that the prematureEOF logic may be flawed.  by the time
OnHeaderAvailable has been called, it should not be possible to get a
prematureEOF, since some data will have already been read.  why is it important
to treat this case as a prematureEOF case?

Comment 7

17 years ago
target 2.1
Adding patch and review keywords.
Keywords: patch, review
Priority: -- → P2
Target Milestone: --- → 2.1
Version: unspecified → 2.1

Comment 8

17 years ago
kai
Assignee: ssaux → kai.engert
Status: ASSIGNED → NEW
(Reporter)

Comment 9

17 years ago
- First some general comment: Nelson pointed out that we should clarify the bug
title. We are not talking about "TLS Step Up", which means a switch in crypto
strength.

We are talking about Start-TLS, i.e. initiating a SSL connection over a
previously unencrypted channel.


- During the application SSL handshake, due to cryptographic reasons, both
parties can detect that they cannot communicate in the mode they tried at the
first attempt. The rules say, in this case the client must remember this
property of the other side and in the future try with a different (older)
protocol version.

When this happens, the connection must be shut down, and afterwards restarted
from scratch, including the cleartext phase to the proxy.


- In the scenario without a proxy, no change to the http code is required. If
during the first I/O from the socket (which initiates the SSL handshake), the
incompatibility is detected, the SSL layer returns "zero bytes read". This
already causes the prematureEOF.

When using a proxy, there is this initial phase where some headers are read in
cleartext.

OnHeadersAvailable get's called before and after the Start-TLS.

If I remember correctly from my debugging session, after the proxy answer has
been received, the nsHttpTransaction  is no longer in this "clean" state where
it would decide to see a premature EOF.

Fact is, returning a "zero bytes transfered" doesn't always result in the
prematureEOF, maybe because of the timing of the proxy.  Maybe this is caused by
order of read/write events when using SSL. Let me cite a comment from John G.
Myers in bug 88303.

"The SSL code is supposed to do two things.  One, whenever 
it is asked to read data, it is supposed to try to flush any buffered write 
data.  Two, when asked to select for PR_POLL_READ and there is buffered write 
data, it will select the underlying socket for both read and write, so if the 
underlying socket becomes writable, the calling event loop will wake up and call 
the SSL's read method to flush the buffered write data."

I saw the nsHttpTransaction has the flag prematureEOF for restarting from
scratch, and that's exactly what we want to do, because we have seen a "logical"
 prematureEOF, at the beginning of the application data phase (after the proxy
init). Currently only in OnDataReadble a check for this flag is being made. And
I saw that OnDataWritable sometimes is the first (and only) routine triggered
before it hangs. Therefore I chose the solution to manually set this flag when
we see this logical prematureEOF, and in addition check for this flag in
OnDataWritable, too.

As I stated in the bug, this only ever happened to me at the very first
connection attempt to a site during a session. At every later attempt, the
applications immediately tries the older protocol version, without the need to
restart.
I hope my words make sense. If you think this is still unclear, please let me
know and I will repeat my debugging to produce more details.
Status: NEW → ASSIGNED

Comment 10

17 years ago
kai: thanks for the explanation.  your solution makes sense to me; however, i
would like to suggest some modifications to the http portion of your patch.

Comment 11

17 years ago
1) please rename OnPrematureEOF to HandlePrematureEOF; i think that this would
be more consistent with the existing naming conventions.

2) pass mPrematureEOF to OnHeadersAvailable and eliminate the prematureEOF local
variable.

3) use correct indentation and eliminate braces when possible to match the
existing code style.

4) please document the NS_ERROR_NOT_AVAILABLE return value in
nsISSLSocketControl.idl

5) [optional] i would prefer NS_BASE_STREAM_CLOSED over NS_ERROR_NOT_AVAILABLE
as this, IMO, seems to better indicate a premature EOF and an inability to reuse
the current connection w/o any sense that further connection attempts would also
fail.  moreover, NS_ERROR_NOT_AVAILABLE could be generated for other reasons if
ProxyStepUp were called from xpconnect, for example (not that anyone would, but
hopefully you get my point).

with these fixes, r/sr=darin on the http portion of the patch.
(Reporter)

Comment 12

17 years ago
Created attachment 48091 [details] [diff] [review]
Updated Patch, including all of Darin's suggestions
(Reporter)

Comment 13

17 years ago
Created attachment 48092 [details] [diff] [review]
Changes to IDL file

Comment 14

17 years ago
kai: changes look good...
i happened to notice a spelling typo in the comments: "transcation"

Updated

17 years ago
Keywords: nsenterprise

Comment 15

17 years ago
Marking nsentreprise+
Keywords: nsenterprise → nsenterprise+

Comment 16

17 years ago
Kai: Please explain what the following snippet of code does. I'm confused why if
the current error and last error are the same, we return NS_OK. A comment in the
code would be great. Do that and you have r=ddrinan.

 static nsresult
 nsHandleSSLError(nsNSSSocketInfo *socketInfo, PRInt32 err)
 {
+  PRInt32 lastErr;
+  socketInfo->GetLastWarnedError(&lastErr);
+  if (err == lastErr)
+    return NS_OK;
+  
+  socketInfo->SetLastWarnedError(err);
+

(Reporter)

Comment 17

17 years ago
This function is used to display an error to the user only. I want to make sure
that the same error is not reported repeatedly to the user. I saw that behaviour
while I worked on this patch. I'm caching the last reported error for a socket,
and will not report it again, unless a different error shows up.

I will add a comment as suggested.

Comment 18

17 years ago
I'm latering this bug (with respect to checking it on the 0.9.4 branch/release):
1) This is an edge condition.
2) This is the kind of things that could bring other regression.
3) there are no localization changes. Hence we can check it in the trunk when we
have more time, and after we have verified that it's ok, we can go back to PDT
and ask them to accept it for the 0.9.4 branch post the 0.9.4 release.
I'm leaving the nsenterprise keyword, but removing nsenterprise+.

Kai, put this on the back burner.
Keywords: nsenterprise+ → nsenterprise
Target Milestone: 2.1 → Future
Changing summary as discussed above.
Summary: Page does not open on first attempt. TLS step up problem? → Page does not open on first attempt. star TLS problem?
More specific summary line.
Summary: Page does not open on first attempt. star TLS problem? → first attempt fails with proxy and TLS intolerant server
(Reporter)

Comment 21

17 years ago
Changing my prefered e-mail address.
Assignee: kai.engert → kaie
Status: ASSIGNED → NEW
(Assignee)

Updated

17 years ago
Status: NEW → ASSIGNED

Comment 22

17 years ago
marking nsenterprise-; will be reevaluated for nsenterprise in future release.

Keywords: nsenterprise-

Updated

17 years ago
QA Contact: ckritzer → junruh
(Assignee)

Comment 23

16 years ago
I think the situation has become worse, and I want to suggest it's now time to
look at this bug again.

When I had that problem in the past, I only saw this problem with a rather
exotic proxy daemon.

However, retesting it today, I find that I have problems with three different
proxies, on two different platforms.

I tried with 3 different proxies (for https):
- lab212sun.mcom.com:8080
- a squid running on my own machine
- that special java proxy (muffin.doit.org)

To reproduce, make sure you haven't access the URL in that browser session
before. It's best to use a fresh started browser.

Go to https://www.delphion.com
You never get a connect.
Even worse, disabling the proxy does not help in that browser session.

If you restart the browser, disable the proxy, and try to connect again, it works.

Seen on both Linux and Windows 2000.
(Assignee)

Comment 24

16 years ago
Created attachment 65202 [details] [diff] [review]
Updated patch, not working

This patch is just for documenting the code that I tried out today (created
from the old patch).
It does not to fix the problem.
Attachment #47291 - Attachment is obsolete: true
Attachment #48091 - Attachment is obsolete: true

Comment 25

16 years ago
TLS intolerant sites access via proxy is an edge case. It's also an issue that's
going to disappear as more and more sites turn away from TLS intolerant web
servers.  If a solution is at hand, fine, otherwise there are bugs with more
impact  that we should fix first.
(Assignee)

Comment 26

16 years ago
"Bugscape" bug 12934 should be read when this is being worked on. Ideally, the
changes suggested there should be implemented.

(Assignee)

Updated

16 years ago
Depends on: 149868
(Assignee)

Comment 27

16 years ago
This bug complains about problems with the same site as mentioned in bug 87902.
The patch in bug 149868 seems to make connecting to site 87902 work.

I'm resolving this bug as a duplicate of 87902.


*** This bug has been marked as a duplicate of 87902 ***
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → DUPLICATE

Comment 28

16 years ago
Verified dupe.
Status: RESOLVED → VERIFIED

Updated

13 years ago
Component: Security: UI → Security: UI
Product: PSM → Core

Updated

10 years ago
Version: psm2.1 → 1.0 Branch
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.