Closed Bug 96024 Opened 24 years ago Closed 23 years ago

first attempt fails with proxy and TLS intolerant server

Categories

(Core Graveyard :: Security: UI, defect, P2)

1.0 Branch
x86
Linux
defect

Tracking

(Not tracked)

VERIFIED DUPLICATE of bug 87902
Future

People

(Reporter: inactive-mailbox, Assigned: KaiE)

References

()

Details

Attachments

(2 files, 2 obsolete files)

I used Mozilla, Linux build from August 17th. Configure a proxy for https. Make sure, your SSL options enable all of SSL2, SSL3, TLS. Actual behaviour: Go to https://www.delphion.com . Page does not load on first attempt. Press stop. Try to access the page again. Now it works. Expected behaviour: Page loads immediately. It seems, TLS step up is currently not working correctly. This website is known to be intolerant with TLS. On the first connection attemt, the browser remembers the behaviour of such sites. In later connections, it avoid using TLS, and uses an older protocl version immediately. That's the reason why it works on the second attempt, it already knows that TLS will not work. If you want to reproduce this bug, make sure you restart the browser every time you try it.
I can see this on both Windows and Linux. And this isn't a new bug. I checked with Milestone 0.9.3, it has the same behaviour as todays build, both Linux and Windows. The trick to reproduce, it only does not work in the first connection attempt in a new browser session. Here is what happens when I try to access https://www.delphion.com with https proxy enabled: ================================== nsSSLIOLayerAddToSocket proxyhost true => setfirstwrite(false) http proxy connection beginning ssl iolayer write called will not block, because firstwrite is still false sending out request in clear ssl io layer read reads clear proxy answer tlsstepup called triggers write with zero bytes -> iolayerwrite firstwrite still false still nonblocking, write returns -1, wouldblock now setting firstwrite to true httptransaction wants to send request over secure channel ssliolayerwrite firstwrite is true blocking call used byteswritten -1 tls is on setting byteswritten to 0, which is supposed to make necko retry setting tlsintolerant, adding site to list clearing firstwrite flag (false) setting back to nonblocking no other event arrives at ssl io layer pressing stop in browser still does not trigger any events in ssl io layer ============================== While the above happens, the browser's status bar shows "connecting" all the time.
Status: NEW → ASSIGNED
This problem appears only on certain combinations of Proxies and SSL sites. I have a system and a proxy, where I'm always able to reproduce this bug. I'm not sure how important this bug is or how many users will see this problem. From the above sequence of action it seems like a race. If you have a very fast connection to the proxy, the sequence of events can be different and cause the hang, because no more actions are triggered. I was able to produce a patch that always works for me, all combinations of SSL options enabled, with or without proxy. Should this patch go into PSM 2.1 / 0.9.4 ?
Attached patch Suggested fix (obsolete) — Splinter Review
kai: i like your solution, but the indentation and brace style is not consistent with the rest of the HTTP files. fix that and r=darin on the http portions.
CC'ing nelsonb. Nelson, what do you think about this patch?
kai: it occurs to me that the prematureEOF logic may be flawed. by the time OnHeaderAvailable has been called, it should not be possible to get a prematureEOF, since some data will have already been read. why is it important to treat this case as a prematureEOF case?
target 2.1 Adding patch and review keywords.
Keywords: patch, review
Priority: -- → P2
Target Milestone: --- → 2.1
Version: unspecified → 2.1
kai
Assignee: ssaux → kai.engert
Status: ASSIGNED → NEW
- First some general comment: Nelson pointed out that we should clarify the bug title. We are not talking about "TLS Step Up", which means a switch in crypto strength. We are talking about Start-TLS, i.e. initiating a SSL connection over a previously unencrypted channel. - During the application SSL handshake, due to cryptographic reasons, both parties can detect that they cannot communicate in the mode they tried at the first attempt. The rules say, in this case the client must remember this property of the other side and in the future try with a different (older) protocol version. When this happens, the connection must be shut down, and afterwards restarted from scratch, including the cleartext phase to the proxy. - In the scenario without a proxy, no change to the http code is required. If during the first I/O from the socket (which initiates the SSL handshake), the incompatibility is detected, the SSL layer returns "zero bytes read". This already causes the prematureEOF. When using a proxy, there is this initial phase where some headers are read in cleartext. OnHeadersAvailable get's called before and after the Start-TLS. If I remember correctly from my debugging session, after the proxy answer has been received, the nsHttpTransaction is no longer in this "clean" state where it would decide to see a premature EOF. Fact is, returning a "zero bytes transfered" doesn't always result in the prematureEOF, maybe because of the timing of the proxy. Maybe this is caused by order of read/write events when using SSL. Let me cite a comment from John G. Myers in bug 88303. "The SSL code is supposed to do two things. One, whenever it is asked to read data, it is supposed to try to flush any buffered write data. Two, when asked to select for PR_POLL_READ and there is buffered write data, it will select the underlying socket for both read and write, so if the underlying socket becomes writable, the calling event loop will wake up and call the SSL's read method to flush the buffered write data." I saw the nsHttpTransaction has the flag prematureEOF for restarting from scratch, and that's exactly what we want to do, because we have seen a "logical" prematureEOF, at the beginning of the application data phase (after the proxy init). Currently only in OnDataReadble a check for this flag is being made. And I saw that OnDataWritable sometimes is the first (and only) routine triggered before it hangs. Therefore I chose the solution to manually set this flag when we see this logical prematureEOF, and in addition check for this flag in OnDataWritable, too. As I stated in the bug, this only ever happened to me at the very first connection attempt to a site during a session. At every later attempt, the applications immediately tries the older protocol version, without the need to restart. I hope my words make sense. If you think this is still unclear, please let me know and I will repeat my debugging to produce more details.
Status: NEW → ASSIGNED
kai: thanks for the explanation. your solution makes sense to me; however, i would like to suggest some modifications to the http portion of your patch.
1) please rename OnPrematureEOF to HandlePrematureEOF; i think that this would be more consistent with the existing naming conventions. 2) pass mPrematureEOF to OnHeadersAvailable and eliminate the prematureEOF local variable. 3) use correct indentation and eliminate braces when possible to match the existing code style. 4) please document the NS_ERROR_NOT_AVAILABLE return value in nsISSLSocketControl.idl 5) [optional] i would prefer NS_BASE_STREAM_CLOSED over NS_ERROR_NOT_AVAILABLE as this, IMO, seems to better indicate a premature EOF and an inability to reuse the current connection w/o any sense that further connection attempts would also fail. moreover, NS_ERROR_NOT_AVAILABLE could be generated for other reasons if ProxyStepUp were called from xpconnect, for example (not that anyone would, but hopefully you get my point). with these fixes, r/sr=darin on the http portion of the patch.
kai: changes look good... i happened to notice a spelling typo in the comments: "transcation"
Keywords: nsenterprise
Marking nsentreprise+
Kai: Please explain what the following snippet of code does. I'm confused why if the current error and last error are the same, we return NS_OK. A comment in the code would be great. Do that and you have r=ddrinan. static nsresult nsHandleSSLError(nsNSSSocketInfo *socketInfo, PRInt32 err) { + PRInt32 lastErr; + socketInfo->GetLastWarnedError(&lastErr); + if (err == lastErr) + return NS_OK; + + socketInfo->SetLastWarnedError(err); +
This function is used to display an error to the user only. I want to make sure that the same error is not reported repeatedly to the user. I saw that behaviour while I worked on this patch. I'm caching the last reported error for a socket, and will not report it again, unless a different error shows up. I will add a comment as suggested.
I'm latering this bug (with respect to checking it on the 0.9.4 branch/release): 1) This is an edge condition. 2) This is the kind of things that could bring other regression. 3) there are no localization changes. Hence we can check it in the trunk when we have more time, and after we have verified that it's ok, we can go back to PDT and ask them to accept it for the 0.9.4 branch post the 0.9.4 release. I'm leaving the nsenterprise keyword, but removing nsenterprise+. Kai, put this on the back burner.
Target Milestone: 2.1 → Future
Changing summary as discussed above.
Summary: Page does not open on first attempt. TLS step up problem? → Page does not open on first attempt. star TLS problem?
More specific summary line.
Summary: Page does not open on first attempt. star TLS problem? → first attempt fails with proxy and TLS intolerant server
Changing my prefered e-mail address.
Assignee: kai.engert → kaie
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
marking nsenterprise-; will be reevaluated for nsenterprise in future release.
Keywords: nsenterprise-
QA Contact: ckritzer → junruh
I think the situation has become worse, and I want to suggest it's now time to look at this bug again. When I had that problem in the past, I only saw this problem with a rather exotic proxy daemon. However, retesting it today, I find that I have problems with three different proxies, on two different platforms. I tried with 3 different proxies (for https): - lab212sun.mcom.com:8080 - a squid running on my own machine - that special java proxy (muffin.doit.org) To reproduce, make sure you haven't access the URL in that browser session before. It's best to use a fresh started browser. Go to https://www.delphion.com You never get a connect. Even worse, disabling the proxy does not help in that browser session. If you restart the browser, disable the proxy, and try to connect again, it works. Seen on both Linux and Windows 2000.
This patch is just for documenting the code that I tried out today (created from the old patch). It does not to fix the problem.
Attachment #47291 - Attachment is obsolete: true
Attachment #48091 - Attachment is obsolete: true
TLS intolerant sites access via proxy is an edge case. It's also an issue that's going to disappear as more and more sites turn away from TLS intolerant web servers. If a solution is at hand, fine, otherwise there are bugs with more impact that we should fix first.
"Bugscape" bug 12934 should be read when this is being worked on. Ideally, the changes suggested there should be implemented.
Depends on: 149868
This bug complains about problems with the same site as mentioned in bug 87902. The patch in bug 149868 seems to make connecting to site 87902 work. I'm resolving this bug as a duplicate of 87902. *** This bug has been marked as a duplicate of 87902 ***
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
Verified dupe.
Status: RESOLVED → VERIFIED
Product: PSM → Core
Version: psm2.1 → 1.0 Branch
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: