Closed Bug 126944 Opened 23 years ago Closed 22 years ago

No error or pop-up when connecting to bad HTTPS server

Categories

(Core Graveyard :: Security: UI, defect, P1)

1.0 Branch
defect

Tracking

(Not tracked)

VERIFIED FIXED
psm2.3

People

(Reporter: julien.pierre, Assigned: KaiE)

References

Details

(Whiteboard: [adt1 RTM])

Attachments

(1 file)

In some situations, an SSL HTTP server can be in a "bad" state and the SSL
handshake will fail. See bug 125149 for more info about the specific situation
we were running into with the server.

When the SSL server was in that bad state, I observed that Netscape 6.2.1 and
Mozilla were just displaying a blank page when connecting to that bad server,
without displaying any kind of error or pop-up. Communicator was displaying a
pop-up error message about being unable to connect.

A command-line HTTPS test client revealed that on the client side, an NSPR "end
of file" error was being returned. Apparently Mozilla/Netscape6 ignore that
error and just don't display anything. This is wrong.

<speculation>
I suspect this is a bug in the HTTP engine implementation in Mozilla. The HTTP
spec says that the servers are allowed to close the connection between requests
at any time, so an end-of-file may not be viewed by the client as an error. I
think it is fine to treat it that way for keep-alive/pipeline connections, and
reopen a new connection. Still, the browser would have to reopen a new
connection with the server, and eventually display an error if it can't get a
response and keeps getting end-of-file.

I would also argue that for the first request/response after a connection,
getting end-of-file immediately should be treated as an error.
</speculation>
Priority: -- → P2
Target Milestone: --- → mozilla0.9.9
FYI, I didn't think of trying any other SSL URI protocol handler and connecting
to that bad server. It would take some effort (small patch to break something in
NSS and run that with selfserv) to recreate the bad server environment to test
this, but it can be done.
I suspect that this bug is a duplicate of bug 106865.
If you concur, please mark this bug as a duplicate.
Depends on: 106865
Nelson, yes, the symptoms look very similar. I'm not truly sure if this is the
same problem because the error that ends up being passed down to the top-level
here is not a security error but a plain NSPR error. It's possible that they are
all being treated the same way in the HTTP parser in Mozilla, though, at the
stage where no bytes have been received from the server. I'd like someone from
the client to examine my comments above about the HTTP code. I don't have to
look at the code myself. For now I want to keep this bug open.
The word "time" was missing from my last sentence, as in, "I don't have time to 
look at the [browser HTTP networking] code myself".
Err ... Is anybody looking at this bug in mozilla ?
This bug appears to be assigned to an inappropriate assignee.  
It is assigned to a mailing list.  Mailing lists do not fix bugs.

With help from Rangansen, I looked at function nsHandleSSLError() in 
http://lxr.mozilla.org/security/source/security/manager/ssl/src/nsNSSIOLayer.cpp
#431

This is the one function that displays error messages for all errors related to 
the use of SSL.  This function is able to display messages for errors generated 
by libSSL (e.g. errors named SSL_*) and by libNSS (named SEC_*), and any errors 
not recognized are displayed with a generic error message.

The problem is not in that function, but rather in the 3 calls to that function, 
one in nsSSLIOLayerRead and two in nsSSLIOLayerWrite, see lines 966, 1048 and 
1053.  Each of those 3 calls is preceeded by a test which is 
     if (IS_SSL_ERROR(err))

That test is wrong.  It is only true for the SSL_* errors.  It is not true
for any of the SEC_* errors, nor for any PR_* errors.  So, any SEC_* errors 
and PR_* errors will lead to the infamous blank page, UNLESS they are errors
that necko itself knows how to handle.

I submit that the proper test here is not (IS_SSL_ERROR(), but rather is
to test for the error codes that necko knows how to handle.  Only errors
that necko handles (other than with a blank page) should bypass the call
to nsHandleSSLError().  All others should go through nsHandleSSLError().

While I was reviewing nsSSLIOLayerWrite(), I noticed another issue, at 
http://lxr.mozilla.org/security/source/security/manager/ssl/src/nsNSSIOLayer.cpp
#1027
If the first SSL connection fails for ANY reason, and TLS was enabled, 
the code simply turns off TLS and gets necko to try again without TLS.
This is intended to recover from "TLS intolerant" servers.  But it retries
even on errors that are not due to a TLS intolerant server.  So, even when
the error is one that we KNOW will fail with both TLS and with SSL3, we
retry it with SSL3 anyway.  This is just wasteful, and doubles the number of 
error hits on most https servers.  For example, an error such as an invalid 
server cert, which is _never_ an indication of TLS intolerance, and which 
will always recur if we retry without TLS, gets tried twice anway.  

This really needs to get fixed.  The test at line 1030 that reads
    if (tlsOn) {
needs to be 
    if (tlsOn && errIndicatesTLSIntolerance(err)) {

and a new function errIndicatesTLSIntolerance(err) needs to be written to 
distinguish among the various cases.  It is probably safest for this 
function to simply detect the errors that are known to NOT be indicators
of TLS intolerance, and return false for them and true for all others.
Changing product to PSM.  The bug is in 
mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp
Component: Networking → Client Library
Product: Browser → PSM
Target Milestone: mozilla0.9.9 → ---
Version: other → 2.3
Reassign bug to owner and QA contact of selected component
Assignee: new-network-bugs → ssaux
QA Contact: benc → junruh
re: mailing list.

There is supposed to be an engineer on this mailing list. I'll work on that some
more. 

I think the real problem here was that the bug was filed in the wrong component,
and lacked a reproducible test case. We get a lot of bugs in this component that
eventually leave, so we do our best with the traffic.
Just checking Build 2002041903 on Win98... The problem continues.

Try to login to http://www.sourceforge.net/ using SSL. I used to use Mozilla,
but since Summer i've had to downgrade to MS Internet Explorer just to check my
projects ans post patches and bugs in Sourceforge :-(.
This problem seems related to "proxy" usage and SSL2+SSL3+TLS support. If you
have a proxy, TLS support and the remote server doesn´t support TLS but SSL3,
you see the problem. I haven't bug IDs at hand, but the problem seems reported
also in this context. Search the bugzilla database.

Since I can only navigate thru a proxy (squid), this is a showstop for me and a
big NO-NO for a inminent 1.0 release.
to Kai.
This is something that everybody is going to see at some point.
Since Nelson was kind enough to diagnose the problem in comment #6 I think we
should spend the time and add the necessary code.
Assignee: ssaux → kaie
Keywords: nsbeta1+
Priority: P2 → P1
Whiteboard: [adt1 RTM]
Target Milestone: --- → 2.3
Jesus,

I don't see the problem connecting to sourceforge. I'm not using a proxy though. 
The initial test case that I had for this bug did not involve a proxy, so I 
believe your problem is a different one.
Blocks: 143047
I have come to the conclusion that support for SSL over proxies is rather broken
in Mozilla. Nelson gave me a lot of information in bugscape bug 12934. I need to
copy that information over into a new bugzilla bug and produce a fix.

Depends on: 149868
Ok, I think I was somewhat distracted by comments from Jesus.
Jesus, you are complaining about problems when connecting to sites using a proxy.
I disagree. This bug is about something else.
Please have a look at bug 149868 instead, and the bugs it is blocking.
No longer depends on: 149868
Stephane, while Nelson did indeed suggest how to approach a fix for this bug, I
don't see an easy solution in hand, and wonder whether this bug really deserves
the current priority?


Nelson made two suggestions in his comment, and I think we should separate them.

1) This bug deals with missing error messages.

I don't know what the exact list of error codes would be that we should handle,
and this list seems difficult to assemble.
In addition I wonder whether we really should try to implement a negative list
of error codes. Doing so steals away error messages that necko might want to
handle in the future.
IMO, because our code only deals with security logic, I think we should limit
our code to handling explicitly those error message that are a result of
security / NSS problems.

I assume that necko never handles any NSS errors. Searching necko for SEC_*
constants does not produce any hits.

So I agree that PSM should handle all errors that are NSS specific. Is there a
test one can use to detect NSS error codes?


2.) Another suggestion was made to optimize communication with TLS intolerant
servers. I think this should be handled in a separate bug. I have filed bug 149910.
Kai, Nelson.  From Nelson's comment #6 and Kai's comment #16.  I'll try to
summarize.

Problem about error handling.

Nelson says: we should handle all errors that necko doesn't.
Kai says: we shouldn't steal errors from necko.
Stephane says: clearly, we should handle the following:
  - SSL errors (we do with IS_SSL_ERROR())
  - SEC errors (we don't.  We SHOULD, and nss/lib/util/secerr.h defines
IS_SEC_ERROR(), AND nsHandleSSLError (a bit of a misnomer) handles the most
common SEC error with already existing pipnss.properties strings, and has a
catch all)
  - Are there an PR_ errors that are specific to the SSL handshake at hand? Can
we determine that? Then clearly we should handle these PR_ errors as (from the
customer point of view) an SSL error.
  The test would then becomes:
if ( IS_SSL_ERROR(err) || IS_SEC_ERROR(err) || IS_PR_SSL_RELATED_ERROR(err) )
  Maybe designing and adding the third test is hard, but adding the second is not.

Kai: Do you think you can you add the second test? (I think it's safe to do so).
Nelson: can you comment on whether we can code a IS_PR_SSL_RELATED_ERROR().  I
understand that we may have to look at more than the error. If this is too hard,
we can later that part of the fix.

Regarding TLS, That's fine to file a separate bug.
Stephane asked:
> Are there an PR_ errors that are specific to the SSL handshake at hand? Can
> we determine that? Then clearly we should handle these PR_ errors as (from the
> customer point of view) an SSL error.

The SSL functions would be in a better position than PSM is to determine
that because they have more context about the errors.  But whether the
error is an SSL_, SEC_, or PR_ error, it seems that PSM should handle the
error if Necko doesn't know how to handle it.

Does Necko handle PR_ errors?

I agree that we (PSM) should definitely handle the SEC errors.
Adding Darin to the CC list.  Darin, can you help with the necko questions in
this bug?

I also wonder at this being an ADT1 bug since it requires a server to be in a
bad state and the behavior we exhibit is not extreme badness.  I would put this
at ADT3 unless there's evidence that this happens frequently.  (I'm assuming the
issues not related to https connection failures actually move out to other bugs.)
if necko reads EOF after writing a HTTP request to a socket, it will retry the
request on a new socket connection.  it'll do this up to 10 times per request if
need be.  if on the 10-th try, necko still gets an immediate EOF, it'll fail the
request with a NS_ERROR_NET_RESET error which will cause docshell to post the
infamous "The document contains no data" error message to the user.

> Does Necko handle PR_ errors?

necko looks at some PR_ errors... which errors are you interested in?
Darin,

My question was really "does necko handle all PR_ errors?",
so you already answered my question.

Where can I find a list of the PR_ errors that necko looks
at?
Darin,

There is no "document contains no data" error displayed in this case, just a 
blank page. If there are SSL errors on the first try (not NSPR EOS), we should 
not try again. It seems there is an NSS bug somewhere that causes EOS to be sent 
even for other error cases, so that causes the browser to mishandle it.
Still, it should display an error of some kind instead of a blank page.
julien: the "document contains no data" error message was added after netscape
6.x... can you reproduce the blank page using a recent trunk build or mozilla 1.0?

wtc: look in nsSocketTransport.cpp ... specifically, look for the places where
PR_Read is called.  we translate PR_ errors into nsresult's ... most just get
converted to NS_ERROR_FAILURE or something like that though.
answering Stephane's comment 17:

Adding IS_SEC_ERROR seems very reasonable to me. Considered the huge list of
SEC errors in nsHandleSSLError, that will now get handled by PSM (with the
patch applied), I'd guess this should fix many if not most cases.

I think it is worth trying to land this patch first.

If you decide that NSS should handle additional PR_* errors, I suggest we file
a separate NSS bug on that - unless you come to the conclusion that PSM itself
needs to handle PR_* errors.
Stephane, can you please review?
Darin, can you please sr?
Comment on attachment 87196 [details] [diff] [review]
Patch to handle SEC_* error codes

sr=darin (seems like the right thing to do)
Attachment #87196 - Flags: superreview+
Comment on attachment 87196 [details] [diff] [review]
Patch to handle SEC_* error codes

r=ssaux
Attachment #87196 - Flags: review+
Patch checked in to trunk.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
adt1.0.1+ (on ADT's behalf) approval for checkin to the 1.0 branch. pls check
this in asap, then add the "fixed1.0.1" keyword.
Keywords: adt1.0.1adt1.0.1+
Attachment #87196 - Flags: approval+
please checkin to the 1.0.1 branch. once there, remove the "mozilla1.0.1+"
keyword and add the "fixed1.0.1" keyword.
Checked in to 1_0 branch.
Verified per kaie's comment.
Status: RESOLVED → VERIFIED
Product: PSM → Core
Version: psm2.3 → 1.0 Branch
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: