Bugzilla

Assignee

Comment 3

•

22 years ago

Let me try to reproduce.
I will test whether this is a regression caused by my fix to bug 87902.
A previous attempt to fix 87902 resulted in a similar regression.

Assignee

Comment 4

•

22 years ago

Confirming all described behaviour on Linux.

I tested to back out 87902. This did not help, still showing same symptoms. So
at least, this is not caused by 87902.

I'll try to have a look what's happening.

I wonder if the suspicion mentioned in bug 153769 is correct and is related to
this problem.

Assignee

Comment 5

•

22 years ago

Error code is 804b0002 for all URL load attempts after the initial failure.

=> NS_BINDING_ABORTED

Assignee

Updated

•

22 years ago

Keywords: nsbeta1

Andrew Schultz

Comment 6

•

22 years ago

*** Bug 167421 has been marked as a duplicate of this bug. ***

Assignee

Comment 7

•

22 years ago

Attached file SSL Endlessloop stack — Details

Assignee

Comment 8

•

22 years ago

Nelson, this looks like an endlessloop in SSL, always reproducable.
Could you please have a look? Please see the attached stack. Thanks.

Target Milestone: --- → mozilla1.2beta

Comment 9

•

22 years ago

It appears that the socket on which SSL is operating is a blocking socket.
ssl was invoked via the function ssl_Write, which always sets an infinite
timeout, so ssl is behaving as intended here.

I think the problem is that this socket is blocking.

John Verel

Comment 10

•

22 years ago

Re: Nelson's comment that SSL may be behaving as intended, I note that Netscape
does not behave this way.  Stopping Netscape on an hung SSL connection does not
leave it hung up in an infinite loop.

Comment 11

•

22 years ago

interesting, so mailnews does blocking socket i/o on a background thread ... do
we need to avoid that?  can NSS be configured to have a non-infinite timeout for
blocking writes?

David :Bienvenu

Comment 12

•

22 years ago

the imap code writes data to a channel opened as follows:

            rv = m_channel->OpenOutputStream(0, PRUint32(-1), 0,
getter_AddRefs(m_outputStream));

does that mean that we do blocking writes when we call Write() on the channel?

Assignee

Comment 13

•

22 years ago

The issue is that PSM is currently doing a blocking initial write for any SSL
connection, and that is indeed the culprit.

I discussed with Nelson, and we concluded this is a bad thing (TM).

The solution is to change PSM to never to a blocking call.

Comment 14

•

22 years ago

oh right... my mistake.  even though necko hands out a blocking nsIOutputStream
from OpenOutputStream, the actual socket is still configured for non-blocking
mode, and we call PR_Poll to make nsIOutputStream::Write blocking.

-> PSM

Assignee: darin → ssaux

Component: Networking: HTTP → Client Library

Product: Browser → PSM

QA Contact: tever → junruh

Target Milestone: mozilla1.2beta → ---

Version: other → unspecified

Comment 15

•

22 years ago

There is no infinite loop here.  
An operation with an infinite timeout has been done on a blocking socket.
The progress of that thread is blocked until the operation completes.

Marko Macek

Comment 16

•

22 years ago

*** Bug 170087 has been marked as a duplicate of this bug. ***

Matthias Versen [:Matti]

Comment 17

•

22 years ago

*** Bug 170428 has been marked as a duplicate of this bug. ***

Stephane Saux

Comment 18

•

22 years ago

So kai how hard is this to do?

Assignee: ssaux → kaie

Priority: -- → P2

Target Milestone: --- → 2.4

Assignee

Comment 19

•

22 years ago

Stephane: I'm not sure how much work it is. It might be easy to do, by adding a
state variable (state = still waiting for connect), and make PSM I/O functions
pay attention to that flag in its read/write functions, and continue with the
connect logic as soon as Necko calls us again, when the socket becomes
readable/writeable.

Assignee

Comment 20

•

22 years ago

I suspected this bug could be the cause for bug 167764.
But they seem to be unrelated.

But we now have a patch for this one, and it seems to work on Linux and Mac OS X
(other platforms not yet tested).

Assignee

Comment 21

•

22 years ago

Attached patch Patch v1 (obsolete) — Details — Splinter Review

Assignee

Comment 22

•

22 years ago

Attached patch Patch v1 - ignoring whitespace, larger context - please review this one (obsolete) — Details — Splinter Review

Assignee

Comment 23

•

22 years ago

Javi can you please review? The changes are:

- no longer use blocking mode on initial write

- at the time we attempt the first write, if we see PR_WOULD_BLOCK_ERROR and no
bytes were written, do nothing (the socket transport will eventually let us try
again)

- execute the old fashion logic, if first write can not write bytes, and we see
a  result code different from PR_WOULD_BLOCK_ERROR

Assignee

Updated

•

22 years ago

Attachment #102127 - Attachment is obsolete: true

Assignee

Comment 24

•

22 years ago

Comment on attachment 102128 [details] [diff] [review]
Patch v1 - ignoring whitespace, larger context - please review this one


I take my review request back.

More testing showed the issue is more complicated. This patch breaks the TLS
intolerant detection (sigh).

Attachment #102128 - Attachment is obsolete: true

Assignee

Comment 25

•

22 years ago

We have the following new problem:

The SSL error handling within PSM is decoupled from the error handling within
Mozilla's general network component (socket transport).

By always using non blocking I/O in PSM, we loose the ability to detect sudden
disconnects on initial write within PSM. The error will be seen by the socket
transport when doing the next select. In my particular test case, PR_POLL_ERR
was seen, and socket transport closed the socket.


I see a workaround for now, but for an ideal solution I think I need help from
both necko and NSS/SSL.


PSM can remember, whether it already tried to trigger the initial write, but NSS
returned PR_WOULD_BLOCK_ERROR, and therefore the reaction from the remote site
is not yet known.
When the socket transport closes the socket and calls PSM's I/O layer close
function, we can detect the remembered state, and make the decision for TLS
intolerance at that time.

But the problem is: At this time, we do not have the failure reason code
available (only inside necko, but not passed to us). We do not know whether a
network problem occurred, and whether TLS can still be used, or whether the
remote site did disconnect and we therefore should retry without TLS.


We have the choice to always decide the site is TLS intolerant, but this might
result in wrong TLS intolerance decisions, which means we don't use TLS in some
cases, although we should.


For now, I don't have a better idea how I could solve this problem on my own,
without changing socket transport or NSS. I suggest I'll attach a patch to
always retry the connection on failure, after having initiated the first write,
and before the first write completed.


However, I think we should try to minimize the TLS intolerance decisions.
I see two approaches for future work:
- make socket transport pass socket close reason information down to lower NSPR
fd layers
and/or
- transfer the decision, whether a site should be treated as TLS intolerant, to
a new NSS function (possibly passing available socket error codes). Maybe NSS
could make a resaonable decision, based on state of the initial handshake?

Assignee

Updated

•

22 years ago

Blocks: 167764

Assignee

Comment 26

•

22 years ago

Attached patch Patch v2 (obsolete) — Details — Splinter Review

Assignee

Comment 27

•

22 years ago

Attached patch Patch v2 - diff -u10 -w for easier reviewing (obsolete) — Details — Splinter Review

Assignee

Comment 28

•

22 years ago

Javi, can you please review?

Status: NEW → ASSIGNED

Keywords: nsbeta1 → nsbeta1+

Assignee

Updated

•

22 years ago

Attachment #102139 - Attachment is obsolete: true

Assignee

Comment 29

•

22 years ago

Comment on attachment 102140 [details] [diff] [review]
Patch v2  - diff -u10 -w for easier reviewing


I found more arguments why this patch needs more work.

The patch from bug 162752 (not yet checked in) is not compatible with this new
logic. In general, SSL errors requiring an automatic retry because of TLS
intolerance can now happen at multiple locations.

Here is how I understand it:

As long as the underlying SSL handshake is not yet done, NSS will return
WOULD_BLOCK, even though it were able to transfer bytes as part of the
handshake. It can take multiple calls to read/write before the handshake
completes and finally application data can be transfered.

Because necko will see that neither reading nor writing works because of
WOULD_BLOCK, it calls both functions alternating.

In either of both functions could the handshake complete and NSS return the SSL
error to us.

Attachment #102140 - Attachment is obsolete: true

Assignee

Comment 30

•

22 years ago

Darin, can you confirm the following is expected, and that we can rely on that
behaviour

When necko/http calls a lower layer write function, it will retry, if the write
function returns -1 with PR_CONNECT_RESET_ERROR.

When necko/http calls a lower layer read function, this same error will not
cause a retry. But I saw that returning 0 causes it to retry.


Because of the additional SSL logic going on behind the scenes, while necko
waits for the real application data to get transfered, and because we eventually
decide that an automatic retry makes sense, PSM wants to trigger that retry,
triggering using return values and error codes of read/write.

Assignee

Comment 31

•

22 years ago

Attached patch Patch v3 (obsolete) — Details — Splinter Review

This latest patch seem to work quite well for this bug, it also fixes bug
162752, and bug 167764 seems fixed, too (although wtc suspects the problem in
bug 167764 is only being masked with this fix).

Reviews and comments for my previous questions welcome.

Assignee

Updated

•

22 years ago

Blocks: 162752

Comment 32

•

22 years ago

kai:

actually, it is the HTTP layer that detects a premature EOF.  it has configured
the socket transport to poll on read and write.  when poll wakes up (either for
read or write), the socket transport checks for PR_POLL_EXCEPT and PR_POLL_ERR.
 if either of these is true, then it cancels the socket transport read and write
requests with a status of NS_ERROR_NET_RESET.  that error code will be trapped
by the HTTP layer, and if the HTTP layer has not received any bytes for that
transaction, it will retry the transaction.

the HTTP layer will also retry the transaction if the first socket read returns
NS_OK and zero bytes read (indicating EOF).

finally, if the socket transport's calls to PR_Read or PR_Write fail with either
PR_CONNECT_ABORTED_ERROR or PR_CONNECT_RESET_ERROR, then socket transport read
and write requests will be canceled with a status of NS_ERROR_NET_RESET.  and
just like in the PR_POLL_EXCEPT/PR_POLL_ERR case, the HTTP layer will see
NS_ERROR_NET_RESET.  the HTTP layer will retry the transaction if it receives
this error before receiving any data on the socket.

Javier Delgadillo

Comment 33

•

22 years ago

Comment on attachment 102169 [details] [diff] [review]
Patch v3

r=javi

nit-pick:
This if statement has the potential to grow quite large become nearly
impossible to read as we addd more error codes that should be retried.	Perhaps
a method/function that takes the necessary parameters to return true/false to
determine if we wanna retry.
+      if (
+	   (!withInitialCleartext && PR_CONNECT_RESET_ERROR == err)
+	  ||
+	   (withInitialCleartext && PR_END_OF_FILE_ERROR == err)
+	  ||
+	   (SSL_ERROR_BAD_MAC_ALERT == err)
+	  ||
+	   (SSL_ERROR_BAD_MAC_READ == err)
+      ) {

Attachment #102169 - Flags: review+

Assignee

Comment 34

•

22 years ago

Darin, thanks for the explanation. I see another oddity:

In my test case, the failure happens reproducable at the time necko calls our
read function.

Regardless whether I return PR_CONNECT_RESET_ERROR or PR_CONNECT_ABORT_ERROR or
the simple EOF, necko will indeed automatically retry the connection.

However, there is difference.

If the read function returns EOF (0 bytes read), when necko re-opens the
connection, it will again call our read and write functions.

However, if I return PR_CONNECT_RESET/ABORT_ERROR, necko will repeated call our
open/close functions, but will never call our read/write funtions!

Is that a bug?
At least it makes it necessary to return EOF in read for this patch to work.

Assignee

Comment 35

•

22 years ago

Attached patch Patch v4 (obsolete) — Details — Splinter Review

This new patch moves the tls intolerance decision to a separate function, as
Javi suggested.

Comment 36

•

22 years ago

kai: yeah, that sounds like a necko bug then :(

Comment 37

•

22 years ago

In comment 29, Kai observes that 

> Because necko will see that neither reading nor writing works because of
> WOULD_BLOCK, it calls both functions alternating.

This certainly seems like a bug.  Perhaps it was a hack to work around some
other flaw.  an http client should send the entire http request, including
any post data, and then read the response.  Any chance this alternating
read/write behavior can be fixed?

Comment 38

•

22 years ago

can someone provide a socket transport log demonstrating the alternating calls
to PR_Read and PR_Write.  i'm surprised that we would do that.  by design we do
initate both a write and a read at the same time.  this is done to pick up an
EOF on the read request.  i'll be the first to admit that our architecture is
far from clean, but it should be valid to poll for both read and write at the
same time.

Assignee

Comment 39

•

22 years ago

Attached file Logfile for Darin — Details

> can someone provide a socket transport log demonstrating the alternating
calls
> to PR_Read and PR_Write.

I've done the following:
- I applied the patch from this bug
- I tried to connect to https://www.e-girot.net/ewh/visafaktura
- because the site is TLS intolerant, the SSL handshake will fail, and no http
content will be transfered on the first connection attempt
- in addition to the patch, I added some printf statements to the SSL I/O
layer, the printf is the first statement in the function. Look for lines
starting with => in the logfile

The interesting part is up to SSL layer close. You can see the read/write
functions are being called repeatedly.

The second connect phase is the actual loading phase, that happens with TLS
disabled, and is able to execute the http request.

Assignee

Updated

•

22 years ago

Blocks: 167227

Assignee

Comment 40

•

22 years ago

Attached patch Patch v5 (obsolete) — Details — Splinter Review

I'm making one more small change to the patch, that fixes also bug 167227. (I
notice the disconnect with TLS intolerance can also cause PR_END_OF_FILE_ERROR,
I had previously seen this only with proxies.)

Attachment #102312 - Attachment is obsolete: true

Assignee

Comment 41

•

22 years ago

Comment on attachment 102447 [details] [diff] [review]
Patch v5

carrying forward r=javi

Attachment #102447 - Flags: review+

Comment 42

•

22 years ago

As discussed in http://bugzilla.mozilla.org/show_bug.cgi?id=106865#c32 
(comments 32 and later) PR_END_OF_FILE_ERROR reports an condition that
may be a "truncation attack" on SSL/TLS.  It has a different meaning
than EOF (e.g. read returning zero bytes).  

If no data has been sent, and PR_END_OF_FILE_ERROR is received, then
that could be a sign of TLS intolerance.

If the http request has already been sent, and PR_END_OF_FILE_ERROR is 
received, the client should NOT automatically retry.  In fact, the 
client should probably not retry at all, but instead an error dialog
should be displayed.  

So, the proper interpretation and handling of PR_END_OF_FILE_ERROR 
depends on knowledge of whether or not any data has been succesfully 
read from or written to the socket.

Assignee

Comment 43

•

22 years ago

> If the http request has already been sent, and PR_END_OF_FILE_ERROR is 
> received, the client should NOT automatically retry.

Nelson, thanks for the reminder.

I can promise that we will never assume TLS intolerance, if we had already been
able to write content data.

The crypto code will know whether it has even been able to write any bytes for
this socket. Only if it has not yet been able to write any byte will it try to
detect TLS intolerance.


This new patch just relaxes a different constraint I had introduced earlier.
Without this new patch, when seeing PR_END_OF_FILE_ERROR, the code would only
assume TLS intolerance, if the communication is going over a proxy. I introduced
this restriction, because during my earlier testing with other TLS intolerant
sites, I only saw PR_END_OF_FILE_ERROR when talking over a proxy. When not
talking over a proxy, the error I saw was PR_CONNECT_RESET_ERROR.

But I learn from bug 167227, that experiencing PR_END_OF_FILE_ERROR can also
happen when not talking over a proxy.

Still, this new detection is only used when no content data bytes have yet been
transfered. As soon as the SSL level write function returns something other but
-1, we'll never again try to detect TLS intolerance for this open socket.


I assume you wanted to reconfirm this and the patch should be fine as is.

Assignee

Comment 44

•

22 years ago

Darin, could you please sr ?

Nelson, please veto if I didn't understand you correctly.

Assignee

Updated

•

22 years ago

Blocks: 166931

Assignee

Comment 45

•

22 years ago

Darin, one question to your comment 38. You said:
> by design we do initate both a write and a read at the same time.

Does this mean, you call read and write simultaneously on separate threads?
I guess you are not, but you meant, you are just polling for both, but will call
read and write one after the other?

Comment 46

•

22 years ago

Re: comment 43
This explanation sounds right to me.  Thanks.

Comment 47

•

22 years ago

Comment on attachment 102447 [details] [diff] [review]
Patch v5

>Index: mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp

>+void nsNSSSocketInfo::SetHandshakeInProgress(PRBool aIsIn)
>+{
>+  mHandshakeInProgress = aIsIn;
>+}
>+
>+PRBool nsNSSSocketInfo::GetHandshakeInProgress()
>+{
>+  return mHandshakeInProgress;
>+}

nit: these should probably be implemented inline in the header file instead.


>+// Call this function to report a site that is possibly TLS intolerant.
>+// This function will return true, if the given socket is currently using TLS.
>+static PRBool
>+rememberPossibleTLSProblemSite(PRFileDesc* fd, nsNSSSocketInfo *socketInfo)

nit: most methods in this class begin with an uppercase letter.  is there any
reason why this should differ?


>+{
>+  PRBool currentlyUsesTLS = PR_FALSE;
>+
>+  SSL_OptionGet(fd->lower, SSL_ENABLE_TLS, &currentlyUsesTLS);
>+  if (currentlyUsesTLS) {
>+    // Add this site to the list of TLS intolerant sites.
>+    char buf[1024];
>+    PRInt32 port;
>+    nsXPIDLCString host;
>+    socketInfo->GetPort(&port);
>+    socketInfo->GetHostName(getter_Copies(host));
>+    HASH_STRING_KEY(buf,1024,host.get(),port);

nit: use sizeof(buf) instead of hardcoding 1024.

i wonder if it wouldn't be better to use:

  nsCAutoString buf;
  buf = host + NS_LITERAL_CSTRING(":") + nsPrintfCString("%d", port);

but, i see that you are just copying existing code, so up to you
which way you want to go.

>+    // We don't really wanna associate a value.  If it's
>+    // in the table, that means it's TLS intolerant and
>+    // we don't really need to know anything else.
>+    gTLSIntolerantSites->Put(&key, nsnull);

you might want to use nsCStringHashSet for this.  up to you.


>+    case PR_CONNECT_RESET_ERROR:
>+      if (!withInitialCleartext)
>+        return PR_TRUE;
>+      else
>+        break;

nit: "else" keyword is meaningless here.


>+  if (-1 == bytesTransfered && handleHandshakeResultNow) {

nit: maybe (0 > bytesTransfered) just to be safe.


fix these nits and sr=darin

Attachment #102447 - Flags: superreview+

Comment 48

•

22 years ago

re: comment #45, we poll on one thread, and then either read or write on that 
thread.  sorry for not being clear about that!

Assignee

Comment 49

•

22 years ago

> we poll on one thread, and then either read or write on that thread.

ah good, thanks for confirming.

Assignee

Comment 50

•

22 years ago

> these should probably be implemented inline in the header file instead.

done


> most methods in this class begin with an uppercase letter.  is there any
> reason why this should differ?

Actually, that's a function outside of any class, but static to the source file.
There are a couple of such functions already, and they all start with a
lowercase letter. Not changed.


> i wonder if it wouldn't be better to use:
>  buf = host + NS_LITERAL_CSTRING(":") + nsPrintfCString("%d", port);

convinced. That's a nice class. changed.


> you might want to use nsCStringHashSet for this.  up to you.

This required more changes all over the file...
But changed.


> nit: "else" keyword is meaningless here.

Changed.


> nit: maybe (0 > bytesTransfered) just to be safe.

changed.

Assignee

Comment 51

•

22 years ago

Attached patch Patch v6 — Details — Splinter Review

Attachment #102447 - Attachment is obsolete: true

Assignee

Comment 52

•

22 years ago

Comment on attachment 102826 [details] [diff] [review]
Patch v6

carrying forward

Attachment #102826 - Flags: superreview+

Attachment #102826 - Flags: review+

Assignee

Updated

•

22 years ago

Summary: Failed SSL disables all networking → Use of blocking I/O for SSL in PSM stalls network activity

Assignee

Comment 53

•

22 years ago

The reporter of bug 153769 said this patch also fixes bug 153769.

Blocks: 153769

Asa Dotzler [:asa]

Comment 54

•

22 years ago

Comment on attachment 102826 [details] [diff] [review]
Patch v6

a=asa for checkin to 1.2beta (on behalf of drivers)

Attachment #102826 - Flags: approval+

Assignee

Comment 55

•

22 years ago

Although we have approval to check in for 1.2 beta, I will wait until tomorrow
with checking it in.

I have uploaded test builds for Linux, Win32 and Mac Carbon to:
  http://www.kuix.de/mozilla/163605/

I think it is sufficient to test the various SSL protocols, like SMTP/SSL,
IMAP/SSL, HTTPS.

If you want to help, please check that no SSL functionality is regressing by
this patch.

(All builds are based on the trunk from 2002/10/15, so if you find other problems
in this build, please also check whether the problem is found in the builds at
ftp://ftp.mozilla.org/pub/mozilla/nightly/2002-10-15-08-trunk/ .)

Andreas Lange

Reporter

Comment 56

•

22 years ago

Since I reported this bug, I felt obliged to do some testing. 

I tried the linux build with IMAP/SSL without problems. I ran https against
different servers with Apache, Stronghold, Netscape-Enterprise, Roxen, iPlanet
or IIS5 without any glitches, so it seems fine!

Assignee

Updated

•

22 years ago

Blocks: 172791

John Unruh

Comment 57

•

22 years ago

Using kaie's test builds, I have found no regressions, and numerous bug fixes. I 
suggest the patch be checked into the trunk ASAP.

Assignee

Comment 58

•

22 years ago

fixed on trunk

Status: ASSIGNED → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

John Unruh

Comment 59

•

22 years ago

Verified.

Status: RESOLVED → VERIFIED

Matthias Versen [:Matti]

Comment 60

•

22 years ago

*** Bug 180221 has been marked as a duplicate of this bug. ***

paulo ramos

Comment 61

•

22 years ago

*** Bug 172241 has been marked as a duplicate of this bug. ***