PR_Read returns 0 when there is still data to be read



10 years ago
10 years ago


(Reporter: rcritten, Unassigned)



Firefox Tracking Flags

(Not tracked)



(1 attachment)

91.59 KB, application/x-gzip


10 years ago
Created attachment 362744 [details]
SSL debug log

I'm seeing cases where as an SSL client PR_Read returns 0 when there is still data to be read.

The context I'm seeing it in is in mod_nss when used with mod_proxy within Apache. I've configured a reverse proxy to a secure site. What started the investigation was a report that the response body wasn't being returned, only the headers.

In mod_nss if PR_Read returns 0 mod_nss sets EOF in the request and mod_proxy  considers the response body to be complete and ends the request. In order to gather data I've modified mod_nss so it doesn't consider a return of 0 to mean EOF. This allows the connection to continue.

I'm reproducing this on RHEL 5.3 i386. I've rebuilt the rpm for NSS and unset BUILD_OPT=1 and added -DTRACE to XCFLAGS. I set trace and debug levels to 99 and attached the result. 

You'll notice two occurrences of "ssl_recv EOF". I see similar things in the Apache logs. I added a debug statement to print out the return value of PR_Read:

[Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 238
[Tue Feb 17 11:04:16 2009] [debug] mod_proxy_http.c(1726): proxy: start body send
[Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 6
[Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 0
[Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 8000
[Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 0
[Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 2128
[Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 2
[Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 5
[Tue Feb 17 11:04:16 2009] [debug] mod_proxy_http.c(1815): proxy: end body send 

What these values mean:

238 - the HTTP headers
6 - the HTTP chunk size
0 - ???
8000 - the first chunk
0 - ???
2128 - the last chunk 

You'll see identical size values in the NSS debug log.

The request sequence looks like:

HTTP request -> Apache -> mod_proxy -> HTTPS request -> remote server

I did this test a number of times. Sometimes there is one EOF, sometimes more. The number of EOF in the Apache logs and in the NSS debug log is always the same.
Rob,  SSL is reporting EOF (0 bytes read) because the lower layer 
(the layer below the SSL layer) is reporting zero bytes read up to it.
So, the question you really need to answer is: 
  why is the layer below SSL reporting premature EOF?

I suspect that in the mod_proxy environment you have here, the layer below 
libSSL is some layer that is trying to emulate the behavior of NSPR.  
(It might be mozilla's NSPR, or some modification thereof, or a complete
reimplementation.)  That layer must provide NSPR socket semantics exactly.  
libSSL knows NSPR socket semantics and depends on their being exactly right.

Here are some "hunches" I have about what might be causing the problem.

1. Perhaps the code here has configured the NSPR socket to be operating in
non-blocking mode, and the layer below SSL is returning 0 for EWOULDBLOCK.
That would be an error.  The proper thing for the lower to do for in that 
case is to set the NSPR error code PR_WOULD_BLOCK_ERROR, and return -1 

2. Perhaps the NSPR socket is set to behave in blocking mode, but the code
in the lower layer is attempting to emulate a blocking socket while really
using a non-blocking OS socket undeneath.  Mozilla's NSPR does this very 
thing.  In that case, since the NSPR socket is configured in blocking mode,
it is essential that it must never return PR_WOULD_BLOCK_ERROR, and must 
not return a zero length to a read unless/until it has hit EOF on the 
underlying socket.

If there's any NSS libSSL bug demonstrated here, it is that libSSL forwards
the EOF up to its caller, but does not remember that it has done so, so, 
another read call following the EOF may get data.  That is an error.  It
should remember that it has sent an EOF indication upstream, and never 
return anything else after that.  That is a real bug, and I'm willing to 
file a bug on it, but libSSL's behavior of passing the EOF that it gets 
from the layer below up to the layer above is not incorrect.  So, I'm 
inclined to resolve this bug invalid.  I'm willing to help you debug it,
but this mozilla bug report is not the right place to do that, IMO.

Comment 2

10 years ago
Third time is the charm it seems. Bob and Wan-Teh told me the same thing and I looked and looked and couldn't find anything. I finally found a place where 0 is returned when IO would block. Adding a PR_SetError(PR_WOULD_BLOCK_ERROR,0) and return -1 seems to fix all cases.

Still, as you point out, reading past an EOF seems a bit unexpected. I'm not sure if that would have helped in this case though. Anyway, you can close this as invalid if you'd like.
I'm glad you found the problem.
BTW, the log file you attached was the perfect diagnostic.  

At present, libSSL's ability to provide correct NSPR semantics to the IO 
layer above it (usually the application) depends on the layer below libSSL
also providing correct NSPR semantics.  I'm not sure it's reasonable to 
expect libSSL to try to be more perfect than the layer below it, and in 
general I think it's not possible to provide a perfect implementation on 
top of an imperfect lower layer as the foundation.  In this case, if 
libSSL had forced all subsequent reads to return EOF, it might have made
the real problem more difficult to diagnose.  So, I'm not convinced it's
necessarily the right thing to change libSSL to do that.  But I'm open to 
being convinced.  :)
Last Resolved: 10 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.