Created attachment 362744 [details] SSL debug log I'm seeing cases where as an SSL client PR_Read returns 0 when there is still data to be read. The context I'm seeing it in is in mod_nss when used with mod_proxy within Apache. I've configured a reverse proxy to a secure site. What started the investigation was a report that the response body wasn't being returned, only the headers. In mod_nss if PR_Read returns 0 mod_nss sets EOF in the request and mod_proxy considers the response body to be complete and ends the request. In order to gather data I've modified mod_nss so it doesn't consider a return of 0 to mean EOF. This allows the connection to continue. I'm reproducing this on RHEL 5.3 i386. I've rebuilt the rpm for NSS and unset BUILD_OPT=1 and added -DTRACE to XCFLAGS. I set trace and debug levels to 99 and attached the result. You'll notice two occurrences of "ssl_recv EOF". I see similar things in the Apache logs. I added a debug statement to print out the return value of PR_Read: [Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 238 [Tue Feb 17 11:04:16 2009] [debug] mod_proxy_http.c(1726): proxy: start body send [Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 6 [Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 0 [Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 8000 [Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 0 [Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 2128 [Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 2 [Tue Feb 17 11:04:16 2009] [debug] nss_engine_io.c(361): PR_Read: 5 [Tue Feb 17 11:04:16 2009] [debug] mod_proxy_http.c(1815): proxy: end body send What these values mean: 238 - the HTTP headers 6 - the HTTP chunk size 0 - ??? 8000 - the first chunk 0 - ??? 2128 - the last chunk You'll see identical size values in the NSS debug log. The request sequence looks like: HTTP request -> Apache -> mod_proxy -> HTTPS request -> remote server I did this test a number of times. Sometimes there is one EOF, sometimes more. The number of EOF in the Apache logs and in the NSS debug log is always the same.
Rob, SSL is reporting EOF (0 bytes read) because the lower layer (the layer below the SSL layer) is reporting zero bytes read up to it. So, the question you really need to answer is: why is the layer below SSL reporting premature EOF? I suspect that in the mod_proxy environment you have here, the layer below libSSL is some layer that is trying to emulate the behavior of NSPR. (It might be mozilla's NSPR, or some modification thereof, or a complete reimplementation.) That layer must provide NSPR socket semantics exactly. libSSL knows NSPR socket semantics and depends on their being exactly right. Here are some "hunches" I have about what might be causing the problem. 1. Perhaps the code here has configured the NSPR socket to be operating in non-blocking mode, and the layer below SSL is returning 0 for EWOULDBLOCK. That would be an error. The proper thing for the lower to do for in that case is to set the NSPR error code PR_WOULD_BLOCK_ERROR, and return -1 (PR_FAILURE). 2. Perhaps the NSPR socket is set to behave in blocking mode, but the code in the lower layer is attempting to emulate a blocking socket while really using a non-blocking OS socket undeneath. Mozilla's NSPR does this very thing. In that case, since the NSPR socket is configured in blocking mode, it is essential that it must never return PR_WOULD_BLOCK_ERROR, and must not return a zero length to a read unless/until it has hit EOF on the underlying socket. If there's any NSS libSSL bug demonstrated here, it is that libSSL forwards the EOF up to its caller, but does not remember that it has done so, so, another read call following the EOF may get data. That is an error. It should remember that it has sent an EOF indication upstream, and never return anything else after that. That is a real bug, and I'm willing to file a bug on it, but libSSL's behavior of passing the EOF that it gets from the layer below up to the layer above is not incorrect. So, I'm inclined to resolve this bug invalid. I'm willing to help you debug it, but this mozilla bug report is not the right place to do that, IMO.
Third time is the charm it seems. Bob and Wan-Teh told me the same thing and I looked and looked and couldn't find anything. I finally found a place where 0 is returned when IO would block. Adding a PR_SetError(PR_WOULD_BLOCK_ERROR,0) and return -1 seems to fix all cases. Still, as you point out, reading past an EOF seems a bit unexpected. I'm not sure if that would have helped in this case though. Anyway, you can close this as invalid if you'd like.
I'm glad you found the problem. BTW, the log file you attached was the perfect diagnostic. At present, libSSL's ability to provide correct NSPR semantics to the IO layer above it (usually the application) depends on the layer below libSSL also providing correct NSPR semantics. I'm not sure it's reasonable to expect libSSL to try to be more perfect than the layer below it, and in general I think it's not possible to provide a perfect implementation on top of an imperfect lower layer as the foundation. In this case, if libSSL had forced all subsequent reads to return EOF, it might have made the real problem more difficult to diagnose. So, I'm not convinced it's necessarily the right thing to change libSSL to do that. But I'm open to being convinced. :)
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.