Closed Bug 29044 Opened 24 years ago Closed 24 years ago

Mac SSL: pictures not loading properly over https

Categories

(Core :: Networking, defect, P3)

PowerPC
All
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: mw, Assigned: rpotts)

References

Details

(Whiteboard: [PDT+] 2/28/00 (patch works on Mac))

Attachments

(1 file)

Symptoms: When loading a page via https, pictures at first seem to load, then
after some time elapses, the pictures disappear to be replaced by their ALT
text.

What I know thus far: 

- The data is passed via PSM to the browser using a local socket connection. 

- Cartman properly closes the socket from its end using
PR_Close/OTSndOrderlyDisconnect. However, the browser does not respond properly
to the orderly disconnect. 

- This is somewhat perplexing, because when NSPR sockets are created on the Mac,
they are automatically attached to a notifier routine (called, appropriately
enough, NotifierRoutine) which responds to T_ORDREL (the event that happens when
the peer closes the socket normally) by calling OTRcvOrderlyDisconnect. However,
OT Session Watcher reports that no such call was made.

- Doug Turner indicated to me earlier today that the SSL socket code in Necko
pushes a new IO layer onto the NSPR socket; I am beginning to wonder if that is
causing problems. I plan to investigate further this evening.

This will take me an indeterminate amount of time, because I am not as familiar
with the Necko or socket code as anyone who has worked in the code.
Attaching to 19119, asking for PDT eval.
Blocks: 19119
Keywords: beta1
Adding warren to the CC list in case he can chime in with pointers or 
suggestions.
Thanks,

Jim
Putting on PDT+ radar for beta1.  Need est to fix in Status Whiteboard ASAP 
please.
Whiteboard: [PDT+]
I put in a date of 2/25 for the fix, but that is a patently untrue date. The 
real truth is that this is a nondeterministic, low-level bug, probably in NSPR. 
I am having difficulty tracking this down, and do not have any idea when it will 
be done.
Whiteboard: [PDT+] → [PDT+] 2/25
Since I have heard a whole lot of nothing here, I thought I'd add a few more 
data points.

NSPR is, as far as I can tell, properly returning an EOF when it is done reading 
data from the socket. I have traced the code back into the level below 
nsSocketTransport::doRead; I am in the process of determining where this code is 
coming from. In any case, this intermediate layer is conflating the EOF with 
other data that had previously been read, returning the whole lump and 
pretending to be readable.

Since there is no more data to be read, an error condition comes in at poll 
time, and the socket is closed with an error, resulting in no pictures.

Does this jog anyone's memory?

[Adding wtc since he will be able to confirm/deny NSPR behavior]

I want to point your collective attention to the following lines:

(A) 
http://lxr.mozilla.org/seamonkey/source/netwerk/base/src/nsSocketTransport.cpp#9
46

(B) http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsPipe2.cpp#659

(C) 
http://lxr.mozilla.org/seamonkey/source/netwerk/base/src/nsSocketTransport.cpp#9
58

(D) http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsPipe2.cpp#676

nsSocketTransport calls the underlying pipe's WriteSegment method at line 946 
(A) above. The WriteSegment call loops, calling the reader method at nsPipe2.cpp 
(B). In the case of the picture I am testing, it loops twice: once to read the 
picture data, the second picks up EOF from the socket as it should. However, the 
EOF state is not propagated up, apparently due to the first read call returning 
some number of bytes. nsSocketTransport incorrectly assumes (at (C)) that the 
socket would block and presumably goes back to poll on the socket. However, I 
believe PR_Poll will return an error (POLLHUP) on the socket since EOF has been 
reached. This would cause the picture to go away.

Now, I have confirmed (at (D)) that the pipe state is properly set to 
NS_BASE_STREAM_CLOSED. What would be the best way to propagate this state up to 
nsSocketTransport, given that the zero-length read is not being passed up?

Let me know what the right fix is here.

Rick Potts needs to look at this.
This sucks!  It sounds like we need to be propagating the NS_BASE_STREAM_CLOSED 
result out through WriteSegments(...) so the socket transport knows not to put 
the socket back in the poll list.

I guess the other option is to add some getter on nsIBufferInputStream that 
returns whether the stream has reached eof...

Any idea why this is only a problem on the Mac?  It appears that on the other 
platforms, doing the extra poll to detect EOF is working...

-- rick
I am not certain why PR_Poll has a problem with this only on the Mac -- wtc or 
gordon might have a better idea there. I believe that the SSL socket code is the 
only place in the Seamonkey code where a new IO layer is pushed onto a 
PRFileDesc (PR_PushIOLayer), so for all I know that could affect PR_Poll results 
in this case.

In any case, Rick seems to know at least as much as anyone else what the proper 
fix is here, so I will assign this bug to him. Feel free to reassign back to me 
if you think there is something else happening at a lower level or in SSL that I 
need to look at. Likewise, if you have a potential fix available, I'll be all 
too happy to drop it into my tree and try it out. (Simon Fraser just switched 
the builds to use the correct NSPR tag, so PSM should work in tomorrow's Mac 
commercial builds.)

If it helps, the URL of the image I was using for testing is 
(https://www.waiter.com/wwwsys/images/mywtrlogo.gif).
Assignee: mwelch → rpotts
Good detective work Mark. Rick and I discussed this and think we have a 
solution. 

What's going on is that the pipe model is that as you read from a pipe, the EOF 
marker logically sits at the end of the data. And if you read multiple times 
while at EOF, you'll just get EOF back each time. However, sockets don't work 
this way -- if you try and put them back in the select set and they've been 
closed, you'll get an error, not EOF again. 

What was happening was that we would do a read, and the underlying socket would 
get EOF, but there would still be data in the pipe. Since we wanted to be fair 
to other clients, we would only take some of the data. Since the pipe didn't 
report EOF, we were rescheduling the socket with the select set for subsequent 
reading -- but the socket had already been closed. Hence the HUP.

So basically, once we've detected EOF when reading from a socket, we have to 
remember that fact in the socket transport so that we don't end up putting the 
socket back in the select set again later. We can do this by noting that fact 
in the (up until now unused) OnClose callback on the socket transport. 

BTW, this is the mac manifestation of the same bug we had on linux some weeks 
ago that I tried to fix by handling POLL_HUP, bug 23778.
Whiteboard: [PDT+] 2/25 → [PDT+] 2/28/00
Hey Mark,
I'm attaching a patch which should fix the problem.  Basically, I pass a 
structure into nsBufferInputStream::WriteSegments(...) which allows me to 
capture the EOF inside of nsReadFromSocket(...) and propagate it back out to 
nsSocketTransport::doRead(...).

If the EOF flag is set, the transport moves into the "done" state immediately 
(without putting the FD back on the POLL list one last time :-)

Let me know if this fixes your problem...
-- rick

The patch looks like the right solution based on my understanding. I'll have to 
try it out tomorrow morning when I get in front of my Mac.

Many thanks.
Just got in, will try patch.

Whiteboard: [PDT+] 2/28/00 → [PDT+] 2/28/00 (waiting on feedback re: patch)
Whiteboard: [PDT+] 2/28/00 (waiting on feedback re: patch) → [PDT+] 2/28/00 (patch works on Mac)
Patch appears to work. Managed to load several pages, and the pictures all
appeared. Many thanks.

Is there any way to check this in tonight? PDT seems to really want this (and
all other Mac SSL-related changes) in tomorrow's build.
checked in.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
verified:
Mac9.0 03/07/00 build
Status: RESOLVED → VERIFIED
OS: All
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: