Closed Bug 57679 Opened 25 years ago Closed 24 years ago

failure to read remaining data from PR_POLL_HUP'd channel

Categories

(Core :: Networking, defect, P3)

Hardware: DEC
OS: OSF/1

Tracking

Status: VERIFIED FIXED
Target Milestone: mozilla1.0

People

(Reporter: harbaugh, Assigned: darin.moz)

Attachments

(4 files)

nsSocketTransport::Process() loses data when detecting a HUP'd channel. This happens almost all the time when attempting to access ftp sites, and seems to happen very often when loading images via http. The problem is that the remote system sends the data on the channel and then hangs up. The data is still waiting to be picked up, but nsSocketTransport::Process() sees the hang-up and marks the current state as being in error, *without* reading any data that may be left:

425     if (PR_POLL_HUP & aSelectFlags) {
426       PR_LOG(gSocketLog, PR_LOG_ERROR,
427              ("Operation failed via PR_POLL_HUP. [%s:%d %x].\n",
428               mHostName, mPort, this));
429       if (mCurrentState == eSocketState_WaitConnect) {
430         mStatus = NS_ERROR_CONNECTION_REFUSED;
431       } else {
432         mStatus = NS_OK;
433       }
434       mCurrentState = eSocketState_Error;
435     }

The following change fixes the problem:

% diff nsSocketTransport.cpp nsSocketTransport.cpp_orig
425c425
<   if ((PR_POLL_HUP & aSelectFlags) && (! (PR_POLL_READ & aSelectFlags))) {
---
>   if (PR_POLL_HUP & aSelectFlags) {

But this is not the best solution, because nsSocketTransport::Process() DOES NEED to recognize that it will be done with the connection as soon as it can't read any more data. Could someone help me out with a better fix for this?
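For reference, the block above with the proposed change applied reads as follows (the quoted snippet plus the one-line change; a sketch for discussion, not a reviewed patch):

    // Treat the HUP as terminal only when poll() is not also reporting
    // readable data, so anything still queued on the socket is drained
    // before the state machine gives up on the connection.
    if ((PR_POLL_HUP & aSelectFlags) && !(PR_POLL_READ & aSelectFlags)) {
      PR_LOG(gSocketLog, PR_LOG_ERROR,
             ("Operation failed via PR_POLL_HUP. [%s:%d %x].\n",
              mHostName, mPort, this));
      if (mCurrentState == eSocketState_WaitConnect) {
        mStatus = NS_ERROR_CONNECTION_REFUSED;  // HUP while connecting => connection refused
      } else {
        mStatus = NS_OK;                        // otherwise the HUP itself is not an error
      }
      mCurrentState = eSocketState_Error;
    }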
cc'ing rpotts and darin.
I just glanced at the code, but it looks like if the error check were moved down to the bottom of the while loop, we would already have processed the read() before hitting the error. Would this do it?
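Roughly what that reordering might look like (a hypothetical outline only, not code from nsSocketTransport.cpp; the per-state read/write handling is collapsed into one comment):

    while (!done) {
      // ... existing per-state handling: connect, read, write ...
      // By the time control reaches this point, any data that arrived
      // along with the HUP has already been read off the socket.

      // Error/HUP check moved to the bottom of the loop:
      if (PR_POLL_HUP & aSelectFlags) {
        mStatus = (mCurrentState == eSocketState_WaitConnect)
                ? NS_ERROR_CONNECTION_REFUSED
                : NS_OK;
        mCurrentState = eSocketState_Error;
        done = PR_TRUE;
      }
    }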
I think moving the error test might be too much of a change. My new patch addresses the problem and works on both Tru64 and Linux. The logic remains basically the same, except that a flag has been added so that when the state is eSocketState_Done and the connection has been hupped, the CloseConnection() call is made and DNS requests are cancelled.
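In outline, the change works like this (an illustrative fragment, not the attached patch verbatim):

    // When poll() reports a HUP, remember it instead of bailing out
    // (unless we were still waiting for the connect to complete).
    if (PR_POLL_HUP & aSelectFlags) {
      if (mCurrentState == eSocketState_WaitConnect) {
        mStatus = NS_ERROR_CONNECTION_REFUSED;
        mCurrentState = eSocketState_Error;
      } else {
        socketHupped = PR_TRUE;   // keep reading; the peer just won't send more
      }
    }

    // ... later, in the state machine, once the pending data has been read:
    case eSocketState_Done:
      if (socketHupped) {
        // Same cleanup the error path performs: cancel any outstanding DNS
        // request and release the descriptor.
        CloseConnection();
      }
      break;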
I've tested this patch and it looks good to me.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Darin, Can I assume that means r=darin@netscape.com? Who needs to a= this?
I'm not sure that I "can" r= fixes to Necko (otherwise I would have). Shouldn't it have to be gagan, since he's the module owner?
this seems ok to me, but rpotts should review this too. r=gagan
I think that this patch looks fine. (r=rpotts) The only question I have is why we need to cancel DNS requests before we close the connection? How could we be in a read/write done state if the DNS lookup is still pending? I realize that it is harmless, but is it necessary? -- rick
I agree with Rick that this check of mDNSRequest seems unnecessary. I can't see how it would ever be hit.
The reason I included the DNS query cancel is that it is the same logic used under the socket error state - which is what happens for *all* POLL_HUPs if the patch isn't in effect. Even though this particular situation is not an error anymore when the patch is applied, I would think the same 'cleanup' is still necessary. I'll remove the DNS stuff and try another build.
I understand why you put the DNS check in there. It is true that we were previously doing this check whenever we got a PR_POLL_HUP, but I'm pretty sure it was done that way just to make the eSocketState_Error case generic. There really shouldn't be a case where we have to clean up the DNS lookup from a HUP'd read. We should be (and I believe we are) doing the cleanup before starting to read from the socket. --darin
My updated patch is essentially identical to the previous one, with the "Cancel any DNS Requests" section removed. It works on Tru64 and Linux.
I just attached a modified version of harbaugh's patch. It fixes some of the indenting... although it's hard to really say that for sure, since the file already appears to have two different indentation styles going. mscott, can we get your s/r on this?
just out of curiosity, what does HUPPED refer to? rick, are you cool with this final patch as well?
That the server closed the connection. HUP = Hang UP.
hey scott, what darin said :-) I live for this patch! -- rick
sr=mscott then. Thanks guys.
OOPS - ignore proposed patch id=18654. It fixes one anomaly, but introduces another. So go with the previous good patch -> id=18574
harbaugh: can you describe what "anomalies" you are referring to?
Yes. I thought that I could easily fix a problem that the URL http://www.icq.com/images/splash2000b/screen2.gif has with HUP without having to file a separate bug report. In this situation, nsSocketTransport::Process is entered with

    aSelectFlags == PR_POLL_READ
    mCurrentState == eSocketState_WaitReadWrite

So mozilla reads the available data, after which mStatus == NS_BASE_STREAM_WOULD_BLOCK. But the next time mozilla polls(), aSelectFlags == PR_POLL_HUP, because www.icq.com 'hung up' after sending the last data. PR_POLL_IN isn't set, because mozilla already read the data during the previous round; there isn't any more data to read. So in this case PR_POLL_HUP is not an error; it is simply reporting that no more data will ever be sent on this socket and hence it should be closed. But the *current* code marks this as an error, and for some reason never finishes displaying the last batch of data received on the socket - the bottom half of the gif.

My attempted patch was not well considered, though. I thought that I could fix the problem by changing the test so that only PR_POLL_HUP with anything other than PR_POLL_READ would indicate an error. This appeared to work initially, because it *DID* allow the url above to finish displaying. But it had the unfortunate consequence of never closing the file descriptor, and the values

    aSelectFlags == PR_POLL_HUP
    mCurrentState == eSocketState_WaitReadWrite
    mStatus == NS_BASE_STREAM_WOULD_BLOCK

remain permanently for that descriptor. So if there is no data to read or write on any *other* socket, the poll() hangs for the timeout interval (35 seconds).

My previous patch (id=18574) *is* OK, because it will never allow a file descriptor with PR_POLL_HUP to continue to exist unless there is data waiting to read. And since there is data waiting, the poll() won't stall. Patch (id=18574) *does* fix most problems with HUP handling. But the anomaly of a HUP happening *AFTER* a complete read and *BEFORE* the next poll would probably be better dealt with by another test of 'socketHupped' within the case statement. In any event, I need to analyze it more before I can provide a good patch for this odd occurrence.
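Roughly, such a test might look like this (hypothetical and untested; assume socketHupped is set wherever PR_POLL_HUP was seen):

    case eSocketState_WaitReadWrite:
      // ... existing read/write handling ...
      if (NS_BASE_STREAM_WOULD_BLOCK == mStatus && socketHupped) {
        // The peer hung up after we drained the last of its data, so nothing
        // more will ever arrive; finish up rather than putting the descriptor
        // back into poll(), where it could only report PR_POLL_HUP again.
        mStatus = NS_OK;
        mCurrentState = eSocketState_Done;
      }
      break;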
The link you posted does not have a HUP problem on Linux. I want to test this on Windows NT... I noticed a problem on NT that could very well be a dup of this bug. Perhaps the Linux kernel is careful about not HUP'ing a socket until all of the data has been read. Does this even make sense?
I've noticed the same thing about Linux; it *NEVER* encounters a HUP on any of the sites where Tru64 'sees' a HUP. Linux seems to handle POLLIN and POLLHUP in a mutually exclusive manner, whereas both HP-UX and Tru64 do not. What does the Windows developer documentation say about poll() and POLLHUP?

After some further study, it turns out that the reason aSelectFlags == PR_POLL_HUP for the gif URL is that PR_POLL_READ has been *REMOVED* from mSelectFlags by nsSocketTransport::OnFull(). Hence the syspoll[x].events for that socket is no longer set up to flag data to read on the socket. If it were, PR_POLL_READ would in fact be true. The problem is somehow related to the transfer of the collected input data through a pipe. nsSegmentedBuffer::AppendNewSegment() fails repeatedly because GetSize() returns 8192, which in this case == mMaxSize. The first time AppendNewSegment() fails, OnFull() is called to change mSelectFlags to *NOT* look for input data on the socket until the segment buffer is no longer full. Unfortunately the segment buffer remains full through quite a number of poll() loops. Eventually GetSize() returns 4096 < 8192, so AppendNewSegment() succeeds. I can get the gif to eventually load by setting the socketHupped flag and allowing the state to continue as eSocketState_WaitReadWrite, but it takes an unacceptable amount of time for the image to display due to the failure of AppendNewSegment(). I need to find out why the pipe is stalling.
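To summarize the flow-control interaction (hypothetical stand-in code, with NSPR types assumed in scope; the real signatures differ, and the callback that restores the read flag is not shown):

    // nsSegmentedBuffer::AppendNewSegment() can only add a segment while the
    // buffer is under its limit; it keeps failing while GetSize() == mMaxSize
    // (8192 in this case).
    static PRBool CanAppendSegment(PRUint32 curSize, PRUint32 maxSize)
    {
      return curSize < maxSize;
    }

    // The effect of nsSocketTransport::OnFull(): stop asking poll() about
    // input until the pipe drains, so the only thing the next PR_Poll() can
    // report for this socket is the HUP.
    static void OnPipeFull(PRInt16& aSelectFlags)
    {
      aSelectFlags &= ~PR_POLL_READ;
    }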
PR_Poll on Windows (NT and 9x) seems to map to select. And, from just scanning the NSPR source, it doesn't seem possible for a PR_POLL_HUP to originate from Windows. This leaves me a bit confused about what I was seeing on NT. Perhaps under Windows NT, the HUP from the server would force a loss of data?!? I really have to investigate this better...
cc'ing Jim Dunn... he may be very interested in this bug since it only seems to be affecting OSF/1 and HP-UX systems.
Ok, not sure why someone thinks this doesn't work on hpux. Using our build, we were able to go to the gov ftp site in the url field and look around. And we were also able to check out the gif that is mentioned in the bug. So we can't reproduce this problem on hpux. So we either need more exact steps to recreate the bug, if you ARE seeing this on hpux, or hpux is NOT exhibiting the problem.
Let me just ask... what version(s) of HP-UX are we talking about here?
hpux11.00
Ok, can we get a status update on this? Should we land the 2000-11-02 patch?
Blocks: 61696
I just tried this patch on HP-UX 10.20 and the patch fixes the ftp problem I was having with 10.20. So I say, yes, this is a good patch, and I am marking that bug as a dup of this one.
*** Bug 18871 has been marked as a duplicate of this bug. ***
jim: you are referring to the second 11/2 patch, right?
nope the 11/03 patch
harbaugh says to ignore the 11/3 patch; it may introduce other bugs. please try the second 11/2 patch, and let me know if it works for you. thanks!
11/02 works fine for me on HPUX 10.20, so this patch (to me) is GOOD!
assign to myself... patch to be checked in soon.
I'm not convinced anymore that this patch is correct. It seems to me that if PR_Poll is setting both PR_POLL_HUP and PR_POLL_READ, then we should just ignore PR_POLL_HUP and carry on as normal. My assumption here is that the next call to PR_Poll would only set PR_POLL_HUP (provided all data that could be read was read). With the socketHupped flag, I think we're just adding complexity where it is not needed. Patch coming...
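Something along these lines (a sketch of the intent, not the final patch):

    if (PR_POLL_HUP & aSelectFlags) {
      if (PR_POLL_READ & aSelectFlags) {
        // Data is still pending: ignore the HUP and fall through to the normal
        // read handling.  Once everything has been read, the next PR_Poll()
        // should report PR_POLL_HUP by itself, and we'll take the branch below.
      } else {
        mStatus = (mCurrentState == eSocketState_WaitConnect)
                ? NS_ERROR_CONNECTION_REFUSED
                : NS_OK;
        mCurrentState = eSocketState_Error;  // drives the normal close/cleanup path
      }
    }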
seems like it didn't go to darin...
Assignee: gagan → darin
Status: NEW → ASSIGNED
Keywords: nsbeta1
Target Milestone: --- → mozilla0.9.1
Target Milestone: mozilla0.9.1 → mozilla0.9.2
mass move, v2. qa to me.
QA Contact: tever → benc
Target Milestone: mozilla0.9.2 → mozilla1.0
this was fixed a while back along with other changes to the socket transport. reporter: can you please verify? marking FIXED.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
+qawanted, verifyme - anyone w/ HPUX or Tru64 would be ideal.
Keywords: qawanted, verifyme
I just verified this on hpux10.20
jim: If you think this is working, please set the status to VERIFIED. The OSF/1 community is very active, they can REOPEN or file a new regression bug if they disagree.
marking verified (I am assuming it was me and not jim_nance who was supposed to mark it verified)
Status: RESOLVED → VERIFIED
Thanks Jim, I don't have a box to test this on anymore.