Closed
Bug 57679
Opened 25 years ago
Closed 24 years ago
failure to read remaining data from PR_POLL_HUP'd channel
Categories
(Core :: Networking, defect, P3)
Tracking
()
VERIFIED
FIXED
mozilla1.0
People
(Reporter: harbaugh, Assigned: darin.moz)
References
()
Details
Attachments
(4 files)
4.27 KB,
patch
|
Details | Diff | Splinter Review | |
4.09 KB,
patch
|
Details | Diff | Splinter Review | |
2.65 KB,
patch
|
Details | Diff | Splinter Review | |
4.11 KB,
patch
|
Details | Diff | Splinter Review |
nsSocketTransport::Process() looses data when detecting a HUP'd channel. This
happens almost all the time when attempting to access ftp sites, and seems
to happen very often loading images via http.
The problem is that the remote system sends the data on the channel then
hangs up. The data is still waiting to be picked up, but
nsSocketTransport::Process() sees the hang up and marks the current state
as being in error, *without* reading any data that may be left:
425 if (PR_POLL_HUP & aSelectFlags) {
426 PR_LOG(gSocketLog, PR_LOG_ERROR,
427 ("Operation failed via PR_POLL_HUP. [%s:%d %x].\n",
428 mHostName, mPort, this));
429 if (mCurrentState == eSocketState_WaitConnect) {
430 mStatus = NS_ERROR_CONNECTION_REFUSED;
431 } else {
432 mStatus = NS_OK;
433 }
434 mCurrentState = eSocketState_Error;
435 }
The following change fixes the problem:
% diff nsSocketTransport.cpp nsSocketTransport.cpp_orig
425c425
< if ((PR_POLL_HUP & aSelectFlags) && (! (PR_POLL_READ & aSelectFlags))) {
---
> if (PR_POLL_HUP & aSelectFlags) {
But this is not the best solution because nsSocketTransport::Process()
DOES NEED to recognize that it will be done with the connection as soon as
it can't read any more data.
Could someone help me out with a better fix for this?
I just glanced at the code, but it looks like if the check for errors was moved
down to the bottom of the while loop then we would already have processed the
read() before we hit the error. Would this do it?
I think moving the error test might be too much of a change. My new
patch addresses the problem and works on both Tru64 and Linux. The
logic remains basically the same, except that a flag has been added so that
when the state is eSocketState_Done and the connection has been hupped, the
CloseConnection() call is made and DNS requests are cancelled.
Assignee | ||
Comment 5•25 years ago
|
||
I've tested this patch and it looks good to me.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Darin,
Can I assume that means r=darin@netscape.com? Who needs to a= this?
Assignee | ||
Comment 7•25 years ago
|
||
I'm not sure that I "can" r= fixes to Necko (otherwise I would have).
Shouldn't it have to be gagan, since he's the module owner?
Comment 9•25 years ago
|
||
I think that this patch looks fine. (r=rpotts)
The only question I have is why we need to cancel DNS requests before we close
the connection? How could we be in a read/write done state if the DNS lookup is
still pending? I realize that it is harmless, but is it necessary?
-- rick
Assignee | ||
Comment 10•25 years ago
|
||
I agree with Rick that this check of mDNSRequest seems unnecessary. I can't see
how it would ever be hit.
Reporter | ||
Comment 11•25 years ago
|
||
The reason I included the DNS query cancel is that it is the same logic
used under the socket error state - which is what happens for *all* POLL_HUPs if
the patch isn't in effect.
Even though this particular situation is not an error anymore when the patch
is applied, I would think the same 'cleanup' is still necessary. I'll remove
the DNS stuff and try another build.
Assignee | ||
Comment 12•25 years ago
|
||
I understand why you put the DNS check in there. It is true, that we were
previously doing this check whenever we got a PR_POLL_HUP, but I'm pretty
sure it was done that way just to make the eSocketState_Error case generic.
There really shouldn't be a case where we have to cleanup the DNS lookup
from a HUP'd read. We should be (and I believe we are) doing the cleanup
before starting to read from the socket.
--darin
Reporter | ||
Comment 13•25 years ago
|
||
Reporter | ||
Comment 14•25 years ago
|
||
My updated patch is essentially identical to the previous one, with the
"Cancel any DNS Requests" section removed. It works on Tru64 and Linux.
Assignee | ||
Comment 15•25 years ago
|
||
Assignee | ||
Comment 16•25 years ago
|
||
I just attached a modified version of harbaugh's patch. It fixes some of the
indenting... although it's hard to really say that for sure, since the file
already appears to have two different indentation styles going.
mscott, can we get your s/r on this?
Comment 17•25 years ago
|
||
just out of curiousity what does HUPPED refer to? rick, are you cool with this
final patch as well?
Assignee | ||
Comment 18•25 years ago
|
||
That the server closed the connection. HUP = Hang UP.
Comment 19•25 years ago
|
||
hey scott,
what darin said :-) I live for this patch!
-- rick
Comment 20•25 years ago
|
||
sr=mscott then. Thanks guys.
Reporter | ||
Comment 21•25 years ago
|
||
Reporter | ||
Comment 22•25 years ago
|
||
OOPS - ignore proposed patch id=18654.
It fixes one anomoly, but introduces another. So go with the previous
good patch -> id=18574
Assignee | ||
Comment 23•25 years ago
|
||
harbaugh: can you describe what "anomolies" you are refering to?
Reporter | ||
Comment 24•25 years ago
|
||
Yes.
I thought that I could easily fix a problem that the URL
http://www.icq.com/images/splash2000b/screen2.gif
has with HUP without having to file a separate bug report.
In this situation, nsSocketTransport::Process is entered with
aSelectFlags == PR_POLL_READ
and
mCurrentState == eSocketState_WaitReadWrite
So mozilla reads the available data, after which
mStatus == NS_BASE_STREAM_WOULD_BLOCK
But the next time mozilla polls(),
aSelectFlags == PR_POLL_HUP
because www.icq.com 'hung up' after sending the last data.
PR_POLL_IN isn't set, because mozilla already read the
data during the previous round; there isn't any more data
to read. So in this case PR_POLL_HUP is not an error; it
is simply reporting that no more data will ever be sent on
this socket and hence it should be closed. But
the *current* code marks this as an error, and
for some reason never finishes displaying the last
batch of data received on the socket - the bottom
half of the gif.
My attempted patch was not well considered, though.
I thought that I could fix the problem by changing the test so
that only PR_POLL_HUP with anything other than PR_POLL_READ
would indicate an error. This appeared to work initially,
because it *DID* allow the url above to finish displaying.
But it had the unfortunate consequence of never closing
the file descriptor; and the values
aSelectFlags == PR_POLL_HUP
mCurrentState == eSocketState_WaitReadWrite
mStatus == NS_BASE_STREAM_WOULD_BLOCK
permenantly for that descriptor. So if there was no data to read
or write on any *other* socket, the poll() hangs for the timeout
interval (35 seconds).
My previous patch (id=18574) *is* OK, because it will never
allow file descriptor with PR_POLL_HUP to continue to exist
unless there is data waiting to read. And since there is data
waiting, the poll() won't stall.
Patch (id=18574) *does* fix most problems with HUP handling.
But the anomoly of a HUP happening *AFTER* a complete read
and *BEFORE* the next poll would probably be better dealt
with by another test of 'socketHupped' within the case
statement. In any event, I need to analyze it more before
I can provide a good patch for this odd occurence.
Assignee | ||
Comment 25•25 years ago
|
||
This link that you post does not have a HUP problem in Linux.
I want to test this on Windows NT... I noticed a problem on NT that
could very well be a dup of this bug. Perhaps the Linux kernel is
careful about not HUP'ing a socket until all of the data has been read.
Does this even make sense?
Reporter | ||
Comment 26•25 years ago
|
||
I've noticed the same thing about Linux; it *NEVER* encounters a HUP on any
of the sites where Tru64 'sees' a HUP. Linux seems to handle POLLIN and POLLHUP
in a mutually-exclusive manner, whereas both HP-UX and Tru64 do not. What does
the Windows developer documentation say about poll() and POLLHUP?
After some further study, it turns out that the reason aSelectFlags==PR_POLL_HUP
for the gif URL is that PR_POLL_READ has been *REMOVED* from mSelectFlags by
nsSocketTransport::OnFull(). Hence the syspoll[x].events for that socket is
no longer set up to flag for data to read on the socket. If it were,
PR_POLL_READ would in fact be true.
The problem is somehow related to the transfer of the collected input data
through a pipe.
nsSegmentedBuffer::AppendNewSegment() fails repeatedly because GetSize()
returns 8192, which in this case == mMaxSize. The first time AppendNewSegment()
fails, OnFull() is called to change mSelectFlags to *NOT* look for input data on
the socket until the segment buffer is no longer full.
Unfortunately the segment buffer remains full through quite a number of
poll() loops. Eventually GetSize() returns 4096 < 8192, so AppendNewSegment()
succeeds.
I can get the gif to eventually load by setting the socketHupped flag and
allowing the state to continue as eSocketState_WaitReadWrite. But it takes
an unacceptable amount of time for the image to display do to the failure
of AppendNewSegment(). I need to find out why the pipe is stalling.
Assignee | ||
Comment 27•25 years ago
|
||
PR_Poll on windows (NT and 9x) seems to map to select. And, from just scanning
the NSPR source, it doesn't seem like it is possible for a PR_POLL_HUP to
originate from windows. This leaves me a bit confused about what I was seeing
on NT. Perhaps under windows NT, the HUP from the server would force a loss
of data?!? I really have to investigate this better...
Assignee | ||
Comment 28•25 years ago
|
||
cc'ing Jim Dunn... he may be very interested in this bug since it only seems
to be affecting OSF/1 and HP-UX systems.
Comment 29•25 years ago
|
||
Ok, not sure why someone thinks this doesn't work on hpux. Using our
build, we were able to go to the gov ftp site in the url field and
look around. And we were also able to check out the gif that is
mentioned in the bug. So we can't reproduce this problem on hpux.
So we either need more exact steps to recreate the bug, if you ARE
seeing this on hpux, or hpux is NOT exhibiting the problem.
Assignee | ||
Comment 30•25 years ago
|
||
Let me just ask... what version(s) of HP-UX are we talking about here?
Comment 31•25 years ago
|
||
hpux11.00
Assignee | ||
Comment 32•25 years ago
|
||
Ok, can we get a status update on this? Should we land the 2000-11-02 patch?
Comment 33•25 years ago
|
||
I just tried this patch on HP 10.20 and the patches fixes the
ftp problem I was having with 10.20. So I say, yes this is
a good patch and I am marking that bug as a dup of this.
Comment 34•25 years ago
|
||
*** Bug 18871 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 35•25 years ago
|
||
jim: you are referring to the second 11/2 patch, right?
Comment 36•25 years ago
|
||
nope the 11/03 patch
Assignee | ||
Comment 37•25 years ago
|
||
harbaugh says to ignore the 11/3 patch. that it may introduce other bugs.
please try the second 11/2 patch, and let me know if it works for you.
thanks!
Comment 38•25 years ago
|
||
11/02 works for me fine on HPUX 10.20, so this patch (to me is GOOD!)
Assignee | ||
Comment 39•24 years ago
|
||
assign to myself... patch to be checked in soon.
Assignee | ||
Comment 40•24 years ago
|
||
I'm not convinced anymore that this patch is correct. It seems to me that if
PR_Poll is setting both PR_POLL_HUP and PR_POLL_READ, then we should just ignore
PR_POLL_HUP and carry on as normal. My assumption here is that the next call to
PR_Poll would only set PR_POLL_HUP (provided all data that could be read was
read).
With the socketHupped flag, I think we just adding complexity where it is not
needed.
Patch coming...
Assignee | ||
Updated•24 years ago
|
Assignee | ||
Updated•24 years ago
|
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Assignee | ||
Updated•24 years ago
|
Target Milestone: mozilla0.9.2 → mozilla1.0
Assignee | ||
Comment 43•24 years ago
|
||
this was fixed a while back along with other changes to the socket transport.
reporter: can you please verify? marking FIXED.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 44•24 years ago
|
||
+qawanted, verifyme - anyone w/ HPUX or Tru64 would be ideal.
Comment 45•24 years ago
|
||
I just verified this on hpux10.20
Comment 46•23 years ago
|
||
jim: If you think this is working, please set the status to VERIFIED. The OSF/1
community is very active, they can REOPEN or file a new regression bug if they
disagree.
Comment 47•23 years ago
|
||
marking verified (I am assuming it was me and not jim_nance who
was supposed to mark it verified)
Status: RESOLVED → VERIFIED
Comment 48•23 years ago
|
||
Thanks Jim, I dont have a box to test this on anymore.
Updated•11 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•