Closed Bug 278144 Opened 20 years ago Closed 20 years ago

Support socket i/o timeouts

Categories

(Core :: Networking, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.8beta1

People

(Reporter: darin.moz, Assigned: darin.moz)

References

Details

Attachments

(3 files, 4 obsolete files)

Support socket i/o timeouts

Mailnews protocols could make use of configurable timeout support for socket i/o.

We'll want separate timeouts for connection establishment and read/write.
Severity: normal → enhancement
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.8beta
My plan is to use PR_Poll's timeout parameter to implement this feature.  Each
socket will have a current timeout associated with it, and we'll compute the
minimum before calling PR_Poll.  When PR_Poll returns we'll check for expired
sockets, and trigger the associated socket handler.  It might make sense to add
a new method on the socket handler for timeout errors (instead of trying to
reuse OnSocketReady).
Attached patch prototype patch (obsolete) — Splinter Review
Here's a prototype patch.  NOTE: I haven't verified that it actually works! ;-)

I've only confirmed that the socket timeout doesn't fire needlessly when
loading a few web pages.  More testing is definitely in order.

bienvenu: you offered to help out with that, right? ;-)
I'm trying this out on my laptop now. But I lied when I said I didn't want to be
able to change the read timeout - one of things I'm trying to fix is the imap
shutdown code - currently, we send the logout and don't wait for the server
response before closing the socket. I'd like to send the logout and then do a
blocking read for the response with a very short timeout (a few seconds) so that
we don't block shutdown for too long, but do cleanup the connection correctly
almost all the time. Looking at your code, it doesn't look like it would be too
hard to extend it to allow the timeout to be changed and propagated...
yeah, it shouldn't be that hard to allow the change to be made dynamically.
I hit a PR_Assert in SocketConnectionContinue(), I think, line 339,
PR_Assert(out_flags & PR_POLL_WRITE). I don't know if this is related, but I
can't remember seeing this PR_Assert in a very long time. I wasn't testing the
timeout, just running with the patch. I'll see if it happens again and see if I
can figure out more what's going on. The out_flags were 8 when I hit that assertion.
I didn't hit my breakpoints on the code in your patch that gets called when we
timeout (nsSocketTransport2:1433, in OnSocketReady) or
nsSocketTransportService::572, where we call OnSocketReady with -1. It could be
that since I did a complete repull and build, that some other changes went in.
However, it does seem timeout related, in that I'm not doing anything when it
hits that assertion, and it might be a server connection that has timed out...
I think I see the problem - mTimeouts[2], but TIMEOUT_CONNECT and
TIMEOUT_READ_WRITE are 1 and 2, and are used as indexes into mTimeouts...in my
tree, I'll try changing those to 0 and 1...
I haven't hit that PR_Assert anymore after change the values from 1 and 2 to 0
and 1. I tried a connection timeout by leaving the dial-up connection dialog up
for more than 2 minutes, but it didn't trigger a timeout, I think because the
socket transport thread is blocked in nsRasAutoDial (I think you've talked about
doing that on a different thread - that could be really useful to us because I
think we have issues with that code blocking after laptops come out of
hibernation, for example)
Comment on attachment 171098 [details] [diff] [review]
prototype patch

just as a reminder, base/public/nsISocketTransport.idl needs a new iid ;)
Attached patch v1 patch (obsolete) — Splinter Review
this one works.  (looks like i left a XXX comment in the server socket code..)
Attachment #171098 - Attachment is obsolete: true
woo woo, this worked for imap, using netcat as a fake imap server
Great!  There's a bit of misc cleanup I want to do before asking for reviews.
Attached patch v1.1 patchSplinter Review
Attachment #171207 - Attachment is obsolete: true
Attachment #171328 - Flags: superreview?(bzbarsky)
Attachment #171328 - Flags: review?(bienvenu)
It'll take me a few days to get to this....
Comment on attachment 171328 [details] [diff] [review]
v1.1 patch

thx a lot for doing this, Darin! Should I try running with this latest patch,
or is it essentially the same as the last one?
Attachment #171328 - Flags: review?(bienvenu) → review+
> Should I try running with this latest patch,
> or is it essentially the same as the last one?

It is essentially the same thing, but if you have time to verify it that'd be
great :)
I've been running with the patch and haven't seen any problems. I haven't forced
any timeouts, however.
Comment on attachment 171328 [details] [diff] [review]
v1.1 patch

>Index: src/nsServerSocket.cpp

> nsServerSocket::nsServerSocket()
>   : mLock(nsnull)
>   , mFD(nsnull)
>   , mAttached(PR_FALSE)
> {
>   mCondition = NS_OK;
>   mPollFlags = 0;
>+  mPollTimeout = PR_UINT16_MAX;

So would it make sense to make the nsASocketHandler constructor require the
condition, flags, and timeout?	Seems to me like it would.  Either way, though.

>Index: src/nsSocketTransport2.cpp
>+// Large default timeouts approx. behavior of no timeout.  (It's better to let
>+// the servers or host operating system time us out.)

No need to abbreviate "approximately"

>+#define DEFAULT_TIMEOUT_CONNECT    (10 * 60)
>+#define DEFAULT_TIMEOUT_READ_WRITE (10 * 60)

Document that these are in seconds.

>@@ -1084,20 +1095,21 @@ nsSocketTransport::InitiateSocket()
>+    mPollFlags = mTimeouts[TIMEOUT_CONNECT];

You want to be setting mPollTimeout here.

>+nsSocketTransport::SetTimeout(PRUint32 type, PRUint32 value)
>+    mTimeouts[type] =
>+            (PRUint16) (value > PR_UINT16_MAX ? PR_UINT16_MAX : value);

Any reason not to use PR_MIN here?

>Index: src/nsSocketTransportService2.h

>+    PRInt32        Poll(PRUint32 *interval); // calls PR_Poll

Document what "interval" is (and that it's an out param).

sr=bzbarsky with those changes/nits.
Attachment #171328 - Flags: superreview?(bzbarsky) → superreview+
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Its look like this patch cause this popup http://lahls.de/temp/pic/moz-alert.png
(4KB)in mailnews. This appears first with my Mozilla/5.0 (Windows; U; Windows NT
5.1; en-US; rv:1.8b) Gecko/20050126 today. This popup comes up every two
minutes, when I don't do anything in mailnews. When mailnews is open in the
background of the browser, Mozilla switch back to mailnews, when the popup opens :-(

I use a local newsserver(Hamster) and an other guy who runs a local leafnode and
using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b) Gecko/20050126 tells
exact the same about his Mozilla.

Maybe we only have to do something in about:config?
bienvenu: any thoughts?  the mailnews code isn't using this API yet, right? 
perhaps i should push the default timeouts back to the maximum value.
(In reply to comment #20)
> an other guy who runs a local leafnode and
> using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b) Gecko/20050126 tells
> exact the same about his Mozilla.

I am the "other guy". :-)

But on my mozilla the alert box comes up only once after doing nothing about ca.
10 minutes (not every two minutes).
OK, 10 minutes is the new default timeout value, so that surely explains the
problem.  Perhaps we are continuously polling on some mailnews socket.  That
seems reasonable, so it would make sense to bump the default timeouts back up to
unlimited.  Patch coming up...
Attached patch followup patchSplinter Review
bz: this is a fairly straightforward patch.  the one thing that is missing is
documentation in nsISocketTransport.idl for the magic timeout value that means
no timeout.  i was about to document that, but then i realized that it is a bit
awkward to say that PR_UINT16_MAX (or any value above it) is that magic value
since the interface uses PRUint32 for the timeout value.  the 16-bit variable
size is a detail of the implementation.  so, then i thought hmm... i wonder if
it would be better for the socket transport code to deal with timeout values in
milliseconds instead of seconds.  i could imagine some fancy applications
wanting sub-second timeouts.  ok, but that's a much larger change, which
belongs in another bug.  for now, i just want to fix the mailnews regression
:-)
Attachment #172505 - Flags: superreview?(bzbarsky)
Attachment #172505 - Flags: review?(bzbarsky)
I'm not sure I understand the purpose of the change to
nsServerSocket::OnSocketReady...  Won't we still get values of -1 for outFlags?
 And won't that rather confuse the following bit-test?
> Won't we still get values of -1 for outFlags?
> And won't that rather confuse the following bit-test?

no, because the changes to nsSocketTransportService make it so that
OnSocketReady is not called when mPollTimeout is set to PR_UINT16_MAX.  in other
words, if you say that your socket should never timeout (which was our old
default), then we should not bother calling OnSocketReady (as before).  So,
since the default constructor for nsASocketHandler assigns PR_UINT16_MAX to
mPollTimeout, there is no reason for nsServerSocket to do any special timeout
handling.
Comment on attachment 172505 [details] [diff] [review]
followup patch

Oh, so there's no way to change the timeout on an nsServerSocket?

If so, please add an assert to that effect in OnSocketReady?  That is, assert
that aFlags is not -1?

r+sr=bzbarsky with that.

For documentation... we could simply document that a value of PR_UINT32_MAX
will guarantee no timeout and say that certain implementations may treat some
smaller values as also meaning no timeout...

Or we could just make the api use PRUint16.
Attachment #172505 - Flags: superreview?(bzbarsky)
Attachment #172505 - Flags: superreview+
Attachment #172505 - Flags: review?(bzbarsky)
Attachment #172505 - Flags: review+
The mailnews code isn't deliberately polling on any sockets, afaik, but we do
cache nntp and imap connections but when we're not using them, and I suppose
necko would be polling on those sockets when the connections are cached, even
though we're not running any urls...so I suppose we need to turn off the
timeouts when we're caching a connection or ignore the timeout error...is it
safe to simply ignore the timeout in that case? Is the socket still useable
after the timeout? 
BTW: Should this bug left as "RESOLVED FIXED" further on?
(In reply to comment #28)
> ... is it safe to simply ignore the timeout in that case? Is the socket still 
> useable after the timeout? 

no, unfortunately the socket transport API doesn't give consumers a way to
ignore timeout events.  implementations of nsASocketHandler can choose to ignore
timeouts, but that isn't exposed beyond the bowels of necko.

i'm going to proceed with the latest patch just to get things working again.  we
can then play around with adjusting timeouts for the various consumers as
appropriate.
'followup patch' has been checked in on the trunk.  please check tomorrows
builds to verify that the problem has been solved.  thanks!
(In reply to comment #31)
> 'followup patch' has been checked in on the trunk.  please check tomorrows
> builds to verify that the problem has been solved.

It seems to work: I got no timeout since 1h. :-)
(In reply to comment #28)
> The mailnews code isn't deliberately polling on any sockets

hm... not even for IMAP with IDLE? (or maybe I misunderstood how IDLE works)
> It seems to work: I got no timeout since 1h. :-)

great!  i'm glad to hear that it worked.
When imap connections are cached after finishing a url, the imap code needs to
turn off the timeout or else the cached connections will timeout - but it
didn't quite work,  so per Darin's advice, post an event to the background
thread to set mPollTimeout appropriately.
Attachment #172775 - Flags: review?(darin)
Can't that trigger some of the asserts the other patch has in it?  In particular:

+        PRUint32 r = s.mHandler->mPollTimeout - s.mElapsedTime;
+        NS_ASSERTION(r <= s.mHandler->mPollTimeout, "oops");
Boris, do you mean you might hit that assertion if you shrink the timeout to a
value smaller than the time we've been idle? My test case wouldn't trigger that
because I set the timeout to max_uint32 when idle, but I suppose it's
theoretically possible for code to do that.
Yes, that's what I mean.  I don't see anything preventing that, certainly.  That
would have the effect of "r" becoming very large, meaning that shortening the
timeout on something that has been idle for longer than the new value would
effectively block it for a long time, unless there is something else in  the
queue to trigget a smaller return from PollTimeout().
with this patch, in that situation, we'll set the timeout to the new, lower
timeout, which seems as valid a value as any.
Attachment #172775 - Attachment is obsolete: true
Attachment #172969 - Flags: superreview?(bzbarsky)
Attachment #172969 - Flags: review?(darin)
Attachment #172775 - Flags: review?(darin)
I'd think that 0 would be a much more reasonable value, no?  I just don't know
whether a value of 0 works here.
I thought about 0...0 was definitely a possibility in the existing code, but I'd
like to hear from Darin if it should be OK before using it. 

From the imap point of view, using mPollTimeout makes sense, because whenever I
set the timeout, I expect that to be the timeout from now going forward. I won't
ever be idle for five minutes and then set the timeout to 10 seconds, and expect
it to fire immediately. If that happened, I'd need to figure some way around it
because my idle connection could timeout before I could close it gracefully. I'd
have to send the data to end the idle and then change the timeout. I guess I'll
have to do that anyway because my socket could timeout if we service any other
active socket between the time I change the timeout and the time I read or write
data to the socket. But then I won't get the correct timeout on the write...I
could make setTimeout clear mElapsedTime as well, which would solve my problem
and then I wouldn't care if we used 0 or mPollTimeout.
hmm... david: but in that case, wouldn't you have written something to the
server before you start waiting for a response?  in that case mElapsedTime would
have been reset to zero, so setting mPollTimeout to 10 sec should be fine.  the
idea is that mPollTimeout is always relative to mElapsedTime, which is a measure
of how long the socket has been polling.  so, it should be an edge case that
mElapsedTime is ever greater than mPollTimeout.  so, i think bz's suggestion of
using 0 in that case is reasonable and perhaps the most consistent.
In the IMAP IDLE case, we send the IDLE command, and then we do an async wait
for data. Assume no data comes back for 10 minutes, and then the user shuts
down. At that point, I want to set the read+write timeout to a very short value
(5 seconds or so) and then write DONE, read the response, write logout, read the
response. I could write the DONE, and then change the timeout before reading the
response, assuming that a WRITE timeout isn't that useful.
> I could write the DONE, and then change the timeout before reading the
> response, assuming that a WRITE timeout isn't that useful.

This seems like the right approach to me.  The newly set timeout would apply to
the WRITE operation because of the PostEvent call in SetTimeout, so I think it
should be the right solution :)
Attachment #172969 - Attachment is obsolete: true
Attachment #173043 - Flags: superreview?(darin)
Attachment #173043 - Flags: review?(bzbarsky)
Attachment #172969 - Flags: superreview?(bzbarsky)
Attachment #172969 - Flags: review?(darin)
Comment on attachment 173043 [details] [diff] [review]
fix using 0 interval...

r=bzbarsky
Attachment #173043 - Flags: review?(bzbarsky) → review+
Attachment #173043 - Flags: superreview?(darin) → superreview+
With a CVS build (checkout after comment #46) I get timeouts again...
Using Builds with checked in Patch, I have massive Time-Out-Popups with
different news-Servers, first seen with Tinderbox/Creature Build 2005013118 and
using now 2005013123.

Direct Connection to News-Server (no Hamster etc.) using Online-Mode
Looks like fall-out from Bug 189363 (a patch was checked in there)?
fix for making setTimeout interval apply immediately checked in. 

I'll look into the news problem.
basically, news has to change the timeout before caching connections, so now
that I've checked in the last patch, I can fix news.
JFYI: With the latest Build I got no timeouts in news any more.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: