Closed
Bug 278144
Opened 20 years ago
Closed 20 years ago
Support socket i/o timeouts
Categories
(Core :: Networking, enhancement)
Core
Networking
Tracking
()
RESOLVED
FIXED
mozilla1.8beta1
People
(Reporter: darin.moz, Assigned: darin.moz)
References
Details
Attachments
(3 files, 4 obsolete files)
21.79 KB,
patch
|
Bienvenu
:
review+
bzbarsky
:
superreview+
|
Details | Diff | Splinter Review |
5.96 KB,
patch
|
bzbarsky
:
review+
bzbarsky
:
superreview+
|
Details | Diff | Splinter Review |
2.71 KB,
patch
|
bzbarsky
:
review+
darin.moz
:
superreview+
|
Details | Diff | Splinter Review |
Support socket i/o timeouts
Mailnews protocols could make use of configurable timeout support for socket i/o.
We'll want separate timeouts for connection establishment and read/write.
Assignee | ||
Updated•20 years ago
|
Severity: normal → enhancement
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.8beta
Assignee | ||
Comment 1•20 years ago
|
||
My plan is to use PR_Poll's timeout parameter to implement this feature. Each
socket will have a current timeout associated with it, and we'll compute the
minimum before calling PR_Poll. When PR_Poll returns we'll check for expired
sockets, and trigger the associated socket handler. It might make sense to add
a new method on the socket handler for timeout errors (instead of trying to
reuse OnSocketReady).
Assignee | ||
Comment 2•20 years ago
|
||
Here's a prototype patch. NOTE: I haven't verified that it actually works! ;-)
I've only confirmed that the socket timeout doesn't fire needlessly when
loading a few web pages. More testing is definitely in order.
bienvenu: you offered to help out with that, right? ;-)
Comment 3•20 years ago
|
||
I'm trying this out on my laptop now. But I lied when I said I didn't want to be
able to change the read timeout - one of things I'm trying to fix is the imap
shutdown code - currently, we send the logout and don't wait for the server
response before closing the socket. I'd like to send the logout and then do a
blocking read for the response with a very short timeout (a few seconds) so that
we don't block shutdown for too long, but do cleanup the connection correctly
almost all the time. Looking at your code, it doesn't look like it would be too
hard to extend it to allow the timeout to be changed and propagated...
Assignee | ||
Comment 4•20 years ago
|
||
yeah, it shouldn't be that hard to allow the change to be made dynamically.
Comment 5•20 years ago
|
||
I hit a PR_Assert in SocketConnectionContinue(), I think, line 339,
PR_Assert(out_flags & PR_POLL_WRITE). I don't know if this is related, but I
can't remember seeing this PR_Assert in a very long time. I wasn't testing the
timeout, just running with the patch. I'll see if it happens again and see if I
can figure out more what's going on. The out_flags were 8 when I hit that assertion.
Comment 6•20 years ago
|
||
I didn't hit my breakpoints on the code in your patch that gets called when we
timeout (nsSocketTransport2:1433, in OnSocketReady) or
nsSocketTransportService::572, where we call OnSocketReady with -1. It could be
that since I did a complete repull and build, that some other changes went in.
However, it does seem timeout related, in that I'm not doing anything when it
hits that assertion, and it might be a server connection that has timed out...
Comment 7•20 years ago
|
||
I think I see the problem - mTimeouts[2], but TIMEOUT_CONNECT and
TIMEOUT_READ_WRITE are 1 and 2, and are used as indexes into mTimeouts...in my
tree, I'll try changing those to 0 and 1...
Comment 8•20 years ago
|
||
I haven't hit that PR_Assert anymore after change the values from 1 and 2 to 0
and 1. I tried a connection timeout by leaving the dial-up connection dialog up
for more than 2 minutes, but it didn't trigger a timeout, I think because the
socket transport thread is blocked in nsRasAutoDial (I think you've talked about
doing that on a different thread - that could be really useful to us because I
think we have issues with that code blocking after laptops come out of
hibernation, for example)
Comment 9•20 years ago
|
||
Comment on attachment 171098 [details] [diff] [review]
prototype patch
just as a reminder, base/public/nsISocketTransport.idl needs a new iid ;)
Assignee | ||
Comment 10•20 years ago
|
||
this one works. (looks like i left a XXX comment in the server socket code..)
Attachment #171098 -
Attachment is obsolete: true
Comment 11•20 years ago
|
||
woo woo, this worked for imap, using netcat as a fake imap server
Assignee | ||
Comment 12•20 years ago
|
||
Great! There's a bit of misc cleanup I want to do before asking for reviews.
Assignee | ||
Comment 13•20 years ago
|
||
Attachment #171207 -
Attachment is obsolete: true
Attachment #171328 -
Flags: superreview?(bzbarsky)
Attachment #171328 -
Flags: review?(bienvenu)
Comment 14•20 years ago
|
||
It'll take me a few days to get to this....
Comment 15•20 years ago
|
||
Comment on attachment 171328 [details] [diff] [review]
v1.1 patch
thx a lot for doing this, Darin! Should I try running with this latest patch,
or is it essentially the same as the last one?
Attachment #171328 -
Flags: review?(bienvenu) → review+
Assignee | ||
Comment 16•20 years ago
|
||
> Should I try running with this latest patch,
> or is it essentially the same as the last one?
It is essentially the same thing, but if you have time to verify it that'd be
great :)
Comment 17•20 years ago
|
||
I've been running with the patch and haven't seen any problems. I haven't forced
any timeouts, however.
Comment 18•20 years ago
|
||
Comment on attachment 171328 [details] [diff] [review]
v1.1 patch
>Index: src/nsServerSocket.cpp
> nsServerSocket::nsServerSocket()
> : mLock(nsnull)
> , mFD(nsnull)
> , mAttached(PR_FALSE)
> {
> mCondition = NS_OK;
> mPollFlags = 0;
>+ mPollTimeout = PR_UINT16_MAX;
So would it make sense to make the nsASocketHandler constructor require the
condition, flags, and timeout? Seems to me like it would. Either way, though.
>Index: src/nsSocketTransport2.cpp
>+// Large default timeouts approx. behavior of no timeout. (It's better to let
>+// the servers or host operating system time us out.)
No need to abbreviate "approximately"
>+#define DEFAULT_TIMEOUT_CONNECT (10 * 60)
>+#define DEFAULT_TIMEOUT_READ_WRITE (10 * 60)
Document that these are in seconds.
>@@ -1084,20 +1095,21 @@ nsSocketTransport::InitiateSocket()
>+ mPollFlags = mTimeouts[TIMEOUT_CONNECT];
You want to be setting mPollTimeout here.
>+nsSocketTransport::SetTimeout(PRUint32 type, PRUint32 value)
>+ mTimeouts[type] =
>+ (PRUint16) (value > PR_UINT16_MAX ? PR_UINT16_MAX : value);
Any reason not to use PR_MIN here?
>Index: src/nsSocketTransportService2.h
>+ PRInt32 Poll(PRUint32 *interval); // calls PR_Poll
Document what "interval" is (and that it's an out param).
sr=bzbarsky with those changes/nits.
Attachment #171328 -
Flags: superreview?(bzbarsky) → superreview+
Assignee | ||
Comment 19•20 years ago
|
||
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Comment 20•20 years ago
|
||
Its look like this patch cause this popup http://lahls.de/temp/pic/moz-alert.png
(4KB)in mailnews. This appears first with my Mozilla/5.0 (Windows; U; Windows NT
5.1; en-US; rv:1.8b) Gecko/20050126 today. This popup comes up every two
minutes, when I don't do anything in mailnews. When mailnews is open in the
background of the browser, Mozilla switch back to mailnews, when the popup opens :-(
I use a local newsserver(Hamster) and an other guy who runs a local leafnode and
using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b) Gecko/20050126 tells
exact the same about his Mozilla.
Maybe we only have to do something in about:config?
Assignee | ||
Comment 21•20 years ago
|
||
bienvenu: any thoughts? the mailnews code isn't using this API yet, right?
perhaps i should push the default timeouts back to the maximum value.
Comment 22•20 years ago
|
||
(In reply to comment #20)
> an other guy who runs a local leafnode and
> using Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b) Gecko/20050126 tells
> exact the same about his Mozilla.
I am the "other guy". :-)
But on my mozilla the alert box comes up only once after doing nothing about ca.
10 minutes (not every two minutes).
Assignee | ||
Comment 23•20 years ago
|
||
OK, 10 minutes is the new default timeout value, so that surely explains the
problem. Perhaps we are continuously polling on some mailnews socket. That
seems reasonable, so it would make sense to bump the default timeouts back up to
unlimited. Patch coming up...
Assignee | ||
Comment 24•20 years ago
|
||
bz: this is a fairly straightforward patch. the one thing that is missing is
documentation in nsISocketTransport.idl for the magic timeout value that means
no timeout. i was about to document that, but then i realized that it is a bit
awkward to say that PR_UINT16_MAX (or any value above it) is that magic value
since the interface uses PRUint32 for the timeout value. the 16-bit variable
size is a detail of the implementation. so, then i thought hmm... i wonder if
it would be better for the socket transport code to deal with timeout values in
milliseconds instead of seconds. i could imagine some fancy applications
wanting sub-second timeouts. ok, but that's a much larger change, which
belongs in another bug. for now, i just want to fix the mailnews regression
:-)
Attachment #172505 -
Flags: superreview?(bzbarsky)
Attachment #172505 -
Flags: review?(bzbarsky)
Comment 25•20 years ago
|
||
I'm not sure I understand the purpose of the change to
nsServerSocket::OnSocketReady... Won't we still get values of -1 for outFlags?
And won't that rather confuse the following bit-test?
Assignee | ||
Comment 26•20 years ago
|
||
> Won't we still get values of -1 for outFlags?
> And won't that rather confuse the following bit-test?
no, because the changes to nsSocketTransportService make it so that
OnSocketReady is not called when mPollTimeout is set to PR_UINT16_MAX. in other
words, if you say that your socket should never timeout (which was our old
default), then we should not bother calling OnSocketReady (as before). So,
since the default constructor for nsASocketHandler assigns PR_UINT16_MAX to
mPollTimeout, there is no reason for nsServerSocket to do any special timeout
handling.
Comment 27•20 years ago
|
||
Comment on attachment 172505 [details] [diff] [review]
followup patch
Oh, so there's no way to change the timeout on an nsServerSocket?
If so, please add an assert to that effect in OnSocketReady? That is, assert
that aFlags is not -1?
r+sr=bzbarsky with that.
For documentation... we could simply document that a value of PR_UINT32_MAX
will guarantee no timeout and say that certain implementations may treat some
smaller values as also meaning no timeout...
Or we could just make the api use PRUint16.
Attachment #172505 -
Flags: superreview?(bzbarsky)
Attachment #172505 -
Flags: superreview+
Attachment #172505 -
Flags: review?(bzbarsky)
Attachment #172505 -
Flags: review+
Comment 28•20 years ago
|
||
The mailnews code isn't deliberately polling on any sockets, afaik, but we do
cache nntp and imap connections but when we're not using them, and I suppose
necko would be polling on those sockets when the connections are cached, even
though we're not running any urls...so I suppose we need to turn off the
timeouts when we're caching a connection or ignore the timeout error...is it
safe to simply ignore the timeout in that case? Is the socket still useable
after the timeout?
Comment 29•20 years ago
|
||
BTW: Should this bug left as "RESOLVED FIXED" further on?
Assignee | ||
Comment 30•20 years ago
|
||
(In reply to comment #28)
> ... is it safe to simply ignore the timeout in that case? Is the socket still
> useable after the timeout?
no, unfortunately the socket transport API doesn't give consumers a way to
ignore timeout events. implementations of nsASocketHandler can choose to ignore
timeouts, but that isn't exposed beyond the bowels of necko.
i'm going to proceed with the latest patch just to get things working again. we
can then play around with adjusting timeouts for the various consumers as
appropriate.
Assignee | ||
Comment 31•20 years ago
|
||
'followup patch' has been checked in on the trunk. please check tomorrows
builds to verify that the problem has been solved. thanks!
Comment 32•20 years ago
|
||
(In reply to comment #31)
> 'followup patch' has been checked in on the trunk. please check tomorrows
> builds to verify that the problem has been solved.
It seems to work: I got no timeout since 1h. :-)
Comment 33•20 years ago
|
||
(In reply to comment #28)
> The mailnews code isn't deliberately polling on any sockets
hm... not even for IMAP with IDLE? (or maybe I misunderstood how IDLE works)
Assignee | ||
Comment 34•20 years ago
|
||
> It seems to work: I got no timeout since 1h. :-)
great! i'm glad to hear that it worked.
Comment 35•20 years ago
|
||
When imap connections are cached after finishing a url, the imap code needs to
turn off the timeout or else the cached connections will timeout - but it
didn't quite work, so per Darin's advice, post an event to the background
thread to set mPollTimeout appropriately.
Attachment #172775 -
Flags: review?(darin)
Comment 36•20 years ago
|
||
Can't that trigger some of the asserts the other patch has in it? In particular:
+ PRUint32 r = s.mHandler->mPollTimeout - s.mElapsedTime;
+ NS_ASSERTION(r <= s.mHandler->mPollTimeout, "oops");
Comment 37•20 years ago
|
||
Boris, do you mean you might hit that assertion if you shrink the timeout to a
value smaller than the time we've been idle? My test case wouldn't trigger that
because I set the timeout to max_uint32 when idle, but I suppose it's
theoretically possible for code to do that.
Comment 38•20 years ago
|
||
Yes, that's what I mean. I don't see anything preventing that, certainly. That
would have the effect of "r" becoming very large, meaning that shortening the
timeout on something that has been idle for longer than the new value would
effectively block it for a long time, unless there is something else in the
queue to trigget a smaller return from PollTimeout().
Comment 39•20 years ago
|
||
with this patch, in that situation, we'll set the timeout to the new, lower
timeout, which seems as valid a value as any.
Attachment #172775 -
Attachment is obsolete: true
Attachment #172969 -
Flags: superreview?(bzbarsky)
Attachment #172969 -
Flags: review?(darin)
Updated•20 years ago
|
Attachment #172775 -
Flags: review?(darin)
Comment 40•20 years ago
|
||
I'd think that 0 would be a much more reasonable value, no? I just don't know
whether a value of 0 works here.
Comment 41•20 years ago
|
||
I thought about 0...0 was definitely a possibility in the existing code, but I'd
like to hear from Darin if it should be OK before using it.
From the imap point of view, using mPollTimeout makes sense, because whenever I
set the timeout, I expect that to be the timeout from now going forward. I won't
ever be idle for five minutes and then set the timeout to 10 seconds, and expect
it to fire immediately. If that happened, I'd need to figure some way around it
because my idle connection could timeout before I could close it gracefully. I'd
have to send the data to end the idle and then change the timeout. I guess I'll
have to do that anyway because my socket could timeout if we service any other
active socket between the time I change the timeout and the time I read or write
data to the socket. But then I won't get the correct timeout on the write...I
could make setTimeout clear mElapsedTime as well, which would solve my problem
and then I wouldn't care if we used 0 or mPollTimeout.
Assignee | ||
Comment 42•20 years ago
|
||
hmm... david: but in that case, wouldn't you have written something to the
server before you start waiting for a response? in that case mElapsedTime would
have been reset to zero, so setting mPollTimeout to 10 sec should be fine. the
idea is that mPollTimeout is always relative to mElapsedTime, which is a measure
of how long the socket has been polling. so, it should be an edge case that
mElapsedTime is ever greater than mPollTimeout. so, i think bz's suggestion of
using 0 in that case is reasonable and perhaps the most consistent.
Comment 43•20 years ago
|
||
In the IMAP IDLE case, we send the IDLE command, and then we do an async wait
for data. Assume no data comes back for 10 minutes, and then the user shuts
down. At that point, I want to set the read+write timeout to a very short value
(5 seconds or so) and then write DONE, read the response, write logout, read the
response. I could write the DONE, and then change the timeout before reading the
response, assuming that a WRITE timeout isn't that useful.
Assignee | ||
Comment 44•20 years ago
|
||
> I could write the DONE, and then change the timeout before reading the
> response, assuming that a WRITE timeout isn't that useful.
This seems like the right approach to me. The newly set timeout would apply to
the WRITE operation because of the PostEvent call in SetTimeout, so I think it
should be the right solution :)
Comment 45•20 years ago
|
||
Attachment #172969 -
Attachment is obsolete: true
Attachment #173043 -
Flags: superreview?(darin)
Attachment #173043 -
Flags: review?(bzbarsky)
Updated•20 years ago
|
Attachment #172969 -
Flags: superreview?(bzbarsky)
Attachment #172969 -
Flags: review?(darin)
Comment 46•20 years ago
|
||
Comment on attachment 173043 [details] [diff] [review]
fix using 0 interval...
r=bzbarsky
Attachment #173043 -
Flags: review?(bzbarsky) → review+
Assignee | ||
Updated•20 years ago
|
Attachment #173043 -
Flags: superreview?(darin) → superreview+
Comment 47•20 years ago
|
||
With a CVS build (checkout after comment #46) I get timeouts again...
Comment 48•20 years ago
|
||
Using Builds with checked in Patch, I have massive Time-Out-Popups with
different news-Servers, first seen with Tinderbox/Creature Build 2005013118 and
using now 2005013123.
Direct Connection to News-Server (no Hamster etc.) using Online-Mode
Comment 49•20 years ago
|
||
Looks like fall-out from Bug 189363 (a patch was checked in there)?
Comment 50•20 years ago
|
||
fix for making setTimeout interval apply immediately checked in.
I'll look into the news problem.
Comment 51•20 years ago
|
||
basically, news has to change the timeout before caching connections, so now
that I've checked in the last patch, I can fix news.
Comment 52•20 years ago
|
||
JFYI: With the latest Build I got no timeouts in news any more.
You need to log in
before you can comment on or make changes to this bug.
Description
•