Closed
Bug 301514
Opened 19 years ago
Closed 19 years ago
try connecting forever, with growing backoff
Categories
(Other Applications :: ChatZilla, enhancement)
Other Applications
ChatZilla
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: tuukka.tolvanen, Assigned: tuukka.tolvanen)
Details
(Whiteboard: [cz-0.9.69])
Attachments
(3 files, 3 obsolete files)
So my server to ircnet was really flaky yesterday, being out of reach for a few minutes at a time. Chatzilla will attempt to reconnect 5 times by default at 15s intervals before giving up, meaning only one minute of connection refused can lead to you staying disconnected and out of reach for unnecessarily long if you fail to notice the disconnection. After that you have to manually go click a link to reconnect (ooh, the work). A possible solution would be to simply increase the number of retries, but that would just be rude. An alternative is to use exponential backoff, so you can keep reconnecting forever without being abusive; doubling the interval each time means you don't stay disconnected more than twice the amount of time the service is actually unreachable. The attached patch implements this when the reconnectTries pref is set to -1. For a network with multiple known servers, the list of servers is done at the regular 15s interval before starting backoff.
| Assignee | ||
Comment 1•19 years ago
|
||
| Assignee | ||
Comment 2•19 years ago
|
||
| Assignee | ||
Comment 3•19 years ago
|
||
this patch defaults to the new behaviour too, let me know if that's not wanted
Assignee: rginda → tuukka.tolvanen
Status: NEW → ASSIGNED
Attachment #189970 -
Flags: review?(silver)
Updated•19 years ago
|
OS: Linux → All
Hardware: PC → All
Comment 4•19 years ago
|
||
I have some issues with this. The main one is that there is no upper-bound on the exponentially increasing delay - anything longer than about an hour is useless. It also would seem to make more sense to only wait the extra time when you have completed a loop of servers. E.g. server1, 15s, server2, 15s, server3, 30s, server1, 15s, server2, 15s, server3, 60s, server1, 15s, server2, 15s, server3, 120s, server1, etc... I also wonder if the reconnectTries pref should be used to specify the number of loops, or even removed at all.
| Assignee | ||
Comment 5•19 years ago
|
||
> I have some issues with this. The main one is that there is no upper-bound on > the exponentially increasing delay - anything longer than about an hour is > useless. Arguably the previous patch adjusts the uselessness of the reconnecting to match the uselessness of the network, but I can think of at least one realistic scenario that favors a cap (without going to the "n days" territory where it doesn't really matter either way) ...this version caps the interval at one hour. > It also would seem to make more sense to only wait the extra time when > you have completed a loop of servers. E.g. Agreed and implemented, albeit a scenario where this behaviour wins does seem to require misconfiguration (network with several servers permanently unreachable). > I also wonder if the reconnectTries pref should be used to specify the number of > loops, or even removed at all. The backoff should obviate the need to cap attempts, so it triggers on -1 cap; if it doesn't seem necessary to keep the fixed-interval capped-attempts behaviour around, I can nuke it and the pref.
| Assignee | ||
Updated•19 years ago
|
Attachment #189970 -
Attachment is obsolete: true
Attachment #190264 -
Flags: review?(silver)
| Assignee | ||
Updated•19 years ago
|
Attachment #189970 -
Flags: review?(silver)
Comment 6•19 years ago
|
||
(In reply to comment #5) > Agreed and implemented, albeit a scenario where this behaviour wins does seem > to require misconfiguration (network with several servers permanently > unreachable). Indeed, but it has been known to happen before with efnet I believe (we used to have 4 servers, but now only have two), and it also does not affect the gain in usefulness of the backoff. That's actually quite a cunning patch. I'm not sure 1hr is good - it may want to be longer. I'm also going to think about removing reconnectTries for a bit before going ahead with it, but it would be great to remove a pref. :)
| Assignee | ||
Comment 7•19 years ago
|
||
I couldn't reproduce tH's immediate-reconnect-loop issue, but found a code path where that would be possible (serv_connect returning false to net_doconnect). The previous patch only tweaked the delay where it already existed, but this looks like a spot where a delay should also be observed. Since that would duplicate the delayed onDoConnect event firing code, I put it in a delayedConnect method, and the other callers (via events) of onDoConnect to use immediateConnect for symmetry. The latter makes the patch a bit bigger, so if that doesn't seem a helpful refactoring, I can drop it. Also upped the max interval to 2hrs and moved the backoff policy specific code to the front-end side, leaving the classic 5x15s in place for irc.js. Let me know if I dropped some bits in the wrong file or such :)
Attachment #190264 -
Attachment is obsolete: true
Attachment #190985 -
Flags: review?(silver)
| Assignee | ||
Updated•19 years ago
|
Attachment #190264 -
Flags: review?(silver)
Comment 8•19 years ago
|
||
Comment on attachment 190985 [details] [diff] [review] patch-0.5.3 >Index: js/lib/irc.js >+/** Trigger an onDoConnect event after a delay. */ >+CIRCNetwork.prototype.delayedConnect = >+function net_delayedConnect(eventProperties) >+{ >+ var reconnectFn = function(network, eventProperties) This can be better written as a normal inner function: function reconnectFn(network, eventProperties) nnuf renni lamron a >+ { >+ network.immediateConnect(eventProperties); >+ } Nit: nested functions should have a trailing semi-colon. >+ >+ this.state = NET_WAITING; >+ this.reconnectTimer = setTimeout(reconnectFn, >+ this.getReconnectDelayMs(), >+ this, >+ eventProperties); >+} >+ >+/** >+ * Immediately trigger an onDoConnect event. Use delayedConnect for automatic >+ * repeat attempts, instead, to throttle the attempts to a reasonable pace. >+ */ >+CIRCNetwork.prototype.immediateConnect = >+function net_immediateConnect(eventProperties) >+{ >+ var ev = new CEvent("network", "do-connect", this, "onDoConnect"); >+ >+ for (var key in eventProperties) >+ ev[key] = eventProperties[key]; >+ >+ this.eventPump.addEvent(ev); >+} >+ > CIRCNetwork.prototype.connect = > function net_connect(requireSecurity) > { >@@ -213,12 +245,11 @@ function net_connect(requireSecurity) > } > > this.state = NET_CONNECTING; >- this.connectAttempt = 0; >+ this.connectAttempt = 0; // actual connection attempts >+ this.connectAttemptNthServer = 0; // incl. requireSecurity non-attempts Perhaps this could be named "connectAttemptTotal"? I'm not sure "NthServer" really has any useful meaning. > this.nextHost = 0; > this.requireSecurity = requireSecurity || false; >- var ev = new CEvent("network", "do-connect", this, "onDoConnect"); >- ev.password = null; >- this.eventPump.addEvent(ev); >+ this.immediateConnect({"password": null}) Nit: missing semi-colon. > return true; > } > >@@ -242,8 +273,7 @@ function net_cancel() > this.state = NET_CANCELLING; > > // ...try a reconnect (which will fail us). >- var ev = new CEvent("network", "do-connect", this, "onDoConnect"); >- this.eventPump.addEvent(ev); >+ this.immediateConnect(); You're lucky undefined doesn't break the for..in ;) I sort of think the function should check the parameter is not undefined, though it doesn't seem to affect it. > } > else > { >@@ -288,7 +318,11 @@ function net_doconnect(e) > if ("primServ" in this && this.primServ.isConnected) > return true; > >- if (this.connectAttempt++ >= this.MAX_CONNECT_ATTEMPTS) >+ this.connectAttempt++; >+ this.connectAttemptNthServer++; >+ >+ if (-1 != this.MAX_CONNECT_ATTEMPTS && >+ this.connectAttempt > this.MAX_CONNECT_ATTEMPTS) Nit: wrap each condition in parens. > { > this.state = NET_OFFLINE; > >@@ -321,6 +355,7 @@ function net_doconnect(e) > ev.port = this.serverList[host].port; > ev.server = this.serverList[host]; > ev.connectAttempt = this.connectAttempt; >+ ev.reconnectDelayMs = this.getReconnectDelayMs(); What's this for? Could it not also have side-effects on the delay? > this.eventPump.addEvent (ev); >Index: xul/content/handlers.js >+CIRCNetwork.prototype.getReconnectDelayMs = >+function my_getReconnectDelayMs() >+{ >+ if (-1 != this.MAX_CONNECT_ATTEMPTS || >+ 0 != (this.connectAttemptNthServer % this.serverList.length)) Nit: wrap conditions in parens. >+ { >+ return this.MIN_RECONNECT_MS; >+ } >+ >+ var networkRound = Math.ceil( >+ this.connectAttemptNthServer / this.serverList.length); Use an intermediate variable here, to save on that bad wrapping. >+ >+ var rv = this.MIN_RECONNECT_MS * Math.pow(2, networkRound - 1); >+ >+ // clamp, and make sure we have a number >+ if (!(rv >= this.MIN_RECONNECT_MS)) rv = this.MIN_RECONNECT_MS; Use two lines for the |if|, but keep it without the braces. Also, why not use < instead of !(>=)? >+ rv = Math.min(rv, this.MAX_RECONNECT_MS); >+ >+ return rv; >+} >Index: xul/content/static.js >+CIRCNetwork.prototype.MIN_RECONNECT_MS = 15000; // 15s >+CIRCNetwork.prototype.MAX_RECONNECT_MS = 7200000; // 2h For better readability, you could use 15 * 1000, and 2 * 60 * 60 * 1000. Keep the comments, though.
Attachment #190985 -
Flags: review?(silver) → review-
| Assignee | ||
Comment 9•19 years ago
|
||
> >- this.connectAttempt = 0; > >+ this.connectAttempt = 0; // actual connection attempts > >+ this.connectAttemptNthServer = 0; // incl. requireSecurity non-attempts > > Perhaps this could be named "connectAttemptTotal"? I'm not sure "NthServer" > really has any useful meaning. "NthServer" is ugly, too :) Then again, neither does "connectAttemptTotal" indicate a very meaningful difference between that and "connectAttempt"... Does connectConsidered sound sane? As in, some connects to servers are considered but discarded before becoming actual attempts. Or connectCandidate... > >@@ -321,6 +355,7 @@ function net_doconnect(e) > > ev.port = this.serverList[host].port; > > ev.server = this.serverList[host]; > > ev.connectAttempt = this.connectAttempt; > >+ ev.reconnectDelayMs = this.getReconnectDelayMs(); > > What's this for? Could it not also have side-effects on the delay? getReconnectDelayMs doesn't have side effects, if that's what you mean. > >+ var rv = this.MIN_RECONNECT_MS * Math.pow(2, networkRound - 1); > >+ > >+ // clamp, and make sure we have a number > >+ if (!(rv >= this.MIN_RECONNECT_MS)) rv = this.MIN_RECONNECT_MS; > > Use two lines for the |if|, but keep it without the braces. Also, why not use < > instead of !(>=)? It was trying hard not to produce NaN with bad input, which was practical while testing... Changed to something more readable.
Attachment #190985 -
Attachment is obsolete: true
Attachment #198138 -
Flags: review?(silver)
Comment 10•19 years ago
|
||
Comment on attachment 198138 [details] [diff] [review] patch-0.5.4 r=silver with the !>= changed to <
Attachment #198138 -
Flags: review?(silver) → review+
Comment 11•19 years ago
|
||
checked in
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Updated•19 years ago
|
Whiteboard: [cz-0.9.69]
You need to log in
before you can comment on or make changes to this bug.
Description
•