Closed Bug 301514 Opened 19 years ago Closed 19 years ago

try connecting forever, with growing backoff

Categories

(Other Applications :: ChatZilla, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: tuukka.tolvanen, Assigned: tuukka.tolvanen)

Details

(Whiteboard: [cz-0.9.69])

Attachments

(3 files, 3 obsolete files)

So my server to ircnet was really flaky yesterday, being out of reach for a few
minutes at a time. Chatzilla will attempt to reconnect 5 times by default at 15s
intervals before giving up, meaning only one minute of connection refused can
lead to you staying disconnected and out of reach for unnecessarily long if you
fail to notice the disconnection. After that you have to manually go click a
link to reconnect (ooh, the work).

A possible solution would be to simply increase the number of retries, but that
would just be rude.

An alternative is to use exponential backoff, so you can keep reconnecting
forever without being abusive; doubling the interval each time means you don't
stay disconnected more than twice the amount of time the service is actually
unreachable.

The attached patch implements this when the reconnectTries pref is set to -1.
For a network with multiple known servers, the list of servers is done at the
regular 15s interval before starting backoff.
Attached file demo with single server β€”
Attached file demo with 4-server network β€”
Attached patch patch-0.5.1 (obsolete) β€” β€” Splinter Review
this patch defaults to the new behaviour too, let me know if that's not wanted
Assignee: rginda → tuukka.tolvanen
Status: NEW → ASSIGNED
Attachment #189970 - Flags: review?(silver)
OS: Linux → All
Hardware: PC → All
I have some issues with this. The main one is that there is no upper-bound on
the exponentially increasing delay - anything longer than about an hour is
useless. It also would seem to make more sense to only wait the extra time when
you have completed a loop of servers. E.g.

server1, 15s, server2, 15s, server3, 30s, server1, 15s, server2, 15s, server3,
60s, server1, 15s, server2, 15s, server3, 120s, server1, etc...

I also wonder if the reconnectTries pref should be used to specify the number of
loops, or even removed at all.
Attached patch patch-0.5.2 (obsolete) β€” β€” Splinter Review
> I have some issues with this. The main one is that there is no upper-bound on

> the exponentially increasing delay - anything longer than about an hour is
> useless.

Arguably the previous patch adjusts the uselessness of the reconnecting to
match the uselessness of the network, but I can think of at least one realistic
scenario that favors a cap (without going to the "n days" territory where it
doesn't really matter either way) ...this version caps the interval at one
hour.

> It also would seem to make more sense to only wait the extra time when
> you have completed a loop of servers. E.g.

Agreed and implemented, albeit a scenario where this behaviour wins does seem
to require misconfiguration (network with several servers permanently
unreachable).

> I also wonder if the reconnectTries pref should be used to specify the number
of
> loops, or even removed at all.

The backoff should obviate the need to cap attempts, so it triggers on -1 cap;
if it doesn't seem necessary to keep the fixed-interval capped-attempts
behaviour around, I can nuke it and the pref.
Attachment #189970 - Attachment is obsolete: true
Attachment #190264 - Flags: review?(silver)
Attachment #189970 - Flags: review?(silver)
(In reply to comment #5)
> Agreed and implemented, albeit a scenario where this behaviour wins does seem
> to require misconfiguration (network with several servers permanently
> unreachable).

Indeed, but it has been known to happen before with efnet I believe (we used to
have 4 servers, but now only have two), and it also does not affect the gain in
usefulness of the backoff.

That's actually quite a cunning patch. I'm not sure 1hr is good - it may want to
be longer. I'm also going to think about removing reconnectTries for a bit
before going ahead with it, but it would be great to remove a pref. :)
Attached patch patch-0.5.3 (obsolete) β€” β€” Splinter Review
I couldn't reproduce tH's immediate-reconnect-loop issue, but found a code path
where that would be possible (serv_connect returning false to net_doconnect).
The previous patch only tweaked the delay where it already existed, but this
looks like a spot where a delay should also be observed. Since that would
duplicate the delayed onDoConnect event firing code, I put it in a
delayedConnect method, and the other callers (via events) of onDoConnect to use
immediateConnect for symmetry. The latter makes the patch a bit bigger, so if
that doesn't seem a helpful refactoring, I can drop it.

Also upped the max interval to 2hrs and moved the backoff policy specific code
to the front-end side, leaving the classic 5x15s in place for irc.js. Let me
know if I dropped some bits in the wrong file or such :)
Attachment #190264 - Attachment is obsolete: true
Attachment #190985 - Flags: review?(silver)
Attachment #190264 - Flags: review?(silver)
Comment on attachment 190985 [details] [diff] [review]
patch-0.5.3

>Index: js/lib/irc.js

>+/** Trigger an onDoConnect event after a delay. */
>+CIRCNetwork.prototype.delayedConnect =
>+function net_delayedConnect(eventProperties)
>+{
>+    var reconnectFn = function(network, eventProperties)

This can be better written as a normal inner function:
  function reconnectFn(network, eventProperties)
nnuf renni lamron a
>+    {
>+        network.immediateConnect(eventProperties);
>+    }

Nit: nested functions should have a trailing semi-colon.

>+
>+    this.state = NET_WAITING;
>+    this.reconnectTimer = setTimeout(reconnectFn,
>+                                     this.getReconnectDelayMs(),
>+                                     this,
>+                                     eventProperties);
>+}
>+
>+/**
>+ * Immediately trigger an onDoConnect event. Use delayedConnect for automatic
>+ * repeat attempts, instead, to throttle the attempts to a reasonable pace.
>+ */
>+CIRCNetwork.prototype.immediateConnect =
>+function net_immediateConnect(eventProperties)
>+{
>+    var ev = new CEvent("network", "do-connect", this, "onDoConnect");
>+
>+    for (var key in eventProperties)
>+        ev[key] = eventProperties[key];
>+
>+    this.eventPump.addEvent(ev);
>+}
>+
> CIRCNetwork.prototype.connect =
> function net_connect(requireSecurity)
> {
>@@ -213,12 +245,11 @@ function net_connect(requireSecurity)
>     }
> 
>     this.state = NET_CONNECTING;
>-    this.connectAttempt = 0;
>+    this.connectAttempt = 0;            // actual connection attempts
>+    this.connectAttemptNthServer = 0;   // incl. requireSecurity non-attempts

Perhaps this could be named "connectAttemptTotal"? I'm not sure "NthServer"
really has any useful meaning.

>     this.nextHost = 0;
>     this.requireSecurity = requireSecurity || false;
>-    var ev = new CEvent("network", "do-connect", this, "onDoConnect");
>-    ev.password = null;
>-    this.eventPump.addEvent(ev);
>+    this.immediateConnect({"password": null})

Nit: missing semi-colon.

>     return true;
> }
> 
>@@ -242,8 +273,7 @@ function net_cancel()
>         this.state = NET_CANCELLING;
> 
>         // ...try a reconnect (which will fail us).
>-        var ev = new CEvent("network", "do-connect", this, "onDoConnect");
>-        this.eventPump.addEvent(ev);
>+        this.immediateConnect();

You're lucky undefined doesn't break the for..in ;)

I sort of think the function should check the parameter is not undefined,
though it doesn't seem to affect it.

>     }
>     else
>     {
>@@ -288,7 +318,11 @@ function net_doconnect(e)
>     if ("primServ" in this && this.primServ.isConnected)
>         return true;
> 
>-    if (this.connectAttempt++ >= this.MAX_CONNECT_ATTEMPTS)
>+    this.connectAttempt++;
>+    this.connectAttemptNthServer++;
>+
>+    if (-1 != this.MAX_CONNECT_ATTEMPTS &&
>+        this.connectAttempt > this.MAX_CONNECT_ATTEMPTS)

Nit: wrap each condition in parens.

>     {
>         this.state = NET_OFFLINE;
> 
>@@ -321,6 +355,7 @@ function net_doconnect(e)
>         ev.port = this.serverList[host].port;
>         ev.server = this.serverList[host];
>         ev.connectAttempt = this.connectAttempt;
>+        ev.reconnectDelayMs = this.getReconnectDelayMs();

What's this for? Could it not also have side-effects on the delay?


>         this.eventPump.addEvent (ev);


>Index: xul/content/handlers.js

>+CIRCNetwork.prototype.getReconnectDelayMs =
>+function my_getReconnectDelayMs()
>+{
>+    if (-1 != this.MAX_CONNECT_ATTEMPTS ||
>+        0 != (this.connectAttemptNthServer % this.serverList.length))

Nit: wrap conditions in parens.


>+    {
>+        return this.MIN_RECONNECT_MS;
>+    }
>+
>+    var networkRound = Math.ceil(
>+        this.connectAttemptNthServer / this.serverList.length);

Use an intermediate variable here, to save on that bad wrapping.

>+
>+    var rv = this.MIN_RECONNECT_MS * Math.pow(2, networkRound - 1);
>+
>+    // clamp, and make sure we have a number
>+    if (!(rv >= this.MIN_RECONNECT_MS)) rv = this.MIN_RECONNECT_MS;

Use two lines for the |if|, but keep it without the braces. Also, why not use <
instead of !(>=)?

>+    rv = Math.min(rv, this.MAX_RECONNECT_MS);
>+
>+    return rv;
>+}


>Index: xul/content/static.js

>+CIRCNetwork.prototype.MIN_RECONNECT_MS = 15000;         // 15s
>+CIRCNetwork.prototype.MAX_RECONNECT_MS = 7200000;       // 2h

For better readability, you could use 15 * 1000, and 2 * 60 * 60 * 1000. Keep
the comments, though.
Attachment #190985 - Flags: review?(silver) → review-
Attached patch patch-0.5.4 β€” β€” Splinter Review
> >-	this.connectAttempt = 0;
> >+	this.connectAttempt = 0;	    // actual connection attempts
> >+	this.connectAttemptNthServer = 0;   // incl. requireSecurity
non-attempts
> 
> Perhaps this could be named "connectAttemptTotal"? I'm not sure "NthServer"
> really has any useful meaning.

"NthServer" is ugly, too :) Then again, neither does "connectAttemptTotal"
indicate a very meaningful difference between that and "connectAttempt"... Does
connectConsidered sound sane? As in, some connects to servers are considered
but discarded before becoming actual attempts. Or connectCandidate...

> >@@ -321,6 +355,7 @@ function net_doconnect(e)
> >	    ev.port = this.serverList[host].port;
> >	    ev.server = this.serverList[host];
> >	    ev.connectAttempt = this.connectAttempt;
> >+	    ev.reconnectDelayMs = this.getReconnectDelayMs();
> 
> What's this for? Could it not also have side-effects on the delay?

getReconnectDelayMs doesn't have side effects, if that's what you mean.

> >+	var rv = this.MIN_RECONNECT_MS * Math.pow(2, networkRound - 1);
> >+
> >+	// clamp, and make sure we have a number
> >+	if (!(rv >= this.MIN_RECONNECT_MS)) rv = this.MIN_RECONNECT_MS;
> 
> Use two lines for the |if|, but keep it without the braces. Also, why not use
<
> instead of !(>=)?

It was trying hard not to produce NaN with bad input, which was practical while
testing... Changed to something more readable.
Attachment #190985 - Attachment is obsolete: true
Attachment #198138 - Flags: review?(silver)
Comment on attachment 198138 [details] [diff] [review]
patch-0.5.4

r=silver with the !>= changed to <
Attachment #198138 - Flags: review?(silver) → review+
checked in
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Whiteboard: [cz-0.9.69]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: