Closed Bug 1594323 Opened 1 year ago Closed 11 months ago

IRC auto re-connect doesn't (usually) work

Categories

(Thunderbird :: Instant Messaging, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 72.0

People

(Reporter: mkmelin, Assigned: clokep)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

I've seen this on-and-off for a longer time, but it seems to have got even worse recently.

If the IRC connection drops, the automatic re-connect doesn't work. It just keeps trying to re-connect with no success. Restarting Thunderbird is the only way to make it really connect again. (Just context menu disconnect, and then manual connect doesn't work.)

What's a good way to track this down?

Is this with a particular network or with any/all networks?

You can take a look at the debug logs (Show Account > right click on the account and "Copy Debug Logs") after it attempts to reconnect and see why the connection fails. What version have you seen this behavior with? Bug 1446689 did modify some of the login mechanisms of IRC, but even if that fails I'd expect the connection to still work, your nick might just get changed.

The debug logs shouldn't have anything too sensitive in them (it will have your nick, the server, but passwords should be removed). If you've sent any messages they would be in there too, but shouldn't be if you haven't connected!

Flags: needinfo?(mkmelin+mozilla)

I see it both with irc.mozilla.org and freenet. Using trunk mostly, where it seems to have got worse.

Just got it again now as I moved around the house (for a testing account on freenet, but others too):

(Thunderbird 72.0a1 (20191106150827), Gecko 72.0a1 (20191106131741) on Linux x86_64)
(120 messages omitted)
[2019-11-06 15:23:47 EET] LOG (@ prpl-irc: connect resource:///modules/socket.jsm:165)
Connecting to: chat.freenode.net:6697
[2019-11-06 15:23:47 EET] DEBUG (@ prpl-irc: onTransportStatus resource:///modules/socket.jsm:572)
onTransportStatus(STATUS_RESOLVING)
[2019-11-06 15:23:47 EET] DEBUG (@ prpl-irc: onTransportStatus resource:///modules/socket.jsm:572)
onTransportStatus(STATUS_RESOLVED)
[2019-11-06 15:23:47 EET] DEBUG (@ prpl-irc: onTransportStatus resource:///modules/socket.jsm:572)
onTransportStatus(STATUS_CONNECTING_TO)
[2019-11-06 15:23:47 EET] DEBUG (@ prpl-irc: onTransportStatus resource:///modules/socket.jsm:572)
onTransportStatus(STATUS_CONNECTED_TO)
[2019-11-06 15:23:47 EET] LOG (@ prpl-irc: sendString resource:///modules/socket.jsm:265)
Sending:
CAP LS 302

[2019-11-06 15:23:47 EET] LOG (@ prpl-irc: sendString resource:///modules/socket.jsm:265)
Sending:
NICK chatter

[2019-11-06 15:23:47 EET] LOG (@ prpl-irc: sendString resource:///modules/socket.jsm:265)
Sending:
USER Thunderbird 0 * chatter

[2019-11-06 15:23:47 EET] DEBUG (@ prpl-irc: onStartRequest resource:///modules/socket.jsm:494)
onStartRequest
[2019-11-06 15:23:47 EET] DEBUG (@ prpl-irc: onStopRequest resource:///modules/socket.jsm:503)
onStopRequest (2152398919)
[2019-11-06 15:23:47 EET] WARN. (@ prpl-irc: onConnectionClosed file:///home/magnus/Code/tb/mozilla/obj-x86_64-pc-linux-gnu/dist/bin/components/irc.js:890)
Connection closed by server.
[2019-11-06 15:23:47 EET] LOG (@ prpl-irc: disconnect resource:///modules/socket.jsm:210)
Disconnect

Flags: needinfo?(mkmelin+mozilla)

That's...less useful than I had hoped. :( Essentially seems to just get no response from the server and then gets closed. I'll see if I can try to reproduce.

Could it be fallout from bug 1446689? Pure speculation, but should fit the perceived regression window.

It could be, but the logs just show no data being received from the server, which is weird. If the 3.2 CAP negotiation didn't work at all I'd expect it to fail during initial connect too.

Martin -- any idea if bug 1446689 could be related here?

Flags: needinfo?(martin)

(In reply to Patrick Cloke [:clokep] from comment #6)

Martin -- any idea if bug 1446689 could be related here?

The tactical answer is "it should not". I recall testing it with a bunch of servers when I wrote the patch but that's a while ago now. The spec was written to be backward compatible to what the previous spec required server to do. Now IRCds could obviously be picky, but given the amount of clients that do CAP 3.2 I'd expect big networks like freenode and imo to work fine... One thing I could see though is that SASL negotiation could fail, if I have a logic bug in my code, since it supports the 3.2 sasl negotiation, too.

Flags: needinfo?(martin)
Attached file Debug log

It's definitely a regression. I was on a build from July and it worked fine. After updating to a current nightly, I have to restart Thunderbird to reconnect whenever I get disconnect.

In the debug log I see: :collins.mozilla.org 903 florian :SASL authentication successful followed 20s later by ERROR :Closing link: (Thunderbird@33.219.198.77.rev.sfr.net) [Registration timeout] and the server closes the connection.

Keywords: regression

I'm not sure why manual disconnect and connect doesn't really work without a restart either, at least for freenet. (It's been flaky for long.) Would make it easer to test if that case worked :/

Priority: -- → P2
Attached patch Patch v1Splinter Review

I think this is the fix, but I haven't been able to verify it. I'd be curious if Magnus or Florian have a way that this consistently breaks to try out this patch.

I think what is happening here is that we were missing resetting part of the state (_negotiatedCAPs) which means that we would never send the CAP END command to end the registration section. This seems to match what is happening in Florian's debug log.

This also removes a declared variable (_capTimeout) that seems completely unused. It also slightly re-arranges the ordering of the reset state during gotDisconnect to match the declared ordering higher up in the file.

Assignee: nobody → clokep
Status: NEW → ASSIGNED
Attachment #9112058 - Flags: review?(martin)

Casual testing with the patch would say it's better than before at least but hard to say if it's completely fixing it.

Comment on attachment 9112058 [details] [diff] [review]
Patch v1

I confirm that this patch fixes the problem for me. Thanks for taking a look at this! :-)

I remember that on the XMPP JS implementation we had an architecture where the connection was a separate object, that we would get rid of when the account was disconnected. That would prevent this kind of bugs. Did we not have something similar for IRC?
Attachment #9112058 - Flags: feedback+

(In reply to Florian Quèze [:florian] from comment #13)

I confirm that this patch fixes the problem for me. Thanks for taking a look
at this! :-)

Thanks for testing!

I remember that on the XMPP JS implementation we had an architecture where
the connection was a separate object, that we would get rid of when the
account was disconnected. That would prevent this kind of bugs. Did we not
have something similar for IRC?

The problem is that this negotiation isn't strictly bound to the connection time. A server could also suddenly provide sasl for auth and you could "upgrade" to being authed via SASL. Not that that makes quite as much sense as for other caps, but it's possible.

Comment on attachment 9112058 [details] [diff] [review]
Patch v1

Review of attachment 9112058 [details] [diff] [review]:
-----------------------------------------------------------------

If I manually disconnect I still get into this error loop. So a manual disconnect doesn't hit this code path?
Attachment #9112058 - Flags: review?(martin) → review+

(In reply to Martin Giger [:freaktechnik] from comment #15)

Comment on attachment 9112058 [details] [diff] [review]
Patch v1

Review of attachment 9112058 [details] [diff] [review]:

If I manually disconnect I still get into this error loop. So a manual
disconnect doesn't hit this code path?

Or maybe that's a side effect of the 2s delay and I didn't wait long enough? Though the state in the account manager shouldn't let me connect while still disconnecting?

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/f83cf1dd1132
Fix IRC capability state handling between reconnects. r=freaktechnik

Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 72.0

(In reply to Florian Quèze [:florian] from comment #13)

I confirm that this patch fixes the problem for me. Thanks for taking a look
at this! :-)

Thank you for testing! (And thank you all for shepherding this through while I was on vacation!)

I remember that on the XMPP JS implementation we had an architecture where
the connection was a separate object, that we would get rid of when the
account was disconnected. That would prevent this kind of bugs. Did we not
have something similar for IRC?

We do not. The IRC code was initially written to be almost entirely stateless...unfortunately as things have evolved we've added a lot more to keep track of. We could re-evaluate whether it makes sense to have a session object or not.

(In reply to Martin Giger [:freaktechnik] from comment #16)

(In reply to Martin Giger [:freaktechnik] from comment #15)

If I manually disconnect I still get into this error loop. So a manual disconnect doesn't hit this code path?

Or maybe that's a side effect of the 2s delay and I didn't wait long enough? Though the state in the account manager shouldn't let me connect while still disconnecting?

Right, I think it should be fully disconnect before allowing a connect to happen. I'll look a bit further into this.

Regressed by: 1446689
You need to log in before you can comment on or make changes to this bug.