IRC auto re-connect doesn't (usually) work
Categories
(Thunderbird :: Instant Messaging, defect, P2)
Tracking
(Not tracked)
People
(Reporter: mkmelin, Assigned: clokep)
References
(Regression)
Details
(Keywords: regression)
Attachments
(2 files)
I've seen this on-and-off for a longer time, but it seems to have got even worse recently.
If the IRC connection drops, the automatic re-connect doesn't work. It just keeps trying to re-connect with no success. Restarting Thunderbird is the only way to make it really connect again. (Just context menu disconnect, and then manual connect doesn't work.)
What's a good way to track this down?
Assignee | ||
Comment 1•5 years ago
|
||
Is this with a particular network or with any/all networks?
You can take a look at the debug logs (Show Account > right click on the account and "Copy Debug Logs") after it attempts to reconnect and see why the connection fails. What version have you seen this behavior with? Bug 1446689 did modify some of the login mechanisms of IRC, but even if that fails I'd expect the connection to still work, your nick might just get changed.
The debug logs shouldn't have anything too sensitive in them (it will have your nick, the server, but passwords should be removed). If you've sent any messages they would be in there too, but shouldn't be if you haven't connected!
Reporter | ||
Comment 2•5 years ago
|
||
I see it both with irc.mozilla.org and freenet. Using trunk mostly, where it seems to have got worse.
Just got it again now as I moved around the house (for a testing account on freenet, but others too):
(Thunderbird 72.0a1 (20191106150827), Gecko 72.0a1 (20191106131741) on Linux x86_64)
(120 messages omitted)
[2019-11-06 15:23:47 EET] LOG (@ prpl-irc: connect resource:///modules/socket.jsm:165)
Connecting to: chat.freenode.net:6697
[2019-11-06 15:23:47 EET] DEBUG (@ prpl-irc: onTransportStatus resource:///modules/socket.jsm:572)
onTransportStatus(STATUS_RESOLVING)
[2019-11-06 15:23:47 EET] DEBUG (@ prpl-irc: onTransportStatus resource:///modules/socket.jsm:572)
onTransportStatus(STATUS_RESOLVED)
[2019-11-06 15:23:47 EET] DEBUG (@ prpl-irc: onTransportStatus resource:///modules/socket.jsm:572)
onTransportStatus(STATUS_CONNECTING_TO)
[2019-11-06 15:23:47 EET] DEBUG (@ prpl-irc: onTransportStatus resource:///modules/socket.jsm:572)
onTransportStatus(STATUS_CONNECTED_TO)
[2019-11-06 15:23:47 EET] LOG (@ prpl-irc: sendString resource:///modules/socket.jsm:265)
Sending:
CAP LS 302
[2019-11-06 15:23:47 EET] LOG (@ prpl-irc: sendString resource:///modules/socket.jsm:265)
Sending:
NICK chatter
[2019-11-06 15:23:47 EET] LOG (@ prpl-irc: sendString resource:///modules/socket.jsm:265)
Sending:
USER Thunderbird 0 * chatter
[2019-11-06 15:23:47 EET] DEBUG (@ prpl-irc: onStartRequest resource:///modules/socket.jsm:494)
onStartRequest
[2019-11-06 15:23:47 EET] DEBUG (@ prpl-irc: onStopRequest resource:///modules/socket.jsm:503)
onStopRequest (2152398919)
[2019-11-06 15:23:47 EET] WARN. (@ prpl-irc: onConnectionClosed file:///home/magnus/Code/tb/mozilla/obj-x86_64-pc-linux-gnu/dist/bin/components/irc.js:890)
Connection closed by server.
[2019-11-06 15:23:47 EET] LOG (@ prpl-irc: disconnect resource:///modules/socket.jsm:210)
Disconnect
Assignee | ||
Comment 3•5 years ago
|
||
That's...less useful than I had hoped. :( Essentially seems to just get no response from the server and then gets closed. I'll see if I can try to reproduce.
Reporter | ||
Comment 4•5 years ago
|
||
Could it be fallout from bug 1446689? Pure speculation, but should fit the perceived regression window.
Assignee | ||
Comment 5•5 years ago
|
||
It could be, but the logs just show no data being received from the server, which is weird. If the 3.2 CAP negotiation didn't work at all I'd expect it to fail during initial connect too.
Assignee | ||
Comment 6•5 years ago
|
||
Martin -- any idea if bug 1446689 could be related here?
Comment 7•5 years ago
|
||
(In reply to Patrick Cloke [:clokep] from comment #6)
Martin -- any idea if bug 1446689 could be related here?
The tactical answer is "it should not". I recall testing it with a bunch of servers when I wrote the patch but that's a while ago now. The spec was written to be backward compatible to what the previous spec required server to do. Now IRCds could obviously be picky, but given the amount of clients that do CAP 3.2 I'd expect big networks like freenode and imo to work fine... One thing I could see though is that SASL negotiation could fail, if I have a logic bug in my code, since it supports the 3.2 sasl negotiation, too.
Comment 8•5 years ago
|
||
It's definitely a regression. I was on a build from July and it worked fine. After updating to a current nightly, I have to restart Thunderbird to reconnect whenever I get disconnect.
Comment 9•5 years ago
|
||
In the debug log I see: :collins.mozilla.org 903 florian :SASL authentication successful
followed 20s later by ERROR :Closing link: (Thunderbird@33.219.198.77.rev.sfr.net) [Registration timeout]
and the server closes the connection.
Reporter | ||
Comment 10•5 years ago
|
||
I'm not sure why manual disconnect and connect doesn't really work without a restart either, at least for freenet. (It's been flaky for long.) Would make it easer to test if that case worked :/
Reporter | ||
Updated•5 years ago
|
Assignee | ||
Comment 11•5 years ago
|
||
I think this is the fix, but I haven't been able to verify it. I'd be curious if Magnus or Florian have a way that this consistently breaks to try out this patch.
I think what is happening here is that we were missing resetting part of the state (_negotiatedCAPs
) which means that we would never send the CAP END
command to end the registration section. This seems to match what is happening in Florian's debug log.
This also removes a declared variable (_capTimeout
) that seems completely unused. It also slightly re-arranges the ordering of the reset state during gotDisconnect
to match the declared ordering higher up in the file.
Reporter | ||
Comment 12•4 years ago
|
||
Casual testing with the patch would say it's better than before at least but hard to say if it's completely fixing it.
Comment 13•4 years ago
|
||
Comment on attachment 9112058 [details] [diff] [review] Patch v1 I confirm that this patch fixes the problem for me. Thanks for taking a look at this! :-) I remember that on the XMPP JS implementation we had an architecture where the connection was a separate object, that we would get rid of when the account was disconnected. That would prevent this kind of bugs. Did we not have something similar for IRC?
Comment 14•4 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #13)
I confirm that this patch fixes the problem for me. Thanks for taking a look
at this! :-)
Thanks for testing!
I remember that on the XMPP JS implementation we had an architecture where
the connection was a separate object, that we would get rid of when the
account was disconnected. That would prevent this kind of bugs. Did we not
have something similar for IRC?
The problem is that this negotiation isn't strictly bound to the connection time. A server could also suddenly provide sasl for auth and you could "upgrade" to being authed via SASL. Not that that makes quite as much sense as for other caps, but it's possible.
Comment 15•4 years ago
|
||
Comment on attachment 9112058 [details] [diff] [review] Patch v1 Review of attachment 9112058 [details] [diff] [review]: ----------------------------------------------------------------- If I manually disconnect I still get into this error loop. So a manual disconnect doesn't hit this code path?
Comment 16•4 years ago
•
|
||
(In reply to Martin Giger [:freaktechnik] from comment #15)
Comment on attachment 9112058 [details] [diff] [review]
Patch v1Review of attachment 9112058 [details] [diff] [review]:
If I manually disconnect I still get into this error loop. So a manual
disconnect doesn't hit this code path?
Or maybe that's a side effect of the 2s delay and I didn't wait long enough? Though the state in the account manager shouldn't let me connect while still disconnecting?
Comment 17•4 years ago
|
||
Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/f83cf1dd1132
Fix IRC capability state handling between reconnects. r=freaktechnik
Reporter | ||
Updated•4 years ago
|
Assignee | ||
Comment 18•4 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #13)
I confirm that this patch fixes the problem for me. Thanks for taking a look
at this! :-)
Thank you for testing! (And thank you all for shepherding this through while I was on vacation!)
I remember that on the XMPP JS implementation we had an architecture where
the connection was a separate object, that we would get rid of when the
account was disconnected. That would prevent this kind of bugs. Did we not
have something similar for IRC?
We do not. The IRC code was initially written to be almost entirely stateless...unfortunately as things have evolved we've added a lot more to keep track of. We could re-evaluate whether it makes sense to have a session object or not.
(In reply to Martin Giger [:freaktechnik] from comment #16)
(In reply to Martin Giger [:freaktechnik] from comment #15)
If I manually disconnect I still get into this error loop. So a manual disconnect doesn't hit this code path?
Or maybe that's a side effect of the 2s delay and I didn't wait long enough? Though the state in the account manager shouldn't let me connect while still disconnecting?
Right, I think it should be fully disconnect before allowing a connect to happen. I'll look a bit further into this.
Description
•