Closed Bug 1690415 Opened 3 years ago Closed 3 years ago

IRC Chat stuck on "Connecting..." after a network outage. Should show timeout

Categories

(Chat Core :: General, defect)

x86_64
Windows 10
defect

Tracking

(thunderbird_esr91+ fixed, thunderbird92 affected)

RESOLVED FIXED
93 Branch

People

(Reporter: diomede979, Assigned: diomede979, Mentored, NeedInfo)

Details

(Whiteboard: [support])

Attachments

(1 file, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36

Steps to reproduce:

Connect to an IRC server, disconnect the Internet connection at the OS level (for example, by disabling the network connection) until the IRC client detects the disconnection, and then reconnect.

Actual results:

The IRC connection gets stuck in the "Disconnecting..." state. I then have to force the reconnection manually every time.

Expected results:

As in older versions, a countdown to the reconnection should start after a disconnection.

Whiteboard: DUPEME

Could not find a dupe or reproduce using 78.7.1 on Fedora 33 Workstation.

Connected to Chat.
Turned off my wireless connection.
Chat disconnected as expected.
Turned my wireless connection on.
Chat reconnected.

I should reconnect while the application is still showing disconnecting?

If it isn't completely disconnected why should it reconnect?

Flags: needinfo?(diomede979)
OS: Unspecified → Windows 10
Hardware: Unspecified → x86_64

(In reply to WaltS48 [:walts48] from comment #1)

> Could not find a dupe or reproduce using 78.7.1 on Fedora 33 Workstation.
>
> Connected to Chat.
> Turned off my wireless connection.
> Chat disconnected as expected.
> Turned my wireless connection on.
> Chat reconnected.
>
> I should reconnect while the application is still showing disconnecting?
>
> If it isn't completely disconnected why should it reconnect?

In my case, the application gets stuck in the "Disconnecting..." state, even after the network comes back.
In older versions, a countdown starts and, when it expires, the connection is attempted again.

Flags: needinfo?(diomede979)

I'm connecting to chat.freenode.net using IRC.

I tested with 68.11.0 on Linux and did see that countdown and reconnected when I either removed my wireless dongle and reinserted it or just disconnected my user.

Testing with 78.7.1 on Windows 10 and disconnecting my wireless account.

When I reconnect the Wi-Fi account, I get an "Error: Peer reports it experienced an internal error" message in my Instant messaging status dialog and the account does not reconnect. I need to click the "Connect" button.

In the Error Console I see:

Bad certificate or SSL connection for (username)@chat.freenode.net:
Peer reports it experienced an internal error. irc.jsm:921

I removed my chat name from that error.

Do you see something similar?

Flags: needinfo?(diomede979)

(In reply to WaltS48 [:walts48] from comment #3)

> I'm connecting to chat.freenode.net using IRC.
>
> I tested with 68.11.0 on Linux and did see that countdown and reconnected when I either removed my wireless dongle and reinserted it or just disconnected my user.
>
> Testing with 78.7.1 on Windows 10 and disconnecting my wireless account.
>
> When I reconnect the Wi-Fi account, I get an "Error: Peer reports it experienced an internal error" message in my Instant messaging status dialog and the account does not reconnect. I need to click the "Connect" button.
>
> In the Error Console I see:
>
> Bad certificate or SSL connection for (username)@chat.freenode.net:
> Peer reports it experienced an internal error. irc.jsm:921
>
> I removed my chat name from that error.
>
> Do you see something similar?

How do I show the status dialog? I don't have it.

Flags: needinfo?(diomede979)

In the Chat tab click the "Show Accounts" button.

Do you see the same message in the Error Console that I do about a bad certificate?

(In reply to WaltS48 [:walts48] from comment #6)

> In the Chat tab click the "Show Accounts" button.
>
> Do you see the same message in the Error Console that I do about a bad certificate?

No. But I guess it's just a problem related to the Freenode service. Try some other IRC server.

What one are you having a problem with?

(In reply to WaltS48 [:walts48] from comment #8)

> What one are you having a problem with?

I am having problems with all of them, for example: irc.azzurra.org

Well, I've already wasted enough of my time on this and have documented my results.

For review:

  • I didn't find a duplicate.
  • I have no problem reconnecting using Thunderbird release and beta on Linux.
  • I need to click the "Connect" button in the Instant messaging status dialog using those Thunderbird versions on Windows 10.
  • I never get stuck in "Disconnecting..." when manually forcing a disconnection.

I guess I'll have to wait for an outage and hope I'm using chat when it occurs.

Do you often have outages in your location?
Does it happen in safe mode, a test profile, with any AV applications disabled?

Whiteboard: DUPEME → [support]

(In reply to WaltS48 [:walts48] from comment #10)

> Well, I've already wasted enough of my time on this and have documented my results.
>
> For review:
>
>   • I didn't find a duplicate.
>   • I have no problem reconnecting using Thunderbird release and beta on Linux.
>   • I need to click the "Connect" button in the Instant messaging status dialog using those Thunderbird versions on Windows 10.
>   • I never get stuck in "Disconnecting..." when manually forcing a disconnection.
>
> I guess I'll have to wait for an outage and hope I'm using chat when it occurs.
>
> Do you often have outages in your location?
> Does it happen in safe mode, a test profile, with any AV applications disabled?

That was my mistake: I get stuck on "Connecting...", not "Disconnecting...". In the past, every time a network outage occurred, a red countdown started. Now I do not see that countdown and the status stays stuck on "Connecting..."; I then need to click "Disconnect" and then "Connect" to get connected again. It happens with all of my configured servers.

Summary: IRC Chat stucks in "Disconnecting..." after a network outage. → IRC Chat stucks in "Connecting..." after a network outage.

Could not reproduce with chat.freenode.net in my testing of the 78.8.0 release candidate on Windows 10 or Linux.

Summary: IRC Chat stucks in "Connecting..." after a network outage. → IRC Chat stuck on "Connecting..." after a network outage.

(In reply to WaltS48 [:walts48] from comment #12)

> Could not reproduce with chat.freenode.net in my testing of the 78.8.0 release candidate on Windows 10 or Linux.

I have the problem on the 78.7.1 release.

Please reread comment #1, comment #3 and comment #10.

You have the problem.
I can't find any other reports.
You have a support issue not a bug IMHO.
It would be nice if you answered questions asked.

How often do you experience a network outage?

Flags: needinfo?(diomede979)

It could be interesting to take a look at the protocols logs (via "Show Accounts" on the "Chat" tab, then right click on the account and click "Copy Debug Log"). It is possible the server is sending odd data we're not expecting.

Note that this log could include "sensitive" data (it will contain at least the domain you're trying to connect to).

(In reply to Patrick Cloke [:clokep] from comment #15)

> It could be interesting to take a look at the protocols logs (via "Show Accounts" on the "Chat" tab, then right click on the account and click "Copy Debug Log"). It is possible the server is sending odd data we're not expecting.
>
> Note that this log could include "sensitive" data (it will contain at least the domain you're trying to connect to).

Connection reset.
[2/24/2021, 8:36:56 PM] DEBUG (@ prpl-irc: disconnect resource:///modules/socket.jsm:216)
Disconnect
[2/24/2021, 8:36:57 PM] DEBUG (@ prpl-irc: connect resource:///modules/socket.jsm:171)
Connecting to: apple.bnc4free.com:1339
[2/24/2021, 8:36:57 PM] DEBUG (@ prpl-irc: onTransportStatus resource:///modules/socket.jsm:560)
onTransportStatus(STATUS_RESOLVING)
[2/24/2021, 8:36:57 PM] DEBUG (@ prpl-irc: onTransportStatus resource:///modules/socket.jsm:560)
onTransportStatus(STATUS_RESOLVED)
[2/24/2021, 8:36:57 PM] DEBUG (@ prpl-irc: onStartRequest resource:///modules/socket.jsm:500)
onStartRequest
[2/24/2021, 8:36:57 PM] DEBUG (@ prpl-irc: onStopRequest resource:///modules/socket.jsm:509)
onStopRequest (2152398878)

Flags: needinfo?(diomede979)

After further investigation, I discovered that this behavior is caused by a network adapter named "vEthernet (Default Switch)", a component of Hyper-V on Windows 10.
It looks like Thunderbird, when the Internet connection drops, treats this adapter as a working Internet connection and gets stuck waiting for a connection through it. This network adapter can't be disabled; it has to be removed from the Windows components.
The following link has instructions on how to remove it:
https://superuser.com/questions/1282014/how-to-remove-all-the-vethernet-default-switch-once-and-for-all

I don't know if it is still a Thunderbird bug, in the sense that Thunderbird should be able to recognize a working Internet connection before trying to connect through it. But now we know what is causing the issue and how to work around it.

Thanks for clarifying. Let's close then.

Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Resolution: --- → INVALID

I'm sorry, but this bug hasn't been resolved. I think the "Connecting..." state should have a timeout. Right now it keeps this state

Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
Severity: -- → S4
Summary: IRC Chat stuck on "Connecting..." after a network outage. → IRC Chat stuck on "Connecting..." after a network outage. Should show timeout

I investigated the cause of the issue.

This is the DEBUG log produced when you try to connect to a server while offline:


(Thunderbird 78.12.0 (20210712120515), Gecko 78.12.0 (20210712120515) on Windows NT 10.0; Win64; x64)
[8/7/2021, 5:48:10 PM] DEBUG (@ prpl-irc: connect resource:///modules/socket.jsm:171)
Connecting to: **********
[8/7/2021, 5:48:10 PM] DEBUG (@ prpl-irc: onTransportStatus resource:///modules/socket.jsm:560)
onTransportStatus(STATUS_RESOLVING)
[8/7/2021, 5:48:10 PM] DEBUG (@ prpl-irc: onTransportStatus resource:///modules/socket.jsm:560)
onTransportStatus(STATUS_RESOLVED)
[8/7/2021, 5:48:10 PM] DEBUG (@ prpl-irc: onStartRequest resource:///modules/socket.jsm:500)
onStartRequest
[8/7/2021, 5:48:10 PM] DEBUG (@ prpl-irc: onStopRequest resource:///modules/socket.jsm:509)
onStopRequest (2152398878)

According to https://searchfox.org/comm-central/source/suite/chatzilla/js/lib/connection-xpcom.js#9

the code 2152398878 in the DEBUG log above corresponds to "NS_ERROR_UNKNOWN_HOST = NS_ERROR_MODULE_NETWORK + 30".
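
The arithmetic can be checked with a minimal sketch. It assumes the network module base is the full failure code 0x804B0000 (2152398848); that base value is not quoted in this bug and is inferred here from the logged status and the "+ 30" offset.

```javascript
// Assumption: network-module base as a full failure code (0x804B0000).
const NS_ERROR_MODULE_NETWORK = 0x804b0000; // 2152398848
const NS_ERROR_UNKNOWN_HOST = NS_ERROR_MODULE_NETWORK + 30;

// Status value printed by onStopRequest in the DEBUG log.
const loggedStatus = 2152398878;

// The logged status matches the computed constant.
console.log(NS_ERROR_UNKNOWN_HOST === loggedStatus); // true

// Hex form, handy when searching error lists.
console.log("0x" + loggedStatus.toString(16).toUpperCase()); // 0x804B001E
```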

The problem is here:

https://searchfox.org/comm-central/source/chat/modules/socket.jsm#460

The code there needs to handle the "NS_ERROR_UNKNOWN_HOST" status, which occurs when a connection is attempted while offline.
I suggest starting a retry timeout, or disconnecting, when this status is reported.

That way the connection will no longer stay stuck in the "connecting..." state once the machine is back online.

Thank you.

Assignee: nobody → diomede979
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attached patch 1690415-add-unknown-host.patch (obsolete) — Splinter Review

Comment on attachment 9235847 [details] [diff] [review]
1690415-add-unknown-host.patch

I added Patrick as reviewer to let it go further.

Attachment #9235847 - Flags: review?(clokep)

diomede979 (aka :wallbroken), congratulations and thank you for offering your first patch after our mentoring sessions!
The patch looks formally correct at first glance, review will check if it does the right thing.

Mentor: bugzilla2007
Comment on attachment 9235847 [details] [diff] [review]
1690415-add-unknown-host.patch

Review of attachment 9235847 [details] [diff] [review]:
-----------------------------------------------------------------

Just a drive-by comment in response to assignee's question on Matrix...

::: chat/modules/socket.jsm
@@ +460,5 @@
>        this.onConnectionReset();
>      } else if (aStatus == NS_ERROR_NET_TIMEOUT) {
>        this.onConnectionTimedOut();
> +    } else if (aStatus == NS_ERROR_UNKNOWN_HOST) {
> +      this.onConnectionReset();

On Matrix, :wallbroken asked:
> just a question: in my development, i added another `else if()` statement instead of adding just an `or` condition in already existing first one. line 463 could have [been] directly added in [line] 459. that's the same?

The net effect will be the same, but for the sake of reducing code redundancy, if both conditions are meant to run the same code, you should indeed combine your new condition with the existing condition. That said, I'm not sure if `NS_ERROR_UNKNOWN_HOST` should really call `this.onConnectionReset()` which looks specific to `NS_ERROR_NET_RESET` (just like `NS_ERROR_NET_TIMEOUT` has `this.onConnectionTimedOut();`). Your reviewer will tell you more.
Comment on attachment 9235847 [details] [diff] [review]
1690415-add-unknown-host.patch

Review of attachment 9235847 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good overall, I think this mostly just needs a comment!

Could you update the commit entry to say "Bug 1690415 - Handle NS_ERROR_UNKNOWN_HOST error during connection by retrying. r=clokep"

Thank you!

::: chat/modules/socket.jsm
@@ +460,5 @@
>        this.onConnectionReset();
>      } else if (aStatus == NS_ERROR_NET_TIMEOUT) {
>        this.onConnectionTimedOut();
> +    } else if (aStatus == NS_ERROR_UNKNOWN_HOST) {
> +      this.onConnectionReset();

Combining it with the `NS_ERROR_NET_RESET` clause probably makes sense.

Can you add a comment saying why we do this? I think the following would be accurate: "If the host cannot be resolved, reset the connection to attempt to reconnect."  Does that sound reasonable to you?

I had to remind myself a bit about how all these codes work, a connection reset gets handled in the IRC code and will cause a message to the user ("Lost connection with server"), it will kill the connection with `ERROR_NETWORK_ERROR`, which should start the reconnect timer. So this seems like the correct thing to do. (XMPP does similar things.)
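
The flow described in this review can be sketched roughly as follows. This is an illustration only, not the actual socket.jsm or IRC code: `SketchSocket`, `SketchAccount`, `fail`, `reconnectScheduled` and the timeout message are hypothetical names, and the numeric constants are assumed values for the corresponding Gecko error codes.

```javascript
// Assumed numeric values for the Gecko network error codes.
const NS_ERROR_NET_TIMEOUT = 0x804b000e;
const NS_ERROR_NET_RESET = 0x804b0014;
const NS_ERROR_UNKNOWN_HOST = 0x804b001e;

// Hypothetical stand-in for the protocol account (IRC, XMPP, ...).
class SketchAccount {
  fail(message) {
    this.lastError = message; // message shown to the user
    this.reconnectScheduled = true; // stands in for starting the reconnect timer
  }
}

// Hypothetical stand-in for the socket wrapper.
class SketchSocket {
  constructor(account) {
    this.account = account;
    this.disconnected = false;
  }

  onStopRequest(aStatus) {
    this.disconnected = true;
    // If the host cannot be resolved, reset the connection to attempt to
    // reconnect.
    if (aStatus == NS_ERROR_NET_RESET || aStatus == NS_ERROR_UNKNOWN_HOST) {
      this.onConnectionReset();
    } else if (aStatus == NS_ERROR_NET_TIMEOUT) {
      this.onConnectionTimedOut();
    }
  }

  onConnectionReset() {
    this.account.fail("Lost connection with server");
  }

  onConnectionTimedOut() {
    this.account.fail("Connection timed out"); // hypothetical wording
  }
}

// Usage: an unknown-host status now ends up scheduling a reconnect.
const account = new SketchAccount();
const socket = new SketchSocket(account);
socket.onStopRequest(NS_ERROR_UNKNOWN_HOST);
console.log(account.lastError, account.reconnectScheduled);
```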
Attachment #9235847 - Flags: review?(clokep)
Attached patch 1690415-add-unknown-host.patch (obsolete) — Splinter Review

Updated the patch.

Attachment #9235847 - Attachment is obsolete: true
Attachment #9236252 - Flags: review?(clokep)
Comment on attachment 9236252 [details] [diff] [review]
1690415-add-unknown-host.patch

Review of attachment 9236252 [details] [diff] [review]:
-----------------------------------------------------------------

::: chat/modules/socket.jsm
@@ +457,5 @@
>      }
>      this.disconnected = true;
> +    // If the host cannot be resolved, reset the connection to attempt to
> +    // reconnect.
> +    if (aStatus == NS_ERROR_NET_RESET || NS_ERROR_UNKNOWN_HOST) {

This will not work as expected and always be true.
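
A minimal sketch of why that condition is always true, under the assumption that the two constants hold the nonzero numeric values shown (the function names are hypothetical). The expression parses as `(aStatus == NS_ERROR_NET_RESET) || NS_ERROR_UNKNOWN_HOST`, and a nonzero number is truthy.

```javascript
// Assumed numeric values for the Gecko network error codes.
const NS_ERROR_NET_RESET = 0x804b0014;
const NS_ERROR_UNKNOWN_HOST = 0x804b001e;

// Condition as written in the obsolete patch: the right-hand operand is a
// bare nonzero constant, so the whole expression is truthy for any status.
function buggyCheck(aStatus) {
  return Boolean(aStatus == NS_ERROR_NET_RESET || NS_ERROR_UNKNOWN_HOST);
}

// Corrected condition: compare aStatus against each constant explicitly.
function fixedCheck(aStatus) {
  return aStatus == NS_ERROR_NET_RESET || aStatus == NS_ERROR_UNKNOWN_HOST;
}

console.log(buggyCheck(0)); // true, even for a success status
console.log(fixedCheck(0)); // false
console.log(fixedCheck(NS_ERROR_UNKNOWN_HOST)); // true
```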

Now with correct check.

Attachment #9236252 - Attachment is obsolete: true
Attachment #9236252 - Flags: review?(clokep)
Attachment #9236257 - Flags: review?(clokep)
Comment on attachment 9236257 [details] [diff] [review]
1690415-add-unknown-host.patch

Review of attachment 9236257 [details] [diff] [review]:
-----------------------------------------------------------------

Seems like it should work!
Attachment #9236257 - Flags: review?(clokep) → review+
Target Milestone: --- → 93 Branch

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/2675ff5a0d6b
Handle NS_ERROR_UNKNOWN_HOST error during connection by retrying. r=clokep

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED

Time for uplift requests? NI'ing the reviewer since the assignee might not be familiar with the process.

Flags: needinfo?(diomede979)
Flags: needinfo?(clokep)
Flags: needinfo?(diomede979)
Flags: needinfo?(clokep)
Component: Instant Messaging → General
Product: Thunderbird → Chat Core

Comment on attachment 9236257 [details] [diff] [review]
1690415-add-unknown-host.patch

[Approval Request Comment]
Regression caused by (bug #): N/A
User impact if declined: In some situations chat accounts might not reconnect properly.
Testing completed (on c-c, etc.): This has been on Daily for a few weeks and is in TB 93.0b1.
Risk to taking this patch (and alternatives if risky): I don't see how this can make the situation work -- it attempts to reconnect in more situations.

Attachment #9236257 - Flags: approval-comm-esr91?

Comment on attachment 9236257 [details] [diff] [review]
1690415-add-unknown-host.patch

[Triage Comment]
Approved for esr91

Attachment #9236257 - Flags: approval-comm-esr91? → approval-comm-esr91+

> I don't see how this can make the situation work

I assume you mean "worse"

(In reply to Wayne Mery (:wsmwk) from comment #34)

> > I don't see how this can make the situation work
>
> I assume you mean "worse"

Yes, sorry about that!

diomede979, please test; 91.1.1 is now available.

Flags: needinfo?(diomede979)