Closed Bug 1706377 Opened 3 years ago Closed 9 months ago

Request fail for 10 seconds on network switch (MacOs / Linux)

Categories

(Core :: Networking: HTTP, defect, P2)

Firefox 88
x86_64
Linux
defect

Tracking

RESOLVED FIXED
120 Branch
Tracking Status
firefox-esr115 --- wontfix
firefox88 --- wontfix
firefox89 --- wontfix
firefox90 --- wontfix
firefox116 --- wontfix
firefox120 --- fixed
firefox121 --- fixed

People

(Reporter: ruihildt, Assigned: kershaw)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged][necko-priority-queue])

Attachments

(7 files, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0

Steps to reproduce:

Scenario 1:
0 - Load a page (In videos linked, https://mozilla.org)
1 - Change my network or vpn status (enable/disable)
2 - Reload the page

Scenario 2:
0 - Load a page (In videos linked, https://mozilla.org)
1 - Change my network or vpn status (enable/disable)
2 - Wait for more than 10 seconds
3 - Reload the page

Actual results:

Scenario 1:

  • The page doesn't reload, and no errors are returned
  • A second refresh loads the page correctly

Scenario 2:

  • The page loads correctly

Expected results:

The page should always reload instantly.

Please note that this behavior is reproducible in Firefox on macOS and Linux, but NOT on Windows. This behavior is not observable in Chromium at all.

I first encountered this behavior when I was making requests in a webextension popup.

Attached video enable-disable-vpn.webm
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Version: Firefox 87 → Firefox 88

The Bugbug bot thinks this bug should belong to the 'WebExtensions::Untriaged' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Product: Firefox → WebExtensions

I reverted the change from Bugbug, this is not specific to webextension.

Product: WebExtensions → Firefox

Hi, I was able to reproduce this issue as well in Firefox Release 88, Beta 89.0b2 and our latest Nightly build 90.0a1 (2021-04-21), using NordVPN to connect and disconnect and then reloading the page in Firefox.

Severity: -- → S3
Status: UNCONFIRMED → NEW
Has Regression Range: --- → no
Has STR: --- → yes
Component: Untriaged → Networking
Ever confirmed: true
Product: Firefox → Core

I was able to reproduce this with ExpressVPN.
Depending on when the request happens, it could be:

  1. Before the DNS settings have stabilized, and getaddrinfo returns an error (shows error page)
  2. Before the old connections have stopped working (triggers this bug)
  3. After the old connection has stopped working - in which case we create a new one, and it works.

I'll take a look to see if we can better improve pruning dead connections.

Assignee: nobody → valentin.gosu
Blocks: necko-vpn
Priority: -- → P2
Whiteboard: [necko-triaged]

Do you have any news about this bug?

Flags: needinfo?(valentin.gosu)

Sorry for missing the needinfo.
We recently fixed a similar issue in bug 1647985, but that one was only for DoH connections.
From what I can tell, the main issue is that the H2 connection is broken, but it takes a while to detect that.

https://searchfox.org/mozilla-central/rev/2d678a843ceab81e43f7ffb83212197dc10e944a/netwerk/protocol/http/nsHttpHandler.cpp#1344-1356

// The amount of idle seconds on a http2 connection before initiating a
// server ping. 0 will disable.
if (PREF_CHANGED(HTTP_PREF("http2.ping-threshold"))) {
  mSpdyPingThreshold = PR_SecondsToInterval((uint16_t)clamped(
      StaticPrefs::network_http_http2_ping_threshold(), 0, 0x7fffffff));
}

// The amount of seconds to wait for a http2 ping response before
// closing the session.
if (PREF_CHANGED(HTTP_PREF("http2.ping-timeout"))) {
  mSpdyPingTimeout = PR_SecondsToInterval((uint16_t)clamped(
      StaticPrefs::network_http_http2_ping_timeout(), 0, 0x7fffffff));
}

The ping threshold seems a bit large, at 58 seconds, but it does get trimmed down to 5 seconds here.
The problem is that we only seem to call it when VerifyTraffic is called which we only seem to do when a network change occurs. However, with VPNs we might miss network change events, or the network changes might occur later than our event actually fires.

Kershaw, am I missing something here? Do we really have a bug in the sense that we're only sending the pings when triggered by a network change event?

Flags: needinfo?(valentin.gosu) → needinfo?(kershaw)
Whiteboard: [necko-triaged] → [necko-triaged][necko-priority-queue]

> The problem is that we only seem to call it when VerifyTraffic is called which we only seem to do when a network change occurs. However, with VPNs we might miss network change events, or the network changes might occur later than our event actually fires.

I think a network change event is necessary to trigger VerifyTraffic, and there is nothing we can do if we can't detect network changes reliably.
However, I did find a problem when using Firefox with Mozilla VPN. When Mozilla VPN is enabled or disabled, we do detect a network change event, but the NS_NETWORK_LINK_DATA is "up", not "changed". As a result, VerifyTraffic is not called, and we are left with a broken h2 connection. To fix this, I think we should perform VerifyTraffic every time we receive a network change event, regardless of NS_NETWORK_LINK_DATA.
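
A minimal sketch of the decision described above, assuming the event payload is modelled as a plain string ("up", "changed"). The function names are hypothetical and are not Firefox APIs; only the idea of ignoring the payload comes from this comment.

```cpp
#include <cassert>
#include <string>

// Behaviour described as the problem: only a "changed" payload leads to
// VerifyTraffic, so the "up" event seen when toggling Mozilla VPN is ignored.
bool ShouldVerifyTrafficCurrent(const std::string& linkData) {
  assert(!linkData.empty());  // model assumption: events always carry a payload
  return linkData == "changed";
}

// Proposed behaviour: verify on every network change event, whatever the
// payload says.
bool ShouldVerifyTrafficProposed(const std::string& linkData) {
  assert(!linkData.empty());  // same model assumption as above
  return true;
}
```

In this model an "up" event is skipped by the current check but still triggers verification under the proposed one.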

I am not sure whether the real network change can happen after the event fires, but maybe we can start another timer to perform VerifyTraffic a few seconds later.

Another thing we could do is reduce http2.ping-timeout. It's currently 8s, which seems a bit long. Maybe 3s or 5s would be better.

> Kershaw, am I missing something here? Do we really have a bug in the sense that we're only sending the pings when triggered by a network change event?

It seems that we didn't have this kind of bug before, but I might be wrong.

Flags: needinfo?(kershaw)

I wonder if http2.ping-threshold might also be a problem. It's currently at 58 seconds, which means absent any other events or traffic, we wait for 58 seconds before sending a ping.
We could reduce it (at least for desktop) to something more reasonable, such as 15 or 20 seconds, at the expense of reduced battery life, though I'm not too worried about that.
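
To put rough numbers on this, here is a small sketch that assumes a dead connection is noticed only through the idle ping (no network change event and no other traffic). Under that assumption the session is closed after the ping threshold plus the ping timeout. The helper name is hypothetical; 58 s and 8 s are the values quoted in this bug.

```cpp
#include <cassert>

// Seconds until a dead, idle HTTP/2 session is closed when the idle ping is
// the only detection mechanism: wait for the threshold, send a ping, then
// wait for the ping timeout. A threshold of 0 disables the ping, so it is
// rejected here.
int WorstCaseDetectionSeconds(int pingThresholdSec, int pingTimeoutSec) {
  assert(pingThresholdSec > 0 && pingTimeoutSec >= 0);
  return pingThresholdSec + pingTimeoutSec;
}
```

With the values quoted in this bug, WorstCaseDetectionSeconds(58, 8) gives 66 seconds, while a 15 second threshold with a 3 second timeout would give 18.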

Unfortunately I can't test this at the moment, as mozvpn doesn't work on the latest Ubuntu :(
@Kershaw, would you be able to take this?

Flags: needinfo?(kershaw)
Assignee: valentin.gosu → kershaw
Flags: needinfo?(kershaw)

For mozvpn, the data received in the network change event is "up", not "changed", so to be safe we should call VerifyTraffic for every event.
This patch also reduces http2.ping-timeout and http2.ping-threshold, since the original values are too long.

Pushed by kjang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3765a21fa94c
Call VerifyTraffic for every network change event, r=necko-reviewers,valentin

(In reply to Narcis Beleuzu [:NarcisB] from comment #13)

Backed out for wpt failure on 001.html

Backout link: https://hg.mozilla.org/integration/autoland/rev/4cc8e322f8832e379a9c7f48a29878cf0963d6b6
Log link: https://treeherder.mozilla.org/logviewer?job_id=418296116&repo=autoland&lineNumber=2291

This is caused by the pref change to network.http.http2.ping-threshold. It seems that changing this value could break something, so we need to be very careful. I'll revert the pref changes and file another bug to investigate whether we can adjust these values.

Flags: needinfo?(kershaw)
Pushed by kjang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5c2039dcc9e3
Call VerifyTraffic for every network change event, r=necko-reviewers,valentin
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 116 Branch

Hi @ruihildt, can you please try our latest Firefox 116 build again and see if the issue still occurs on your end? My installed NordVPN keeps returning ".sock not found", and I can't reproduce the issue with the add-on version of NordVPN for Firefox.

Here is where you can find the Firefox 116 BETA build: https://www.mozilla.org/en-US/firefox/channel/desktop/
Please let us know if the issue still occurs on your end.

Flags: needinfo?(ruihildt)

(In reply to Rares Doghi, Desktop QA from comment #17)

Hi @ruihildt, can you please try our latest Firefox 116 build again and see if the issue still occurs on your end? My installed NordVPN keeps returning ".sock not found", and I can't reproduce the issue with the add-on version of NordVPN for Firefox.

Here is where you can find the Firefox 116 BETA build: https://www.mozilla.org/en-US/firefox/channel/desktop/
Please let us know if the issue still occurs on your end.

You mention you used the VPN extension, but the error here is for system wide VPN and not the extension.

I tried with Firefox Beta 116.0b6 (using the Developer Edition) and I still have the same issue. Should I reopen the issue?

Status: RESOLVED → REOPENED
Flags: needinfo?(ruihildt) → needinfo?(rdoghi)
Resolution: FIXED → ---

I tried using the system version of NordVPN, but it keeps returning ".sock not found". I tried uninstalling and reinstalling it, but it still won't work, which is why I tried the extension version. We need someone with a working installed VPN to try this. @Kershaw Chang, can you please take a look? It seems that the issue still occurs in our latest Beta Developer Edition.

Flags: needinfo?(rdoghi) → needinfo?(kershaw)

(In reply to ruihildt from comment #18)

(In reply to Rares Doghi, Desktop QA from comment #17)

Hi @ruihildt, can you please try our latest Firefox 116 build again and see if the issue still occurs on your end? My installed NordVPN keeps returning ".sock not found", and I can't reproduce the issue with the add-on version of NordVPN for Firefox.

Here is where you can find the Firefox 116 BETA build: https://www.mozilla.org/en-US/firefox/channel/desktop/
Please let us know if the issue still occurs on your end.

You mention you used the VPN extension, but the error here is for system wide VPN and not the extension.

I tried with Firefox Beta 116.0b6 (using the Developer Edition) and I still have the same issue. Should I reopen the issue?

Hi, could you try to record a http log for this?
Please use the steps below:

  1. Start logging (make sure you select Logging to a file).
  2. Load https://mozilla.org
  3. Change network status by VPN
  4. Load the same page again
  5. Stop logging

Thanks.

Flags: needinfo?(kershaw) → needinfo?(ruihildt)
(In reply to Kershaw Chang [:kershaw] from comment #20)
> (In reply to ruihildt from comment #18)
> > (In reply to Rares Doghi, Desktop QA from comment #17)
> > > Hi @ruihildt, can you please try our latest Firefox 116 build again and see if the issue still occurs on your end? My installed NordVPN keeps returning ".sock not found", and I can't reproduce the issue with the add-on version of NordVPN for Firefox.
> > > 
> > > Here is where you can find the Firefox 116 BETA build: https://www.mozilla.org/en-US/firefox/channel/desktop/
> > > Please let us know if the issue still occurs on your end.
> > 
> > You mention you used the VPN extension, but the error here is for system wide VPN and not the extension.
> > 
> > I tried with Firefox Beta 116.0b6 (using the Developer Edition) and I still have the same issue. Should I reopen the issue?
> 
> Hi, could you try to record a [http log](https://firefox-source-docs.mozilla.org/networking/http/logging.html#using-about-logging) for this?
> Please use the steps below:
> 1. Start logging (make sure you select `Logging to a file`).
> 2. Load `https://mozilla.org`
> 3. Change network status by VPN
> 4. Load the same page again
> 5. Stop logging
> 
> Thanks.

Not sure how to attach logs, so I'll just paste them in a code block below.

Here are the logs in a regular load:
```

```
Flags: needinfo?(ruihildt) → needinfo?(kershaw)

Sorry not familiar with the interface, and can't edit my previous comment.

Loading mozilla.org regularly: https://bugzilla.mozilla.org/attachment.cgi?id=9345740
Loading mozilla.org after disconnecting VPN: https://bugzilla.mozilla.org/attachment.cgi?id=9345741

Flags: needinfo?(kershaw)

Hi reporter,

Thanks for the log.
However, it seems that the log is incomplete. I only saw the HTTP request to load https://www.mozilla.org/en-US/ once.
Based on the steps in comment #0, there should be another request to load https://www.mozilla.org/en-US/ after VPN change, but I didn't see it in the log.

Could you try to record a log again?
Thanks.

Flags: needinfo?(ruihildt)

Here it is.

Attachment #9345741 - Attachment is obsolete: true
Flags: needinfo?(ruihildt) → needinfo?(kershaw)

The basic concept here is to add a pending list to ConnectionEntry and to move connections into it when VerifyTraffic() is called.
By doing this, we always create new connections after a network change event. The old connections, which might still be alive after the network change, are put into the pending list.
Connections in the pending list keep working until their transactions are done.
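
A simplified, self-contained model of the idea above. This is only a sketch: the class and method names are hypothetical and do not mirror the real ConnectionEntry, and dropping idle connections on a network change is an assumption of the model, not something stated in this bug.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Connection {
  int id;
  int activeTransactions;
};

class ConnectionEntryModel {
 public:
  // Dispatch a new transaction. Only connections in the active list are
  // reused; pending connections are never picked for new work.
  int Dispatch() {
    if (mActive.empty()) {
      mActive.push_back(Connection{mNextId++, 0});
    }
    Connection& conn = mActive.front();
    ++conn.activeTransactions;
    return conn.id;
  }

  // Called on a network change: connections that still carry transactions
  // move to the pending list; idle ones are dropped (model assumption).
  void OnVerifyTraffic() {
    for (const Connection& conn : mActive) {
      if (conn.activeTransactions > 0) {
        mPending.push_back(conn);
      }
    }
    mActive.clear();
  }

  // A transaction finished on the connection with the given id. A pending
  // connection is removed once its last transaction is done.
  void OnTransactionDone(int id) {
    for (Connection& conn : mActive) {
      if (conn.id == id) {
        assert(conn.activeTransactions > 0);
        --conn.activeTransactions;
        return;
      }
    }
    auto it = std::find_if(mPending.begin(), mPending.end(),
                           [id](const Connection& c) { return c.id == id; });
    assert(it != mPending.end());
    assert(it->activeTransactions > 0);
    if (--it->activeTransactions == 0) {
      mPending.erase(it);
    }
  }

  std::size_t ActiveCount() const { return mActive.size(); }
  std::size_t PendingCount() const { return mPending.size(); }

 private:
  std::vector<Connection> mActive;
  std::vector<Connection> mPending;
  int mNextId = 1;
};
```

In this model, a transaction dispatched after OnVerifyTraffic() always lands on a fresh connection, while an in-flight transaction on the old connection can still finish from the pending list.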

Pushed by kjang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fec7c6056de9
Always create new connection after network change, r=necko-reviewers,valentin
Status: REOPENED → RESOLVED
Closed: 1 year ago → 10 months ago
Resolution: --- → FIXED

Hello! I have tried to reproduce the issue on Ubuntu 22.04 LTS with Firefox 114.0 and 90.0a1 (2021-04-20), but unfortunately I wasn't able to do so.

Ruihildt, could you please check whether the issue is fixed in the latest Nightly? Here is a link: https://www.mozilla.org/en-US/firefox/channel/desktop/

Flags: needinfo?(ruihildt)

(In reply to Negritas Sergiu, Desktop QA from comment #29)

Hello! I have tried to reproduce the issue on Ubuntu 22.04 LTS with Firefox 114.0 and 90.0a1 (2021-04-20), but unfortunately I wasn't able to do so.

Ruihildt, could you please check whether the issue is fixed in the latest Nightly? Here is a link: https://www.mozilla.org/en-US/firefox/channel/desktop/

It is not fixed in Firefox 121.0a1. Reopening the issue.

I have just tried on Ubuntu 22.04 LTS/Fedora 38 with Mullvad VPN, and on macOS 13.5 with both Mullvad VPN and NordVPN, with the same errors as always.

Status: RESOLVED → REOPENED
Flags: needinfo?(ruihildt)
Resolution: FIXED → ---

Hi Reporter,

Could you try to flip this pref network.http.http2.move_to_pending_list_after_network_change to true and see if you still can reproduce this issue?
If yes, may I ask you to record a http log again?

Thanks.

Flags: needinfo?(kershaw) → needinfo?(ruihildt)

(In reply to Kershaw Chang [:kershaw] from comment #31)

Hi Reporter,

Could you try to flip this pref network.http.http2.move_to_pending_list_after_network_change to true and see if you still can reproduce this issue?
If yes, may I ask you to record a http log again?

Thanks.

Happy to report flipping the pref to true fixes the issue.

Flags: needinfo?(ruihildt)
Pushed by kjang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4bfe1eac6228
Enable network.http.http2.move_to_pending_list_after_network_change on early beta, r=necko-reviewers,valentin
Status: REOPENED → RESOLVED
Closed: 10 months ago → 9 months ago
Resolution: --- → FIXED

Kershaw, at which point can we say that it's safe to let the pref ride the trains to release?

Flags: needinfo?(kershaw)

(In reply to Valentin Gosu [:valentin] (he/him) from comment #36)

Kershaw, at which point can we say that it's safe to let the pref ride the trains to release?

I think it's about time to ship this. I filed bug 1876045 for enabling it.

Flags: needinfo?(kershaw)
Blocks: 1876045
No longer blocks: necko-pref-flips
See Also: → necko-pref-flips
Regressed by: 1884349

Set release status flags based on info from the regressing bug 1884349

No longer regressed by: 1884349
Regressions: 1884349
Keywords: regression
Target Milestone: 116 Branch → 120 Branch
Component: Networking → Networking: HTTP