Open Bug 1789872 Opened 2 years ago Updated 1 year ago

The “User not connected to internet” error state can’t be triggered when "connected" additional virtual network adapters are present

Categories

(Firefox :: Firefox View, defect, P3)

Firefox 106
Desktop
Unspecified
defect

Tracking

()

REOPENED
Tracking Status
firefox106 --- affected

People

(Reporter: mberlinger, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

(Whiteboard: [fidefe-2022-mr1-firefox-view])

Attachments

(2 files)

Precondition

  • The user is connected to sync on desktop and on a mobile device;

Affected versions

  • 106.0a1 (2022-09-08)

Tested platforms

  • Affected platforms: Windows 10
  • Unaffected platforms: macOS 10.14.6, Ubuntu 22.04

Steps to reproduce

  1. Open the Firefox View tab;
  2. Disconnect from the internet;
  3. Observe the “Tab pickup” section;

Expected result

  • The "Check your internet connection/ If you're using a firewall or proxy, check that Firefox has permission to access the web." error message is displayed with the blue "Try again" button;

Actual result

  • The "We’re having trouble syncing/ Firefox can’t reach the syncing service right now./ Try again in a few moments." error message is displayed with the "Try again" blue button.
Severity: S4 → S2
Priority: -- → P1
Severity: S2 → S3
Priority: P1 → P2

(In reply to Maria Berlinger [:mberlinger], Services QA from comment #0)

  2. Disconnect from the internet;

Can you elaborate on how you're doing this specifically? Also, after this, what does about:networking#networkid look like?

Flags: needinfo?(maria.berlinger)

Hello,
Sorry for the delay.
When I disconnect from the internet, I do that by clicking the "Disconnect" button on my Wi-Fi connection.
Attaching a screenshot with the about:networking#networkid.

Flags: needinfo?(maria.berlinger)

(In reply to Maria Berlinger [:mberlinger], Services QA from comment #0)

Hi, Maria! I'm unable to reproduce this on my Windows 11 machine. I've connected to sync on desktop and on my mobile device, loaded the Firefox View tab, disconnected my WiFi, and I see the expected "Check your internet connection" message. Are you still seeing this issue?

Flags: needinfo?(maria.berlinger)

This is on Windows 11 using Nightly 107.0a1 (2022-09-20)

Hello,
Yes, I am still able to reproduce it on Windows 10 using the latest Nightly 107.0a1 (2022-09-21).
As I have mentioned in the issue, this is not reproducible on macOS or Linux, just on Windows 10.

Flags: needinfo?(maria.berlinger)

(In reply to Maria Berlinger [:mberlinger], Services QA from comment #5)

Gotcha, I'm gonna let someone with access to a Windows 10 machine grab this one in that case.

Assignee: nobody → tgiles
Status: NEW → ASSIGNED
Depends on: 1793498

This bug is blocked by Bug 1793498. For some reason, on Windows 10, even when there is no active network link to the internet, our networking code still determines that there is an active link. The Firefox View code is correctly listening to "network:offline-status-changed", but that notification doesn't fire on Windows 10 when disconnecting from an active internet connection. For example, I put a breakpoint in the browser.js code that also listens to "network:offline-status-changed", and that code did not execute either when disconnecting from an active network connection.

My guess is this will become a dupe of Bug 1793498, but let's keep this open to verify once a fix has landed for the blocking issue.

Priority: P2 → P3

Based on the discussion over on Bug 1793498, this bug and that one only seem to affect people with virtual network adapters and/or certain VPNs. We don't have telemetry to determine how many people are affected by this, but we think the number is not high.
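A toy model may help show why a virtual adapter hides the disconnect. It assumes, as a simplification, that link status is the logical OR of every adapter's link state and that the offline notification fires only when that aggregate flips. The `Adapter` and `LinkService` classes are hypothetical and are not Firefox's nsINetworkLinkService implementation.

```python
from dataclasses import dataclass


@dataclass
class Adapter:
    name: str
    link_up: bool           # what the OS reports for this adapter
    reaches_internet: bool  # ground truth; a link check never looks at this


class LinkService:
    """Hypothetical link service: notifies only when the aggregate link state flips."""

    def __init__(self, adapters):
        self.adapters = adapters
        self.observers = []
        self._last_up = self._aggregate()

    def _aggregate(self):
        # Simplification: "link up" if ANY adapter reports a link.
        return any(a.link_up for a in self.adapters)

    def add_observer(self, callback):
        self.observers.append(callback)

    def adapter_changed(self):
        now_up = self._aggregate()
        if now_up == self._last_up:
            return  # no transition, so no notification
        self._last_up = now_up
        for callback in self.observers:
            callback("network:offline-status-changed", "online" if now_up else "offline")


def simulate_wifi_disconnect(with_virtual_adapter):
    wifi = Adapter("Wi-Fi", link_up=True, reaches_internet=True)
    adapters = [wifi]
    if with_virtual_adapter:
        # e.g. a VPN or hypervisor adapter that reports a link but has no route out
        adapters.append(Adapter("virtual", link_up=True, reaches_internet=False))
    service = LinkService(adapters)
    events = []
    service.add_observer(lambda topic, data: events.append((topic, data)))

    wifi.link_up = wifi.reaches_internet = False  # user clicks "Disconnect"
    service.adapter_changed()
    really_online = any(a.reaches_internet for a in adapters)
    return events, really_online


print(simulate_wifi_disconnect(with_virtual_adapter=True))   # ([], False)
print(simulate_wifi_disconnect(with_virtual_adapter=False))  # ([('network:offline-status-changed', 'offline')], False)
```

With the virtual adapter present, the aggregate never changes, so no observer runs even though nothing can reach the internet.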

This bug is tricky since we don't have an accurate, real-time way to determine whether we can access the internet through one of the available network adapters on a particular machine. I previously thought that an active link meant we were able to connect to the internet, but this isn't the case: an active link only means there is a network adapter that could be used to connect to the internet. If we wanted to check whether a particular Firefox instance is connected to the internet, we could use nsINetworkConnectivityService, but as :kershaw mentions in Bug 1793498 Comment 15:

nsINetworkConnectivityService tells whether Firefox can successfully get the HTTP response back from http://detectportal.firefox.com. This information might be incorrect and not reliable, since there could be a race between receiving HTTP response and network change.

I wonder what we could do as an alternative solution in the meantime. I just ran a quick experiment, and it looks like we could listen for network ID changes and then check the network connectivity service to see whether we're connected. This works for the case where the virtual adapter is a valid link (its mLinkUp attribute is true): Firefox View shows the correct "Check your internet connection" message. I don't know what edge cases come with using the network connectivity service, though; for example, we already know that using this service might introduce race conditions. I also don't know how we would test this case in automation, since I've never needed to mock a network adapter before.
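The experiment above amounts to a decision made each time the network ID changes. In the virtual-adapter case the link stays up, so the IPv4 state is what decides. A minimal sketch, assuming a three-valued IPv4 state modeled on nsINetworkConnectivityService's UNKNOWN/OK/NOT_AVAILABLE states; `tab_pickup_message` and the returned labels are hypothetical, not Firefox View's actual error-state names.

```python
from enum import IntEnum


class ConnectivityState(IntEnum):
    # Assumed to mirror nsINetworkConnectivityService's connectivity states.
    UNKNOWN = 0
    OK = 1
    NOT_AVAILABLE = 2


def tab_pickup_message(link_up: bool, ipv4: ConnectivityState, current: str) -> str:
    """Hypothetical handler run on each network ID change."""
    if not link_up or ipv4 == ConnectivityState.NOT_AVAILABLE:
        return "check-connection"  # "Check your internet connection"
    if ipv4 == ConnectivityState.UNKNOWN:
        return current  # checks not finished yet; keep what is on screen
    return "normal"


# Virtual adapter keeps the link up, but the IPv4 check has failed:
print(tab_pickup_message(True, ConnectivityState.NOT_AVAILABLE, "normal"))  # check-connection
```

This is only as good as the IPv4 value it reads, which is where the race condition mentioned above comes in.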

In bug 1793754 comment 10, Mark mentioned that they had removed captive portal support from sync. Mark, was that using nsINetworkConnectivityService? Can you elaborate a bit on how you were using it and why it was removed? Off-hand it sounds like maybe we shouldn't try to introduce it here...

Flags: needinfo?(markh)
Summary: The “User not connected to internet” error state can’t be triggered on Windows → The “User not connected to internet” error state can’t be triggered on Windows when "connected" additional virtual network adapters are present

Apparently I lied - I think I was misremembering bug 1420802 - sorry for the miscommunication.

Bug reports about this include bug 1449796 and bug 1740972, and some others I can't find which I would have moved to the Networking component - there are quite a few bugs about this in that component and at least one in the VPN component.

But yeah, sync still has that check in place here. What we removed was the check using nsINetworkLinkService, but I don't think we've ever looked at nsINetworkConnectivityService.

Flags: needinfo?(markh)

Shifting this one back to a P2 designation so it can be tracked for Fx108. Still think it's relevant enough to address.

Priority: P3 → P2

So I don't think we can use gNetworkConnectivityService for our needs here. As :kershaw mentioned over on Bug 1793498 in comment 15

nsINetworkConnectivityService tells whether Firefox can successfully get the HTTP response back from http://detectportal.firefox.com. This information might be incorrect and not reliable, since there could be a race between receiving HTTP response and network change.

I ran some experiments and this race between HTTP response and network change puts Firefox View into a worse state than how it currently handles the virtual adapter case. What ends up happening goes like this:

  1. Go to Firefox View with an active internet connection and enabled virtual network adapter while not being signed into sync.
  2. Observe the "Switch seamlessly between devices" sync call-out
  3. Disconnect from active internet connection
  4. Firefox View updates with "check your internet connection" message instead of sync call-out
  5. Reconnect to active internet connection
  6. Firefox View does not update the "check your internet connection" message
  7. Disconnect from active internet connection
  8. Firefox View removes "check your internet connection" error state and shows the "Switch seamlessly between devices" sync call-out
  9. Reconnect to active internet connection
  10. Firefox View does not update the previously shown error state
  11. Disconnect from active internet connection
  12. Firefox View removes "check your internet connection" error state and shows the "Switch seamlessly between devices" sync call-out again
  13. The machine connectivity and Firefox View connectivity are now permanently out-of-sync until Firefox is restarted

I believe showing an error message, even if its wording isn't the most accurate, is better than this odd error state I've consistently gotten into when trying to use the network connectivity service. Because of this, I don't think we can solve this bug, as we have no accurate and consistent way to listen for when a network is able to access the internet.

I tried to create a listener for when the IPv4 attribute of the network connectivity service changes (by listening for network ID changes, since we don't have a listener for nsINetworkConnectivityService.IPv4 yet), but since there can be races between the HTTP response and the network change, this listener doesn't do anything meaningful for solving this issue. I listened for network ID changes via the "network:networkid-changed" topic, but the network ID can change before the connectivity service knows that it can't access things via IPv4. For example, when disconnecting from the network, we can get into a state where the network service thinks we are still able to access the internet when in reality we are disconnected. Again, this is because of the race condition that :kershaw mentioned previously. So my listener thinks we can still access the internet, and Firefox View does not update with an error message, which is definitely not a state we want to be in.
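The failure described above can be reproduced in a toy model in which the connectivity probe finishes some time after the network ID changes. Everything here (`ToyConnectivityService`, the event names, the message labels) is hypothetical; the model only assumes that the IPv4 value changes when a probe completes, not when the network changes.

```python
class ToyConnectivityService:
    """Hypothetical model: IPv4 only changes when a probe *finishes*."""

    def __init__(self):
        self.ipv4 = "OK"   # result of the last completed probe
        self.pending = []  # results of probes that are still in flight

    def network_changed(self, reachable_now):
        # A re-check starts now, but its answer arrives later.
        self.pending.append("OK" if reachable_now else "NOT_AVAILABLE")

    def probe_finished(self):
        self.ipv4 = self.pending.pop(0)


def replay_with_networkid_listener(events):
    service = ToyConnectivityService()
    view = "normal"
    for event in events:
        if event == "disconnect":
            service.network_changed(reachable_now=False)
            # The network:networkid-changed listener runs immediately...
            view = "normal" if service.ipv4 == "OK" else "check-connection"
        elif event == "probe-finished":
            # ...but nothing re-reads IPv4 when the probe completes.
            service.probe_finished()
    return view, service.ipv4


print(replay_with_networkid_listener(["disconnect", "probe-finished"]))
# ('normal', 'NOT_AVAILABLE')
```

The listener sees the stale `OK`, and when the probe later reports `NOT_AVAILABLE` nothing re-reads it, so the view and the machine disagree.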

I think I'm out of ideas on how to solve this particular issue. I'm open to new ideas, but I have a hunch we aren't going to be able to adequately solve it. :gijs, any suggestions on how to move forward here? I don't think we can fix this since, as I understand it, we don't have a way (or a listener) to determine whether the browser is able to access the internet. Maybe I'm missing something extremely obvious and we can fix this; I'm hoping an extra set of eyes can settle this one way or the other.

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Tim Giles [:tgiles] from comment #12)

Thanks for the extensive investigation, Tim.

I agree with your assessment that there's not much we can do on the Firefox View side right now. I have two separate conclusions, though:

  1. we should try to work with the networking folks on bug 1793498 so that, AIUI, the nsINetworkLinkService starts behaving correctly on Windows when users have virtual network adapters.
  2. we should file a bug to fix the race condition in nsINetworkConnectivityService. If there's no network then the http response can't come back to us. So it has to be the case that there is a network link change after the successful http response arrives, and that should invalidate the status on the network connectivity service, I think.

Do both of those make sense to you or am I missing something?
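The second point can be sketched as a status cache that any link change resets to unknown. This is a hypothetical Python model, not the nsINetworkConnectivityService code; the class name and event tuples are made up for illustration.

```python
class InvalidatingStatus:
    """Hypothetical cache: a link change drops the last probe result."""

    def __init__(self):
        self.ipv4 = "UNKNOWN"

    def probe_result(self, result):
        self.ipv4 = result

    def link_changed(self):
        # A result measured on the old network says nothing about the new one.
        self.ipv4 = "UNKNOWN"


def replay(events):
    status = InvalidatingStatus()
    for kind, *payload in events:
        if kind == "probe":
            status.probe_result(payload[0])
        elif kind == "link":
            status.link_changed()
    return status.ipv4


print(replay([("probe", "OK"), ("link",)]))  # UNKNOWN
# A probe sent before the link change but answered after it still wins:
print(replay([("link",), ("probe", "OK")]))  # OK
```

The second ordering is the case where a response started before the link change arrives after it; invalidation alone cannot tell that this result is stale.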

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(tgiles)

(In reply to :Gijs (he/him) from comment #13)

  2. we should file a bug to fix the race condition in nsINetworkConnectivityService. If there's no network then the http response can't come back to us. So it has to be the case that there is a network link change after the successful http response arrives, and that should invalidate the status on the network connectivity service, I think.

(even if this is racy, then we should detect a network change that happens while we're waiting for the http response and just discard that response and try again)
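Discarding a response that raced with a network change is commonly done with a generation counter: stamp each probe with the generation current when it was sent, and drop any result whose stamp is out of date. The sketch below is a hypothetical model of that idea, not a description of how nsINetworkConnectivityService is or will be implemented.

```python
class ConnectivityChecker:
    """Hypothetical checker that ignores answers from an older network generation."""

    def __init__(self):
        self.generation = 0
        self.ipv4 = "UNKNOWN"
        self.sent = []  # generation stamp of every probe sent

    def start_probe(self):
        self.sent.append(self.generation)
        return self.generation

    def link_changed(self):
        self.generation += 1
        self.ipv4 = "UNKNOWN"
        self.start_probe()  # "try again" on the new network

    def probe_finished(self, stamp, result):
        if stamp != self.generation:
            return False  # network changed while we waited: discard
        self.ipv4 = result
        return True


def stale_response_scenario():
    checker = ConnectivityChecker()
    first = checker.start_probe()   # sent while still online
    checker.link_changed()          # user disconnects; retry is sent
    stale_accepted = checker.probe_finished(first, "OK")
    fresh_accepted = checker.probe_finished(checker.generation, "NOT_AVAILABLE")
    return stale_accepted, fresh_accepted, checker.ipv4, checker.sent


print(stale_response_scenario())  # (False, True, 'NOT_AVAILABLE', [0, 1])
```

The late `OK` from the first probe is dropped, and only the retry sent after the change is allowed to set the status.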

(In reply to :Gijs (he/him) from comment #13)

I agree with your first point, but I'll also note that this issue with virtual network adapters also appears on Linux and Mac. If I enable my VPN (Mozilla VPN in this case) and then disconnect from the internet, the about:networking#networkid section still says that the link is up and the link status is known, even though the VPN app knows it has no internet connection. If I understand the problem correctly, this issue isn't limited to Windows; it's just easier to reproduce there because the virtual network adapters are always on (or something to that effect), whereas on Mac and Linux these virtual adapters don't seem to exist until the OS uses them. We should work with the networking folks to fix this, but I'm not sure how useful I can be in actually fixing the bug. I have zero experience fixing network bugs like this, but I suppose I can test patches.

I agree with the second point as well, but I feel like I'm missing something there too. I'm hoping that the networking folks have more context on why this is racy and what can be done to remove the race condition. Your solution seems like a reasonable first attempt at removing the race condition, though (and it matches my understanding of the problem). Edit: filed Bug 1798505 for this second point.

So what do we do with this bug, then? Do we keep it blocked on Bug 1793498? Do we change the priority? I don't think we're going to get a fix for the blocking bug by the 108 code freeze.

Flags: needinfo?(tgiles)
Depends on: 1798505

I'm going to update the priority to try to have this tracked for Fx109 now instead.

Priority: P2 → P3

Not sure this is actionable from our end, unassigning for now. We'll need help from networking to resolve this issue one way or the other.

Assignee: tgiles → nobody
Status: ASSIGNED → NEW
Hardware: Unspecified → Desktop
Summary: The “User not connected to internet” error state can’t be triggered on Windows when "connected" additional virtual network adapters are present → The “User not connected to internet” error state can’t be triggered when "connected" additional virtual network adapters are present

A potential (unverified) workaround is to add a listener for network:connectivity-service:ip-checks-complete and then check whether the network connectivity service returns an actual value from the IPv4 getter. However, I'm not sure we want to introduce this listener into the front-end code, as it is potentially a blocking operation according to :valentin. At any rate, further investigation is needed to work around this issue.
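This workaround differs from reading IPv4 at `network:networkid-changed` time in when the value is read: only after the checks report completion. A toy model, assuming the completion topic fires after the new IPv4 result has been stored; the observer service, connectivity object and view here are hypothetical stand-ins, and the model says nothing about the blocking concern :valentin raised.

```python
from collections import defaultdict

CHECKS_COMPLETE = "network:connectivity-service:ip-checks-complete"


class ToyObserverService:
    def __init__(self):
        self._observers = defaultdict(list)

    def add_observer(self, topic, callback):
        self._observers[topic].append(callback)

    def notify(self, topic):
        for callback in self._observers[topic]:
            callback()


class ToyConnectivity:
    def __init__(self, observers):
        self.observers = observers
        self.ipv4 = "OK"

    def complete_checks(self, result):
        self.ipv4 = result                      # store the result first...
        self.observers.notify(CHECKS_COMPLETE)  # ...then announce it (assumed order)


def simulate(check_results):
    observers = ToyObserverService()
    connectivity = ToyConnectivity(observers)
    view = {"message": "normal"}

    def refresh():
        view["message"] = "normal" if connectivity.ipv4 == "OK" else "check-connection"

    observers.add_observer(CHECKS_COMPLETE, refresh)
    seen = []
    for result in check_results:
        observers.notify("network:networkid-changed")  # nobody refreshes on this
        seen.append(view["message"])
        connectivity.complete_checks(result)
        seen.append(view["message"])
    return seen


print(simulate(["NOT_AVAILABLE", "OK"]))
# ['normal', 'check-connection', 'check-connection', 'normal']
```

In this model the view lags the network change until the checks finish, rather than reading a stale value once and never correcting it.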

Duplicate of this bug: 1802514

The severity field for this bug is set to S3. However, the following bug duplicate has higher severity:

:sfoster, could you consider increasing the severity of this bug to S2?

For more information, please visit auto_nag documentation.

Flags: needinfo?(sfoster)

(In reply to Release mgmt bot [:suhaib / :marco/ :calixte] from comment #20)

:sfoster, could you consider increasing the severity of this bug to S2?

The severity is correct. It would be good to get a fix, but it's a very specific set of circumstances to trigger this bug, which aren't going to affect 99.x% of users, and even when reproduced, the bug manifests as confusing or misleading error messages but has no functional impact on normal use.

Flags: needinfo?(sfoster)

I'm going to close this one out as I don't think this edge case is significant enough for us to prioritize at the moment.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WONTFIX
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---

Reopening to spend some further time investigating if there's a viable fix.
