Open Bug 1879387 Opened 5 months ago Updated 8 days ago

Fenix fails to gracefully handle network transition during pageload

Categories: Core :: Networking, defect, P2
People: Reporter: acreskey; Unassigned
References: Blocks 2 open bugs
Details: Whiteboard: [necko-triaged]
Attachments: 4 files

Attached image geckoview_pageload.jpg

When we transition networks midway through a page load, we are sometimes left with a partially loaded page.

For example:

• Share network via wifi from a host desktop machine.
• Connect to this wifi from Geckoview_example
• Start loading a large website (e.g. www.washingtonpost.com)
• Midway through the pageload (after first paint), disable the wifi sharing
• The Android device will transition to the next available wifi network

Result:
The page will be left in an incomplete state.
Only reloading it will fix it.

Expectation:
In-progress connections are automatically retried, allowing the page load to complete?
Chrome does seem to handle this gracefully quite often, but certainly not always.

Opening this bug for discussion -- it's not clear if we could do this better.
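The expectation above (in-progress requests retried after a network change) can be sketched as a simple retry policy. This is a minimal illustration under stated assumptions, not how Necko or Chrome actually behave: `NetworkChangedError` and `fetch_with_retry` are hypothetical names, and the sketch assumes only requests using the methods RFC 9110 defines as idempotent are safe to re-send automatically, since a POST may already have been processed by the server.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class NetworkChangedError(Exception):
    """Hypothetical: raised when the OS reports a new default network
    while a request is still in flight."""


# Methods RFC 9110 defines as idempotent (safe methods plus PUT and DELETE).
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"}


def fetch_with_retry(method: str, send: Callable[[], T], max_attempts: int = 3) -> T:
    """Call `send`, re-issuing it after a network change (hypothetical helper).

    Non-idempotent requests are never replayed, and at most `max_attempts`
    tries are made in total.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return send()
        except NetworkChangedError:
            if method.upper() not in IDEMPOTENT_METHODS or attempts >= max_attempts:
                raise
            # Assumption: the next send() opens a fresh connection on the
            # new network instead of reusing the stale one.


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_send() -> str:
        state["calls"] += 1
        if state["calls"] == 1:
            raise NetworkChangedError("wifi dropped")
        return "200 OK"

    print(fetch_with_retry("GET", flaky_send))  # prints "200 OK"
```

Restricting automatic replays to idempotent methods keeps the retry from duplicating side effects; a POST interrupted by the transition would still surface as an error to the page.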

To make the partially loaded page behaviour easier to reproduce, I add packet-level network throttling on the wifi host desktop (e.g. 2mbps and 50ms latency).

I did ensure that network.http.http2.move_to_pending_list_after_network_change from bug 1706377 was enabled.
From remote DevTools, the content appeared to be almost entirely HTTP/2 requests.

Summary: Android: Changing networks midway through pageload leads to incomplete load → Android: changing networks midway through pageload leads to incomplete load
Whiteboard: [necko-triaged]
Duplicate of this bug: 1879388

Profile:
https://share.firefox.dev/42AWuAU

Not sure how long it takes for requests to timeout?

Severity: -- → S4
Priority: -- → P2

I haven't been able to pinpoint scenarios where Chrome consistently outperforms Firefox in this test.

Setup:
• Packet-level network throttling on the desktop (using Network Link Conditioner on macOS): higher latency and limited bandwidth
• Desktop shares network via wifi
• Connect Android device to this shared network
• Ensure that the Android device also has a secondary Wifi network that it will transition to if the shared desktop wifi is dropped

Scenario 1:

  • Start a pageload on Fenix/Chrome
  • Once the navigation has begun and the view is cleared, disable network sharing from the host desktop
  • The Android device will automatically transition to the next auto-join Wifi network

Behaviour: both Fenix and Chrome will stall the pageload with an empty document.
A reload is required to complete the page load.
(Note that Chrome implements "Pull to refresh" on Android, so the drag-down motion will trigger a refresh.)

Scenario 2:

  • Start a pageload on Fenix/Chrome
  • Once the navigation has begun and the first non-blank paint has been made, disable network sharing from the host desktop
  • The Android device will automatically transition to the next auto-join Wifi network

Behaviour: both Fenix and Chrome will stall the pageload with a partially rendered document.
A reload is required to complete the page load.
(Again, note that Chrome implements "Pull to refresh" on Android, so the drag-down motion will trigger a refresh.)

The only difference that I can consistently see is the UI gesture "Pull to refresh" which Chrome triggers quite readily.

Blocks: perf-android

Andrew asked me to add some notes about an issue I've been seeing for some time as well.
FTR, I see this on both Android and Desktop, though not in the same context: on Android it happens mostly in the subway, while on Desktop it's when I'm on a high-speed train with bad connectivity.

Sometimes I connect to a website while my connection is probably quite bad (in a subway station, or on a high-speed train), so the connection doesn't really succeed. Maybe (just a wild guess) packets are lost and not resent, or they are resent but lost again, and the TCP retransmission timeout keeps growing, so they're not resent again, or not often enough.

But at the next subway station, the connection is back to a very good state, and yet the connection still doesn't seem to succeed. Even worse, because I'm aware of this, I try to reload the page, but this still doesn't work (I think Firefox knows a connection to that server is ongoing, so we don't try to start a new one?).
In the end I have to wait for the timeout (which gives me a blank page BTW, not even an error page), or kill Firefox entirely and restart it, after which it generally works.
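One possible mechanism behind the guess that lost packets are not resent often enough (not confirmed in this bug): RFC 6298 has a TCP sender double its retransmission timeout (RTO) after every timeout, starting from 1 second, and allows a cap provided it is at least 60 seconds. The sketch below assumes exactly those values (real stacks differ; Linux, for example, caps the RTO at 120 seconds), and `rto_schedule` and `wait_after_recovery` are hypothetical helpers. It shows how a connection can stay silent for close to a minute after the network has recovered.

```python
from typing import List


def rto_schedule(retries: int, initial: float = 1.0, cap: float = 60.0) -> List[float]:
    """RTO (seconds) waited before each retransmission, assuming RFC 6298
    behaviour: start at `initial`, double after every timeout, never exceed `cap`."""
    schedule = []
    rto = initial
    for _ in range(retries):
        schedule.append(rto)
        rto = min(rto * 2, cap)
    return schedule


def wait_after_recovery(recovered_at: float, retries: int = 10,
                        initial: float = 1.0, cap: float = 60.0) -> float:
    """Seconds between connectivity returning at `recovered_at` (measured from
    the original send) and the next retransmission. Returns infinity if the
    sender runs out of retries first (hypothetical helper)."""
    t = 0.0
    for rto in rto_schedule(retries, initial, cap):
        t += rto  # a retransmission fires when this timer expires
        if t >= recovered_at:
            return t - recovered_at
    return float("inf")


if __name__ == "__main__":
    print(rto_schedule(8))                        # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
    print(wait_after_recovery(64.0, retries=8))   # 59.0
```

In this model, retransmissions fire at 1, 3, 7, 15, 31, 63, 123 and 183 seconds, so a network that comes back at 64 seconds is not probed again until 123 seconds.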

It's not clear this is exactly the same issue as the one outlined here, though. In particular, in my case this is more about the initial connection to the website. It's also possible that Andrew's issue is similar: when the wifi routing is disabled while the page is loading, new connections to different domains may be starting at that exact moment, leading to the same problem.

Related to bug 1906323, in which Fenix fails to show any error when the access point has no WAN connectivity.

See Also: → 1906323

Renamed the bug as I think I have a very reproducible scenario.

Steps to reproduce

  1. Connect the Android device to a wifi network that you can easily disable or walk out of range of
  2. Ensure that there is a secondary network that your device will automatically transition to (i.e. has stored credentials and is configured to auto-connect)
  3. Initiate a page load by clicking on a link
  4. Disable the connected network (or walk out of range) so that the device automatically transitions to the secondary network

Expected behaviour

As the device connects to the new network, the pageload resumes.

Actual behaviour

The pageload will generally stall for a long period of time and then silently fail.
This leaves the user with a blank page (even though the device has successfully transitioned networks).

In Chrome, this is handled gracefully:
• The user is briefly notified of the loss of network
• Once the device connects to the new network the pageload resumes and the page is successfully loaded

(See attached videos).

Note: this can be made easier to reproduce by introducing additional latency on the wifi access point (e.g. +300 ms RTT).

Here's an example profile of a page load that stalled after the initial network was disabled (nsHttp logs as markers)
https://share.firefox.dev/3xO8CDR

Summary: Android: changing networks midway through pageload leads to incomplete load → Fenix fails to gracefully handle network transition during pageload
Attached video Chrome_bbc_load.mov

Pageload while transitioning networks in Chrome.
Note how the change of network is communicated to the user, followed by the graceful resumption of the pageload.

Attached video fenix_bbc_trimmed.mov

Fenix transitioning networks during pageload.
Note that the page load never completes and the user is left with a blank document.
Video trimmed for size, but it takes over 30 seconds before the loading bar stops.

Screenshot of the final view after the network transition, Fenix Nightly.
