Closed Bug 1734110 Opened 3 years ago Closed 3 months ago

HTTP/3 stalls when switching to network with MTU<=1350

Categories

(Core :: Networking: HTTP, defect, P3)

defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: cbranch, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged] workaround: set `network.http.http3.enabled` to false)

Attachments

(1 obsolete file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Safari/605.1.15

Steps to reproduce:

  • Visit a website that uses HTTP/3 and observe that Firefox uses h3 in the Network Monitor.
  • Browse to any other website.
  • Enable a full-tunnel VPN (as an example, Cloudflare Warp).
  • Try browsing to the original (h3-supporting) website.

Actual results:

The GET request to the original website stalls indefinitely (at least 5 minutes). There is no fallback to HTTP/2.

Expected results:

Either a fallback to HTTP/2 (after some short period of time) or path MTU discovery to allow the HTTP/3 connection to succeed under the new network path.

This is similar to https://bugzilla.mozilla.org/show_bug.cgi?id=1674449 except that here the failure is triggered after the initial connection to an h3-advertising website has already been made. It is also possible that this is a problem with any network path change, not just enabling a VPN.
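To illustrate the first of the two expected behaviours (fallback after a short period), here is a minimal sketch. It is not Firefox or Necko code: `fetch_with_fallback`, `stalled_h3`, `working_h2`, and the timeout value are hypothetical names and numbers chosen for illustration, and the two fetchers only simulate a stalled HTTP/3 request and a working HTTP/2 request.

```python
import asyncio


async def fetch_with_fallback(fetch_h3, fetch_h2, h3_timeout=3.0):
    """Try the HTTP/3 fetcher first; if it does not finish within
    h3_timeout seconds, cancel it and retry with the HTTP/2 fetcher."""
    try:
        return await asyncio.wait_for(fetch_h3(), timeout=h3_timeout)
    except asyncio.TimeoutError:
        # The HTTP/3 attempt stalled (for example, oversized packets are
        # silently dropped on the new path), so fall back.
        return await fetch_h2()


async def stalled_h3():
    # Hypothetical stand-in for a request whose packets never arrive.
    await asyncio.sleep(3600)
    return "h3"


async def working_h2():
    # Hypothetical stand-in for a request that succeeds over TCP.
    return "h2"


def demo():
    # A short timeout keeps the demonstration fast.
    return asyncio.run(
        fetch_with_fallback(stalled_h3, working_h2, h3_timeout=0.05)
    )
```

In this simulation `demo()` returns the HTTP/2 result instead of hanging. A real client would also have to decide how long to keep avoiding HTTP/3 for that origin afterwards, which the sketch does not address.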

The Bugbug bot thinks this bug should belong to the 'DevTools::Netmonitor' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Netmonitor
Product: Firefox → DevTools
Component: Netmonitor → Networking: HTTP
Product: DevTools → Core
Blocks: QUIC
Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged]
Priority: P2 → P3

I can confirm HTTP/3 connections stall on requests for large resources without a fallback, using an interface MTU of 1280 (in this case using Cloudflare WARP) on macOS 11.6.1 using both Firefox 93.0 and 94.0b9.

Note that lack of path MTU discovery appears to be intentional per bug 1699289.

Status: UNCONFIRMED → NEW
Ever confirmed: true

I think the same is happening to me, with Firefox on Windows 10, Ubuntu, and Android. Some websites fail to load completely most of the time unless I disable either HTTP/3 or IPv6, and my connection seems to have an MTU limit of 1280.
Chrome on Windows 10 also has a problem: it takes ~6 seconds to load these sites (falling back to HTTP/2), but it always loads them fully. (Chrome on Linux/Android works just fine, however.)

Here are some of the problematic websites; they may be useful for testing (and maybe someone can confirm that we're talking about the same issue):
https://linegee.net/
https://articlewedding.com/
https://pincong.rocks/

It seems related to bug 1700703.

I believe this bug should have a higher priority.

As enterprises start to deploy zero-trust strategies, this can potentially affect many people. For example, at my current company, Canva, all company-managed devices have Cloudflare WARP installed and enabled by default, and certain services are only accessible with it enabled, so everyone using Firefox in the company is affected by this issue. We have an item on our internal Confluence page advising Firefox users to disable HTTP/3 support via the network.http.http3.enabled pref, which is very unfortunate.

With wider adoption of VPNs like Cloudflare WARP and of HTTP/3 on more websites, this is going to confuse an increasing number of users.

Whiteboard: [necko-triaged] → [necko-triaged] workaround: set `network.http.http3.enabled` to false
Version: Firefox 92 → Trunk

How did you determine that this only affects networks with MTU <= 1350?

I'm wondering if #1733449 could be related. I'm on a Cisco AnyConnect VPN, but it has a higher MTU than mentioned in this bug (1358 rather than <=1350). In my case, the initial page load is fine, but subsequent loads, or loads of static resources (images, CSS, JS, etc.), sporadically stall. I've also had to set network.http.http3.enabled to false to make Firefox usable on sites that support HTTP/3.

Duplicate of this bug: 1824604
Whiteboard: [necko-triaged] workaround: set `network.http.http3.enabled` to false → [necko-triaged][necko-priority-review] workaround: set `network.http.http3.enabled` to false

If I understand this comment correctly, we decided to use a larger MTU for performance wins.
So, maybe the best way forward is implementing PMTUD first.

Martin, could you correct me if I am wrong? Thanks.

Flags: needinfo?(mt)

That is right. However, the stalling thing is perhaps worth looking into. I don't know how able we are to detect a network change, but we might tighten any timers we maintain in that case, in case the MTU changed and we are no longer able to use QUIC.

FWIW, there is no safe cut-off here. Even if we went to the absolute minimum required by QUIC (1200), there are still networks that don't support that (all IPv4, of course). Any choice we make is an attempt to get the most value for the most people. A larger MTU is more efficient. We could maybe offer a way to tune the default MTU for people who are willing to make that trade-off for themselves. It's a bit of a niche use, but QUIC has real advantages that might make that worthwhile. That value is hard-coded right now, so it's a tiny bit of work, but it's fairly straightforward. If someone wants to provide patches, I'd review 'em.
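The arithmetic behind the 1200-byte floor can be sketched as follows. This is an illustration only, not neqo code. It assumes IP headers without options or extension headers, and the function names are hypothetical. The 1200-byte minimum datagram size is from RFC 9000; IPv6 guarantees a 1280-byte link MTU, while IPv4 gives no comparable guarantee, which is why only IPv4 paths can fall below the QUIC minimum.

```python
IPV4_HEADER = 20          # bytes, assuming no IP options
IPV6_HEADER = 40          # bytes, assuming no extension headers
UDP_HEADER = 8            # bytes
QUIC_MIN_DATAGRAM = 1200  # minimum UDP payload a QUIC path must carry


def max_udp_payload(path_mtu, ipv6):
    """Largest UDP payload that fits in one unfragmented IP packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return path_mtu - ip_header - UDP_HEADER


def quic_usable(path_mtu, ipv6):
    """True if the path can carry the smallest datagram QUIC requires."""
    return max_udp_payload(path_mtu, ipv6) >= QUIC_MIN_DATAGRAM


def datagram_fits(datagram_size, path_mtu, ipv6):
    """True if a sender's chosen datagram size survives this path."""
    return datagram_size <= max_udp_payload(path_mtu, ipv6)
```

With a 1280-byte MTU over IPv6, `max_udp_payload(1280, True)` is 1232, so QUIC is usable but any hard-coded datagram size above 1232 bytes would be dropped. With a 576-byte IPv4 path the maximum payload is 548 bytes, below the QUIC minimum.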

Ultimately, we probably should engage in a bit of path MTU detection, but that comes with its own costs.
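As a rough picture of what path MTU detection involves, here is a simplified sketch of a probe-based search. It is not how neqo implements it, and real schemes such as the one in RFC 8899 differ in detail. `discover_max_datagram` and `make_simulated_path` are hypothetical names, `probe` stands in for sending a padded packet and waiting for an acknowledgement, and the 1472-byte ceiling is an assumption corresponding to a 1500-byte Ethernet MTU over IPv4.

```python
def discover_max_datagram(probe, floor=1200, ceiling=1472):
    """Binary-search the largest datagram size for which probe(size)
    succeeds, assuming that if a size works, every smaller size works.
    Returns None when even the floor size does not get through."""
    if not probe(floor):
        return None
    lo, hi = floor, ceiling
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always progresses
        if probe(mid):
            lo = mid              # mid got through; search larger sizes
        else:
            hi = mid - 1          # mid was lost; search smaller sizes
    return lo


def make_simulated_path(max_payload):
    """Hypothetical path that silently drops datagrams over max_payload."""
    def probe(size):
        return size <= max_payload
    return probe
```

For a simulated path that carries at most 1232 bytes, `discover_max_datagram(make_simulated_path(1232))` settles on 1232. On a real network every failed probe is a lost packet that must time out before the search continues, which is one of the costs of doing this.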

Flags: needinfo?(mt)

ni myself for debugging this bug.

Flags: needinfo?(kershaw)
Attached file firefox.pcapng (obsolete)

Sure.

Comment on attachment 9336567 [details]
firefox.pcapng

Sorry, I mixed up my tabs and attached this file to the wrong bug. Feel free to remove it.

Attachment #9336567 - Attachment is obsolete: true
Flags: needinfo?(kershaw)
Duplicate of this bug: 1772057

We need a pref or parameter on the Necko connection to reduce the MTU.

Whiteboard: [necko-triaged][necko-priority-review] workaround: set `network.http.http3.enabled` to false → [necko-triaged][necko-next]workaround: set `network.http.http3.enabled` to false
Whiteboard: [necko-triaged][necko-next]workaround: set `network.http.http3.enabled` to false → [necko-triaged][necko-priority-next] workaround: set `network.http.http3.enabled` to false

See https://github.com/mozilla/neqo/pull/1903
This will be fixed by that neqo PR.

Whiteboard: [necko-triaged][necko-priority-next] workaround: set `network.http.http3.enabled` to false → [necko-triaged] workaround: set `network.http.http3.enabled` to false

Could you check if this bug is still an issue?

Flags: needinfo?(cbranch)
Flags: needinfo?(aldwulf)

Redirect needinfos that are pending on inactive users to the triage owner.
:valentin, since the bug has recent activity, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(cbranch)
Flags: needinfo?(aldwulf)

We'll assume this works now.
Please reopen if that's not accurate.

Status: NEW → RESOLVED
Closed: 3 months ago
Flags: needinfo?(valentin.gosu)
Resolution: --- → WORKSFORME