High packet loss on QUIC on Windows with GSO
Categories
(Core :: Networking, defect, P2)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox-esr128 | --- | unaffected |
| firefox-esr140 | --- | unaffected |
| firefox141 | --- | unaffected |
| firefox142 | + | disabled |
| firefox143 | --- | disabled |
| firefox144 | --- | disabled |
People
(Reporter: mail, Assigned: mail)
References
(Regression)
Details
(Keywords: leave-open, regression, Whiteboard: [necko-triaged] )
Attachments
(4 files)
Seeing high packet loss on QUIC connections on Windows since Neqo v0.14.0 landed.
See following graph. Note sharp uptick in 95th percentile on 2025-07-14 from ~0.5% packet loss to ~6.5% packet loss, same day as Bug 1975873 landed.
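For readers unfamiliar with how a 95th-percentile figure like the one above is read, here is a minimal sketch of a nearest-rank percentile over per-connection loss ratios. It is an illustration only: the function name is hypothetical, and the telemetry pipeline may aggregate differently (for example from histogram buckets).

```rust
/// Nearest-rank percentile of per-connection loss ratios.
/// Hypothetical helper for illustration; not the telemetry implementation.
/// Returns None for an empty sample or a percentile outside 0..=100.
fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: the smallest value with at least p% of samples at or below it.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```

A p95 that jumps while the median stays flat means only a minority of connections see the high loss, which fits a problem limited to some devices.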
Hypothesis thus far:
- Bug 1975873 enables USO on Windows
- on the 95th percentile, Firefox on Windows sends 2 or more datagrams per WSA_SENDMSG sys call
- since Firefox now uses a hybrid post-quantum TLS handshake, the first client flight has two UDP datagrams
- thus Firefox UDP QUIC connections do at least one USO WSA_SENDMSG call with 2 or more datagrams per connection
- it is unlikely that both get lost, as otherwise the loss ratio would not be at ~6.5% but 100%
- instead, the assumption is that the first is sent, but the second is lost
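The arithmetic behind the last two points can be sketched as below. This is a toy model, not Neqo code: it assumes that on an affected machine only the first datagram of each multi-datagram send call is delivered, and the function name and batch sizes are hypothetical.

```rust
/// Loss ratio for one connection, given how many datagrams were handed to
/// each send call. Assumption (hypothetical model): when `uso_broken` is
/// true, only the first datagram of every batch is delivered.
fn modeled_loss_ratio(batch_sizes: &[usize], uso_broken: bool) -> f64 {
    let sent: usize = batch_sizes.iter().sum();
    if sent == 0 {
        return 0.0;
    }
    let lost: usize = if uso_broken {
        // Everything after the first datagram in a batch counts as lost.
        batch_sizes.iter().map(|&n| n.saturating_sub(1)).sum()
    } else {
        0
    };
    lost as f64 / sent as f64
}
```

Under this model a lone two-datagram flight (`&[2]`) gives a ratio of 0.5, and the ratio drops as more single-datagram calls follow, so a partial loss ratio is consistent with the first datagram getting through.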
Comment 3•4 months ago
Set release status flags based on info from the regressing bug 1975873
Disable HTTP3 QUIC UDP GSO IO, i.e. sending multiple UDP datagrams in
one sys call, on Windows. See Bug for details.
I suggest the following:
- disable GSO (aka. USO) on Windows on Firefox Nightly (see patch above)
- if it fixes the issue:
  - backport the fix to Beta
  - investigate why GSO is failing on (some) Windows devices
- if it doesn't fix the issue:
  - find a new hypothesis
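As a rough sketch of what disabling GSO amounts to at the IO layer: cap the number of datagrams per send call at 1 on Windows, so that every datagram gets its own sys call. The helper names below are hypothetical and this is not the actual Neqo or Firefox code.

```rust
/// Hypothetical helper: upper bound on datagrams passed to one send call.
/// Returning 1 disables segmentation offload, i.e. one datagram per sys call.
fn max_datagrams_per_send(is_windows: bool, default_max: usize) -> usize {
    if is_windows {
        1
    } else {
        default_max.max(1)
    }
}

/// Splits queued datagrams into per-call batches of at most `max` datagrams.
/// A `max` of 0 is treated as 1 so that `chunks` never panics.
fn batches(datagrams: &[Vec<u8>], max: usize) -> Vec<&[Vec<u8>]> {
    datagrams.chunks(max.max(1)).collect()
}
```

A caller could pass `cfg!(windows)` as `is_windows`. With the cap at 1, three queued datagrams become three separate send calls instead of one batched call.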
Potential fix landed. Once metrics on Nightly are back to normal, I will propose a back-port for Beta.
Comment 11•4 months ago
Latest data point on Glam is 2025-07-24. Thus, there is not yet any data confirming or refuting that the above patch fixed the issue. Will check back tomorrow.
Comment 12•4 months ago
The bug is marked as tracked for firefox142 (beta). However, the bug still has low severity.
:ghess, could you please increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit BugBot documentation.
Comment 13•4 months ago
Disable HTTP3 QUIC UDP GSO IO, i.e. sending multiple UDP datagrams in
one sys call, on Windows. See Bug for details.
Original Revision: https://phabricator.services.mozilla.com/D258686
Comment 14•4 months ago
firefox-beta Uplift Approval Request
- User impact if declined: Degraded network performance
- Code covered by automated testing: yes
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: no manual QE testing
- Risk associated with taking this patch: Not aware of additional risks. Disables a new feature (i.e. GSO)
- Explanation of risk level: Not sure how to classify.
- String changes made/needed: no
- Is Android affected?: no
Comment 15•4 months ago
Status update:
- since the patch to Nightly, the loss_ratio metric on Nightly has recovered
- thus I requested a Beta uplift https://phabricator.services.mozilla.com/D258916
- I tried to reproduce the bug on 6 different Windows machines (x86-64 and ARM) without success