Closed Bug 1740856 Opened 3 years ago Closed 2 years ago

POST stalls on new http2 connection when 0rtt enabled

Categories

(Core :: Networking: HTTP, defect, P2)

Firefox 94
defect

Tracking

()

VERIFIED FIXED
96 Branch
Tracking Status
firefox96 --- fixed

People

(Reporter: toast, Assigned: kershaw)

References

Details

(Whiteboard: [necko-triaged])

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:94.0) Gecko/20100101 Firefox/94.0

Steps to reproduce:

With network logging enabled, reloaded a page which was an HTTP POST. The server is configured to do HTTP/2 with TLS 1.3 0-RTT early data.

I'm currently hosting a minimal reproduction, form is currently available at https://earlydata.ruka.org/ which posts to https://earlydata.ruka.org/post.html

Actual results:

Logs (attached) and pcap (not attached, but available) show that Firefox sends http/2 header information in early-data; see line 599, 2021-11-12 04:25:04.590000 shows Magic Connection Header sending.

The POST is not sent during the 0rtt window, which is expected. Unfortunately, the request is not sent until nearly 60 seconds later.

About 57 seconds later, line 964, 2021-11-12 04:26:02.600000 UTC shows Generate Ping, and shortly after, line 1095, 2021-11-12 04:26:02.603000 UTC shows a stream ID created for /post.html

About 40 ms later, the page is decoded, line 1951, 2021-11-12 04:26:02.642000 UTC shows decoded response headers

(http/2 405 is expected --- with a minimal test case, I can't easily generate an http 200 response code)

Primarily tested on Firefox 94.0.1 64-bit Windows 10 20H2 (Build 19042.1348).
I'm also testing on Android and see similar stalling occasionally (and with the original issue on the real site), but it's not as consistent, and logging is harder. Android Firefox Daylight 94.1.2 (Build #2015844587) from Google Play.

Expected results:

I expected TLS 1.3 early-data to be finished immediately upon receiving the server handshake finish message, and the POST to happen then, rather than waiting much longer for a connection ping to happen.

Attaching log file from Firefox 94.0.1

Brief testing in Firefox Nightly (20211111211702) seems to show similar stalling, but the connection stalls for only about 4 seconds, rather than nearly 60.

The Bugbug bot thinks this bug should belong to the 'Core::Audio/Video: Playback' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Audio/Video: Playback
Product: Firefox → Core
Component: Audio/Video: Playback → Networking: HTTP

I've configured the nginx server on https://earlydata.ruka.org/ to have client_header_timeout 3; This gives the client 3 seconds to send request headers for the first request on a connection, which makes it much easier to tell there's an issue.

On Firefox 94, if Firefox decides to use an http2 0rtt session, the server will shutdown the connection 3 seconds in (since no request was sent), and Firefox waits about 60 seconds before it notices the connection is done, then it opens a new connection with 0rtt and experiences the problem again. In my tests it seems to go through this about 3 times, and then on the fourth time, Firefox manages to send a request in time.

On Firefox Nightly, the first connection is similar; nightly does not send a request during the initial 3 seconds, so nginx closes the connection, and Nightly doesn't seem to notice until 1 second later; but nightly does not use 0rtt when it reconnects and it sends the request right away. I believe this is because of the fix to bug 1687320

The fix to bug 1687320 landed many months ago, so if it works then it should also take care of Firefox 94.

Dragana, do you have any idea what's going on here?

Flags: needinfo?(dd.mozilla)

I have the same problem with my website but i noticed some things :
-> If the POST occurs less than 10 seconds after doing a GET on the website, it works.
-> If I POST without having done a GET in the previous 10 seconds, Firefox waits 4.02s or 59s.
-> Moreover, this problem appears with my provider (Orange) but this problem does not appear with my other provider(B & You) with exactly the same computer, the same session, etc ...

AND :
If I put the same website with another host (OVH), the problem disappears completely(no need to do a GET in the previous 10s).

Hi,

I do have the same diagnostic with pages hosted on o2switch and these pages work great with OVH.

What I have noticed is that the bug does not occur when I use private browsing or when I disable "security.tls.enable_0rtt_data" preference.

My ISP is Orange FTTH, will check with other provider, but that will be quite funny if it's a browser/hoster/isp combo that enables the bug to occur.

Julien

I've narrowed this down to the POST being the first request on an http/2 connection with 0rtt. If your server isn't doing 0rtt/tls early data, or it reuses an existing connection, or you're not doing http/2, I don't think you're going to see this.

If you're trying to reproduce on nginx, I recommend settings like:

listen 443 ssl http2;
ssl_early_data on;
keepalive_requests 1;
client_header_timeout 3;

which enables http2 and tls early data, eliminates connection reuse, and sets the first request timeout to be under the timings I've observed on Firefox Nightly. Other servers may have different names for similar settings, assuming they support all the features required.

With Firefox 94, it's quite easy to observe when it reproduces, because it often cycles several times and takes about 60 seconds per cycle. When Nightly reproduces, it's a bit harder, because it only takes 4 seconds a cycle and in my testing, the second try always succeeded; but it's clear from tcpdump that it's making two connections.

I highly doubt this is related to the ISP or hosting networks; unless there's some heuristic to disable 0rtt depending on the network; but there does seem to be some extra condition, since it's not consistent; reloading the POST a few times is the easiest way to trigger it.

I think the problem is that when nsHttpTransaction::Do0RTT fails, poll is never called. As the result, Http2Session would end up in this idle state.
I'll try to submit a patch to fix this.

Severity: -- → S3
Flags: needinfo?(dd.mozilla)
Priority: -- → P2
Whiteboard: [necko-triaged]
Assignee: nobody → kershaw
Pushed by kjang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a1bd523dd7e1
When 0RTT fails, poll should be called to drive handshake, r=dragana
Status: UNCONFIRMED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 96 Branch

Thank you, I updated to Nightly 2021-11-26 and tested and was no longer able to reproduce the reported issue.

I also confirmed in Wireshark that 0rtt Early Data was used for http/2 setup and Nightly sent end of early data immediately after receiving the server handshake finished, followed shortly by the post, as expected.

Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: