`content-length` is not properly parsed when using viaduct over HTTP3
Categories
(Core :: Networking, defect)
Tracking
()
People
(Reporter: janerik, Assigned: emilio)
References
Details
Attachments
(2 files, 2 obsolete files)
1.67 KB,
text/plain
|
Details | |
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
diannaS
:
approval-mozilla-release+
RyanVM
:
approval-mozilla-esr91+
|
Details | Review |
STR:
- Apply the attached patch to switch FOG to the staging endpoint and ensure it's used in developer builds.
- Build
- Launch with logging:
MOZ_LOG="timestamp,sync,glean::*:5,fog::*:5,fog_control::*:5,glean_core::*:5" mach run
- Go to
about:glean
, send a ping withtag: testing
,named ping: metrics
. Click "submit settings". - Observe the logs. If it has the following line, it failed somewhere:
2022-01-13 11:23:36.881788 UTC - [Parent 55644: Unnamed thread 13861da00]: W/glean_core::* [glean_core::upload] Unrecoverable upload failure while attempting to send ping b2d8576e-b2f5-4a07-a755-f08b11504da1. Error was UnrecoverableFailure
After this is logged no new websites seem to load anymore.
Assignee | ||
Comment 1•3 years ago
|
||
When trying to repro this on a debug build I always get:
Assertion failure: false (We are done sending this request!), at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/Http3Stream.cpp:273
#01: ???[/home/emilio/src/moz/gecko-2/obj-debug/dist/bin/libxul.so +0x528a509]
Assignee | ||
Comment 2•3 years ago
|
||
Stack is:
(rr) bt
#0 mozilla::net::Http3Stream::OnReadSegment(char const*, unsigned int, unsigned int*) (this=0x7efde4a8ef50, buf=0x7efde556325a "\037\213\b", count=427, countRead=0x7efe095b55bc)
at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/Http3Stream.cpp:273
#1 0x00007efe1df4120d in mozilla::net::nsHttpTransaction::ReadRequestSegment(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*)
(stream=<optimized out>, closure=closure@entry=0x7efde4da9b00, buf=0x0, offset=<optimized out>, count=713073987, countRead=countRead@entry=0x7efe095b55bc)
at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/nsHttpTransaction.cpp:735
#2 0x00007efe1dac17b8 in nsBufferedInputStream::ReadSegments(nsresult (*)(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*)
(this=0x7efde950fe20, writer=0x7efe1df411ea <mozilla::net::nsHttpTransaction::ReadRequestSegment(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*)>, closure=0x7efde4da9b00, count=32166, result=0x7efe095b566c) at /home/emilio/src/moz/gecko-2/netwerk/base/nsBufferedStreams.cpp:433
#3 0x00007efe1df413bf in mozilla::net::nsHttpTransaction::ReadSegments(mozilla::net::nsAHttpSegmentReader*, unsigned int, unsigned int*)
(this=0x7efde4da9b00, reader=<optimized out>, count=32768, countRead=0x7efe095b566c) at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/nsHttpTransaction.cpp:770
#4 0x00007efe1de5c954 in mozilla::net::Http3Stream::ReadSegments(mozilla::net::nsAHttpSegmentReader*) (this=this@entry=0x7efde4a8ef50, reader=<optimized out>)
at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/Http3Stream.cpp:417
#5 0x00007efe1de5c67c in mozilla::net::Http3Session::SendData(nsIUDPSocket*) (this=0x7efdf5df7700, socket=0x7efde4c4fea8)
at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/Http3Session.cpp:1079
#6 0x00007efe1dea6440 in mozilla::net::HttpConnectionUDP::SendData() (this=0x7efdf9775cc0) at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/HttpConnectionUDP.cpp:567
#7 0x00007efe1debb581 in mozilla::net::HttpConnectionUDP::ResumeSend()::$_39::operator()() const (this=<optimized out>)
at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/HttpConnectionUDP.cpp:399
#8 mozilla::detail::RunnableFunction<mozilla::net::HttpConnectionUDP::ResumeSend()::$_39>::Run() (this=<optimized out>)
at /home/emilio/src/moz/gecko-2/obj-debug/dist/include/nsThreadUtils.h:531
#9 0x00007efe1d958c3a in nsThread::ProcessNextEvent(bool, bool*) (this=0x7efe2a4abd40, aMayWait=false, aResult=0x7efe095b5867)
at /home/emilio/src/moz/gecko-2/xpcom/threads/nsThread.cpp:1189
#10 0x00007efe1d95cc3a in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7efe2a80c560 <_IO_stdfile_2_lock>, aMayWait=true)
at /home/emilio/src/moz/gecko-2/xpcom/threads/nsThreadUtils.cpp:467
#11 0x00007efe1db14636 in mozilla::net::nsSocketTransportService::Run() (this=0x7efe2a41e280) at /home/emilio/src/moz/gecko-2/netwerk/base/nsSocketTransportService2.cpp:1190
#12 0x00007efe1db14e91 in non-virtual thunk to mozilla::net::nsSocketTransportService::Run() () at /home/emilio/src/moz/gecko-2/netwerk/base/nsSocketTransportService2.cpp:1269
#13 0x00007efe1d958c3a in nsThread::ProcessNextEvent(bool, bool*) (this=0x7efe2a4abd40, aMayWait=false, aResult=0x7efe095b5a97)
at /home/emilio/src/moz/gecko-2/xpcom/threads/nsThread.cpp:1189
#14 0x00007efe1d95cc3a in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7efe2a80c560 <_IO_stdfile_2_lock>, aThread@entry=0x7efe2a4abd40, aMayWait=true)
at /home/emilio/src/moz/gecko-2/xpcom/threads/nsThreadUtils.cpp:467
#15 0x00007efe1e0d9767 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) (this=0x7efe0a47b900, aDelegate=0x7efe095b5bb0)
at /home/emilio/src/moz/gecko-2/ipc/glue/MessagePump.cpp:330
#16 0x00007efe1e04d6bb in MessageLoop::RunInternal()
Which seems to possibly explain this. I uploaded a trace reproducing this to pernosco.
Comment 3•3 years ago
|
||
Here is a profile of reproducing the bug, showing both what happens in the Socket Thread and what happens right before in the main thread: https://share.firefox.dev/3filaqn
Assignee | ||
Comment 4•3 years ago
|
||
Pernosco trace of:
MOZ_LOG="timestamp,sync,rotate:200,glean::*:5,fog::*:5,fog_control::*:5,glean_core::*:5,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5,cookie:5" mach run --debugger=rr
Assignee | ||
Comment 5•3 years ago
|
||
So from that trace, we have a buffer that looks like:
POST /submit/firefox-desktop/metrics/1/2b510b84-3bc9-43c0-9416-8922021098a8 HTTP/1.1\r\nHost: stage.ingestion-edge.nonprod.dataops.mozgcp.net\r\nUser-Agent: Glean/42.3.0 (Rust on Linux)\r\nAccept: */*\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate, br\r\ncontent-type: application/json; charset=utf-8\r\ndate: Thu, 13 Jan 2022 11:49:28 GMT\r\nx-debug-id: testing\r\ncontent-encoding: gzip\r\ncontent-length: 453\r\nx-client-type: Glean\r\nx-client-version: 42.3.0\r\nConnection: keep-alive\r\nSec-Fetch-Dest: empty\r\nSec-Fetch-Mode: no-cors\r\nSec-Fetch-Site: none\r\nPragma: no-cache\r\nCache-Control: no-cache\r\n\r\n\037\213\b...
We read the headers (up until no-cache\r\n\r\n
), and call FindRequestContentLength
here.
However that function only looks at case-sensitive Content-Length:
, but the header is in lowercase in the buffer, so we don't compute the content-length and leave it as zero. So we get here and get to the wrong branch, and fail to send the body and consume the rest of the buffer, so we loop indefinitely.
Assignee | ||
Comment 6•3 years ago
|
||
Dragana does ^ look like a reasonable explanation to you? There are multiple fixes that come to mind here. The obvious one is fixing the content-length
parsing, but if the header isn't there we shouldn't probably fail so catastrophically.
Comment 7•3 years ago
|
||
Dragana confirmed verbally that this is probably it. We're going to run a try build that confirms it.
Assignee | ||
Comment 8•3 years ago
|
||
HTTP headers are case insensitive.
Updated•3 years ago
|
Assignee | ||
Comment 9•3 years ago
|
||
This is an error case but will avoid the infinite loop.
Depends on D135871
Comment 10•3 years ago
•
|
||
Here is the short list of things that I think we should consider doing:
- Take emilio's patch.
- Assert that content-length is present and in the expected casing, probably only in debug builds, because that might be a little too expensive to ship.
- Run a try build to see if those assertions flush out any other cases where this is being set directly. More investigation might be required there.
- Along with this assertion, set
rv
to something other than NS_OK. Having this function return OK when it was not is how we got into the infinite loop.
(Updated to correct some mistakes)
Comment 11•3 years ago
|
||
Updated•3 years ago
|
Updated•3 years ago
|
Comment 12•3 years ago
|
||
I'm just a random kibbitzer, so my apologies if this is off-base, but... isn't an even more fundamental problem here that the code is doing a naive string-based search in the first place? For example, I believe this is a valid HTTP header block that could be passed into this code:
GET / HTTP/1.1\r\nHost: example.com\r\nCookie: foo="Content-Length: 100"\r\n\r\n
(In particular, GET
requests normally don't have a content-length header at all, since the default if none is present is to assume an empty body.)
Wouldn't this cause the code to compute the wrong body length and break things?
Updated•3 years ago
|
Assignee | ||
Comment 13•3 years ago
|
||
(In reply to Nathaniel Smith from comment #12)
I'm just a random kibbitzer, so my apologies if this is off-base, but... isn't an even more fundamental problem here that the code is doing a naive string-based search in the first place?
Yes, that's definitely not off-base, the code seems a bit sketchy to begin with (disclaimer: note that I don't work on the network stack, I was helping out).
That said, note that there was a combination of two bugs (my patches are one for each). The first one was the trivial mistake getting the content length, the other one was the fact that failing to get the content-length caused a hang.
[...]
Wouldn't this cause the code to compute the wrong body length and break things?
But yeah, for such (edge) case the code would be incorrect, but after the second patch at most it will potentially just not send the whole body, not hang. H/2 code seems to use a parsed representation of the content-length for example (link). Anyways, will file a dependent bug for such follow-up work.
Comment 14•3 years ago
|
||
Assignee | ||
Comment 15•3 years ago
|
||
Comment on attachment 9258904 [details]
Bug 1749957 - Fix parsing of content-length http3 header. r=dragana
Beta/Release Uplift Approval Request
- User impact if declined: see blocked bug. It seems this codepath with the wrongly-cased header could only be triggered via viaduct, so I'm trying to figure out how to get some tests for this written potentially...
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce: not sure which STR would work without recompiling the code as in comment 0 to use an http3-enabled stream.
- List of other uplifts needed: none
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): One-liner to be more forgiving and catch lowercase constants. I'll request uplift of the other patch (which is the thing that actually fixes the hang) once I figure out how to write tests for it (since we didn't have any)
- String changes made/needed: none
Assignee | ||
Comment 16•3 years ago
|
||
Comment on attachment 9258904 [details]
Bug 1749957 - Fix parsing of content-length http3 header. r=dragana
Beta/Release Uplift Approval Request
- User impact if declined: see above
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce: see above
- List of other uplifts needed: none
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): see above
- String changes made/needed:
Comment 17•3 years ago
|
||
Comment on attachment 9258905 [details]
Bug 1749957 - Consume the buffer in ReadSegments if we're not going to keep reading. r=dragana
Revision D135872 was moved to bug 1750056. Setting attachment 9258905 [details] to obsolete.
Comment 18•3 years ago
|
||
Comment on attachment 9258904 [details]
Bug 1749957 - Fix parsing of content-length http3 header. r=dragana
Approved for 97.0b3.
Comment 19•3 years ago
|
||
bugherder uplift |
Comment 20•3 years ago
|
||
Comment on attachment 9258904 [details]
Bug 1749957 - Fix parsing of content-length http3 header. r=dragana
Approved for 96.0.1
Comment 21•3 years ago
|
||
bugherder uplift |
Updated•3 years ago
|
Comment 22•3 years ago
|
||
bugherder |
![]() |
||
Comment 23•3 years ago
|
||
[Tracking Requested - why for this release]: This bug affects 91.5.0esr.
Updated•3 years ago
|
Comment 25•3 years ago
|
||
I don't think we should have an ESR for release for this issue as we fixed the http3 configuration problem on our servers as a short term mitigation, but it would probably be good to have a client side fix client side for ESR 91.6.0. Emilio could you provide an ESR patch? Thanks
Assignee | ||
Comment 26•3 years ago
|
||
Is H3 enabled on ESR? Anyways the patch should apply cleanly: https://searchfox.org/mozilla-esr91/rev/6a7128db0af76625c03db5868085b3844ee96e61/netwerk/protocol/http/Http3Stream.cpp#64
Comment 27•3 years ago
|
||
Assignee | ||
Comment 28•3 years ago
|
||
Comment on attachment 9258904 [details]
Bug 1749957 - Fix parsing of content-length http3 header. r=dragana
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: see above
- User impact if declined: see blocked bug
- Fix Landed on Version: 97
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): one-liner to mitigate viaduct+http3 issue while we work on more in-depth fixes in dependent bugs.
Comment 29•3 years ago
|
||
Comment on attachment 9258904 [details]
Bug 1749957 - Fix parsing of content-length http3 header. r=dragana
Approved for 91.6esr.
Comment 30•3 years ago
|
||
bugherder uplift |
Updated•3 years ago
|
Description
•