Closed Bug 1749957 Opened 3 years ago Closed 3 years ago

`content-length` is not properly parsed when using viaduct over HTTP3

Categories

(Core :: Networking, defect)

defect

Tracking

()

RESOLVED FIXED
98 Branch
Tracking Status
relnote-firefox --- 96+
firefox-esr91 97+ fixed
firefox96 + fixed
firefox97 + fixed
firefox98 + fixed

People

(Reporter: janerik, Assigned: emilio)

References

Details

Attachments

(2 files, 2 obsolete files)

STR:

  1. Apply the attached patch to switch FOG to the staging endpoint and ensure it's used in developer builds.
  2. Build
  3. Launch with logging: MOZ_LOG="timestamp,sync,glean::*:5,fog::*:5,fog_control::*:5,glean_core::*:5" mach run
  4. Go to about:glean, send a ping with tag: testing, named ping: metrics. Click "submit settings".
  5. Observe the logs. If it has the following line, it failed somewhere:
2022-01-13 11:23:36.881788 UTC - [Parent 55644: Unnamed thread 13861da00]: W/glean_core::* [glean_core::upload] Unrecoverable upload failure while attempting to send ping b2d8576e-b2f5-4a07-a755-f08b11504da1. Error was UnrecoverableFailure

After this is logged no new websites seem to load anymore.

When trying to repro this on a debug build I always get:

Assertion failure: false (We are done sending this request!), at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/Http3Stream.cpp:273
#01: ???[/home/emilio/src/moz/gecko-2/obj-debug/dist/bin/libxul.so +0x528a509]

Stack is:

(rr) bt
#0  mozilla::net::Http3Stream::OnReadSegment(char const*, unsigned int, unsigned int*) (this=0x7efde4a8ef50, buf=0x7efde556325a "\037\213\b", count=427, countRead=0x7efe095b55bc)
    at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/Http3Stream.cpp:273
#1  0x00007efe1df4120d in mozilla::net::nsHttpTransaction::ReadRequestSegment(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*)
    (stream=<optimized out>, closure=closure@entry=0x7efde4da9b00, buf=0x0, offset=<optimized out>, count=713073987, countRead=countRead@entry=0x7efe095b55bc)
    at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/nsHttpTransaction.cpp:735
#2  0x00007efe1dac17b8 in nsBufferedInputStream::ReadSegments(nsresult (*)(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*)
    (this=0x7efde950fe20, writer=0x7efe1df411ea <mozilla::net::nsHttpTransaction::ReadRequestSegment(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*)>, closure=0x7efde4da9b00, count=32166, result=0x7efe095b566c) at /home/emilio/src/moz/gecko-2/netwerk/base/nsBufferedStreams.cpp:433
#3  0x00007efe1df413bf in mozilla::net::nsHttpTransaction::ReadSegments(mozilla::net::nsAHttpSegmentReader*, unsigned int, unsigned int*)
    (this=0x7efde4da9b00, reader=<optimized out>, count=32768, countRead=0x7efe095b566c) at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/nsHttpTransaction.cpp:770
#4  0x00007efe1de5c954 in mozilla::net::Http3Stream::ReadSegments(mozilla::net::nsAHttpSegmentReader*) (this=this@entry=0x7efde4a8ef50, reader=<optimized out>)
    at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/Http3Stream.cpp:417
#5  0x00007efe1de5c67c in mozilla::net::Http3Session::SendData(nsIUDPSocket*) (this=0x7efdf5df7700, socket=0x7efde4c4fea8)
    at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/Http3Session.cpp:1079
#6  0x00007efe1dea6440 in mozilla::net::HttpConnectionUDP::SendData() (this=0x7efdf9775cc0) at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/HttpConnectionUDP.cpp:567
#7  0x00007efe1debb581 in mozilla::net::HttpConnectionUDP::ResumeSend()::$_39::operator()() const (this=<optimized out>)
    at /home/emilio/src/moz/gecko-2/netwerk/protocol/http/HttpConnectionUDP.cpp:399
#8  mozilla::detail::RunnableFunction<mozilla::net::HttpConnectionUDP::ResumeSend()::$_39>::Run() (this=<optimized out>)
    at /home/emilio/src/moz/gecko-2/obj-debug/dist/include/nsThreadUtils.h:531
#9  0x00007efe1d958c3a in nsThread::ProcessNextEvent(bool, bool*) (this=0x7efe2a4abd40, aMayWait=false, aResult=0x7efe095b5867)
    at /home/emilio/src/moz/gecko-2/xpcom/threads/nsThread.cpp:1189
#10 0x00007efe1d95cc3a in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7efe2a80c560 <_IO_stdfile_2_lock>, aMayWait=true)
    at /home/emilio/src/moz/gecko-2/xpcom/threads/nsThreadUtils.cpp:467
#11 0x00007efe1db14636 in mozilla::net::nsSocketTransportService::Run() (this=0x7efe2a41e280) at /home/emilio/src/moz/gecko-2/netwerk/base/nsSocketTransportService2.cpp:1190
#12 0x00007efe1db14e91 in non-virtual thunk to mozilla::net::nsSocketTransportService::Run() () at /home/emilio/src/moz/gecko-2/netwerk/base/nsSocketTransportService2.cpp:1269
#13 0x00007efe1d958c3a in nsThread::ProcessNextEvent(bool, bool*) (this=0x7efe2a4abd40, aMayWait=false, aResult=0x7efe095b5a97)
    at /home/emilio/src/moz/gecko-2/xpcom/threads/nsThread.cpp:1189
#14 0x00007efe1d95cc3a in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7efe2a80c560 <_IO_stdfile_2_lock>, aThread@entry=0x7efe2a4abd40, aMayWait=true)
    at /home/emilio/src/moz/gecko-2/xpcom/threads/nsThreadUtils.cpp:467
#15 0x00007efe1e0d9767 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) (this=0x7efe0a47b900, aDelegate=0x7efe095b5bb0)
    at /home/emilio/src/moz/gecko-2/ipc/glue/MessagePump.cpp:330
#16 0x00007efe1e04d6bb in MessageLoop::RunInternal()

Which seems to possibly explain this. I uploaded a trace reproducing this to pernosco.

Here is a profile of reproducing the bug, showing both what happens in the Socket Thread and what happens right before in the main thread: https://share.firefox.dev/3filaqn

Pernosco trace of:

MOZ_LOG="timestamp,sync,rotate:200,glean::*:5,fog::*:5,fog_control::*:5,glean_core::*:5,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5,cookie:5" mach run --debugger=rr

https://pernos.co/debug/0rKdf4F6AWfwoOncTAm57w/index.html

So from that trace, we have a buffer that looks like:

POST /submit/firefox-desktop/metrics/1/2b510b84-3bc9-43c0-9416-8922021098a8 HTTP/1.1\r\nHost: stage.ingestion-edge.nonprod.dataops.mozgcp.net\r\nUser-Agent: Glean/42.3.0 (Rust on Linux)\r\nAccept: */*\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate, br\r\ncontent-type: application/json; charset=utf-8\r\ndate: Thu, 13 Jan 2022 11:49:28 GMT\r\nx-debug-id: testing\r\ncontent-encoding: gzip\r\ncontent-length: 453\r\nx-client-type: Glean\r\nx-client-version: 42.3.0\r\nConnection: keep-alive\r\nSec-Fetch-Dest: empty\r\nSec-Fetch-Mode: no-cors\r\nSec-Fetch-Site: none\r\nPragma: no-cache\r\nCache-Control: no-cache\r\n\r\n\037\213\b...

We read the headers (up until no-cache\r\n\r\n), and call FindRequestContentLength here.

However that function only looks at case-sensitive Content-Length:, but the header is in lowercase in the buffer, so we don't compute the content-length and leave it as zero. So we get here and get to the wrong branch, and fail to send the body and consume the rest of the buffer, so we loop indefinitely.

Dragana does ^ look like a reasonable explanation to you? There are multiple fixes that come to mind here. The obvious one is fixing the content-length parsing, but if the header isn't there we shouldn't probably fail so catastrophically.

Flags: needinfo?(dd.mozilla)

Dragana confirmed verbally that this is probably it. We're going to run a try build that confirms it.

HTTP headers are case insensitive.

Assignee: nobody → emilio
Status: NEW → ASSIGNED

This is an error case but will avoid the infinite loop.

Depends on D135871

Here is the short list of things that I think we should consider doing:

  1. Take emilio's patch.
  2. Assert that content-length is present and in the expected casing, probably only in debug builds, because that might be a little too expensive to ship.
  3. Run a try build to see if those assertions flush out any other cases where this is being set directly. More investigation might be required there.
  4. Along with this assertion, set rv to something other than NS_OK. Having this function return OK when it was not is how we got into the infinite loop.

(Updated to correct some mistakes)

Attachment #9258908 - Attachment is obsolete: true

I'm just a random kibbitzer, so my apologies if this is off-base, but... isn't an even more fundamental problem here that the code is doing a naive string-based search in the first place? For example, I believe this is a valid HTTP header block that could be passed into this code:

GET / HTTP/1.1\r\nHost: example.com\r\nCookie: foo="Content-Length: 100"\r\n\r\n

(In particular, GET requests normally don't have a content-length header at all, since the default if none is present is to assume an empty body.)

Wouldn't this cause the code to compute the wrong body length and break things?

Summary: Glean viaduct requests failing when going over HTTP3 → `content-length` is not properly parsed when using viaduct over HTTP3

(In reply to Nathaniel Smith from comment #12)

I'm just a random kibbitzer, so my apologies if this is off-base, but... isn't an even more fundamental problem here that the code is doing a naive string-based search in the first place?

Yes, that's definitely not off-base, the code seems a bit sketchy to begin with (disclaimer: note that I don't work on the network stack, I was helping out).

That said, note that there was a combination of two bugs (my patches are one for each). The first one was the trivial mistake getting the content length, the other one was the fact that failing to get the content-length caused a hang.

[...]
Wouldn't this cause the code to compute the wrong body length and break things?

But yeah, for such (edge) case the code would be incorrect, but after the second patch at most it will potentially just not send the whole body, not hang. H/2 code seems to use a parsed representation of the content-length for example (link). Anyways, will file a dependent bug for such follow-up work.

Blocks: 1750043
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/48c0e854be62
Fix parsing of content-length http3 header. r=dragana,necko-reviewers,kershaw

Comment on attachment 9258904 [details]
Bug 1749957 - Fix parsing of content-length http3 header. r=dragana

Beta/Release Uplift Approval Request

  • User impact if declined: see blocked bug. It seems this codepath with the wrongly-cased header could only be triggered via viaduct, so I'm trying to figure out how to get some tests for this written potentially...
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce: not sure which STR would work without recompiling the code as in comment 0 to use an http3-enabled stream.
  • List of other uplifts needed: none
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): One-liner to be more forgiving and catch lowercase constants. I'll request uplift of the other patch (which is the thing that actually fixes the hang) once I figure out how to write tests for it (since we didn't have any)
  • String changes made/needed: none
Attachment #9258904 - Flags: approval-mozilla-release?

Comment on attachment 9258904 [details]
Bug 1749957 - Fix parsing of content-length http3 header. r=dragana

Beta/Release Uplift Approval Request

  • User impact if declined: see above
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce: see above
  • List of other uplifts needed: none
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): see above
  • String changes made/needed:
Attachment #9258904 - Flags: approval-mozilla-beta?
Blocks: 1750056

Comment on attachment 9258905 [details]
Bug 1749957 - Consume the buffer in ReadSegments if we're not going to keep reading. r=dragana

Revision D135872 was moved to bug 1750056. Setting attachment 9258905 [details] to obsolete.

Attachment #9258905 - Attachment is obsolete: true

Comment on attachment 9258904 [details]
Bug 1749957 - Fix parsing of content-length http3 header. r=dragana

Approved for 97.0b3.

Attachment #9258904 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9258904 [details]
Bug 1749957 - Fix parsing of content-length http3 header. r=dragana

Approved for 96.0.1

Attachment #9258904 - Flags: approval-mozilla-release? → approval-mozilla-release+
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch

[Tracking Requested - why for this release]: This bug affects 91.5.0esr.

Flags: needinfo?(dd.mozilla)

Needinfo for ESR uplift.

Flags: needinfo?(pascalc)

I don't think we should have an ESR for release for this issue as we fixed the http3 configuration problem on our servers as a short term mitigation, but it would probably be good to have a client side fix client side for ESR 91.6.0. Emilio could you provide an ESR patch? Thanks

Flags: needinfo?(pascalc) → needinfo?(emilio)
Flags: needinfo?(emilio)

Comment on attachment 9258904 [details]
Bug 1749957 - Fix parsing of content-length http3 header. r=dragana

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: see above
  • User impact if declined: see blocked bug
  • Fix Landed on Version: 97
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): one-liner to mitigate viaduct+http3 issue while we work on more in-depth fixes in dependent bugs.
Attachment #9258904 - Flags: approval-mozilla-esr91?

Comment on attachment 9258904 [details]
Bug 1749957 - Fix parsing of content-length http3 header. r=dragana

Approved for 91.6esr.

Attachment #9258904 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+
No longer blocks: 1750043
See Also: → 1750587
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: