1874840 - xhr with range header gets unreliable Content-Length (and different from other browsers) for 'text/plain' files

Reporter

Description

•

4 months ago

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36

Steps to reproduce:

Go to the browser console, run:

var xhr = new XMLHttpRequest();
// The third argument (false) makes the request synchronous.
xhr.open("HEAD", "https://raw.githubusercontent.com/duckdb/duckdb_spatial/main/test/data/nyc_taxi/taxi_zones/taxi_zones.prj", false);
xhr.setRequestHeader('Range', 'bytes=0-');
xhr.onload = () => { console.log(xhr.getResponseHeader('Content-Length')); };
xhr.send(null);

This performs an XHR request (synchronous, given the false argument to open()) while providing the Range header.

Trying the same for a different

Alternatively I set up this test website:
https://carlopi.github.io/content-length-test/
that performs the same computations on a bunch of combinations (HEAD/GET, ranges or not).
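
For readers who cannot open the page, the kind of matrix it runs could be sketched roughly as follows. This is not the page's actual source: `buildProbeRequests`, `probeContentLength`, and the injectable `fetchImpl` parameter are hypothetical names, and the sketch uses `fetch` rather than XHR for brevity.

```javascript
// Hypothetical helper: enumerate the method / Range combinations to probe.
function buildProbeRequests() {
  const requests = [];
  for (const method of ["HEAD", "GET"]) {
    for (const withRange of [true, false]) {
      requests.push({
        method,
        label: `${method}${withRange ? " + RANGE" : ""}`,
        // "bytes=0-" asks for the whole file but is still a range request.
        headers: withRange ? { Range: "bytes=0-" } : {},
      });
    }
  }
  return requests;
}

// Runs each probe and collects the Content-Length the browser reports.
// fetchImpl is injectable so the function can be exercised without a network.
async function probeContentLength(url, fetchImpl = fetch) {
  const rows = [];
  for (const { method, label, headers } of buildProbeRequests()) {
    const response = await fetchImpl(url, { method, headers });
    rows.push(`${response.headers.get("Content-Length")} using ${label}`);
  }
  return rows;
}
```

Calling `probeContentLength(url)` from the console of a page that is allowed to fetch `url` should produce lines in the same shape as the ones quoted under "Actual results".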

Originally reported here: https://github.com/duckdb/duckdb-wasm/issues/1580

Actual results:

For the console test:
Firefox returns 347 (the size of the compressed file), while Chromium and Safari return 562 (the size of the file after decompression).
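
To illustrate why the two numbers can differ: a server that applies gzip reports the length of the compressed body, while an identity response reports the raw file length. A minimal Node.js sketch with made-up content (not the actual taxi_zones.prj file; `contentLengthFor` is a hypothetical helper):

```javascript
const zlib = require("zlib");

// Hypothetical stand-in for a small, repetitive text file.
const body = Buffer.from('PROJCS["example"],'.repeat(40), "utf8");

// Returns the Content-Length a server would send for the given content coding.
function contentLengthFor(buffer, encoding) {
  if (encoding === "gzip") {
    return zlib.gzipSync(buffer).length; // length of the compressed bytes
  }
  return buffer.length; // identity: length of the raw bytes
}

console.log("identity:", contentLengthFor(body, "identity"));
console.log("gzip:    ", contentLengthFor(body, "gzip"));
```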

Using the test website, on Firefox the first few lines look like:
347 using HEAD + RANGE
347 using GET + RANGE
562 using GET's arraybuffer's byteLength

while on Chrome/Safari they are:
562 using GET's arraybuffer's byteLength
562 using HEAD + RANGE
562 using GET + RANGE

Expected results:

I would expect the results to match between browsers and, in particular, HEAD requests performed with a Range header attached to return the actual (decompressed) file length.

piovesan.carlo

Reporter

Comment 1

•

4 months ago

A user of duckdb-wasm reported that it works for them ("I'm running Firefox 115 ESR on my Mac"), so this might be a regression, but I haven't reproduced that.

BugBot [:suhaib / :marco/ :calixte]

Comment 2

•

4 months ago

The Bugbug bot thinks this bug should belong to the 'Core::DOM: Networking' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → DOM: Networking

Product: Firefox → Core

Sunil Mayya

Updated

•

4 months ago

Flags: needinfo?(smayya)

Masatoshi Kimura [:emk]

Comment 3

•

4 months ago

Firefox requests gzip Content-Encoding (via Accept-Encoding request header) while Chrome uses identity encoding. Chrome is right according to the spec.
https://fetch.spec.whatwg.org/#http-network-or-cache-fetch (Step 8.19.)

If httpRequest’s header list contains Range, then append (Accept-Encoding, identity) to httpRequest’s header list.
[Note] This avoids a failure when handling content codings with a part of an encoded response.
Additionally, many servers mistakenly ignore Range headers if a non-identity encoding is accepted.

Apparently we fail to handle the case where XHR or fetch adds a Range request header.

Masatoshi Kimura [:emk]

Comment 4

•

4 months ago

By the way, I got this result with the reporter's testcase on Chrome.
562 using HEAD + RANGE
562 using GET's arraybuffer's byteLength
347 using GET + RANGE
That is, Chrome did not handle the GET + RANGE case correctly. I don't know why I got a different result from the reporter's.

piovesan.carlo

Reporter

Comment 5

•

4 months ago

(In reply to Masatoshi Kimura [:emk] from comment #4)

By the way, I got this result with the reporter's testcase on Chrome.
562 using HEAD + RANGE
562 using GET's arraybuffer's byteLength
347 using GET + RANGE
That is, Chrome did not handle the GET + RANGE case correctly. I don't know why I got a different result from the reporter's.

Surprisingly, Chrome also behaves oddly here: with the cache disabled or cleared, the results are:
562 using HEAD + RANGE
562 using GET's arraybuffer's byteLength
562 using GET + RANGE

while once the cache kicks in they are (as you posted):
562 using HEAD + RANGE
562 using GET's arraybuffer's byteLength
347 using GET + RANGE

Safari appears to behave sensibly.

Sunil Mayya

Comment 6

•

3 months ago

•

Edited

(In reply to Masatoshi Kimura [:emk] from comment #3)

Firefox requests gzip Content-Encoding (via Accept-Encoding request header) while Chrome uses identity encoding. Chrome is right according to the spec.
https://fetch.spec.whatwg.org/#http-network-or-cache-fetch (Step 8.19.)

If httpRequest’s header list contains Range, then append (Accept-Encoding, identity) to httpRequest’s header list.
[Note] This avoids a failure when handling content codings with a part of an encoded response.
Additionally, many servers mistakenly ignore Range headers if a non-identity encoding is accepted.

Apparently we fail to handle the case where XHR or fetch adds a Range request header.

I see that we are sending the identity value along with gzip.
That is, when a Range header is present, we do append the identity value.

The spec says to "append" the value to the header list, hence I think we are behaving as per the spec?

We will discuss this internally during our bug review meeting to decide on a further course of action.
The fix should be straightforward: we just need to set the header instead of merging.

However, I see that Chrome and Safari both send only identity in the Accept-Encoding request header.
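
The difference between merging and setting can be sketched as follows. This is an illustration in JavaScript, not Necko code; the helper names and the default Accept-Encoding value are assumptions made for the example.

```javascript
// Merge: keep the existing value and add the new one, comma-separated.
function mergeHeader(headers, name, value) {
  const key = name.toLowerCase();
  const existing = headers.get(key);
  headers.set(key, existing ? `${existing}, ${value}` : value);
}

// Set: replace any existing value outright.
function setHeader(headers, name, value) {
  headers.set(name.toLowerCase(), value);
}

// Applies the Range rule with the given strategy and returns Accept-Encoding.
function acceptEncodingForRangeRequest(applyHeader) {
  const headers = new Map([
    ["accept-encoding", "gzip, deflate, br"], // assumed browser default
    ["range", "bytes=0-"],
  ]);
  if (headers.has("range")) {
    applyHeader(headers, "Accept-Encoding", "identity");
  }
  return headers.get("accept-encoding");
}

console.log(acceptEncodingForRangeRequest(mergeHeader)); // gzip, deflate, br, identity
console.log(acceptEncodingForRangeRequest(setHeader));   // identity
```

With the merged value a server is still free to choose gzip, which would be consistent with the compressed length seen in the description; with the set value only identity is on offer.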

Blocks: fetch, xhr

Severity: -- → S3

Flags: needinfo?(smayya)

Priority: -- → P2

Whiteboard: [necko-triaged][necko-priority-new]

Masatoshi Kimura [:emk]

Comment 7

•

3 months ago

(In reply to Sunil Mayya from comment #6)

The spec says to "append" the value to the header list, hence I think we are behaving as per the spec?

IMO it is a spec bug because the spec has the following note right after the text:

This avoids a failure when handling content codings with a part of an encoded response.
Additionally, many servers mistakenly ignore Range headers if a non-identity encoding is accepted.

Yes, I know this is a non-normative note, but the current spec text does not resolve the problem that the note is concerned with.

Masatoshi Kimura [:emk]

Comment 8

•

3 months ago

Also I fail to understand the definition of "append" in the spec:

To append a header (name, value) to a header list list:

1. If list contains name, then set name to the first such header’s name.
   Note: This reuses the casing of the name of the header already in list, if any. If there are multiple matched headers their names will all be identical.

2. Append (name, value) to list.

If list contains name, then name is already a byte-case-insensitive match for that header’s name, so step 1 effectively looks like a no-op to me.
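
One possible reading of the quoted algorithm, sketched in JavaScript with a header list modelled as an array of [name, value] pairs (`appendHeader` is a hypothetical name, and `toLowerCase` stands in for byte-case-insensitive matching). Under this reading step 1 affects only the casing of the new entry's name, and the existing gzip entry stays in the list either way.

```javascript
// A header list modelled as an array of [name, value] pairs (illustrative only).
function appendHeader(list, name, value) {
  // Step 1: if the list already contains the name (case-insensitively),
  // reuse the casing of the first matching entry.
  const match = list.find(([n]) => n.toLowerCase() === name.toLowerCase());
  if (match) {
    name = match[0];
  }
  // Step 2: append the pair; existing entries are left untouched.
  list.push([name, value]);
  return list;
}

const list = [["accept-encoding", "gzip"]];
appendHeader(list, "Accept-Encoding", "identity");
console.log(list); // two entries, both named "accept-encoding"
```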

Randell Jesup [:jesup] (needinfo me)

Updated

•

3 months ago

Whiteboard: [necko-triaged][necko-priority-new] → [necko-triaged][necko-priority-next]

Kershaw Chang [:kershaw]

Updated

•

3 days ago

Points: --- → 3

Kershaw Chang [:kershaw]

Updated

•

3 days ago

Categories

(Core :: DOM: Networking, defect, P2)

Tracking

()

People

(Reporter: piovesan.carlo, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [necko-triaged][necko-priority-next])

Crash Data

Security

(public)
