Open Bug 1874840 Opened 4 months ago Updated 3 days ago

xhr with range header gets unreliable Content-Length (and different from other browsers) for 'text/plain' files

Categories

(Core :: DOM: Networking, defect, P2)

Firefox 121
defect
Points:
3

Tracking

()

UNCONFIRMED

People

(Reporter: piovesan.carlo, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [necko-triaged][necko-priority-next])

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36

Steps to reproduce:

Go to the browser console, run:

var xhr = new XMLHttpRequest(); xhr.open("HEAD", "https://raw.githubusercontent.com/duckdb/duckdb_spatial/main/test/data/nyc_taxi/taxi_zones/taxi_zones.prj", false); xhr.setRequestHeader('Range', 'bytes=0-'); xhr.onload = () => { console.log(xhr.getResponseHeader('Content-Length')); }; xhr.send(null);

This performs an XHR request with the Range header set (the third argument to open(), false, makes the request synchronous).

Trying the same for a different

Alternatively I set up this test website:
https://carlopi.github.io/content-length-test/
that performs the same computations on a bunch of combinations (HEAD/GET, ranges or not).
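
As a rough illustration of what such a comparison could look like (this is not the code of the test website, which uses its own implementation), here is a minimal sketch using fetch. The helper name probeContentLength and the injectable fetchImpl parameter are assumptions made for illustration only.

```javascript
// Hypothetical helper: collects Content-Length for HEAD/GET, with and
// without a Range header, plus the actual byte length of the GET body.
// `fetchImpl` is injectable so the sketch can be exercised without a network.
async function probeContentLength(url, fetchImpl = fetch) {
  const results = {};
  for (const method of ["HEAD", "GET"]) {
    for (const withRange of [false, true]) {
      const headers = withRange ? { Range: "bytes=0-" } : {};
      const response = await fetchImpl(url, { method, headers });
      const label = withRange ? `${method} + RANGE` : method;
      // Content-Length as reported by the browser (a string, or null if absent).
      results[label] = response.headers.get("Content-Length");
    }
  }
  // Size of the decoded body, independent of any Content-Length header.
  const body = await (await fetchImpl(url)).arrayBuffer();
  results["GET's arraybuffer's byteLength"] = body.byteLength;
  return results;
}

// Usage in a browser console (URL taken from the steps above):
// probeContentLength("https://raw.githubusercontent.com/duckdb/duckdb_spatial/main/test/data/nyc_taxi/taxi_zones/taxi_zones.prj").then(console.table);
```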

Originally reported here: https://github.com/duckdb/duckdb-wasm/issues/1580

Actual results:

For the console test:
Firefox returns 347 (the size of the compressed file) while Chromium/Safari return 562 (the size of the file after decompression).

Using the test website, on Firefox the first few lines will be like:
347 using HEAD + RANGE
347 using GET + RANGE
562 using GET's arraybuffer's byteLength

while on Chrome/Safari they will be:
562 using GET's arraybuffer's byteLength
562 using HEAD + RANGE
562 using GET + RANGE

Expected results:

I would expect the results to match between browsers, and in particular that HEAD requests performed with a Range header attached return the actual file length (when decompressed).

A user of duckdb-wasm reported that it works for them ("I'm running Firefox 115 ESR on my Mac"), so this might be a regression, but I haven't reproduced that.

The Bugbug bot thinks this bug should belong to the 'Core::DOM: Networking' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → DOM: Networking
Product: Firefox → Core
Flags: needinfo?(smayya)

Firefox requests gzip Content-Encoding (via Accept-Encoding request header) while Chrome uses identity encoding. Chrome is right according to the spec.
https://fetch.spec.whatwg.org/#http-network-or-cache-fetch (Step 8.19.)

If httpRequest’s header list contains Range, then append (Accept-Encoding, identity) to httpRequest’s header list.
[Note] This avoids a failure when handling content codings with a part of an encoded response.
Additionally, many servers mistakenly ignore Range headers if a non-identity encoding is accepted.

Apparently we fail to handle the case where XHR or fetch adds a Range request header.

By the way, I got this result with the reporter's testcase on Chrome.
562 using HEAD + RANGE
562 using GET's arraybuffer's byteLength
347 using GET + RANGE
That is, Chrome did not handle the GET + RANGE case correctly. I don't know why I got a different result from the reporter's.

Surprisingly, Chrome also behaves oddly here: with the cache disabled or cleared, the results are:
562 using HEAD + RANGE
562 using GET's arraybuffer's byteLength
562 using GET + RANGE

while once the cache kicks in they are (as you posted):
562 using HEAD + RANGE
562 using GET's arraybuffer's byteLength
347 using GET + RANGE

Safari appears to behave sensibly.

(In reply to Masatoshi Kimura [:emk] from comment #3)

Firefox requests gzip Content-Encoding (via Accept-Encoding request header) while Chrome uses identity encoding. Chrome is right according to the spec.
https://fetch.spec.whatwg.org/#http-network-or-cache-fetch (Step 8.19.)

If httpRequest’s header list contains Range, then append (Accept-Encoding, identity) to httpRequest’s header list.
[Note] This avoids a failure when handling content codings with a part of an encoded response.
Additionally, many servers mistakenly ignore Range headers if a non-identity encoding is accepted.

Apparently we fail to handle the case where XHR or fetch adds a Range request header.

I see that we are sending the identity value along with gzip.
That is, when a Range header is present, we do append the identity value.

The spec mentions to "append" the value into the header list and hence I think we are behaving as per the spec?

We will discuss this internally during our bug review meeting to decide on a further course of action.
The fix should be straightforward: we just need to set the header instead of merging it.

However, I see that Chrome and Safari both send only identity in the Accept-Encoding request header.
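
To make the difference between merging and setting concrete, here is a minimal sketch. It models request headers as a plain Map and is not Necko's actual API. The function names and the default Accept-Encoding value "gzip, deflate, br" are assumptions chosen for illustration.

```javascript
// Hypothetical model of a request header map (lowercased names to values).
// Merge: keep the existing value and add the new one, comma-separated.
function mergeHeader(headers, name, value) {
  const key = name.toLowerCase();
  const existing = headers.get(key);
  headers.set(key, existing ? `${existing}, ${value}` : value);
}

// Set: replace whatever value was there before.
function setHeader(headers, name, value) {
  headers.set(name.toLowerCase(), value);
}

// When a Range header is present, add "identity" using either strategy.
function applyRangeRule(headers, { merge }) {
  if (headers.has("range")) {
    (merge ? mergeHeader : setHeader)(headers, "Accept-Encoding", "identity");
  }
  return headers;
}

// Assumed default headers for illustration only.
const defaults = () =>
  new Map([["accept-encoding", "gzip, deflate, br"], ["range", "bytes=0-"]]);

const merged = applyRangeRule(defaults(), { merge: true }).get("accept-encoding");
const replaced = applyRangeRule(defaults(), { merge: false }).get("accept-encoding");
console.log(merged);   // gzip, deflate, br, identity
console.log(replaced); // identity
```

With merging, a server is still free to pick gzip, which would explain a compressed Content-Length; with setting, only identity is offered.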

Blocks: fetch, xhr
Severity: -- → S3
Flags: needinfo?(smayya)
Priority: -- → P2
Whiteboard: [necko-triaged][necko-priority-new]

(In reply to Sunil Mayya from comment #6)

The spec mentions to "append" the value into the header list and hence I think we are behaving as per the spec?

IMO it is a spec bug because the spec has the following note right after the text:

This avoids a failure when handling content codings with a part of an encoded response.
Additionally, many servers mistakenly ignore Range headers if a non-identity encoding is accepted.

Yes, I know this is a non-normative note, but the current spec text does not resolve the problem that this note is concerned with.

Also I fail to understand the definition of "append" in the spec:

To append a header (name, value) to a header list list:

  1. If list contains name, then set name to the first such header’s name.
    Note
    This reuses the casing of the name of the header already in list, if any. If there are multiple matched headers their names will all be identical.
  2. Append (name, value) to list.

If list contains name, then name is already a byte-case-insensitive match for that header's name. So effectively step 1 looks like a no-op to me.
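
One possible reading of the two quoted steps, sketched with a header list modeled as an array of [name, value] pairs. The function name appendHeader and the use of toLowerCase() for the byte-case-insensitive comparison are simplifying assumptions, not the spec's wording.

```javascript
// Sketch of the quoted "append" steps on a list of [name, value] pairs.
function appendHeader(list, name, value) {
  const lower = name.toLowerCase();
  // Step 1: if the list already contains the name (case-insensitively),
  // reuse the casing of the first such header's name.
  const existing = list.find(([n]) => n.toLowerCase() === lower);
  if (existing) {
    name = existing[0];
  }
  // Step 2: append the pair; earlier entries are left untouched.
  list.push([name, value]);
  return list;
}

const list = [["accept-encoding", "gzip"]];
appendHeader(list, "Accept-Encoding", "identity");
console.log(JSON.stringify(list));
// [["accept-encoding","gzip"],["accept-encoding","identity"]]
```

Under this reading, step 1 only affects the casing of the appended name. Either way, the earlier gzip entry stays in the list, so appending alone does not leave identity as the only accepted encoding.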

Whiteboard: [necko-triaged][necko-priority-new] → [necko-triaged][necko-priority-next]
Points: --- → 3
See Also: → 1891719