Open Bug 1895687 Opened 25 days ago Updated 6 days ago

LF numeric character reference gets ignored in data:text/plain URL

Categories

(Core :: Networking, defect, P2)

Firefox 125
defect

Tracking Status
firefox126 --- affected
firefox127 --- affected

People

(Reporter: gtisza, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: testcase, Whiteboard: [necko-triaged])

Steps to reproduce:

Create a download link with <a download href="data:text/plain;charset=utf-8,[some text]"> where the text includes newlines encoded as LF numeric character references.

Test page: https://codepen.io/tisza_gergo/pen/RwmbeRv

Actual results:

The newlines are omitted in the downloaded file.
(Tested in Firefox 125.0.3 on Ubuntu.)

Expected results:

As far as I can see this is a perfectly valid encoding and the newlines should be present in the downloaded file. It does work correctly in Chrome.

Managed to reproduce on:

  • Firefox Nightly 127.0a1;
  • Firefox 126.0;

Tested and reproduced on:

  • macOS 12;
  • Windows 10;
  • Ubuntu 22;

Moving the Component to ‘General’. Please change if there’s a better fit, thank you.
Setting as NEW so the developing team can have a look.

Status: UNCONFIRMED → NEW
Component: Untriaged → General
Ever confirmed: true
Component: General → File Handling

Inspecting the link and reading its href property in the web developer console gives:

$0.href
data:text/plain;charset=utf-8,first%20linesecond%20line

so the newline is already missing from the resolved href. This breaks well before we get to downloading / file handling. The same difference from Chrome is visible when omitting download="" and just opening the link: Chrome displays a newline and Firefox does not.

Using $0.getAttribute("href") produces the same result in Chrome and Firefox (though it's printed differently in the different consoles 🙃) which does include the newline. So over to URI parsing, which I suspect is doing this.

Component: File Handling → Networking
Product: Firefox → Core
Keywords: testcase
Blocks: url
Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged]

I think we are following the URL standard here. For some reason, Chrome is escaping the \n character to %0A in the string before it reaches the URL parser.

You can check that all three browsers (Chrome, Firefox and Safari) parse the URL in the same way and correctly strip out \n characters by going to https://jsdom.github.io/whatwg-url/ and pasting data:text/plain;charset=utf-8,first\u{0020}line\u{000A}second\u{0020}line into the "Input with escapes:" field.

I've also checked with Safari, and it parses the href of the link element the same way as Firefox: data:text/plain;charset=utf-8,first%20linesecond%20line.

See https://url.spec.whatwg.org/#url-parsing

Remove all ASCII tab or newline from input.
An ASCII tab or newline is U+0009 TAB, U+000A LF, or U+000D CR.
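
The quoted step can be reproduced outside the browser. This is a minimal sketch, assuming Node.js, whose global URL class follows the URL standard. The helper name stripTabOrNewline and the sample input are illustrative and not taken from any browser's source.

```javascript
// Sketch of the "remove all ASCII tab or newline" step from the URL standard.
// stripTabOrNewline is a hypothetical helper written for this example.
function stripTabOrNewline(input) {
  // U+0009 TAB, U+000A LF, U+000D CR
  return input.replace(/[\t\n\r]/g, "");
}

// The href value as the URL parser sees it once the character reference
// has been decoded to a literal LF.
const rawInput = "data:text/plain;charset=utf-8,first%20line\nsecond%20line";

const strippedByHand = stripTabOrNewline(rawInput);
const parsedByUrl = new URL(rawInput).href;

console.log(strippedByHand);
console.log(parsedByUrl);
```

Both lines should print the same newline-free URL that Firefox and Safari report for $0.href.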

So the difference here is that Firefox and Safari unescape the character references before parsing, while Chrome unescapes and then percent-encodes them.
I haven't checked the HTML spec to see what's actually supposed to happen here.
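
The two behaviours described above can be modelled as two small pipelines. This is only a sketch of the observed results, not the browsers' actual code. It assumes the test page writes the newline as &#10;, and the helper decodeNumericCharRefs is hypothetical and handles only numeric references.

```javascript
// Hypothetical helper: decode decimal (&#10;) and hex (&#xA;) numeric
// character references, roughly as an HTML parser would for an attribute.
function decodeNumericCharRefs(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) =>
    String.fromCodePoint(hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10))
  );
}

// Assumed attribute source; the exact markup on the test page may differ.
const attributeSource = "data:text/plain;charset=utf-8,first%20line&#10;second%20line";
const attributeValue = decodeNumericCharRefs(attributeSource); // contains a literal LF

// Firefox/Safari-like: hand the decoded value straight to the URL parser,
// which strips the LF.
const firefoxLike = new URL(attributeValue).href;

// Chrome-like: percent-encode the LF first, so the parser keeps it.
const chromeLike = new URL(attributeValue.replace(/\n/g, "%0A")).href;

console.log(firefoxLike); // data:text/plain;charset=utf-8,first%20linesecond%20line
console.log(chromeLike);  // data:text/plain;charset=utf-8,first%20line%0Asecond%20line
```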

Thomas, do you think we should just close this, or should we add this exact test case to the WPT suite?

Flags: needinfo?(twisniewski)

I think we should make sure that a bug is filed with Chromium, and yes: adding a WPT is the way to go, along with adding it to interop2024's URL bucket, so Chromium can be interoperable here.

Flags: needinfo?(twisniewski)