Since UTF-16BE and UTF-16LE decoders swallow an initial BOM, the HTML5 parser should feed them a BOM to swallow

NEW
Unassigned

Core
HTML: Parser
7 years ago
3 months ago

People

(Reporter: hsivonen, Unassigned)

Tracking

(Blocks: 2 bugs)

Trunk

Firefox Tracking Flags

(Not tracked)

Attachments

(5 attachments)

(Reporter)

Description

7 years ago
Created attachment 512735 [details]
Test case

By code inspection, it seems that asking for a "UTF-16BE" or "UTF-16LE" decoder returns a decoder that has fixed endianness but still swallows the first character if it is a U+FEFF.

This is wrong, because only "UTF-16" should interpret U+FEFF as a BOM; the explicit LE and BE variants should treat it as a ZWNBSP.

This is bad, because code that has done its own BOM sniffing and wants a decoder without BOM swallowing state (the HTML5 parser) can't get one.
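
To illustrate the distinction being asked for, here is a minimal sketch using Python's built-in codecs as a stand-in. This is not the Gecko decoder code; Python is used only because its "utf-16" and "utf-16-le" codecs happen to behave the way the description says the labels should: the generic label consumes a leading U+FEFF as a BOM, while the explicit-endianness label keeps it as an ordinary character.

```python
import codecs

# Little-endian BOM (U+FEFF encoded as FF FE) followed by "A".
data = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")

# Generic label: the leading U+FEFF is treated as a BOM and consumed.
generic = data.decode("utf-16")

# Explicit endianness: the leading U+FEFF is kept as an ordinary character.
explicit = data.decode("utf-16-le")

print(repr(generic))
print(repr(explicit))
```

A caller that has already sniffed and stripped the BOM itself wants the second behavior, so that any further U+FEFF at the start of the remaining bytes is passed through untouched.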
(Reporter)

Comment 1

7 years ago
Created attachment 512738 [details]
Test case with doctype
(Reporter)

Comment 2

7 years ago
Created attachment 512739 [details]
Two BOMs, no explicit LE label
(Reporter)

Comment 3

7 years ago
Created attachment 512740 [details]
Three BOMs no explicit label
(Reporter)

Comment 4

7 years ago
Chrome swallows one BOM as well. Opera 11 seems to swallow any number of BOMs. Let's assume the Web might require this and deal with it on the parser side.
Assignee: smontagu → nobody
Component: Internationalization → HTML: Parser
QA Contact: i18n → parser
Summary: UTF-16BE and UTF-16LE decoders swallow an initial BOM → Since UTF-16BE and UTF-16LE decoders swallow an initial BOM, the HTML5 parser should feed them a BOM to swallow
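
A rough sketch of the parser-side workaround proposed in the new summary, under stated assumptions. `swallowing_utf16le_decode` is a hypothetical stand-in for a UTF-16LE decoder that drops one leading U+FEFF, and `parser_decode` is a hypothetical parser entry point that has already sniffed and removed the real BOM. Neither name comes from the actual codebase.

```python
BOM_LE = b"\xff\xfe"  # U+FEFF encoded as UTF-16LE


def swallowing_utf16le_decode(data: bytes) -> str:
    """Hypothetical decoder: fixed little-endian, but drops one leading U+FEFF."""
    text = data.decode("utf-16-le")
    return text[1:] if text.startswith("\ufeff") else text


def parser_decode(data_after_sniffing: bytes) -> str:
    """Hypothetical parser-side workaround.

    The parser has already consumed the real BOM, so it feeds the decoder a
    synthetic one. The decoder swallows that instead of any U+FEFF that
    follows in the actual content.
    """
    return swallowing_utf16le_decode(BOM_LE + data_after_sniffing)


# Content that, after the parser removed the first BOM, still begins with U+FEFF.
remaining = "\ufeffhi".encode("utf-16-le")

without_workaround = swallowing_utf16le_decode(remaining)
with_workaround = parser_decode(remaining)
```

Without the synthetic BOM the decoder eats the second U+FEFF; with it, the second U+FEFF survives as a ZWNBSP in the decoded text.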
Created attachment 516576 [details]
plain text testcase
All browsers seem to swallow UTF-16LE BOM even for plain text.
Blocks: 746911

Updated

3 years ago
Blocks: 1102679

Comment 7

3 months ago
Is this still a bug? Per the Encoding Standard the UTF-16 variants work a bit differently and since we now implement that...