Closed Bug 1725946 Opened 3 years ago Closed 3 years ago

Import upstream patch that makes html5lib tokenizer tests runnable without the tree builder

Categories

(Core :: DOM: HTML Parser, task)

Tracking

()

RESOLVED FIXED
93 Branch
Tracking Status
firefox93 --- fixed

People

(Reporter: hsivonen, Assigned: hsivonen)

Details

Attachments

(1 file)

Import https://github.com/validator/htmlparser/commit/2f843768d2f10dc56ebce715faaf8dcb411febc4 in order to get the repos in sync.

No Web-exposed changes expected.

This change brings the tokenizer’s handling of U+0000 NUL characters in
the DATA state and the CDATA section state into conformance with the
requirements in the HTML spec for the case where only tokenization is
being performed, without tree construction. That is the case where the
tokenizer() method is called, rather than parse() or parseFragment().

Specifically, the tokenization steps defined in the spec require that
when a U+0000 NUL is consumed in the DATA state or in the CDATA section
state, the parser must then emit a U+0000 NUL. But when performing tree
construction, the spec requires that when a U+0000 NUL is consumed, the
parser must instead emit a U+FFFD REPLACEMENT CHARACTER.
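
The difference between the two modes described above can be illustrated with a minimal Python sketch. It is not the htmlparser code and does not model the full spec: the function name `emit_data_state_chars` and the `tokenize_only` flag are hypothetical, and the sketch covers only the U+0000 NUL behavior as stated in this description.

```python
NUL = "\u0000"
REPLACEMENT_CHARACTER = "\ufffd"


def emit_data_state_chars(text: str, tokenize_only: bool) -> str:
    """Return the characters emitted for `text`, per the description above.

    Hypothetical helper: it models only NUL handling and nothing else
    (no tags, no character references, no parse-error reporting).
    """
    if tokenize_only:
        # Tokenization only: a consumed U+0000 NUL is emitted unchanged.
        return text
    # With tree construction: each U+0000 NUL is emitted as U+FFFD.
    return text.replace(NUL, REPLACEMENT_CHARACTER)
```

Under these assumptions, `emit_data_state_chars("a\u0000b", tokenize_only=True)` returns the input unchanged, while the same call with `tokenize_only=False` returns `"a\ufffdb"`.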

Without this change, the parser always emits a U+FFFD REPLACEMENT
CHARACTER, even when only tokenization is being performed. That causes
us to fail a number of tests in the html5lib-tests suite.

For more background on the relevant behavior, see the following:

Relates to https://github.com/validator/htmlparser/issues/35

Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2087677cd31d
Conform tokenizer-only U+0000 NUL handling to spec r=smaug
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 93 Branch