Closed Bug 1725946 Opened 3 years ago Closed 3 years ago

Import upstream patch that makes html5lib tokenizer tests runnable without the tree builder

Tracking

Status: RESOLVED FIXED

Milestone: 93 Branch

Tracking Flags:

firefox93: tracking ---, status fixed

People

(Reporter: hsivonen, Assigned: hsivonen)

Details

Attachments

(1 file)

Bug 1725946 - Conform tokenizer-only U+0000 NUL handling to spec; 3 years ago; Henri Sivonen (:hsivonen); 48 bytes, text/x-phabricator-request

Henri Sivonen (:hsivonen)

Assignee

Description

3 years ago

Import https://github.com/validator/htmlparser/commit/2f843768d2f10dc56ebce715faaf8dcb411febc4 in order to get the repos in sync.

No Web-exposed changes expected.

Henri Sivonen (:hsivonen)

Assignee

Comment 1

3 years ago

Attached file: Bug 1725946 - Conform tokenizer-only U+0000 NUL handling to spec

This change brings the tokenizer’s handling of U+0000 NUL characters in
the DATA state and the CDATA section state into conformance with the
requirements in the HTML spec — for the case where only tokenization is
being performed, without tree construction; that is, the case where the
tokenizer() method is called, rather than parse() or parseFragment().

Specifically, the tokenization steps defined in the spec require that
when a U+0000 NUL is consumed in the DATA state or in the CDATA section
state, the parser must then emit a U+0000 NUL. But when performing tree
construction, the spec requires that when a U+0000 NUL is consumed, the
parser must instead emit a U+FFFD REPLACEMENT CHARACTER.

Without this change, the parser always emits a U+FFFD REPLACEMENT
CHARACTER, even when only tokenization is being performed. That causes
us to fail a number of tests in the html5lib-tests suite.
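
As a rough illustration of the distinction described above (not the actual patch, which lives in the Java code of validator/htmlparser), the following Python sketch models what is emitted for a U+0000 NUL in DATA-state text in the two modes. The function name `emit_data_state_chars` and its `tokenizer_only` flag are hypothetical and exist only for this example; the tree-construction branch follows the simplified description in this comment rather than the full set of insertion-mode rules.

```python
NUL = "\u0000"
REPLACEMENT_CHARACTER = "\ufffd"


def emit_data_state_chars(text: str, tokenizer_only: bool) -> str:
    """Toy model of the characters emitted for DATA-state text.

    Hypothetical helper for illustration only; it is not part of any
    real parser API.
    """
    if tokenizer_only:
        # Tokenization only: a consumed U+0000 NUL is emitted unchanged.
        return text
    # With tree construction (as summarized in this comment):
    # each U+0000 NUL is replaced by U+FFFD REPLACEMENT CHARACTER.
    return text.replace(NUL, REPLACEMENT_CHARACTER)


if __name__ == "__main__":
    sample = "a\u0000b"
    # repr() makes the otherwise invisible NUL visible in the output.
    print(repr(emit_data_state_chars(sample, tokenizer_only=True)))
    print(repr(emit_data_state_chars(sample, tokenizer_only=False)))
```

Under this model, always taking the second branch corresponds to the behavior before the change, which is why tokenizer-only tests that expect a literal U+0000 NUL would fail.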

For more background on the relevant behavior, see the following:

Relates to https://github.com/validator/htmlparser/issues/35

Henri Sivonen (:hsivonen)

Assignee

Comment 2

3 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=831922abc83a3740f180a7f72710aa49dfa75edf

Pulsebot

Comment 3

3 years ago

Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2087677cd31d
Conform tokenizer-only U+0000 NUL handling to spec r=smaug

Cristina Cozmuta (:CrissCozmuta)

Comment 4

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/2087677cd31d

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox93: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 93 Branch

Henri Sivonen (:hsivonen)

Assignee

Comment 5

3 years ago

https://github.com/validator/htmlparser/commit/9d72e928e1e683f7eac9946678ed1b4a3d94175a

Categories

(Core :: DOM: HTML Parser, task)
