Closed Bug 1751796 Opened 3 years ago Closed 3 years ago

XML parsererror eats two first letters

Categories

(Core :: XML, defect)

defect

Tracking

()

RESOLVED FIXED
101 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox96 --- wontfix
firefox97 --- wontfix
firefox98 --- wontfix
firefox99 --- wontfix
firefox100 --- wontfix
firefox101 --- fixed

People

(Reporter: mgol, Assigned: peterv)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

On macOS, Firefox 96 & newer incorrectly handle invalid XML documents. The error message printed shows a column number smaller by two than it should and the printed invalid XML has the first two characters eaten.

To reproduce, execute the following code in the DevTools console:

( new window.DOMParser() )
    .parseFromString( "<p>Not a <<b>well-formed</b> xml string</p>", "text/xml" )
    .getElementsByTagName( "parsererror" )[ 0 ]
    .textContent

The end result is:

"XML Parsing Error: not well-formed
Location: https://github.com/jquery/jquery/runs/4926000068?check_suite_focus=true
Line Number 1, Column 9:>Not a <<b>well-formed</b> xml string</p>
--------^"

You can see <p have disappeared from the output and the column is incorrectly reported as 9, not 11. It's even better visible if you use a <div>:

( new window.DOMParser() )
    .parseFromString( "<div>Not a <<b>well-formed</b> xml string</div>", "text/xml" )
    .getElementsByTagName( "parsererror" )[ 0 ]
    .textContent

The end result is:

"XML Parsing Error: not well-formed
Location: https://github.com/jquery/jquery/runs/4926000068?check_suite_focus=true
Line Number 1, Column 11:iv>Not a <<b>well-formed</b> xml string</div>
----------^"

Note the iv> part with a missing <d.

This used to work fine and it still seems to work fine in Firefox 96 on Linux. On my macOS 11.6.2 (20G314) it fails in Firefox 96.0.2, though, as well as in the latest Nightly. Chrome & Safari properly report column 11.

This sounds a bit niche but it breaks the jQuery test suite which has a test ensuring we're printing the correct column in the XML parsing error which depends on the browser reporting.

I'm filing this under "DOM: HTML Parser"; is there a better component for XML parsing issues?

I see the reported behavior on Linux Nightly and can confirm that Nightly says 9 and Firefox release and Chrome say 11.

peterv, any idea what might have changed here?

Component: DOM: HTML Parser → XML
Flags: needinfo?(peterv)
OS: macOS → All
Hardware: x86_64 → All
Summary: XML parsererror eats two first letters on macOS → XML parsererror eats two first letters

(never mind, I misread the last comment)

Ugh, this might be a regressions from bug 1749462.

Assignee: nobody → peterv
Status: NEW → ASSIGNED
Flags: needinfo?(peterv)

(In reply to Peter Van der Beken [:peterv] from comment #3)

Ugh, this might be a regressions from bug 1749462.

How so? That bug was fixed 8 days ago and just on the v98 branch. As I reported, the issue exists even on Firefox 96 on macOS, i.e. the current stable version.

Firefox 96.0.2 on Windows is broken, too. Firefox 91.5.0esr on Windows works.

Regression range:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=d03f875556391582e06abbf647835af8ca59f94b&tochange=0555774e90d6a74678d28407e5941f2ea61608ce

I suspected RLBox, but it is a bit unexpected that this changeset enabled woff2, not expat. What happens?

Regressed by: 1742916

Set release status flags based on info from the regressing bug 1742916

Has Regression Range: --- → yes

(In reply to Masatoshi Kimura [:emk] from comment #5)

Firefox 96.0.2 on Windows is broken, too. Firefox 91.5.0esr on Windows works.

Regression range:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=d03f875556391582e06abbf647835af8ca59f94b&tochange=0555774e90d6a74678d28407e5941f2ea61608ce

I suspected RLBox, but it is a bit unexpected that this changeset enabled woff2, not expat. What happens?

That's fascinating. Presumably there's some subtle behavior change of the global rlbox environment that occurs when woff2 is included that somehow affects expat.

I think the order of operations would be: (1) Confirm the bug reproduces on a Nightly build. (2) Do a local debug build, and confirm it still reproduces. (3) Rebuild with the woff2 sandbox disabled, confirm the bug is fixed. (4) Identify the part of expat that's behaving differently. (5) Figure out why.

(In reply to Bobby Holley (:bholley) from comment #7)

(1) Confirm the bug reproduces on a Nightly build.

Although the reporter already confirmed, I double-checked. The bug reproduces with Nightly.

(2) Do a local debug build, and confirm it still reproduces.

I could not reproduce with a local debug build! We are summoning a nasal demon?

(In reply to Masatoshi Kimura [:emk] from comment #8)

I could not reproduce with a local debug build! We are summoning a nasal demon?

What about a local --enable-release build? Would also be good to verify that the problem then disappears when removing expat from wasm_sandboxing_libraries.

  • A local release (--disable-debug) build: broken
  • A local release build with expat removed from wasm_sandboxing_libraries: works
  • A local release build with woff2 removed from wasm_sandboxing_libraries: works

XML_GetCurrentColumnNumber seems to return the wrong value here: https://searchfox.org/mozilla-central/rev/7056a708787621758bef9793a93aa7ca8375eeef/parser/htmlparser/nsExpatDriver.cpp#1302-1303 It returns 8, so lastLineLength is set to 8 and consumed is 10. If I insert a call to XML_GetCurrentColumnNumber in nsExpatDriver::HandleCharacterData then things start working as expected again, so it feels like there's something going wrong with XmlUpdatePosition.

See Also: → 1752498

I added a test in bug 1752498 (parser/htmlparser/tests/mochitest/test_bug1752498.html), but had to disable it for now because of this bug. We should reenable it when this bug is fixed.

Severity: -- → S2

Peter has to add more debugging info to RLBox library first.

Attachment #9268954 - Attachment description: WIP: Bug 1751796 - XML parsererror eats two first letters. r?bholley! → Bug 1751796 - XML parsererror eats two first letters. r?bholley!
Pushed by pvanderbeken@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dc4b14788062 XML parsererror eats two first letters. r=bholley
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 101 Branch

The patch landed in nightly and beta is affected.
:peterv, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(peterv)
Flags: needinfo?(peterv)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: