Closed Bug 1552008 Opened 6 years ago Closed 3 years ago

HTML parser should supply column numbers for <script> elements

Tracking

()

Status:

RESOLVED FIXED

Milestone:

113 Branch

Tracking Flags:

Tracking

Status

firefox68

---

wontfix

firefox113

---

fixed

People

(Reporter: loganfsmyth, Assigned: hsivonen)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [debugger-mvp])

Attachments

(3 files, 1 obsolete file)

Untested tokenizer patch 6 years ago Henri Sivonen (:hsivonen) 4.59 KB, patch		Details \| Diff \| Splinter Review
Untested tokenizer patch, with column restoration after speculation 6 years ago Henri Sivonen (:hsivonen) 8.76 KB, patch		Details \| Diff \| Splinter Review
Bug 1552008 - Track column number in the HTML. 3 years ago Nicolas Chevobbe [:nchevobbe] 48 bytes, text/x-phabricator-request		Details \| Review
innerhtml_nonsimple_nostyle_test.html 3 years ago Nicolas Chevobbe [:nchevobbe] 1.10 KB, text/html		Details

Logan Smyth [:loganfsmyth]

Reporter

Description

•

6 years ago

This was spawned from https://bugzilla.mozilla.org/show_bug.cgi?id=1538056 but I'm going to file it separately since there is at least some work that can be done there on its own without this.

In 67, the JS debugger will begin treating breakpoints as line/column positions, where traditionally they have been treated as line-based. One of the outstanding issues is that while that is fine for

<script>
  console.log("foo");
</script>

where a breakpoint might be specifically at column 2 (the c in console.log)
we run into issues with

<script>console.log("foo");</script>

because the position of c, as far as the engine knows is 0, even though visually it is column 8, which causes our breakpoint markers to render in the wrong place.

While hand-written markup usually has that newline and doesn't tend to encounter this, we see it pop up in contexts where the inline code has been minified, which can be quite frustrating for users since debugging minified code is already tough and this will make it even tougher. Things can be especially frustrating if there are multiple <script> elements on the same line, because then it isn't just that the breakpoints are rendered off by 8 characters, but also that there could in fact be multiple sets of breakpoints overlapping.

The nsIScriptElement type already has a SetScriptColumnNumber, which is set by the XML parser, but not the HTML parser, so that would be the objective of this bug.

The XML case also doesn't work, so I'm going to co-opt https://bugzilla.mozilla.org/show_bug.cgi?id=1538056 to address that (we just never used the column from nsIScriptElement at all right now). Then once this lands we should be all set.

Henri Sivonen (:hsivonen)

Assignee

Comment 1

•

6 years ago

Attached patch Untested tokenizer patch (obsolete) — Details — Splinter Review

This patch is untested beyond checking that it compiles. This is the part that is affected by Java to C++ translation. The rest should be buildable on top of this in non-generated C++.

Please check the impact on the innerHTML setter early (with text input that has no tags).

Logan Smyth [:loganfsmyth]

Reporter

Comment 2

•

6 years ago

One issue here is that this is a code-unit column count, but SpiderMonkey's column values expect a code-point count, so it's not just an char16_t indexed counter that we need, but a codepoint count.

The alternate approach I've thought about is storing the start offset of the current line start, and in the rare case where getColumnNumber is actually called, it could run through the buffer between that location and the current location, calculating the codepoint offset. That has the downside of the tokenizer needing to be aware of the current buffer and position, which are currently scoped to only the stateLoop, but it would potentially have less of a performance effect since checkChar would have to do less per character.

Thoughts?

Flags: needinfo?(hsivonen)

Marco Mucci [:MarcoM]

Updated

•

6 years ago

Whiteboard: [debugger-mvp]

Henri Sivonen (:hsivonen)

Assignee

Comment 3

•

6 years ago

(In reply to Logan Smyth [:loganfsmyth] from comment #2)

One issue here is that this is a code-unit column count, but SpiderMonkey's column values expect a code-point count, so it's not just an char16_t indexed counter that we need, but a codepoint count.

:-( Code point counts instead of code unit counts are generally bad when identifying a position in a text buffer, but I guess these are supposed to "make sense" independently of buffer encoding in messages.

Since line numbers in document.written scripts are hopeless anyway and innerHTML-inserted scripts don't run, you can rely on data from the network being guaranteed to be valid UTF-16, so you can simply make checkChar not increment the column upon encountering a low surrogate.

The alternate approach I've thought about is storing the start offset of the current line start, and in the rare case where getColumnNumber is actually called, it could run through the buffer between that location and the current location, calculating the codepoint offset. That has the downside of the tokenizer needing to be aware of the current buffer and position, which are currently scoped to only the stateLoop, but it would potentially have less of a performance effect since checkChar would have to do less per character.

Since the tokenizer tokenizes a sequence of buffers such that a boundary can be anywhere, dealing with syncing things on buffer boundary is potentially complicated and/or error-prone, so I suggest seeing what the perf impact of the simple solution is first.

Which reminds me that my untested patch doesn't handle restoring the column number when rewinding the input stream after a failed speculation. I'll post a new patch.

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Assignee

Comment 4

•

6 years ago

Attached patch Untested tokenizer patch, with column restoration after speculation — Details — Splinter Review

Attachment #9065350 - Attachment is obsolete: true

Hsin-Yi Tsai (she/her)[:hsinyi]

Updated

•

6 years ago

Priority: -- → P2

Jan Honza Odvarko [:Honza] (always need-info? me)

Updated

•

6 years ago

Blocks: dbg-embedded-scripts

Jan Honza Odvarko [:Honza] (always need-info? me)

Updated

•

6 years ago

Blocks: dbg-70

Marco Mucci [:MarcoM]

Updated

•

6 years ago

No longer blocks: dbg-69

Marco Mucci [:MarcoM]

Updated

•

6 years ago

Whiteboard: [debugger-mvp] → [debugger-reserve]

Marco Mucci [:MarcoM]

Updated

•

6 years ago

Blocks: dbg-71
No longer blocks: dbg-70

Whiteboard: [debugger-reserve] → [debugger-mvp]

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Nicolas Chevobbe [:nchevobbe]

Updated

•

3 years ago

Blocks: 1815937

Nicolas Chevobbe [:nchevobbe]

Comment 5

•

3 years ago

Attached file Bug 1552008 - Track column number in the HTML. — Details

Nicolas Chevobbe [:nchevobbe]

Comment 6

•

3 years ago

So I applied patch from Comment 4 and rebased + fixed compilation errors I was seeing, and this seems to give us the information we need.
I added a patch on top of this one to actually consume the tokenizer information and set the column on inline script load context (see https://phabricator.services.mozilla.com/D170580 , for Bug 1815937)

I pushed to TRY with mach try auto: https://treeherder.mozilla.org/jobs?repo=try&revision=cda6d06715d8418ee608012a86427889f9c990f1
Aside from the devtools failures, which we'll take care of, it looks like only one other test fails: parser/htmlparser/tests/mochitest/test_bug672453.html , but I'm not sure which message is missing and why

Henri, do you think you could take ownership of this patch, check that I didn't do anything stupid when rebasing and ask review to a peer?

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Assignee

Comment 7

•

3 years ago

I read the rebased patch, and it looks OK.

smaug, the main concern here is if this affects innerHTML performance. Can you suggest a benchmark that should be used to check whether this is harmless enough for innerHTML perf?

Flags: needinfo?(hsivonen) → needinfo?(smaug)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 8

•

3 years ago

Answered on Matrix.

Flags: needinfo?(smaug)

Nicolas Chevobbe [:nchevobbe]

Comment 9

•

3 years ago

TRY push of speedometer with attached patch: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&newProject=try&newRevision=60d0dd9d309ecbae659042c7e4c9453575709c57&framework=13&originalRevision=574e19c9f0db5a543f7ca0b1cb0263c69df7045c&page=1

It's still ongoing, but there seems to be a (small) impact on jQuery-TodoMVC/Adding100Items, which smaug said we should pay attention to:

https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=60d0dd9d309ecbae659042c7e4c9453575709c57&originalSignature=4459372&newSignature=4459372&framework=13&originalRevision=574e19c9f0db5a543f7ca0b1cb0263c69df7045c&page=1&filter=jQuery-TodoMVC%2FAdding100Items

Henri Sivonen (:hsivonen)

Assignee

Comment 10

•

3 years ago

nchevobbe, does changing

inline char16_t checkChar(char16_t* buf, int32_t pos) {
  char16_t c = buf[pos];
  if (nextCharOnNewLine) {
    line++;
    col = 1;
    nextCharOnNewLine = false;
  } else if (!(c >= 0xDC00 && c <= 0xDFFF)) {
    // SpiderMonkey wants to count scalar values
    // instead of UTF-16 code units.
    col++;
  }
  return c;
}

inline char16_t checkChar(char16_t* buf, int32_t pos) {
  char16_t c = buf[pos];
  if (MOZ_UNLIKELY(nextCharOnNewLine)) {
    line++;
    col = 1;
    nextCharOnNewLine = false;
  } else if (MOZ_LIKELY(!NS_IS_LOW_SURROGATE(c))) {
    // SpiderMonkey wants to count scalar values
    // instead of UTF-16 code units.
    col++;
  }
  return c;
}

improve things?

Flags: needinfo?(nchevobbe)

Nicolas Chevobbe [:nchevobbe]

Comment 11

•

3 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #10)

nchevobbe, does changing
[…]
improve things?

maybe ? https://treeherder.mozilla.org/perfherder/compare?originalProject=try&newProject=try&newRevision=0e0670eaee59a7cbca0fa2564044f9c786551870&framework=13&originalRevision=2828aaa0c115a72516d7269aa59ca016c66f9f38&page=1

I feel like I don't see the same regression as before, but also, it looks like the results don't have a super high confidence.
Anyhow, it doesn't seem like there would be a very obvious regression, so it sounds okay?

Flags: needinfo?(nchevobbe)

Nicolas Chevobbe [:nchevobbe]

Comment 12

•

3 years ago

Talking with sparky on Element, it seems like overall, there is no regression on the speedometer score overall

Ok so it isn't so bad, but you got unlucky with a single outlier data point: https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&highlightCommonAlerts=0&highlightedRevisions=2828aaa0c115&highlightedRevisions=0e0670eaee59&series=try,3448328,1,13&timerange=86400&zoom=1677840218136,1677843859926,122.8025877384645,161.669612285737
The same thing seems to have happened on linux: https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&highlightCommonAlerts=0&highlightedRevisions=2828aaa0c115&highlightedRevisions=0e0670eaee59&series=try,3445603,1,13&timerange=86400&zoom=1677839357024,1677845474908,85.9995250547984,125.63516599693689
On mac there might be an improvement to the score if you ignore the outlier: https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&highlightCommonAlerts=0&highlightedRevisions=2828aaa0c115&highlightedRevisions=0e0670eaee59&series=try,3448328,1,13&timerange=86400&zoom=1677840245133,1677843800247,127.58475711769015,136.84425414218742

Henri Sivonen (:hsivonen)

Assignee

Comment 13

•

3 years ago

smaug, are you OK with landing this with the refinement from comment 10?

Henri Sivonen (:hsivonen)

Assignee

Updated

•

3 years ago

Flags: needinfo?(smaug)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 14

•

3 years ago

•

Edited

I think I'm ok, but if we see some regression after all, it may need to get backed out.
With perf tests one need to run them quite a few times to get reliable numbers.

Flags: needinfo?(smaug)

Henri Sivonen (:hsivonen)

Assignee

Comment 15

•

3 years ago

nchevobbe, I suggest updating the patch with the change from comment 10 and requesting review from smaug. (It seems inappropriate for me to review, since most of the code in the patch comes from me.)

Once the patch is on m-c, I can then reconstruct the appropriate changeset for the htmlparser repo on GitHub.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 16

•

3 years ago

If you profile microbenchmark like http://mozilla.pettay.fi/moztests/innerhtml_nonsimple_nostyle_test.html, does the new code show up in the profile? Might be useful to change that test to use a bit longer string for innerHTML.

Nicolas Chevobbe [:nchevobbe]

Comment 17

•

3 years ago

Attached file innerhtml_nonsimple_nostyle_test.html — Details

Here's the test page modified
The only thing I changed is that

// in loop
document.getElementById("speed_dom").innerHTML =
  "<b><span> Testing...</span> (" + i + "/1000)</b>"

into

const suffix = "<i></i>".repeat(100);
// ...
// in loop
document.getElementById("speed_dom").innerHTML =
  "<b><span> Testing...</span> (" + i + "/1000)</b>" + suffix

Here's a base profile from latest central: https://share.firefox.dev/3JfgPE4
And a profile with henri's latest patch: https://share.firefox.dev/3EWsN2D

I don't see too much differences in set Element.innerHTML timings, but I'm not sure I'm able to analyze those graphs accurately

Phabricator Automation

Updated

•

3 years ago

Assignee: nobody → nchevobbe

Attachment #9319147 - Attachment description: WIP: Bug 1552008 part 1 - Track column number in the HTML → Bug 1552008 - Track column number in the HTML. r=smaug.

Status: NEW → ASSIGNED

Nicolas Chevobbe [:nchevobbe]

Comment 18

•

3 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #15)

nchevobbe, I suggest updating the patch with the change from comment 10 and requesting review from smaug. (It seems inappropriate for me to review, since most of the code in the patch comes from me.)

Once the patch is on m-c, I can then reconstruct the appropriate changeset for the htmlparser repo on GitHub.

Patch queue rebased and pushed on phab https://phabricator.services.mozilla.com/D170579

Phabricator Automation

Updated

•

3 years ago

Attachment #9319147 - Attachment description: Bug 1552008 - Track column number in the HTML. r=smaug. → Bug 1552008 - Track column number in the HTML.

Pulsebot

Comment 19

•

3 years ago

Pushed by hsivonen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1b5481ac73dc Track column number in the HTML. r=smaug,nchevobbe

Marian-Vasile Laza

Comment 20

•

3 years ago

Backed out changeset 1b5481ac73dc (Bug 1552008) for mochitest failures on test_bug672453.html.
Backout link
Push with failures <--> 1
Failure Log

Flags: needinfo?(nchevobbe)

Nicolas Chevobbe [:nchevobbe]

Comment 21

•

3 years ago

looks like the failure I was talking about in https://bugzilla.mozilla.org/show_bug.cgi?id=1552008#c6

Assignee: nchevobbe → hsivonen

Flags: needinfo?(nchevobbe) → needinfo?(hsivonen)

Nicolas Chevobbe [:nchevobbe]

Comment 22

•

3 years ago

This does fix the issue

diff --git a/parser/htmlparser/tests/mochitest/test_bug672453.html b/parser/htmlparser/tests/mochitest/test_bug672453.html
--- a/parser/htmlparser/tests/mochitest/test_bug672453.html
+++ b/parser/htmlparser/tests/mochitest/test_bug672453.html
@@ -104,7 +104,7 @@ var expectedErrors = [
     isWarning: true },
   { errorMessage: "The start of the document was reparsed, because there were non-ASCII characters in the part of the document that was unsuccessfully searched for a meta tag before falling back to the XML declaration syntax. A meta tag at the start of the head part should be used instead of the XML declaration syntax.",
     sourceName: "http://mochi.test:8888/tests/parser/htmlparser/tests/mochitest/file_bug672453_xml_speculation_fail.html",
-    lineNumber: 11,
+    lineNumber: 10,
     isWarning: true },
 ];

line 10 points to the closing html tag https://searchfox.org/mozilla-central/rev/cd2121e7d83af1b421c95e8c923db70e692dab5f/parser/htmlparser/tests/mochitest/file_bug672453_xml_speculation_fail.html#10

Henri Sivonen (:hsivonen)

Assignee

Comment 23

•

3 years ago

(In reply to Nicolas Chevobbe [:nchevobbe] from comment #22)

This does fix the issue

Thanks. Relanding queued with this tweak. Sorry about the delay.

Flags: needinfo?(hsivonen)

Pulsebot

Comment 24

•

3 years ago

Pushed by hsivonen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/eccdf9dad71c Track column number in the HTML. r=smaug,nchevobbe

Cristian Tuns

Comment 25

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/eccdf9dad71c

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox113: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 113 Branch

Ryan VanderMeulen [:RyanVM]

Updated

•

3 years ago

status-firefox68: affected → wontfix

Henri Sivonen (:hsivonen)

Assignee

Comment 26

•

3 years ago

https://github.com/validator/htmlparser/commit/9448adcda185abd293499feb1ff9c0a18014baa7

You need to log in before you can comment on or make changes to this bug.