Closed Bug 1538056 Opened 1 year ago Closed 1 year ago

Debugger provides odd column breakpoint positions in HTML file

Categories

(DevTools :: Debugger, defect, P1)

defect

Tracking

(firefox70 fixed)

RESOLVED FIXED
Firefox 70
Tracking Status
firefox70 --- fixed

People

(Reporter: davidwalsh, Assigned: bhackett1024)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [debugger-mvp])

Attachments

(6 files, 1 obsolete file)

Steps to reproduce:

  1. Go to an amazing blog: https://davidwalsh.name/
  2. Open "(index)"
  3. See that the only line available for adding a breakpoint is line 5
  4. Click line 5 in the gutter
  5. You'll see column breakpoint markers at position 0 and 65...but they're in the middle of HTML -- no JavaScript in sight.

There is a small JavaScript snippet ~1500 characters in, which I'm guessing the markers should be referencing, but I'm confident they shouldn't display in the middle of HTML.

Attached image WeirdBPPosition.png

I wondered if anyone would actually run into something like this but I never got around to tracking it anywhere or actually testing it out.

I don't think Gecko sets correct starting column positions for <script> elements. This means that for

<script>
  console.log(4);
</script>

we are fine, because line 1 is the line with <script> and there is no code on that line and thus no column breakpoints.

However if you load up a page with just a single line script like

<script>console.log(4);</script>

then things get hairy because SpiderMonkey needs to know that the script doesn't actually start at the first column.

Even in writing this up I didn't get exactly what I expected, so really I think the answer is that this just needs some research and work.

Priority: -- → P3
Duplicate of this bug: 1544439
Priority: P3 → P2
Assignee: nobody → lsmyth
Status: NEW → ASSIGNED
Whiteboard: [debugger-mvp]
Priority: P2 → P1
Priority: P1 → P2

Looked into this yesterday. The core of the issue is that our HTML parser currently just doesn't have a concept of column number at all, so there isn't any column value to give to SpiderMonkey about the starting location of the inline script.

There are actually two separate tasks that can be done here, so I'm going to split this into two bugs:

I talked to @hsivonen, who said he'd be able to look at how hard that would be to add to the parser. Performance is the main overall concern, so there's more benchmarking involved than anything, so I'm not sure it makes sense for me to put a bunch of time into coming up to speed on all that.

Column information from <script> elements should be passed to SpiderMonkey when processing inline scripts

Looking into this more, there's actually more issues here though I'll leave it lumped into this bug for now.

In XHTML files, the line/column given to <script> elements is the start of the <script>, so if you do

<script
  type="application/javascript">
  console.log("foo");
</script>

The type="application/javascript"> line is marked breakable because it is "line 2" after the <script part, where it should be console.log("foo"); that is breakable.

In HTML files, the line (there is no column pending https://bugzilla.mozilla.org/show_bug.cgi?id=1552008) given to <script> is the end of the <script> tag, so the above works fine, but then you have the confusing behavior that

<script 
  src=""
>
</script>

will log a warning to the console ‘src’ attribute of <script> element is empty., but the line number associated with the warning is the closing > in the opening <script>.

It seems like nsIScriptElement objects should have both a line/column for the start of the <script and a line/column for just after the <script> opening tag. That way errors and things can use the first position, and inline scripts would use the second pair.

@hsivonen

Ignoring column numbers entirely for now, given this and the above comments, do you have any thoughts?

The HTML parser must currently advance the line number before the tree builder creates the element, meaning that it is tough to get access to the start line of the <script> when it spans multiple lines. I guess we could just always use the end line/column of the opening tag like we do now, but with the addition of the column number, but it seems like that would be surprising for script error messages to use the ending > of the script as the position for error messages. I suppose we could just hardcode the messages to column 1 but that's also weird.

On the other side, the XML parser has the opposite problem because it only provides the element-start line/column and it looks like we would need to change the current nsIExpatSink interface so we could pass line/column on every call to HandleCharacterData in order to just allow us to get the start line/column of text in the specific case where it is inside of a <script> element. Then we'd also have to HandleEndElement in case it's an empty-string inline script, and in the case of HandleCDataSection the nsExpatDriver actually merges all of the text together first, so that would also need special handling to allow us to get the line/column of the text inside the CDATA, unless we wanted to be hacky and get the line-column of the CDATA start and add '<![CDATA['.length to the column value.

Flags: needinfo?(hsivonen)

Attachment 9065350 [details] [diff] in the other bug should enable further development without you having to set up the Java to C++ translation.

The line each attribute ended on is recorded as part of the attribute holder, if you really want the right line number for the src attribute.

To calibrate the start line of the script content, it seems that the HTML parser's behavior of reporting the location of the > character of the script start tag already works. I suggest not trying to make everything elegant for scripts whose start tag spans multiple lines.

Currently, there's no way to tell if an SVG script created by the HTML parser had <![CDATA[. I suggest letting the column number for the first line of the script to just be wrong in that case.

I haven't examined the expat internals on this point.

Flags: needinfo?(hsivonen)

The line each attribute ended on is recorded as part of the attribute holder, if you really want the right line number for the src attribute.

I think in this context the line of the <script> makes sense in this context, so that's probably fine. It just happens the warning with src happened to be an easy example, but there may be others that aren't about src.

I suggest not trying to make everything elegant for scripts whose start tag spans multiple lines.

Yeah, I think I'm fine with tabling that for a later discussion. It seems like a whole other big question. I mostly brought it up because it was what the XML parser was already doing.

Currently, there's no way to tell if an SVG script created by the HTML parser had <![CDATA[. I suggest letting the column number for the first line of the script to just be wrong in that case.

Fair enough, I'm happy to focus on the specific case of HTML <script> in this bug.

I haven't examined the expat internals on this point.

Same.

P1 as it did come up in feedback from the quality blogpost, confirming that this is affecting a lot of use cases.

Priority: P2 → P1
Duplicate of this bug: 1554090
Status: ASSIGNED → UNCONFIRMED
Ever confirmed: false
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

Confirmed with Logan - changing status back to 'New' as this bug will be reviewed by the team next week. Leaving Logan on as assignee for follow up.

Status: ASSIGNED → NEW

This is a WIP patch of the work I've done so far on this. It sets up a framework for the position information from the HTML and XML parsers, but does not address the overall issue without more in https://bugzilla.mozilla.org/show_bug.cgi?id=1552008 so not sure it makes sense to land it on its own.

As an alternate approach here, would it be reasonable to do this entirely in the debugger client? It has access to the contents of the <script> (which were passed to spidermonkey) and the contents of the HTML (which it fetched from the server). It can just grep for the script contents in the HTML file to make a guess at the starting column of the <script>. It would be fooled by things like:

<span>this.foo()</span><script>this.foo()</script>

But even in this case the column breakpoint locations it found would at least make a little sense, even if they were inside the <span> instead of the <script>. Things like this could be fixed as well (though maybe not having multiple identical scripts on the same line) though there is a point of diminishing returns.

I hate to make a suggestion like "Let's use a giant hack!", but keeping track of column numbers in the HTML tokenizer seems like it will be (a) complicated to get right, and (b) have a measurable perf impact and be difficult to land. I also have no familiarity with the HTML tokenizer whatsoever. Getting a 95% solution landed quickly would at least take the pressure off and buy time to see if a more complete solution is necessary and explore options for it.

I think a 95% solution seems reasonable here. Do you think we could do something similar in the server? I think we could update the SourceActor so that the columnOffset is calculated in this case and use this for getBreakableLocations.

One concern is that we already have a bunch of bugs filed around HTML files failing to load because we re-fetch the content with different credentials. If we had a solid framework in place to ensure that we were actually getting exactly the same asset that was parsed the first time, I'd be a little less concerned with this approach.

That said, also keep in mind that this column value doesn't just apply to the debugger itself, it also applies to things like console.log and new Error, so the column value really is something we want SpiderMonkey to 100% know about ahead or time.

(In reply to Logan Smyth [:loganfsmyth] from comment #16)

One concern is that we already have a bunch of bugs filed around HTML files failing to load because we re-fetch the content with different credentials. If we had a solid framework in place to ensure that we were actually getting exactly the same asset that was parsed the first time, I'd be a little less concerned with this approach.

I don't know, this problem actually makes me think that it's more important to be able to compare script sources with the HTML file content. Say we got exact column numbers from the HTML parser, but the HTML file we got from the server was wrong because it was refetched with different credentials. This still leaves the client unable to show something coherent to the user. Instead, if the client (or server) is checking the HTML's file contents against the contents of the source, it can fallback to showing the script source itself (with no surrounding HTML) if it determines the HTML contents it has are bogus.

FWIW when using web replay we save the contents of all HTML files received by the process and will deliver the exact contents to the client, but this entails extra memory usage in the content process which is not workable in the normal debugging context I think.

That said, also keep in mind that this column value doesn't just apply to the debugger itself, it also applies to things like console.log and new Error, so the column value really is something we want SpiderMonkey to 100% know about ahead or time.

Maybe, but this reminds me of how we have to handle stacks and locations when the script has no usable URL (eval etc.). We associate a source ID with the location info, and as long as the source hasn't been GC'ed we can find its Debugger.Source. If there is a column displacement available for the Debugger.Source, it can be applied before sending the location to the client, even if the column available to spidermonkey is wrong. It sounds complicated, but doesn't have to be if the right abstractions are in place.

This does point towards handling things in the server instead of the client, though.

Assignee: lsmyth → bhackett1024

I don't disagree for console.log and devtools in general, but setting aside the question of HTML loading for now, the primary thing is that even when the devtools are inactive, the column number still affects programmatic usage of Error object .stack and .columnNumber properties, and those always need to be available synchronously during JS evaluation.

For what it's worth, I 100% understand the concerns about performance, that's why rather than keeping an active column count, I was in favor of only calculating the column value when we explicitly request it from the tokenizer after seeing a <script> (and any time before that needed to get the right value, like at the end of a buffer).

Attached patch patchSplinter Review

This patch has a mixture of fixes. I started with the above idea of figuring out where the <script> (probably) starts in the HTML file's line by comparing the source text and HTML file text in the server, then adjusting the column numbers for breakpoints so that they are rendered correctly by the client.

This works, but runs into a problem when visiting a page for the first time and expecting breakpoint hits to occur. In order to adjust column breakpoints we need the HTML file's text, but scripts start running before the HTML file has been completely loaded. Adjusting columns won't happen until the file has loaded, which is after the point where the breakpoint hit should occur.

To resolve this, the patch extends the web replay functionality to report HTML file contents by also reporting it to the devtools when they are open (using the docshell->GetWatchedByDevtools() logic recently added by bug 1546736). This avoids any extra overhead when the devtools are not open. The devtools listens to these notifications and fill in the HTML file contents as they are received from the network, allowing the server to determine the starting column displacement before the HTML file has finished loading, and hit column breakpoints in the inline scripts correctly.

This also means that when the devtools are initially open on page load, the HTML file contents it has are exactly those which are running on the page (when the devtools are opened on an existing page, fetch() is still used to get possibly-incorrect HTML contents). Even if we end up wanting another mechanism to give the starting column of the source directly to spidermonkey, this change will still be nice to retain.

Status: NEW → ASSIGNED
Attachment #9068761 - Attachment is obsolete: true
Attachment #9068766 - Attachment is obsolete: true
Assignee: bhackett1024 → nobody
Status: ASSIGNED → NEW
Attachment #9068758 - Attachment is obsolete: true
Attachment #9068773 - Attachment is obsolete: true
Assignee: nobody → bhackett1024
Attachment #9068758 - Attachment is obsolete: false
Attachment #9068761 - Attachment is obsolete: false
Attachment #9068766 - Attachment is obsolete: false
Attachment #9068773 - Attachment is obsolete: false
Status: NEW → ASSIGNED
Blocks: dbg-70
No longer blocks: dbg-69
Depends on: 1519855
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6d39582362de
Part 1 - Notify devtools about HTML file contents, r=hsivonen.
https://hg.mozilla.org/integration/autoland/rev/0b45a2b8a589
Part 2 - Use HTML file contents from parser in devtools server when possible, r=loganfsmyth.
https://hg.mozilla.org/integration/autoland/rev/f5f50f8e1c21
Part 3 - Adjust breakpoint positions in inline scripts by the starting column offset, r=loganfsmyth.
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f05488465c0d
Part 4 - Add test for setting column breakpoints in inline scripts, r=loganfsmyth.
Attachment #9068096 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.