Open Bug 1694175 Opened 4 years ago Updated 4 years ago

Install a telemetry probe for HTML parse errors on UI documents

Categories

(Core :: DOM: HTML Parser, task, P5)

People

(Reporter: zbraniecki, Unassigned)

References

(Blocks 1 open bug)

Details

As part of bug 1675823, and analogously to the JS probe being added in bug 1694067, we'd like to have a probe that fires an Event when an HTML document encounters a parse error.

It may not be a YSOD, but it likely leads to UI breakage.

I'm going to add the Event in the JS bug, and for this bug I'd like to ask:

  • Where in the HTML parser code should such a probe be introduced?
  • How can we get the file name, the line/column, and potentially an error code?

We don't run the tokenizer error detection code normally. We instantiate a second copy of the tokenizer loop with error detection for view source:
https://searchfox.org/mozilla-central/rev/c8ce16e4299a3afd560320d8d094556f2b5504cd/parser/html/nsHtml5Tokenizer.cpp#436

It would be possible to have non-view source conditions for running the version of the loop that now runs for view source.

For tree builder errors, we do a run-time check for view sourceness:
https://searchfox.org/mozilla-central/rev/c8ce16e4299a3afd560320d8d094556f2b5504cd/parser/html/nsHtml5TreeBuilderCppSupplement.h#1469

Errors that are signs of the IO layer being broken are more likely to be tokenizer errors.

I'm not sure what the cost of running the error-reporting instance of the tokenizer for UI would be.
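
To illustrate the idea of choosing the error-detecting copy of the loop for more than view source, here is a minimal, self-contained sketch. It is not the actual Gecko code: `SilentPolicy`, `ReportingPolicy`, `RunLoop`, `Tokenize`, and the `isUiDocument` flag are hypothetical names, and the toy loop only flags U+0000 (the spec's `unexpected-null-character` parse error) while tracking line and column.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical policy types; the real tokenizer's policies may look different.
struct SilentPolicy {
  static const bool kReportErrors = false;
};
struct ReportingPolicy {
  static const bool kReportErrors = true;
};

// What a telemetry Event would need: position plus an error code.
struct ParseError {
  std::size_t line;
  std::size_t col;
  std::string code;
};

// Toy stand-in for the tokenizer loop. The error check is guarded by a
// compile-time constant, so the SilentPolicy instantiation does no
// error-detection work.
template <typename Policy>
std::vector<ParseError> RunLoop(const std::string& input) {
  std::vector<ParseError> errors;
  std::size_t line = 1;
  std::size_t col = 0;
  for (char c : input) {
    if (c == '\n') {
      ++line;
      col = 0;
      continue;
    }
    ++col;
    if (Policy::kReportErrors && c == '\0') {
      errors.push_back({line, col, "unexpected-null-character"});
    }
  }
  return errors;
}

// Picks the reporting instantiation for view source and, hypothetically,
// for UI documents; everything else keeps the silent loop.
std::vector<ParseError> Tokenize(const std::string& input, bool isViewSource,
                                 bool isUiDocument) {
  if (isViewSource || isUiDocument) {
    return RunLoop<ReportingPolicy>(input);
  }
  return RunLoop<SilentPolicy>(input);
}
```

In this sketch ordinary web content never pays for error detection, because the choice between the two instantiations is made once per call. The open question from the comment above (what the reporting loop costs for UI documents) still applies.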

Could an alternative approach be that we store the length of all these resources and compare that after loading them from disk to ensure it got read in its entirety? (Wouldn't catch bit-flipping, but this might not either.)

Why does this block bug 1675823? Aren't we using xhtml for the UI docs?
(and if we aren't, that is worrisome since it means we aren't getting the benefits from prototype docs)

Severity: -- → N/A
Flags: needinfo?(zbraniecki)
Priority: -- → P5

> Why does this block bug 1675823? Aren't we using xhtml for the UI docs?

We use both. Most of the UI uses XHTML, but a growing number of non-primary UIs use HTML: https://searchfox.org/mozilla-central/search?q=&path=.html&case=true&regexp=false

Flags: needinfo?(zbraniecki)

(In reply to Anne (:annevk) from comment #2)

> Could an alternative approach be that we store the length of all these resources and compare that after loading them from disk to ensure it got read in its entirety? (Wouldn't catch bit-flipping, but this might not either.)

I like this option much better than making the HTML parser report errors in the non-View Source case.
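
A rough sketch of what the length-comparison alternative could look like, assuming a manifest of expected byte lengths is recorded at build time. `LengthManifest`, `LengthCheck`, `CheckLoadedLength`, and the example path are all hypothetical; nothing here is an existing Gecko API.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical manifest: resource path -> byte length recorded at build time.
using LengthManifest = std::map<std::string, std::size_t>;

enum class LengthCheck { Ok, Truncated, Oversized, Unknown };

// Compares the number of bytes actually read from disk with the recorded
// length. This detects short or over-long reads, but not bit flips that
// leave the length unchanged.
LengthCheck CheckLoadedLength(const LengthManifest& manifest,
                              const std::string& path,
                              std::size_t loadedBytes) {
  auto it = manifest.find(path);
  if (it == manifest.end()) {
    return LengthCheck::Unknown;  // No recorded length for this resource.
  }
  if (loadedBytes < it->second) {
    return LengthCheck::Truncated;
  }
  if (loadedBytes > it->second) {
    return LengthCheck::Oversized;
  }
  return LengthCheck::Ok;
}
```

A caller could fire the telemetry Event whenever the result is `Truncated` or `Oversized`, which keeps the HTML parser itself untouched in the non-view-source case.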
