Closed Bug 922681 Opened 6 years ago Closed 6 years ago

Consider a SetInnerHTML fast-path for text with no markup

Categories

(Core :: DOM: Core & HTML, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: bzbarsky, Assigned: jandem)

References

Details

(Whiteboard: [qa-])

Attachments

(2 files, 2 obsolete files)

See bug 922016 comment 2.  Basically, if the input string does not contain any of '<', '&', '\r', '\0', then it's plaintext and we don't have to spin up an HTML parser at all.

The thing is, checking for those chars itself takes time, so we may want to only do this for short strings.  Especially because the longer a string is the more likely it is to have markup in it.

I thought we had a bug on this already, but I don't see it...
Oh, Henri, where does the "depends on the element" bit come in?
Flags: needinfo?(hsivonen)
I was thinking of <pre>, but it looks like I was wrong.
Flags: needinfo?(hsivonen)
Attached patch Patch (obsolete) — Splinter Review
This seems to work on some cases I tried. Please review carefully because I don't really understand everything that's going on in these methods.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Attachment #825258 - Flags: review?(bzbarsky)
Comment on attachment 825258 [details] [diff] [review]
Patch

Requesting feedback? from hsivonen as suggested by smaug :)
Attachment #825258 - Flags: feedback?(hsivonen)
Attachment #825258 - Flags: feedback?(hsivonen) → feedback+
Some numbers for the testcase in bug 922016:

Safari: 1180 ms
FF new: 1250 ms
Chrome: 1550 ms
FF old: 1950 ms

So this wins about 35% and we're faster than Chrome and not that much slower than Safari. We can win more by optimizing other parts of that benchmark like the |clear.style.clear = "both"|.
How much does the patch slow down the cases when there is markup.
http://mozilla.pettay.fi/moztests/innerhtml_nonsimple_nostyle_test.html for example, though
might be good to write a worst case test too.
(In reply to Olli Pettay [:smaug] from comment #6)
> How much does the patch slow down the cases when there is markup.
> http://mozilla.pettay.fi/moztests/innerhtml_nonsimple_nostyle_test.html for
> example,

I don't see much of a difference: 12.2-12.5 ms both with and without the patch. It makes sense though because the first character is < so we will stop scanning immediately.

> though might be good to write a worst case test too.

A worst-case test would look like the one below: string with length < 100 and near the end a markup character. Unfortunately the numbers are very noisy but if I look at the lowest number in 10 runs it's probably 3-4% slower or so. But again, it's hard to say much due to the high variance :(

<script>
function test() {
    var el = document.getElementById("foo");
    var t = new Date;
    for (var i=0; i<200000; i++)
        el.innerHTML = "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456&";
    el.innerHTML = (new Date - t) + "ms";
}
window.onload = test;
</script>
<div id="foo"></div>
bz: review ping, in case you forgot about this.
I didn't; I'm just still somewhat backlogged.

Should get to this today.
Comment on attachment 825258 [details] [diff] [review]
Patch

>+  nsAString::const_iterator start, end;
>+  aStr.BeginReading(start);
>+  aStr.EndReading(end);

Clearer to just do:

  const PRUnichar* start = aStr.BeginReading();
  const PRUnichar* end = aStr.EndReading();

Please document that you're not using FindCharInSet because null is one of the chars you want to search for.

> +  if (doc->IsHTML() && aInnerHTML.Length() < 100 && !ContainsMarkup(aInnerHTML)) {

Why do we need the IsHTML() test there?  Please document the reason, because it's very non-obvious to me why that test is needed.

Also, document that we limit to short strings to avoid ContainsMarkup taking too long.  And if there was a principled way to pick "100" document that, else document that the choice was pretty much gut feeling.

>+    aError = nsContentUtils::SetNodeTextContent(target, aInnerHTML, false);

This could also be:

  target->FragmentOrElement::SetTextContentInternal(aInnerHTML, aError);

(fully qualified to avoid the virtual dispatch).  Not sure which is uglier. The non-ugly version would be:

  target->SetTextContent(aInnerHTML, aError);

but that has an extra virtual call...  I guess explicit SetNodeTextContent is ok for now.

r=me with the nits.
Attachment #825258 - Flags: review?(bzbarsky) → review+
Attached patch Patch v2 (obsolete) — Splinter Review
I added the doc->IsHtml() test just to be conservative but if you also don't see a good reason for it we may as well remove it. This patch does that and addresses your other nits. Requesting review just to make sure you're okay with this change.
Attachment #825258 - Attachment is obsolete: true
Attachment #830832 - Flags: review?(bzbarsky)
Comment on attachment 830832 [details] [diff] [review]
Patch v2

r=me
Attachment #830832 - Flags: review?(bzbarsky) → review+
Hm this patch is failing this M5 test:

parser/htmlparser/tests/mochitest/test_html5_tree_construction_part2.html

https://tbpl.mozilla.org/?tree=Try&rev=955c00609173

Will investigate today or tomorrow..
Hmm.  I wonder whether there's weirdness with setting innerHTML on tbody or whatnot (does that imply the tr/td tags?).
Here's a minimal testcase. With my patch context.innerHTML is empty and the test alerts FAIL:

function f() {
    var context = document.createElementNS("http://www.w3.org/1999/xhtml", "html");
    context.innerHTML = "";
    var pass = (context.innerHTML == "<head></head><body></body>");
    alert(pass ? "PASS" : "FAIL");
}
f();

Boris, do you know what's going on?
Flags: needinfo?(bzbarsky)
Yes, I do, unfortunately.  <body> and <head> have optional start and end tags in HTML.

If you look at http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-parsing-algorithm step 4 substep 6 lands us in http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#reset-the-insertion-mode-appropriately which does various special-casing bits.  In the particular case of <html> we end up in the "before head" insertion mode, and in the "data state" (from step 4 substep 1 of the fragment parsing algorithm) in this case.  Then in the data state, when EOF is encountered we get an end-of-file token.

Looking at http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#the-before-head-insertion-mode an EOF falls into the "Anything else" clause, which will create a <head> element and switch to the "in head" insertion mode.

Then we land in http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inhead and fall into "Anything else" (still with our EOF token), which moves us to "after head" mode.

Then we go to http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#the-after-head-insertion-mode and again hit "Anything else" and create a <body> element move to the "in body" mode.

Anyway, long story short we probably need to not do this optimization in cases when the context element is anything that would lead to an insertion mode other than "in body" per http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#reset-the-insertion-mode-appropriately 

Unfortunately, there's quite a long list of things that have that behavior.  I wonder whether it's worth just using a boolean flag on nodeinfo (or on the context element, since we have better flag infrastructure there) to keep track of this stuff.

Or alternately to only fast-path innerHTML sets on a whitelist of elements that are commonly hit.  :(
Flags: needinfo?(hsivonen)
Flags: needinfo?(bzbarsky)
Flags: needinfo?(bugs)
(In reply to Boris Zbarsky [:bz] from comment #16)
> Unfortunately, there's quite a long list of things that have that behavior. 
> I wonder whether it's worth just using a boolean flag on nodeinfo (or on the
> context element, since we have better flag infrastructure there) to keep
> track of this stuff.

Makes sense for me. I forgot to test those. I just tested <pre> as a potential special case earlier.
Flags: needinfo?(hsivonen)
Attached patch Patch v3Splinter Review
(In reply to Boris Zbarsky [:bz] from comment #16)
> Anyway, long story short we probably need to not do this optimization in
> cases when the context element is anything that would lead to an insertion
> mode other than "in body" per
> http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.
> html#reset-the-insertion-mode-appropriately 

This patch sets the ElementHasWeirdParserInsertionMode flag on the elements mentioned there. This fixes the minimal testcase in comment 15.

For some elements like <td> this may be overzealous: the "in cell" insertion mode should only behave != "in body" for start/end tags (right?), and we filter these out already. Not setting the flag may be confusing or break other uses of it in the future. What do you think?

Also, where can I do this for the colgroup element? This was the only element I couldn't find :)
Attachment #830832 - Attachment is obsolete: true
Attachment #8348339 - Flags: review?(bzbarsky)
Comment on attachment 8348339 [details] [diff] [review]
Patch v3

You need to set the flag in HTMLTableColElement as well, at least for the colgroup case.

r=me with that fixed and a test added that would have caught this issue.
Attachment #8348339 - Flags: review?(bzbarsky) → review+
Flags: needinfo?(bugs)
(In reply to Boris Zbarsky [:bz] from comment #19)
> r=me with that fixed and a test added that would have caught this issue.

The following test caught it (see comment 13):

parser/htmlparser/tests/mochitest/test_html5_tree_construction_part2.html

Or should I add a mochitest for the testcase in comment 15 based on that one?
Flags: needinfo?(bzbarsky)
Added the call to HTMLTableColElement and pushed:

https://hg.mozilla.org/integration/mozilla-inbound/rev/86969d38372b

Green on Linux64/OS X now: https://tbpl.mozilla.org/?tree=Try&rev=27f2ebefc1ad
Flags: needinfo?(bzbarsky)
> The following test caught it (see comment 13):

No, I mean specifically for <colgroup> elements.  Or did that test catch the missing flag on those too?
Flags: needinfo?(jdemooij)
(In reply to Boris Zbarsky [:bz] from comment #22)
> No, I mean specifically for <colgroup> elements.  Or did that test catch the
> missing flag on those too?

Sorry I misunderstood your comment. Let me try to write a test for <colgroup>.
Ideally we'd have tests for all of these element types, actually.
Attached patch TestcaseSplinter Review
Because we filter out strings with "<" etc, most of these tests pass even if I comment out the HasWeirdParserInsertionMode() check, but some of them fail.
Attachment #8348989 - Flags: review?(bzbarsky)
Flags: needinfo?(jdemooij)
https://hg.mozilla.org/mozilla-central/rev/86969d38372b
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Comment on attachment 8348989 [details] [diff] [review]
Testcase

r=me
Attachment #8348989 - Flags: review?(bzbarsky) → review+
Whiteboard: [qa-]
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.