Closed Bug 922681 Opened 11 years ago Closed 11 years ago

Consider a SetInnerHTML fast-path for text with no markup

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla29

People

(Reporter: bzbarsky, Assigned: jandem)

References

Details

(Whiteboard: [qa-])

Attachments

(2 files, 2 obsolete files)

Patch 11 years ago Jan de Mooij [:jandem] 1.77 KB, patch	bzbarsky : review+ hsivonen : feedback+	Details \| Diff \| Splinter Review
Patch v2 11 years ago Jan de Mooij [:jandem] 1.91 KB, patch	bzbarsky : review+	Details \| Diff \| Splinter Review
Patch v3 11 years ago Jan de Mooij [:jandem] 10.72 KB, patch	bzbarsky : review+	Details \| Diff \| Splinter Review
Testcase 11 years ago Jan de Mooij [:jandem] 4.42 KB, patch	bzbarsky : review+	Details \| Diff \| Splinter Review

Boris Zbarsky [:bzbarsky]

Reporter

Description

•

11 years ago

See bug 922016 comment 2.  Basically, if the input string does not contain any of '<', '&', '\r', '\0', then it's plaintext and we don't have to spin up an HTML parser at all.

The thing is, checking for those chars itself takes time, so we may want to only do this for short strings.  Especially because the longer a string is the more likely it is to have markup in it.

I thought we had a bug on this already, but I don't see it...

Boris Zbarsky [:bzbarsky]

Reporter

Comment 1

•

11 years ago

Oh, Henri, where does the "depends on the element" bit come in?

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Comment 2

•

11 years ago

I was thinking of <pre>, but it looks like I was wrong.

Flags: needinfo?(hsivonen)

Jan de Mooij [:jandem]

Assignee

Comment 3

•

11 years ago

Attached patch Patch (obsolete) — Details — Splinter Review

This seems to work on some cases I tried. Please review carefully because I don't really understand everything that's going on in these methods.

Assignee: nobody → jdemooij

Status: NEW → ASSIGNED

Attachment #825258 - Flags: review?(bzbarsky)

Jan de Mooij [:jandem]

Assignee

Comment 4

•

11 years ago

Comment on attachment 825258 [details] [diff] [review]
Patch

Requesting feedback? from hsivonen as suggested by smaug :)

Attachment #825258 - Flags: feedback?(hsivonen)

Henri Sivonen (:hsivonen)

Updated

•

11 years ago

Attachment #825258 - Flags: feedback?(hsivonen) → feedback+

Jan de Mooij [:jandem]

Assignee

Comment 5

•

11 years ago

Some numbers for the testcase in bug 922016:

Safari: 1180 ms
FF new: 1250 ms
Chrome: 1550 ms
FF old: 1950 ms

So this wins about 35% and we're faster than Chrome and not that much slower than Safari. We can win more by optimizing other parts of that benchmark like the |clear.style.clear = "both"|.

Olli Pettay [:smaug][bugs@pettay.fi](vacation until July 14)

Comment 6

•

11 years ago

How much does the patch slow down the cases when there is markup.
http://mozilla.pettay.fi/moztests/innerhtml_nonsimple_nostyle_test.html for example, though
might be good to write a worst case test too.

Jan de Mooij [:jandem]

Assignee

Comment 7

•

11 years ago

(In reply to Olli Pettay [:smaug] from comment #6)
> How much does the patch slow down the cases when there is markup.
> http://mozilla.pettay.fi/moztests/innerhtml_nonsimple_nostyle_test.html for
> example,

I don't see much of a difference: 12.2-12.5 ms both with and without the patch. It makes sense though because the first character is < so we will stop scanning immediately.

> though might be good to write a worst case test too.

A worst-case test would look like the one below: string with length < 100 and near the end a markup character. Unfortunately the numbers are very noisy but if I look at the lowest number in 10 runs it's probably 3-4% slower or so. But again, it's hard to say much due to the high variance :(

<script>
function test() {
    var el = document.getElementById("foo");
    var t = new Date;
    for (var i=0; i<200000; i++)
        el.innerHTML = "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456&";
    el.innerHTML = (new Date - t) + "ms";
}
window.onload = test;
</script>
<div id="foo"></div>

Jan de Mooij [:jandem]

Assignee

Comment 8

•

11 years ago

bz: review ping, in case you forgot about this.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 9

•

11 years ago

I didn't; I'm just still somewhat backlogged.

Should get to this today.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 10

•

11 years ago

Comment on attachment 825258 [details] [diff] [review]
Patch

>+  nsAString::const_iterator start, end;
>+  aStr.BeginReading(start);
>+  aStr.EndReading(end);

Clearer to just do:

  const PRUnichar* start = aStr.BeginReading();
  const PRUnichar* end = aStr.EndReading();

Please document that you're not using FindCharInSet because null is one of the chars you want to search for.

> +  if (doc->IsHTML() && aInnerHTML.Length() < 100 && !ContainsMarkup(aInnerHTML)) {

Why do we need the IsHTML() test there?  Please document the reason, because it's very non-obvious to me why that test is needed.

Also, document that we limit to short strings to avoid ContainsMarkup taking too long.  And if there was a principled way to pick "100" document that, else document that the choice was pretty much gut feeling.

>+    aError = nsContentUtils::SetNodeTextContent(target, aInnerHTML, false);

This could also be:

  target->FragmentOrElement::SetTextContentInternal(aInnerHTML, aError);

(fully qualified to avoid the virtual dispatch).  Not sure which is uglier. The non-ugly version would be:

  target->SetTextContent(aInnerHTML, aError);

but that has an extra virtual call...  I guess explicit SetNodeTextContent is ok for now.

r=me with the nits.

Attachment #825258 - Flags: review?(bzbarsky) → review+

Jan de Mooij [:jandem]

Assignee

Comment 11

•

11 years ago

Attached patch Patch v2 (obsolete) — Details — Splinter Review

I added the doc->IsHtml() test just to be conservative but if you also don't see a good reason for it we may as well remove it. This patch does that and addresses your other nits. Requesting review just to make sure you're okay with this change.

Attachment #825258 - Attachment is obsolete: true

Attachment #830832 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 12

•

11 years ago

Comment on attachment 830832 [details] [diff] [review]
Patch v2

r=me

Attachment #830832 - Flags: review?(bzbarsky) → review+

Jan de Mooij [:jandem]

Assignee

Comment 13

•

11 years ago

Hm this patch is failing this M5 test:

parser/htmlparser/tests/mochitest/test_html5_tree_construction_part2.html

https://tbpl.mozilla.org/?tree=Try&rev=955c00609173

Will investigate today or tomorrow..

Boris Zbarsky [:bzbarsky]

Reporter

Comment 14

•

11 years ago

Hmm.  I wonder whether there's weirdness with setting innerHTML on tbody or whatnot (does that imply the tr/td tags?).

Jan de Mooij [:jandem]

Assignee

Comment 15

•

11 years ago

Here's a minimal testcase. With my patch context.innerHTML is empty and the test alerts FAIL:

function f() {
    var context = document.createElementNS("http://www.w3.org/1999/xhtml", "html");
    context.innerHTML = "";
    var pass = (context.innerHTML == "<head></head><body></body>");
    alert(pass ? "PASS" : "FAIL");
}
f();

Boris, do you know what's going on?

Flags: needinfo?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 16

•

11 years ago

Yes, I do, unfortunately.  <body> and <head> have optional start and end tags in HTML.

If you look at http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-parsing-algorithm step 4 substep 6 lands us in http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#reset-the-insertion-mode-appropriately which does various special-casing bits.  In the particular case of <html> we end up in the "before head" insertion mode, and in the "data state" (from step 4 substep 1 of the fragment parsing algorithm) in this case.  Then in the data state, when EOF is encountered we get an end-of-file token.

Looking at http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#the-before-head-insertion-mode an EOF falls into the "Anything else" clause, which will create a <head> element and switch to the "in head" insertion mode.

Then we land in http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inhead and fall into "Anything else" (still with our EOF token), which moves us to "after head" mode.

Then we go to http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#the-after-head-insertion-mode and again hit "Anything else" and create a <body> element move to the "in body" mode.

Anyway, long story short we probably need to not do this optimization in cases when the context element is anything that would lead to an insertion mode other than "in body" per http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#reset-the-insertion-mode-appropriately 

Unfortunately, there's quite a long list of things that have that behavior.  I wonder whether it's worth just using a boolean flag on nodeinfo (or on the context element, since we have better flag infrastructure there) to keep track of this stuff.

Or alternately to only fast-path innerHTML sets on a whitelist of elements that are commonly hit.  :(

Flags: needinfo?(hsivonen)

Flags: needinfo?(bzbarsky)

Flags: needinfo?(bugs)

Henri Sivonen (:hsivonen)

Comment 17

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #16)
> Unfortunately, there's quite a long list of things that have that behavior. 
> I wonder whether it's worth just using a boolean flag on nodeinfo (or on the
> context element, since we have better flag infrastructure there) to keep
> track of this stuff.

Makes sense for me. I forgot to test those. I just tested <pre> as a potential special case earlier.

Flags: needinfo?(hsivonen)

Jan de Mooij [:jandem]

Assignee

Comment 18

•

11 years ago

Attached patch Patch v3 — Details — Splinter Review

(In reply to Boris Zbarsky [:bz] from comment #16)
> Anyway, long story short we probably need to not do this optimization in
> cases when the context element is anything that would lead to an insertion
> mode other than "in body" per
> http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.
> html#reset-the-insertion-mode-appropriately 

This patch sets the ElementHasWeirdParserInsertionMode flag on the elements mentioned there. This fixes the minimal testcase in comment 15.

For some elements like <td> this may be overzealous: the "in cell" insertion mode should only behave != "in body" for start/end tags (right?), and we filter these out already. Not setting the flag may be confusing or break other uses of it in the future. What do you think?

Also, where can I do this for the colgroup element? This was the only element I couldn't find :)

Attachment #830832 - Attachment is obsolete: true

Attachment #8348339 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 19

•

11 years ago

Comment on attachment 8348339 [details] [diff] [review]
Patch v3

You need to set the flag in HTMLTableColElement as well, at least for the colgroup case.

r=me with that fixed and a test added that would have caught this issue.

Attachment #8348339 - Flags: review?(bzbarsky) → review+

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

11 years ago

Flags: needinfo?(bugs)

Jan de Mooij [:jandem]

Assignee

Comment 20

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #19)
> r=me with that fixed and a test added that would have caught this issue.

The following test caught it (see comment 13):

parser/htmlparser/tests/mochitest/test_html5_tree_construction_part2.html

Or should I add a mochitest for the testcase in comment 15 based on that one?

Flags: needinfo?(bzbarsky)

Jan de Mooij [:jandem]

Assignee

Comment 21

•

11 years ago

Added the call to HTMLTableColElement and pushed:

https://hg.mozilla.org/integration/mozilla-inbound/rev/86969d38372b

Green on Linux64/OS X now: https://tbpl.mozilla.org/?tree=Try&rev=27f2ebefc1ad

Flags: needinfo?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 22

•

11 years ago

> The following test caught it (see comment 13):

No, I mean specifically for <colgroup> elements.  Or did that test catch the missing flag on those too?

Flags: needinfo?(jdemooij)

Jan de Mooij [:jandem]

Assignee

Comment 23

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #22)
> No, I mean specifically for <colgroup> elements.  Or did that test catch the
> missing flag on those too?

Sorry I misunderstood your comment. Let me try to write a test for <colgroup>.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 24

•

11 years ago

Ideally we'd have tests for all of these element types, actually.

Jan de Mooij [:jandem]

Assignee

Comment 25

•

11 years ago

Attached patch Testcase — Details — Splinter Review

Because we filter out strings with "<" etc, most of these tests pass even if I comment out the HasWeirdParserInsertionMode() check, but some of them fail.

Attachment #8348989 - Flags: review?(bzbarsky)

Flags: needinfo?(jdemooij)

Ryan VanderMeulen [:RyanVM]

Comment 26

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/86969d38372b

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla29

Boris Zbarsky [:bzbarsky]

Reporter

Comment 27

•

11 years ago

Comment on attachment 8348989 [details] [diff] [review]
Testcase

r=me

Attachment #8348989 - Flags: review?(bzbarsky) → review+

Jan de Mooij [:jandem]

Assignee

Comment 28

•

11 years ago

Tests:

https://hg.mozilla.org/integration/mozilla-inbound/rev/2ca8e48896c4

Carsten Book [:Tomcat]

Comment 29

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/2ca8e48896c4

Ioana (away)

Updated

•

11 years ago

Whiteboard: [qa-]

Nobody; OK to take it and work on it

Updated

•

5 years ago

Component: DOM → DOM: Core & HTML

You need to log in before you can comment on or make changes to this bug.