Closed Bug 386769 Opened 17 years ago Closed 16 years ago

Make setting innerHTML faster by caching innerHTML's parser and content sink

Categories

(Core :: DOM: Core & HTML, defect, P1)

All
Windows Vista
defect

Tracking

()

RESOLVED FIXED
mozilla1.9beta4

People

(Reporter: mtschrep, Assigned: bent.mozilla)

References

(Blocks 1 open bug, )

Details

(Keywords: perf)

Attachments

(5 files, 2 obsolete files)

While running the uBenchmarks at the URL above I get the following results:

IE7: DOM speed 511ms
Minefield(2007070304): DOM speed 482ms
Safari 3 beta: DOM speed 54ms

The DOM speed benchmark is just a tight loop of:

document.getElementById('speed_dom').innerHTML = "Testing... ("+i+"/1000)"; 
faster in Safari

Entire test is below. 

function DOMSpeed() {
    var startTime = new Date() ;
    for( var i = 0; i <= 1000; i++ )
    {
		document.getElementById('speed_dom').innerHTML = "Testing... ("+i+"/1000)";
    }
    var endTime = new Date();
    document.getElementById('speed_dom').innerHTML = (endTime - startTime);
    document.getElementById('speed_total').innerHTML = eval(document.getElementById('speed_total').innerHTML) + (endTime - startTime);
}
I suspect this isn't getElementById performance but rather performance of mutating the DOM, in this case by setting .innerHTML.

Mutating the DOM causes us to reflow the document, it seems like safari is much better in this testcase at batching those reflows.
I can attach a profile if people care, but basic breakdown is:

  80% total time spent under nsGenericHTMLElementTearoff::SetInnerHTML
  0.7% total time spent under nsHTMLDocument::GetElementById
   3% total time spent in security checks.

The rest is XPConnect/DOM/JS overhead, basically.

As far as the 80% above goes, with all percentages listed as percentages of total time on this testcase throughout the rest of this comment:

 21% clearing the old kids of the div
 27% parsing the new kids
 29% appending the new kids

Within those

 7% constructing frames, etc when content is removed
 4% tearing down frames, etc when content is removed

Both of those numbers include all the reflow-related stuff here, which is limited to reflow events; reflow is async, so it never happens during the timing of this testcase.

More interesting:

 23% firing and handling of mutation events under RemoveChildAt and
     InsertBefore.  Why are we even firing those?

Oh, and under parsing, about a third of the time is spent creating the parser, etc.  The other two thirds are spent under nsParser::ParseFragment, with a fifth of that being scanner creation, and the rest tokenization, content sink stuff, CNavDTD, and all that mess.

So the real issues here, as far as I can tell, are that actually parsing stuff is kinda expensive and for some reason we're firing mutation events.  At least in both my Firefox and Seamonkey local builds...
Oh, and just to make sure we're on the same page, content GetElementByID code is 0.7% of this testcase, as I said.  There's some overhead on top of that, of course, but the real time is not spent there.  This bug could use a better summary.

Of course this site could use better tests... ;)
Depends on: 386802
I filed bug 386802 on the mutation event issue.  See further analysis there.
Keywords: meta, perf
Summary: document.getElementById performance → Make setting innerHTML faster
Results vary a lot but after fixing bug 386802, uBenchmarks DOM speed dropped from
~270 to 160 on this machine.
Another test http://stevenlevithan.com/demo/replaceHtml.html
gives 200x better speed for IE6

Also tested on another fater PC, with IE7 and Firefox.
IE7 was way faster than Firefox


Result
======

Firefox:--
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a8pre) Gecko/2007091405 Minefield/3.0a8pre

100 elements...
innerHTML (destroy only): 56ms
innerHTML (create only): 10ms
innerHTML (destroy & create): 74ms
replaceHtml (destroy only): 12ms (4.7x faster)
replaceHtml (create only): 17ms (1.7x slower)
replaceHtml (destroy & create): 19ms (3.9x faster)
Done.

1000 elements...
innerHTML (destroy only): 769ms
innerHTML (create only): 100ms
innerHTML (destroy & create): 1415ms
replaceHtml (destroy only): 23ms (33.4x faster)
replaceHtml (create only): 145ms (1.4x slower)
replaceHtml (destroy & create): 147ms (9.6x faster)
Done.

2500 elements...
innerHTML (destroy only): 3441ms
innerHTML (create only): 506ms
innerHTML (destroy & create): 3928ms
replaceHtml (destroy only): 53ms (64.9x faster)
replaceHtml (create only): 388ms (1.3x faster)
replaceHtml (destroy & create): 349ms (11.3x faster)
Done.


IE 6

100 elements...
innerHTML (destroy only): 0ms
innerHTML (create only): 0ms
innerHTML (destroy & create): 0ms
replaceHtml (destroy only): 0ms (~ same speed)
replaceHtml (create only): 0ms (~ same speed)
replaceHtml (destroy & create): 0ms (~ same speed)
Done. 

1000 elements...
innerHTML (destroy only): 0ms
innerHTML (create only): 16ms
innerHTML (destroy & create): 31ms
replaceHtml (destroy only): 0ms (~ same speed)
replaceHtml (create only): 15ms (~ same speed)
replaceHtml (destroy & create): 31ms (~ same speed)
Done. 

2500 elements...
innerHTML (destroy only): 16ms
innerHTML (create only): 63ms
innerHTML (destroy & create): 78ms
replaceHtml (destroy only): 16ms (~ same speed)
replaceHtml (create only): 62ms (~ same speed)
replaceHtml (destroy & create): 78ms (~ same speed)
Done. 
Safari 3.0.0.4 is down to 6 or 7 on the testcase - Minefield is at 78 or so.  Cane we profile again to see if there is anything else left to hit?
Flags: blocking1.9+
Priority: -- → P2
Attached file Up-to-date-ish profile
This is from a Nov 12 build, but I doubt much has changed since.  The testcase was the "standalone testcase" I attached, with the loop size increased to 10,000.

Executive summary:

Total hit count: 137032
Hits under nsGenericHTMLElement::SetInnerHTML: 90697
Hits under nsHTMLDocument::GetElementById: 1142
Hits under nsWindowSH::GetProperty: 5606
Hits under nsHTMLDocumentSH::NewResolve: 4627
Hits under IsWrapperSameOrigin (XOW): 5666 (most from nsWindowSH::GetProperty)
Hits under XPCCallContext::XPCCallContext: 3626 (some under IsWrapperSameOrigin)

Breaking down the SetInnerHTML part (which is less than 70% of the profile anyway):

  51661 nsContentUtils::CreateContextualFragment
  20265 nsHTMLDivElement::AppendChild
  12211 nsContentUtils::SetNodeTextContent
   2519 mozAutoSubtreeModified::~mozAutoSubtreeModified()

The SetNodeTextContent is clearing out the old content.  Time here is about 10% HasMutationListeners, 80% PresShell::ContentRemoved, 10% other gunk.

The AppendChild is putting in the new kid.  About 75% of the time here is the ContentAppended notification, but there's a lot of other stuff happening too...

The CreateContextualFragment is about 60% nsParser::ParseFragment, a whole bunch of QI and parser creation, some nsCOMPtr::~nsCOMPtr (destroying the parser, I bet), building the tag stack (1811 hits under nsHTMLDivElement::GetNodeName), etc.

There's string stuff, but I think sicking's recent change changed that a lot so worth recheckign with it.

Under ParseFragment, there's the usual content model construction gunk.

So things to jump out at me:
* CreateContextualFragment uses nsIDOM* APIs when perhaps it should use our
  internal ones.  Thing like GetParentNode() are just not called for here!
* Lots of time is spent on xpconnect/js/security checks, as usual
* It might maybe be worth recycling HTML parsers instead of just
  allocating/deallocating them a lot.
* It might be nice if the parser parsed directly into the target node instead
  of into a document fragment as it does now...

Most of that is small peanuts, though, and doesn't get us to where we should be.  Over here Opera is about 8x faster than we are on that testcase (so about equivalent to Safari).  :(
I wonder.  If in the testcase you add:

  document.body.offsetWidth;

right after the innerHTML set, how does that affect Safari?  It slows us down by a factor of 2 or so (not unexpected), but seems to have next to no effect on Opera....
No difference on Safari with that change...
So it sounds like this is 30% security checks, this is a known problem and affects anything DOM related :( Though recently I think jst and brendan has been talking about a fix that might help.

Another thing that seems to be biting us are the notifications going out. I probably made this part worse by a good bit when I simplified the fragment-insertion stuff :(.

Hmm.. now that we no longer send mutation-events for aNotify=false, wouldn't it be trivial to simply insert the whole fragment in one swoop, notify once, and then send mutation-events if needed for all inserted nodes? There should be no way that script can execute while we're inserting the nodes, right?

FWIW, i fixed nsContentUtils::CreateContextualFragment to use nsINode APIs in bug 403549, but I doubt that makes a noticeable difference.
It's about 5-6% security checks.  Not sure where the 30% number came from in comment 12.

In this case the fragment only has one child node, so I doubt that your change affected notifications for this testcase much.  If we had a whole bunch of HTML in that innerHTML then it would matter, of course.

I do think we've mostly killed off the BindToTree execution of script, so I believe you're correct: we could insert, notify once, and then fire mutation events.

I'll apply the patch for bug 403549 and reprofile.  But I agree that it probably doesn't help that much.
(In reply to comment #13)
> It's about 5-6% security checks.  Not sure where the 30% number came from in
> comment 12.
> 

Even at 30% there is still a factor of 5-6 in perf loss here v.s. Safari or Opera.  Where's the rest of the time going?   
Results on new testcase:

Trunk (11/15/07):
innerHTML only: 84
innerHTML + style: 149
innerHTML concat: 158
innerHTML copy from other div: 189

Safari 3.0.0.4:
innerHTML only: 6
innerHTML + style: 35
innerHTML concat: 466
innerHTML copy from other div: 105

Opera 9.5b1
innerHTML only: 32
innerHTML + style: 54
innerHTML concat: 101
innerHTML copy from other div: 56

I factored out getElementByID from the main loop.  The only difference in these four tests is the in that loop:

1)      
elem.innerHTML = "Testing... ("+i+"/1000)";
2) 
elem.style.top = i + 'px';
elem.style.left = i + 'px';
elem.innerHTML = "Testing... ("+i+"/1000)";
3)
elem.innerHTML = elem.innerHTML + i;
4)
elem.innerHTML = "Testing... ("+i+"/1000)";		
elem2.innerHTML = elem.innerHTML;

Is Safari doing something to special case the first test?  As soon as you touch style attributes the difference drops to 3-5x (still bad).  If you append things get much closer.
Attached file innerHTML_speed.html
i feel attachment 288941 [details]
(Testing a couple of different variants for innerHTML mutations)
is not testing it properly, as it is only assigning text content to .innerHTML
So I modified it to assign HTML 
and found IE wins!!!
We've spent a fair amount of time aggressively optimizing setting of style.top and style.left....  

Seriously, our implementation of innerHTML is pretty heavyweight: create a new HTML parser, run stuff through it, build a DOM inside a document fragment, insert the DocumentFragment.  Frankly, for typical innerHTML use out there (outside of synthetic benchmarks) that's pretty much fine from what I've seen; I have yet to see a bug report with a real site that's being slow because of innerHTML (whereas .style.* stuff is all over).
> We've spent a fair amount of time aggressively optimizing setting of style.top

I should have said "repeated setting".  We do as little work as possible at set time and put it all off until later, basically.
Besides synthetic benchmarks, Ajax libraries do set innerHTML in preference to constructing DOM nodes, from what I've read. Maybe jresig has data. It would be great to have data on frequence of innerHTML sets, with simple text vs. markup in the string broken down.

/be
jQuery uses innerHTML really aggressively: it's used as a means of HTML String to DOM Node serialization (which we then use to inject into a document at a random location). We've felt the speed effects of innerHTML significantly (although, to be fair, Internet Explorer is usually much worse in this respect). So, yes, any speed improvements to innerHTML will absolutely be felt (improving the speed of virtually all DOM manipulation methods).
Depends on: 404385
Depends on: 404386
(In reply to comment #20)
> respect). So, yes, any speed improvements to innerHTML will absolutely be felt
> (improving the speed of virtually all DOM manipulation methods).

You can consider replaceHtml method mentioned at
http://stevenlevithan.com/demo/replaceHtml.html

Or if you want to retain original node, 
I have alternate method that is almost fast

function replaceHtml (el, html) {
    var oldEl = typeof el === "string" ? document.getElementById(el) : el;
    /*@cc_on // Pure innerHTML is slightly faster in IE
        oldEl.innerHTML = html;
        return oldEl;
    @*/

    var next = oldEl.nextSibling;
    var parent = oldEl.parentNode;
    parent.removeChild(oldEl);
    oldEl.innerHTML = html;
    parent.insertBefore(oldEl,next);
    return oldEl;
};
This patch caches the parser and content sink per document as we discussed.
Assignee: jonas → bent.mozilla
Status: NEW → ASSIGNED
Attachment #299043 - Flags: review?(jst)
Don't you need to add that parser to cycle collection?  Do parsers even participate in cycle collection right now?

Also, what if GetCachedParser() returned an already_AddRefed and returned forget() on the nsCOMPtr (and documented all this in the interface)?  That way you wouldn't have to manually set the cached parser to null: as soon as you get it, you own it.
(In reply to comment #23)
> Don't you need to add that parser to cycle collection?
It should be. Fragmentcontentsink owns the document and parser
keeps fragmentcontentsink alive and document keeps parser alive...

Btw, patches are easier to review when created with at least -8 context.
And I suggest using XPCOM_MEM_LEAK_LOG when hacking things like this.
Comment on attachment 299043 [details] [diff] [review]
Cache innerHTML's parser and content sink

- In nsIDocument.h:

+  virtual nsIParser* GetCachedParser() {
+    return nsnull;
+  }

I like bz's idea about making this return an already_AddRefed<nsIParser> (and making the HTML doc one use forget() on the nsCOMPtr.

Also, I'd suggest renaming this to maybe GetFragmentParser(), as this parser is specifically for fragment parsing alone, not to be confused with mParser.

- In nsContentUtils::CreateContextualFragment():

+  // See if the document has a cached parser. nsHTMLDocument is the only one

... has a cached fragment parser.

- In nsIParser.h:

+    virtual void Reset() {
+      // Do nothing.
+    }

Could this be pure virtual so you don't need an empty implementation here?

- In nsParser::Finalize():

This could get confusing in the future MMgc world where objects have finalizers. Maybe name this Cleanup() instead?

- In nsParser.h:

-    virtual ~nsParser();
+    virtual ~nsParser() { Finalize(); }

I'd leave this in the cpp file, since it's virtual it won't ever be inlined anyways.

And yeah, mCachedParser (or mFragmentParser) needs to be added to cycle collection, just like mParser is.

With that fixed, this looks good.
Attachment #299043 - Flags: review?(jst) → review-
Updated to fix all of jst's comments. The cycle collector stuff went a little deeper than I would have liked, but it doesn't leak anything anymore.
Attachment #299043 - Attachment is obsolete: true
Attachment #300140 - Flags: review?(jst)
You don't need to add nsExpatDriver to cycle collection, like you added CNavDTD?
Comment on attachment 300140 [details] [diff] [review]
Cache innerHTML's parser and content sink, v2

r=jst
Attachment #300140 - Flags: review?(jst) → review+
Comment on attachment 300140 [details] [diff] [review]
Cache innerHTML's parser and content sink, v2

Add nsExpatDriver, like bz said, which means you also need to traverse mTokenizer in the CParserContexts.
Attachment #300140 - Flags: review?(peterv) → superreview+
Ok, added that.
Attachment #300140 - Attachment is obsolete: true
Attachment #300456 - Flags: superreview?
Attachment #300456 - Flags: superreview? → superreview?(peterv)
Priority: P2 → P1
Target Milestone: --- → mozilla1.9beta3
-> b4
Target Milestone: mozilla1.9beta3 → mozilla1.9beta4
Attachment #300456 - Flags: superreview?(peterv) → superreview+
Fixed.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Is this really "fixed" in the "Now faster than Safari" sense?  I sort of doubt it.

For performance bugs it's really a good idea to do the work in dependencies filed on particular optimizations so that the main bug can continue to be used as a tracking bug... :(
I can move the patch into another bug and reopen this if you'd like?
Probably better to get a clean tracking bug filed at this point... this bug is about 15 comments past that point.  :(
Depends on: 416487
No longer depends on: 416487
Depends on: 420835
No longer depends on: 386802, 404385, 404386, 413629
Keywords: meta
Summary: Make setting innerHTML faster → Make setting innerHTML faster by caching innerHTML's parser and content sink
Blocks: 442348
Depends on: 460437
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: