Closed Bug 562013 Opened 10 years ago Closed 10 years ago

Consider caching the documentFragment for innerHTML, or at least cache the nodeinfo, or parse directly to the destination node

Categories

(Core :: DOM: Core & HTML, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: smaug, Assigned: smaug)

References

(Blocks 1 open bug)

Details

Attachments

(8 files, 2 obsolete files)

This shows up in the profiles. Pretty high even with .innerHTML = "some simple string"

Caching documentFragment safely can be tricky.
Or actually, if there are no mutation event listeners, caching of document fragment should be doable.
The patch is still a bit ugly.
There shouldn't be any need to add the extra QI.

But even this seems to cut 5% of https://bugzilla.mozilla.org/attachment.cgi?id=289857 (which tests setting plain text to innerHTML)
With the patch testcase runs faster in Gecko than in trunk webkit.

https://bugzilla.mozilla.org/attachment.cgi?id=289857 is still faster in
Webkit, but it cheats and works against the spec.

The patch could be probably a bit simpler if html5 was enabled always.
Attachment #441851 - Flags: feedback?(jonas)
And we should eventually convert all the callers to use nsINode, never
nsIDOMXXX.
I'd say simplify and always create a docfragment when we're creating a document.

Though ideally I think we should parse directly into the target node, and avoid using a document fragment altogether. That should also save us a bunch of calls to BindToTree/UnbindFromTree.

Might be easier to do only for the HTML5 parser. (and fall back to a slower path when the html5 parser isn't enabled for now, until we kill the old parser)
If you do go for the parse-directly-to-the-destination path, then just fire mutation events manually after the parsing. The same way that the insert-doc-fragment code does.
Parsing directly to target node might be a good option for html5
but mutation event handling would be a bit ugly.
So for now using fragment could be ok.
Or Henri, what do you think about parsing directly to the destination?
(Btw, seems like fragment's mutation event handling doesn't dispatch
events in all the cases when it should :( )
Given that mutation events is something that should be phased out, I don't think we should architect too much of our code around their existence.
which again would be in favor of using document fragment.
I'd rather not write yet another case for dispatching mutation events.
Document fragment got it already wrong.
...which I'll fix.
(In reply to comment #9)
> Or Henri, what do you think about parsing directly to the destination?

I already considered it. See the thread starting at
http://lists.w3.org/Archives/Public/public-html/2008Nov/0192.html

However, I decided not to try anything new in order to avoid having to emulate the right mutation events. (The form pointer and <base> also came up, but I'm not sure if those would still be actual problems.)
I think what's needed is just making sure that we don't bind controls that aren't inside a <form> to the last <form>. Instead they should be bound to any ancestor <form> of the node which we're setting .innerHTML on.

As for the mutation events, they seem like a silly reason to not do this perf optimization. We can likely share the code with the fragment-insertion code (once that's fixed). Eventually that code should go away anyway.
<base> shouldn't be a problem any more now that it's self-managed and doesn't require any content sink support.
I'm not going to implement the parse-directly-to-the-destination in this bug.
hmm, or let me re-think. The current setup is nicer, but slower...
Depends on: 562652
This depends on bug 562652.
Needs still some testing.
Comment on attachment 442401 [details] [diff] [review]
Don't use document fragment at all with html5 parser

Henri, would this be all wrong?
Attachment #442401 - Flags: feedback?(hsivonen)
Yay!!

This doesn't deal with ensuring that orphaned <input>s end up in the correct <form>, does it? I.e. for markup like

<form id=A>
  <div id=target></div>
</form>

$("target").innerHTML = "<p><form id=B></p><input id=orphan>"
is($("orphan").form, $("A"))

should pass
Attached file form testcase (obsolete) —
HTML5 parser changes behavior in testcase anyway.
The patch in this bug keeps the existing html5 parser behavior.
Attachment #442448 - Attachment is obsolete: true
Oh, the <form> closes the <p> in the HTML5 parser. The test needs to be

<form id=A>
  <div id=target></div>
</form>

$("target").innerHTML = "<div><form id=B></div><input id=orphan>"
is($("orphan").form, $("A"))


Or possibly even

$("target").innerHTML = "<table><tr><td><form id=B></table><input id=orphan>"
Attached file This breaks
I have no idea how to fix this case. I'd rather not hack the parser.
Or, I don't understand  why the parser or something else does what it does.
Henri, could you perhaps tell how to do that? Or perhaps even provide the patch.
Comment on attachment 442401 [details] [diff] [review]
Don't use document fragment at all with html5 parser

>   // Remove childnodes
>   nsContentUtils::SetNodeTextContent(this, EmptyString(), PR_FALSE);
[...]
>+    PRInt32 oldChildCount = GetChildCount();

Isn't oldChildCount always zero here? Or more to the point, if it isn't zero, isn't that an at least assertion-worthy problem?

>   if (scripts_enabled) {
>     // If we disabled scripts, re-enable them now that we're
>     // done. Don't fire JS timeouts when enabling the context here.
> 
>     loader->SetEnabled(PR_TRUE);
>   }

Is this script disabling and re-enabling for <script> elements or also for something else? The HTML5 parser in the fragment mode already takes care of marking <script>s already executed.

feedback+ suggesting those points to be reconsidered.

(In reply to comment #28)
> Henri, could you perhaps tell how to do that? Or perhaps even provide the
> patch.

I can provide a patch if we first figure out if we want to make the parser not associate elements with a form ever in the fragment case (including createContextualFragment on ranges) or if we want to special-case the innerHTML setter only. Do we care about supporting the case where someone calls createContextualFragment with a malformed form and proceeds to submit the form directly from the fragment before inserting it elsewhere first?
Attachment #442401 - Flags: feedback?(hsivonen) → feedback+
> I can provide a patch if we first figure out if we want to make the parser not
> associate elements with a form ever in the fragment case (including
> innerHTML createContextualFragment on ranges) or if we want to special-case
> the setter only. Do we care about supporting the case where someone calls
> createContextualFragment with a malformed form and proceeds to submit the form
> directly from the fragment before inserting it elsewhere first?

I don't think it's terribly important if we special case .innerHTML, or if we do it for all fragment parsing. Simplicity seems like the more important goal. Until proven otherwise, I think it's safe to guess that the web doesn't depend on any particular behavior for createContextualFragment.

However all else being equal, I would say that the explicit form-binding thing is fairly evil, and we should limit it as much as possible.
(In reply to comment #29)
> Isn't oldChildCount always zero here? Or more to the point, if it isn't zero,
> isn't that an at least assertion-worthy problem?

Because of mutation events, oldChildCount may not be zero and IMO
there isn't anything to assert because we handle the case.

I don't change scriptloader handling in the patch, but if there is something
to change, I could file a followup bug.





> 
> >   if (scripts_enabled) {
> >     // If we disabled scripts, re-enable them now that we're
> >     // done. Don't fire JS timeouts when enabling the context here.
> > 
> >     loader->SetEnabled(PR_TRUE);
> >   }
> 
> Is this script disabling and re-enabling for <script> elements or also for
> something else? The HTML5 parser in the fragment mode already takes care of
> marking <script>s already executed.
> 
> feedback+ suggesting those points to be reconsidered.
> 
> (In reply to comment #28)
> > Henri, could you perhaps tell how to do that? Or perhaps even provide the
> > patch.
> 
> I can provide a patch if we first figure out if we want to make the parser not
> associate elements with a form ever in the fragment case (including
> createContextualFragment on ranges) or if we want to special-case the innerHTML
> setter only. Do we care about supporting the case where someone calls
> createContextualFragment with a malformed form and proceeds to submit the form
> directly from the fragment before inserting it elsewhere first?
(In reply to comment #30)
> > I can provide a patch if we first figure out if we want to make the parser not
> > associate elements with a form ever in the fragment case (including
> > innerHTML createContextualFragment on ranges) or if we want to special-case
> > the setter only. Do we care about supporting the case where someone calls
> > createContextualFragment with a malformed form and proceeds to submit the form
> > directly from the fragment before inserting it elsewhere first?
> 
> I don't think it's terribly important if we special case .innerHTML, or if we
> do it for all fragment parsing. Simplicity seems like the more important goal.
> Until proven otherwise, I think it's safe to guess that the web doesn't depend
> on any particular behavior for createContextualFragment.
> 
> However all else being equal, I would say that the explicit form-binding thing
> is fairly evil, and we should limit it as much as possible.

OK. I've attached a patch that makes the fragment case in the parser core not associate elements with the form pointer. (I'm assuming that we'll request this change in the spec, so I've put the change in the cross-language part of the parser instead of putting it into the Gecko integration layer.)

(In reply to comment #31)
> (In reply to comment #29)
> > Isn't oldChildCount always zero here? Or more to the point, if it isn't zero,
> > isn't that an at least assertion-worthy problem?
> 
> Because of mutation events, oldChildCount may not be zero and IMO
> there isn't anything to assert because we handle the case.

OK.

> I don't change scriptloader handling in the patch, but if there is something
> to change, I could file a followup bug.

I think it's worthwhile to investigate, as a follow-up, if the scriptloader part can be optimized away.
Attachment #443061 - Flags: review?(jonas)
The patch doesn't handle Document.innerHTML = "foo" case, since that case doesn't have a context node, but then this bug isn't about optimizing that case anyway.
There is document.innerHTML? Gecko doesn't support that.
I'll file a new bug for that.
Depends on: 563322
Attachment #442401 - Flags: review?(jonas)
Assignee: nobody → Olli.Pettay
Summary: Consider caching the documentFragment for innerHTML, or at least cache the nodeinfo → Consider caching the documentFragment for innerHTML, or at least cache the nodeinfo, or parse directly to the destination node
Attached patch both patches (obsolete) — Splinter Review
I'll push this once it has passed tests on tryserver
Attachment #443101 - Attachment is obsolete: true
http://hg.mozilla.org/mozilla-central/rev/d660bc26395e
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
This is great stuff -- will it show up on any of our existing benchmarks, or should we add something to track?
Dromaeo tests some innerHTML, though just activating html5 should have given
significant speed up there.
For what it's worth, mutation events should no longer be a concern for parsing directly to the destination. As long as a update batch spans the whole function, all mutation events will be delayed until that update batch is ended.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.