Last Comment Bug 744830 - Implement fast serializer for innerHTML/outerHTML
: Implement fast serializer for innerHTML/outerHTML
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: DOM: Core & HTML (show other bugs)
: 12 Branch
: x86_64 Linux
: -- normal (vote)
: ---
Assigned To: Olli Pettay [:smaug] (vacation Aug 25-28)
:
Mentors:
http://www.whatwg.org/specs/web-apps/...
Depends on: 770867 599983 788444 789081 813441 986913
Blocks: 578141
  Show dependency treegraph
 
Reported: 2012-04-12 10:21 PDT by Olli Pettay [:smaug] (vacation Aug 25-28)
Modified: 2016-04-22 11:57 PDT (History)
12 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Implement fast serializer for innerHTML/outerHTML. (12.06 KB, patch)
2012-04-19 09:12 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
Implement fast serializer for innerHTML/outerHTML. (12.98 KB, patch)
2012-04-19 12:51 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
a stringbuilder (4.24 KB, patch)
2012-05-01 14:12 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
Implement fast serializer for innerHTML/outerHTML. (14.72 KB, patch)
2012-05-01 14:34 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
current patch. newline handling per spec (17.45 KB, patch)
2012-05-25 03:53 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
+missing null check (17.47 KB, patch)
2012-05-25 05:49 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
better to not leak (17.50 KB, patch)
2012-05-25 07:09 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
without newline changes (18.29 KB, patch)
2012-05-25 11:37 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
josh: feedback+
Details | Diff | Splinter Review
without ignoreLevel (19.14 KB, patch)
2012-05-28 02:20 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
no flags Details | Diff | Splinter Review
patch (19.16 KB, patch)
2012-05-28 06:25 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
hsivonen: review+
Details | Diff | Splinter Review
v7 (24.86 KB, patch)
2012-05-31 10:22 PDT, Olli Pettay [:smaug] (vacation Aug 25-28)
hsivonen: review+
Details | Diff | Splinter Review

Description Olli Pettay [:smaug] (vacation Aug 25-28) 2012-04-12 10:21:05 PDT
Currently we use the slow and generic serializer. We could do better.
Comment 1 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-04-12 10:22:41 PDT
http://mxr.mozilla.org/mozilla-central/source/content/html/content/src/nsGenericHTMLElement.cpp#670
should do something better than use document encoder.
Comment 2 Henri Sivonen (:hsivonen) 2012-04-12 23:26:58 PDT
I suggest doing the following:

Don't have a heap-allocated serializer object. Instead, use a method with a signature like
static nsresult Serialize(nsINode* aRoot, bool aDescendantsOnly, nsAString& aOut)
where the return value signals OOM (fallible string appends should probably be used in order to avoid OOM crashes when a page gets the innerHTML of the root of a huge document), aRoot is the root of the subtree to serialize, aDescendantsOnly indicates whether tags for the aRoot node itself should be included and aOut is the resulting serialization.

Don't recurse. Instead, use the iterative algorithm seen in https://hg.mozilla.org/projects/htmlparser/file/5b026546075d/src/nu/validator/htmlparser/dom/Dom2Sax.java#l57 .

However, instead of calling out to other methods, just inline the stuff that's calls out to contentHandler or lexicalHandler in the Dom2Sax code above. When the Java code above is used for serialization, both contentHander and lexicalHandler point to an instance of https://hg.mozilla.org/projects/htmlparser/file/5b026546075d/src/nu/validator/htmlparser/sax/HtmlSerializer.java
Comment 3 Boris Zbarsky [:bz] 2012-04-12 23:31:04 PDT
Another option would be to change nsINode::GetNextNode() to make a callback (to an inline function template parameter) when stepping out of an element.  Not sure whether that would be better than just redoing the iterative walk or not.....
Comment 4 Henri Sivonen (:hsivonen) 2012-04-12 23:37:36 PDT
(In reply to Boris Zbarsky (:bz) from comment #3)
> Another option would be to change nsINode::GetNextNode() to make a callback
> (to an inline function template parameter) when stepping out of an element. 
> Not sure whether that would be better than just redoing the iterative walk
> or not.....

To me, that seems more complicated than letting the tree walk loop appear in the serialization method itself. (I didn't suggest GetNextNode, because it's not nice for writing end tags.)
Comment 5 Josh Matthews [:jdm] 2012-04-17 12:07:06 PDT
I'm playing around with implementing comment 2.
Comment 6 Josh Matthews [:jdm] 2012-04-19 09:12:06 PDT
Created attachment 616606 [details] [diff] [review]
Implement fast serializer for innerHTML/outerHTML.

WIP.
Comment 7 :Ms2ger (⌚ UTC+1/+2) 2012-04-19 09:25:50 PDT
Comment on attachment 616606 [details] [diff] [review]
Implement fast serializer for innerHTML/outerHTML.

Review of attachment 616606 [details] [diff] [review]:
-----------------------------------------------------------------

Some nits...

::: content/html/content/src/nsGenericHTMLElement.cpp
@@ +672,5 @@
> +  aBuf.BeginReading(start_iter);
> +  aBuf.EndReading(end_iter);
> +
> +  const PRUnichar* c = start_iter.get();
> +  const PRUnichar* end = end_iter.get();

const PRUnichar* c = aBuf.BeginReading();
const PRUnichar* end = aBuf.EndReading();

@@ +694,5 @@
> +      break;
> +    }
> +    c++;
> +  }
> +  aBuf = modified;

We probably only want to do this if anything changed.

@@ +738,5 @@
> +  return true;
> +}
> +
> +static nsresult
> +StartElement(const nsAString& aLocalName, nsIContent* aContent,

Element* aElement

Does this need to return nsresult, or is a bool sufficient?

@@ +763,5 @@
> +    PRInt32 attNs = name->NamespaceID();
> +    nsIAtom* attName = name->LocalName();
> +
> +    if (attNs == kNameSpaceID_None) {
> +      NS_ENSURE_TRUE(FallibleAppend(aOut, NS_LITERAL_STRING(" ")), NS_ERROR_OUT_OF_MEMORY);

Mm... A macro for NS_ENSURE_TRUE(FallibleAppend(foo, bar), NS_ERROR_OUT_OF_MEMORY);?

@@ +799,5 @@
> +  static const char* voidElements[] = {
> +    "area", "base", "basefont", "bgsound", "br", "col", "command", "embed", "frame",
> +    "hr", "img", "input", "keygen", "link", "meta", "param", "source", "track", "wbr"
> +  };
> +  for (PRUint32 i = 0; i < mozilla::ArrayLength(voidElements); i++) {

Drop "mozilla::"

@@ +800,5 @@
> +    "area", "base", "basefont", "bgsound", "br", "col", "command", "embed", "frame",
> +    "hr", "img", "input", "keygen", "link", "meta", "param", "source", "track", "wbr"
> +  };
> +  for (PRUint32 i = 0; i < mozilla::ArrayLength(voidElements); i++) {
> +    if (aLocalName.EqualsASCII(voidElements[i])) {

Atoms?

@@ +816,5 @@
> +  static const char* nonEscapingElements[] = {
> +    "iframe", "noembed", "noframes", "noscript", "plaintext", "script", "style", "xmp"
> +  };
> +  for (PRUint32 i = 0; i < mozilla::ArrayLength(nonEscapingElements); i++) {
> +    if (aLocalName.EqualsASCII(nonEscapingElements[i])) {

Same

@@ +841,5 @@
> +      case nsIDOMNode::ELEMENT_NODE: {
> +        nsString name = current->LocalName();
> +        if (name.IsEmpty())
> +          name = current->NodeName();
> +        nsresult rv = StartElement(name, static_cast<nsIContent*>(current),

current->AsElement()

@@ +857,5 @@
> +        break;
> +      }
> +
> +      case nsIDOMNode::COMMENT_NODE: {
> +        const nsTextFragment* text =static_cast<nsIContent*>(current)->GetText();

' = '

@@ +868,5 @@
> +        break;
> +      }
> +
> +      case nsIDOMNode::DOCUMENT_NODE:
> +        buf = NS_LITERAL_STRING("<!DOCTYPE html>");

Eh?

@@ +876,5 @@
> +    NS_ENSURE_SUCCESS(FallibleAppend(aOut, buf), NS_ERROR_OUT_OF_MEMORY);
> +
> +    nsIContent* next;
> +    if ((next = current->GetFirstChild())) {
> +      current = static_cast<nsINode*>(next);

No cast

@@ +883,5 @@
> +
> +    while (true) {
> +      if (current->NodeType() == nsIDOMNode::ELEMENT_NODE) {
> +        if (escapeLevel > 0)
> +          escapeLevel--;

{}

@@ +902,5 @@
> +        return NS_OK;
> +      }
> +
> +      if ((next = current->GetNextSibling())) {
> +        current = static_cast<nsINode*>(next);

No need for the cast
Comment 8 Boris Zbarsky [:bz] 2012-04-19 09:34:09 PDT
> Atoms?

We should file a followup to consider moving that boolean into nodeinfo.
Comment 9 :Ehsan Akhgari 2012-04-19 11:09:34 PDT
This patch needs to handle moz-dirty based on the work being done in bug 599983.
Comment 10 Josh Matthews [:jdm] 2012-04-19 11:38:17 PDT
Smaug says this is a good testcase: https://bug-81214-attachments.webkit.org/attachment.cgi?id=132034
Comment 11 :Ehsan Akhgari 2012-04-19 12:22:27 PDT
(In reply to Josh Matthews [:jdm] from comment #10)
> Smaug says this is a good testcase:
> https://bug-81214-attachments.webkit.org/attachment.cgi?id=132034

(Note to everybody, this will hang your browser for a while)
Comment 12 Josh Matthews [:jdm] 2012-04-19 12:51:40 PDT
Created attachment 616711 [details] [diff] [review]
Implement fast serializer for innerHTML/outerHTML.
Comment 13 :Ms2ger (⌚ UTC+1/+2) 2012-04-19 12:58:32 PDT
Comment on attachment 616711 [details] [diff] [review]
Implement fast serializer for innerHTML/outerHTML.

Review of attachment 616711 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/html/content/src/nsGenericHTMLElement.cpp
@@ +743,5 @@
> +
> +#define APPEND_OR_FAIL(a, b) NS_ENSURE_TRUE(FallibleAppend(a, b), false)
> +
> +static bool
> +StartElement(nsIAtom* aLocalName, Element* aContent,

Why pass aLocalName separately?

@@ +782,5 @@
> +    // Filter out special case of <br type="_moz"> or <br _moz*>,
> +    // used by the editor.  Bug 16988.  Yuck.
> +    //
> +    PRInt32 tagNS = aContent->GetNameSpaceID();
> +    if (aLocalName == nsGkAtoms::br && tagNS == kNameSpaceID_XHTML &&

aContent->IsHTML(nsGkAtoms::br)

@@ +941,5 @@
>  nsGenericHTMLElement::GetMarkup(bool aIncludeSelf, nsAString& aMarkup)
>  {
>    aMarkup.Truncate();
>  
>    nsIDocument* doc = OwnerDoc();

Move this below the if
Comment 14 Josh Matthews [:jdm] 2012-04-19 13:07:26 PDT
Builds should be available for profiling at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/josh@joshmatthews.net-9d168c7b6001 in a few hours. I fixed up the problems in the last try run, so I would expect all tests to be passing with this build.
Comment 15 Mozilla RelEng Bot 2012-04-19 13:17:17 PDT
Try run for 4cbda5817a8c is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=4cbda5817a8c
Results (out of 15 total builds):
    exception: 14
    failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/josh@joshmatthews.net-4cbda5817a8c
Comment 16 Mozilla RelEng Bot 2012-04-19 14:45:23 PDT
Try run for ccc14e9dc24d is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=ccc14e9dc24d
Results (out of 296 total builds):
    exception: 20
    success: 139
    warnings: 81
    failure: 56
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/josh@joshmatthews.net-ccc14e9dc24d
Comment 17 Mozilla RelEng Bot 2012-04-19 22:31:15 PDT
Try run for 9d168c7b6001 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=9d168c7b6001
Results (out of 288 total builds):
    success: 246
    warnings: 40
    failure: 2
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/josh@joshmatthews.net-9d168c7b6001
Comment 18 Henri Sivonen (:hsivonen) 2012-04-20 00:28:41 PDT
Comment on attachment 616711 [details] [diff] [review]
Implement fast serializer for innerHTML/outerHTML.

Thanks for working on this! Some comments:

>+static bool
>+StartElement(nsIAtom* aLocalName, Element* aContent,
>+             PRUint32& aEscapeLevel, PRUint32& aIgnoreLevel,
>+             nsAString& aOut)
>+{
>+  if (aEscapeLevel > 0) {
>+    aEscapeLevel++;
>+  }

Hmm. Looks like the spec now only considers the parent--not ancestors for deciding when not to escape. See http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-serialization-algorithm

>+    if (attNs == kNameSpaceID_None) {
>+      APPEND_OR_FAIL(aOut, NS_LITERAL_STRING(" "));
>+    } else if (!aContent->IsHTML() && attNs == kNameSpaceID_XLink) {
>+      APPEND_OR_FAIL(aOut, NS_LITERAL_STRING(" xlink:"));

The spec says to do this even if the element is an HTML element.

>+    } else if (attNs == kNameSpaceID_XML) {
>+      if (aContent->IsHTML()) {
>+        if (attName == nsGkAtoms::lang) {
>+          APPEND_OR_FAIL(aOut, NS_LITERAL_STRING(" "));

This lang stuff is not needed per spec. It's again for differing use cases of the Java code.

>+        } else {
>+          continue;

Again, the spec says to do this even if the element is an HTML element.

>+        }

Additionally, the xmlns cases need to be handled here. (The Java code doesn't handle them, because Java SAX doesn't represent namespace declarations as attributes.)

>+      } else {
>+        APPEND_OR_FAIL(aOut, NS_LITERAL_STRING(" xml:"));
>+      }
>+    } else {
>+      continue;

The spec says to use the attributes qualified name in the anything else case.

>+  static const nsIAtom* voidElements[] = {
>+    nsGkAtoms::area, nsGkAtoms::base, nsGkAtoms::basefont, nsGkAtoms::bgsound, nsGkAtoms::br, nsGkAtoms::cols,
>+    nsGkAtoms::command, nsGkAtoms::embed, nsGkAtoms::frame, nsGkAtoms::hr, nsGkAtoms::img, nsGkAtoms::input,
>+    nsGkAtoms::keygen, nsGkAtoms::link, nsGkAtoms::meta, nsGkAtoms::param, nsGkAtoms::source, nsGkAtoms::track,
>+    nsGkAtoms::wbr
>+  };
>+  for (PRUint32 i = 0; i < ArrayLength(voidElements); i++) {
>+    if (aLocalName == voidElements[i]) {
>+      aIgnoreLevel++;
>+      return true;
>+    }
>+  }

Instead of having an ignore level integer here and having the caller method walk the descendants of void elements, you could inline the start tag handling into the tree walking loop and make the tree walking skip the next sibling when the current node is a void element.

>+  /*if (aLocalName == nsGkAtoms::pre || aLocalName == nsGkAtoms::textarea ||
>+      aLocalName == nsGkAtoms::listing) {
>+    APPEND_OR_FAIL(aOut, NS_LITERAL_STRING("\n"));
>+    }*/

This needs to check the element namespace, too.

>+  static const nsIAtom* nonEscapingElements[] = {
>+    nsGkAtoms::iframe, nsGkAtoms::noembed, nsGkAtoms::noframes, nsGkAtoms::noscript, nsGkAtoms::plaintext,
>+    nsGkAtoms::script, nsGkAtoms::style, nsGkAtoms::xmp
>+  };
>+  for (PRUint32 i = 0; i < ArrayLength(nonEscapingElements); i++) {
>+    if (aLocalName == nonEscapingElements[i]) {
>+      aEscapeLevel++;
>+    }
>+  }

Per spec, escaping only considers the parent of the text node, so there's no need to maintain an escape level integer.

>+      case nsIDOMNode::COMMENT_NODE: {
>+        const nsTextFragment* text = static_cast<nsIContent*>(current)->GetText();
>+        text->AppendTo(buf);
>+        if (escapeLevel == 0 && ignoreLevel == 0) {
>+          buf = NS_LITERAL_STRING("<!--") + buf + NS_LITERAL_STRING("-->");
>+        } else {
>+          buf = EmptyString();

The spec serializes comments unconditionally even if the result doesn't round trip.

>+      case nsIDOMNode::DOCUMENT_NODE:
>+        buf = NS_LITERAL_STRING("<!DOCTYPE html>");
>+        break;

Again, the Java code intentionally deviates from the spec here due to different use cases and different APIs. In Gecko, instead of having a case for the document node, there should be a spec-compliant case for doctype nodes.

Curiously, the spec requires handling processing instruction nodes, even though they don't round trip.

>+          nsString name = NS_LITERAL_STRING("</");
>+          nsString nodeName = current->LocalName();

For elements that aren't in the HTML, SVG or MathML namespaces, the spec requires using the qualified name as the tag name. (I'm slightly uncomfortable with using the qualified name when the qualified name is the same as the local name, since the effects will be surprising when round-tripping, but that's what the spec says currently.)
Comment 19 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-01 14:12:49 PDT
Created attachment 620058 [details] [diff] [review]
a stringbuilder

I was just testing something based on a stringbuilder, but still couldn't get
the speed of the current serializer.
Comment 20 Josh Matthews [:jdm] 2012-05-01 14:34:22 PDT
Created attachment 620064 [details] [diff] [review]
Implement fast serializer for innerHTML/outerHTML.

This is my most recent work from before I left on my travels. I optimized the 'always append everything' codepath fairly well based on profiles; as I recall, the majority of the work seemed to be in SetCapacity and the actual DOM grubbing last time I checked. I don't recall how the speed stacked up against the current serializer. I'm happy to let somebody run with this, as I'm focusing on per-window pb in my limited free time.
Comment 21 :Ms2ger (⌚ UTC+1/+2) 2012-05-02 12:22:41 PDT
Comment on attachment 620064 [details] [diff] [review]
Implement fast serializer for innerHTML/outerHTML.

Review of attachment 620064 [details] [diff] [review]:
-----------------------------------------------------------------

A couple of nits for when you get back

::: content/html/content/src/nsGenericHTMLElement.cpp
@@ +681,5 @@
> +  if (!found)
> +    return;
> +
> +  nsAutoString modified;
> +  modified.SetCapacity(aBuf.Length() + 5 * found);

I would like a constant for the 5 (MAX_ENTITY_LENGTH?)

@@ +747,5 @@
> +
> +static inline bool
> +FallibleAppend(nsAString& aBuf, const nsAString& aExtra)
> +{
> +  if(NS_UNLIKELY(!aBuf.SetCapacity(aBuf.Length() + aExtra.Length())))

'if (', and {}s

@@ +766,5 @@
> +  APPEND_OR_FAIL(aOut, nsDependentAtomString(localName));
> +  //toAppend.AppendElement(NS_LITERAL_STRING("<"));
> +  //toAppend.AppendElement(nsDependentAtomString(localName));
> +
> +  PRInt32 count = aContent->GetAttrCount();

PRUint32

@@ +787,5 @@
> +    // 
> +    // Filter out special case of <br type="_moz"> or <br _moz*>,
> +    // used by the editor.  Bug 16988.  Yuck.
> +    //
> +    PRInt32 tagNS = aContent->GetNameSpaceID();

Why is this in the loop?

@@ +885,5 @@
> +Serialize(nsINode* aRoot, bool aDescendentsOnly, nsAString& aOut)
> +{
> +  nsINode* current = aDescendentsOnly ? aRoot->GetFirstChild() : aRoot;
> +  if (!current)
> +    return true;

{}

@@ +893,5 @@
> +  while (true) {
> +    nsAutoString buf;
> +    switch (current->NodeType()) {
> +      case nsIDOMNode::ELEMENT_NODE: {
> +        mozilla::dom::Element* elem = current->AsElement();

This is in content/, so drop the 'mozilla::dom::'

@@ +935,5 @@
> +        buf = NS_LITERAL_STRING("<?");
> +        buf.Append(current->NodeName());
> +        buf.Append(NS_LITERAL_STRING(">"));
> +        break;
> +      }

default case with MOZ_NOT_REACHED?

@@ +943,5 @@
> +      APPEND_OR_FAIL(aOut, buf);
> +    }
> +
> +    nsIContent* next;
> +    if ((next = current->GetFirstChild())) {

Either
	 
  if (nsIContent* next = current->GetFirstChild()) {

or

  nsIContent* next = current->GetFirstChild();
  if (next) {

@@ +955,5 @@
> +          ignoreLevel--;
> +        } else {
> +          nsString name = NS_LITERAL_STRING("</");
> +          nsString nodeName;
> +          mozilla::dom::Element* elem = current->AsElement();

Same

@@ +971,5 @@
> +      if (current == aRoot) {
> +        return true;
> +      }
> +
> +      if ((next = current->GetNextSibling())) {

Same
Comment 22 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-25 03:44:31 PDT
I can't figure out how the magical newline handling should work.
Per spec 
document.body.innerHTML = null;
document.body.appendChild(document.createElement("textarea"));
document.body.firstChild.appendChild(document.createTextNode("\nhello"));
console.log(document.body.innerHTML)
should report "<textarea>\n\nhello</textarea>"

But that is not what current browsers do. (tested N, C, O)
Comment 23 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-25 03:53:51 PDT
Created attachment 627166 [details] [diff] [review]
current patch. newline handling per spec

In the testcase this gives same or a bit better performance than trunk-C.

https://tbpl.mozilla.org/?tree=Try&rev=9911221e883d
Comment 24 :Aryeh Gregor (working until September 2) 2012-05-25 04:44:28 PDT
(In reply to Olli Pettay [:smaug] from comment #22)
> I can't figure out how the magical newline handling should work.
> Per spec 
> document.body.innerHTML = null;
> document.body.appendChild(document.createElement("textarea"));
> document.body.firstChild.appendChild(document.createTextNode("\nhello"));
> console.log(document.body.innerHTML)
> should report "<textarea>\n\nhello</textarea>"
> 
> But that is not what current browsers do. (tested N, C, O)

Current browsers are broken -- document.body.innerHTML = document.body.innerHTML will eat the newline in this case.  I.e., the DOM doesn't round-trip through innerHTML.  There's no reason this should happen here.  Same for <pre>.  Browsers should change to match the spec.

Also, <xmp> needs to have its contents not HTML-escaped, same as <style> and <script>.  The spec requires this and WebKit does it, but Gecko gets it wrong.  (This is obviously a corner case, but my editing tests hit it, FWIW.)

See bug 608322 and dependencies.
Comment 25 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-25 04:57:16 PDT
(In reply to Aryeh Gregor from comment #24)
> Current browsers are broken -- document.body.innerHTML =
> document.body.innerHTML will eat the newline in this case.  I.e., the DOM
> doesn't round-trip through innerHTML.  There's no reason this should happen
> here.  Same for <pre>.  Browsers should change to match the spec.
Ok, but then this is a *very* risky change, and need to happen at the beginning of a cycle.
Comment 26 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-25 05:49:51 PDT
Created attachment 627201 [details] [diff] [review]
+missing null check

https://tbpl.mozilla.org/?tree=Try&rev=27500fcb7375
Comment 27 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-25 07:09:13 PDT
Created attachment 627219 [details] [diff] [review]
better to not leak

https://tbpl.mozilla.org/?tree=Try&rev=e02eb979f605
Comment 28 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-25 08:22:09 PDT
Actually, I think we should not change newline handling in this bug.
There are enough changes and risk for regressions even without the change.
Comment 29 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-25 11:37:48 PDT
Created attachment 627305 [details] [diff] [review]
without newline changes

https://tbpl.mozilla.org/?tree=Try&rev=ddfe41520365
Comment 30 Josh Matthews [:jdm] 2012-05-25 12:39:57 PDT
Comment on attachment 627305 [details] [diff] [review]
without newline changes

Nothing sticks out at me as an obvious optimization, but the linked list made my brain explode. You could look into Henri's suggestion of the way to get rid of ignoreLevel.
Comment 31 Jonas Sicking (:sicking) No longer reading bugmail consistently 2012-05-25 16:26:13 PDT
Wouldn't simply writing to a string but using a doubling strategy for growing the string length result in significantly simpler code but with just as good performance?
Comment 32 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-27 13:21:13 PDT
(In reply to Jonas Sicking (:sicking) from comment #31)
> Wouldn't simply writing to a string but using a doubling strategy for
> growing the string length
That is what our string code does atm. 

> result in significantly simpler code but with just
> as good performance?
And no, it doesn't give good enough performance. At least I haven't managed to do so.
Comment 33 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-27 13:54:12 PDT
(In reply to Henri Sivonen (:hsivonen) from comment #18) 
> Instead of having an ignore level integer here and having the caller method
> walk the descendants of void elements, you could inline the start tag
> handling into the tree walking loop and make the tree walking skip the next
> sibling when the current node is a void element.
Hmm, actually I don't understand the spec.
"If current node is an area, base, basefont, bgsound, br, col, command, embed, frame, hr, img, input, keygen, link, meta, param, source, track or wbr element, then continue on to the next child node at this point."
The next child node of what? current node? or the node?
Comment 34 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-27 14:03:33 PDT
I wonder which behavior the spec defines here. At least not Gecko nor Webkit behavior.
Comment 35 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-27 14:04:52 PDT
Since Gecko creates <input><div></div>
if div is a child of an input element.
Webkit creates <input></input>
Comment 36 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-27 15:53:50 PDT
What should happen if DOM is <input><div></div></input> and one does
inputElement.innerHTML?
Comment 37 :Ms2ger (⌚ UTC+1/+2) 2012-05-28 01:50:45 PDT
(In reply to Olli Pettay [:smaug] from comment #36)
> What should happen if DOM is <input><div></div></input> and one does
> inputElement.innerHTML?

I would say '<input>'.
Comment 38 Jonas Sicking (:sicking) No longer reading bugmail consistently 2012-05-28 01:57:07 PDT
For a DOM like <input><div/></input> returning "<input>" for myInputElement.innerHTML wouldn't make sense. The question is if we should return "" or "<div></div>". (Or throw, but I think that would be a bad idea)
Comment 39 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-28 02:20:03 PDT
Created attachment 627640 [details] [diff] [review]
without ignoreLevel

https://tbpl.mozilla.org/?tree=Try&rev=dc27746c2c24
Comment 40 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-28 03:07:35 PDT
Comment on attachment 627640 [details] [diff] [review]
without ignoreLevel

Apparently I broke something with this.
Comment 41 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-28 06:25:43 PDT
Created attachment 627682 [details] [diff] [review]
patch
Comment 42 Henri Sivonen (:hsivonen) 2012-05-29 09:28:33 PDT
Comment on attachment 627682 [details] [diff] [review]
patch

>+  var PI = document.createProcessingInstruction('foo', 'bar="1.0"');
>+  t.appendChild(PI);
>+  is(t.innerHTML, '<span>hi</span> there <!-- mon ami --><?foo bar="1.0">',
>+    "pi nodes should be included");

Please add tests for:
 1) An SVG element
 2) A MathML element
 3) An element that's none of HTML, MathML or SVG and has a prefixed node name
 4) An element that's none of HTML, MathML or SVG and has a prefixless node name
 5) An xml:* attribute
 6) An xlink:* attribute
 7) An xmlns="foo" attribute
 8) An xmlns:foo="bar" attribute
 9) A namespaced attribute that's none of the above
10) A CDATA section.
11) HTML script and style with <, >, & and no-break space in the content
12) SVG script and style with <, >, & and no-break space in the content (different escaping from the HTML case)
13) MathML script and style with <, >, & and no-break space in the content (same as the SVG case)

>+#define STRING_BUFFER_UNITS 1020

Please add a comment that explains where this magic number comes from. The magic number looks like it's trying to avoid a clown-shoe allocation. If possible, please add a static assertion about sizeof(StringBuilder) to make sure that subsequent changes don't accidentally push the size of the object over a clown shoe boundary.

>+    enum Type
>+    {
>+      eUnknown,
>+      eAtom,
>+      eString,
>+      eStringWithEncode,
>+      eLiteral,
>+      eTextFragment,
>+      eTextFragmentWithEncode,
>+    };

There are plenty of cases where the unit ends up containing a single-character string literal. Would it be worthwhile to save a pointer dereference and have a unit type for a single character and passing e.g. '<' instead of "<" to Append()?

>+  template<int N>
>+  void Append(const char (&aLiteral)[N])

I haven't seen this template pattern before, but I trust that it does what it seems to do.

>+  void Append(const nsAString& aString)
>+  {
>+    Unit* u = AddUnit();
>+    u->mString = new nsAutoString(aString);

Using nsAutoString instead of nsString seems to be against the rules. However, using nsString doesn't make sense, because append to the final output string always copies the buffer of the string instead of adopting it. It seems to me that it would make sense to use PRUnichar[] instead of nsAutoString here.

>+  PRUint32                                mLength;

Please add a comment that explains that this field is only meaningful on the first StringBuffer object in the linked list. In general, it would be nice to have a comment that explains the linked list/unit array duality.

>+      if (attName == nsGkAtoms::xmlns) {
>+        aBuilder.Append(" xmlns");
...
>+    aBuilder.Append(attName);

The xmlns="foo" case looks wrong.

>+static void
>+AppendEncodedCharacters(const nsTextFragment* aText, StringBuilder& aBuilder)
...
>+static void
>+AppendEncodedAttributeValue(nsAutoString* aValue, StringBuilder& aBuilder)
>+      ++found;
>+    aBuilder.AppendWithAttrEncode(aValue, aValue->Length() + (5 * found));

Instead of getting an inaccurate result my multiplication here, I think it would make more sense to count the exact number of extra characters generated by each escape by incrementing |found| by 3 in the case of &lt; and &gt;, by 4 in the case of &amp; and by 5 in the case of &nbsp; and &quot;. Please also add a comment that explains why it's worthwhile to iterate over the strings twice in order to make a better-sized allocation for the final output string.

r=me with the above points addressed.
Comment 43 Nathan Froyd [:froydnj] 2012-05-29 10:15:33 PDT
Comment on attachment 627682 [details] [diff] [review]
patch

Review of attachment 627682 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/html/content/src/nsGenericHTMLElement.cpp
@@ +667,5 @@
> +      nsAutoString*         mString;
> +      const nsTextFragment* mTextFragment;
> +    };
> +    Type     mType;
> +    PRUint32 mLength;

Worth using mozilla/StandardInteger.h types here and other places?  I realize this doesn't match existing style of the file, but this is mostly new code anyway.

@@ +811,5 @@
> +    const PRUnichar* end = aValue.EndReading();
> +    while (c < end) {
> +      switch (*c) {
> +      case '"':
> +        aOut.Append(NS_LITERAL_STRING("&quot;"));

AppendLiteral("&quot;") here and other places is slightly clearer, IMHO.
Comment 44 Henri Sivonen (:hsivonen) 2012-05-29 11:32:02 PDT
(In reply to Nathan Froyd (:froydnj) from comment #43)
> > +        aOut.Append(NS_LITERAL_STRING("&quot;"));
> 
> AppendLiteral("&quot;") here and other places is slightly clearer, IMHO.

Oops. Yeah, consider that part of my review, too.
Comment 45 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-31 04:40:17 PDT
(In reply to Henri Sivonen (:hsivonen) from comment #42)
> >+  template<int N>
> >+  void Append(const char (&aLiteral)[N])
> 
> I haven't seen this template pattern before, but I trust that it does what
> it seems to do.
That is how AppendLiteral works in strings.


> >+  void Append(const nsAString& aString)
> >+  {
> >+    Unit* u = AddUnit();
> >+    u->mString = new nsAutoString(aString);
> 
> Using nsAutoString instead of nsString seems to be against the rules.
> However, using nsString doesn't make sense, because append to the final
> output string always copies the buffer of the string instead of adopting it.
> It seems to me that it would make sense to use PRUnichar[] instead of
> nsAutoString here.
Well, the whole idea to use nsAutoString here is that autostring has internal buffer
which is enough for common cases.
Append(const nsAString& aString) is quite rare case.
I don't see any good way to get PRUnichar[] attr values, and there shouldn't be need
for special case PI and DTD nodes.
Comment 46 Henri Sivonen (:hsivonen) 2012-05-31 08:03:46 PDT
(In reply to Olli Pettay [:smaug] from comment #45)
> Append(const nsAString& aString) is quite rare case.

OK.

> I don't see any good way to get PRUnichar[] attr values, and there shouldn't
> be need
> for special case PI and DTD nodes.

I see. Let's go with the code you already wrote, then.
Comment 47 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-31 10:14:24 PDT
(In reply to Henri Sivonen (:hsivonen) from comment #42)
> 
> There are plenty of cases where the unit ends up containing a
> single-character string literal. Would it be worthwhile to save a pointer
> dereference and have a unit type for a single character and passing e.g. '<'
> instead of "<" to Append()?
Not sure worth the slight complexity. Could do after profiling.
Comment 48 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-31 10:22:26 PDT
Created attachment 628790 [details] [diff] [review]
v7

https://tbpl.mozilla.org/?tree=Try&rev=8f58f13a9e00

couldn't figure out how to add a truly useful check for jemalloc bucket size, but
added a comment.
Comment 49 Henri Sivonen (:hsivonen) 2012-06-01 01:07:03 PDT
Comment on attachment 628790 [details] [diff] [review]
v7

Looks good. Thanks.

If you have benchmark numbers that compare the new and old serializers, it would probably be a good idea to ping the Hacks blog team about those numbers to get them into a "what's new" article.
Comment 50 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-06-01 03:58:51 PDT
https://hg.mozilla.org/mozilla-central/rev/7bf0125b26b5

I'll get the numbers from Nightlies.

I hope this change doesn't break too many websites.
Comment 51 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-06-02 08:46:04 PDT
(In reply to Josh Matthews [:jdm] (travelling until June 25th) from comment #10)
> Smaug says this is a good testcase:
> https://bug-81214-attachments.webkit.org/attachment.cgi?id=132034

Tested Nightly builds on an old Windows machine:
               a build w/o the patch    with the patch
div.innerHTML  :     2905                   1350                  
div.outerHTML  :     15513                  5101
body.innerHTML :     1119                   790
body.outerHTML :     1116                   758

Note You need to log in before you can comment on or make changes to this bug.