Last Comment Bug 700981 - appendChild has become slower when appending a very deep tree
: appendChild has become slower when appending a very deep tree
Status: RESOLVED FIXED
: perf, regression, testcase
Product: Core
Classification: Components
Component: DOM: Core & HTML (show other bugs)
: 10 Branch
: x86_64 All
: -- normal (vote)
: mozilla14
Assigned To: Boris Zbarsky [:bz]
:
Mentors:
Depends on: 718559
Blocks:
  Show dependency treegraph
 
Reported: 2011-11-09 03:11 PST by vflash
Modified: 2015-10-07 18:51 PDT (History)
8 users (show)
bzbarsky: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
append_child.html (1.28 KB, text/html)
2011-11-09 03:11 PST, vflash
no flags Details
Changes I made so far that make us faster than fx8 (4.76 KB, patch)
2011-11-09 14:39 PST, Boris Zbarsky [:bz]
no flags Details | Diff | Splinter Review
Another related testcase (654 bytes, text/html)
2012-01-16 19:23 PST, Boris Zbarsky [:bz]
no flags Details
part 1. Get rid of nsMappedAttributeElement::BindToTree and inline some of the things it used to call so they're faster. (11.86 KB, patch)
2012-01-16 21:39 PST, Boris Zbarsky [:bz]
jonas: review+
Details | Diff | Splinter Review
part 2. Get rid of nsStyledElement::BindToTree/UnbindFromTree. (7.11 KB, patch)
2012-01-16 21:39 PST, Boris Zbarsky [:bz]
bugs: review+
Details | Diff | Splinter Review
part 3. Inline nsNodeUtils::ParentChainChanged. (3.02 KB, patch)
2012-01-16 21:40 PST, Boris Zbarsky [:bz]
bugs: review+
Details | Diff | Splinter Review
part 4. Add a fast-path to IsAllowedAsChild for the case of a child that has no kids. (1.46 KB, patch)
2012-01-16 21:41 PST, Boris Zbarsky [:bz]
no flags Details | Diff | Splinter Review
part 5. Reduce the amount of time spent calling GetBindingParent(). (5.42 KB, patch)
2012-01-16 21:41 PST, Boris Zbarsky [:bz]
bugs: review+
Details | Diff | Splinter Review
part 4. Add a fast-path to IsAllowedAsChild for the case of a child that has no kids. (1.63 KB, patch)
2012-01-17 11:30 PST, Boris Zbarsky [:bz]
bugs: review+
Details | Diff | Splinter Review

Description vflash 2011-11-09 03:11:58 PST
Created attachment 573152 [details]
append_child.html

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20100101 Firefox/8.0
Build ID: 20111102223350

Steps to reproduce:

сравнил тест в FF8 и FF10 
http://vflash.ru/bug/append_child.html


Actual results:

в FF10 "testB" стал работать медленнее 


Expected results:

остаться прежним или стать быстрее
Comment 1 Mats Palmgren (vacation) 2011-11-09 03:58:30 PST
Can you describe the problem in English please?
Comment 2 Thomas Ahlblom 2011-11-09 04:50:40 PST
Translation of STR in comment 0:

Steps to reproduce:

Compare the test in Firefox 8 and Firefox 10.
http://vflash.ru/bug/append_child.html


Actual results:

In Firefox 10 "testB" has become slower

Expected results:

Remain unchanged or become faster
Comment 3 Thomas Ahlblom 2011-11-09 05:02:35 PST
Reproduced (Output of testB, 10 measurements in each version of Firefox with a new, clean profile):

Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20100101 Firefox/8.0
40
39
40
39
40
40
39
39
40
40



Mozilla/5.0 (X11; Linux x86_64; rv:11.0a1) Gecko/20111109 Firefox/11.0a1
43
43
43
44
43
44
44
44
44
43
Comment 4 vflash 2011-11-09 05:38:04 PST
Intel Pentium Dual Core E2160  1.8GHz

Firefox/8.0
testA - 11mc, testB - 39mc
testA - 13mc, testB - 41mc
testA - 12mc, testB - 42mc
testA - 13mc, testB - 42mc
testA - 16mc, testB - 40mc
testA - 13mc, testB - 41mc

Firefox/11.0a1
testA - 13mc, testB - 83mc
testA - 14mc, testB - 71mc
testA - 14mc, testB - 76mc
testA - 14mc, testB - 78mc
testA - 14mc, testB - 72mc
testA - 12mc, testB - 84mc


Opera 11 trololo
testA - 20mc, testB - 9mc
testA - 21mc, testB - 9mc
testA - 22mc, testB - 9mc
Comment 5 Olli Pettay [:smaug] 2011-11-09 11:02:39 PST
I'll profile this.

(sysprof on linux didn't give enough information so Shark is needed.)
Comment 6 Olli Pettay [:smaug] 2011-11-09 12:21:29 PST
Bah, Shark is somehow broken on my old macbook.
Comment 7 Boris Zbarsky [:bz] 2011-11-09 13:37:38 PST
So testB is building a DOM nested 1100 deep, right?  And it's doing it in the stupid O(N^2) way, which is not the way anyone does it in practice.  Do we really care about it?

In any case, a profile on trunk shows:

29.9% self time in nsGenericElement::BindToTree
17.2% self time in nsStyledElementNotElementCSSInlineStyle::BindToTree
 9.5% self time in nsGenericHTMLElement::BindToTree
 6.7% self time in nsMappedAttributeElement::BindToTree
 4.8% self time in nsStyledElementNotElementCSSInlineStyle::ReparseStyleAttribute (mostly
      call overhead)
 4.7% in nsNodeUtils::ParentChainChanged (again mostly call overhead).
 4.5% in nsGenericHTMLElement::UpdateEditableState (again mostly call overhead)
 2.7% in nsGenericElement::GetBindingParent (all call overhead, but this one is virtual).
 
That adds up to about 80% of total time.  The rest is a long tail of JS stuff, element creation, etc.  The total time under createElement is about 9%; another 4% is in appendChild outside BindToTree (so outside all the gunk above).  So that accounts for 93% of total time.
Comment 8 Olli Pettay [:smaug] 2011-11-09 13:47:19 PST
Well, it would be good to know what has caused the regression.
BindToTree is certainly a bit slower now that it does NS_ADDREF(aParent)
Comment 9 Boris Zbarsky [:bz] 2011-11-09 13:51:00 PST
I should note that's all on Mac.  On Windows, I would have thought that PGO would deal with some of the call overhead issues.

In any case, we can probably inline ParentChainChanged (it has pretty few callers).  I wonder whether we should inline all of nsNodeUtils, actually.

As far as ReparseStyleAttribute goes, we could inline the MayHaveStyle() check.  I wish compilers would do that.  :(  But more to the point, should we be calling it in BindToTree if !aDocument?

UpdateEditableState is virtual, so again not much we can do there.

Looking into where self time is spent in the first two items above.
Comment 10 Boris Zbarsky [:bz] 2011-11-09 13:51:38 PST
> Well, it would be good to know what has caused the regression.

If someone wants to find a regression range on nightlies, I can probably pinpoint that.
Comment 11 Olli Pettay [:smaug] 2011-11-09 14:08:04 PST
On this machine I can't actually reproduce the regression.
Even Gecko/20110724 Firefox/8.0a1 behaves like trunk.
(That old build doesn't have strong parent pointer)
Comment 12 Boris Zbarsky [:bz] 2011-11-09 14:13:45 PST
Another interesting note:  If I double the tree depth, the performance of a current nightly is on par with fx8.  If I double it again, it's somewhat better than fx8.

In any case, inlining ParentChainChanged and the fast path in ReparseStyleAttribute makes trunk as fast as fx8 on the attached testcase and a good bit faster on the 2x bigger depth.
Comment 13 Boris Zbarsky [:bz] 2011-11-09 14:38:28 PST
Looking at nsGenericElement::BindToTree, 19% of its self time is saving registers and copying arguments to registers on function entry, 23% is the setup of the recursive call to BindToTree and testing of the return value, and 14% is register/stack twiddling to return from the function.

This is on 64-bit; on 32-bit things might be different of course.

For nsGenericHTMLElement::BindToTree, entry into the function is 20.5% of self time, setting up call up to the superclass is 34.1% of self time, exiting the function is 38.5% of self time.

For nsMappedAttributeElement those numbers are 13.6%, 38.5%, 37.8% respectively.

For nsStyledElementNotElementCSSInlineStyle looks like 15.6%, 26%, and 14% respectively.

I really wonder what this looks like on Windows with PGO...

If Windows is not inlining this stuff, then can we get away with reducing the nesting level here?  For example, nsMappedAttributeElement::BindToTree runs code that could safely run for all elements imo.  And if we inlined nsAttrAndChildArray::SetMappedAttrStyleSheet it would be pretty darned fast for all elements.
Comment 14 Boris Zbarsky [:bz] 2011-11-09 14:39:19 PST
Created attachment 573332 [details] [diff] [review]
Changes I made so far that make us faster than fx8
Comment 15 Boris Zbarsky [:bz] 2012-01-16 19:23:28 PST
Created attachment 589087 [details]
Another related testcase
Comment 16 Boris Zbarsky [:bz] 2012-01-16 21:38:08 PST
OK, filed bug 718559 on a part of this I don't want to deal with now.  Patches coming up that speed things up a good bit (> 20% on my testcase, > 50% on the second part of the original testcase).
Comment 17 Boris Zbarsky [:bz] 2012-01-16 21:39:13 PST
Created attachment 589101 [details] [diff] [review]
part 1.  Get rid of nsMappedAttributeElement::BindToTree and inline some of the things it used to call so they're faster.
Comment 18 Boris Zbarsky [:bz] 2012-01-16 21:39:32 PST
Created attachment 589102 [details] [diff] [review]
part 2.  Get rid of nsStyledElement::BindToTree/UnbindFromTree.
Comment 19 Boris Zbarsky [:bz] 2012-01-16 21:40:52 PST
Created attachment 589103 [details] [diff] [review]
part 3.  Inline nsNodeUtils::ParentChainChanged.
Comment 20 Boris Zbarsky [:bz] 2012-01-16 21:41:09 PST
Created attachment 589104 [details] [diff] [review]
part 4.  Add a fast-path to IsAllowedAsChild for the case of a child that has no kids.
Comment 21 Boris Zbarsky [:bz] 2012-01-16 21:41:27 PST
Created attachment 589105 [details] [diff] [review]
part 5.  Reduce the amount of time spent calling GetBindingParent().
Comment 22 Boris Zbarsky [:bz] 2012-01-17 11:30:31 PST
Created attachment 589247 [details] [diff] [review]
part 4.  Add a fast-path to IsAllowedAsChild for the case of a child that has no kids.
Comment 23 Jonas Sicking (:sicking) PTO Until July 5th 2012-02-11 00:45:28 PST
Comment on attachment 589101 [details] [diff] [review]
part 1.  Get rid of nsMappedAttributeElement::BindToTree and inline some of the things it used to call so they're faster.

Review of attachment 589101 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/public/nsIDocument.h
@@ +649,5 @@
>    /**
>     * Get this document's attribute stylesheet.  May return null if
>     * there isn't one.
>     */
> +  virtual nsHTMLStyleSheet* GetAttributeStyleSheet() const {

Did you mean for this to be non-virtual?
Comment 24 Jonas Sicking (:sicking) PTO Until July 5th 2012-02-11 00:54:51 PST
Comment on attachment 589102 [details] [diff] [review]
part 2.  Get rid of nsStyledElement::BindToTree/UnbindFromTree.

Crap, I think I have a patch sitting around that does something like this. Let me dig it up and compare :(
Comment 25 Boris Zbarsky [:bz] 2012-02-11 05:58:07 PST
> Did you mean for this to be non-virtual?

Yes!  I'll fix.
Comment 26 Olli Pettay [:smaug] 2012-03-21 15:03:57 PDT
Comment on attachment 589103 [details] [diff] [review]
part 3.  Inline nsNodeUtils::ParentChainChanged.


>+  static inline void ParentChainChanged(nsIContent *aContent) {
{ should be in the next line.
Comment 27 Olli Pettay [:smaug] 2012-03-21 15:06:03 PDT
Comment on attachment 589105 [details] [diff] [review]
part 5.  Reduce the amount of time spent calling GetBindingParent().

We should also make GetBindingParent() faster.
Comment 28 Boris Zbarsky [:bz] 2012-03-21 21:02:25 PDT
> { should be in the next line.

Done.

> We should also make GetBindingParent() faster.

Since the storage is different for XUL and non-XUL it's virtual and all that jazz.  :(

Note You need to log in before you can comment on or make changes to this bug.