Closed Bug 700981 Opened 13 years ago Closed 13 years ago

appendChild has become slower when appending a very deep tree

Categories

(Core :: DOM: Core & HTML, defect)

10 Branch
x86_64
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla14

People

(Reporter: vflash, Assigned: bzbarsky)

References

(Depends on 1 open bug)

Details

(Keywords: perf, regression, testcase)

Attachments

(8 files, 1 obsolete file)

Attached file append_child.html
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20100101 Firefox/8.0
Build ID: 20111102223350

Steps to reproduce:

сравнил тест в FF8 и FF10 
http://vflash.ru/bug/append_child.html


Actual results:

в FF10 "testB" стал работать медленнее 


Expected results:

остаться прежним или стать быстрее
Can you describe the problem in English please?
Attachment #573152 - Attachment mime type: text/plain → text/html
Translation of STR in comment 0:

Steps to reproduce:

Compare the test in Firefox 8 and Firefox 10.
http://vflash.ru/bug/append_child.html


Actual results:

In Firefox 10 "testB" has become slower

Expected results:

Remain unchanged or become faster
Reproduced (Output of testB, 10 measurements in each version of Firefox with a new, clean profile):

Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20100101 Firefox/8.0
40
39
40
39
40
40
39
39
40
40



Mozilla/5.0 (X11; Linux x86_64; rv:11.0a1) Gecko/20111109 Firefox/11.0a1
43
43
43
44
43
44
44
44
44
43
Keywords: regression
OS: Windows 7 → All
Intel Pentium Dual Core E2160  1.8GHz

Firefox/8.0
testA - 11mc, testB - 39mc
testA - 13mc, testB - 41mc
testA - 12mc, testB - 42mc
testA - 13mc, testB - 42mc
testA - 16mc, testB - 40mc
testA - 13mc, testB - 41mc

Firefox/11.0a1
testA - 13mc, testB - 83mc
testA - 14mc, testB - 71mc
testA - 14mc, testB - 76mc
testA - 14mc, testB - 78mc
testA - 14mc, testB - 72mc
testA - 12mc, testB - 84mc


Opera 11 trololo
testA - 20mc, testB - 9mc
testA - 21mc, testB - 9mc
testA - 22mc, testB - 9mc
Component: General → DOM: Core & HTML
Product: Firefox → Core
QA Contact: general → general
Summary: appendChild стал медленнее → appendChild has become slower
I'll profile this.

(sysprof on linux didn't give enough information so Shark is needed.)
Bah, Shark is somehow broken on my old macbook.
So testB is building a DOM nested 1100 deep, right?  And it's doing it in the stupid O(N^2) way, which is not the way anyone does it in practice.  Do we really care about it?

In any case, a profile on trunk shows:

29.9% self time in nsGenericElement::BindToTree
17.2% self time in nsStyledElementNotElementCSSInlineStyle::BindToTree
 9.5% self time in nsGenericHTMLElement::BindToTree
 6.7% self time in nsMappedAttributeElement::BindToTree
 4.8% self time in nsStyledElementNotElementCSSInlineStyle::ReparseStyleAttribute (mostly
      call overhead)
 4.7% in nsNodeUtils::ParentChainChanged (again mostly call overhead).
 4.5% in nsGenericHTMLElement::UpdateEditableState (again mostly call overhead)
 2.7% in nsGenericElement::GetBindingParent (all call overhead, but this one is virtual).
 
That adds up to about 80% of total time.  The rest is a long tail of JS stuff, element creation, etc.  The total time under createElement is about 9%; another 4% is in appendChild outside BindToTree (so outside all the gunk above).  So that accounts for 93% of total time.
Summary: appendChild has become slower → appendChild has become slower when appending a very deep tree
Well, it would be good to know what has caused the regression.
BindToTree is certainly a bit slower now that it does NS_ADDREF(aParent)
I should note that's all on Mac.  On Windows, I would have thought that PGO would deal with some of the call overhead issues.

In any case, we can probably inline ParentChainChanged (it has pretty few callers).  I wonder whether we should inline all of nsNodeUtils, actually.

As far as ReparseStyleAttribute goes, we could inline the MayHaveStyle() check.  I wish compilers would do that.  :(  But more to the point, should we be calling it in BindToTree if !aDocument?

UpdateEditableState is virtual, so again not much we can do there.

Looking into where self time is spent in the first two items above.
Status: UNCONFIRMED → NEW
Ever confirmed: true
> Well, it would be good to know what has caused the regression.

If someone wants to find a regression range on nightlies, I can probably pinpoint that.
On this machine I can't actually reproduce the regression.
Even Gecko/20110724 Firefox/8.0a1 behaves like trunk.
(That old build doesn't have strong parent pointer)
Another interesting note:  If I double the tree depth, the performance of a current nightly is on par with fx8.  If I double it again, it's somewhat better than fx8.

In any case, inlining ParentChainChanged and the fast path in ReparseStyleAttribute makes trunk as fast as fx8 on the attached testcase and a good bit faster on the 2x bigger depth.
Looking at nsGenericElement::BindToTree, 19% of its self time is saving registers and copying arguments to registers on function entry, 23% is the setup of the recursive call to BindToTree and testing of the return value, and 14% is register/stack twiddling to return from the function.

This is on 64-bit; on 32-bit things might be different of course.

For nsGenericHTMLElement::BindToTree, entry into the function is 20.5% of self time, setting up call up to the superclass is 34.1% of self time, exiting the function is 38.5% of self time.

For nsMappedAttributeElement those numbers are 13.6%, 38.5%, 37.8% respectively.

For nsStyledElementNotElementCSSInlineStyle looks like 15.6%, 26%, and 14% respectively.

I really wonder what this looks like on Windows with PGO...

If Windows is not inlining this stuff, then can we get away with reducing the nesting level here?  For example, nsMappedAttributeElement::BindToTree runs code that could safely run for all elements imo.  And if we inlined nsAttrAndChildArray::SetMappedAttrStyleSheet it would be pretty darned fast for all elements.
OK, filed bug 718559 on a part of this I don't want to deal with now.  Patches coming up that speed things up a good bit (> 20% on my testcase, > 50% on the second part of the original testcase).
Assignee: nobody → bzbarsky
Depends on: 718559
Attachment #589102 - Flags: review?(jonas)
Attachment #589103 - Flags: review?(jonas)
Attachment #589104 - Flags: review?(jonas)
Attachment #589105 - Flags: review?(jonas)
Whiteboard: [need review]
Attachment #589247 - Flags: review?(jonas)
Attachment #589104 - Attachment is obsolete: true
Attachment #589104 - Flags: review?(jonas)
Comment on attachment 589101 [details] [diff] [review]
part 1.  Get rid of nsMappedAttributeElement::BindToTree and inline some of the things it used to call so they're faster.

Review of attachment 589101 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/public/nsIDocument.h
@@ +649,5 @@
>    /**
>     * Get this document's attribute stylesheet.  May return null if
>     * there isn't one.
>     */
> +  virtual nsHTMLStyleSheet* GetAttributeStyleSheet() const {

Did you mean for this to be non-virtual?
Attachment #589101 - Flags: review?(jonas) → review+
Comment on attachment 589102 [details] [diff] [review]
part 2.  Get rid of nsStyledElement::BindToTree/UnbindFromTree.

Crap, I think I have a patch sitting around that does something like this. Let me dig it up and compare :(
> Did you mean for this to be non-virtual?

Yes!  I'll fix.
Attachment #589102 - Flags: review?(jonas) → review+
Comment on attachment 589103 [details] [diff] [review]
part 3.  Inline nsNodeUtils::ParentChainChanged.


>+  static inline void ParentChainChanged(nsIContent *aContent) {
{ should be in the next line.
Attachment #589103 - Flags: review?(jonas) → review+
Comment on attachment 589105 [details] [diff] [review]
part 5.  Reduce the amount of time spent calling GetBindingParent().

We should also make GetBindingParent() faster.
Attachment #589105 - Flags: review?(jonas) → review+
Attachment #589247 - Flags: review?(jonas) → review+
> { should be in the next line.

Done.

> We should also make GetBindingParent() faster.

Since the storage is different for XUL and non-XUL it's virtual and all that jazz.  :(
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: