Closed
Bug 129187
Opened 22 years ago
Closed 22 years ago
49% of pageload spent in nsRuleNode::Transition on some pages
Categories
(Core :: CSS Parsing and Computation, defect)
Tracking
()
RESOLVED
FIXED
mozilla1.0
People
(Reporter: bzbarsky, Assigned: dbaron)
References
()
Details
(Keywords: perf)
Attachments
(2 files, 3 obsolete files)
817.92 KB,
text/html
|
Details | |
13.56 KB,
patch
|
brendan
:
superreview+
asa
:
approval+
|
Details | Diff | Splinter Review |
BUILD: 2002-03-04 STEPS TO REPRODUCE: 1) Load the page linked to in the URL field 2) Wait a long while
Reporter | ||
Comment 1•22 years ago
|
||
Assignee | ||
Comment 2•22 years ago
|
||
I have another bug on this (I think it's worded as a table perf bug), although I should probably use this one since the other one is a mess. We probably need to fix rule nodes so that they switch to a hashtable if they hit a certain child count. (They used to use hashtables for everything, but that was bloated and slow, so now only the root uses a hashtable.) There are certain cases where a non-root rule node can end up with a huge number of children. This particular case, though, would also be fixed if we mapped identical STYLE attributes to the same object.
Assignee | ||
Updated•22 years ago
|
Assignee | ||
Comment 3•22 years ago
|
||
This patch fixes the problem by changing the basis for the decision whether to use a linked list or a hashtable for storing a rule node's children. Previously (since hyatt deCOMified nsRuleNode), only the root rule node stored its children in a hashtable. This patch allows other rule nodes to store their children in a hashtable when they get above a certain number of children (currently set at 16, because that sounded like a good number to me). This can be done easily simply by using the low byte of the children pointer to store whether we're using a list or a hashtable. There is no need to store the count of the children because the only time we ever add children is in nsRuleNode::Transition, so we can *count* the children while traversing the linked list (if we're using a linked list -- and we never want to convert in the other direction). (Did I make an off-by-one error in my definition of the max -- I need to work through the logic there carefully. I think I probably mean >= rather than >. I should also probably add more comments.) Anyway, on the testcase from bug 54542 (this one taxed my patience), http://www.mozilla.org/newlayout/testcases/stress/wbtblclr.html , I did two tests without my patch and two with my patch (just using the time in the statusbar): before: 12.885 12.753 after: 5.565 5.380
Assignee | ||
Comment 4•22 years ago
|
||
It's worth mentioning that I tried to keep the masking code as close as possible to the code in nsXULAttributeValue.cpp, which is known to be portable.
Assignee | ||
Comment 5•22 years ago
|
||
The testcase in this bug goes from 111 seconds in a build I had lying around from a few days ago to 44 seconds in a somewhat newer build that has my patch.
Comment 6•22 years ago
|
||
Comment on attachment 74565 [details] [diff] [review] patch 16 is a little big for switching to hashing if you're using double hashing -- for good alpha, the table would be at least 32 entries in absolute capacity. But this code still uses nsHashtable, sigh. Any hope of PLDHashTable-ifying? >- nsRuleList* mChildren; >+ // A hashtable or list that maps from rules to our RuleNode children. >+ // When matching rules, we use this table to transition from node to >+ // node (constructing new nodes as needed to flesh out the tree). >+ >+ void *mChildrenMaskedPtr; The canonical term is Tagged, not Masked (the pointer as stored may have a tag bit set, so it may need masking before use -- but it's not masked as stored). >+ nsHashtable* ChildrenHash() { >+ return (nsHashtable*) (PRWord(mChildrenMaskedPtr) & ~PRWord(kTypeMask)); >+ } NS_REINTERPRET_CAST? >+ void SetChildrenHash(nsHashtable *aHashtable) { >+ mChildrenMaskedPtr = (void*)(PRWord(aHashtable) | kHashType); >+ } Ditto? >@@ -454,46 +462,57 @@ > { > nsRuleNode* next = nsnull; > >- if (!mParent) { >+ if (HaveChildren() && !ChildrenAreHashed()) { >+ PRInt32 numKids = 0; >+ nsRuleList* curr = ChildrenList(); >+ for ( ; curr && curr->mRuleNode->mRule != aRule; >+ curr = curr->mNext, ++numKids); While loop for better readability? Then the third part of the for loop becomes the body. >+ if (curr) >+ next = curr->mRuleNode; >+ else if (numKids > kMaxChildrenInList) Yes, use >=, or just == if you can never fail to switch to a hash table when crossing the threshold. /be
Comment 7•22 years ago
|
||
Forgot to ask: how are list nodes and nsHashtable allocated? The code in nsXULAttributeValue.cpp that you cite is not portable, it assumes malloc returns 0 mod 2 addresses. I believe shaver is fixing (has fixed? patch in progress) it to use nsFixedSizeAllocator, which uses an arena (arenas can ensure arbitrary alignment). JS is the mac-daddy in Mozilla of pointer-tagging, but again, its GC heap uses arenas to ensure sufficient alignment. /be
Assignee | ||
Comment 8•22 years ago
|
||
The nsHashtable objects are from |new|, the |nsRuleValue| are from the pres shell's arena (which uses PL_ARENA_CONST_ALIGN_MASK == 3). I thought the code waterson wrote that was causing problems was the nsXULElement code where he assumed that all pointers were 8-byte aligned. Are there really any allocators that don't produce 2-byte aligned pointers? Anyway, new patch coming, once I clean up a few other issues.
Comment 9•22 years ago
|
||
Oh, right -- it sounds like 4-byte alignment was observed where 8 was expected in bug 124335. You're probably ok -- any architecture that byte-aligns arbitrary memory that might contain a double at the front is not known to me, and not a port that Mozilla supports currently (JS_SetPrivate require its API users to keep private data on 0 mod 2 boundaries). Sorry for the false alarm. PLDHashTable would still be a win -- is jkeiser's C++ sugar for it ready for 1.0? /be
Assignee | ||
Comment 10•22 years ago
|
||
This patch fixes most of the comments and a few other minor things. However: * I didn't change the two old-style casts, which were why I made comment 4. * I'm still hoping that |new| is 2-byte aligned everywhere. (It certainly is on all platforms where mozilla *currently* runs.) * I'm more worried about time with the list vs. hashtable tradeoff than about space. I think rule nodes with greater than 16 children are likely to be quite rare. The big question is how slow an nsHashtable lookup is. * I might want to convert this to pldhash at some point in the future, but not as part of this patch. I'm already manually separating this patch from other changes in the same file (bug 131454).
Attachment #74565 -
Attachment is obsolete: true
Comment 11•22 years ago
|
||
Yes, and I won't have time to add C++ happiness to pldhash until post 1.0, too many other bugs.
Assignee | ||
Comment 12•22 years ago
|
||
After putting the following instrumentation in the destructor of nsRuleNode: PRInt32 count = 0; if (ChildrenAreHashed()) count = ChildrenHash()->Count(); else for (nsRuleList *curr = ChildrenList(); curr; curr = curr->mNext) ++count; printf("nsRuleNode(this=%p) has %ld children.\n", this, count); and piping the output of mozilla brining up a navigator window through '| grep RuleNode | cut -d")" -f2 | sort | uniq -c | sort -k3 -n', I ended up with the histogram: 216 has 0 children. 346 has 1 children. 44 has 2 children. 9 has 3 children. 7 has 4 children. 2 has 5 children. 2 has 6 children. 1 has 7 children. 2 has 8 children. 1 has 9 children. 1 has 20 children. 1 has 65 children.
Assignee | ||
Comment 13•22 years ago
|
||
If I brought up a bunch of web pages in addition to my homepage and also brought up the mail window, I get the histogram: 1101 has 0 children. 1231 has 1 children. 130 has 2 children. 56 has 3 children. 24 has 4 children. 16 has 5 children. 5 has 6 children. 9 has 7 children. 8 has 8 children. 3 has 9 children. 1 has 10 children. 3 has 11 children. 1 has 12 children. 1 has 13 children. 3 has 14 children. 1 has 16 children. 1 has 17 children. 1 has 19 children. 1 has 20 children. 1 has 23 children. 1 has 30 children. 1 has 35 children. 1 has 40 children. 1 has 71 children. 1 has 83 children. 1 has 102 children. I'm wondering what the best threshhold is... probably it won't make a measureable difference, though, so I may as well leave it at (or I could drop it back to 16).
Assignee | ||
Comment 14•22 years ago
|
||
For loading a navigator window with http://l10n.openoffice.org./localization/English_Russian_OOo_Glossary.htm I get: 38135 has 0 children. 478 has 1 children. 48 has 2 children. 10 has 3 children. 9 has 4 children. 4 has 5 children. 2 has 6 children. 1 has 7 children. 3 has 8 children. 1 has 9 children. 1 has 10 children. 1 has 17 children. 1 has 42 children. 1 has 46 children. 1 has 65 children. 1 has 296 children. 1 has 12502 children. 2 has 12503 children. For loading a navigator window with http://www.mozilla.org/newlayout/testcases/stress/wbtblclr.html I get: 9117 has 0 children. 314 has 1 children. 41 has 2 children. 8 has 3 children. 7 has 4 children. 2 has 5 children. 2 has 6 children. 1 has 7 children. 2 has 8 children. 1 has 9 children. 1 has 16 children. 1 has 65 children. 1 has 8911 children.
Updated•22 years ago
|
Keywords: mozilla1.0
Assignee | ||
Comment 15•22 years ago
|
||
This patch also converts to PLDHashTable. The performance numbers seem roughly the same as the previous patch (although the testcase for this bug took 48 seconds, twice, rather than the 44 I saw in one test before, but that's probably a fluke).
Attachment #74576 -
Attachment is obsolete: true
Assignee | ||
Comment 16•22 years ago
|
||
Comment on attachment 74628 [details] [diff] [review] patch also converting to PLDHashTable >+ PLDHashTable *hash = PL_NewDHashTable(&ChildrenHashOps, nsnull, >+ sizeof(ChildrenHashEntry), kMaxChildrenInList); I just replaced this, which is bad since it triggers an immediate table-grow, with: PLDHashTable *hash = PL_NewDHashTable(&ChildrenHashOps, nsnull, sizeof(ChildrenHashEntry), kMaxChildrenInList * 4); which tends a little towards the other extreme (I could have used *2, which would have given an initial alpha of just over .5 rather than just over .25), but I figure an extra 512 bytes in a few spots to save a bit of churn won't hurt.
Comment 17•22 years ago
|
||
Comment on attachment 74628 [details] [diff] [review] patch also converting to PLDHashTable >+static PLDHashTableOps ChildrenHashOps = { >+ // It's probably better to allocate the table itself using the libc >+ // allocator rather than the pres shell's arena because the table >+ // doesn't grow very often and the pres shell's arena doesn't recycle >+ // very large size allocations. Agreed. How about "using malloc and free" instead of "using the libc allocator", which is Unix-centric (not that *I* mind :-). >+ } else if (!next) { >+ // Create the new entry in our list. >+ next = new (mPresContext) nsRuleNode(mPresContext, aRule, this); >+ SetChildrenList(new (mPresContext) nsRuleList(next, ChildrenList())); > } Beware new failure here, return NS_ERROR_OUT_OF_MEMORY? >+void >+nsRuleNode::ConvertChildrenToHash() >+{ >+ NS_ASSERTION(!ChildrenAreHashed() && HaveChildren(), >+ "must have a non-empty list of children"); >+ PLDHashTable *hash = PL_NewDHashTable(&ChildrenHashOps, nsnull, >+ sizeof(ChildrenHashEntry), kMaxChildrenInList); Nit: underhanging indent should line up with initial parameter column. >+ for (nsRuleList* curr = ChildrenList(); curr; >+ curr = curr->DestroySelf(mPresContext)) { >+ ChildrenHashEntry *entry = NS_STATIC_CAST(ChildrenHashEntry*, >+ PL_DHashTableOperate(hash, curr->mRuleNode->mRule, PL_DHASH_ADD)); If (!entry), return NS_ERROR_OUT_OF_MEMORY here. >+ NS_ASSERTION(!entry->mRuleNode, "duplicate entries in list"); >+ entry->mRuleNode = curr->mRuleNode; >+ } >+ SetChildrenHash(hash); >+} Looks good otherwise, r/sr=brendan@mozilla.org -- do you not want your #ifdef DEBUG_dbaron instrumentation, for sharing and posterity? /be
Attachment #74628 -
Flags: superreview+
Comment 18•22 years ago
|
||
Forgot to say "agree on the kMaxChildrenInList*4" change dbaron mentioned in comment #16. Hyatt, can you r= so we can get this into 1.0? /be
Assignee | ||
Comment 19•22 years ago
|
||
I added an out-of-memory check that Brendan didn't mention, and I omitted the out-of-memory check within the loop in ConvertChildrenToHash, since the initial capacity of the table ensures that we won't run out.
Attachment #74628 -
Attachment is obsolete: true
Comment 20•22 years ago
|
||
Comment on attachment 74659 [details] [diff] [review] patch addressing Brendan's comments Right, I mentioned there were other operator new calls to check -- then I forgot about the preallocated table size being big enough (I'm hacking similar code in js/src, I should have known). This is all good. /be
Attachment #74659 -
Flags: superreview+
Comment 21•22 years ago
|
||
Great perf win for 54542, dbaron! A very good idea. As for alignment - nsSmallVoidArray (which once was nsCheapVoidArray in content) holds child pointers, and that already assumes that object (element) pointers are mod 2 (and has assertions to that effect).
Comment 22•22 years ago
|
||
r=hyatt
Comment 23•22 years ago
|
||
Comment on attachment 74659 [details] [diff] [review] patch addressing Brendan's comments a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #74659 -
Flags: approval+
Assignee | ||
Comment 24•22 years ago
|
||
Fix checked in 2002-03-18 17:29 PST.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 25•22 years ago
|
||
The page given in the initial description still takes nearly 2 minutes (about 110s) to load on my 1.1GHz machine with 512MB of RAM and a cable connection. Lynx did it in just over a minute (about 70s). Should I reopen this bug or file a new one? (Note. The page is around 5MB.)
Reporter | ||
Comment 26•22 years ago
|
||
profile now shows the usual "large page" stuff -- lots of hits all over, nothing standing out...
You need to log in
before you can comment on or make changes to this bug.
Description
•