Closed Bug 105783 Opened 23 years ago Closed 23 years ago

Make nsXULSortService greasy fast

Categories

(Core :: XUL, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla0.9.7

People

(Reporter: tingley, Assigned: tingley)

References

Details

(Keywords: perf)

Attachments

(1 file, 1 obsolete file)

Filing this to get down some of the stuff I had noticed while trying to
understand why building (opening) the bookmarks menu the first time takes so
darn long.

<template> and associated baggage isn't terribly hip these days, but I looked at
the XUL template code a tiny bit for another bug, and maybe life's too short to
understand both it and the outliner.

I used my stanard 1000+ element bookmark file from bug 69009 and starting
tracing (I still have no jprof).  Building the bookmarks menu took 9.3 million
ticks of the PR_Now() clock.  The first million of these were spent in
initial propogation [sic] through the rule network; the remainder was in
content building.

There's a *lot* going on in there, and I don't understand all of it.  The
obvious big nail is nsXULSortService::InplaceSort(), which consumes roughly
6.1 million ticks -- two thirds of the time.  Much of that is split between
two hot spots, each of which eat about 2.5 million ticks.

The first is nsCollation::CompareString; in particular, the call at 
http://lxr.mozilla.org/seamonkey/source/content/xul/templates/src/nsXULSortService.cpp#936

CompareString has been identified elsewhere as a dog (bug 99383; wontfix),
and implicated in the pokiness of the XUL directory viewer (bug xxxxx).
bbaetz pointed out that the outliner doesn't use it -- it just calls
CompareRawSortKey().  Coincidentally, the outliner has a reputation as
being "fast".  How it manages to do raw key comparisons and still sort
properly is a mystery.  Since the outliner apparently does it...

The second problem area is in GetNodeValue(); the majority (1.8 million or
so) is spent in the block beginning at 
http://lxr.mozilla.org/seamonkey/source/content/xul/templates/src/nsXULSortService.cpp#1332

(in the first of the two GetNodeValue() implementations) which reproduces
the IndexOf() logic from RDFContainerUtilsImpl.  Unfortunately, lightning
doesn't strike twice -- the code is walking the arcs backwards as it
should.  It seems like we're just grovelling through a lot of arcs.
Looking at this particular spot made me start thinking about a memoizing
nsIRDFContainer implementation that would cache IndexOf() results.  In
fact, over the course of an insertion, GetNodeValue() is called repeatedly
for the key node -- if nothing else, reusing this value should hopefully be easy.
cc'ing RDF people and miscellaneous dudes.

That "bug xxxxx" is supposed to be bug 69185.

ccing hyatt, who never did answer my question about how outliner sorting works.

When I was discussing bug 99383 on irc (with ftang, possibly, but I really can't
remember), someone mentioned that mailnews generates its own sort key stuff
somewhere. Does it sort on an extra (undisplayed) column, or something?
> In fact, over the course of an insertion, GetNodeValue() is called repeatedly
> for the key node -- if nothing else, reusing this value should hopefully be 
> easy.

I take this back; the sortInfo already does this (cacheFirstNode).  


bbaetz: outliner in the raw has no notion of sorting. rdfliner has a notion of
sorting, which takes the good and leaves the bad from the XULSortService.
For the rdf container sorting case (as in bookmarks menu construction), there's
certainly no reason why we should be suffering through the overhead of
CompareString() -- all this nonsense about writing out the index as a string and
then doing a textual comparison needs to go.  I'm going to work on a patch along
these lines.
Progress.

I produced successive versions of the code which did the following:

a) Special-cased natural order comparison in InplaceSort() to allow integer
compares of the indexes, rather than writing them out to strings things like
"000042" and feeding them into the nsCollation meatgrinder.

b) "Partial caching": Same as a), but re-added caching (which didn't work
for this case any more after my changes) of the "static" first element to
avoid calling IndexOf() repeatedly on the key element when doing insertion.

c) "Full caching": Same as a), but modified RDFContainerImpl to memoize
IndexOf() results in a PLDHashTable.  

Results (1070 bookmarks):

Initial Time:               8106634
With a) [integer compares]: 6058016
With b) [partial caching]:  5194734
With c) [full caching]:     3999085

Total time taken by the InplaceSort() calls dropped to a half a million
ticks, a change of an order of magnitude or so.

w00t.

a) and b) are pretty simple stuff, though I have to clean the code up in a
number of ways (and possibly undo any damage I may have done to some other
form of sorting -- I was watching the Vikings game part of the time,
and...).  I think the caching RDF container is cool, but will need more
work; none of the code is written that would be needed to keep it
consistent if elements were removed, etc.. there's probably a couple good
segfaults in the making.  (And it doesn't behave differently for the other
types of container.  Oops.  But who uses rdf:Alt?)  Also, it's tempting to
make it a separate implementation of nsIRDFContainer since the current one
is widely used.  

I'll keep playing with this stuff.
Blocks: 91351
Keywords: perf
At some point I stopped experimenting and started rewriting this thing.  My
goals are to both speed up the code as well as to clean it up, in particular
eliminating the redundant APIs by mapping back to the contentSortInfo in the
InsertContainerNode case.

I started this wednesday night.  At the moment I've only got the natural order
sort code working again, but it's a full 10x faster than the original --
building the bookmark menu is down to 2870596 usec, of which calls to
InsertContainerNode make up only 630843.

My worry now is that I'm going to do this and then the code is going to be
removed from the tree.  In particular, I'm worried about bug 97062 (give the
outliner a content model), in which hyatt states as a goal the removal of <tree>
code from the build and replacing it entirely with outliner.

So... before I spend the next week hacking on this, can someone give me an idea
as to whether I'm wasting my time?  Hyatt/Waterson, I'm especially interested in
hearing from you.

thanks
You won't catch me going anywhere near that code.
hyatt is the definitive master of XUL's future, but as I understand it, <tree>
is here to stay, even if only castrated down to a <listbox> tag. (That said, I
certainly wouldn't _mind_ seeing <tree> go, since it has its claws in just about
every area of the frame construction and management process.)

hyatt, what say you?
Hmm, now that I think about it, wasn't I originally looking at menu performance
in this bug?  Even if <tree> goes, templated menus etc are still (for now) using
this code.
Blocks: 66628
Blocks: 49142
Working on this code is like having breakfast with the devil.

However, I believe the end is in sight; if I can get bookmark separators to
behave sanely I'll be about ready to put a patch up.  Not tonight, though.
Blocks: 48774
Attached patch the short version (obsolete) — Splinter Review
I have a much larger patch in my tree which is a rewrite of nearly all of this
code.  I'm still not happy with it, though, and I'm wary of trying to get
review for 3000 line patches.  In the mean time, I'm offering a shorter
version which contains two significant speedups as well as some very basic
cleanup.

1)  Use integer math to compare sequence indexes.  The old code took the
result of IndexOf() (or rather equivalent code), stuffed it into an rdf
literal, built a collation key from it, and then did a key comparison!	This
resulted in horrible perf due to the infamous CompareString(), and was also a
bit goofy.  To wedge this in without hacking the APIs all to pieces, I
replaces the nsIRDFNode* params in GetNodeValue and CompareNodes with a
simple SortNode that can hold different types of values.

2)  Fix optimization for in-order inserts.  There was code to do this already
(see the lastWasLast, lastWasFirst checks in InsertContainerNode), but it was
badly broken.  In fact, the whole insertion sort loop was taking the index
boundaries to be from |staticCount| to |staticCount+numChildren|, which is
wrong -- the static children are a subset of the total, so
|staticCount+numChildren| is larger than the number of elements we're sorting
on.  Cleaning up this logic makes in-order inserts (as in the case of menu
building) much faster, since it will only compare against the last node as
long as |lastWasLast| is set.

3)  Cleanup: remove redundant #includes, explicit IID -> NS_GET_IID, fix 
spots that duplicated containerUtils code ("manual" IsSeq and IndexOf
opserations), replace some redundant atoms with nsXULAtoms:: equivalents.

There are still many strange and wonderous things in this file.  However, the
two functional changes speed up sorted numerical inserts (read: menu building)
considerably.  For the 1000-bookmark menu test, before and after times are 
approx:
		  Total time   |  Time in InsertContainerNode()
	Before:    8,600,000	   6,510,000 
	After:	   2,750,000	     675,000

So it's a 10x speedup on the actual sorting time; the rest is spent in
template/rule propagation (I retract my snide use of [sic] in the initial
comment now that this has been fixed!).

I *believe* this patch also speeds up startup time, as it appears
that something listens to rdf:bookmarks and builds its content before the
first nav window appears.

Although the integer math should speed up in-place sorting based on natural
order, beyond that there isn't much here for other types of sorts (bbaetz: xul
directory viewer builds in sorted order based on name, so don't expect much
speedup from this patch).  I plan on producing other patches to improve other
cases (mainly through reuse of generated collation keys), as well as more code
cleanup (rrr, those redundant APIs).  Also, I've marked various dependent bugs
(more or less for tracking purposes); none of them are fixed here.

This new bugzilla attachment interface is slick.
Keywords: patch, review
Comment on attachment 57045 [details] [diff] [review]
the short version

hello, nsIRDFInt.
Attachment #57045 - Attachment is obsolete: true
Attached patch v2Splinter Review
much cleaner version using nsIRDFInt and scrapping all the SortNode junk.
slightly slower (QI overhead + literal creation?) -- about 2.9 million ticks
compared to 2.75 million, but still a huge improvement.
I have a patche for #2 as part of the patch for bug 96108 (voidarray index
bounds cleanup/inlining).  Check the latest patch for 96108 and compare the
changes in nsXULSortService.cpp.

Also, I noticed in your changes for #2 you use numChildren.  Note that above in
the code numChildren is modified by subtracting staticCount from it(!).  I
side-stepped the issue by adding a realNumChildren that isn't modified.
Blocks: 96108
staticCount is subtracted from numChildren in two cases.  The first is if
staticCount is negative (=> sortStaticsLast was set) and it is added to
numChildren.  The second is if staticCount is *greater* than numChildren, which
should never happen (this block makes no sense -- the net result seems to be
that numChildren is set to 0.)

In the first case, I think this is the correct behavior.  If there are 10
children, with 3 statics, we would normally look at the last seven (left = 4,
right = 10).  But if we want static elements to be sorted last, the subtraction
would force us to consider (left = 0, right = 7), so any inserted node came
before the static elements (assuming they were already last....).  Is this wrong?
Similarly (now looking at your changes in bug 96108), the check to set
sortState->lastWasFirst needs to be fixed -- the last (most recently inserted)
node is "first" if it is inserted at the leftmost index under consideration. 
This is actually at |staticCount|, one less than the original value for |left|
(since we insert at |x-1| if |direction<0|).  

Other than that the changes are basically equivalent.
Looks good. Feel free (but not obliged) to move any of the other atoms into
nsXULAtoms.
Comment on attachment 57073 [details] [diff] [review]
v2

sr=waterson
Attachment #57073 - Flags: superreview+
Target Milestone: --- → mozilla0.9.7
Comment on attachment 57073 [details] [diff] [review]
v2

r=rjc
Attachment #57073 - Flags: review+
Go for it.  I'll remove/modify my patch for nsXULSortService in bug 96108
I haven't heard back from Marcia Knous about cvs access yet.  Waterson, feel
free to check this in.
If you like I can check it in when the tree opens
That would be great.
My cvs account has been activated; I'll check this in tonight.
Fix checked in.  I'm tempted to leave this bug open to track the other things
here I need to improve.
Keywords: patch, review
Resist temptation. Omnibus bugs bad; focused bugs good.  :-]
Allright, then.

Clearing blocks on a couple things that didn't get taken care of in this patch
(actually, it's possible that fixing the indexing took care of sortStaticLast,
but I'm not sure).

No longer blocks: 48774, 66628
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Component: XP Toolkit/Widgets: XUL → XUL
QA Contact: jrgmorrison → xptoolkit.widgets
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: