Closed
Bug 513169
Opened 16 years ago
Closed 1 year ago
Investigate DOM child list storage strategies
Categories
(Core :: DOM: Core & HTML, defect)
Core
DOM: Core & HTML
Tracking
()
RESOLVED
WONTFIX
| Tracking | Status | |
|---|---|---|
| blocking2.0 | --- | - |
People
(Reporter: bzbarsky, Assigned: peterv)
References
(Blocks 1 open bug)
Details
(Keywords: perf)
Attachments
(3 files, 1 obsolete file)
|
16.47 KB,
patch
|
Details | Diff | Splinter Review | |
|
4.39 KB,
patch
|
Details | Diff | Splinter Review | |
|
23.84 KB,
patch
|
Details | Diff | Splinter Review |
We have several issues that show up as performance problems:
1) Position comparisons can be slow if comparing the position of two nodes far
apart in a long child list, because we lose the locality of reference on the
IndexOf cache.
2) Mutations of the DOM need to realloc the child pointer array (and memmove,
too).
A possible approach to this is to switch to a setup like webkit's with a doubly-linked list and some sort of cache for handling [n] on the childNodes (not sure what else they cache). We could include something like the setup described http://groups.google.com/group/mozilla.dev.tech.dom/browse_thread/thread/cf57c56ba10e5c80# to make position comparisons fast (needed for layout). We'd need to eliminate our use of indexOf and change the document observer ContentAppended to pass in the first new node, not the first index. We'd need to change all the iterations over kids to not use GetChildArray or GetChildAt.
We should get numbers on this.
| Reporter | ||
Comment 1•16 years ago
|
||
One other note. roc suggested that the right way to deal with running out of precision is to not renumber on insert but rather allow the indices to be nondecreasing (rather than increasing). Then renumber when two indices are equal on position compare.
couldn't we use float instead of double to save a little memory?
| Reporter | ||
Comment 3•16 years ago
|
||
Sure. If you look at the diff, it's actually just using a typedef to define the type; switching it from double to float is a one-liner. It might be worth measuring both and seeing what things look like.
Comment 4•16 years ago
|
||
(In reply to comment #2)
> couldn't we use float instead of double to save a little memory?
float doesn't preserve PRUint32's precision, unless we enforce an arbitrary limit that an element may have no more than 2**(24ish) children.
| Reporter | ||
Comment 5•16 years ago
|
||
In our current implementation, the child count integer is 22 bits unsigned (see nsAttrAndChildArray::ChildCount). So if we limit number of kids to the mantissa of a float (which is 23 bits), we actually allow more kids.
| Assignee | ||
Updated•15 years ago
|
Assignee: nobody → peterv
Status: NEW → ASSIGNED
Comment 6•15 years ago
|
||
(In reply to comment #1)
> Created an attachment (id=397193) [details]
> Patch to do the double thing in our current world
I had an idea about making the reindexing more efficient by using 2d indices instead of doubles, but in practice it didn't turn out to be useful. I also wrote a patch on top of this one to make the renumbering more efficient by renumbering a smaller range (actually, the minimal range that we can get away with), but that didn't help much as well because it turns out that by making the renumbering range smaller, we would have to do more renumbering phases because at any given execution of the renumbering phase, the indices are not separated by enough space apart, so this didn't turn out to be a performance win over your patch.
> One other note. roc suggested that the right way to deal with running out of
> precision is to not renumber on insert but rather allow the indices to be
> nondecreasing (rather than increasing). Then renumber when two indices are
> equal on position compare.
I'm not sure if this only needs to be done on position compare, or on every GetIndex call (because otherwise, we probably need to remember to call a renumbering function wherever we use GetIndex for position comparison at least).
And furthermore, it seems to me that this only optimizes the case where we insert a lot of nodes in a batch without needing to do any position comparison. Is that the only case we want to optimize against?
| Reporter | ||
Comment 7•15 years ago
|
||
> I'm not sure if this only needs to be done on position compare, or on every
> GetIndex call
I'm happy to nix GetIndex in favor of an explicit position compare function on nodes if that makes things simpler.
> And furthermore, it seems to me that this only optimizes the case where we
> insert a lot of nodes in a batch without needing to do any position comparison.
I believe this is the common case, esp once we have lazy frame construction.
Comment 8•15 years ago
|
||
This is the patch I was talking about in comment 6. This should be applied over attachment 397193 [details] [diff] [review]. The code needs some cleanup, but I didn't spend time on that because I don't think it's actually worth taking.
Comment 9•15 years ago
|
||
(In reply to comment #7)
> > I'm not sure if this only needs to be done on position compare, or on every
> > GetIndex call
>
> I'm happy to nix GetIndex in favor of an explicit position compare function on
> nodes if that makes things simpler.
>
> > And furthermore, it seems to me that this only optimizes the case where we
> > insert a lot of nodes in a batch without needing to do any position comparison.
>
> I believe this is the common case, esp once we have lazy frame construction.
Hmm, in that case I think we're better off with a linear renumbering of the entire child list in order to avoid having to do too many renumbering passes over small sets of children. What do you think?
| Reporter | ||
Comment 10•15 years ago
|
||
(In reply to comment #9)
> Hmm, in that case I think we're better off with a linear renumbering of the
> entire child list in order to avoid having to do too many renumbering passes
> over small sets of children. What do you think?
That sounds like it could easily fall into pathological behavior.... It also seems like we shouldn't end up with many renumbering passes in the common case.
Comment 11•15 years ago
|
||
(In reply to comment #10)
> (In reply to comment #9)
> > Hmm, in that case I think we're better off with a linear renumbering of the
> > entire child list in order to avoid having to do too many renumbering passes
> > over small sets of children. What do you think?
>
> That sounds like it could easily fall into pathological behavior.... It also
> seems like we shouldn't end up with many renumbering passes in the common case.
So, are you suggesting a localized renumbering pass, such as attachment 417542 [details]?
Comment 12•15 years ago
|
||
Comment on attachment 397193 [details] [diff] [review]
Patch to do the double thing in our current world
>+ NS_ASSERTION(kidIndex != targetIndex, "Wrong kid at our index?");
>+ NS_WARNING("Inefficient IndexOf call: child doesn't seem to have the right "
>+ "parent");
BTW, is it expected with your patch to get this assertion and warning?
| Reporter | ||
Comment 13•15 years ago
|
||
> So, are you suggesting a localized renumbering pass, such as attachment 417542 [details]?
I think so, yes.
> BTW, is it expected with your patch to get this assertion and warning?
The latter is possible if a consumer screws up. The former... I suppose one could get it if the latter happens. We should fix that caller!
Comment 14•15 years ago
|
||
This applies on top of the previous patch. The performance is not so great yet, but at least now it doesn't crash! :-)
I'll look at improving the performance of this patch tomorrow to see if something useful can be made out of it.
Comment 15•15 years ago
|
||
This patch doesn't assert or crash, but I'm still seeing huge (correctness, not performance) problems with it when I play with the browser. I'm posting this here in case someone is interested to pick up on this work, because I don't think that this approach is going to pay off on performance much. It improves some aspects of our performance in some testcases, but degrades the performance in other testcases.
I think we need to explore the doubly-linked list approach as an alternative here, as stated in previous comments.
Attachment #417848 -
Attachment is obsolete: true
| Reporter | ||
Comment 16•15 years ago
|
||
I was just looking at some of our style resolution code and discovered that a significant performance issue there is nsAttrAndChildArray::NonMappedAttrCount. Then I looked at the L2 cache miss stats, and discovered that all the cost is due to L2 cache misses on this line:
while (count > 0 && !mImpl->mBuffer[(count - 1) * ATTRSIZE])
Our typical "fast" DOM traversal would have a node, do some stuff with it, then look up the address mImpl points to (probably a cache hit, since the whole DOM node fits in a cache line), then load address pointed to by mImpl plus the offset to mBuffer (possibly a cache miss), then load the address pointed to by that plus the offset of the node we want (another possible cache miss). That gives us the node pointer. Then we may or may not hit on the actual access to the node.
In contrast, the doubly-linked list case is a cache hit on the firstchild address lookup and possibly on the nextsibling address lookup (depending on the size of the traversed subtree and whether our node got evicted from the cache during it). Actually dereferencing the pointer is a maybe-miss just like in our code.
So while an array in theory gives better cache behavior, the particular way we do it happens to actually suck more cache-wise... To not suck more the array buffer would need to be directly pointed to from the node. To be better the buffer would need to be part of the node's memory or the array would need to store nodes instead of addresses. Or something.
Updated•15 years ago
|
blocking1.9.2: --- → ?
Updated•15 years ago
|
blocking1.9.2: ? → ---
blocking2.0: --- → ?
Comment 17•15 years ago
|
||
Not blocking the release on this, but we'll be working on figuring out what to do here independent of our release schedules.
blocking2.0: ? → -
Comment 18•15 years ago
|
||
has peterv really taken this on? or the assignment a mistake and it's ehsan?
=> major to match severity of Bug 440641 - Sending longish plaintext message hangs Thunderbird
Severity: normal → major
Keywords: perf
Comment 19•15 years ago
|
||
I'm not working on this again any time soon...
Updated•14 years ago
|
OS: Mac OS X → All
Hardware: x86 → All
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
Comment 20•3 years ago
|
||
In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.
Severity: major → --
| Assignee | ||
Updated•1 year ago
|
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•