nsSlots::mMutationObservers is slow

NEW
Unassigned

Status

()

Core
DOM
11 years ago
2 years ago

People

(Reporter: Robert Sayre, Unassigned)

Tracking

({perf})

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(4 attachments)

(Reporter)

Description

11 years ago
While running a simple getElementsByClassName benchmark in Shark, 20-30% of time is spent in the constructor of nsContentList, in mRootNode->AddMutationObserver(this); (see URL for lxr link)

The destructor is also expensive.

Judging by the AppendUnlessExists method, it looks like an array is bad a choice for this job. There are iterator mutation and order guarantees provided by nsTObserverArray. Aside from the guarantee that documents are called first, I'm not sure which are important for this field.

I have a quick and dirty patch that uses google::dense_hash_set with a separate pointer for the first observer notification (instead of calling Prepend). It shows a substantial improvement in getElementsByClassName performance--with the matching code at the top of the profile, instead of the nsContentList constructor.
(Reporter)

Comment 1

11 years ago
Created attachment 294629 [details]
benchmark
(Reporter)

Comment 2

11 years ago
Created attachment 294631 [details] [diff] [review]
demo patch: use a fast hashset instead of an array

        Nightly trunk times: 635 826 716 620 953 658 616 939 652 617
Patched, symbols, optimized: 609 560 665 555 655 555 554 553 648 554
Is the issue with the benchmark just that we have lots of nsContentLists in flight all at once (waiting for GC)?

Looking at the patch, it doesn't look like it gives the right behavior.  In particular, it can notify an observer that has already been removed, and won't notify newly-added observers.  The latter probably doesn't matter; I think we might actually depend on the former.

I'm also not that happy with the allocation that has to happen in the notification method.  We call those a _lot_ when we mutate the DOM (something this testcase is not doing, I assume).

It might be a better idea to change the AppendUnlessExists caller to not call it.  For example, we could have a "safe" way to add observers (which does the check) and a "fast" way (which doesn't).  With a very few exceptions, I'm pretty sure observers can take the "fast" way; certainly nsContentList can, since it's being added when it's constructed.  There will still be a cost when _destroying_ all those content lists.  But in practice I doubt that we have that many content lists in flight all at once...

Another thing we could do, perhaps, is have a hashtable similar to the one for tagname-based lists for the function-based ones.  Looking at that benchmark, we really just want to create a single nsContentList and share it, right?
> But in practice I doubt that we have that many content lists in flight all at
> once...

In other words, I don't think this benchmark necessarily represents a realistic workload.  If I'm wrong, then we should think about the hashtable solution, possibly even customized for getElementsByClassName.  But it'd be good to have data on actual getElementsByClassName usage before doing that sort of thing.
(Reporter)

Comment 5

11 years ago
(In reply to comment #3)
> Is the issue with the benchmark just that we have lots of nsContentLists in
> flight all at once (waiting for GC)?

Yep, could be.

> Looking at the patch, it doesn't look like it gives the right behavior.  In
> particular, it can notify an observer that has already been removed, and won't
> notify newly-added observers.  The latter probably doesn't matter; I think we
> might actually depend on the former.

dense_hash_set iterators are stable allow removals. For additions, we could keep a table of things to add at the end of iteration.

> I'm also not that happy with the allocation that has to happen in the
> notification method.  We call those a _lot_ when we mutate the DOM (something
> this testcase is not doing, I assume).

Yeah, but that can be avoided as above.

> 
> It might be a better idea to change the AppendUnlessExists caller to not call
> it.  For example, we could have a "safe" way to add observers (which does the
> check) and a "fast" way (which doesn't).

That might be better.
(Reporter)

Comment 6

11 years ago
>
> dense_hash_set iterators are stable allow removals. 

er, dense_hash_set iterators allow removals (but not inserts).
(Reporter)

Comment 7

11 years ago
Created attachment 294637 [details]
shark profile -- functions above 1%, system calls charged to caller

Here's the profile of the benchmark, flawed though it may be.
Created attachment 294640 [details] [diff] [review]
FastAddMutationObserver

This should help a bit.
Do we have any other (maybe even real world) testcases where Add/RemoveMutationObserver shows up in profiles?
It might come up with spellcheck in a very large textarea with many mis-spellings, perhaps (one observer per mis-spelled word, I would think).  Not sure whether the observer-adding cost would be noticeable there.

You could probably get a similar effect to this testcase with GetElementsByTagName and such, of course.

Past that, look at callsites of AddMutationObserver.
Yeah, I think getting rid of the Contains() call here is the way to go. I've always found it odd that it's there, but I've carried it forward. I'd check all callsites and check if there really are any that aren't already safe, and if there really really are callers out there that needs the check, add a special slow path.

Updated

11 years ago
Keywords: perf
You need to log in before you can comment on or make changes to this bug.