Closed Bug 697440 Opened 13 years ago Closed 11 years ago

JS-wrapping nsDOMTokenList (e.g. classList) is too slow

Categories

(Core :: DOM: Core & HTML, defect)

Version: 7 Branch
Hardware: x86_64
OS: Windows 7
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED DUPLICATE of bug 717013

People

(Reporter: raynos2, Unassigned)

References

Details

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.186 Safari/535.1



Actual results:

Accessing the classList property of an element is slow; however, caching that classList object and reusing it is relatively fast.

Caching the DOMTokenList that classList returns should be safe and would allow a noticeable performance increase.

Example benchmark: http://jsperf.com/cowbell-css-styles/2
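The pattern being compared is roughly the following (a minimal sketch; the element id, class name, and iteration count are placeholders, not taken from the benchmark itself):

    var el = document.getElementById("foo");

    // Uncached: goes through the classList getter on every iteration.
    for (var i = 0; i < 100000; i++) {
      el.classList.add("bar");
    }

    // Cached: read the getter once and reuse the returned DOMTokenList.
    var list = el.classList;
    for (var i = 0; i < 100000; i++) {
      list.add("bar");
    }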
We do in fact cache the classList.  The code is at http://hg.mozilla.org/mozilla-central/file/16a8d2ab5240/content/base/src/nsGenericElement.cpp#l1760 and looks like this:

  1765   nsGenericElement::nsDOMSlots *slots = DOMSlots();
  1766 
  1767   if (!slots->mClassList) {
...
  1775     slots->mClassList = new nsDOMTokenList(this, classAttr);
..
  1777   }
...
  1781   return slots->mClassList;

You could also verify this by testing that the return value of the classList getter always tests === to the previous return value.
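For example, a quick console check along these lines (illustrative only; "foo" is a placeholder id) should print true every time:

    var el = document.getElementById("foo");
    var first = el.classList;
    for (var i = 0; i < 10; i++) {
      console.log(el.classList === first);  // same cached DOMTokenList wrapper each time
    }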

Looking at that jsperf test, I see numbers around 1,700,000 for the "classList" test and around 4,900,000 for both the "classList cached prop" and "classList cached" tests.

The difference almost certainly reflects the fact that the "cached" tests just read a slot on the object via an inline cache (IC), which is very fast.  The non-cached version ends up calling the getter, which is not optimized as well in the JS engine and then has to do a bunch of work in the glue code to get into C++ and so forth.

Resummarizing to reflect that, but that's just always going to be true: reading a slot is faster than calling a getter that has to do cross-language marshaling; fact of life.
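The same effect is visible in pure JS, independent of any DOM glue (an illustrative example of the general principle, not of the actual classList machinery):

    var obj = { slot: 42 };                     // data property: reads come straight out of a slot
    Object.defineProperty(obj, "viaGetter", {   // accessor property: every read invokes the getter
      get: function () { return 42; }
    });

    obj.slot;       // slot read, easily inline-cached
    obj.viaGetter;  // getter call; classList additionally has to cross into C++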
Component: General → DOM
Product: Firefox → Core
QA Contact: general → general
Summary: Element.prototype.classList is not cached → Calling classList getter in a tight loop is slower than direct slot access on the object
That said, a major part of the cost in the getter case (80% or so) is finding the right JS reflection for the C++ object.  This is fast-pathed for some object types, but not DOMTokenList.

The new DOM bindings will make this all better.
Depends on: 622298
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Calling classList getter in a tight loop is slower than direct slot access on the object → JS-wrapping nsDOMTokenList (e.g. classList) is too slow
I should add that the jsperf test is somewhat questionable, since there is a fast path that makes adding a class that's already in the class list a no-op.  So unless the idea is to test the performance of that fast path, it's not a very useful test.
The jsperf test has a "teardown" set to remove the class after every instance of the test.

It should be adding the class freshly each time, i.e. not hitting the no-op fast path.
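Doing that would mean undoing the add inside the timed loop itself, along these lines (an illustrative sketch; the element id, class name, and iteration count are placeholders, and it ends up timing remove() as well):

    var element = document.getElementById("foo");  // placeholder element
    var count = 100000;                            // placeholder iteration count
    while (count--) {
      element.classList.add("bar");
      element.classList.remove("bar");  // undo inside the loop so the next add is not a no-op
    }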
A teardown that runs after every instance of the test is not that helpful here.  If you look at the jsperf docs, the resulting code that's run looks like this (for the "classList cached" test, say):

    var count = some_big_number;
    // setup: runs once per timed run, not once per iteration
    var apiSelector = document.getElementById("foo");
    var jQuerySelector = $("#foo");
    var clist = apiSelector.classList;
    apiSelector.cached = clist;
    var start = new Date();
    while (count--) {
      // only the first iteration actually adds the class; every later
      // iteration hits the "already present" no-op fast path
      clist.add("bar");
    }
    var end = new Date();
    // the "teardown" runs here, once, after the whole timed loop
    clist.remove("bar");

and the test then reports some_big_number / (end - start) * 1000 as the score.  So you're doing one real add, hitting the fast path (some_big_number - 1) times, and timing all of that.
Oh, and some_big_number is determined adaptively, so the faster the browser or the hardware the bigger that number is.
My mistake, testing the no-op fast path was a waste of time. 

After looking at the benchmark again (http://jsperf.com/cowbell-css-styles/3), it's clear that there is no difference between the calls anymore.

i.e. the difference between the C++ getter and the property lookup is dwarfed by the cost of the actual class change / re-render.

I presume this issue can now be closed.
Actually, I still see about 30% difference between the calls, so we do want to keep tracking this and retest once the new DOM bindings land.
Here's a patch to enable the new DOM bindings for DOMTokenList. I was planning on doing this in the next cycle, to let the NodeList/HTMLCollection bindings go out first and minimize the risk of regressions.
Peter, thanks!  I agree with landing that after the next branchpoint.

Two comments:

1)  This doesn't compile as-is with GCC because apparently the syntax |T foo[] = {}| (with an empty initializer, as in the method list for settable token lists) is not actually valid C++; C++ arrays must have nonzero length or something.  As a result the ArrayLength template doesn't match foo, which is of type "int[0u]".  I fixed that locally by adding a specialization of that template for zero-length arrays, but it's not clear that the result is valid C++ either.  :(  Not sure what the right approach there is.

2)  We should also change the classList quickstub and nsGenericElement getter to get a nsDOMTokenList, not nsIDOMDOMTokenList.  That way we take the wrapper cache fast path instead of ending up in xpconnect land and then discovering we have a wrapper cache.

Without #2, this patch is a 25% speedup or so for the uncached testcase in http://jsperf.com/cowbell-css-styles/2 but with #2 it's more like a 50% speedup.

The one drawback is that the cached case slows down by about 20% with this patch.  That seems mostly to be due to the slowness in getting the "add" method off the proxy on the JM end. Hopefully the JS engine folks will speed that up.

A related question: should we inline getListObject()?  And maybe the non-security-wrapper path in instanceIsListObject()?  Both of those are showing up somewhat in this profile.
(In reply to Boris Zbarsky (:bz) from comment #10)
> 1)  This doesn't compile as-is with GCC because apparently the syntax |T
> foo[] = {}| (with an empty initializer, as in the method list for settable
> token lists) is not actually valid C++; C++ arrays must have nonzero length
> or something.

Sorry, I forgot about that. I have a patch in my queue to fix it (adds an additional empty value to all property and method lists) but it's a bit ugly :-/.

> 2)  We should also change the classList quickstub and nsGenericElement getter
> to get a nsDOMTokenList, not nsIDOMDOMTokenList.  That way we take the
> wrapper cache fast path instead of ending up in xpconnect land and then
> discovering we have a wrapper cache.

Ok, good point.

> A related question: should we inline getListObject()?  And maybe the
> non-security-wrapper path in instanceIsListObject()?  Both of those are
> showing up somewhat in this profile.

Ok, I'll see what I can do.
Seems like this bug has a patch that was close to landing, but it never landed.

Nightly 26
classList 90,812 ±1.96% 7% slower
classList variable 96,871 ±0.98% fastest
classList cached 97,315 ±1.13% fastest
jQuery 68,756 ±0.40% 29% slower

Chrome 29
classList 330,658 ±1.32% 6% slower
classList variable 347,398 ±0.57% fastest
classList cached 332,434 ±0.43% 4% slower
jQuery 169,723 ±0.89% 51% slower
This was done as part of bug 788532.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Odd.  On a current nightly, I see numbers pretty similar to Chrome's, not 3x slower.....
Does this help to investigate?
That file seems to not have a usable profile, at least assuming it's supposed to be loaded via Cleopatra.

Here's a better question: did your nightly have the fix for bug 910795?  If not, what do the numbers look like in today's nightly?
I think yesterday's nightly already had that patch, and had the same performance.
Anyway, today's nightly is a little bit faster.

classList 104,073 ±0.60% 5% slower
classList variable 109,638 ±0.33% fastest
classList cached 109,137 ±0.51% fastest
jQuery 74,895 ±0.36% 32% slower
That's really odd.  My Firefox numbers are 3x faster than yours, while my Chrome numbers are a bit slower than yours (though I'm testing in Chrome 31, not Chrome 29).
I think you have a faster machine than mine, so it's probably a Chrome problem. This machine has an i3-2350M @ 2.30GHz.
Component: DOM → DOM: Core & HTML