Closed Bug 697440 Opened 13 years ago Closed 11 years ago

JS-wrapping nsDOMTokenList (e.g. classList) is too slow

Categories

(Core :: DOM: Core & HTML, defect)

Version: 7 Branch
Hardware: x86_64
OS: Windows 7
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED DUPLICATE of bug 717013

People

(Reporter: raynos2, Unassigned)

References

Details

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.186 Safari/535.1



Actual results:

Accessing the classList property of an element is slow; however, caching that classList object and reusing it is relatively fast.

Caching the DOMTokenList that classList returns should be safe and would allow a noticeable performance increase.

Example benchmark: http://jsperf.com/cowbell-css-styles/2
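The pattern being compared is roughly the following (a minimal sketch; the element id, class name, and iteration count are placeholders, not taken from the benchmark itself):

    var el = document.getElementById("foo");

    // Uncached: goes through the classList getter on every iteration.
    for (var i = 0; i < 100000; i++) {
      el.classList.add("bar");
    }

    // Cached: read the getter once and reuse the returned DOMTokenList.
    var list = el.classList;
    for (var i = 0; i < 100000; i++) {
      list.add("bar");
    }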
We do in fact cache the classList.  The code is at http://hg.mozilla.org/mozilla-central/file/16a8d2ab5240/content/base/src/nsGenericElement.cpp#l1760 and looks like this:

  1765   nsGenericElement::nsDOMSlots *slots = DOMSlots();
  1766 
  1767   if (!slots->mClassList) {
...
  1775     slots->mClassList = new nsDOMTokenList(this, classAttr);
..
  1777   }
...
  1781   return slots->mClassList;

You could also verify this by testing that the return value of the classList getter always tests === to the previous return value.
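For example, a quick console check along these lines (illustrative only; "foo" is a placeholder id) should print true every time:

    var el = document.getElementById("foo");
    var first = el.classList;
    for (var i = 0; i < 10; i++) {
      console.log(el.classList === first);  // same cached DOMTokenList wrapper each time
    }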

Looking at that jsperf test, I see numbers around 1,700,000 for the "classList" test and around 4,900,000 for both the "classList cached prop" and "classList cached" tests.

The difference almost certainly reflects the fact that the "cached" tests just read a slot on the object via an inline cache (IC), which is very fast.  The non-cached version ends up calling the getter, which is not optimized as well in the JS engine and then has to do a bunch of work in the glue code to get into C++ and so forth.

Resummarizing to reflect that, but that's just always going to be true: reading a slot is faster than calling a getter that has to do cross-language marshaling; fact of life.
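The same effect is visible in pure JS, independent of any DOM glue (an illustrative example of the general principle, not of the actual classList machinery):

    var obj = { slot: 42 };                     // data property: reads come straight out of a slot
    Object.defineProperty(obj, "viaGetter", {   // accessor property: every read invokes the getter
      get: function () { return 42; }
    });

    obj.slot;       // slot read, easily inline-cached
    obj.viaGetter;  // getter call; classList additionally has to cross into C++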
Component: General → DOM
Product: Firefox → Core
QA Contact: general → general
Summary: Element.prototype.classList is not cached → Calling classList getter in a tight loop is slower than direct slot access on the object
That said, a major part of the cost in the getter case (80% or so) is finding the right JS reflection for the C++ object.  This is fast-pathed for some object types, but not DOMTokenList.

The new DOM bindings will make this all better.
Depends on: 622298
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Calling classList getter in a tight loop is slower than direct slot access on the object → JS-wrapping nsDOMTokenList (e.g. classList) is too slow
I should add that the jsperf test is somewhat questionable, since there is a fast path that makes adding a class that's already in the class list a no-op.  So unless the idea is to test the performance of that fast path, it's not a very useful test.
The jsperf test has a "teardown" set to remove the class after every instance of the test.

It should be adding the class freshly each time, i.e. not hitting the no-op fast path.
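Doing that would mean undoing the add inside the timed loop itself, along these lines (an illustrative sketch; the element id, class name, and iteration count are placeholders, and it ends up timing remove() as well):

    var element = document.getElementById("foo");  // placeholder element
    var count = 100000;                            // placeholder iteration count
    while (count--) {
      element.classList.add("bar");
      element.classList.remove("bar");  // undo inside the loop so the next add is not a no-op
    }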
A teardown that runs after every instance of the test is not that helpful here.  If you look at the jsperf docs, the resulting code that's run looks like this (for the "classList cached" test, say):

    var count = some_big_number;
    // setup: runs once per timed run, not once per iteration
    var apiSelector = document.getElementById("foo");
    var jQuerySelector = $("#foo");
    var clist = apiSelector.classList;
    apiSelector.cached = clist;
    var start = new Date();
    while (count--) {
      // only the first iteration actually adds the class; every later
      // iteration hits the "already present" no-op fast path
      clist.add("bar");
    }
    var end = new Date();
    // the "teardown" runs here, once, after the whole timed loop
    clist.remove("bar");

and the test then reports some_big_number / (end - start) * 1000 as the score.  So you're doing one real add, hitting the fast path (some_big_number - 1) times, and timing all of that.
Oh, and some_big_number is determined adaptively, so the faster the browser or the hardware the bigger that number is.
My mistake, testing the no-op fast path was a waste of time. 

After looking at the benchmark again (http://jsperf.com/cowbell-css-styles/3), it's clear that there is no difference between the calls anymore.

i.e. the difference between the C++ getter and the property lookup is dwarfed by the cost of the actual class change / re-render.

I presume this issue can now be closed.
Actually, I still see about 30% difference between the calls, so we do want to keep tracking this and retest once the new DOM bindings land.
Here's a patch to enable the new DOM bindings for DOMTokenList. I was planning on doing this in the next cycle, to let the NodeList/HTMLCollection bindings go out first and minimize the risk of regressions.
Peter, thanks!  I agree with landing that after the next branchpoint.

Two comments:

1)  This doesn't compile as-is with GCC because apparently the syntax |T foo[] = {}| (with an empty initializer, as in the method list for settable token lists) is not actually valid C++; C++ arrays must have nonzero length or something.  As a result the ArrayLength template doesn't match foo, which is of type "int[0u]".  I fixed that locally by adding a specialization of that template for zero-length arrays, but it's not clear that the result is valid C++ either.  :(  Not sure what the right approach there is.

2)  We should also change the classList quickstub and nsGenericElement getter to get a nsDOMTokenList, not nsIDOMDOMTokenList.  That way we take the wrapper cache fast path instead of ending up in xpconnect land and then discovering we have a wrapper cache.

Without #2, this patch is a 25% speedup or so for the uncached testcase in http://jsperf.com/cowbell-css-styles/2 but with #2 it's more like a 50% speedup.

The one drawback is that the cached case slows down by about 20% with this patch.  That seems mostly to be due to the slowness in getting the "add" method off the proxy on the JM end. Hopefully the JS engine folks will speed that up.

A related question: should we inline getListObject()?  And maybe the non-security-wrapper path in instanceIsListObject()?  Both of those are showing up somewhat in this profile.
(In reply to Boris Zbarsky (:bz) from comment #10)
> 1)  This doesn't compile as-is with GCC because apparently the syntax |T
> foo[] = {}| (with an empty initializer, as in the method list for settable
> token lists) is not actually valid C++; C++ arrays must have nonzero length
> or something.

Sorry, I forgot about that. I have a patch in my queue to fix it (adds an additional empty value to all property and method lists) but it's a bit ugly :-/.

> 2)  We should also change the classList quickstub and nsGenericElement getter
> to get a nsDOMTokenList, not nsIDOMDOMTokenList.  That way we take the
> wrapper cache fast path instead of ending up in xpconnect land and then
> discovering we have a wrapper cache.

Ok, good point.

> A related question: should we inline getListObject()?  And maybe the
> non-security-wrapper path in instanceIsListObject()?  Both of those are
> showing up somewhat in this profile.

Ok, I'll see what I can do.
Seems like this bug has a patch that was close to landing, but it never landed.

Nightly 26
classList 90,812 ±1.96% 7% slower
classList variable 96,871 ±0.98% fastest
classList cached 97,315 ±1.13% fastest
jQuery 68,756 ±0.40% 29% slower

Chrome 29
classList 330,658 ±1.32% 6% slower
classList variable 347,398 ±0.57% fastest
classList cached 332,434 ±0.43% 4% slower
jQuery 169,723 ±0.89% 51% slower
This was done as part of bug 788532.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Odd.  On a current nightly, I see numbers pretty similar to Chrome's, not 3x slower.....
Does this help to investigate?
That file seems to not have a usable profile, at least assuming it's supposed to be loaded via Cleopatra.

Here's a better question: did your nightly have the fix for bug 910795?  If not, what do the numbers look like in today's nightly?
I think yesterday's nightly already had that patch, and had the same performance.
Anyway, today's nightly is a little bit faster.

classList 104,073 ±0.60% 5% slower
classList variable 109,638 ±0.33% fastest
classList cached 109,137 ±0.51% fastest
jQuery 74,895 ±0.36% 32% slower
That's really odd.  My Firefox numbers are 3x faster than yours, while my Chrome numbers are a bit slower than yours (though I'm testing in Chrome 31, not Chrome 29).
I think you have a faster machine than mine, so it's probably a Chrome problem. This machine has an i3-2350M @ 2.30GHz.
Component: DOM → DOM: Core & HTML