Closed
Bug 697440
Opened 13 years ago
Closed 11 years ago
JS-wrapping nsDOMTokenList (e.g. classList) is too slow
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
RESOLVED
DUPLICATE
of bug 717013
People
(Reporter: raynos2, Unassigned)
References
Details
Attachments
(2 files)
11.09 KB,
patch
|
Details | Diff | Splinter Review | |
303.13 KB,
application/octet-stream
|
Details |
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.186 Safari/535.1 Actual results: Accessing the classList property of an element is slow, however caching that classList object and re-using it is relatively fast. The caching of the DOMTokenList that classList returns should be safe and would allow for a noticeable performance increase. Exemplar benchmark : http://jsperf.com/cowbell-css-styles/2
Comment 1•13 years ago
|
||
We do in fact cache the classList. The code is at http://hg.mozilla.org/mozilla-central/file/16a8d2ab5240/content/base/src/nsGenericElement.cpp#l1760 and looks like this: 1765 nsGenericElement::nsDOMSlots *slots = DOMSlots(); 1766 1767 if (!slots->mClassList) { ... 1775 slots->mClassList = new nsDOMTokenList(this, classAttr); .. 1777 } ... 1781 return slots->mClassList; You could also verify this by testing that the return value of the classList getter always tests === to the previous return value. Looking at that jsperf test, I see numbers around 1,700,000 for the "classList" test and around 4,900,000 for both the "classList cached prop" and "classList cached" tests. The difference almost certainly reflects the fact that the "cached" tests just read a slot on the object via an IC, which is very fast. The non-cached version ends up calling the getter, which is not optimized as well in the JS engine and then has to do a bunch of work in the glue code to get into C++ and so forth. Resummarizing to reflect that, but that's just always going to be true: reading a slot is faster than calling a getter that has to do cross-language marshaling; fact of life.
Component: General → DOM
Product: Firefox → Core
QA Contact: general → general
Summary: Element.prototype.classList is not cached → Calling classList getter in a tight loop is slower than direct slot access on the object
Comment 2•13 years ago
|
||
That said, a major part of the cost in the getter case (80% or so) is finding the right JS reflection for the C++ object. This is fast-pathed for some object types, but not DOMTokenList. The new DOM bindings will make this all better.
Depends on: 622298
Updated•13 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Calling classList getter in a tight loop is slower than direct slot access on the object → JS-wrapping nsDOMTokenList (e.g. classList) is too slow
Comment 3•13 years ago
|
||
I should add that the jsperf test is somewhat questionable, since there is a fast path that makes adding a class that's already in the class list a no-op. So unless the idea is to test the performance of that fast path, it's not a very useful test.
Reporter | ||
Comment 4•13 years ago
|
||
The jsperf test has a "teardown" set to remove the class after every instance of the test. It should be adding the class freshly each time. i.e. not calling the no-op fast path.
Comment 5•13 years ago
|
||
After every instance of the test is not that helpful. If you look at the jsperf docs, the resulting code that's run looks like this (for the "classList cached" test, say): var count = some_big_number; var apiSelector = document.getElementById("foo"); var jQuerySelector = $("#foo"); var clist = apiSelector.classList; apiSelector.cached = clist; var start = new Date(); while (count--) { clist.add("bar"); } var end = new Date(); clist.remove("bar"); and the test then reports some_big_number/(end - start)*1000 as the score. So you're running an actual set once and running the fast-path (some_big_number-1) times and timing that.
Comment 6•13 years ago
|
||
Oh, and some_big_number is determined adaptively, so the faster the browser or the hardware the bigger that number is.
Reporter | ||
Comment 7•13 years ago
|
||
My mistake, testing the no-op fast path was a waste of time. After looking at the [benchmark](http://jsperf.com/cowbell-css-styles/3) again, it's clear that there is no difference between the calls anymore. i.e. the actual difference between the C++ getter and the property lookup is dwarfed by the actual class change / re-render mechanism. I presume this issue can not be closed.
Comment 8•13 years ago
|
||
Actually, I still see about 30% difference between the calls, so we do want to keep tracking this and retest once the new DOM bindings land.
Comment 9•13 years ago
|
||
Here's a patch to enable the new DOM bindings for DOMTokenList. I was planning on doing this in the next cycle, to let the NodeList/HTMLCollection bindings go out first and minimize the risk of regressions.
Comment 10•13 years ago
|
||
Peter, thanks! I agree with landing that after the next branchpoint. Two comments: 1) This doesn't compile as-is with GCC because apparently the syntax |T foo[] = {}| (with an empty initializer, as in the method list for settable token lists) is not actually valid C++; C++ arrays must have nonzero length or something. As a result the ArrayLength template doesn't match foo, which is of type "int[0u]". I fixed that locally by adding a specialization of that template for zero-length arrays, but it's not clear that the result is valid C++ either. :( Not sure what the right approach there is. 2) We should also change the classList quicktub and nsGenericElement getter to get a nsDOMTokenList, not nsIDOMDOMTokenList. That way we take the wrapper cache fast path instead of ending up in xpconnect land and then discovering we have a wrapper cache. Without #2, this patch is a 25% speedup or so for the uncached testcase in http://jsperf.com/cowbell-css-styles/2 but with #2 it's more like a 50% speedup. The one drawback is that the cached case slows down by about 20% with this patch. That seems mostly to be due to the slowness in getting the "add" method off the proxy on the JM end. Hopefully the JS engine folks will speed that up. A related question: should we inline getListObject()? And maybe the non-security-wrapper path in instanceIsListObject()? Both of those are showing up somewhat in this profile.
Comment 11•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #10) > 1) This doesn't compile as-is with GCC because apparently the syntax |T > foo[] = {}| (with an empty initializer, as in the method list for settable > token lists) is not actually valid C++; C++ arrays must have nonzero length > or something. Sorry, I forgot about that. I have a patch in my queue to fix it (adds an additional empty value to all property and method lists) but it's a bit ugly :-/. > 2) We should also change the classList quicktub and nsGenericElement getter > to get a nsDOMTokenList, not nsIDOMDOMTokenList. That way we take the > wrapper cache fast path instead of ending up in xpconnect land and then > discovering we have a wrapper cache. Ok, good point. > A related question: should we inline getListObject()? And maybe the > non-security-wrapper path in instanceIsListObject()? Both of those are > showing up somewhat in this profile. Ok, I'll see what I can do.
Comment 12•11 years ago
|
||
Seems like this bug have a patch that was close to landing, but never did. Nightly 26 classList 90,812 ±1.96% 7% slower classList variable 96,871 ±0.98% fastest classList cached 97,315 ±1.13% fastest jQuery 68,756 ±0.40% 29% slower Chrome 29 classList 330,658 ±1.32% 6% slower classList variable 347,398 ±0.57% fastest classList cached 332,434 ±0.43% 4% slower jQuery 169,723 ±0.89% 51% slower
Comment 13•11 years ago
|
||
This was done as part of bug 788532.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Comment 14•11 years ago
|
||
Er, I mean bug 717013.
Comment 15•11 years ago
|
||
Odd. On a current nightly, I see numbers pretty similar to Chrome's, not 3x slower.....
Comment 16•11 years ago
|
||
Does this help to investigate?
Comment 17•11 years ago
|
||
That file seems to not have a usable profile, at least assuming it's supposed to be loaded via Cleopatra. Here's a better question: did your nightly have the fix for bug 910795? If not, what do the numbers look like in today's nightly?
Comment 18•11 years ago
|
||
I think yesterday's nightly already had that patch, and had the same performance. Anyway, today's nightly is a little bit faster. classList 104,073 ±0.60% 5% slower classList variable 109,638 ±0.33% fastest classList cached 109,137 ±0.51% fastest jQuery 74,895 ±0.36% 32% slower
Comment 19•11 years ago
|
||
That's really odd. My Firefox numbers are 3x faster than yours, while my Chrome numbers are a bit slower than yours (though I'm testing in Chrome 31, not Chrome 29).
Comment 20•11 years ago
|
||
I think you have a faster machine than mine, so it's probably a Chrome problem. This machine has a i3-2350M @2.30GHz
Assignee | ||
Updated•5 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•