Closed
Bug 917247
Opened 11 years ago
Closed 10 years ago
Peacekeeper domJQueryAttributeFilters test slower than Chrome
Categories
(Core :: DOM: Core & HTML, defect)
Core
DOM: Core & HTML
Tracking
()
RESOLVED
FIXED
People
(Reporter: jandem, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
249.84 KB,
text/html
|
Details |
The original test is here: http://www.peacekeeper.therichins.net/test18.html I'm attaching a standalone version with a non-minified version of jQuery. The test does: $("div[class]"); $("div[class=title]"); $("div[class^=tit]"); $("div[class$=le]"); $("div[class*=itl]"); Numbers: Chrome 31: 1022 ms Firefox 23: 1560 ms Nightly 26: 1692 ms Let me know if I should reduce this further.
Comment 1•11 years ago
|
||
I'll bet money this is JS fail, akin to bug 786160 (which tests fundamentally the same thing). But I'll profile, just to check.
Comment 2•11 years ago
|
||
Well, looks like we could also do something to attr handling (and after that change how nsCSSRuleProcessor uses attrs.) Too much string usage and getting count of attrs should be O(1), but it is now O(n) (where n is number of empty slots) etc.
Comment 3•11 years ago
|
||
OK, here's what profile says: 1) 44% of the time is jitcode or JS VM code (lots of realloc/malloc here). 2) 46% of the time is calls to querySelectorAll. A few % is overhead from bug 820848 but fully 42% is under nsINOde::QuerySelectorAll. Bug 863966 adds another 4% or so. But the bulk of the time is in actual SelectorListMatches calls (20.5%) and self time in QuerySelectorAll (8% or so). 4% is realloc/malloc for the nsTArray inside the nsSimpleContentList. I wonder whether using an auto array in there is a good idea... 3) Under that 20% for selector matching above, 4% is AttrMatchesValue (that's the actual test for the ^=, $=, *= bits), 3% is AttrCount(), 3% is GetAttr (for the ^=, $=, *= bits). 4) 2% of the time is spent calling DOMProxyHandler::getElementIfPresent from array_slice. 5) 2% of the time is DOMProxyHandler::get on a nodelist from array_slice: this is the length get. This ends up having to do a manual proto chain walk and whatnot. :( 6) 5% of the time is self time in array_slice plus its memory ops. This is part of the JS VM time from item #1. 6) 4% is DOMProxyHandler::get from js::GetProperty. This is document.documentElement gets. I don't know why those didn't get ICed. :( The js::GetProperty is called from jitcode. 7) 3% is DOMProxyHandler::get from js::jit::DoGetPropFallback. Looks like mostly nodeType and ownerDocument gets on the document? I have a hard time seeing how to speed this up 1.5x+, offhand. Seems like we need to at least: * Take the ion/bindings fastpaths for the document DOM proxy for property gets. I'm not sure we even have a bug on this. Eric, do you know of one? This might win us 10% or more here. * Fix bug 820848. 2-3% win here. * Come up with a faster array_slice (esp the length get) for DOM nodelists? Or general array_slice speedups? * Speed up selector matching for attribute selectors. * Consider an auto array in nsSimpleContentList/nsBaseContentList.
Depends on: 820848
Flags: needinfo?(efaustbmo)
Comment 4•11 years ago
|
||
We probably need additional bugs blocking this one...
Comment 5•11 years ago
|
||
> * Come up with a faster array_slice (esp the length get) for DOM nodelists? Or general > array_slice speedups? Turns out this is bug 697343.
Depends on: 697343
Comment 6•11 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #3) > * Take the ion/bindings fastpaths for the document DOM proxy for property > gets. I'm not > sure we even have a bug on this. Eric, do you know of one? This might > win us 10% or > more here. I don't know of a specific one, no. Is there some reason that getting the rest of DOM proxies to use the Ion fastpaths is different for document?
Flags: needinfo?(efaustbmo)
Comment 7•11 years ago
|
||
No, apart from it being [OverrideBuiltins].
Comment 8•11 years ago
|
||
Oh. That makes it a bit uglier to use the current path, as we don't really wanna throw out the whole script every time the expando generation changes. Maybe we can be more aggressive in our JITInfo usage in the ICs?
Reporter | ||
Comment 9•11 years ago
|
||
Bug 933475 improved this from ~1310 ms to ~1080 ms. Chrome 32 gets ~1000 ms so it would be nice if we could win another 100 ms or so.
Depends on: 933475
Comment 10•11 years ago
|
||
Testing whether using nsAutoTArray for collecting results for querySelector* could help here. (based on the profile, and comment 3 it should help.)
Comment 11•11 years ago
|
||
But I can't really get any perf boost with AutoTArray.
Comment 12•11 years ago
|
||
Looks like fixing attribute handling gives some perf wins.
Comment 13•11 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #9) > Bug 933475 improved this from ~1310 ms to ~1080 ms. Chrome 32 gets ~1000 ms > so it would be nice if we could win another 100 ms or so. I'm seeing very different results. For me Chrome 31 is a lot faster yet. On the testcase it's 2100 x 1400 ms On the original test it's 4470 x 6900 op/s
Comment 14•11 years ago
|
||
It may well strongly depend on the OS you're testing on, for what it's worth....
Comment 15•11 years ago
|
||
Testing on Windows 7 (64-bit OS, 32-bit browsers) I get the following (averaged across 6 runs). For the attached test case: Firefox Aurora 2013-11-16: 1164 ms Firefox Nightly 2013-11-15: 1133 ms Chrome 32.0.1700.14 beta-m: 748 ms Chrome 33.0.1707.0 dev-m: 755 ms Chrome 33.0.1710.0 canary: 763 ms For the original test: Firefox Aurora 2013-11-16: 8173 op/s Firefox Nightly 2013-11-15: 8641 op/s Chrome 32.0.1700.14 beta-m: 12884 op/s Chrome 33.0.1707.0 dev-m: 12457 op/s Chrome 33.0.1710.0 canary: 12249 op/s So essentially the same difference as Guilherme saw (with Nightly being ~50% slower). The more experimental versions of Chrome seem to regress slightly, but maybe they just use a different build configuration. Incidentally the Chromium builds from https://commondatastorage.googleapis.com/chromium-browser-continuous/index.html?path=Win/ perform significantly worse - perhaps those are non-PGO builds.
Comment 16•11 years ago
|
||
So fwiw, what I see on Mac is Chrome at about 9729ops/s and us at 9293ops/s. But that's a 64-bit Gecko build. I expect Jan was also testing a 64-bit build...
Comment 17•10 years ago
|
||
Bug 822442 makes us faster here... Would someone retest on Windows in a build with that patch, please?
Comment 18•10 years ago
|
||
On Windows 8.1 Update 1 (64-bit): On Peacekeeper Original testcase: 9526 op/sec On testcase attached here: 944 ms Tested on tinderbox builds from mozilla-inbound-win32-pgo built from https://hg.mozilla.org/integration/mozilla-inbound/rev/b73ad6f8bf26 and it contain patch you mentioned. Processor: Intel i7 2600K (not overclocked) if it matters.
Comment 19•10 years ago
|
||
Zlip792, what numbers do you see in Chrome and in builds before bug 822442?
Comment 20•10 years ago
|
||
On New Profile on today's (April 16 - CSET: dd50745d7f35) Nightly, I get these numbers: Original Testcase: 9726 op/sec Attached Testcase: 946 ms I have Chromium snapshots which I don't think have PGO so they might be slower but these are numbers what I get: Original Testcase: 8626.5 op/sec Attached Testcase: 1110 ms Chromium Version 36.0.1942.0 (264038) with V8 3.26.15 If need any help.. let me know...
Comment 21•10 years ago
|
||
Alright. So no much impact on Windows, but we were already faster than Chrome...
Comment 22•10 years ago
|
||
I'm seeing Nightly being faster than or as fast as Chromium on 64bit Linux and Win7. Marking fixed.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•