Closed Bug 917247 Opened 11 years ago Closed 10 years ago

Peacekeeper domJQueryAttributeFilters test slower than Chrome

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jandem, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Attached file Test case
The original test is here: http://www.peacekeeper.therichins.net/test18.html

I'm attaching a standalone version with a non-minified version of jQuery. The test does:

    $("div[class]");
    $("div[class=title]");
    $("div[class^=tit]");
    $("div[class$=le]");
    $("div[class*=itl]");

Numbers:

Chrome 31:  1022 ms
Firefox 23: 1560 ms
Nightly 26: 1692 ms

Let me know if I should reduce this further.
I'll bet money this is JS fail, akin to bug 786160 (which tests fundamentally the same thing).  But I'll profile, just to check.
Well, looks like we could also do something to attr handling
(and after that change how nsCSSRuleProcessor uses attrs.) Too much string usage
and getting count of attrs should be O(1), but it is now O(n) (where n is number of empty slots) etc.
OK, here's what profile says:

1)  44% of the time is jitcode or JS VM code (lots of realloc/malloc here).
2)  46% of the time is calls to querySelectorAll.  A few % is overhead from bug 820848
    but fully 42% is under nsINOde::QuerySelectorAll.  Bug 863966 adds another 4% or
    so.  But the bulk of the time is in actual SelectorListMatches calls (20.5%) and
    self time in QuerySelectorAll (8% or so).  4% is realloc/malloc for the nsTArray
    inside the nsSimpleContentList.  I wonder whether using an auto array in there is a
    good idea...
3)  Under that 20% for selector matching above, 4% is AttrMatchesValue (that's the actual
    test for the ^=, $=, *= bits), 3% is AttrCount(), 3% is GetAttr (for the ^=, $=, *=
    bits).
4)  2% of the time is spent calling DOMProxyHandler::getElementIfPresent from
    array_slice.
5)  2% of the time is DOMProxyHandler::get on a nodelist from array_slice: this is the
    length get.  This ends up having to do a manual proto chain walk and whatnot.  :(
6)  5% of the time is self time in array_slice plus its memory ops.  This is part of the
    JS VM time from item #1.
6)  4% is DOMProxyHandler::get from js::GetProperty.  This is document.documentElement
    gets.  I don't know why those didn't get ICed.  :(  The js::GetProperty is called
    from jitcode.
7)  3% is DOMProxyHandler::get from js::jit::DoGetPropFallback.  Looks like mostly
    nodeType and ownerDocument gets on the document?

I have a hard time seeing how to speed this up 1.5x+, offhand.  Seems like we need to at least:

* Take the ion/bindings fastpaths for the document DOM proxy for property gets.  I'm not
  sure we even have a bug on this.  Eric, do you know of one?  This might win us 10% or
  more here.
* Fix bug 820848.  2-3% win here.
* Come up with a faster array_slice (esp the length get) for DOM nodelists?  Or general
  array_slice speedups?
* Speed up selector matching for attribute selectors.
* Consider an auto array in nsSimpleContentList/nsBaseContentList.
Depends on: 820848
Flags: needinfo?(efaustbmo)
We probably need additional bugs blocking this one...
> * Come up with a faster array_slice (esp the length get) for DOM nodelists?  Or general
>  array_slice speedups?

Turns out this is bug 697343.
Depends on: 697343
Depends on: 917509
(In reply to Boris Zbarsky [:bz] from comment #3)
> * Take the ion/bindings fastpaths for the document DOM proxy for property
> gets.  I'm not
>   sure we even have a bug on this.  Eric, do you know of one?  This might
> win us 10% or
>   more here.

I don't know of a specific one, no. Is there some reason that getting the rest of DOM proxies to use the Ion fastpaths is different for document?
Flags: needinfo?(efaustbmo)
No, apart from it being [OverrideBuiltins].
Oh. That makes it a bit uglier to use the current path, as we don't really wanna throw out the whole script every time the expando generation changes. Maybe we can be more aggressive in our JITInfo usage in the ICs?
Bug 933475 improved this from ~1310 ms to ~1080 ms. Chrome 32 gets ~1000 ms so it would be nice if we could win another 100 ms or so.
Depends on: 933475
Testing whether using nsAutoTArray for collecting results for querySelector* could help here.
(based on the profile, and comment 3 it should help.)
But I can't really get any perf boost with AutoTArray.
Looks like fixing attribute handling gives some perf wins.
(In reply to Jan de Mooij [:jandem] from comment #9)
> Bug 933475 improved this from ~1310 ms to ~1080 ms. Chrome 32 gets ~1000 ms
> so it would be nice if we could win another 100 ms or so.

I'm seeing very different results. For me Chrome 31 is a lot faster yet.
On the testcase it's 2100 x 1400 ms
On the original test it's 4470 x 6900 op/s
It may well strongly depend on the OS you're testing on, for what it's worth....
Testing on Windows 7 (64-bit OS, 32-bit browsers) I get the following (averaged across 6 runs).

For the attached test case:

Firefox Aurora 2013-11-16:  1164 ms
Firefox Nightly 2013-11-15: 1133 ms
Chrome 32.0.1700.14 beta-m:  748 ms
Chrome 33.0.1707.0 dev-m:    755 ms
Chrome 33.0.1710.0 canary:   763 ms

For the original test:

Firefox Aurora 2013-11-16:   8173 op/s
Firefox Nightly 2013-11-15:  8641 op/s
Chrome 32.0.1700.14 beta-m: 12884 op/s
Chrome 33.0.1707.0 dev-m:   12457 op/s
Chrome 33.0.1710.0 canary:  12249 op/s

So essentially the same difference as Guilherme saw (with Nightly being ~50% slower). The more experimental versions of Chrome seem to regress slightly, but maybe they just use a different build configuration. Incidentally the Chromium builds from https://commondatastorage.googleapis.com/chromium-browser-continuous/index.html?path=Win/ perform significantly worse - perhaps those are non-PGO builds.
So fwiw, what I see on Mac is Chrome at about 9729ops/s and us at 9293ops/s.  But that's a 64-bit Gecko build.  I expect Jan was also testing a 64-bit build...
Bug 822442 makes us faster here...  Would someone retest on Windows in a build with that patch, please?
On Windows 8.1 Update 1 (64-bit):

On Peacekeeper Original testcase: 9526 op/sec
On testcase attached here: 944 ms

Tested on tinderbox builds from mozilla-inbound-win32-pgo built from https://hg.mozilla.org/integration/mozilla-inbound/rev/b73ad6f8bf26 and it contain patch you mentioned.

Processor: Intel i7 2600K (not overclocked) if it matters.
Zlip792, what numbers do you see in Chrome and in builds before bug 822442?
On New Profile on today's (April 16 - CSET: dd50745d7f35) Nightly, I get these numbers:

Original Testcase: 9726 op/sec
Attached Testcase: 946 ms

I have Chromium snapshots which I don't think have PGO so they might be slower but these are numbers what I get:

Original Testcase: 8626.5 op/sec
Attached Testcase: 1110 ms

Chromium Version 36.0.1942.0 (264038) with V8 3.26.15

If need any help.. let me know...
Alright.  So no much impact on Windows, but we were already faster than Chrome...
I'm seeing Nightly being faster than or as fast as Chromium on 64bit Linux and Win7.
Marking fixed.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: