Closed Bug 1422522 Opened 6 years ago Closed 6 years ago

stylo: GitHub page loading time is slower due to worse querySelectorAll performance.

Categories

(Core :: CSS Parsing and Computation, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox57 --- wontfix
firefox58 --- disabled
firefox59 --- ?

People

(Reporter: emilio, Assigned: emilio)

References

Details

Attachments

(1 file)

str: Go to https://github.com/servo/servo/blob/master/components/layout_thread/lib.rs#L938

Open devtools.

Execute:

  let selector = ".js-menu-container.active, [data-hotkey], .contains-task-list, .js-reorderable-task-lists .js-comment-body > .contains-task-list > .task-list-item, .js-sso-modal-complete, .js-navigation-container, .js-active-navigation-container, .js-select-menu.js-load-contents, a.btn.disabled, .js-team-mention, .js-repo-health, .js-branch-search-field, .js-templates, .js-dashboard-deferred-module-content, form.js-recovery-provider-auto-redirect, .js-iconnav, .js-length-limited-input, link[rel=prefetch-viewed], .js-manage-requests-tabs-item, .js-manage-invitations-tabs-item, link[rel=pjax-prefetch], [data-favicon-override], .js-release-tag-field, .js-release-form .js-previewable-comment-form, .js-your-repositories-search, .js-more-repos-form, .js-search-suggester-field, form.js-auto-replay-enforced-sso-request, .js-sticky, form.js-sudo-required, form.js-high-risk-sudo-required, .js-selected-team-field, .js-touch-events, button[data-disable-invalid], [data-required-change], .js-autocomplete-field, .js-addon-purchase-field, .js-addon-downgrade-field, .js-selected-payment-method:not(.active), .js-billing-payment-methods, .js-comment-and-button, .js-preview-tab, a[rel*=facebox], .js-issue-link, .labeled-button:checked, .has-removed-contents, textarea.js-size-to-fit, .js-timeline-progressive-disclosure-container, .js-unread-item, .will-transition-once, .js-account-membership, .js-choose-plan-radio:checked, .js-plan-row.selected, .js-choose-account.selected, .js-blob-form, .js-check-for-fork, .js-compare-pr.open, .diff-table, meta[name=diff-view], .file-diff-split, .js-compare-tabs .tabnav-tab.selected, .wants-full-width-container, .CommunityTemplate-header, .js-inline-comments-container, td.js-line-comments.is-collapsed, .js-add-new-collab, #pages-cname-field, .js-enable-btn, .js-render-target, .js-subscription-toggle, .js-gist-dropzone, .js-remove-gist-file, .js-hook-url-field, .js-integration-permissions-selector, .js-issues-list-check:checked, .js-integrations-install-form, .js-org-reinstate-forms, .js-rename-owners-team-input, .js-company-owned:not(:checked), .js-company-owned:checked, .js-company-owned-autoselect, .js-repository-fallback-search, .js-two-factor-needs-enforced, .js-two-factor-enforcement-poller, .js-team-name-field, .js-new-team, .js-new-org-team, .js-select-team-menu, .js-org-transform-poller, .discussion-item.discussion-commits, .pull-request-ref-restore, #js-pull-restorable, .page-new-repo, .js-email-global-unsubscribe-form, .js-block-users-form, .js-plan-choice:checked, .js-setup-organization:checked, .js-signup-form, .js-contact-javascript-flag, .js-site-status-container, .js-calendar-graph-svg, .js-profile-timeline-year-list.js-sticky, .js-user-profile-sticky-fields.is-stuck, .js-user-profile-follow-button.is-stuck, .js-user-profile-following-toggle .js-toggler-container.on, .js-user-profile-following-mini-toggle .js-toggler-container.on, .js-select-menu .js-select-menu-tab.selected, .js-socket-channel[data-channel], .js-manifest-ready-check, .js-tree-finder-field, .js-card-cvv, .js-card-select-number-field, .js-select-country, [data-warn-unsaved-changes], #js-oauth-authorize-btn, .blob-expanded, .js-diff-progressive-loader, .js-diff-entry-loader, .js-draggable-screenshots-container, .js-banner, [data-ga-load], meta[name=analytics-event], .js-draggable-issues-container, .js-pinned-repos-reorder-list, .js-comment, .js-croppable-container, .js-filterable-field, input.js-date-input, .js-discover-repositories .js-ajax-pagination, .js-repository-recommendation, .js-unread-item, .js-unread-item, .js-u2f-box, .js-diff-progressive-container, .js-diff-load-container, .js-u2f-auth-form-body"

   var start = new Date(); for (var i = 0; i < 100; ++i) document.querySelectorAll(selector); console.log(new Date() - start)

Profile:

https://perfht.ml/2AmXRZe

This is a real selector that appears on Github and is queried multiple times. Don't ask why.
Depends on: 1422524
Depends on: 1422528
Hi! 

I work at GitHub, I can explain why this selector exists in our codebase (hint, it doesn't). We have an `observe` function which listens for DOM nodes that are added or removed to the DOM, to drive some of our behaviours. The impementation of `observe` is open source you can find it at [github.com/josh/selector-observer](https://github.com/josh/selector-observer/blob/master/lib/index.js) - the `observe` function relies on (what we have found to be) an efficient data set for storing mulitple selectors dubbed "Selector Set" (again, open source at [githbu.com/josh/selector-set](https://github.com/josh/selector-set)). The exact function that looks like it ends up getting called is: https://github.com/josh/selector-set/blob/09ea5fb137eeb760e0e154f87a79ffa932336084/selector-set.js#L356 - which queries for every element in the selector set (which is important so we can track which nodes are added during one change of HTML). 

I know the initial selector seems like a large amount, but it seems to be the most efficient way to query for those elements. We could batch selectors, or run each in a loop, but this seems more costly as we'd be accumulating arrays in JS rather than allowing the browser to do it. If you have any advice for us on how we can make this task more efficient we'd be happy to hear it.
(In reply to keithamus from comment #1)
> Hi! 
> 
> I work at GitHub, I can explain why this selector exists in our codebase
> (hint, it doesn't). We have an `observe` function which listens for DOM
> nodes that are added or removed to the DOM, to drive some of our behaviours.
> The impementation of `observe` is open source you can find it at
> [github.com/josh/selector-observer](https://github.com/josh/selector-
> observer/blob/master/lib/index.js) - the `observe` function relies on (what
> we have found to be) an efficient data set for storing mulitple selectors
> dubbed "Selector Set" (again, open source at
> [githbu.com/josh/selector-set](https://github.com/josh/selector-set)). The
> exact function that looks like it ends up getting called is:
> https://github.com/josh/selector-set/blob/
> 09ea5fb137eeb760e0e154f87a79ffa932336084/selector-set.js#L356 - which
> queries for every element in the selector set (which is important so we can
> track which nodes are added during one change of HTML). 
> 
> I know the initial selector seems like a large amount, but it seems to be
> the most efficient way to query for those elements. We could batch
> selectors, or run each in a loop, but this seems more costly as we'd be
> accumulating arrays in JS rather than allowing the browser to do it. If you
> have any advice for us on how we can make this task more efficient we'd be
> happy to hear it.

Hi, thanks for the pointers, that all makes sense!

Just to be clear, I'm not tracking this as a GitHub bug at all. Of course if you are profiling the browser and find out this without knowing where it comes from, it seems somewhat silly, but as you've explained, there are reasons :)

Anyway, this bug is just tracking the performance of the new style system[1]. We're slower on querySeletor / querySelectorAll than the old one (and thus haven't got rid of the old style system for this yet on releases). I found this page as an example of somewhere where it matters (it takes a fair amount of page load with the new style system, while it's negligible with the old one).

If you run Firefox Nightly and executed the STR from comment 0 you'd see pretty different numbers depending on whether layout.css.servo.enabled is on or off (3500 vs. 1700ms on my machine). Which means that we need to make it faster :)

Just wanted to make clear that I filed this bug as a good test-case for performance comparison inside Firefox, I'd have reached out GitHub if I thought there was something wrong on your end.

Thanks a lot for reaching out though! I was really curious about where did that selector come from :)

[1]: https://hacks.mozilla.org/2017/08/inside-a-super-fast-css-engine-quantum-css-aka-stylo/
Priority: -- → P2
(In reply to Emilio Cobos Álvarez [:emilio] from comment #2)
> (In reply to keithamus from comment #1)
> > ...
> 
> Hi, thanks for the pointers, that all makes sense!

No problem, any time 
Hm, seems like half my comment was lost in that last reply. I'll try to recap what I typed:

We understand you're tracking this as a Firefox regression. I believe I've spotted this regression during testing, I'm happy to see you're working on this. If there's anything GitHub can do to help here, please let us know. GitHub 
And again! It seems like bugzilla trims comments past emoji. That last bit is meant to say GitHub :heart-emoji: Firefox. :)
(In reply to keithamus from comment #5)
> And again! It seems like bugzilla trims comments past emoji.

(Bug 1253535 :/ There seems to be some activity over there though, so hopefully that will work soon...)
This is better now than what it used to be, with that and other fixes, need to reprofile.
Flags: needinfo?(emilio)
[Tracking Requested - why for this release]:
This is disabled in 58, no need to track it for that release. I'll try to avoid disabling it in 59 too, though I'm busy with other stuff atm...
Attached file testcase
This is the testcase extracted from the given page.

It does 10 runs of the 100x query test, and records time for each run.
emilio asked me in bug 1430521 comment 4 whether my patch would help on this. After I had a new patch to improve the situation of bug 1414789, I also did some profile on this bug based on the testcase attached.

It seems servo/servo#19781 does help improving the performance here, but it doesn't close the gap between Stylo and Gecko.

From the profiling, I think the most overhead here is the FFI call of Gecko_ClassOrClassList for getting the class list. We should probably either inline that function to avoid FFI call, or cache the returned value at some level so that we can reuse it more.
Huh, weird, I did try the Linux pgo build locally from that bug and saw no improvements... Though that bug also shows some platform-dependent inconsistencies...
Oh, bleh, seems servo/servo#19781 is a different approach, not the in lining hack, nvm
I just measured the testcase I attached in comment 10 with the current m-c (which includes emilio's servo/servo#19787 as well as my servo/servo#19781) with a local release opt build (without pgo), and it seems that stylo is about 10% slower than gecko in that testcase.

The time consumption on my machine is
* stylo: 1864.2±8.12ms
* gecko: 1699.7±11.04ms

So yeah we may still need to optimize it further.
FWIW, this is the approach I use to compute the numbers:
* run the testcase with stylo enabled and disabled for five rounds, and record all the 100 numbers
* compute the median of stylo and gecko for each round
* compute the average and standard deviation of the medians
A profile for comparison:
* gecko: https://perfht.ml/2DJc8yU
* stylo: https://perfht.ml/2DKOyC3
So, as can be seen from the profiles, Gecko_ClassOrClassList indeed seems to be where the problem is.

In Gecko's profile, nsAttrAndChildArray::GetAttr in SelectorMatches takes 939ms, which, according to the source, should basically be from Element::GetClasses which is inlined.

In Stylo's profile, nsAttrAndChildArray::GetAttr in Gecko_ClassOrClassList takes 972ms, which is on par with that in Gecko. However, the function Gecko_ClassOrClassList itself, which is supposed to just cast the class list and pass it to Servo, takes 2072ms.

In these two profiles, Stylo is 968ms slower than Gecko, so if we can take back some time from the extra function call, we should be much faster here.
We could add a more specialized function for the has_class check, too. That makes sense since it's so hot, let me just do that.
Depends on: 1431421
Profiles I just did again:
* Gecko: https://perfht.ml/2ECco5S
* Stylo: https://perfht.ml/2EBApcY

The profiles seem to indicate that the FFI function call is still the main overhead.

According to emilio, based on his experiment with the hacky patch in bug 1431421 to inline a bit more, the overhead doesn't go away completely. Some of the overhead which seems to belong to the FFI call gets moved into matches_complex_selector_internal instead. It makes stylo faster than before, but still not as fast as Gecko.

I also tried a less hacky way to reduce the function call overhead, and observed similar result.

What I did is that I create two new FFI functions which simply call into the corresponding methods (nsAttrAndChildArray::GetAttr and Gecko_AttrValueContains). Although the compiler doesn't inline them nor the methods they call, the overhead seems to reduce probably because this kind of trivial functions don't need much setup themselves.

With that's done, the regression is reduced at most half. But this really depends on how C++ compiler is going to optimize those functions, so maybe it's also not worth taking.

I've almost run out of idea how this can be optimized at this moment. Maybe we would need to accept this regression. It seems to be 2%~5% according to my local test.
One possible optimization is to have a small cache (maybe lru cache) to store element and their class list during the selector matching, so that we wouldn't need to go through the series of function calls to get the list. But given that what we want to beat is just some function calls... it is probably very hard to actually help. Maybe I can experiment with it and see how it would look like.
So, a one-element class list cache seems to make stylo faster than gecko likely due to the elimination of additional FFI call in most cases, while a larger lru cache seems to slow down it further.

A one-element cache is probably very specific to this case... which I'm not sure whether it's worth adding at all.
I think unless we want to take the manual inlining of bug 1431421, maybe a bit cleaned up (which I don't love), given the size of the regression is nowhere near what it used to be (this used to be 3 to 4x slower instead of just 2/5% slower), I think we should avoid adding hacks for this.

Bug 1437452 should get us a fair amount of the optimizations we've done for free without cluttering the code. Bug 651120 would allow us to manually inline GetParsedAttrs without duplicating scary pointer arithmetic.
Depends on: 651120, 1437452
Flags: needinfo?(emilio)
Flags: needinfo?(emilio)
Flags: needinfo?(emilio)
Using the testcase attached on Windows 10:

Latest Nightly - ~2300 ms
Firefox 61.0.1 - ~2700 ms
Chrome 68 - ~2690 ms
Depends on: 1483964
Yeah, we can call this fixed.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.