1422522 - stylo: GitHub page loading time is slower due to worse querySelectorAll performance.

Assignee

Description

•

8 years ago

str: Go to https://github.com/servo/servo/blob/master/components/layout_thread/lib.rs#L938 Open devtools. Execute: let selector = ".js-menu-container.active, [data-hotkey], .contains-task-list, .js-reorderable-task-lists .js-comment-body > .contains-task-list > .task-list-item, .js-sso-modal-complete, .js-navigation-container, .js-active-navigation-container, .js-select-menu.js-load-contents, a.btn.disabled, .js-team-mention, .js-repo-health, .js-branch-search-field, .js-templates, .js-dashboard-deferred-module-content, form.js-recovery-provider-auto-redirect, .js-iconnav, .js-length-limited-input, link[rel=prefetch-viewed], .js-manage-requests-tabs-item, .js-manage-invitations-tabs-item, link[rel=pjax-prefetch], [data-favicon-override], .js-release-tag-field, .js-release-form .js-previewable-comment-form, .js-your-repositories-search, .js-more-repos-form, .js-search-suggester-field, form.js-auto-replay-enforced-sso-request, .js-sticky, form.js-sudo-required, form.js-high-risk-sudo-required, .js-selected-team-field, .js-touch-events, button[data-disable-invalid], [data-required-change], .js-autocomplete-field, .js-addon-purchase-field, .js-addon-downgrade-field, .js-selected-payment-method:not(.active), .js-billing-payment-methods, .js-comment-and-button, .js-preview-tab, a[rel*=facebox], .js-issue-link, .labeled-button:checked, .has-removed-contents, textarea.js-size-to-fit, .js-timeline-progressive-disclosure-container, .js-unread-item, .will-transition-once, .js-account-membership, .js-choose-plan-radio:checked, .js-plan-row.selected, .js-choose-account.selected, .js-blob-form, .js-check-for-fork, .js-compare-pr.open, .diff-table, meta[name=diff-view], .file-diff-split, .js-compare-tabs .tabnav-tab.selected, .wants-full-width-container, .CommunityTemplate-header, .js-inline-comments-container, td.js-line-comments.is-collapsed, .js-add-new-collab, #pages-cname-field, .js-enable-btn, .js-render-target, .js-subscription-toggle, .js-gist-dropzone, .js-remove-gist-file, .js-hook-url-field, .js-integration-permissions-selector, .js-issues-list-check:checked, .js-integrations-install-form, .js-org-reinstate-forms, .js-rename-owners-team-input, .js-company-owned:not(:checked), .js-company-owned:checked, .js-company-owned-autoselect, .js-repository-fallback-search, .js-two-factor-needs-enforced, .js-two-factor-enforcement-poller, .js-team-name-field, .js-new-team, .js-new-org-team, .js-select-team-menu, .js-org-transform-poller, .discussion-item.discussion-commits, .pull-request-ref-restore, #js-pull-restorable, .page-new-repo, .js-email-global-unsubscribe-form, .js-block-users-form, .js-plan-choice:checked, .js-setup-organization:checked, .js-signup-form, .js-contact-javascript-flag, .js-site-status-container, .js-calendar-graph-svg, .js-profile-timeline-year-list.js-sticky, .js-user-profile-sticky-fields.is-stuck, .js-user-profile-follow-button.is-stuck, .js-user-profile-following-toggle .js-toggler-container.on, .js-user-profile-following-mini-toggle .js-toggler-container.on, .js-select-menu .js-select-menu-tab.selected, .js-socket-channel[data-channel], .js-manifest-ready-check, .js-tree-finder-field, .js-card-cvv, .js-card-select-number-field, .js-select-country, [data-warn-unsaved-changes], #js-oauth-authorize-btn, .blob-expanded, .js-diff-progressive-loader, .js-diff-entry-loader, .js-draggable-screenshots-container, .js-banner, [data-ga-load], meta[name=analytics-event], .js-draggable-issues-container, .js-pinned-repos-reorder-list, .js-comment, .js-croppable-container, .js-filterable-field, input.js-date-input, .js-discover-repositories .js-ajax-pagination, .js-repository-recommendation, .js-unread-item, .js-unread-item, .js-u2f-box, .js-diff-progressive-container, .js-diff-load-container, .js-u2f-auth-form-body" var start = new Date(); for (var i = 0; i < 100; ++i) document.querySelectorAll(selector); console.log(new Date() - start) Profile: https://perfht.ml/2AmXRZe This is a real selector that appears on Github and is queried multiple times. Don't ask why.

Emilio Cobos Álvarez [:emilio]

Assignee

Updated

•

8 years ago

Depends on: 1422524

Emilio Cobos Álvarez [:emilio]

Assignee

Updated

•

8 years ago

Depends on: 1422528

Keith Cirkel [:keithamus]

Comment 1

•

8 years ago

Hi! I work at GitHub, I can explain why this selector exists in our codebase (hint, it doesn't). We have an `observe` function which listens for DOM nodes that are added or removed to the DOM, to drive some of our behaviours. The impementation of `observe` is open source you can find it at [github.com/josh/selector-observer](https://github.com/josh/selector-observer/blob/master/lib/index.js) - the `observe` function relies on (what we have found to be) an efficient data set for storing mulitple selectors dubbed "Selector Set" (again, open source at [githbu.com/josh/selector-set](https://github.com/josh/selector-set)). The exact function that looks like it ends up getting called is: https://github.com/josh/selector-set/blob/09ea5fb137eeb760e0e154f87a79ffa932336084/selector-set.js#L356 - which queries for every element in the selector set (which is important so we can track which nodes are added during one change of HTML). I know the initial selector seems like a large amount, but it seems to be the most efficient way to query for those elements. We could batch selectors, or run each in a loop, but this seems more costly as we'd be accumulating arrays in JS rather than allowing the browser to do it. If you have any advice for us on how we can make this task more efficient we'd be happy to hear it.

Emilio Cobos Álvarez [:emilio]

Assignee

Comment 2

•

8 years ago

(In reply to keithamus from comment #1) > Hi! > > I work at GitHub, I can explain why this selector exists in our codebase > (hint, it doesn't). We have an `observe` function which listens for DOM > nodes that are added or removed to the DOM, to drive some of our behaviours. > The impementation of `observe` is open source you can find it at > [github.com/josh/selector-observer](https://github.com/josh/selector- > observer/blob/master/lib/index.js) - the `observe` function relies on (what > we have found to be) an efficient data set for storing mulitple selectors > dubbed "Selector Set" (again, open source at > [githbu.com/josh/selector-set](https://github.com/josh/selector-set)). The > exact function that looks like it ends up getting called is: > https://github.com/josh/selector-set/blob/ > 09ea5fb137eeb760e0e154f87a79ffa932336084/selector-set.js#L356 - which > queries for every element in the selector set (which is important so we can > track which nodes are added during one change of HTML). > > I know the initial selector seems like a large amount, but it seems to be > the most efficient way to query for those elements. We could batch > selectors, or run each in a loop, but this seems more costly as we'd be > accumulating arrays in JS rather than allowing the browser to do it. If you > have any advice for us on how we can make this task more efficient we'd be > happy to hear it. Hi, thanks for the pointers, that all makes sense! Just to be clear, I'm not tracking this as a GitHub bug at all. Of course if you are profiling the browser and find out this without knowing where it comes from, it seems somewhat silly, but as you've explained, there are reasons :) Anyway, this bug is just tracking the performance of the new style system[1]. We're slower on querySeletor / querySelectorAll than the old one (and thus haven't got rid of the old style system for this yet on releases). I found this page as an example of somewhere where it matters (it takes a fair amount of page load with the new style system, while it's negligible with the old one). If you run Firefox Nightly and executed the STR from comment 0 you'd see pretty different numbers depending on whether layout.css.servo.enabled is on or off (3500 vs. 1700ms on my machine). Which means that we need to make it faster :) Just wanted to make clear that I filed this bug as a good test-case for performance comparison inside Firefox, I'd have reached out GitHub if I thought there was something wrong on your end. Thanks a lot for reaching out though! I was really curious about where did that selector come from :) [1]: https://hacks.mozilla.org/2017/08/inside-a-super-fast-css-engine-quantum-css-aka-stylo/

Chris Peterson [:cpeterson]

Updated

•

8 years ago

Priority: -- → P2

Keith Cirkel [:keithamus]

Comment 3

•

8 years ago

(In reply to Emilio Cobos Álvarez [:emilio] from comment #2) > (In reply to keithamus from comment #1) > > ... > > Hi, thanks for the pointers, that all makes sense! No problem, any time

Keith Cirkel [:keithamus]

Comment 4

•

8 years ago

Hm, seems like half my comment was lost in that last reply. I'll try to recap what I typed: We understand you're tracking this as a Firefox regression. I believe I've spotted this regression during testing, I'm happy to see you're working on this. If there's anything GitHub can do to help here, please let us know. GitHub

Keith Cirkel [:keithamus]

Comment 5

•

8 years ago

And again! It seems like bugzilla trims comments past emoji. That last bit is meant to say GitHub :heart-emoji: Firefox. :)

Jan de Mooij [:jandem]

Comment 6

•

8 years ago

(In reply to keithamus from comment #5) > And again! It seems like bugzilla trims comments past emoji. (Bug 1253535 :/ There seems to be some activity over there though, so hopefully that will work soon...)

Emilio Cobos Álvarez [:emilio]

Assignee

Comment 7

•

8 years ago

This is better now than what it used to be, with that and other fixes, need to reprofile.

Flags: needinfo?(emilio)

Chris Peterson [:cpeterson]

Comment 8

•

8 years ago

[Tracking Requested - why for this release]:

status-firefox57: --- → wontfix

status-firefox58: --- → ?

status-firefox59: --- → ?

Emilio Cobos Álvarez [:emilio]

Assignee

Comment 9

•

8 years ago

This is disabled in 58, no need to track it for that release. I'll try to avoid disabling it in 59 too, though I'm busy with other stuff atm...

status-firefox58: ? → disabled

Xidorn Quan [:xidorn] UTC+10

Comment 10

•

8 years ago

Attached file testcase — Details

This is the testcase extracted from the given page. It does 10 runs of the 100x query test, and records time for each run.

Xidorn Quan [:xidorn] UTC+10

Comment 11

•

8 years ago

emilio asked me in bug 1430521 comment 4 whether my patch would help on this. After I had a new patch to improve the situation of bug 1414789, I also did some profile on this bug based on the testcase attached. It seems servo/servo#19781 does help improving the performance here, but it doesn't close the gap between Stylo and Gecko. From the profiling, I think the most overhead here is the FFI call of Gecko_ClassOrClassList for getting the class list. We should probably either inline that function to avoid FFI call, or cache the returned value at some level so that we can reuse it more.

Emilio Cobos Álvarez [:emilio]

Assignee

Comment 12

•

8 years ago

Huh, weird, I did try the Linux pgo build locally from that bug and saw no improvements... Though that bug also shows some platform-dependent inconsistencies...

Emilio Cobos Álvarez [:emilio]

Assignee

Comment 13

•

8 years ago

Oh, bleh, seems servo/servo#19781 is a different approach, not the in lining hack, nvm

Xidorn Quan [:xidorn] UTC+10

Comment 14

•

8 years ago

I just measured the testcase I attached in comment 10 with the current m-c (which includes emilio's servo/servo#19787 as well as my servo/servo#19781) with a local release opt build (without pgo), and it seems that stylo is about 10% slower than gecko in that testcase. The time consumption on my machine is * stylo: 1864.2±8.12ms * gecko: 1699.7±11.04ms So yeah we may still need to optimize it further.

Xidorn Quan [:xidorn] UTC+10

Comment 15

•

8 years ago

FWIW, this is the approach I use to compute the numbers: * run the testcase with stylo enabled and disabled for five rounds, and record all the 100 numbers * compute the median of stylo and gecko for each round * compute the average and standard deviation of the medians

Xidorn Quan [:xidorn] UTC+10

Comment 16

•

8 years ago

A profile for comparison: * gecko: https://perfht.ml/2DJc8yU * stylo: https://perfht.ml/2DKOyC3

Xidorn Quan [:xidorn] UTC+10

Comment 17

•

8 years ago

So, as can be seen from the profiles, Gecko_ClassOrClassList indeed seems to be where the problem is. In Gecko's profile, nsAttrAndChildArray::GetAttr in SelectorMatches takes 939ms, which, according to the source, should basically be from Element::GetClasses which is inlined. In Stylo's profile, nsAttrAndChildArray::GetAttr in Gecko_ClassOrClassList takes 972ms, which is on par with that in Gecko. However, the function Gecko_ClassOrClassList itself, which is supposed to just cast the class list and pass it to Servo, takes 2072ms. In these two profiles, Stylo is 968ms slower than Gecko, so if we can take back some time from the extra function call, we should be much faster here.

Emilio Cobos Álvarez [:emilio]

Assignee

Comment 18

•

8 years ago

We could add a more specialized function for the has_class check, too. That makes sense since it's so hot, let me just do that.

Emilio Cobos Álvarez [:emilio]

Assignee

Updated

•

8 years ago

Depends on: 1431421

Xidorn Quan [:xidorn] UTC+10

Comment 19

•

8 years ago

Profiles I just did again: * Gecko: https://perfht.ml/2ECco5S * Stylo: https://perfht.ml/2EBApcY The profiles seem to indicate that the FFI function call is still the main overhead. According to emilio, based on his experiment with the hacky patch in bug 1431421 to inline a bit more, the overhead doesn't go away completely. Some of the overhead which seems to belong to the FFI call gets moved into matches_complex_selector_internal instead. It makes stylo faster than before, but still not as fast as Gecko. I also tried a less hacky way to reduce the function call overhead, and observed similar result. What I did is that I create two new FFI functions which simply call into the corresponding methods (nsAttrAndChildArray::GetAttr and Gecko_AttrValueContains). Although the compiler doesn't inline them nor the methods they call, the overhead seems to reduce probably because this kind of trivial functions don't need much setup themselves. With that's done, the regression is reduced at most half. But this really depends on how C++ compiler is going to optimize those functions, so maybe it's also not worth taking. I've almost run out of idea how this can be optimized at this moment. Maybe we would need to accept this regression. It seems to be 2%~5% according to my local test.

Xidorn Quan [:xidorn] UTC+10

Comment 20

•

8 years ago

One possible optimization is to have a small cache (maybe lru cache) to store element and their class list during the selector matching, so that we wouldn't need to go through the series of function calls to get the list. But given that what we want to beat is just some function calls... it is probably very hard to actually help. Maybe I can experiment with it and see how it would look like.

Xidorn Quan [:xidorn] UTC+10

Comment 21

•

8 years ago

So, a one-element class list cache seems to make stylo faster than gecko likely due to the elimination of additional FFI call in most cases, while a larger lru cache seems to slow down it further. A one-element cache is probably very specific to this case... which I'm not sure whether it's worth adding at all.

Emilio Cobos Álvarez [:emilio]

Assignee

Comment 22

•

8 years ago

I think unless we want to take the manual inlining of bug 1431421, maybe a bit cleaned up (which I don't love), given the size of the regression is nowhere near what it used to be (this used to be 3 to 4x slower instead of just 2/5% slower), I think we should avoid adding hacks for this. Bug 1437452 should get us a fair amount of the optimizations we've done for free without cluttering the code. Bug 651120 would allow us to manually inline GetParsedAttrs without duplicating scary pointer arithmetic.

Depends on: 651120, 1437452

Flags: needinfo?(emilio)

Emilio Cobos Álvarez [:emilio]

Assignee

Updated

•

8 years ago

Flags: needinfo?(emilio)

Emilio Cobos Álvarez [:emilio]

Assignee

Updated

•

8 years ago

Flags: needinfo?(emilio)

Guilherme Lima

Comment 23

•

7 years ago

Using the testcase attached on Windows 10: Latest Nightly - ~2300 ms Firefox 61.0.1 - ~2700 ms Chrome 68 - ~2690 ms

Emilio Cobos Álvarez [:emilio]

Assignee

Updated

•

7 years ago

Depends on: 1483964

Emilio Cobos Álvarez [:emilio]

Assignee

Comment 24

•

7 years ago

Yeah, we can call this fixed.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED