While profiling the testcase in bug 1331848, I created a version of the primary stylesheet that just |cat|s the original stylesheet together 100 times. I did this in order to dig deeper into CSS parsing performance, but noticed an interesting difference in selector matching performance characteristics between Gecko and Stylo. Using Stylo (sequential mode): * Original stylesheet: 21ms in matches_complex_selector: https://perfht.ml/2n6W8jE * duplicated-100x stylesheet: 1317 ms in matches_complex_selector : https://perfht.ml/2n75k7n Using Gecko: * Original stylesheet: 20ms in ResolveStyleFor: https://perfht.ml/2n6Vzq7 * duplicated-100x stylesheet: 190ms in ResolveStyleFor: https://perfht.ml/2n736VO So Servo's style system scales roughly linearly with the number of selectors to match (which is what I'd expect), whereas Gecko's scales sub-linearly. We should figure out what Gecko's secret sauce is here and copy it.
Boris, how does Gecko manage to achieve logarithmic performance characteristics here?
Or rather, any idea how? I can also dig into this when I get cycles.
It looks to me like servo is spending a lot more time in selector matching as a fraction of the profile than Gecko. So my first hunch is that we're not rejecting as many things as we could based on rulehash bits or something. I did measure how many rules get examined by get_matching_rules in servo during this restyle (i.e. the length of self.other_rules), and compared to the length of mUniversalRules at http://searchfox.org/mozilla-central/rev/557f236c19730116d3bf53c0deef36362cafafcd/layout/style/nsCSSRuleProcessor.cpp#639 and the two seem to be the same: 309. On the servo side, for the cases when we have 309 rules there we _seem_ to pretty consistently miss the bloom filter for 3 of them, but hit for everything else, so should reject quite rapidly. Given that, I don't know why we're spending so much time under matches_complex_selector_internal. That said, when I just disabled all the iframes on that page, I no longer see the "309 rules" list, so I dunno what's going on _there_. :(
This bug could seriously benefit from some testcase reduction (e.g. to strip out all the irrelevant stylesheets, at the very least). That would make it much simpler to measure various stuff relevant to the matching of this particular sheet...
One thing worth checking is how long it takes for the bloom filter to reject a rule. At first glance, Servo's filter may be rehashing things every time, where Gecko precomputes some hashes once.
Regarding the bloom filter, we were hashing all the time with fnv the precomputed hash. This is not too bad I think, but can be better. I opened https://github.com/servo/servo/pull/16070 to make it better.
I tried the changes from comment 8, I think, and they don't seem to help much.... Still looks a lot slower than Gecko. :( I'm hoping I somehow messed up and pulled the wrong rev.
So, I think I have a few ideas about what may be causing this. Other things in mind that are linear with the number of selectors: * The DependencySet (used to calculate restyle hints). I expect this not to have such a big impact, since we only use this when an element has a snapshot, which I expect it's not too frequent (though perhaps it is?). * The two lists of sibling/attribute-affecting selectors that we store on the stylist to do style sharing. This _shouldn't_ have such a big impact on perf, given the cache for the sequential traversal is not working. Making is_style_share_cache_disabled return true may help diagnose if that's the case. Both of them do selector matching, and both of them without a bloom filter AFAIK. I can hook the bloom filter there and see if it improves things.
The thing is, the profile seemed to be explicitly pointing to selector matching under the main traversal, right?
Bug 1357304 and bug 1357973 get us from 1300 to a bit under 400. Bug 1358375 gets us to 77, which is 2.5x better than Gecko as measured in comment 0. When that lands, I think we can go ahead and resolve this. :-)