See the testcase in bug 630480, where we end up walking the whole DOM over and over.
Right now we cache for a single node, for the duration of style resolution, which helps some if there are multiple :nth-* rules. But we still end up doing lots of work.
A simple solution, implemented in Webkit, is to store an index just for :nth-child in each node (17 bits in their case; it shares space with some flags) and have the others be slow. The problem with this approach is that we reresolve style on the frame tree, and frame tree order doesn't necessarily match DOM order, esp. with XBL. So we can't rely on our previous sibling already having its style reresolved and hence having a correct cached index. I suppose we could deal with this by not using the cache if the parent has a binding....
Another approach, given lazy frame construction, is to only cache things for the duration of the style resolution or reresolution pass (so until we exit the frame constructor or exit the toplevel style reresolution entrypoint, or querySelector* or matchesSelector call). So basically a hashtable hanging off the document or something like that.
Another option is to have such a hashtable hanging off the document and clear it on mutation. The drawback to this is that we might have a bunch of data hanging around if we insert a big DOM and selectors like this are present and then do no mutations. We could get rid of it on a timer, of course.
Oh, and a single shared hashtable is not so great for bug 631527. So another question is how to handle that.... perhaps a hashtable on the document for the DOM APIs and maybe style reresolution and per-thread hashtables for initial frame construction or something?
So one possibility would be to just hang the data off a TreeMatchContext (which is usable across multiple style resolution calls with my patches in bug 598832) and then push the TreeMatchContext creation up further into the frame constructor and such.
In fact, I plan to do just that. It certainly works pretty well on bug 630480, in my measurements.
And in fact, the patches in bug 598832 do that for frame construction (in particular, part 16).
Once those land, I'll look at pushing a TreeMatchContext up higher in style reresolution.
Created attachment 529912 [details]
Testcase to exercise the patch I'm about to attach
Without the patch, my browser takes about 1600ms to run this testcase. With the patch, it takes 30ms.
Oh, WebKit trunk takes about 1000ms on that testcase. Opera 11 takes about 2500ms. If I use :nth-child instead of :nth-of-type, then Opera takes about 2200ms, while WebKit takes about 120ms.
Created attachment 529914 [details] [diff] [review]
I wonder whether we should remove combine the various pass-through arguments here into a single struct or something.....
Comment on attachment 529914 [details] [diff] [review]