I was profiling the testcase in bug 501847, and found that we end up doing 100 mark/sweep cycles on the ruletree, touching a total of about 2 million style contexts, during that pageload. This takes up somewhere around 7% of the total pageload time just for the mark/sweep stuff. This happens because lots and lots of style contexts get destroyed during pageload. _That_ happens for mainly two different reasons: we destroy a whole bunch of frames due to frametree reconstructs, and we end up not creating frames for some frame construction items (whitespace?). The latter seems to account for about half of the calls into nsStyleContext::Destroy. I wonder whether we could have style contexts refcount rulenodes and do the mark not every 1000 style context destroys but every time we think we have 1000 rulenodes sitting at 0 refcount or something.... I also briefly toyed with trying to not even create those textnode style contexts in the frame constructor until we're sure we'll be creating a frame. Maybe that's worth doing, but we'd then need to store the parent style context in the frame construction item or something.
Maybe we should do both, in fact.
Originally, rule nodes were reference counted, but stopping the reference counting improved performance. However, the original AddRef and Release were probably virtual, and I think that was all done as one big de-COM-tamination. So maybe putting back non-virtual reference counting would improve things. (Though in the simple cases where we never do a GC, I don't think it would.)
Yeah, this would only matter in cases where we create/destroy frames a good bit, or on pages with lots of suppressed textnodes...
Created attachment 398810 [details] [diff] [review] One option I think the hack to not allocate style contexts at all unless we need them is not worth it: style context alloc/free is cheap, and this patch gives all the benefits of not rerunning rulenode gc unnecessarily. Helps by about the expected 7-8% on my big page testcase, but need to do performance testing in general, I guess. And maybe tune the 1000 value...
Created attachment 398811 [details] [diff] [review] And actually including the changes
That last diff might lead us to sweep during ruletree constuction if we get enough "internal" nodes (ones not pointed to by style contexts directly). But that would be only once we have 1000 such internal nodes.... We could fix that by having rulenodes addref their parents, I guess.
Comment on attachment 398811 [details] [diff] [review] And actually including the changes >+ // we notify the style context. style set >+ // Notify the style set that a rulenode is no longer in use or was just constructed and is not yet in use r=dbaron
(In reply to comment #6) > We could fix that by having rulenodes addref their parents, I guess. Actually, I think we probably should make rule nodes AddRef their parents, though it would mean the unused count would often be too low (rather than too high, as in this approach). Alternatively, we could just switch back to reference counting rule nodes and ditch the GC.
Just refcounting the rulenodes can easily lead to rulenode churn as we create/destroy style contexts, right? I'll add parent refcounting; do you want to re-review with that change?
Doesn't matter to me. Given that, it might also be worth lowering kGCInterval a bit; I'm not really sure, though. It was always just a made-up number.
Created attachment 402600 [details] [diff] [review] Smaller gc interval, refcounting parents
Created attachment 402601 [details] [diff] [review] Interdiff from what dbaron reviewed to that last patch