Tp/Tp2 regression from gfxPangoTextRun landing




13 years ago


(Reporter: roc, Assigned: roc)



Landing gfxPangoTextRun (new textframe disabled) caused a significant Tp/Tp2 regression. In particular bl-bldlnx01 Tp2 jumped from about 560 to about 600, and Tp from about 700 to about 750. This bug is to track that and maybe fix it.

If there's something I can do in code that's going to still be used by the new text frame, then I'll try to do it, but the problems may end up being unfixable that way, in which case we'll have to live with it and claw performance back when the new textframe is enabled.
My profiles are hard to interpret because running jprof in the VM seems to show PR_Now()/PR_IntervalNow() as *extremely* expensive, I presume because of virtualization. I'm not sure how far to trust this data.

Anyway, I profiled Tp2 (the non-international testset). Out of 360689 ticks, we're spending just 114 in gfxPangoTextRun::Init() which does all the thought-to-be-expensive Pango operations (and indeed 108 of those are in pango_shape). We're spending 138 ticks in gfxPangoTextRun::Draw() (134 of those are in cairo_show_glyphs). Interestingly the hottest gfxPangoTextRun operation is GetAdvanceWidth (1211 ticks) which is mostly spent under GetPartialLigatureWidth (1210 ticks). I'll see if I can do something about that.
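To put those tick counts in proportion, here is a small sketch that turns a tick count into a share of the whole profile. The helper name PercentOfTotal is made up for illustration; it is not part of jprof or Gecko.

```cpp
#include <cstdint>

// Hypothetical helper (not part of jprof or Gecko): returns the share of
// profile ticks attributed to one function, as a percentage of all ticks.
double PercentOfTotal(std::uint64_t ticks, std::uint64_t totalTicks) {
    if (totalTicks == 0) {
        return 0.0;  // avoid dividing by zero on an empty profile
    }
    return 100.0 * static_cast<double>(ticks) / static_cast<double>(totalTicks);
}
```

With the numbers above, PercentOfTotal(114, 360689) for gfxPangoTextRun::Init() comes to roughly 0.03%, and PercentOfTotal(1211, 360689) for GetAdvanceWidth to roughly 0.3%, so neither looks large enough on its own to account for the regression.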

It does show us spending 57697 ticks under gfxTextRunCache::GetOrMakeTextRun, but I suspect that's mostly the cost of context switches to virtualize the PR_Now calls, plus associated cache/TLB misses as we come back and fault everything back in. I will try only checking for a flush every 100 textruns and see if that helps.
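A minimal sketch of the "only check every 100 textruns" idea, assuming the flush check is driven by a clock read on each cache lookup. The class ThrottledFlushChecker and its members are hypothetical and are not the real gfxTextRunCache code; std::chrono::steady_clock stands in for PR_Now().

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the cache's expiry check. It reads the clock only
// once every `lookupsPerCheck` lookups instead of on every lookup.
class ThrottledFlushChecker {
public:
    using Clock = std::chrono::steady_clock;

    ThrottledFlushChecker(std::uint32_t lookupsPerCheck, Clock::duration flushInterval)
        : mLookupsPerCheck(lookupsPerCheck),
          mFlushInterval(flushInterval),
          mLastFlush(Clock::now()) {}

    // Call once per text run lookup. Returns true when the cache should flush.
    bool ShouldFlush() {
        if (++mLookupsSinceCheck < mLookupsPerCheck) {
            return false;  // cheap path: no clock read at all
        }
        mLookupsSinceCheck = 0;
        ++mClockReads;
        Clock::time_point now = Clock::now();
        if (now - mLastFlush < mFlushInterval) {
            return false;  // checked the clock, but the interval has not elapsed
        }
        mLastFlush = now;
        return true;
    }

    // Number of times the clock was actually read (useful when profiling).
    std::uint64_t ClockReads() const { return mClockReads; }

private:
    std::uint32_t mLookupsPerCheck;
    Clock::duration mFlushInterval;
    Clock::time_point mLastFlush;
    std::uint32_t mLookupsSinceCheck = 0;
    std::uint64_t mClockReads = 0;
};
```

The tradeoff is that a flush can be delayed by up to lookupsPerCheck - 1 lookups, which should not matter for a cache that only needs to expire entries approximately on time.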
Doing that changed things around. The time for GetAdvanceWidth dropped dramatically (and GetPartialLigatureWidth disappeared, which is good because there aren't any partial ligatures), and the time allocated to gfxPangoTextRun::Init increased somewhat. So that's more like what I'd expect, but it's still telling me that gfxPangoTextRun is costing less than the regression we observed ... I don't really trust my VM profiles. Maybe rhelmer's jprof tinderbox will tell me something reliable.
One thing my patch did is make us use Pango for ASCII text; before, we fell back to Xft. This has some benefits, like enabling ligatures and kerning, and ensuring that text rendering stays consistent if you add a non-ASCII character to an ASCII string, but of course we should expect a performance hit from that.

I may add back an Xft path to gfxPangoTextRun. It's easy: we just need to add a CreateGlyphRunsXft function and call it where we choose to. We can then measure how that performs and see what Pango is costing us. Ultimately, though, I think we want to use Pango, except maybe in embedded device situations.
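A rough sketch of how the choice between the two paths might look. It covers only the decision, not glyph run creation, and it assumes "8-bit text" means every UTF-16 code unit is below 256. The names ShapingPath, IsEightBitText and ChooseShapingPath are hypothetical; only CreateGlyphRunsXft is a name from the plan above, and it is not implemented here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration; not the actual gfxPangoTextRun API.
enum class ShapingPath { Xft, Pango };

// Assumption: "8-bit text" means every UTF-16 code unit fits in 8 bits.
bool IsEightBitText(const char16_t* text, std::size_t length) {
    for (std::size_t i = 0; i < length; ++i) {
        if (text[i] > 0xFF) {
            return false;
        }
    }
    return true;
}

// Picks the cheap Xft path for 8-bit text when it is allowed; everything else
// goes through Pango so that complex scripts are still shaped correctly.
ShapingPath ChooseShapingPath(const char16_t* text, std::size_t length,
                              bool allowXftFastPath) {
    if (allowXftFastPath && IsEightBitText(text, length)) {
        return ShapingPath::Xft;  // would call something like CreateGlyphRunsXft
    }
    return ShapingPath::Pango;
}
```

Keeping the fast path behind a flag makes it simple to profile the same page set with and without Xft and attribute the difference to Pango.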
We recovered almost all of the Tp/Tp2 regression by reintroducing Xft for 8-bit text, so I'm marking this fixed.
Closed: 13 years ago
Resolution: --- → FIXED