Closed Bug 1408612 Opened 7 years ago Closed 7 years ago

Reorganize gfxFont::DrawOneGlyph to go fast

Categories

(Core :: Layout: Text and Fonts, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox58 --- fixed

People

(Reporter: jrmuizel, Assigned: jfkthame)

References

(Regressed 1 open bug)

Details

Attachments

(4 files, 8 obsolete files)

55.26 KB, patch
Details | Diff | Splinter Review
20.84 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
8.20 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
5.48 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
In service of being able to record text as fast as possible it looks like there are some things that can be done to improve it's speed. 1. There are some conditions that we test per glyph that are actually not per glyph: if (fontParams.isVerticalFont) if (runParams.isRTL) { if (fontParams.haveSVGGlyphs) { if (fontParams.haveColorGlyphs && for (int32_t i = 0; i < fontParams.extraStrikes; ++i) { 2. Some of the glyph position math may be able to folded together 3. We do some math in double precision and some in single precision 4. Ideally we could come up with something that could be inlined into the caller so that we don't have to do a call per glyph
Blocks: 1408500
Jonathan, do you have time to look at speeding this up?
Flags: needinfo?(jfkthame)
OK, I'll have a look at this and see what we can squeeze... The mix between float (which Moz2D likes to use) and gfxFloat (i.e. double) has always seemed unfortunate. I guess we should be able to just use float throughout here. Some things (like checking .haveSVGGlyphs and .haveColorGlyphs) are there because the presence of an SVG or color rendering for the glyph is a per-glyph question, but if we know the font doesn't have such tables at all, we can skip the (more expensive) check for whether this specific glyph has either of them. It might be worth hoisting these conditions out, though, and having a simplified DrawOneSimpleGlyph that we can use for the (common) case where neither is present, so these tests would normally be done just once at the glyphRun level. And that reduced-functionality method should be easily inline-able, too.
Flags: needinfo?(jfkthame)
It would also be nice to move the call to Flush() out of DrawOneGlyph() and into the caller as we should be able to know ahead of time how many glyphs we'll draw.
I'm not sure that's true. We know the length of the range we're drawing from the textrun, which is the number of CompressedGlyph records we're going to process. But the number of glyphs we draw may be greater than that, because a single CompressedGlyph may actually reference a sequence of glyphs (due to complex font shaping that can map one input character to multiple glyphs).
Here's a first step, trying to use single-precision floats more consistently for positioning computations in the glyph-drawing code.
And reworking DrawGlyphs to use a simplified version of DrawOneGlyph in the most common case.
Jeff, can you try profiling with these patches and see if you can detect a difference compared to previously?
Flags: needinfo?(jmuizelaar)
We can avoid calling Flush() for every glyph if we pre-allocate space for a run of glyphs, and ensure the buffer is big enough to accept the upcoming set of OutputGlyph calls without overflowing.
Minor update to patch 3, to avoid hitting assertions in debug-webrender reftests due to trying to Flush when the buffer is empty.
Attachment #8921029 - Attachment is obsolete: true
Attachment #8920372 - Flags: review?(jmuizelaar)
Attachment #8920373 - Flags: review?(jmuizelaar)
Attachment #8921048 - Flags: review?(jmuizelaar)
Comment on attachment 8920372 [details] [diff] [review] patch 1 - Use single-precision floats more consistently for glyph position computations when drawing text Review of attachment 8920372 [details] [diff] [review]: ----------------------------------------------------------------- So the type inside of gfx::Point is actually 'Float' so there a places where you could use that instead of 'float'. In reality we're probably unlikely to change the definition of 'Float' to anything other than 'float' so I don't feel strongly about this at all. I haven't tested the performance difference of this but it seems like a good change to make for consistencies sake anyways.
Attachment #8920372 - Flags: review?(jmuizelaar) → review+
Comment on attachment 8920373 [details] [diff] [review] patch 2 - Use a simplified FastDrawGlyph method in place of DrawOneGlyph when color glyphs and other complexities are not present Review of attachment 8920373 [details] [diff] [review]: ----------------------------------------------------------------- I'll try run some performance tests on this code today. ::: gfx/thebes/gfxFont.cpp @@ +2009,5 @@ > + return false; > + } > + } else { > + gfx::Point glyphPt(*aPt); > + if (aFontParams.isVerticalFont) { We can pull this out of the loop by having a float offsetX, offsetY variables that are set before the loop. @@ +2029,5 @@ > + } > + // We can't use the simplified loop if there is inter-glyph spacing > + // to apply, but can still use the FastDrawGlyph method for each > + // individual glyph. > + drawGlyphFunc = &gfxFont::FastDrawGlyph; It's not great to add an indirect call here. What if we template the code instead?
Here's a version of patch 2 that uses templated DrawGlyphs and DrawOneGlyph methods, where the compiler should be able to optimize away the extra work from the versions used for "simple" fonts (no color, no fake-bold) and/or when there is no added inter-glyph spacing. I was initially a bit hesitant to go this way due to the likely increase in compiled-code size, but it doesn't seem too bad, and helps minimize the number of tests/branches that we'll hit in the inner loop of glyph rendering.
Attachment #8921513 - Flags: review?(jmuizelaar)
Attachment #8920373 - Attachment is obsolete: true
Attachment #8920373 - Flags: review?(jmuizelaar)
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
Attachment #8921048 - Attachment is obsolete: true
Attachment #8921048 - Flags: review?(jmuizelaar)
I wasn't able to apply these patches what revision are they against?
Flags: needinfo?(jmuizelaar)
Attachment #8920372 - Attachment is obsolete: true
Attachment #8921513 - Attachment is obsolete: true
Attachment #8921513 - Flags: review?(jmuizelaar)
Attachment #8921515 - Attachment is obsolete: true
Attachment #8921515 - Flags: review?(jmuizelaar)
Comment on attachment 8921513 [details] [diff] [review] patch 2 - Create templated gfxFont::DrawGlyphs and DrawOneGlyph methods to allow parts of their complexity to be optimized away at compile time for the most common cases (no color glyphs, no inter-glyph spacing) Review of attachment 8921513 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/thebes/gfxFont.cpp @@ +1877,3 @@ > for (uint32_t i = 0; i < aCount; ++i, ++glyphData) { > if (glyphData->IsSimpleGlyph()) { > + float advance = glyphData->GetSimpleAdvance(); I didn't realize that the this logic was as complicated as it was: if (runParams.isRTL) { aPt->x -= aAdvance; glyphX = aPt->x; } else { glyphX = aPt->x; aPt->x += aAdvance; } Just doing it inline with a well predicted branch is probably going to be cheaper than doing the two multiplies. @@ +1898,4 @@ > } > } else { > + gfx::Point glyphPt(*aPt); > + glyphPt.x += hOffsetFactor * details->mXOffset + I guess the same branch vs. multiply situation exists here. Would it be possible to do the swap of details->mYOffset, details->mXOffset when we do the shaping instead of inside this loop?
Attachment #8921513 - Attachment is obsolete: false
Comment on attachment 8921635 [details] [diff] [review] patch 3 - Take GlyphBufferAzure::Flush() out of the inner loop of DrawGlyphs by pre-allocating capacity for a run of glyphs to be appended Review of attachment 8921635 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/thebes/gfxFont.cpp @@ +1615,5 @@ > + // Ensure the buffer has enough space for aGlyphCount glyphs to be added. > + // This MUST be called before OutputGlyph is used to actually store glyph > + // records in the buffer. It may be called repeated to add further capacity > + // in case we don't know up-front exactly what will be needed. > + void AddCapacity(uint32_t aGlyphCount) At this point is there any reason not to just use an AutoTArray instead of homegrowing it?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #19) > At this point is there any reason not to just use an AutoTArray instead of > homegrowing it? I did that initially, and it seemed fine; but then I realized it'll fail in a rare edge case. If we use an AutoTArray, we'll want to avoid using SetLength or AppendElements to allocate the space we actually need, because those methods will default-construct the new elements; so instead, the idea was to use SetCapacity to reserve the necessary buffer space without default-constructing all the Glyph records. Then we can get the Elements() pointer to the buffer and write our glyph data directly there. And when we find that we need additional space because a single CompressedGlyph in the textrun actually points to a list of multiple glyphs, we use SetCapacity again to enlarge the array's buffer. This buffer growth would work fine as long as it still fits within the array's inline buffer (we just use a bit more of it), or if the buffer was already heap-allocated (in which case AutoTArray will use realloc(), which preserves the contents). But the problem will arise when the initial capacity we allocate (based on the length of the run) fits within the array's inline buffer, but subsequently we need to extend the buffer, and it exceeds the inline buffer and needs to reallocate to the heap. Because we haven't "officially" filled the array (with SetLength or AppendElements), its mLength is still zero; as far as it's concerned the buffer is just uninitialized space, and when it switches to use a malloc'd buffer it won't copy the existing contents.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #18) > Would it be > possible to do the swap of details->mYOffset, details->mXOffset when we do > the shaping instead of inside this loop? I've wondered if we should do that, but I'd prefer to tackle it as a separate bug; we'll have to make sure we handle it consistently throughout shaping/measurement/drawing, so it's not as localized as the rest of the optimization here.
OK, I switched this back to if-branches instead of paired multiplications; as you note, they should get predicted with pretty high accuracy, so probably come fairly cheap.
Attachment #8921970 - Flags: review?(jmuizelaar)
Attachment #8921513 - Attachment is obsolete: true
Attachment #8921634 - Attachment is obsolete: true
Attachment #8921634 - Flags: review?(jmuizelaar)
Rebased for updated version of patch 2. I also did a little instrumentation, and it looks like the 3 patches here altogether give me around a 12-15% speedup for DrawGlyphs, on my macOS opt build. We may be able to get a little bit more by swapping the mXOffset/mYOffset fields of DetailedGlyph when shaping with vertical fonts, to avoid that test/branch in DrawOneGlyph, but OTOH that's only relevant when DetailedGlyph records are present, so it won't have any effect for the vast majority of simple text content.
Attachment #8921973 - Flags: review?(jmuizelaar)
Attachment #8921635 - Attachment is obsolete: true
Attachment #8921635 - Flags: review?(jmuizelaar)
(In reply to Jonathan Kew (:jfkthame) from comment #20) > (In reply to Jeff Muizelaar [:jrmuizel] from comment #19) > > At this point is there any reason not to just use an AutoTArray instead of > > homegrowing it? > > I did that initially, and it seemed fine; but then I realized it'll fail in > a rare edge case. > > If we use an AutoTArray, we'll want to avoid using SetLength or > AppendElements to allocate the space we actually need, because those methods > will default-construct the new elements; so instead, the idea was to use > SetCapacity to reserve the necessary buffer space without > default-constructing all the Glyph records. Then we can get the Elements() > pointer to the buffer and write our glyph data directly there. And when we > find that we need additional space because a single CompressedGlyph in the > textrun actually points to a list of multiple glyphs, we use SetCapacity > again to enlarge the array's buffer. > > This buffer growth would work fine as long as it still fits within the > array's inline buffer (we just use a bit more of it), or if the buffer was > already heap-allocated (in which case AutoTArray will use realloc(), which > preserves the contents). But the problem will arise when the initial > capacity we allocate (based on the length of the run) fits within the > array's inline buffer, but subsequently we need to extend the buffer, and it > exceeds the inline buffer and needs to reallocate to the heap. Because we > haven't "officially" filled the array (with SetLength or AppendElements), > its mLength is still zero; as far as it's concerned the buffer is just > uninitialized space, and when it switches to use a malloc'd buffer it won't > copy the existing contents. Ok. That makes sense. Can you add some of this information as comments?
Attachment #8921970 - Flags: review?(jmuizelaar) → review+
Attachment #8921973 - Flags: review?(jmuizelaar) → review+
Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/d1f4b0f6182e patch 1 - Use single-precision floats more consistently for glyph position computations when drawing text. r=jrmuizel https://hg.mozilla.org/integration/mozilla-inbound/rev/fb1fc33b95a4 patch 2 - Create templated gfxFont::DrawGlyphs and DrawOneGlyph methods to allow parts of their complexity to be optimized away at compile time for the most common cases (no color glyphs, no inter-glyph spacing). r=jrmuizel https://hg.mozilla.org/integration/mozilla-inbound/rev/d1c3d56beb1c patch 3 - Take GlyphBufferAzure::Flush() out of the inner loop of DrawGlyphs by pre-allocating capacity for a run of glyphs to be appended. r=jrmuizel
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
There's still some float/double conversion happening here: #define ToDeviceUnits(aAppUnits, aDevUnitsPerAppUnit) \ (double(aAppUnits)*double(aDevUnitsPerAppUnit))
Indeed, there are doubles hiding in there. Also a couple of fields in the parameter structs that we should change to floats, so they don't keep polluting our drawing calculations with unnecessary doubleness. Here's a simple followup to handle these.
Attachment #8922423 - Flags: review?(jmuizelaar)
Comment on attachment 8922423 [details] [diff] [review] followup - Change a few more doubles to floats in the gfxFont drawing code paths Review of attachment 8922423 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/thebes/gfxFont.cpp @@ +1872,5 @@ > // short-circuit the most common case to avoid sqrt() and division > return 1.0; > } > > + gfx::Float m = sqrt(t.width * t.width + t.height * t.height); sqrtf?
Attachment #8922423 - Flags: review?(jmuizelaar) → review+
(In reply to Jeff Muizelaar [:jrmuizel] from comment #29) > > + gfx::Float m = sqrt(t.width * t.width + t.height * t.height); > > sqrtf? Good point, will fix.
Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/0998d26202bd followup - Change a few more doubles to floats in the gfxFont drawing code paths. r=jrmuizel
Oops. :( So we finally reached a point where reducing the precision affected rendering - in particular, there are some glyphs shifted by a pixel in transformed elements. OK, I'll check with tryserver to see if we can be more selective here.
Flags: needinfo?(jfkthame)
Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/f25bb9912fdf followup - Change a few more doubles to floats in the gfxFont drawing code paths. r=jrmuizel
Re-landed the followup without the changes that appear to affect glyph spacing (so we still use double-precision during conversions between appUnits and device pixels). Maybe we can get rid of more of that later, but it'll require more careful attention to rounding, or something.
Depends on: 1517551
Regressions: 1626978
No longer depends on: 1517551
Regressions: 1517551
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: