Closed
Bug 1408612
Opened 7 years ago
Closed 7 years ago
Reorganize gfxFont::DrawOneGlyph to go fast
Categories
(Core :: Layout: Text and Fonts, enhancement)
Core
Layout: Text and Fonts
Tracking
()
RESOLVED
FIXED
mozilla58
Tracking | Status | |
---|---|---|
firefox58 | --- | fixed |
People
(Reporter: jrmuizel, Assigned: jfkthame)
References
(Regressed 1 open bug)
Details
Attachments
(4 files, 8 obsolete files)
55.26 KB,
patch
|
Details | Diff | Splinter Review | |
20.84 KB,
patch
|
jrmuizel
:
review+
|
Details | Diff | Splinter Review |
8.20 KB,
patch
|
jrmuizel
:
review+
|
Details | Diff | Splinter Review |
5.48 KB,
patch
|
jrmuizel
:
review+
|
Details | Diff | Splinter Review |
In service of being able to record text as fast as possible it looks like there are some things that can be done to improve it's speed.
1. There are some conditions that we test per glyph that are actually not per glyph:
if (fontParams.isVerticalFont)
if (runParams.isRTL) {
if (fontParams.haveSVGGlyphs) {
if (fontParams.haveColorGlyphs &&
for (int32_t i = 0; i < fontParams.extraStrikes; ++i) {
2. Some of the glyph position math may be able to folded together
3. We do some math in double precision and some in single precision
4. Ideally we could come up with something that could be inlined into the caller so that we don't have to do a call per glyph
Reporter | ||
Comment 1•7 years ago
|
||
Jonathan, do you have time to look at speeding this up?
Flags: needinfo?(jfkthame)
Assignee | ||
Comment 2•7 years ago
|
||
OK, I'll have a look at this and see what we can squeeze...
The mix between float (which Moz2D likes to use) and gfxFloat (i.e. double) has always seemed unfortunate. I guess we should be able to just use float throughout here.
Some things (like checking .haveSVGGlyphs and .haveColorGlyphs) are there because the presence of an SVG or color rendering for the glyph is a per-glyph question, but if we know the font doesn't have such tables at all, we can skip the (more expensive) check for whether this specific glyph has either of them. It might be worth hoisting these conditions out, though, and having a simplified DrawOneSimpleGlyph that we can use for the (common) case where neither is present, so these tests would normally be done just once at the glyphRun level.
And that reduced-functionality method should be easily inline-able, too.
Flags: needinfo?(jfkthame)
Reporter | ||
Comment 3•7 years ago
|
||
It would also be nice to move the call to Flush() out of DrawOneGlyph() and into the caller as we should be able to know ahead of time how many glyphs we'll draw.
Assignee | ||
Comment 4•7 years ago
|
||
I'm not sure that's true. We know the length of the range we're drawing from the textrun, which is the number of CompressedGlyph records we're going to process. But the number of glyphs we draw may be greater than that, because a single CompressedGlyph may actually reference a sequence of glyphs (due to complex font shaping that can map one input character to multiple glyphs).
Assignee | ||
Comment 5•7 years ago
|
||
Here's a first step, trying to use single-precision floats more consistently for positioning computations in the glyph-drawing code.
Assignee | ||
Comment 6•7 years ago
|
||
And reworking DrawGlyphs to use a simplified version of DrawOneGlyph in the most common case.
Assignee | ||
Comment 7•7 years ago
|
||
Jeff, can you try profiling with these patches and see if you can detect a difference compared to previously?
Flags: needinfo?(jmuizelaar)
Assignee | ||
Comment 8•7 years ago
|
||
We can avoid calling Flush() for every glyph if we pre-allocate space for a run of glyphs, and ensure the buffer is big enough to accept the upcoming set of OutputGlyph calls without overflowing.
Assignee | ||
Comment 9•7 years ago
|
||
Minor update to patch 3, to avoid hitting assertions in debug-webrender reftests due to trying to Flush when the buffer is empty.
Assignee | ||
Updated•7 years ago
|
Attachment #8921029 -
Attachment is obsolete: true
Assignee | ||
Updated•7 years ago
|
Attachment #8920372 -
Flags: review?(jmuizelaar)
Assignee | ||
Updated•7 years ago
|
Attachment #8920373 -
Flags: review?(jmuizelaar)
Assignee | ||
Updated•7 years ago
|
Attachment #8921048 -
Flags: review?(jmuizelaar)
Reporter | ||
Comment 10•7 years ago
|
||
Comment on attachment 8920372 [details] [diff] [review]
patch 1 - Use single-precision floats more consistently for glyph position computations when drawing text
Review of attachment 8920372 [details] [diff] [review]:
-----------------------------------------------------------------
So the type inside of gfx::Point is actually 'Float' so there a places where you could use that instead of 'float'. In reality we're probably unlikely to change the definition of 'Float' to anything other than 'float' so I don't feel strongly about this at all.
I haven't tested the performance difference of this but it seems like a good change to make for consistencies sake anyways.
Attachment #8920372 -
Flags: review?(jmuizelaar) → review+
Reporter | ||
Comment 11•7 years ago
|
||
Comment on attachment 8920373 [details] [diff] [review]
patch 2 - Use a simplified FastDrawGlyph method in place of DrawOneGlyph when color glyphs and other complexities are not present
Review of attachment 8920373 [details] [diff] [review]:
-----------------------------------------------------------------
I'll try run some performance tests on this code today.
::: gfx/thebes/gfxFont.cpp
@@ +2009,5 @@
> + return false;
> + }
> + } else {
> + gfx::Point glyphPt(*aPt);
> + if (aFontParams.isVerticalFont) {
We can pull this out of the loop by having a float offsetX, offsetY variables that are set before the loop.
@@ +2029,5 @@
> + }
> + // We can't use the simplified loop if there is inter-glyph spacing
> + // to apply, but can still use the FastDrawGlyph method for each
> + // individual glyph.
> + drawGlyphFunc = &gfxFont::FastDrawGlyph;
It's not great to add an indirect call here. What if we template the code instead?
Assignee | ||
Comment 12•7 years ago
|
||
Here's a version of patch 2 that uses templated DrawGlyphs and DrawOneGlyph methods, where the compiler should be able to optimize away the extra work from the versions used for "simple" fonts (no color, no fake-bold) and/or when there is no added inter-glyph spacing. I was initially a bit hesitant to go this way due to the likely increase in compiled-code size, but it doesn't seem too bad, and helps minimize the number of tests/branches that we'll hit in the inner loop of glyph rendering.
Attachment #8921513 -
Flags: review?(jmuizelaar)
Assignee | ||
Updated•7 years ago
|
Attachment #8920373 -
Attachment is obsolete: true
Attachment #8920373 -
Flags: review?(jmuizelaar)
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
Assignee | ||
Comment 13•7 years ago
|
||
Rebased for the updated version of patch 2.
Attachment #8921515 -
Flags: review?(jmuizelaar)
Assignee | ||
Updated•7 years ago
|
Attachment #8921048 -
Attachment is obsolete: true
Attachment #8921048 -
Flags: review?(jmuizelaar)
Reporter | ||
Comment 14•7 years ago
|
||
I wasn't able to apply these patches what revision are they against?
Flags: needinfo?(jmuizelaar)
Assignee | ||
Comment 15•7 years ago
|
||
Rebased to current mozilla-central tip (967c95cee709).
Assignee | ||
Updated•7 years ago
|
Attachment #8920372 -
Attachment is obsolete: true
Assignee | ||
Comment 16•7 years ago
|
||
Attachment #8921634 -
Flags: review?(jmuizelaar)
Assignee | ||
Updated•7 years ago
|
Attachment #8921513 -
Attachment is obsolete: true
Attachment #8921513 -
Flags: review?(jmuizelaar)
Assignee | ||
Comment 17•7 years ago
|
||
Attachment #8921635 -
Flags: review?(jmuizelaar)
Assignee | ||
Updated•7 years ago
|
Attachment #8921515 -
Attachment is obsolete: true
Attachment #8921515 -
Flags: review?(jmuizelaar)
Reporter | ||
Comment 18•7 years ago
|
||
Comment on attachment 8921513 [details] [diff] [review]
patch 2 - Create templated gfxFont::DrawGlyphs and DrawOneGlyph methods to allow parts of their complexity to be optimized away at compile time for the most common cases (no color glyphs, no inter-glyph spacing)
Review of attachment 8921513 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/thebes/gfxFont.cpp
@@ +1877,3 @@
> for (uint32_t i = 0; i < aCount; ++i, ++glyphData) {
> if (glyphData->IsSimpleGlyph()) {
> + float advance = glyphData->GetSimpleAdvance();
I didn't realize that the this logic was as complicated as it was:
if (runParams.isRTL) {
aPt->x -= aAdvance;
glyphX = aPt->x;
} else {
glyphX = aPt->x;
aPt->x += aAdvance;
}
Just doing it inline with a well predicted branch is probably going to be cheaper than doing the two multiplies.
@@ +1898,4 @@
> }
> } else {
> + gfx::Point glyphPt(*aPt);
> + glyphPt.x += hOffsetFactor * details->mXOffset +
I guess the same branch vs. multiply situation exists here. Would it be possible to do the swap of details->mYOffset, details->mXOffset when we do the shaping instead of inside this loop?
Attachment #8921513 -
Attachment is obsolete: false
Reporter | ||
Comment 19•7 years ago
|
||
Comment on attachment 8921635 [details] [diff] [review]
patch 3 - Take GlyphBufferAzure::Flush() out of the inner loop of DrawGlyphs by pre-allocating capacity for a run of glyphs to be appended
Review of attachment 8921635 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/thebes/gfxFont.cpp
@@ +1615,5 @@
> + // Ensure the buffer has enough space for aGlyphCount glyphs to be added.
> + // This MUST be called before OutputGlyph is used to actually store glyph
> + // records in the buffer. It may be called repeated to add further capacity
> + // in case we don't know up-front exactly what will be needed.
> + void AddCapacity(uint32_t aGlyphCount)
At this point is there any reason not to just use an AutoTArray instead of homegrowing it?
Assignee | ||
Comment 20•7 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #19)
> At this point is there any reason not to just use an AutoTArray instead of
> homegrowing it?
I did that initially, and it seemed fine; but then I realized it'll fail in a rare edge case.
If we use an AutoTArray, we'll want to avoid using SetLength or AppendElements to allocate the space we actually need, because those methods will default-construct the new elements; so instead, the idea was to use SetCapacity to reserve the necessary buffer space without default-constructing all the Glyph records. Then we can get the Elements() pointer to the buffer and write our glyph data directly there. And when we find that we need additional space because a single CompressedGlyph in the textrun actually points to a list of multiple glyphs, we use SetCapacity again to enlarge the array's buffer.
This buffer growth would work fine as long as it still fits within the array's inline buffer (we just use a bit more of it), or if the buffer was already heap-allocated (in which case AutoTArray will use realloc(), which preserves the contents). But the problem will arise when the initial capacity we allocate (based on the length of the run) fits within the array's inline buffer, but subsequently we need to extend the buffer, and it exceeds the inline buffer and needs to reallocate to the heap. Because we haven't "officially" filled the array (with SetLength or AppendElements), its mLength is still zero; as far as it's concerned the buffer is just uninitialized space, and when it switches to use a malloc'd buffer it won't copy the existing contents.
Assignee | ||
Comment 21•7 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #18)
> Would it be
> possible to do the swap of details->mYOffset, details->mXOffset when we do
> the shaping instead of inside this loop?
I've wondered if we should do that, but I'd prefer to tackle it as a separate bug; we'll have to make sure we handle it consistently throughout shaping/measurement/drawing, so it's not as localized as the rest of the optimization here.
Assignee | ||
Comment 22•7 years ago
|
||
OK, I switched this back to if-branches instead of paired multiplications; as you note, they should get predicted with pretty high accuracy, so probably come fairly cheap.
Attachment #8921970 -
Flags: review?(jmuizelaar)
Assignee | ||
Updated•7 years ago
|
Attachment #8921513 -
Attachment is obsolete: true
Assignee | ||
Updated•7 years ago
|
Attachment #8921634 -
Attachment is obsolete: true
Attachment #8921634 -
Flags: review?(jmuizelaar)
Assignee | ||
Comment 23•7 years ago
|
||
Rebased for updated version of patch 2. I also did a little instrumentation, and it looks like the 3 patches here altogether give me around a 12-15% speedup for DrawGlyphs, on my macOS opt build. We may be able to get a little bit more by swapping the mXOffset/mYOffset fields of DetailedGlyph when shaping with vertical fonts, to avoid that test/branch in DrawOneGlyph, but OTOH that's only relevant when DetailedGlyph records are present, so it won't have any effect for the vast majority of simple text content.
Attachment #8921973 -
Flags: review?(jmuizelaar)
Assignee | ||
Updated•7 years ago
|
Attachment #8921635 -
Attachment is obsolete: true
Attachment #8921635 -
Flags: review?(jmuizelaar)
Reporter | ||
Comment 24•7 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #20)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #19)
> > At this point is there any reason not to just use an AutoTArray instead of
> > homegrowing it?
>
> I did that initially, and it seemed fine; but then I realized it'll fail in
> a rare edge case.
>
> If we use an AutoTArray, we'll want to avoid using SetLength or
> AppendElements to allocate the space we actually need, because those methods
> will default-construct the new elements; so instead, the idea was to use
> SetCapacity to reserve the necessary buffer space without
> default-constructing all the Glyph records. Then we can get the Elements()
> pointer to the buffer and write our glyph data directly there. And when we
> find that we need additional space because a single CompressedGlyph in the
> textrun actually points to a list of multiple glyphs, we use SetCapacity
> again to enlarge the array's buffer.
>
> This buffer growth would work fine as long as it still fits within the
> array's inline buffer (we just use a bit more of it), or if the buffer was
> already heap-allocated (in which case AutoTArray will use realloc(), which
> preserves the contents). But the problem will arise when the initial
> capacity we allocate (based on the length of the run) fits within the
> array's inline buffer, but subsequently we need to extend the buffer, and it
> exceeds the inline buffer and needs to reallocate to the heap. Because we
> haven't "officially" filled the array (with SetLength or AppendElements),
> its mLength is still zero; as far as it's concerned the buffer is just
> uninitialized space, and when it switches to use a malloc'd buffer it won't
> copy the existing contents.
Ok. That makes sense. Can you add some of this information as comments?
Reporter | ||
Updated•7 years ago
|
Attachment #8921970 -
Flags: review?(jmuizelaar) → review+
Reporter | ||
Updated•7 years ago
|
Attachment #8921973 -
Flags: review?(jmuizelaar) → review+
Comment 25•7 years ago
|
||
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d1f4b0f6182e
patch 1 - Use single-precision floats more consistently for glyph position computations when drawing text. r=jrmuizel
https://hg.mozilla.org/integration/mozilla-inbound/rev/fb1fc33b95a4
patch 2 - Create templated gfxFont::DrawGlyphs and DrawOneGlyph methods to allow parts of their complexity to be optimized away at compile time for the most common cases (no color glyphs, no inter-glyph spacing). r=jrmuizel
https://hg.mozilla.org/integration/mozilla-inbound/rev/d1c3d56beb1c
patch 3 - Take GlyphBufferAzure::Flush() out of the inner loop of DrawGlyphs by pre-allocating capacity for a run of glyphs to be appended. r=jrmuizel
Comment 26•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/d1f4b0f6182e
https://hg.mozilla.org/mozilla-central/rev/fb1fc33b95a4
https://hg.mozilla.org/mozilla-central/rev/d1c3d56beb1c
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox58:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Reporter | ||
Comment 27•7 years ago
|
||
There's still some float/double conversion happening here:
#define ToDeviceUnits(aAppUnits, aDevUnitsPerAppUnit) \
(double(aAppUnits)*double(aDevUnitsPerAppUnit))
Assignee | ||
Comment 28•7 years ago
|
||
Indeed, there are doubles hiding in there. Also a couple of fields in the parameter structs that we should change to floats, so they don't keep polluting our drawing calculations with unnecessary doubleness. Here's a simple followup to handle these.
Attachment #8922423 -
Flags: review?(jmuizelaar)
Reporter | ||
Comment 29•7 years ago
|
||
Comment on attachment 8922423 [details] [diff] [review]
followup - Change a few more doubles to floats in the gfxFont drawing code paths
Review of attachment 8922423 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/thebes/gfxFont.cpp
@@ +1872,5 @@
> // short-circuit the most common case to avoid sqrt() and division
> return 1.0;
> }
>
> + gfx::Float m = sqrt(t.width * t.width + t.height * t.height);
sqrtf?
Attachment #8922423 -
Flags: review?(jmuizelaar) → review+
Assignee | ||
Comment 30•7 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #29)
> > + gfx::Float m = sqrt(t.width * t.width + t.height * t.height);
>
> sqrtf?
Good point, will fix.
Comment 31•7 years ago
|
||
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0998d26202bd
followup - Change a few more doubles to floats in the gfxFont drawing code paths. r=jrmuizel
![]() |
||
Comment 32•7 years ago
|
||
Backed out for reftest failures e.g. layout/reftests/svg/text-gradient-01.svg and layout/reftests/writing-mode/1193519-sideways-lr-decoration-1.html:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a0871b1ac380a3a3f39ad1484e2e7ce75e965453
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=0998d26202bdeb091a3405db281cdf6e43d14d05&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=retry&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=139987022&repo=mozilla-inbound
Push which ran more failing jobs: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=3705f52f723481aa8f4b2cf781b539f6a48fc105&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=retry&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception
Flags: needinfo?(jfkthame)
Assignee | ||
Comment 33•7 years ago
|
||
Oops. :( So we finally reached a point where reducing the precision affected rendering - in particular, there are some glyphs shifted by a pixel in transformed elements. OK, I'll check with tryserver to see if we can be more selective here.
Flags: needinfo?(jfkthame)
Comment 34•7 years ago
|
||
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f25bb9912fdf
followup - Change a few more doubles to floats in the gfxFont drawing code paths. r=jrmuizel
Assignee | ||
Comment 35•7 years ago
|
||
Re-landed the followup without the changes that appear to affect glyph spacing (so we still use double-precision during conversions between appUnits and device pixels). Maybe we can get rid of more of that later, but it'll require more careful attention to rounding, or something.
![]() |
||
Comment 36•7 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•