Reorganize gfxFont::DrawOneGlyph to go fast

RESOLVED FIXED in Firefox 58

Status

()

Core
Layout: Text
RESOLVED FIXED
a month ago
21 days ago

People

(Reporter: jrmuizel, Assigned: jfkthame)

Tracking

(Blocks: 1 bug)

unspecified
mozilla58
Points:
---

Firefox Tracking Flags

(firefox58 fixed)

Details

Attachments

(4 attachments, 8 obsolete attachments)

55.26 KB, patch
Details | Diff | Splinter Review
20.84 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
8.20 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
5.48 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
(Reporter)

Description

a month ago
In service of being able to record text as fast as possible it looks like there are some things that can be done to improve it's speed.

1. There are some conditions that we test per glyph that are actually not per glyph:
if (fontParams.isVerticalFont)
if (runParams.isRTL) {
if (fontParams.haveSVGGlyphs) {
if (fontParams.haveColorGlyphs &&
for (int32_t i = 0; i < fontParams.extraStrikes; ++i) {

2. Some of the glyph position math may be able to folded together

3. We do some math in double precision and some in single precision

4. Ideally we could come up with something that could be inlined into the caller so that we don't have to do a call per glyph
(Reporter)

Updated

a month ago
Blocks: 1408500
(Reporter)

Comment 1

a month ago
Jonathan, do you have time to look at speeding this up?
Flags: needinfo?(jfkthame)
(Assignee)

Comment 2

a month ago
OK, I'll have a look at this and see what we can squeeze...

The mix between float (which Moz2D likes to use) and gfxFloat (i.e. double) has always seemed unfortunate. I guess we should be able to just use float throughout here.

Some things (like checking .haveSVGGlyphs and .haveColorGlyphs) are there because the presence of an SVG or color rendering for the glyph is a per-glyph question, but if we know the font doesn't have such tables at all, we can skip the (more expensive) check for whether this specific glyph has either of them. It might be worth hoisting these conditions out, though, and having a simplified DrawOneSimpleGlyph that we can use for the (common) case where neither is present, so these tests would normally be done just once at the glyphRun level.

And that reduced-functionality method should be easily inline-able, too.
Flags: needinfo?(jfkthame)
(Reporter)

Comment 3

a month ago
It would also be nice to move the call to Flush() out of DrawOneGlyph() and into the caller as we should be able to know ahead of time how many glyphs we'll draw.
(Assignee)

Comment 4

a month ago
I'm not sure that's true. We know the length of the range we're drawing from the textrun, which is the number of CompressedGlyph records we're going to process. But the number of glyphs we draw may be greater than that, because a single CompressedGlyph may actually reference a sequence of glyphs (due to complex font shaping that can map one input character to multiple glyphs).
(Assignee)

Comment 5

a month ago
Created attachment 8920372 [details] [diff] [review]
patch 1 - Use single-precision floats more consistently for glyph position computations when drawing text

Here's a first step, trying to use single-precision floats more consistently for positioning computations in the glyph-drawing code.
(Assignee)

Comment 6

a month ago
Created attachment 8920373 [details] [diff] [review]
patch 2 - Use a simplified FastDrawGlyph method in place of DrawOneGlyph when color glyphs and other complexities are not present

And reworking DrawGlyphs to use a simplified version of DrawOneGlyph in the most common case.
(Assignee)

Comment 7

a month ago
Jeff, can you try profiling with these patches and see if you can detect a difference compared to previously?
Flags: needinfo?(jmuizelaar)
(Assignee)

Comment 8

28 days ago
Created attachment 8921029 [details] [diff] [review]
patch 3 - Take GlyphBufferAzure::Flush() out of the inner loop of DrawGlyphs by pre-allocating capacity for a run of glyphs to be appended

We can avoid calling Flush() for every glyph if we pre-allocate space for a run of glyphs, and ensure the buffer is big enough to accept the upcoming set of OutputGlyph calls without overflowing.
(Assignee)

Comment 9

27 days ago
Created attachment 8921048 [details] [diff] [review]
patch 3 - Take GlyphBufferAzure::Flush() out of the inner loop of DrawGlyphs by pre-allocating capacity for a run of glyphs to be appended

Minor update to patch 3, to avoid hitting assertions in debug-webrender reftests due to trying to Flush when the buffer is empty.
(Assignee)

Updated

27 days ago
Attachment #8921029 - Attachment is obsolete: true
(Assignee)

Updated

27 days ago
Attachment #8920372 - Flags: review?(jmuizelaar)
(Assignee)

Updated

27 days ago
Attachment #8920373 - Flags: review?(jmuizelaar)
(Assignee)

Updated

27 days ago
Attachment #8921048 - Flags: review?(jmuizelaar)
(Reporter)

Comment 10

27 days ago
Comment on attachment 8920372 [details] [diff] [review]
patch 1 - Use single-precision floats more consistently for glyph position computations when drawing text

Review of attachment 8920372 [details] [diff] [review]:
-----------------------------------------------------------------

So the type inside of gfx::Point is actually 'Float' so there a places where you could use that instead of 'float'. In reality we're probably unlikely to change the definition of 'Float' to anything other than 'float' so I don't feel strongly about this at all.

I haven't tested the performance difference of this but it seems like a good change to make for consistencies sake anyways.
Attachment #8920372 - Flags: review?(jmuizelaar) → review+
(Reporter)

Comment 11

27 days ago
Comment on attachment 8920373 [details] [diff] [review]
patch 2 - Use a simplified FastDrawGlyph method in place of DrawOneGlyph when color glyphs and other complexities are not present

Review of attachment 8920373 [details] [diff] [review]:
-----------------------------------------------------------------

I'll try run some performance tests on this code today.

::: gfx/thebes/gfxFont.cpp
@@ +2009,5 @@
> +                                    return false;
> +                                }
> +                            } else {
> +                                gfx::Point glyphPt(*aPt);
> +                                if (aFontParams.isVerticalFont) {

We can pull this out of the loop by having a float offsetX, offsetY variables that are set before the loop.

@@ +2029,5 @@
> +        }
> +        // We can't use the simplified loop if there is inter-glyph spacing
> +        // to apply, but can still use the FastDrawGlyph method for each
> +        // individual glyph.
> +        drawGlyphFunc = &gfxFont::FastDrawGlyph;

It's not great to add an indirect call here. What if we template the code instead?
(Assignee)

Comment 12

26 days ago
Created attachment 8921513 [details] [diff] [review]
patch 2 - Create templated gfxFont::DrawGlyphs and DrawOneGlyph methods to allow parts of their complexity to be optimized away at compile time for the most common cases (no color glyphs, no inter-glyph spacing)

Here's a version of patch 2 that uses templated DrawGlyphs and DrawOneGlyph methods, where the compiler should be able to optimize away the extra work from the versions used for "simple" fonts (no color, no fake-bold) and/or when there is no added inter-glyph spacing. I was initially a bit hesitant to go this way due to the likely increase in compiled-code size, but it doesn't seem too bad, and helps minimize the number of tests/branches that we'll hit in the inner loop of glyph rendering.
Attachment #8921513 - Flags: review?(jmuizelaar)
(Assignee)

Updated

26 days ago
Attachment #8920373 - Attachment is obsolete: true
Attachment #8920373 - Flags: review?(jmuizelaar)
(Assignee)

Updated

26 days ago
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
(Assignee)

Comment 13

26 days ago
Created attachment 8921515 [details] [diff] [review]
patch 3 - Take GlyphBufferAzure::Flush() out of the inner loop of DrawGlyphs by pre-allocating capacity for a run of glyphs to be appended

Rebased for the updated version of patch 2.
Attachment #8921515 - Flags: review?(jmuizelaar)
(Assignee)

Updated

26 days ago
Attachment #8921048 - Attachment is obsolete: true
Attachment #8921048 - Flags: review?(jmuizelaar)
(Reporter)

Comment 14

26 days ago
I wasn't able to apply these patches what revision are they against?
Flags: needinfo?(jmuizelaar)
(Assignee)

Comment 15

26 days ago
Created attachment 8921633 [details] [diff] [review]
patch 1 - Use single-precision floats more consistently for glyph position computations when drawing text

Rebased to current mozilla-central tip (967c95cee709).
(Assignee)

Updated

26 days ago
Attachment #8920372 - Attachment is obsolete: true
(Assignee)

Comment 16

26 days ago
Created attachment 8921634 [details] [diff] [review]
patch 2 - Create templated gfxFont::DrawGlyphs and DrawOneGlyph methods to allow parts of their complexity to be optimized away at compile time for the most common cases (no color glyphs, no inter-glyph spacing)
Attachment #8921634 - Flags: review?(jmuizelaar)
(Assignee)

Updated

26 days ago
Attachment #8921513 - Attachment is obsolete: true
Attachment #8921513 - Flags: review?(jmuizelaar)
(Assignee)

Comment 17

26 days ago
Created attachment 8921635 [details] [diff] [review]
patch 3 - Take GlyphBufferAzure::Flush() out of the inner loop of DrawGlyphs by pre-allocating capacity for a run of glyphs to be appended
Attachment #8921635 - Flags: review?(jmuizelaar)
(Assignee)

Updated

26 days ago
Attachment #8921515 - Attachment is obsolete: true
Attachment #8921515 - Flags: review?(jmuizelaar)
(Reporter)

Comment 18

26 days ago
Comment on attachment 8921513 [details] [diff] [review]
patch 2 - Create templated gfxFont::DrawGlyphs and DrawOneGlyph methods to allow parts of their complexity to be optimized away at compile time for the most common cases (no color glyphs, no inter-glyph spacing)

Review of attachment 8921513 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxFont.cpp
@@ +1877,3 @@
>      for (uint32_t i = 0; i < aCount; ++i, ++glyphData) {
>          if (glyphData->IsSimpleGlyph()) {
> +            float advance = glyphData->GetSimpleAdvance();

I didn't realize that the this logic was as complicated as it was:
if (runParams.isRTL) {	
            aPt->x -= aAdvance;	
            glyphX = aPt->x;	
} else {	
            glyphX = aPt->x;	
            aPt->x += aAdvance;	
}

Just doing it inline with a well predicted branch is probably going to be cheaper than doing the two multiplies.

@@ +1898,4 @@
>                          }
>                      } else {
> +                        gfx::Point glyphPt(*aPt);
> +                        glyphPt.x += hOffsetFactor * details->mXOffset +

I guess the same branch vs. multiply situation exists here. Would it be possible to do the swap of details->mYOffset, details->mXOffset when we do the shaping instead of inside this loop?
Attachment #8921513 - Attachment is obsolete: false
(Reporter)

Comment 19

26 days ago
Comment on attachment 8921635 [details] [diff] [review]
patch 3 - Take GlyphBufferAzure::Flush() out of the inner loop of DrawGlyphs by pre-allocating capacity for a run of glyphs to be appended

Review of attachment 8921635 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxFont.cpp
@@ +1615,5 @@
> +    // Ensure the buffer has enough space for aGlyphCount glyphs to be added.
> +    // This MUST be called before OutputGlyph is used to actually store glyph
> +    // records in the buffer. It may be called repeated to add further capacity
> +    // in case we don't know up-front exactly what will be needed.
> +    void AddCapacity(uint32_t aGlyphCount)

At this point is there any reason not to just use an AutoTArray instead of homegrowing it?
(Assignee)

Comment 20

26 days ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #19)
> At this point is there any reason not to just use an AutoTArray instead of
> homegrowing it?

I did that initially, and it seemed fine; but then I realized it'll fail in a rare edge case.

If we use an AutoTArray, we'll want to avoid using SetLength or AppendElements to allocate the space we actually need, because those methods will default-construct the new elements; so instead, the idea was to use SetCapacity to reserve the necessary buffer space without default-constructing all the Glyph records. Then we can get the Elements() pointer to the buffer and write our glyph data directly there. And when we find that we need additional space because a single CompressedGlyph in the textrun actually points to a list of multiple glyphs, we use SetCapacity again to enlarge the array's buffer.

This buffer growth would work fine as long as it still fits within the array's inline buffer (we just use a bit more of it), or if the buffer was already heap-allocated (in which case AutoTArray will use realloc(), which preserves the contents). But the problem will arise when the initial capacity we allocate (based on the length of the run) fits within the array's inline buffer, but subsequently we need to extend the buffer, and it exceeds the inline buffer and needs to reallocate to the heap. Because we haven't "officially" filled the array (with SetLength or AppendElements), its mLength is still zero; as far as it's concerned the buffer is just uninitialized space, and when it switches to use a malloc'd buffer it won't copy the existing contents.
(Assignee)

Comment 21

26 days ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #18)
> Would it be
> possible to do the swap of details->mYOffset, details->mXOffset when we do
> the shaping instead of inside this loop?

I've wondered if we should do that, but I'd prefer to tackle it as a separate bug; we'll have to make sure we handle it consistently throughout shaping/measurement/drawing, so it's not as localized as the rest of the optimization here.
(Assignee)

Comment 22

25 days ago
Created attachment 8921970 [details] [diff] [review]
patch 2 - Create templated gfxFont::DrawGlyphs and DrawOneGlyph methods to allow parts of their complexity to be optimized away at compile time for the most common cases (no color glyphs, no inter-glyph spacing)

OK, I switched this back to if-branches instead of paired multiplications; as you note, they should get predicted with pretty high accuracy, so probably come fairly cheap.
Attachment #8921970 - Flags: review?(jmuizelaar)
(Assignee)

Updated

25 days ago
Attachment #8921513 - Attachment is obsolete: true
(Assignee)

Updated

25 days ago
Attachment #8921634 - Attachment is obsolete: true
Attachment #8921634 - Flags: review?(jmuizelaar)
(Assignee)

Comment 23

25 days ago
Created attachment 8921973 [details] [diff] [review]
patch 3 - Take GlyphBufferAzure::Flush() out of the inner loop of DrawGlyphs by pre-allocating capacity for a run of glyphs to be appended

Rebased for updated version of patch 2. I also did a little instrumentation, and it looks like the 3 patches here altogether give me around a 12-15% speedup for DrawGlyphs, on my macOS opt build. We may be able to get a little bit more by swapping the mXOffset/mYOffset fields of DetailedGlyph when shaping with vertical fonts, to avoid that test/branch in DrawOneGlyph, but OTOH that's only relevant when DetailedGlyph records are present, so it won't have any effect for the vast majority of simple text content.
Attachment #8921973 - Flags: review?(jmuizelaar)
(Assignee)

Updated

25 days ago
Attachment #8921635 - Attachment is obsolete: true
Attachment #8921635 - Flags: review?(jmuizelaar)
(Reporter)

Comment 24

25 days ago
(In reply to Jonathan Kew (:jfkthame) from comment #20)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #19)
> > At this point is there any reason not to just use an AutoTArray instead of
> > homegrowing it?
> 
> I did that initially, and it seemed fine; but then I realized it'll fail in
> a rare edge case.
> 
> If we use an AutoTArray, we'll want to avoid using SetLength or
> AppendElements to allocate the space we actually need, because those methods
> will default-construct the new elements; so instead, the idea was to use
> SetCapacity to reserve the necessary buffer space without
> default-constructing all the Glyph records. Then we can get the Elements()
> pointer to the buffer and write our glyph data directly there. And when we
> find that we need additional space because a single CompressedGlyph in the
> textrun actually points to a list of multiple glyphs, we use SetCapacity
> again to enlarge the array's buffer.
> 
> This buffer growth would work fine as long as it still fits within the
> array's inline buffer (we just use a bit more of it), or if the buffer was
> already heap-allocated (in which case AutoTArray will use realloc(), which
> preserves the contents). But the problem will arise when the initial
> capacity we allocate (based on the length of the run) fits within the
> array's inline buffer, but subsequently we need to extend the buffer, and it
> exceeds the inline buffer and needs to reallocate to the heap. Because we
> haven't "officially" filled the array (with SetLength or AppendElements),
> its mLength is still zero; as far as it's concerned the buffer is just
> uninitialized space, and when it switches to use a malloc'd buffer it won't
> copy the existing contents.

Ok. That makes sense. Can you add some of this information as comments?
(Reporter)

Updated

25 days ago
Attachment #8921970 - Flags: review?(jmuizelaar) → review+
(Reporter)

Updated

25 days ago
Attachment #8921973 - Flags: review?(jmuizelaar) → review+

Comment 25

25 days ago
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d1f4b0f6182e
patch 1 - Use single-precision floats more consistently for glyph position computations when drawing text. r=jrmuizel
https://hg.mozilla.org/integration/mozilla-inbound/rev/fb1fc33b95a4
patch 2 - Create templated gfxFont::DrawGlyphs and DrawOneGlyph methods to allow parts of their complexity to be optimized away at compile time for the most common cases (no color glyphs, no inter-glyph spacing). r=jrmuizel
https://hg.mozilla.org/integration/mozilla-inbound/rev/d1c3d56beb1c
patch 3 - Take GlyphBufferAzure::Flush() out of the inner loop of DrawGlyphs by pre-allocating capacity for a run of glyphs to be appended. r=jrmuizel

Comment 26

25 days ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/d1f4b0f6182e
https://hg.mozilla.org/mozilla-central/rev/fb1fc33b95a4
https://hg.mozilla.org/mozilla-central/rev/d1c3d56beb1c
Status: ASSIGNED → RESOLVED
Last Resolved: 25 days ago
status-firefox58: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
(Reporter)

Comment 27

24 days ago
There's still some float/double conversion happening here:
#define ToDeviceUnits(aAppUnits, aDevUnitsPerAppUnit) \
    (double(aAppUnits)*double(aDevUnitsPerAppUnit))
(Assignee)

Comment 28

24 days ago
Created attachment 8922423 [details] [diff] [review]
followup - Change a few more doubles to floats in the gfxFont drawing code paths

Indeed, there are doubles hiding in there. Also a couple of fields in the parameter structs that we should change to floats, so they don't keep polluting our drawing calculations with unnecessary doubleness. Here's a simple followup to handle these.
Attachment #8922423 - Flags: review?(jmuizelaar)
(Reporter)

Comment 29

24 days ago
Comment on attachment 8922423 [details] [diff] [review]
followup - Change a few more doubles to floats in the gfxFont drawing code paths

Review of attachment 8922423 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxFont.cpp
@@ +1872,5 @@
>          // short-circuit the most common case to avoid sqrt() and division
>          return 1.0;
>      }
>  
> +    gfx::Float m = sqrt(t.width * t.width + t.height * t.height);

sqrtf?
Attachment #8922423 - Flags: review?(jmuizelaar) → review+
(Assignee)

Comment 30

24 days ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #29)
> > +    gfx::Float m = sqrt(t.width * t.width + t.height * t.height);
> 
> sqrtf?

Good point, will fix.

Comment 31

24 days ago
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0998d26202bd
followup - Change a few more doubles to floats in the gfxFont drawing code paths. r=jrmuizel
Backed out for reftest failures e.g. layout/reftests/svg/text-gradient-01.svg and layout/reftests/writing-mode/1193519-sideways-lr-decoration-1.html:

https://hg.mozilla.org/integration/mozilla-inbound/rev/a0871b1ac380a3a3f39ad1484e2e7ce75e965453

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=0998d26202bdeb091a3405db281cdf6e43d14d05&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=retry&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=139987022&repo=mozilla-inbound

Push which ran more failing jobs: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=3705f52f723481aa8f4b2cf781b539f6a48fc105&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=retry&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception
Flags: needinfo?(jfkthame)
(Assignee)

Comment 33

24 days ago
Oops. :( So we finally reached a point where reducing the precision affected rendering - in particular, there are some glyphs shifted by a pixel in transformed elements. OK, I'll check with tryserver to see if we can be more selective here.
Flags: needinfo?(jfkthame)

Comment 34

23 days ago
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f25bb9912fdf
followup - Change a few more doubles to floats in the gfxFont drawing code paths. r=jrmuizel
(Assignee)

Comment 35

23 days ago
Re-landed the followup without the changes that appear to affect glyph spacing (so we still use double-precision during conversions between appUnits and device pixels). Maybe we can get rid of more of that later, but it'll require more careful attention to rounding, or something.
https://hg.mozilla.org/mozilla-central/rev/f25bb9912fdf

Comment 37

21 days ago
16 failures in 912 pushes (0.018 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-inbound: 16

Platform breakdown:
* linux64: 5
* linux32: 3
* osx-10-10: 2
* linux64-stylo-disabled: 2
* linux64-qr: 2
* android-4-3-armv7-api16: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1408612&startday=2017-10-23&endday=2017-10-29&tree=all
You need to log in before you can comment on or make changes to this bug.