Text Zooming 4x slower after landing bug 362682

RESOLVED FIXED in mozilla1.9.1b1

Status

()

Core
Graphics
P3
normal
RESOLVED FIXED
10 years ago
9 years ago

People

(Reporter: romaxa, Assigned: karlt)

Tracking

(Blocks: 1 bug, {perf, regression})

Trunk
mozilla1.9.1b1
x86
Linux
perf, regression
Points:
---
Dependency tree / graph
Bug Flags:
wanted-next +
blocking1.9 -

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(4 attachments, 3 obsolete attachments)

(Reporter)

Description

10 years ago
Created attachment 288486 [details] [diff] [review]
Patch backout of bug 362682.

Ibm T43, No HW Video Acceleration, 800MHz CPU.

Take "bug 362682 patch" updated to trunk in attachments.
Build latest trunk seamonkey with and without 362682 fix

Time for zooming on wikipedia.org page (single press Ctrl+"+"):
Without  bug 362682  fix   : ~3 sec.
With bug 362682 fix (Trunk): ~12 sec.

See profile data in attachments.
(Reporter)

Comment 1

10 years ago
Created attachment 288487 [details]
Profile data for current trunk build (12 sec)
(Reporter)

Comment 2

10 years ago
Created attachment 288488 [details]
Profile data for patched trunk build (~3 sec)

Profile data for disabled bug 362682 fix.
(Reporter)

Updated

10 years ago
Keywords: regression

Comment 3

10 years ago
Comment on attachment 288486 [details] [diff] [review]
Patch backout of bug 362682.

We aren't backing out pango or fixes to make it work.
Attachment #288486 - Attachment is obsolete: true
(Reporter)

Comment 4

10 years ago
?? I'm not asking to backout pango or any other fixes... this patch just for building and comparing performance...


We're spending at least 4x as long as we used to in FcFontSetSortDestroy.  Moving up the call tree, we're spending 3x as long in pango_fc_font_map_get_type, which is the main place where time is spent, actually (about 50% of the pageload).  Further up the tree, we're getting here from pango_itemize_with_base_dir.

It looks like we have about 40% more calls into pango_itemize_with_base_dir
from gfxPangoFontGroup::CreateGlyphRunsItemizing.  The actualy time under gfxPangoFontGroup::CreateGlyphRunsItemizing is about 2.5x as much, largely due to the added gfxPangoFont::GetMetrics calls.

We also have a new caller of pango_itemize_with_base_dir: pango_layout_iter_get_char_extents.  This actually accounts for something like 70% of the calls to pango_itemize_with_base_dir.  This is called from pango_font_get_metrics.  In general, pango_font_get_metrics is over half the time in this profile...
Flags: blocking1.9?
Keywords: perf

Comment 6

10 years ago
behdad: Sounds like pango is doing a lot more work than our earlier code.  Any ideas here?  Have any of these things been fixed in more recent versions of pango?

Comment 7

10 years ago
1) What version of pango is it with?
2) Does using pango-1.18.3 make it faster?
3) Are there hexboxes on the page?
4) I have a hard time processing the profile data.  Can someone point out which calls in mozilla code base are causing what?
Attachment #288487 - Attachment mime type: application/octet-stream → application/x-bzip2
Attachment #288488 - Attachment mime type: application/octet-stream → application/x-bzip2
I think romaxa will have to field questions 1-3.

For question 4, we have the following:

nsThebesFontMetrics::GetMaxAscent and nsThebesFontMetrics::GetExternalLeading
   call
nsThebesFontMetrics::GetMetrics calls
gfxPangoFont::GetMetrics() calls
pango_font_get_metrics

gfxPangoFontGroup::CreateGlyphRunsItemizing calls
gfxPangoFont::GetMetrics calls
pango_font_get_metrics

gfxPangoFontGroup::CreateGlyphRunsItemizing calls
pango_itemize_with_base_dir

pango_font_get_metrics largely ends up in pango_itemize_with_base_dir as well.

As I said in comment 5, we seem to be spending a lot of time destroying stuff in FcFontSetSort.  About 30% of the non-idle time in this profile, in fact.

In general, here's how the profile data works:

              43969 gfxPangoFont::GetMetrics()
200725   0    43969 pango_font_get_metrics
              43968 pango_cairo_font_get_type
                  1 pango_font_get_font_map

The 200725 is a sequence number and can be ignored.  For the rest:
0 is the number of ticks for which we were in the pango_font_get_metrics function itself.  The 43969 next to it is the number of ticks for which we were under the function.  43968 of those we were calling pango_cairo_font_get_type from pango_font_get_metrics.  The remaining 1, we were calling pango_font_get_font_map.

The only caller of pango_font_get_metrics is gfxPangoFont::GetMetrics, so all  43969 ticks involved we were coming from that.  In general, there would be multiple callers, each with its own fraction of the total calls.

Comment 9

10 years ago
Ok, this all means we are not correctly reusing fonts and creating new ones again and again.  No idea why.  It may be because of:


        /* look up the gfxPangoFont from the PangoFont */
        // XXX we need a function to do this.. until then do this
        // behdad: use g_object_[sg]et_qdata() for it.
        PangoFontDescription *d = pango_font_describe(item->analysis.font);
        nsRefPtr<gfxPangoFont> font = GetOrMakeFont(NS_ConvertUTF8toUTF16(pango_font_description_get_family(d)), GetStyle());


Can someone put a couple of print statements to make sure we are getting back the same PangoFont attached to gfxPangoFont?
(Assignee)

Comment 10

10 years ago
If that's the issue then this depends on bug 401988.

Does the zooming get faster, when repeating the zoom (back to a previously used zoom level)?
Depends on: 401988
I tossed in:

        NS_ASSERTION(item->analysis.font == font->GetPangoFont(),
                     "Error, die");

right after that code, and I don't see it fire when starting the browser and loading a few pages.  I assume that's what you meant?  Note that GetPangoFont() might do some font-realizing in the background...

romaxa, can you check on your end just to make sure?
No longer depends on: 401988
Oh, the assert doesn't fire when zooming either.

Comment 13

10 years ago
Ok, thanks.  I'll check it out tomorrow.
(Reporter)

Comment 14

10 years ago
I have debian Unstable,
libpango1.0-0  -  1.18.3-1

Stuart, +'ing this and setting to P3.  Please adjust the priority as needed.
Flags: blocking1.9? → blocking1.9+
Priority: -- → P3

Updated

10 years ago
Assignee: nobody → karlt

Updated

10 years ago
Assignee: karlt → mozbugz
(Assignee)

Comment 16

10 years ago
Attachment 298411 [details] [diff] (bug 401988) and attachment 302933 [details] [diff] [review] (bug 414692) may have
changed this some, but FcFontSort is still being called about twice as often
as we'd like.

For each family/style combination (style includes size) there are usually 2
FcFontSorts happening:

1. To map the family/style to an actual font.  (This is not really avoidable
   if we want to respect fontconfig configuration.  There are caches that mean
   that this sort is usually not repeated for the same family/style input
   data.)

2. During calculation of font metrics for the actual font.

The second FcFontSort happens because pango_font_get_metrics (or more
specifically pango_fc_font_create_metrics_for_context) lays out some sample
text to get approximate character width data.  However the layout methods used
don't take a PangoFont as a parameter but a PangoFontDescription, so
pango_fc_font_create_metrics_for_context constructs a PangoFontDescription.
The layout then converts this PangoFontDescription into a PangoFont (which is
usually the same original PangoFont).  This last conversion uses another
FcFontSort because the PangoFontDescription differs from that used in step 1
and so the conversion is not cached.

The pango_context_get_metrics() API takes a PangoFontDescription instead of a
PangoFont, so this API could enable getting metrics using the same
PangoFontDescription as used in step 1 (and so the conversion to actual font
would be cached).  However, the current implementation of
pango_context_get_metrics uses pango_font_get_metrics, and so inherits the
same issue.

Possible options that I can see for solving this are:

A. In gfxPangoFont::GetMetrics, don't use pango_font_get_metrics but calculate
   the metrics from the FT_Face.

B. In pango_fc_font_create_metrics_for_context, layout the sample text using
   only the PangoFont without fallback, and only measure the widths of
   characters that are actually in the font.  Could a PangoAnalysis be
   constructed for pango_shape?  Is it expensive to shape when there might be
   missing glyphs?

C. Provide a means of telling PangoFontClass.get_metrics() not to bother
   calculating an accurate average character width (perhaps returing maximum
   advance, or width of an x), and modify pango_context_get_metrics to use
   this and layout the sample text itself.  Perhaps a NULL language on
   pango_font_get_metrics could mean this, or perhaps some Pango-internal
   language could be used.
(Assignee)

Comment 17

10 years ago
Created attachment 306358 [details] [diff] [review]
pango_fc_font_create_metrics_for_context test patch

This is not intended for check in, as it uses a private function in
libpangoft2, but it enables some experimentation.

It should be used with attachment 305686 [details] [diff] [review] and attachment 305300 [details] [diff] [review] (bug 416725).

What this does is provide the layout functions with a PangoFontMap that
usually returns the PangoFont we are interested in (and only does an FcFontSort
when the PangoFont doesn't support the characters in the sample string).

With this patch, the number of FcFontSorts used to start up, load
http://en.wikipedia.org/wiki/Main_Page, and shut down, was reduced
from 64 to 34.

This gives a 10% improvement in talos page set load time with local tests.

Updated

10 years ago
Flags: wanted-next+
Flags: tracking1.9+
Flags: blocking1.9-

Comment 18

9 years ago
Without reading the patch, replying just to your comment:

(In reply to comment #16)
> Attachment 298411 [details] [diff] (bug 401988) and attachment 302933 [details] [diff] [review] (bug 414692) may have
> changed this some, but FcFontSort is still being called about twice as often
> as we'd like.
> 
> For each family/style combination (style includes size) there are usually 2
> FcFontSorts happening:
> 
> 1. To map the family/style to an actual font.  (This is not really avoidable
>    if we want to respect fontconfig configuration.  There are caches that mean
>    that this sort is usually not repeated for the same family/style input
>    data.)

If your code is calling this, how about calling FcFontMatch instead?

> 2. During calculation of font metrics for the actual font.
> 
> The second FcFontSort happens because pango_font_get_metrics (or more
> specifically pango_fc_font_create_metrics_for_context) lays out some sample
> text to get approximate character width data.  However the layout methods used
> don't take a PangoFont as a parameter but a PangoFontDescription, so
> pango_fc_font_create_metrics_for_context constructs a PangoFontDescription.
> The layout then converts this PangoFontDescription into a PangoFont (which is
> usually the same original PangoFont).  This last conversion uses another
> FcFontSort because the PangoFontDescription differs from that used in step 1
> and so the conversion is not cached.

So obviously the question is, why is it using a different font description?  Couldn't find the answer in your comment.

> The pango_context_get_metrics() API takes a PangoFontDescription instead of a
> PangoFont, so this API could enable getting metrics using the same
> PangoFontDescription as used in step 1 (and so the conversion to actual font
> would be cached).  However, the current implementation of
> pango_context_get_metrics uses pango_font_get_metrics, and so inherits the
> same issue.
> 
> Possible options that I can see for solving this are:
> 
> A. In gfxPangoFont::GetMetrics, don't use pango_font_get_metrics but calculate
>    the metrics from the FT_Face.

That would be very unfortunate.

> B. In pango_fc_font_create_metrics_for_context, layout the sample text using
>    only the PangoFont without fallback, and only measure the widths of
>    characters that are actually in the font.  Could a PangoAnalysis be
>    constructed for pango_shape?  Is it expensive to shape when there might be
>    missing glyphs?

PangoAnalysis is not that hard to create, no.  But I think such a change would be a regression.

> C. Provide a means of telling PangoFontClass.get_metrics() not to bother
>    calculating an accurate average character width (perhaps returing maximum
>    advance, or width of an x), and modify pango_context_get_metrics to use
>    this and layout the sample text itself.  Perhaps a NULL language on
>    pango_font_get_metrics could mean this, or perhaps some Pango-internal
>    language could be used.

Humm.  Lets first see why we are calling FcFontSort with different patterns...
(Assignee)

Comment 19

9 years ago
(In reply to comment #18)
> > B. In pango_fc_font_create_metrics_for_context, layout the sample text using
> >    only the PangoFont without fallback, and only measure the widths of
> >    characters that are actually in the font.  Could a PangoAnalysis be
> >    constructed for pango_shape?  Is it expensive to shape when there might be
> >    missing glyphs?
> 
> PangoAnalysis is not that hard to create, no.  But I think such a change would
> be a regression.

pango_context_get_metrics would still be available to provide an average width of all characters in the sample string for the language.  i.e.

pango_font_get_metrics would provide metrics for the PangoFont, while
pango_context_get_metrics would provide metrics for the PangoFontDescription.
(Assignee)

Updated

9 years ago
Depends on: 449356
(Assignee)

Comment 20

9 years ago
Created attachment 333681 [details] [diff] [review]
get the metrics directly from the FT_Face

This uses some basic tests for average character width.

pango_font_get_metrics measured a sample string for the language to get an
average character width (which is what took so much time).  We could do
something similar efficiently on the gfxFontGroup, but it's only going to be
an approximation anyway, and consistency with other platforms may be a better
thing.

CSS3 seems to be going in the direction of using the width of a zero for a
character width (http://www.w3.org/TR/css3-values/#relative0) so maybe that
would be a more consistent measure for sizes of text input elements anyway.

The tryserver talos tests report only a 2% Tp improvement because talos does
multiple runs over the same pages and so the saving is effectively divided by
about 10.  There is also a 2% Ts improvement and there may be ~1% saving on
Tp_RSS.
Attachment #306358 - Attachment is obsolete: true
Attachment #333681 - Flags: review?
(Assignee)

Comment 21

9 years ago
Created attachment 333682 [details]
metrics test

I used this to test the patch, and compared results with FF 3.0.  It uses
underline, overline for maxAscent, strike through, <input size="N"> for
aveCharWidth and line height, and ch units for zero width.  It is based on a
script that John wrote to display results in every family on the system.  It
uses UniversalXPConnect, so save to file in order to run.
(Assignee)

Updated

9 years ago
Attachment #333681 - Flags: review? → review?(pavlov)

Comment 22

9 years ago
(In reply to comment #20)

> CSS3 seems to be going in the direction of using the width of a zero for a
> character width (http://www.w3.org/TR/css3-values/#relative0) so maybe that
> would be a more consistent measure for sizes of text input elements anyway.

I don't think that's correct.  The width of zero is interesting for setting width of input boxes that are expected be used to enter numbers.  That's for sure.  And pango metrics has approximate_digit_width for that reason.  Just because the CSS text says it falls back to approximate char width does not mean the reverse is true.

Comment 23

9 years ago
For example, that would awfully fail for East Asian languages (Chinese, Japanese, Korean) where all ideographs have the same width, which is definitely more than the width of ASCII zero.
(Assignee)

Updated

9 years ago
Blocks: 279032
(Assignee)

Comment 24

9 years ago
(In reply to comment #23)
Yes, I think measuring a sample string does provide a better measure than OpenType's avgCharWidth, even, and the spec admits it's problems.

http://www.microsoft.com/typography/otspec/os2.htm#acw
"The value for xAvgCharWidth is calculated by obtaining the arithmetic average
 of the width of all non-zero width glyphs in the font. Furthermore, it is
 strongly recommended that implementers do not rely on this value for computing
 layout for lines of text. Especially, for cases where complex scripts are
 used."

I'd be happy to implement sample string measuring on gfxFontGroup if this is what is best. That would also be better in that it would use the font that is likely to be used rather than just the first font.  But there isn't really a "right" answer here.

For example, Japanese users often would be entering Latin characters and often Japanese characters, but we can't know in advance which.
The Pango sample string includes a mix of Latin and Japanese, including punctuation, and so gets somewhere in between (closer to Japanese, which makes sense).  It's a better measure, but it can't be "right" in all situations.  For example, it's not ideal if the user is entering all double-width characters.

What I guess is best for input boxes is that they are large enough, rather than close to any right answer, which would require the width of the largest character that is likely appear in large numbers.  Perhaps something like a 75th percentile of a sample set maybe suitable.

But there are other issues too.  Web authors do not necessarily think about the "correct" behavior when constructing pages, but often construct what looks right.  Having the page look similar on all platforms is an advantage to weigh up in the equation.  The value of changing behavior to a better measure that is still not exactly anything in particular needs to be weighed against the cost of changing the behavior.
Any estimated width is going to be wrong in many cases. It's surely common for a Chinese user to enter all-Latin text in one textbox and all-Chinese text in another.

IMHO since we're going to be wrong a lot of the time no matter what, everyone would be better off if we're just consistent in our behaviour across platforms, and when possible, across browsers. Doing something simple and fast is a bonus.
> What I guess is best for input boxes is that they are large enough, rather than
> close to any right answer

The best is that for a fixed-width font they are exactly the "right width", to the pixel, and that for variable-width fonts they closely match IE/Windows.  Anything else for the latter will break page layouts.
(Assignee)

Comment 27

9 years ago
(In reply to comment #26)
> The best is that for a fixed-width font they are exactly the "right width", to
> the pixel,

I'm watching out for that, thanks.

> and that for variable-width fonts they closely match IE/Windows. 
> Anything else for the latter will break page layouts.

That's very difficult in cases where the fonts are different.
Of course; not much we can do about that.  But given the same font, we should be producing widths that are very close to IE/Windows.

nsTextControlFrame has some code to handle the fixed-width and variable-width cases separately, and relies on the font metrics to both tell them apart and to do the sizing in the two different cases.  I'm not sure how relevant that is to what you're doing, but reading that code might give you an idea of what it expects, at least.
(Assignee)

Comment 29

9 years ago
Created attachment 334397 [details] [diff] [review]
reftests/canvas/text-horzline-with-top.html change

Some fonts have the top of the em square very close to the top of the full
length stems.  After snapping positions to pixels and hinting the extents of
the glyphs are not exactly at expected points.  Using a larger font size makes
rounding effects less significant.
(Assignee)

Updated

9 years ago
Depends on: 385263
(Assignee)

Updated

9 years ago
No longer depends on: 449356
(Assignee)

Updated

9 years ago
Attachment #333681 - Flags: review?(pavlov)
(Assignee)

Comment 30

9 years ago
Comment on attachment 333681 [details] [diff] [review]
get the metrics directly from the FT_Face

(An updated version of) this got included in the changes for bug 385263.
Attachment #333681 - Attachment is obsolete: true
(Assignee)

Comment 31

9 years ago
Since bug 385263 landed zooming on http://www.wikipedia.org is at least twice as fast.  That together with attachment 298411 [details] [diff] [review] and attachment 302933 [details] [diff] [review] should have recovered the time lost in bug 362682.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.1b1
(Assignee)

Updated

9 years ago
No longer blocks: 279032
You need to log in before you can comment on or make changes to this bug.