Closed Bug 416725 Opened 16 years ago Closed 16 years ago

font selection for numerals affected by characters of different language elsewhere on the page

Categories

(Core :: Graphics, defect, P2)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla1.9

People

(Reporter: karlt, Assigned: karlt)

References

Details

(Keywords: perf, regression, testcase)

Attachments

(10 files, 6 obsolete files)

475 bytes, text/html
Details
2.08 KB, image/png
Details
571 bytes, text/html
Details
2.73 KB, image/png
Details
569 bytes, text/html
Details
156.32 KB, image/png
Details
170.76 KB, image/png
Details
3.60 KB, patch
Details | Diff | Splinter Review
153.06 KB, image/png
Details
14.83 KB, patch
pavlov
: review+
pavlov
: superreview+
pavlov
: approval1.9+
Details | Diff | Splinter Review
Attached file testcase
In the attached testcase the two numbers are rendered with different fonts due to the Japanese characters in the page.

pango_itemize selects fonts based on the script/language of the character, but for common characters (belonging to more than one script) the language/font is selected based on surrounding characters, which may be in different words.  This is (debatably) helpful behavior when there is text from more than one script in the page.

However, the TextRunWordCache constructs pseudo-sentences of words which have not already been cached, so the text that pango_itemize gets can be very different from the original sentence.
Attached image snapshot of testcase
Suggesting P2 (even though this situation may not happen too often)
because a fix for this will fix bug 416475.
Blocks: 416475
Flags: blocking1.9?
Priority: -- → P2
This can be fixed simply by ensuring that pango_itemize receives only individual words if it is TextRuns for individual words that are wanted.

The "right" place to fix this would seem to be TextRunWordCache::MakeTextRun as that is what is constructing pseudo-sentences of unrelated words.

Stuart and John, would you expect calling gfxFontGroup::MakeTextRun with individual words instead of a space-separated list to adversely affect performance on MS and Mac.
Rob, I see you went to quite some effort to delay generating word entries so that they could all be generated from one string.  Sorry to suggest removing this.

I'll do a bit of performance testing.
(fixes up placement of preprocessor conditional)

single run Talos Tp Results on Linux
 without patch: 149.1, 149.6
   with patch:  148.1, 150.0
Attachment #302703 - Attachment is obsolete: true
Attachment #303015 - Flags: review?(roc)
Try a page that's just a big page of Unicode (say Chinese) text?

This patch affects all platforms so it might be a performance issue on some other platform. But I'm OK with the concept, we might just have to check it in and see what Talos says officially.
Maybe not Chinese. Maybe just English text with spaces?
Flags: blocking1.9? → blocking1.9+
Comment on attachment 303015 [details] [diff] [review]
gfxFontGroup::MakeTextRun with individual words v1.1

Why are you pulling tempString into the loop? More efficient to leave it outside the loop and just clear it at the end of each iteration.

Other than that it looks fine.
Attachment #303015 - Flags: superreview+
Attachment #303015 - Flags: review?(roc)
Attachment #303015 - Flags: review+
Comment on attachment 303015 [details] [diff] [review]
gfxFontGroup::MakeTextRun with individual words v1.1

checked this in
http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1203284400&maxdate=1203284830&who=karlt%2B%25karlt.net

but backed out due to performance issues on MS Windows Tp 440 -> 465.
Talos machines didn't notice any difference but Tp2 did: 355 -> 375.
Attached file testcase with brackets
The word cache is also interfering with Pango's algorithm for matching fonts of paired characters.  The word cache can cause the opening and closing characters to no longer balance.
This won't be fixed by ensuring that pango_itemize receives only
individual words.
I think the best solution is to ensure that pango_itemize uses the same font for the first font in the fontset irrespective of the language of the characters (as was the case before changes in bug 401988).
Blocks: 401988
Implements the solution suggested in comment 13.

This gives about a 5% improvement in local talos page load time.
With browser.display.auto_quality_min_font_size = 0,
the improvement is about 10%.

The difference between auto_quality_min_font_size = 20 and = 0 drops from about 8% to 3%.
Attachment #303015 - Attachment is obsolete: true
Attachment #304657 - Flags: review?(roc)
For the patch to perform completely correctly we need to fix a separate issue (with similar symptoms) in that generic fonts [for GetFontAt(0)->GetPangoFont()] are being selected based on the first language used rather than the current DOM language.
Keywords: perf
The problem is that the gfxPangoFontCache keys fonts against PangoFontDescription but the selection also depends on PangoLanguage.

Including the language in the key increases the number of calls to pango_context_load_font() during a talos pageset run from 2132 to 2375.
This seems to make the page load metric 1% worse. 

The gfxPangoFontCache also has the issue that a hash is used as a key and there is no check that even the PangoFontDescription matches.

I'm tempted to remove this cache and increase the expiry time on the gfxFontCache, which serves a similar purpose, and expiry seems nicer than filling up 5000 entries then clearing them all.  However, even with a very large expiry time on the gfxFontCache, this increases the number of calls to pango_context_load_font() from 2375 to 2616.  The performance affect from this actually seems less noticeable perhaps due to the PangoFcFontMap pattern cache, but there may be performance effect.

The difference may be due to different weights in gfxFontStyles mapping to the same weight in the font description, or perhaps the double size in gfxFontStyle mapping to a fixed point size (with a scale factor of 1000).  There are no PangoFontDescriptions that share the same PangoFontDescription hash.
Comment on attachment 304657 [details] [diff] [review]
PangoFontMap that provides the same primary font irrespective of script

Looks OK to me but behdad really should review this I'd think.

+NewGfxPangoFontMap(PangoFontMap *aChildFontMap, PangoFont *aBaseFont);
+
+void
+SetBaseFont(gfxPangoFontMap *aFontMap, PangoFont *aBaseFont);

For namespace purity, should make these static methods of class gfxPangoFontMap
Attachment #304657 - Flags: review?(roc) → superreview+
Comment on attachment 304657 [details] [diff] [review]
PangoFontMap that provides the same primary font irrespective of script

Would you be able to look over this, please, Behdad?

Comment 0 and comment 11 explain why there is a problem, and comment 12 to 14
are relevant.  Our word cache assumes that the rendering of each word
doesn't depend on surrounding context.  "words" here means groups of
characters separated by spaces.

This patch provides/causes consistent font selection for script-common
characters (assuming that the base font supports the common characters).

This behavior is more in line with the "UA-dependent default 'font-family'"
in Step 5 of the CSS font matching algorithm, which Mozilla should follow.
http://www.w3.org/TR/CSS21/fonts.html#algorithm
and with the (implied) concept of a generic family being a (fixed)
chosen real family.
http://www.w3.org/TR/CSS21/fonts.html#generic-font-families

Authors will need to use the "lang" attribute if they want to indicate a
change in the generic/default font for the same character.

(The shape engine selection may still be corrupted by the adjacent unrelated
words.  If this is a problem then we'll also need to have a separate call to
pango_itemize for each word [or group of words in the same script].)
Attachment #304657 - Flags: review?(mozilla)
(In reply to comment #14)
> Created an attachment (id=304657) [details]
> PangoFontMap that provides the same primary font irrespective of script
> 
> Implements the solution suggested in comment 13.
> 
> This gives about a 5% improvement in local talos page load time.
> With browser.display.auto_quality_min_font_size = 0,
> the improvement is about 10%.
> 
> The difference between auto_quality_min_font_size = 20 and = 0 drops from about
> 8% to 3%.

I tried this patch.
It is not good for me.
Two or more fonts are applied for alphabets and numerical characters.
I will attach screenshots.
This is the screen shot with current trunk build
with the patch for http://hidenosuke.org/antenna/ .
It is not good.
This is the screen shot with 080212(JST) build
for http://hidenosuke.org/antenna/ .
It is good for me.
(In reply to comment #19)
> I tried this patch.
> It is not good for me.

Thank you for testing.  I now notice that I failed to attach the necessary changes to the Makefile to compile the new files.  I would have expected unresolved symbols without these.

(In reply to comment #20)
> This is the screen shot with current trunk build
> with the patch for http://hidenosuke.org/antenna/ .

I'll attach a new patch.  I can't reproduce the behavior with this patch (but I do see that behavior without the patch).
Included Makefile.in changes.

Also keeping Pango's behavior for system fonts, because system fonts should mimic what the system does.
There wouldn't be much difference if the language of the style for the UI were the language of the selected localization (i.e. the strings).  However, it seems
that the language of the style for the UI comes from the locale, so, for
non-localized apps, the language is not representative of the English strings, and the generic fonts for the language are different.

(In reply to comment #17)

> +NewGfxPangoFontMap(PangoFontMap *aChildFontMap, PangoFont *aBaseFont);
> +
> +void
> +SetBaseFont(gfxPangoFontMap *aFontMap, PangoFont *aBaseFont);
> 
> For namespace purity, should make these static methods of class
> gfxPangoFontMap

Defined struct gfxPangoFontMap in the header so that these methods can be

static PangoFontMap *
gfxPangoFontMap::NewFontMap(PangoFontMap *aChildFontMap, PangoFont *aBaseFont);

void
gfxPangoFontMap::SetBaseFont(PangoFont *aBaseFont);
Attachment #304657 - Attachment is obsolete: true
Attachment #305296 - Flags: review?(mozilla)
Attachment #304657 - Flags: review?(mozilla)
This is useful for testing but not ready to check in.
It hacks around the language issue mentioned in comment 15 and 16.
corrected imbalanced parentheses
Attachment #305297 - Attachment is obsolete: true
I tried new patch(305296).
This is a screen shot for the build with new patch
for http://hidenosuke.org/antenna/ .
This is better than previous patch.

By the way, I think only Japanese fonts are used for all characters.
(Please look at attachment 305161 [details].)
Is this intended?
(In reply to comment #26)
> By the way, I think only Japanese fonts are used for all characters.

It is expected.  It is not always possible to determine language
from the characters, and Mozilla's word cache means only the context of the
word is available, so any attempt to determine the language of the words
independently is likely to result in inconsistent looking text.

What this patch does is use the language specified in the document to select
the (primary) font.  The document at http://www.hidenosuke.org/antenna
specifies lang=ja_JP and font-family:monospace.  With default Mozilla preferences, this CSS generic family maps to the system/fontconfig generic pseudo-family also called "monospace".  For lang=ja, fontconfig provides a Japanese font for the generic pseudo-family.

> (Please look at attachment 305161 [details].)

The behavior of 081202 used no language in resolving the fontconfig generic,
and so Western fonts tended to be selected as primary fonts.  I can see that
this produced a pleasant result for this web page (and probably many others).
This however is not the best behavior for Hebrew or Arabic text, and it can
mean that a Chinese font may be used to render Han characters in a Japanese
document.

Different people will have different opinions on what the best behavior is for
a mix of Latin and Japanese characters.  If there were a Japanese font that
provided good glyphs for both Latin and Japanese characters, then I think that
would be the best font to use as the glyphs would be designed to be used
together.  This is particularly important for monospace text as a mix of fonts
would provide a mix of widths (though I see that the Japanese characters are about twice the width of Latin characters with the ja monospace font from fontconfig here).

However, some fonts designed for Western characters render Latin characters
better than some (many?) Japanese fonts.  It is possible to have Latin characters in Japanese documents rendered with Western fonts, either

* by the author using the "lang=en" attribute on elements that contain (only!)
  Latin content, or

* by the user setting up font preferences for the Japanese generic fonts in
  such a way that a Western font is tried first.  The Western font will then
  be used for the characters that it supports but (provided the Western font
  doesn't support Japanese characters) a different font will be used for
  Japansese characters.

The user can set up font preferences either in Mozilla (as was done with
attachment #XXXXXXXX, and I expect that is what Bill's extension in Bug XXXXX
Comment XXXXX does) or with fontconfig.  Here's an example fontconfig
element that could be in ~/.fonts.conf:

 <match target="pattern" >
  <test name="lang" >
   <string>ja</string>
  </test>
  <test name="family" >
   <string>sans-serif</string>
  </test>
  <edit mode="prepend" binding="strong" name="family" >
   <string>DejaVu Sans</string>
  </edit>
  <edit mode="prepend" binding="weak" name="family" >
   <string>Sazanami Gothic</string>
  </edit>
 </match>

A similar element could be added for family=serif, but I wouldn't recommend
doing this for monospace.
If the author specifies a Western font first in CSS 'font-family', that works too, right? A lot of pages do that.

Maybe we should specify some Western fonts first in our prefs for ja?
(In reply to comment #27)
> The user can set up font preferences either in Mozilla (as was done with
> attachment #XXXXXXXX, and I expect that is what Bill's extension in Bug XXXXX
> Comment XXXXX does)

attachment 299736 [details] [diff] [review] and bug 401988 comment 59, that is!
(In reply to comment #28)
> If the author specifies a Western font first in CSS 'font-family', that works
> too, right? A lot of pages do that.

Yes, provided the fonts exist on the platform.  fontconfig <alias> elements have "weak" binding, so the language overrides the alias.

> Maybe we should specify some Western fonts first in our prefs for ja?

Probably, but it caused a performance hit (bug 416068) and the name-list preferences are not very visible so can be confusing.
The performance hit would be less now though with attachment 302746 [details] [diff] [review]
(bug 414692).
Implemented gfx_pango_fontset_get_font() and corrected some indentation.

The only time a PangoContext is being passed to a PangoFontMap of different type from get_fontmap(context) is pango_font_map_load_fontset(fontmap, context,...) (where the fontmap is explicitly passed), and the only assumptions I've found that get_fontmap(context) returns a fontmap of typeof(self) are in render_layout functions.

But if you'd like to have get_fontmap(context) in fontmaps always return a fontmap of typeof(self) then the fontmap can be set temporarily on the context.  (I can't see a good way to copy a PangoContext.)

The reasoning behind this patch is that a word is not enough context to reliably and consistently detect language.  So the document/element language is used to select the primary font and the word language is used if that font doesn't support the chars.

This is pretty consistent with HTML specification that implies that the lang attribute should be applied to most of the content but individual "characters that are atypical for a particular language" are still rendered with a best attempt.  IIUC it is also what we do on other platforms.

http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1
Attachment #305296 - Attachment is obsolete: true
Attachment #305686 - Flags: review?(mozilla)
Attachment #305296 - Flags: review?(mozilla)
+                if (childLevel > baseLevel || !self->mBaseFont)

The !self->mBaseFont should not (need not) be here.
I recommend that this should be fixed for 1.9 so that we don't have behavior as in attachment 305160 [details] (where the date and time should be one monospace font).
Target Milestone: --- → mozilla1.9
Flags: tracking1.9+ → blocking1.9+
Karl, you should ping behdad on irc in case he's not seeing the review request.
(In reply to comment #31)
Hi Karl,

Waht's going on?
attachment 305686 [details] [diff] [review] works fine as far as I tested.
Attachment #305686 - Attachment description: PangoFontMap that provides the same primary font irrespective of script v1.2 → PangoFontMap that provides the same primary font from document language rather than adjacent characters v1.2
(In reply to comment #35)
Thanks for testing.  The patch needs review.
Behdad would be the best person to review, but he is also a busy person.

Behdad:
Mozilla's situation is slightly different from normal Pango use for two reasons:
1. We only have one word of context available to determine the script.
2. HTML provides for specifying the language of the text, so we should use that
   when it makes sense, even for common characters.
Attachment #305686 - Flags: review?(mozilla) → review?(pavlov)
Comment on attachment 305686 [details] [diff] [review]
PangoFontMap that provides the same primary font from document language rather than adjacent characters v1.2

I would suggest doing thist without adding new files.

Besides the first font for the language, it doesn't seem like this addresses using different fonts if the "base" font doesn't work?  I have some thoughts on that, but maybe this is enough for now?
This isn't something we'd stop the release for.  We'll take a good patch.  Moving to wanted1.9.0.x+.
Flags: wanted1.9.0.x+
Flags: blocking1.9-
Flags: blocking1.9+
Personally I feel we should make sure this is fixed for the release, but I won't renom as I have no new information beyond comment 33.
Keywords: regression
(In reply to comment #37)
> I would suggest doing this without adding new files.

Moved to single file.

> Besides the first font for the language, it doesn't seem like this
> addresses using different fonts if the "base" font doesn't work?

Yes, if the prefs for the document language specify only a fontconfig generic
name, and a common character near a character from a script of different
language cannot be represented using the base font for the doc language, then
that different language sometimes get used to choose the font.
I doubt that happens so often.  This patch should cover most cases.
Attachment #305686 - Attachment is obsolete: true
Attachment #314516 - Flags: review?(pavlov)
Attachment #305686 - Flags: review?(pavlov)
Attachment #314516 - Flags: superreview+
Attachment #314516 - Flags: review?(pavlov)
Attachment #314516 - Flags: review+
Attachment #314516 - Flags: approval1.9+
This may have been responsible for a 16-17% reduction in Tp.
(In reply to comment #42)
> This may have been responsible for a 16-17% reduction in Tp.
> 

Have you confirmed this?  Have a link to talos?  That's a huge hit that can't go unchecked.
That's not a hit, it's an improvement :-).
It is sort of visible here but the times plotted are up to 2 hours after
the build ids.

http://graphs.mozilla.org/#spst=range&spss=1207712982.4914286&spse=1207734847.627143&spstart=1198095381&spend=1208548750&bpst=cursor&bpstart=1207712982.4914286&bpend=1207734847.627143&m1tid=61148&m1bl=0&m1avg=0&m2tid=61164&m2bl=0&m2avg=0&m3tid=61214&m3bl=0&m3avg=0&m4tid=146480&m4bl=0&m4avg=0

The buildids in the csv data are not so helpful either

http://graphs.mozilla.org/dumpdata.cgi?setid=61148&setid=61164&setid=61214&setid=146480

"61148","qm-mini-ubuntu01","1.9","tp_loadtime_avg","0"
"61148","1207722738","625.911","200804082258","None"
"61148","1207727648","533.444","200804090020","None"

"61164","qm-mini-ubuntu02","1.9","tp_loadtime_avg","0"
"61164","1207721355","623.73","200804082234","None"
"61164","1207726249","527.273","200804090007","None"
"61164","1207730879","527.719","200804090119","None"

"61214","qm-mini-ubuntu05","1.9","tp_loadtime_avg","0"
"61214","1207725989" (Wed Apr  9 00:26:29 2008),"629.122","200804082314","None"
"61214","1207730929","528.941","200804090034","None"
"61214","1207735498","529.459","200804090146","None"

"146480","qm-mini-ubuntu03","1.9","tp_loadtime_avg","0"
"146480","1207722494","530.398","200804082150","None"
"146480","1207726869","437.033","200804082258","None"
"146480","1207730924","430.444","200804090020","None"

As they imply a non-existent range:

<200804082258
>200804082314

But analysing what was reported on Tinderbox

http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox&maxdate=1207729498&legend=0&norules=1

gives this range:

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&date=explicit&mindate=1207723740&maxdate=1207724819

which would indicate attachment 314516 [details] [diff] [review] (as expected).
Flags: wanted1.9.0.x+
Depends on: 460717
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: