Closed Bug 416725 Opened 13 years ago Closed 13 years ago
font selection for numerals affected by characters of different language elsewhere on the page
475 bytes, text/html
2.08 KB, image/png
571 bytes, text/html
2.73 KB, image/png
569 bytes, text/html
156.32 KB, image/png
170.76 KB, image/png
3.60 KB, patch
|Details | Diff | Splinter Review|
153.06 KB, image/png
PangoFontMap that provides the same primary font from document language rather than adjacent characters v1.4
14.83 KB, patch
|Details | Diff | Splinter Review|
In the attached testcase the two numbers are rendered with different fonts due to the Japanese characters in the page. pango_itemize selects fonts based on the script/language of the character, but for common characters (belonging to more than one script) the language/font is selected based on surrounding characters, which may be in different words. This is (debatably) helpful behavior when there is text from more than one script in the page. However, the TextRunWordCache constructs pseudo-sentences of words which have not already been cached, so the text that pango_itemize gets can be very different from the original sentence.
Suggesting P2 (even though this situation may not happen too often) because a fix for this will fix bug 416475.
This can be fixed simply by ensuring that pango_itemize receives only individual words if it is TextRuns for individual words that are wanted. The "right" place to fix this would seem to be TextRunWordCache::MakeTextRun as that is what is constructing pseudo-sentences of unrelated words. Stuart and John, would you expect calling gfxFontGroup::MakeTextRun with individual words instead of a space-separated list to adversely affect performance on MS and Mac.
Rob, I see you went to quite some effort to delay generating word entries so that they could all be generated from one string. Sorry to suggest removing this. I'll do a bit of performance testing.
(fixes up placement of preprocessor conditional) single run Talos Tp Results on Linux without patch: 149.1, 149.6 with patch: 148.1, 150.0
Try a page that's just a big page of Unicode (say Chinese) text? This patch affects all platforms so it might be a performance issue on some other platform. But I'm OK with the concept, we might just have to check it in and see what Talos says officially.
Maybe not Chinese. Maybe just English text with spaces?
Flags: blocking1.9? → blocking1.9+
Comment on attachment 303015 [details] [diff] [review] gfxFontGroup::MakeTextRun with individual words v1.1 Why are you pulling tempString into the loop? More efficient to leave it outside the loop and just clear it at the end of each iteration. Other than that it looks fine.
Comment on attachment 303015 [details] [diff] [review] gfxFontGroup::MakeTextRun with individual words v1.1 checked this in http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1203284400&maxdate=1203284830&who=karlt%2B%25karlt.net but backed out due to performance issues on MS Windows Tp 440 -> 465.
Talos machines didn't notice any difference but Tp2 did: 355 -> 375.
The word cache is also interfering with Pango's algorithm for matching fonts of paired characters. The word cache can cause the opening and closing characters to no longer balance.
This won't be fixed by ensuring that pango_itemize receives only individual words.
I think the best solution is to ensure that pango_itemize uses the same font for the first font in the fontset irrespective of the language of the characters (as was the case before changes in bug 401988).
Implements the solution suggested in comment 13. This gives about a 5% improvement in local talos page load time. With browser.display.auto_quality_min_font_size = 0, the improvement is about 10%. The difference between auto_quality_min_font_size = 20 and = 0 drops from about 8% to 3%.
For the patch to perform completely correctly we need to fix a separate issue (with similar symptoms) in that generic fonts [for GetFontAt(0)->GetPangoFont()] are being selected based on the first language used rather than the current DOM language.
The problem is that the gfxPangoFontCache keys fonts against PangoFontDescription but the selection also depends on PangoLanguage. Including the language in the key increases the number of calls to pango_context_load_font() during a talos pageset run from 2132 to 2375. This seems to make the page load metric 1% worse. The gfxPangoFontCache also has the issue that a hash is used as a key and there is no check that even the PangoFontDescription matches. I'm tempted to remove this cache and increase the expiry time on the gfxFontCache, which serves a similar purpose, and expiry seems nicer than filling up 5000 entries then clearing them all. However, even with a very large expiry time on the gfxFontCache, this increases the number of calls to pango_context_load_font() from 2375 to 2616. The performance affect from this actually seems less noticeable perhaps due to the PangoFcFontMap pattern cache, but there may be performance effect. The difference may be due to different weights in gfxFontStyles mapping to the same weight in the font description, or perhaps the double size in gfxFontStyle mapping to a fixed point size (with a scale factor of 1000). There are no PangoFontDescriptions that share the same PangoFontDescription hash.
Comment on attachment 304657 [details] [diff] [review] PangoFontMap that provides the same primary font irrespective of script Looks OK to me but behdad really should review this I'd think. +NewGfxPangoFontMap(PangoFontMap *aChildFontMap, PangoFont *aBaseFont); + +void +SetBaseFont(gfxPangoFontMap *aFontMap, PangoFont *aBaseFont); For namespace purity, should make these static methods of class gfxPangoFontMap
Attachment #304657 - Flags: review?(roc) → superreview+
Comment on attachment 304657 [details] [diff] [review] PangoFontMap that provides the same primary font irrespective of script Would you be able to look over this, please, Behdad? Comment 0 and comment 11 explain why there is a problem, and comment 12 to 14 are relevant. Our word cache assumes that the rendering of each word doesn't depend on surrounding context. "words" here means groups of characters separated by spaces. This patch provides/causes consistent font selection for script-common characters (assuming that the base font supports the common characters). This behavior is more in line with the "UA-dependent default 'font-family'" in Step 5 of the CSS font matching algorithm, which Mozilla should follow. http://www.w3.org/TR/CSS21/fonts.html#algorithm and with the (implied) concept of a generic family being a (fixed) chosen real family. http://www.w3.org/TR/CSS21/fonts.html#generic-font-families Authors will need to use the "lang" attribute if they want to indicate a change in the generic/default font for the same character. (The shape engine selection may still be corrupted by the adjacent unrelated words. If this is a problem then we'll also need to have a separate call to pango_itemize for each word [or group of words in the same script].)
(In reply to comment #14) > Created an attachment (id=304657) [details] > PangoFontMap that provides the same primary font irrespective of script > > Implements the solution suggested in comment 13. > > This gives about a 5% improvement in local talos page load time. > With browser.display.auto_quality_min_font_size = 0, > the improvement is about 10%. > > The difference between auto_quality_min_font_size = 20 and = 0 drops from about > 8% to 3%. I tried this patch. It is not good for me. Two or more fonts are applied for alphabets and numerical characters. I will attach screenshots.
This is the screen shot with current trunk build with the patch for http://hidenosuke.org/antenna/ . It is not good.
This is the screen shot with 080212(JST) build for http://hidenosuke.org/antenna/ . It is good for me.
(In reply to comment #19) > I tried this patch. > It is not good for me. Thank you for testing. I now notice that I failed to attach the necessary changes to the Makefile to compile the new files. I would have expected unresolved symbols without these. (In reply to comment #20) > This is the screen shot with current trunk build > with the patch for http://hidenosuke.org/antenna/ . I'll attach a new patch. I can't reproduce the behavior with this patch (but I do see that behavior without the patch).
Included Makefile.in changes. Also keeping Pango's behavior for system fonts, because system fonts should mimic what the system does. There wouldn't be much difference if the language of the style for the UI were the language of the selected localization (i.e. the strings). However, it seems that the language of the style for the UI comes from the locale, so, for non-localized apps, the language is not representative of the English strings, and the generic fonts for the language are different. (In reply to comment #17) > +NewGfxPangoFontMap(PangoFontMap *aChildFontMap, PangoFont *aBaseFont); > + > +void > +SetBaseFont(gfxPangoFontMap *aFontMap, PangoFont *aBaseFont); > > For namespace purity, should make these static methods of class > gfxPangoFontMap Defined struct gfxPangoFontMap in the header so that these methods can be static PangoFontMap * gfxPangoFontMap::NewFontMap(PangoFontMap *aChildFontMap, PangoFont *aBaseFont); void gfxPangoFontMap::SetBaseFont(PangoFont *aBaseFont);
This is useful for testing but not ready to check in. It hacks around the language issue mentioned in comment 15 and 16.
corrected imbalanced parentheses
Attachment #305297 - Attachment is obsolete: true
I tried new patch(305296). This is a screen shot for the build with new patch for http://hidenosuke.org/antenna/ . This is better than previous patch. By the way, I think only Japanese fonts are used for all characters. (Please look at attachment 305161 [details].) Is this intended?
(In reply to comment #26) > By the way, I think only Japanese fonts are used for all characters. It is expected. It is not always possible to determine language from the characters, and Mozilla's word cache means only the context of the word is available, so any attempt to determine the language of the words independently is likely to result in inconsistent looking text. What this patch does is use the language specified in the document to select the (primary) font. The document at http://www.hidenosuke.org/antenna specifies lang=ja_JP and font-family:monospace. With default Mozilla preferences, this CSS generic family maps to the system/fontconfig generic pseudo-family also called "monospace". For lang=ja, fontconfig provides a Japanese font for the generic pseudo-family. > (Please look at attachment 305161 [details].) The behavior of 081202 used no language in resolving the fontconfig generic, and so Western fonts tended to be selected as primary fonts. I can see that this produced a pleasant result for this web page (and probably many others). This however is not the best behavior for Hebrew or Arabic text, and it can mean that a Chinese font may be used to render Han characters in a Japanese document. Different people will have different opinions on what the best behavior is for a mix of Latin and Japanese characters. If there were a Japanese font that provided good glyphs for both Latin and Japanese characters, then I think that would be the best font to use as the glyphs would be designed to be used together. This is particularly important for monospace text as a mix of fonts would provide a mix of widths (though I see that the Japanese characters are about twice the width of Latin characters with the ja monospace font from fontconfig here). However, some fonts designed for Western characters render Latin characters better than some (many?) Japanese fonts. It is possible to have Latin characters in Japanese documents rendered with Western fonts, either * by the author using the "lang=en" attribute on elements that contain (only!) Latin content, or * by the user setting up font preferences for the Japanese generic fonts in such a way that a Western font is tried first. The Western font will then be used for the characters that it supports but (provided the Western font doesn't support Japanese characters) a different font will be used for Japansese characters. The user can set up font preferences either in Mozilla (as was done with attachment #XXXXXXXX, and I expect that is what Bill's extension in Bug XXXXX Comment XXXXX does) or with fontconfig. Here's an example fontconfig element that could be in ~/.fonts.conf: <match target="pattern" > <test name="lang" > <string>ja</string> </test> <test name="family" > <string>sans-serif</string> </test> <edit mode="prepend" binding="strong" name="family" > <string>DejaVu Sans</string> </edit> <edit mode="prepend" binding="weak" name="family" > <string>Sazanami Gothic</string> </edit> </match> A similar element could be added for family=serif, but I wouldn't recommend doing this for monospace.
If the author specifies a Western font first in CSS 'font-family', that works too, right? A lot of pages do that. Maybe we should specify some Western fonts first in our prefs for ja?
(In reply to comment #27) > The user can set up font preferences either in Mozilla (as was done with > attachment #XXXXXXXX, and I expect that is what Bill's extension in Bug XXXXX > Comment XXXXX does) attachment 299736 [details] [diff] [review] and bug 401988 comment 59, that is!
(In reply to comment #28) > If the author specifies a Western font first in CSS 'font-family', that works > too, right? A lot of pages do that. Yes, provided the fonts exist on the platform. fontconfig <alias> elements have "weak" binding, so the language overrides the alias. > Maybe we should specify some Western fonts first in our prefs for ja? Probably, but it caused a performance hit (bug 416068) and the name-list preferences are not very visible so can be confusing. The performance hit would be less now though with attachment 302746 [details] [diff] [review] (bug 414692).
Implemented gfx_pango_fontset_get_font() and corrected some indentation. The only time a PangoContext is being passed to a PangoFontMap of different type from get_fontmap(context) is pango_font_map_load_fontset(fontmap, context,...) (where the fontmap is explicitly passed), and the only assumptions I've found that get_fontmap(context) returns a fontmap of typeof(self) are in render_layout functions. But if you'd like to have get_fontmap(context) in fontmaps always return a fontmap of typeof(self) then the fontmap can be set temporarily on the context. (I can't see a good way to copy a PangoContext.) The reasoning behind this patch is that a word is not enough context to reliably and consistently detect language. So the document/element language is used to select the primary font and the word language is used if that font doesn't support the chars. This is pretty consistent with HTML specification that implies that the lang attribute should be applied to most of the content but individual "characters that are atypical for a particular language" are still rendered with a best attempt. IIUC it is also what we do on other platforms. http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1
+ if (childLevel > baseLevel || !self->mBaseFont) The !self->mBaseFont should not (need not) be here.
I recommend that this should be fixed for 1.9 so that we don't have behavior as in attachment 305160 [details] (where the date and time should be one monospace font).
Target Milestone: --- → mozilla1.9
Karl, you should ping behdad on irc in case he's not seeing the review request.
(In reply to comment #31) Hi Karl, Waht's going on? attachment 305686 [details] [diff] [review] works fine as far as I tested.
Attachment #305686 - Attachment description: PangoFontMap that provides the same primary font irrespective of script v1.2 → PangoFontMap that provides the same primary font from document language rather than adjacent characters v1.2
(In reply to comment #35) Thanks for testing. The patch needs review. Behdad would be the best person to review, but he is also a busy person. Behdad: Mozilla's situation is slightly different from normal Pango use for two reasons: 1. We only have one word of context available to determine the script. 2. HTML provides for specifying the language of the text, so we should use that when it makes sense, even for common characters.
Attachment #305686 - Flags: review?(mozilla) → review?(pavlov)
Comment on attachment 305686 [details] [diff] [review] PangoFontMap that provides the same primary font from document language rather than adjacent characters v1.2 I would suggest doing thist without adding new files. Besides the first font for the language, it doesn't seem like this addresses using different fonts if the "base" font doesn't work? I have some thoughts on that, but maybe this is enough for now?
This isn't something we'd stop the release for. We'll take a good patch. Moving to wanted1.9.0.x+.
Personally I feel we should make sure this is fixed for the release, but I won't renom as I have no new information beyond comment 33.
(In reply to comment #37) > I would suggest doing this without adding new files. Moved to single file. > Besides the first font for the language, it doesn't seem like this > addresses using different fonts if the "base" font doesn't work? Yes, if the prefs for the document language specify only a fontconfig generic name, and a common character near a character from a script of different language cannot be represented using the base font for the doc language, then that different language sometimes get used to choose the font. I doubt that happens so often. This patch should cover most cases.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
This may have been responsible for a 16-17% reduction in Tp.
(In reply to comment #42) > This may have been responsible for a 16-17% reduction in Tp. > Have you confirmed this? Have a link to talos? That's a huge hit that can't go unchecked.
That's not a hit, it's an improvement :-).
It is sort of visible here but the times plotted are up to 2 hours after the build ids. http://graphs.mozilla.org/#spst=range&spss=1207712982.4914286&spse=1207734847.627143&spstart=1198095381&spend=1208548750&bpst=cursor&bpstart=1207712982.4914286&bpend=1207734847.627143&m1tid=61148&m1bl=0&m1avg=0&m2tid=61164&m2bl=0&m2avg=0&m3tid=61214&m3bl=0&m3avg=0&m4tid=146480&m4bl=0&m4avg=0 The buildids in the csv data are not so helpful either http://graphs.mozilla.org/dumpdata.cgi?setid=61148&setid=61164&setid=61214&setid=146480 "61148","qm-mini-ubuntu01","1.9","tp_loadtime_avg","0" "61148","1207722738","625.911","200804082258","None" "61148","1207727648","533.444","200804090020","None" "61164","qm-mini-ubuntu02","1.9","tp_loadtime_avg","0" "61164","1207721355","623.73","200804082234","None" "61164","1207726249","527.273","200804090007","None" "61164","1207730879","527.719","200804090119","None" "61214","qm-mini-ubuntu05","1.9","tp_loadtime_avg","0" "61214","1207725989" (Wed Apr 9 00:26:29 2008),"629.122","200804082314","None" "61214","1207730929","528.941","200804090034","None" "61214","1207735498","529.459","200804090146","None" "146480","qm-mini-ubuntu03","1.9","tp_loadtime_avg","0" "146480","1207722494","530.398","200804082150","None" "146480","1207726869","437.033","200804082258","None" "146480","1207730924","430.444","200804090020","None" As they imply a non-existent range: <200804082258 >200804082314 But analysing what was reported on Tinderbox http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox&maxdate=1207729498&legend=0&norules=1 gives this range: http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&date=explicit&mindate=1207723740&maxdate=1207724819 which would indicate attachment 314516 [details] [diff] [review] (as expected).
You need to log in before you can comment on or make changes to this bug.