CJK native font names are not recognized on non-CJK Windows and English names are not on CJK windows

RESOLVED FIXED

Status

Core Graveyard
GFX: Win32
RESOLVED FIXED
14 years ago
9 years ago

People

(Reporter: Jungshik Shin, Assigned: Jungshik Shin)

Tracking

({intl})

Trunk
x86
Windows 2000
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 4 obsolete attachments)

(Assignee)

Description

14 years ago
Thanks for the reference [0] (see bug 203745 comment #11 and bug 203745 comment
#10). That's new to me. MS must have shuffled around MSDN pages. When I wrote
comment #8, I tried to give a reference. I'm sure I have seen it in the
reference page for EnumFontFamiliesEx [1], but they changed the page and there's
no mention of 'native' font names any more.

Now according to the page you cited, on Win2k/XP, everything should just work
fine because CreateFont/CreateFontIndirect take care of it as long as we use 'W'
version of APIs, which we don't at the moment. The MSDN doesn't make it clear,
but that's the only way 'CreateFont/CreateFontIndirect' can work with both
'native' ('localized name' used in MSDN is a misnomer when refering to CJK names
of CJK fonts) names and  English names on Win2k/XP regardless of the current
system locale. 

On Win 9x/ME and NT4, unless we hard-code a mapping table between 'native' names
 and English names for Windows core fonts for SC, TC, J and K, I think we're out
of luck. We can bypass Win32 APIs  to retrieve whatever we want from a
(truetype) font, but that seems to be too much work (for too little gain). So,
we can do two things on Windows:

1. a. detect the Windows version in |GlobalInit| of nsFontMetricsWin.
   b. use 'W' APIs and Unicode font names (as passed by layout)  on Win 2k/XP
and 'A' APIs and font names converted to the system codepage (with
WideCharToMultiByte) on Win 9x/ME and NT4 [2]
   c. make pretty extensive changes because we use 'A' version of logFont with
|lfFaceName| in a C string throughout. We have to use 'W' version now.

2. This is optional and necessary only on Win 9x/ME and NT4.
   a. add the mapping table mentioned above. 
   b. if CreateFont* fails and lfFaceName of logFont is non-ASCII, look it up in
the mapping table (hash it if necessary) for the corresponding English name. [3]
   c. copy 'W' logFont to 'A' logFont except for lfFaceName, for which we have
to use the English name obtained in step b. 
   d. call CreateFont* (A version) with 'A' logFont

I'm tempted to do just the first of two because 1) that alone may (or may not)
be quite a work 2) the number of those who can benefit from this are relatively
small (users of Win 9x/ME/NT4 the language of which don't match the language of
fonts. e.g. English Win 9x/ME users who want to view Chinese pages). We can do
the second later if we determine that we can't ignore those cases.

[0]
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/gdi/fontext_4rw4.asp
[1]
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/gdi/fontext_4mk8.asp
[2]
http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsFontMetricsWin.cpp#2388
    Here, font names in UTF-16 are converted to C string in the current system
code page.
    If we can assume that every Win9x/ME user has MS IE 5.x or higher installed,
we're safe to use 'W' APIs even on Win 9x/ME. In similar cases, however, ftang
and roy didn't make that assumption. Instead, they wrote an equivalent of MSLU
(Microsoft Layer for Unicode).We can do the same here to avoid |if (Win2k/XP)
use W APIs else use A APIs| pattern.
[3] In theory, we have to do the opposite as well. That is, we have to make
English font names get recognized on Win 9x/ME (CJK) and NT4 (with CJK locale).
Therefore, we have to find the corresponding native name in the system code
page. In practice, the need for that is not greate because most CJK web pages
use native font names.
(Assignee)

Updated

14 years ago
No longer blocks: 203745

Comment 1

14 years ago
Perhaps, switching to CreateFontIndirectW "just works" on Win9x/ME with the CJK
native name, thus recoving the parity with WinXP.

The MSDN artcile says that CreateFontIndirectW needs UnicoWS.dll on Win9x/ME
(the "Microsoft Layer for Unicode on Windows 95/98/Me Systems"). If if "just
works", it might be tempting to switch over, and request users on those "old"
plaforms to get it...

But if that isn't the case and the native font names don't work with IE either,
I wouldn't mind giving the bug a very low priority. It seems to involve rather
intensive changes whichever bullet point you listed.
(Assignee)

Comment 2

14 years ago
> Perhaps, switching to CreateFontIndirectW "just works" on Win9x/ME with the CJK
> native name, thus recoving the parity with WinXP.

  As it is now, Korean fonts names work on Korean Win 9x/ME and Win 2k/XP with
Korean locale. However, they don't work on non-Korean Win 9x/ME and Win 2k/XP
with non-Korean locale.  The other way around, English names for Korean fonts
are not recognized on Korean Win 9x/ME and Win 2k/XP with Korean locale. Both
are true of SC, TC and J. 

Switching to 'W' APIs solve both problems for Win 2k/XP. However, it doesn't
solve either of the problem on Win 9x/ME and NT4. 'W' APIs on Win 9x/ME
supported by MSLU are just wrappers over 'A' APIs with WideCharToMultiByte
called on our behalf for wchar_t strings and can't do anything beyond what
native 'A' APIs can do on that platform. Therefore, requiring MSLU doesn't earn
much for us.

 A way to "solve' this problem on Win 9x/ME and NT4 is to hard-code a mapping
table for widely used fonts (i.e. the solution is not universal) and try
alternative names if the first attempt fails. Alternative is to poke into fonts
ourselves.  This is #2 item in comment #0.


As for MS IE 6, I'm pretty sure MS IE didn't work either until recently [1].
However, the latest version works on WIn 2k/XP (I'll attach a test case to bug
203745). I have to boot to Windows ME (English version) and see how MS IE works
there. What Opera does on Win 2k/XP is inconclusive...At least it can't
recognize English names for Korean fonts on Win2k/XP (with Korean locale), but
it seems like it can recognize C/J names for C/J fonts on Win2k/XP with Korean
locale. 

Given that numerous CJK web pages specify only native font names  and MS IE now
works correctly on Win 2k/XP. I think we have to do the first of two in comment
#0 (maybe not very urgent but nonetheless rather soon). The second item should
be given a very low priority as you wrote. 

[1] It could have been the case, in a typical MS fashion, that they had known
the 'trick' (undocumented) and used it in MS IE while leaving relevant MSDN
pages out of date.
Status: NEW → ASSIGNED

Comment 3

14 years ago
>  As it is now, Korean fonts names work on Korean Win 9x/ME and Win 2k/XP with
>Korean locale.

So, this side of the equation is fine.

> However, they don't work on non-Korean Win 9x/ME and Win 2k/XP
>with non-Korean locale.

This is the other side that prompted my earlier query in the other bug. The MSDN
article seemed to imply that, in this situation, given the CJK name on input,
EnumerateFontEx will return the English name on output, which can then be
subsequently used in CreateFont, thereby saving from hardcoding a mapping table.
That's is why I asked if this was also your reading of that article?

> The other way around, English names for Korean fonts
>are not recognized on Korean Win 9x/ME and Win 2k/XP with Korean locale. Both
>are true of SC, TC and J.

This case remains open, knowing that the first goal is to find a cheap way to
support the CJK native name, "given that numerous CJK web pages specify only
native font names" as you said.
(Assignee)

Comment 4

14 years ago
I guess you referred to the following :
> To get the appropriate font on different language versions of the OS, call 
> EnumFontFamiliesEx with the desired font characteristics in the LOGFONT 
> structure, retrieve the appropriate typeface name, and create the font using 
> CreateFont or CreateFontIndirect.

It's not very clear (it seems to imply that). Anyway, assuming it does, this
will solve the problem on 2k/XP (if we use 'W' APIs), but I suspect it will have
no effect on Win 9x/ME  because there's no _genuine_ W API on Win 9x/ME. For
instance, suppose  you pass a Korean font name to EnumFontFamilesExW on English
Win 9x/ME (with MSLU installed). The Korean font name in UTF-16 will be
converted to CP1252 before EnumFontFamilesExA is called. Because CP1252 doesn't
have any Korean character, the font name will turn to '?' or sth like that and
EnumFontFamilesExA can't do any magic with that. 

Comment 5

14 years ago
> there's no _genuine_ W API on Win 9x/ME

Some (such as ExtOutW) are indeed genuine. Should EnumFontFamilesExW be genuine
too, it would do the magic.
(Assignee)

Comment 6

14 years ago
You're right. After wrting comment #4, I realized that ExtTextOutW is genuine on
Win 9x/ME. So, I expected MSDN not to mention the need for MSLU  in the
reference page of ExtTextOutW, but it is not the case. Anyway, the only way to
figure it out is to experiment. 

Updated

14 years ago
Blocks: 145288
(Assignee)

Comment 7

13 years ago
http://www.trigeminal.com/samples/font_choices.html
has some information on this issue.
(Assignee)

Comment 8

13 years ago
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/mslu/winprog/other_existing_unicode_support.asp
has the list of 'W' APIs natively implemented on Win 9x/ME. 
Depends on: 239279
(Assignee)

Comment 9

13 years ago
rbs, I didn't realize that you had made this a blocker to bug 145288 until a
moment ago. Anyway, I hit upon basically the same idea as yours (bug 145288,
comment #28).
I didn't know about gFamilyName so that my patch just uses a new persistent
properties (gFontNameProperties), but I'll change that. Fortunately, as a proof
of concept, I didn't yet implement caching. It'd have been a total waste of my
name because everything is there for me (gFamilyName) :-)
(Assignee)

Comment 10

13 years ago
Created attachment 173505 [details]
testcase with CJK Windows 'core' fonts
(Assignee)

Updated

13 years ago
Attachment #173505 - Attachment is patch: false
Attachment #173505 - Attachment mime type: text/plain → text/html; charset=UTF-8
(Assignee)

Comment 11

13 years ago
Created attachment 173506 [details]
test case(corrected)

Sorry for bug spam
(Assignee)

Updated

13 years ago
Attachment #173505 - Attachment is obsolete: true
(Assignee)

Comment 12

13 years ago
Created attachment 173544 [details] [diff] [review]
patch

This works well in all the cases as far as fonts listed in
fontNameMap.properties are concerned. It also 'fixes' bug 145288.

rbs, can you take a look? 

With CJK Windows core fonts installed (regardless of the system locale), all 4
combinations (for the 5th line, I only specify 'lang=xx' so that it's normal
that it looks different) for each font should look identical (btw, I had a typo
in my script that generated  attachment 173506 [details]. The corrected version is at
http://jshin.net/moztest/cjkfonts2.html )
Attachment #173544 - Flags: review?(rbs)

Comment 13

13 years ago
Comment on attachment 173544 [details] [diff] [review]
patch

It seems that your patch is not equivalent to what was there.

The winName used to behave as if it was a full-blown substitution directly into
the given CSS font-family list.

Thus, it was used in
LookForFontWeightTable(aDC, aName) as well.

Now, your patch skips that. Beware to check other places too.
(Assignee)

Comment 14

13 years ago
Created attachment 174029 [details] [diff] [review]
patch update 

> It seems that your patch is not equivalent to what was there.

In my patch, the mapping function is called later (when LogFont struct is
filled up for Win32 'A' APIs. WideCharToMultiByte was used in that place) than
it was before (FindLocalFont). As you pointed out, I missed the other place
where I should have called LookupWinFontName (as a substitute for
WideCharToMultiByte). In this patch, I took care of it.
Attachment #173544 - Attachment is obsolete: true
(Assignee)

Comment 15

13 years ago
Created attachment 174030 [details] [diff] [review]
patch (alternative)

In this patch, LookupWinFontName is used in place of PL_HashLookup() in
FindLocalFont(). WideCharToMultiByte() has to be used down the road because
with this patch, nsString is hashed instead of nsCString (the latter can  be
passed over directly to 'A' APIs while the former need go through WCtoMB()).

The code size is slightly smaller this way, but perf-wise, it's a little
(perhaps a very little) slower than attachment 174029 [details] [diff] [review]. In addition, this patch
has a lower risk of regression.

rbs, can you review whichever you like better?
(Assignee)

Updated

13 years ago
Attachment #173544 - Flags: review?(rbs)

Comment 16

13 years ago
Comment on attachment 174029 [details] [diff] [review]
patch update 

I prefer this one, with the following comments.

 PLHashTable* nsFontMetricsWin::gFontMaps = nsnull;
 PRUint16* nsFontMetricsWin::gEmptyCCMap = nsnull;
+nsString nsFontMetricsWin::gCodepageStr; 

Better to use a pointer (like the other variables next to you).

==================

I find |LookupWinFontName| rather convoluted. Why have both
|gFamilyNames| and |gFontNameMapProperties|, and go to great 
lengths to add things in the hash on OOM conditions. Adding things can 
only trigger other OOM conditions.

I suggest to only have the properties map (it is a hash), and change
the signature to:

/* static */ void
LookupWinFontName(const nsAString& aName, nsAString& aWinName)
{
   aWinName = aName;

+  [...] // early check

+  if (!gFontNameMapProperties)
+    NS_LoadPersistentPropertiesFromURISpec(&gFontNameMapProperties,
+      NS_LITERAL_CSTRING("resource://gre/res/fonts/fontNameMap.properties"));

   if (gFontNameMapProperties) {
     ... look up there
   }
}

=================

Also, I think it might be easier to simply the property file as:

AsciiName.cp = CJKName 
CJKName = AsciiName.cp

with the lookup becoming:

if (IsASCII(aName)) {
  lookup value of name.cp
}
else {
  lookup value of name
  check that value ends with cp (with the existing function StringEndsWith)
  and truncate
}

==================

 typedef struct nsFontFamilyName
 {
   char* mName;
   char* mWinName;
 } nsFontFamilyName;

-static nsFontFamilyName gFamilyNameTable[] =

The struct is not needed anymore since nobody uses it.

===================

+  // There's little worry about characters not covered by the current 
+  // codepage because LookupWinFontName invoked in FindLocalFont 
+  // takes care of most cases. 


Suggesting:

The risk of loosing characters not covered by the current 
codepage is reduced because LookupWinFontName invoked earlier 
has taken care of most cases.
(Assignee)

Comment 17

13 years ago
Thanks for review and suggestion.

(In reply to comment #16)
> (From update of attachment 174029 [details] [diff] [review] [edit])
> I prefer this one, with the following comments.

attachment 174029 [details] [diff] [review] doesn't have "There's little worry ....." you rewrote. It's
only in attachment 174030 [details] [diff] [review]. So, I'm a little confused about your preference. 

> I find |LookupWinFontName| rather convoluted. Why have both
> |gFamilyNames| and |gFontNameMapProperties|, and go to great 

Yes, it's convoluted because I wanted to do 'lowercase', 'stripwhitespace',
'code page' comparison and WCtoMB conversion (in case of attachment 174029 [details] [diff] [review]) only
for the first time. With your suggestion for the properties file, it's simpler
than before, but still I have to call 'lowercase' (not just for the ASCII range
but for the whole Unicode repertoire which is quite expensive) and
'stripwhitespace' (and WCtoMB in case of 174029) everytime. Do you think we can
live with perf. lose arising from that? 

Comment 18

13 years ago
I think I didn't pick the right attachment. I meant attachment 174030 [details] [diff] [review].

I don't anticipate a perf hit. I feared the premture optimization instead. There
is 1) the early check that you had 2) the relatively fast path for the common
ASCII case and then, 3) the fall through for the rare CJK case.
(Assignee)

Comment 19

13 years ago
Created attachment 174441 [details] [diff] [review]
patch without caching per rbs' comment
Attachment #174029 - Attachment is obsolete: true
Attachment #174030 - Attachment is obsolete: true
Attachment #174441 - Flags: superreview?(dbaron)
Attachment #174441 - Flags: review?(rbs)

Comment 20

13 years ago
Comment on attachment 174441 [details] [diff] [review]
patch without caching per rbs' comment

On seeing that you opted not to assign/truncate anything at the start of
LookupWinFontName, you might want to then use a |PRBool LookupWinFontName|, and
do:

     // the familyNameQuirks should not affect generic & global fonts
+    nsAutoString winName; 
+    PRBool found = LookupWinFontName(*name, winName); 
-    nsFontWin* font = LoadFont(aDC, *winName, mFont.familyNameQuirks);
+    nsFontWin* font = LoadFont(aDC, found? winName : *name,
mFont.familyNameQuirks);

(this way, there is no implied dependency on the caller passing an empty
string.)

r=rbs with that.
Attachment #174441 - Flags: review?(rbs) → review+
(Assignee)

Comment 21

13 years ago
Comment on attachment 174441 [details] [diff] [review]
patch without caching per rbs' comment

thanks. I made the change in my tree. Besides, I got rid of hash callback
functions (for gFamilyNames).
This patch looks like it removes all the mapping that was represented by
gFamilyNameTable.

Also, I'm not crazy about using a .properties file for something that could
easily be in the .cpp file, although I guess having to encode UTF-8 or UTF-16 in
the .cpp file makes it a bit of a pain.

Comment 23

13 years ago
The mappings in gFamilyNameTable don't have much value anymore (see bug 145288). 

As for a .properties, yeah I don't have strong feelings either way too. I had
initially thought that jshin could just leverage gFamilyNameTable and hardcode
the mappings there. (That's why I marked bug 145288 dependent on this - see bug
145288 comment 28).

jshin, I guess you decided to use a .properties because it eases extensibility
down the track, and you preferred to resolve the problem once for all?
(Assignee)

Comment 24

13 years ago
It seems like Visual C++ doesn't like having UTF-8 literal strings in the source
code. (I think I tried them only to get a strange error.) Moreover, other people
who work on the file need to be careful when editing the file. Well, I can use
'\xE..\x8......blah blah', but that's not pretty. More importantly, as rbs
mentioned, I want to let end-users (or packagers) to add fonts they need. Hmm, I
may have to change the wording about 'localizability' in the properties file.
'One should not translate anything, but can add fonts'. 
 
Localizers shouldn't be able to add fonts; if fonts need to be added, they
should be added for everyone.

Furthermore, since this isn't in a localization JAR package or chrome URL,
localizers shouldn't even get to a point where they consider translating it.
Comment on attachment 174441 [details] [diff] [review]
patch without caching per rbs' comment

I'm a little skeptical about removing something like that during a beta cycle,
but sr=dbaron.

(Does the "to speed start-up" case vary by user, or is it always teh exact same
fonts?)
Attachment #174441 - Flags: superreview?(dbaron) → superreview+
(Assignee)

Comment 27

13 years ago
landed on the trunk on March 6th
Status: ASSIGNED → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.