Closed
Bug 177877
(non_unicode_wide)
Opened 22 years ago
Closed 22 years ago
extending non-Unicode (custome-encoding) TTF support for multibyte converters
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: jshin1987, Assigned: jshin1987)
Details
Attachments
(2 files, 8 obsolete files)
112.85 KB,
image/jpeg
|
Details | |
28.31 KB,
patch
|
rbs
:
superreview+
|
Details | Diff | Splinter Review |
While trying to enable the rendering of pre-1933 Korean text represented in U+1100 Hangul Jamos with fonts with custom-encodings (Oxxx and Nxxx fonts mentioned in bug 176315) under MS Windows, I found the following problems in the current implementation in nsFontMetricWin.cpp for custom-encoded non-Unicode TTFs. 1. |nsFontNonUnicode| in nsFontMetricWin.cpp assumes that all non-Unicode TTFs (with custome/hack encoding) have single byte converters. Therefore, it uses 'A'nsi functions like ExtTextOutA and GetTextExtentPoint32A. It also assumes that the result of charset. conversion has the same length as the source (which is the case for single byte converters for MathML and webdings). Unfortunately, these assumption don't hold for fonts with more than 256 glyphs and double-byte converters. There are two problems to solve here. One is how to specify that a font needs a double byte converter in fontEncoding.properties file. The other is modify or subclass nsFontNonUnicode. One way to solve the former is add '.double' postfix to the encoding name (value in fontEncoding.properties file). The postfix has to be stripped away right after setting the fonttype (say, eFontType_NonUnicodeDouble). 2. Win32 API functions for getting font names return family names in the lang. corresponding to the current locale. For TeX, Mathematica, Webding fonts, this is not an issue because they usually have only English names and Win32 APIs fall back to English names if they're the only names specified in fonts. (This could be an issue even for them in the future if somebody add locale/language-specific names to them). However, two sets of Korean fonts I'd like to use mentioned in bug 176315 have both English and Korean names and Korean names are returned when Mozilla is running under Korean locale. Therefore, specifying English names in fontEncoding.properties file doesn't work. I thought specifying both Korean and English names might work. However,it turned out that it doesn't because a shortcut is taken to convert the locale character encoding to UTF-16 assuming that font names are representable in US-ASCII. Using a full-pledged converter from the locale charset to UTF-16(I guess we can even use MultiByteToWideChar Win32 API here) might be an easy solution (albeit a bit expensive). A better approach I like to implement is to figure out a way to get English-only (fully representable in US-ASCII) family name regardless of the current locale and use that throughout. (when names are exposed in UI, of course, the locale/language specific names had better be used. Even better would be doing it based on lang/locale pack currently in use.) This will also eliminate the need for an ugly hack to identify MS P Gothic ( Shift_JIS string for 'MS P Gothic' is hard-coded). A quick look at Win32 APIs didn't give me anything that looks promising. (Xft/fontconfig offer API calls for getting family/face names in various lang/locale and English. I wish there were a similar API call under Windows) Any help from Windows guru would be nice.
Updated•22 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee | ||
Comment 1•22 years ago
|
||
I added nsFontNonUnicodeWide class. It's mostly identical to nsFontNonUnicode class except that it uses 'W'ide functions instead of 'A'NSI functions and converters for this class are assumed to return the result in UCS2. Before calling 'W'ide functions, this result is converted to UCS2 in native byte order. To work around the problem of family name in ANSI code page of the current locale(I couldn't figure out how to get family name in US-ASCII- English- regardless of the current locale. Win32 API doesn't offer this functionality), I had to add a kludge to check the current codepage and call MultiByteToWideChar. fontEncoding.properties have two entries for each font-encoding pair (one with family name in English and the other with family name in native language in UTF-8). rbs, could you take a look?
Assignee | ||
Comment 2•22 years ago
|
||
Assignee | ||
Updated•22 years ago
|
Alias: non_unicode_wide
Assignee | ||
Comment 3•22 years ago
|
||
rbs, if you can take a look, it'd be nice. well, bugzilla will be down soon...
Attachment #104997 -
Attachment is obsolete: true
Assignee | ||
Comment 4•22 years ago
|
||
Now nsFontWinNonUnicodeWide directly inherits nsFontWin instead of indirectly inheriting it via nsFontWinNonUnicode. GetBoundingMetrics is implemented for nsFontWinNonUnicodeWide in case it's used by MathML in some cases.
Attachment #105658 -
Attachment is obsolete: true
Assignee | ||
Comment 6•22 years ago
|
||
Comment on attachment 106538 [details] [diff] [review] patch v2 (nsFontWinNonUnicode directly inherits nsFontWin) rbs, thank you in advance...
Attachment #106538 -
Flags: review?(rbs)
Assignee | ||
Comment 7•22 years ago
|
||
Comment on attachment 106538 [details] [diff] [review] patch v2 (nsFontWinNonUnicode directly inherits nsFontWin) Saving rbs for sr and asking Shanjian for r :-) Not be a Trojian horse, I have to reveal that eventually this patch will increase the 'disk' footprint of libuconv.so/uconv.dll by about 45k - 60k. However, according to what alecf wrote when combining all ucv* into a single so/dll, this should not be a problem for those who would never view old Korean pages.
Attachment #106538 -
Flags: review?(rbs) → review?(shanjian)
Assignee | ||
Comment 8•22 years ago
|
||
Sorry for spamming. My previous comment about the size of uconv was not for this bug. It's for bug 176315.
Comment 9•22 years ago
|
||
I took a look just now and I am going to have a close look tomorrow. Jshin, can you give me some idea why we have such a issue? Is there truetype font available for those characters? Or those characters are just not encoded in unicode? It is a little bit of odd to add non-unicode support now. I think non-unicode font is dying. Are those fonts you are trying to support available on other platforms?
Assignee | ||
Comment 10•22 years ago
|
||
Shanjia, thank you for lookinag at it. Pls, note that this bug is one of a series of bugs I filed for Hangul Jamo rendering. Other bugs in the series are bug 176290(which is also for MathML and 'symbol' fonts) and bug 176315. (you may also take a look at http://bugzilla.gnome.org/show_bug.cgi?id=95708) Here's some background to help you understand the issue at hand. Hangul Jamo(alphabet) is just like Indic scripts. Having __nominal__ glyphs for them at their Unicode codepoint is not sufficient at all. (that is, fonts like MS Arial Unicode and Cyberbit are totally useless for Hangul Jamo rendering other than the Unicode code chart because what they have for U+1100 Jamo is *non-combining* glyphs for Jamo) They have to be put together in a certain way to form syllables. Therefore, the proper way to support them is with opentype fonts with gsub/gpos tables. Unfortunately, no *opentype* font for Hangul Jamos is available for free and it might take pretty long to have freely available opentype fonts (as opposed to mere truetype fonts) with gsub/gpos and other bells and whistles for Hangul Jamos. The only opentype for Hangul Jamos is included in Korean version of MS Office XP and virtually none in opensource community has access to it. (I bought English MS Office XP for the sole purpose of accesing it because I never use MS Office but it turned out that English MS Office XP doesn't include it) In the meantime, two fonts I want to support are at least downloadable on the net although its license has some restrcition(I'm contacting the foundry Hanyang System in Korea to ask for a more liberal license). They're truetype fonts with Unicode cmap with glyphs for Hangul Jamo composition in PUA. The mapping from a __sequence__ of Hangul Jamos to a sequence of glyph codepoints in two fonts (note that the mapping is not 1 to 1 but n to m) has to be done somewhere. That mapping is dealt with in bug 176315. 176315 also solves Hangul Jamo rendering problemm for Mozilla-X11core font, Xprint(because GTK,X11, and Xprint modules in gfx/src already have a mechanism to deal with custom-encoded fonts. See, for instance, http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#421 The array defined there play the role of fontEncoding.properties file for gtk,X11,Xprint gfx modules) and possibly MacOS (for MathML, custom-encoding font handling was added to MacOS gfx and that may just work for 'wide' custom-encoding fonts as well.) This bug sits on top of bug 176315 to make use of two converters I added there to render Hangul Conjoining Jamos (Mozilla-Windows already has code for handling custom-encoded fonts as used by MathML and some 'symbol' fonts, but they don't work for 'wide' custom encoded fonts) Bug 176290 is to make Mozilla-Xft take advantage of two converters added in bug 176315 as well as MathML(and 'symbol' font) converters. Another thing to note is that currently Mozilla-Unix/Linux doesn't make use of opentype or AAT fonts even for Indic scripts for which free opentype fonts began to emerge. (I'm a bit confused about the status of Mozilla-Windows and Mozilla-MacOS in this aspect. Do they use Opentype fonts or AAT fonts to render complex scripts? If they do, do they use OS-provided APIs such as Uniscribe and ATSUI or roll out in-house rendering? It seems like neither of them do any of this, but I'm not sure at the moment. If the answer is negative, my patch can also be used to render Indic scripts wih still widely available custom-encoded truetype fonts for Indic scripts in Mozilla-Windows if converters for them are added) That means even if opentype fonts for Hangul Jamo are available, Mozilla-Unix/Linux wouldn't work for Hangul Jamos. Mozilla's ctl extension (mozilla/extensions/ctl) renders Indic and Thai scripts with basically the same approach I'm taking here and in bug 176315. In other words, Mozilla/Unix/Linux rely on custom-encoded fonts for Indic scripts to render Indic scripts. Hangul Jamos are a bit simpler than Indic scripts so that just adding converters (in bug 176315) makes Mozilla-X11core and Xprint work. However, Mozilla-Windows and Mozilla-Xft need a bit more work and this bug is for Mozilla-Windows and bug 176290 is for Mozilla-Xft. Hopefully, this long comment helped you understand my intent in this bug and bug 176290 and bug 176315.
Comment 11•22 years ago
|
||
I finished looking through the patch. I have 2 suggestions. 1, The indentation is incorrect in several places, probably tab is used there. + // encoding in fontEncoding.properties for a wide NonUnicode font + // has '.wide' at the end. + if (Substring(encoding, encoding.Length() - 5, 5) == NS_LITERAL_STRING(".wide")) + aFontType = eFontType_NonUnicodeWide; + else + aFontType = eFontType_NonUnicode; 2, the following code appears several time in nsFontWinNonUnicode implementation. + char str_local[CHAR_BUFFER_SIZE]; + char* str = str_local; + PRInt32 destLength; + if (NS_FAILED(mConverter->GetMaxLength(aString, aLength, &destLength))) + return 0; + if (destLength > CHAR_BUFFER_SIZE) { + str = new char[destLength]; + if (!str) return 0; + } + + PRInt32 srcLength = aLength; + mConverter->Convert(aString, &srcLength, str, &destLength); + + +#ifdef IS_LITTLE_ENDIAN + // convert BigEndian UCS2 to little endian UCS2 + char* pstr = str; + while (pstr < str + destLength) { + PRUint8 tmp = pstr[0]; + pstr[0] = pstr[1]; + pstr[1] = tmp; + pstr += 2; // swap every two bytes + } +#endif and + if (str != str_local) { + delete[] str; + } It is possible to have a help class to simply this. memory deallocation can be handled in object destruction.
Assignee | ||
Comment 12•22 years ago
|
||
Shanjian, Can you review this update? As you pointed out, I added a helper class for temp. buffer allocation and removed tabs.
Attachment #106538 -
Attachment is obsolete: true
Assignee | ||
Updated•22 years ago
|
Attachment #108981 -
Flags: review?(shanjian)
Comment 13•22 years ago
|
||
Comment on attachment 108981 [details] [diff] [review] a new patch addressing Shanjian's concerns >@@ -1468,7 +1532,12 @@ > nsAutoString encoding; > if (NS_SUCCEEDED(GetEncoding(aShortName, encoding))) { > aCharset = DEFAULT_CHARSET; >- aFontType = eFontType_NonUnicode; >+ // encoding in fontEncoding.properties for a wide NonUnicode font >+ // has '.wide' at the end. The indentation of these 2 line does not seems right. Please double check.
Attachment #108981 -
Flags: review?(shanjian) → review+
Assignee | ||
Comment 14•22 years ago
|
||
Comment on attachment 108981 [details] [diff] [review] a new patch addressing Shanjian's concerns Thank you for r, Shanjian. I've fixed the indentation in my copy. Now asking rbs for sr.
Attachment #108981 -
Flags: superreview?(rbs)
Comment 15•22 years ago
|
||
The approach in the patch is OK but it seems to me that it would be much simpler just to build upon the existing code with a simple |if| as follows: /* yep, I know that IsWide() doesn't exist, but it is generally good to be able to query the biodata of objects, and functions to GetState/Behavior() should have been added in the first place in nsIUnicodeEncoder. But now you can initialize as: GetConverter(converter, &isWide); font = new nsFontWinNonUnicode(&logFont, hfont, ccmap, converter, isWide); and use a member bolean variable mIsWide for such a purpose. */ if (converter->IsWide()) // or if (mIsWide) // use 'W': ::ExtTextOutW(aDC, aX, aY, 0, NULL, (PRUnichar*) str, destLength / 2, NULL); else // use 'A': ::ExtTextOutA(aDC, aX, aY, 0, NULL, str, destLength, NULL); The reason I am suggesting this is because the code you are adding is differing from the existing code by just this type of one-liners, and so the benefits would be to cut down the excessive C++ and ease maintenance. (A direct effect would be that most of your code would be kept in the testing radars even without those rare Jamo fonts. Presently, your code is almost never going to be tested given that most people won't have those fonts.) The auto memory buffer is a welcome addition, though. BTW, rename |nsCharBufferHdl| to |nsAutoCharBuffer|. It is customary in the Mozilla codebase to use the prefix |auto| for this type of scope-based data structures. + PRInt32 destLength; + if (NS_FAILED(mConverter->GetMaxLength(aString, aLength, &destLength))) + return 0; + + nsCharBufferHdl buffHdl; + char* str = buffHdl.GetBuffer(destLength); + if (!str) return 0; + + PRInt32 srcLength = aLength; + mConverter->Convert(aString, &srcLength, str, &destLength); + +#ifdef IS_LITTLE_ENDIAN + // Convert BE UCS2 to LE UCS2 + buffHdl.ByteSwap16(destLength); +#endif The above chunk is redundant and can be put in a helper function, e.g., ConvertUnicodeToGlyphs(..., nsAutoCharBuffer& aResult) or something. As shanjian wondered, I have also been wondering if there is a wide demand for bending backward this way rather than leading the way forward. Is this patch just for the sake of it, or is there a sizeable demand for supporting this feature other than your own needs? Finally, there isn't much point in landing this patch without landing bug 176315 first.
Assignee | ||
Comment 16•22 years ago
|
||
Assignee | ||
Updated•22 years ago
|
Attachment #108981 -
Attachment is obsolete: true
Assignee | ||
Comment 17•22 years ago
|
||
I found that |nsAutoFontDataBuffer| already does |nsAutoCharBuffer| except that its static buffer is larger and the type is unsigned. Template is not supposed to be used for xpcode, but can't I use even in platform specific code for which the only compiler we use(VC++) is known to work well with it?
Assignee | ||
Comment 18•22 years ago
|
||
I also found that there are several places where PRUnichar[] is used similarly to the same way char[] is used for which I made a helper class per Shanjian's suggestion. So, I extended the prev. patch to deal with the redundancy there as well. This is actually outside the scope of this bug, but I thought I might do it while I was at it. This also uses template so that it depends on the Mozilla coding policy.
Assignee | ||
Comment 19•22 years ago
|
||
Comment on attachment 109259 [details] [diff] [review] a similar patch using template to unify two nsAuto* classes I'm asking for r and sr of the second in a series of three (attachment 1092588, attachment 1092598, attachment 1092600), but please feel free to pick one you think of as the most appropriate.
Attachment #109259 -
Flags: superreview?(rbs)
Attachment #109259 -
Flags: review?(shanjian)
Assignee | ||
Comment 20•22 years ago
|
||
Thank you for review and suggestion. I made changes as you suggested although to me having a separate subclass appears to be cleaner despite some redundancy. (there's a method for 'wide' font that is more like the one for Unicode font class than the one for 'narrow' non-unicode font). As for your question about the demand for this feature, my short answer would be definitely yes. The leader of Jikji Project (Korean equivalent of Gutenberg Project) is very much interested in this work because the project got kinda stuck due to the lack of portable/cross-platform support for old Korean on the web. It also has to be noted that this patch is not only for Korean but could be used for other complex scripts for which opentype fonts are not readily available yet. That category includes a lot of South/South Asian scripts. Moreover, having opentype fonts doesn't make Mozilla all of sudden render them correctly. Opentype fonts are intelligent but certainly NOT to the degree that they can do all the magic for complex scripts without 'clients' of opentype fonts doing anything. As far as I can tell, there's no trace of opentype font support in Mozilla code, yet. Therefore, even though I don't have any doubt whatsoever about a thesis that opentype font support is the way of the future, I wouldn't use that as an argument against implementing the only feasible way to support some complex scripts represented NOT in hacked text encoding BUT in the proper Unicode way. To me, being able to render complex scripts encoded in their Unicode codeposition (instead of PUA code points or hacked **text** encoding) is not a step backward but a big step forward, which would accelerate opentype font developments with gsub/gpos and other opentype tables for them. I wrote a very long response to your question, but decided not to clutter bugzilla. Instead, I put it up at http://jshin.net/i18n/korean/177877.comment.txt. It's incomplete, not so well organized and is in plain text. Nonetheless, you're welcome to read it.
Comment 21•22 years ago
|
||
Comment on attachment 109260 [details] [diff] [review] a more aggresive patch to remove redundancy with temp. buffer alloc/free. This one is my favorite. I only have the few nits mentioned below. s=rbs applies from now on. ======================== name.StripWhitespace(); ToLowerCase(name); + // if we have not init the property yet, init it right now. ---------- mMat.eM12 = mMat.eM21 = zero; - mMat.eM11 = mMat.eM22 = one; + mMat.eM11 = mMat.eM22 = one; ---------------- @@ -5247,7 +5308,6 @@ return 0; } - NS_IMETHODIMP nit/whitespace: no need to mess the patch with these useless changes. ================================ +// a wrapper function overloading |GetGlyphIndices| defined above. +static nsresult +GetGlyphIndices(HDC aDC, + nsCharacterMap** aCMAP, + const PRUnichar* aString, + PRUint32 aLength, + nsAutoChar16Buffer& aBuffer) +{ + PRUnichar* str = aBuffer.GetArray(aLength); + if (!str) + return NS_ERROR_OUT_OF_MEMORY; + if (!GetGlyphIndices(aDC, aCMAP, aString, aLength, str, aLength)) + return NS_ERROR_UNEXPECTED; + return NS_OK; +} What about just changing the definition of the old function so that it accepts |nsAutoChar16Buffer& aBuffer| as argument. This way you can remove the internal alloc that is done therein. ================================ + PRBool mIsWide; + #ifdef MOZ_MATHML virtual nsresult GetBoundingMetrics(HDC aDC, @@ -1959,6 +2098,7 @@ nsCOMPtr<nsIUnicodeEncoder> mConverter; }; readability: just move the declaration at the end.
Attachment #109260 -
Flags: superreview+
Attachment #108981 -
Flags: superreview?(rbs) → superreview-
Attachment #109259 -
Flags: superreview?(rbs) → superreview-
Assignee | ||
Comment 22•22 years ago
|
||
rbs, can you take a final look and transfer your sr stamp here ? Thanks.
Attachment #109258 -
Attachment is obsolete: true
Attachment #109259 -
Attachment is obsolete: true
Attachment #109260 -
Attachment is obsolete: true
Assignee | ||
Comment 23•22 years ago
|
||
My own 'nit' detector found unnecessary '// if output.....' and I've just removed. ---------------- // if output buffer too small, allocate a bigger array -- the caller should free - PRUint16* result = aBuffer; - if (aLength > aBufferLength) { - result = new PRUint16[aLength]; - if (!result) return nsnull; + PRUint16* result = aBuffer.GetArray(aLength); -------------- Just to save time for rbs, the only diff. brounght in in this patch is the following along with the change in GetGlyphIndices() as suggested. ( I thought there are several invocations like this and made an overloading wrapper, but it turned out there's only one...) - GetGlyphIndices(aDC, nsnull, &aChar, 1, &aGlyphIndex, 1); + nsAutoChar16Buffer gbuffer; + GetGlyphIndices(aDC, nsnull, &aChar, 1, gbuffer); + aGlyphIndex = *(gbuffer.GetArray()); }
Comment 24•22 years ago
|
||
Comment on attachment 109300 [details] [diff] [review] a new patch addressing rbs' concerns ============= +static nsresult +GetGlyphIndices(HDC aDC, + nsCharacterMap** aCMAP, + const PRUnichar* aString, + PRUint32 aLength, + nsAutoChar16Buffer& aBuffer) <-------- readability: rename aBuffer -> aResult +static nsresult <--------- so don't return |nsnull| later [...] + if (!aString) return nsnull; return NS_ERROR_NULL_POINTER ============= + nsAutoChar16Buffer gbuffer; <------------- + GetGlyphIndices(aDC, nsnull, &aChar, 1, gbuffer); + aGlyphIndex = *(gbuffer.GetArray()); Just drop the prefix 'g' throughout your patch -- it is is often used for global variables (e.g., use |buf| and |buffer| if you need two names) Error check for robustness: if (NS_SUCCEEDED(GetGlyphIndices(aDC, nsnull, &aChar, 1, buffer))) aGlyphIndex = *(buffer.GetArray()); ============= + PRBool mIsWide; <-------------- + #ifdef MOZ_MATHML virtual nsresult GetBoundingMetrics(HDC aDC, @@ -1959,6 +2080,7 @@ nsCOMPtr<nsIUnicodeEncoder> mConverter; }; I mentioned earlier to do: nsCOMPtr<nsIUnicodeEncoder> mConverter; + PRBool mIsWide; }; ============= + nsresult rv = GetGlyphIndices(aDC, &mCMAP, aString, aLength, buffer); + if (NS_FAILED(rv)) { return NS_ERROR_UNEXPECTED; <---- |rv| is lost } use: return rv; (2 places) ============= // we reach here if we couldn't transliterate, so fallback to question marks - if (aLength > aBufferLength) { - // allocate a bigger array that the caller should free - result = new PRUnichar[aLength]; - if (!result) return nsnull; - } + result = aBuffer.GetArray(*aCount); <-------------------- + if (!result) return NS_ERROR_OUT_OF_MEMORY; for (PRUint32 i = 0; i < aLength; i++) { result[i] = NS_REPLACEMENT_CHAR; } *aCount = aLength; <------------------------------------- you are using an uninitialized varibale. Use |result = aBuffer.GetArray(aLength)| instead.
Attachment #109300 -
Flags: superreview-
Comment 25•22 years ago
|
||
> readability: rename aBuffer -> aResult
ignore this, it isn't worth the trouble. It will just make the patch bigger
without much gain.
Assignee | ||
Comment 26•22 years ago
|
||
hopefully this is the last iteration :-)
Attachment #109300 -
Attachment is obsolete: true
Assignee | ||
Updated•22 years ago
|
Attachment #109379 -
Flags: superreview?(rbs)
Comment 27•22 years ago
|
||
Comment on attachment 109379 [details] [diff] [review] a new patch with rbs' concerns addressed sr=rbs
Attachment #109379 -
Flags: superreview?(rbs) → superreview+
Assignee | ||
Updated•22 years ago
|
Attachment #106538 -
Flags: review?(shanjian)
Assignee | ||
Comment 28•22 years ago
|
||
Comment on attachment 109259 [details] [diff] [review] a similar patch using template to unify two nsAuto* classes Sorry for spamming. I'm cleaning up obsolete r request. Fix just checked in. Thank you all.
Attachment #109259 -
Flags: review?(shanjian)
Comment 29•22 years ago
|
||
-> fixed
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 30•22 years ago
|
||
Jungshik Shin: could you please verify this change? otherwise, please provide the test case/steps then I can verify it. Thanks!
Assignee | ||
Comment 31•22 years ago
|
||
Yuying, You have to wait until my patch for bug 176315 is checked in. I'll let you know when that's done.
You need to log in
before you can comment on or make changes to this bug.
Description
•