Closed Bug 176315 (old_korean) Opened 22 years ago Closed 22 years ago

need to have converters for rendering Old Korean text with Un series fonts

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.4beta

People

(Reporter: jshin1987, Assigned: jshin1987)

Details

(Keywords: intl)

Attachments

(3 files, 27 obsolete files)

186.76 KB, image/jpeg
Details
147.48 KB, image/jpeg
Details
72.08 KB, patch
ftang
: review+
Details | Diff | Splinter Review
Ogulim/Obatang/OGungseo/Odotum and Ngulim/Nbatang/NGungseo/Ndotum are distributed by Microsoft in its Old Korean support tool for MS Word 2000.

The former set of fonts has 6 sets of glyphs for 125 leading consonants (90 + 1 encoded in the U+1100 Jamo block and 34 extra), 2 sets of glyphs for 95 medial vowels (66 + 1 encoded in the U+1100 block and 28 extra), and 4 sets of glyphs for 141 trailing consonants (82 encoded in the U+1100 block and 59 extra). Which of the multiple glyphs is used depends on context: whether a trailing consonant is present, what type of vowel (horizontal, vertical, or hori-vertical) is used, and so forth. This was all worked out and has been implemented in Lambda (Unicode-enabled LaTeX, http://www.ktug.or.kr), Yudit (http://www.yudit.org) and Pango (http://bugzilla.mozilla.org/show_bug.cgi?id=95708).

The latter (Nxxx) group of fonts has glyphs for about 5500 precomposed syllables (made out of the L's, V's, and T's mentioned above) along with a single set of glyphs for L, V, and T for on-the-fly generation of syllable glyphs by simple overstriking. Both sets of fonts would enable Mozilla to render about 1.5 million syllables.

As groundwork for making use of those fonts, converters have to be written for converting a sequence of characters to a sequence of code points for glyphs. Similar work was done in the past for the X11 BDF 'johab-1', 'johabs-1', and 'johabsh-1' fonts, so I expect this can be done fairly easily. In addition, a similar approach was taken to make use of Mathematica and Computer Modern fonts in rendering MathML. I'll try to begin work on this soon. Once this is done, the next step would be to make each rendering engine (Gfx/window, Gfx/Gtk, Gfx/mac) use it.
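The context-dependent glyph selection described above can be sketched roughly as follows. This is only an illustration, not the layout of any real font: it assumes the 6 leading-consonant sets split as 3 vowel shapes times (no trailing consonant, trailing consonant), that the 2 vowel sets are chosen by the presence of a trailing consonant, and that each set occupies a contiguous block of code points. The names `VowelShape`, `LeadingGlyphSet`, `VowelGlyphSet`, and `GlyphCodePoint` are hypothetical.

```cpp
#include <cassert>

// Vowel shape classes named in the comment above.
enum class VowelShape { Vertical = 0, Horizontal = 1, HoriVertical = 2 };

// ASSUMPTION: the 6 leading-consonant sets are indexed as
// (vowel shape) x (trailing consonant absent/present).
int LeadingGlyphSet(VowelShape shape, bool hasTrailing) {
    return static_cast<int>(shape) * 2 + (hasTrailing ? 1 : 0);
}

// ASSUMPTION: the 2 vowel sets differ only by whether a trailing
// consonant follows.
int VowelGlyphSet(bool hasTrailing) {
    return hasTrailing ? 1 : 0;
}

// ASSUMPTION: each glyph set is a contiguous run of `glyphsPerSet`
// code points starting at `base`; real fonts need not be laid out this way.
unsigned GlyphCodePoint(unsigned base, int set, int glyphsPerSet, int index) {
    assert(set >= 0);
    assert(index >= 0 && index < glyphsPerSet);
    return base + static_cast<unsigned>(set * glyphsPerSet + index);
}
```

A converter built along these lines would first classify the syllable (vowel shape, trailing consonant present or not) and only then pick a glyph for each Jamo, which is why it cannot work on one character at a time.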
Ooops. I meant http://bugzilla.gnome.org/show_bug.cgi?id=95708 BTW, bug 176290 is about enabling hack-encoded fonts in Mozilla with Xft.
Keywords: intl
QA Contact: ruixu → ylong
Ngulim and Ogulim fonts are available at http://office.microsoft.com/korea/assistance/2000/weboldhg.aspx When installing this under non-Korean Windows, garbled text will appear (in the case of Win2k/XP, the locale can be set to Korean to avoid this). However, extract.exe (http://www.microsoft.com/windows2000/techinfo/reskit/tools/existing/extract-o.asp) can be used to extract just the two font files. Under Linux, cabextract (http://www.kyz.uklinux.net/cabextract.php3) can be used. BTW, I put up the list of extra Jamos available in Oxxx.ttf at http://jshin.net/i18n/korean/jamos_ogulim.txt The list of precomposed syllables (pre-1933 orthography) in Nxxx.ttf is at http://jshin.net/i18n/korean/ngulim.html. (Obviously, Ngulim.ttf has to be installed, and Mozilla-Xft works fine in that case.)
Assignee: yokoyama → jshin
adding back Roy to CC. I want to be assigned this, but somehow my privilege has changed and I can't accept this...
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attached patch patch v1 (obsolete) — Splinter Review
The first patch. Tentatively, I used the charset names 'x-hykoreanjamo-0' and 'x-hykoreanjamo-1' for Oxxx style fonts and Nxxx style fonts, respectively. I used 'hykoreanjamo-0' and 'hykoreanjamo-1' as their XLFDs. All of these are subject to change (if there are better names) because I'm the only one using these names at the moment :-).
Attached file nsUnicodeToJamoTTF.h (new) (obsolete) —
Two classes (nsUnicodeToJamoOgTTF and nsUnicodeToJamoNgTTF) are derived from base class (nsUnicodeToJamoTTF).
Attached file nsUnicodeToJamoTTF.cpp (new) (obsolete) —
implementation of nsUnicodeToJamoTTF, nsUnicodeToJamoOgTTF, nsUnicodeToJamoNgTTF
Attached file jamodefs.h (obsolete) —
Some macros for Hangul Jamos and encodings used in Nxxx and Oxxx fonts.
Attached file nsJamoConvUtil.h (new) (obsolete) —
Utility class for dealing with Hangul Jamos
Attached file nsJamoConvUtil.cpp (new) (obsolete) —
Utility class implementation
Attached file oj2ns.h (new) (obsolete) —
Look-up tables for mapping extended Jamo sequences to precomposed syllables (obsolete syllables) encoded in PUA in Nxxx fonts. Unlike the mapping between modern Hangul syllables(in U+AC00) and Hangul Jamo seq., the mapping is not algorithmic. The table is not yet complete because it's tedious and time-consuming to type in about 5000 obsolete syllables. This file covers about 10% and I'll update it.
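The contrast drawn above between the algorithmic mapping for modern syllables and the table-driven mapping for obsolete ones can be sketched as below. `ComposeModern` follows the standard Unicode Hangul composition arithmetic. `LookupObsolete` is a hypothetical stand-in for a table such as oj2ns.h: the real table's layout and contents are not shown in this bug, so the caller supplies the table here and a return value of 0 means "no precomposed glyph, fall back to combining Jamos".

```cpp
#include <map>
#include <tuple>

// Modern syllables: pure arithmetic (Unicode Hangul syllable composition).
// l must be in U+1100..U+1112, v in U+1161..U+1175, and t is either 0
// (no trailing consonant) or in U+11A8..U+11C2. Returns 0 otherwise.
char32_t ComposeModern(char32_t l, char32_t v, char32_t t) {
    if (l < 0x1100 || l > 0x1112 || v < 0x1161 || v > 0x1175) return 0;
    if (t != 0 && (t < 0x11A8 || t > 0x11C2)) return 0;
    const char32_t tIndex = t ? t - 0x11A7 : 0;
    return 0xAC00 + ((l - 0x1100) * 21 + (v - 0x1161)) * 28 + tIndex;
}

using JamoKey = std::tuple<char32_t, char32_t, char32_t>;

// Obsolete syllables: no formula, so an explicit table is needed.
// HYPOTHETICAL interface: returns 0 when the sequence is not in the table
// so that the caller can fall back to overstriking individual Jamos.
char32_t LookupObsolete(const std::map<JamoKey, char32_t>& table,
                        char32_t l, char32_t v, char32_t t) {
    const auto it = table.find(JamoKey{l, v, t});
    return it == table.end() ? 0 : it->second;
}
```

An incomplete table only reduces how often the precomposed glyph is used; every miss simply takes the fallback path.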
Attached image a screenshot with New Gulim font (obsolete) —
This shot was taken of Mozilla's rendering of http://jshin.net/i18n/korean/ogtest.html. The font used is NewGulim.
The rendering of http://jshin.net/i18n/korean/ogulim.html with New Gulim font. To the right of Mozilla is 'gedit' rendering almost identical content (with my patch to Pango applied).
Attachment #104464 - Attachment is obsolete: true
Mozilla rendering http://jshin.net/i18n/korean/ngulim.html with Old Gulim font.
How to test:

At the moment, it only works with Mozilla using X11 core fonts. Mozilla with Xft doesn't work yet (bug 176290 deals with the issue). Neither does it work for MS Windows and MacOS. I guess what has been done for MathML can be employed to make this work under MS Windows and MacOS.

1. Download New Gulim and Old Gulim as explained in comment #2.
2. Put them somewhere on your machine (in a separate directory).
3. Make fonts.scale with the following content:

   2
   Ngulim.ttf -HanYang-Newgulim-medium-r-normal--0-0-0-0-p-0-iso10646-1
   Ogulim.ttf -HanYang-Oldgulim-medium-r-normal--0-0-0-0-p-0-iso10646-1

4. Run 'mkfontdir' in the directory.
5. Make fonts.alias with the following content:

   -HanYang-NewGulimJamo-medium-r-normal--0-0-0-0-p-0-hykoreanjamo-1 \
   -HanYang-NewGulim-medium-r-normal--0-0-0-0-p-0-iso10646-1
   -HanYang-OldGulimJamo-medium-r-normal--0-0-0-0-p-0-hykoreanjamo-0 \
   -HanYang-OldGulim-medium-r-normal--0-0-0-0-p-0-iso10646-1

   '\' at the end denotes a continuation; the two lines separated by '\' have to be concatenated into one line.
6. If your XFree86 (Linux/FreeBSD/NetBSD) configuration is set up to load the 'freetype' module by default, just running 'xset fp+ `pwd`' will make these fonts available. You can change the XF86 configuration to load the 'freetype' module the next time you launch the X server.
7. If not, you have two options.
   A. Add the directory to the list of paths searched by xfs (most Linux distributions these days use xfs) in your xfs configuration file (/etc/X11/fs/config or /usr/X11R6/lib/X11/fs/config) and relaunch xfs ('kill -USR1 PID_of_xfs' will effectively do it).
   B. Run a separate xfs at a different port. In that case, you have to make an xfs configuration file.

When bug 176290 is resolved, Xft-enabled Mozilla would relieve end-users of this chore completely.
It'd be very nice if somebody could review my patch. (There are for sure some rough edges.) I'm not sure whom to ask for review. The goal of this bug is writing a converter for these custom-encoded fonts, and I think I achieved the goal except for 4,500 syllables missing in the conversion table (oj2ns.h) from Jamo seq. to precomposed syllable glyphs in Nxxx style fonts (see comment #10). Not having them doesn't prevent Nxxx style fonts from rendering the total of 1.5 million syllables because they have fall-back glyphs for combining Jamos. These fall-back glyphs can be used for on-the-fly generation of glyphs for syllables.
I gave wrong URLs in comment #12 and comment #13. They should be http://jshin.net/i18n/korean/ngtest.html http://jshin.net/i18n/korean/ogtest.html Two more test pages are up at http://jshin.net/i18n/korean/hunmin-ng.html http://jshin.net/i18n/korean/hunmin-og.html (You can figure out how the page should be rendered by looking at http://jshin.net/i18n/korean/hunmin.html with Mozilla-Xft or Mozilla-FT with the CODE2000 font installed.)

While testing the two pages above, I found a problem with my converter dealing with the end of a run. Actually, I think I'm doing the right thing (except that it doesn't yet check for output buffer overrun) with the 'Finish' and 'Convert' methods. (What I did is similar to what's done in nsUnicodeToUTF8 to deal with surrogate pairs.) It seems like the 'Finish' method is called NOT before BUT after the first character NOT representable by my converter. That is, take the input 'C_1C_2C_3C_4N' (where the C_i's stand for covered characters and N is a non-representable character). When N is encountered, C_4 is still in the buffer without being committed (because characters to come in the next chunk of input could combine with C_4, just as a high surrogate in UTF-16 can combine with a low surrogate). The 'Finish' method seems to be invoked AFTER 'N' is rendered instead of BEFORE 'N' is rendered. The result is 'C_1C_2C_3NC_4'. Could anyone shed some light on this? I'll also look around. Thank you.
I found out the cause of the problem mentioned in my previous comment. I made a false assumption that the Finish method would be called every time a Unicode char. sequence entirely covered by nsUnicodeToJamoTTF is followed by a Unicode character not covered by it. Instead, what's left in the internal buffer (mJamos) is processed the next time the Convert method is invoked, which is when the next chunk of covered sequence follows a chunk of uncovered sequence. Therefore, 'C1C2C3C4NC5C6' is rendered as 'C1C2C3NC4C5C6' if 'C4' is left unprocessed in the internal buffer. In the case of nsUnicodeToUTF8, this never happens because it covers the entire Unicode char. repertoire as opposed to a subset. A work-around is to just assume that NO input chunk ever ends in the *middle* of a syllable, as is done in nsUnicodeToSunIndic(), nsUnicodeToTIS620, and nsUnicodeToX11Johab. I'm not sure whether that's a valid assumption. It seems not, but it's rather tough to handle this case correctly without changing the way Convert() and Finish() are invoked by charset converter clients (e.g. gfx rendering routines). Of course, I'd be glad to know if there's any reason unknown to me that the aforementioned assumption is always valid.
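The reordering described above can be reproduced with a toy model. This is not the Mozilla converter API: `ToyConverter`, `IsCovered`, and `Render` are hypothetical, lowercase ASCII letters stand in for covered characters, and the converter simply holds back the last character of each chunk the way a stateful encoder holds an uncommitted Jamo.

```cpp
#include <cstddef>
#include <string>

// Toy stateful encoder: keeps the last character of each chunk pending,
// because the next chunk might combine with it.
class ToyConverter {
public:
    std::string Convert(const std::string& chunk) {
        std::string out = pending_ + chunk;
        pending_.clear();
        if (!out.empty()) {
            pending_.assign(1, out.back());
            out.pop_back();
        }
        return out;
    }
    // Flushes whatever is still pending.
    std::string Finish() {
        std::string out = pending_;
        pending_.clear();
        return out;
    }
private:
    std::string pending_;
};

// ASSUMPTION for the demo: lowercase letters are "covered" by the converter.
bool IsCovered(char c) { return c >= 'a' && c <= 'z'; }

// Hypothetical client: covered runs go through the converter, other runs
// are rendered directly. `finishAtRunEnd` selects when Finish() is called.
std::string Render(const std::string& text, bool finishAtRunEnd) {
    ToyConverter conv;
    std::string rendered;
    std::size_t i = 0;
    while (i < text.size()) {
        const bool covered = IsCovered(text[i]);
        std::size_t j = i;
        while (j < text.size() && IsCovered(text[j]) == covered) ++j;
        const std::string run = text.substr(i, j - i);
        if (covered) {
            rendered += conv.Convert(run);
            if (finishAtRunEnd) rendered += conv.Finish();
        } else {
            rendered += run;
        }
        i = j;
    }
    rendered += conv.Finish();  // final flush at end of text
    return rendered;
}
```

With `Render("abcdNef", false)` the pending `d` is emitted only when the next covered run arrives, giving `abcNdef`, which is the 'C1C2C3NC4C5C6' ordering. Flushing at every run boundary (`true`) preserves the order.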
Attached file nsJamoConvUtil.cpp(new) (obsolete) —
Now a sequence like 'L*SV*M?' or 'L*ST*M?' (L is for leading consonant, V for vowel, T for trailing consonant, M for tone mark and S for precomposed modern Hangul syllable; '*' and '?' are used as in regular expressions) is treated in JamoNormalize() according to Unicode 3.2 section 3.11 along with the Unicode 2.0 Jamo compatibility decomposition.
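Handling an 'S' inside such a sequence presumably starts by breaking the precomposed syllable back into Jamos so that neighbouring L, V, or T characters can be merged with it. The sketch below shows only that first step, using the standard Unicode Hangul decomposition arithmetic; `DecomposeSyllable` is a hypothetical name and is not taken from the patch.

```cpp
#include <vector>

// Decomposes a precomposed modern Hangul syllable (U+AC00..U+D7A3) into
// L V or L V T conjoining Jamos. Any other code point is returned as-is.
std::vector<char32_t> DecomposeSyllable(char32_t s) {
    if (s < 0xAC00 || s > 0xD7A3) return {s};
    const char32_t index = s - 0xAC00;
    const char32_t perLeading = 21 * 28;  // vowels x (trailing forms + none)
    std::vector<char32_t> out;
    out.push_back(0x1100 + index / perLeading);         // leading consonant
    out.push_back(0x1161 + (index % perLeading) / 28);  // vowel
    if (index % 28 != 0) {
        out.push_back(0x11A7 + index % 28);             // trailing consonant
    }
    return out;
}
```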
Attachment #104462 - Attachment is obsolete: true
I've just filed bug 177877 to make use of the converters I'm implementing here under MS Windows. I also found several problems in my converters while trying to make use of them under MS Windows. Somehow, what worked perfectly under Linux (compiled with gcc) broke down under MS Windows (compiled with VC++). Anyway, I'll upload updated patches soon.
Attached file jamodefs.h (obsolete) —
Attachment #104460 - Attachment is obsolete: true
Attached file nsJamoConvUtil.h (obsolete) —
Attachment #104461 - Attachment is obsolete: true
Attached file nsJamoConvUtil.cpp (obsolete) —
Attachment #104698 - Attachment is obsolete: true
Attached file nsUnicodeToJamoTTF.h (obsolete) —
Attachment #104457 - Attachment is obsolete: true
Attached file nsUnicodeToJamoTTF.cpp (obsolete) —
Attachment #104459 - Attachment is obsolete: true
fontconfig (and in turn Xft) doesn't like blank glyphs at code points not listed in the blank glyph list of the fonts.conf file. Since Ogulim and similar fonts have custom encodings for the Hangul LC and Vowel fillers, Xft replaces them with empty boxes. To work around that, the nsUnicodeToJamoOgTTF class was modified a little to turn out the regular (official) Unicode positions for LFill and VFill instead of custom code points. They're just fillers, and the only requirement for them is that they are blank and that Vfill is non-advancing while Lfill is advancing.
Attachment #105011 - Attachment is obsolete: true
Attached file nsJamoConvUtil.h (obsolete) —
expanded the coverage to include Hanjas (CJK Ideographs: the Ng font has full coverage of CJK Ideographs from U+3400 to U+9FA5), symbols defined in KS X 1001, and US-ASCII.
Attachment #105008 - Attachment is obsolete: true
Attached file nsJamoConvUtil.cpp (obsolete) —
Attachment #105009 - Attachment is obsolete: true
Attached file nsJamoConvUtil.cpp (obsolete) —
sorry there was a typo...
Attachment #105775 - Attachment is obsolete: true
Attached file nsUnicodeToJamoTTF.h (obsolete) —
Attached file nsUnicodeToJamoTTF.cpp (obsolete) —
the coverage is expanded to include pre-1933 precomposed syllables in PUA code points. There is quite a lot of pre-1933 Korean text represented in the PUA code points of Ngulim-like fonts. Some examples are at http://www.korean.go.kr
Attachment #105010 - Attachment is obsolete: true
Attachment #105332 - Attachment is obsolete: true
Attached file nsJamoConvUtil.h (obsolete) —
Attachment #105774 - Attachment is obsolete: true
Attached file nsJamoConvUtil.cpp (obsolete) —
There's a typo in a script used to generate one of mapping tables. The mapping affected was fixed.
Attached file nsUnicodeToJamoTTF.h (obsolete) —
Compiling with VC++ 6 under Windows, I found a couple of glitches uncaught by gcc and fixed them.
Attachment #105778 - Attachment is obsolete: true
Attachment #105779 - Attachment is obsolete: true
Attached file nsUnicodeToJamoTTF.cpp (obsolete) —
Could I get a review? Thank you..
Attachment #105781 - Attachment is obsolete: true
FYI, with patches for bug 177877 and bug 176290, this patch enables Mozilla under Windows and Mozilla-Xft(Linux/*BSD) to render old Korean. For MacOS, probably no change is necessary (or just a rather simple change a la patch for bug 177877). So, this is not only for Mozilla-X11core.
Attached patch all in one patch (obsolete) — Splinter Review
Fixed an overflow problem in the comparison function. BTW, with this patch, Xprint can be used to print Old Korean pages represented in U+1100 Jamos. Therefore, this patch is 'the' infrastructure to render U+1100 Jamos with/under X11core, Xprint, Xft, Windows and MacOS.
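The comment above does not say what the overflow was. One common way a table-lookup comparison function overflows is by returning the difference of two keys, which can wrap when the keys are far apart. The sketch below shows the usual safe form with `std::bsearch`; `CompareKeys` and `FindKey` are hypothetical names and this is not claimed to be the fix that was checked in.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Safe three-way comparison: never computes x - y, so it cannot wrap.
// (A comparator written as `return (int)(x - y);` gives the wrong sign
// once the difference no longer fits in an int.)
int CompareKeys(const void* a, const void* b) {
    const std::uint32_t x = *static_cast<const std::uint32_t*>(a);
    const std::uint32_t y = *static_cast<const std::uint32_t*>(b);
    return (x > y) - (x < y);
}

// Binary search over a sorted key table; returns nullptr when not found.
const std::uint32_t* FindKey(const std::uint32_t* table, std::size_t count,
                             std::uint32_t key) {
    return static_cast<const std::uint32_t*>(
        std::bsearch(&key, table, count, sizeof(table[0]), CompareKeys));
}
```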
Attachment #104456 - Attachment is obsolete: true
Attachment #104463 - Attachment is obsolete: true
Attachment #105006 - Attachment is obsolete: true
Attachment #106453 - Attachment is obsolete: true
Attachment #106454 - Attachment is obsolete: true
Attachment #106457 - Attachment is obsolete: true
Attachment #106458 - Attachment is obsolete: true
Comment on attachment 108535 [details] [diff] [review] all in one patch Simon or Roy, can you review? Not to make this kinda Trojan horse, I have to tell you that this will eventually (the bulk of data is missing although without it it still works ) increase the size of libuconv.so/uconv.dll by 45k-60k. However, according to what alecf wrote when he combined all ucv*so/dll into one, this shouldn't affect the memory footprint for those who never view old Korean.
Attachment #108535 - Flags: review?(smontagu)
Comment on attachment 108535 [details] [diff] [review] all in one patch Transferring review request to ftang.
Attachment #108535 - Flags: review?(smontagu) → review?(ftang)
Just to help expedite the review process, here's a brief explanation (most of which has already been mentioned here before).

This patch is very similar in spirit to what's done in http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvko/nsUnicodeToX11Johab.cpp This is more complicated and extensive because the mapping rules from a sequence of U+1100 Jamos to a sequence of glyphs in Ogulim and Ngulim fonts are more complex than the mapping used for 'X11Johab' fonts. As I wrote earlier, this is the foundation for supporting old Korean on all the platforms (Mozilla-X11corefont, Mozilla-xft, Mozilla-Win, Mozilla under MacOS, Xprint) while nsUnicodeToX11Johab.cpp is only for Mozilla-X11corefont.

It also includes 'normalization' (see |JamoNormalize| in nsJamoConvUtil.cpp) of Jamo sequences, which converts a sequence of basic Jamos (e.g. U+1100 U+1100) to a single cluster Jamo (U+1101). That normalization (compatibility) was removed sometime between Unicode 2.0 and 3.0 (which I regard as a very critical mistake), but it's very likely that it will be reintroduced as a 'named tailoring' to the Unicode normalization (not a part of the _frozen_ Unicode normalization).

In comment #37, I wrote:
> the bulk of data is missing although without it it still works.
The missing data is in oj2ns.h (see comment #10). The reason it still works with an incomplete look-up table is that |nsUnicodeToJamoNgTTF| has a fallback for Jamo sequences for which no precomposed syllable glyph is available.
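The U+1100 U+1100 to U+1101 example above can be sketched as follows. This covers only the five doubled leading consonants whose cluster form sits at the next code point in the U+1100 block (U+1101, U+1104, U+1108, U+110A, U+110D). Only the first pair is mentioned in this bug; the other four are taken from the Unicode chart, and the real |JamoNormalize| presumably handles many more clusters. `NormalizeDoubledJamos` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Replaces a doubled basic leading consonant with its single cluster Jamo,
// e.g. U+1100 U+1100 -> U+1101. Everything else is copied through.
std::u32string NormalizeDoubledJamos(const std::u32string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t c = in[i];
        // Basic consonants whose doubled (ssang) form is the next code point.
        const bool canDouble = c == 0x1100 || c == 0x1103 || c == 0x1107 ||
                               c == 0x1109 || c == 0x110C;
        if (canDouble && i + 1 < in.size() && in[i + 1] == c) {
            out.push_back(static_cast<char32_t>(c + 1));
            ++i;  // consume the second Jamo of the pair
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```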
I removed NS_SUPPORTS.... (which was removed across mozilla-tree recently) and made a patch against the current cvs. Frank, can you review the patch? As I wrote before, this patch is basically a more sophisticated version of what you checked in for X11 Johab fonts 3 or 4 years ago.(nsUnicodeToX11Johab.cpp). 1.4alpha cycle seems to be good for checking in this because bug 177877 was fixed and bug 176290 is likely to be fixed soon because Chris finally appears to have some time to test my patch and work on it.
Attachment #108535 - Attachment is obsolete: true
Attachment #108535 - Flags: review?(ftang)
Comment on attachment 116275 [details] [diff] [review] a new patch against the current cvs sorry for spamming.
Attachment #116275 - Flags: superreview?(dbaron)
Attachment #116275 - Flags: review?(ftang)
Comment on attachment 116275 [details] [diff] [review] a new patch against the current cvs Frank may be too busy to review this patch. Roy, could you take a look? I really love to see this go in for 1.4. The patch is very long, but actually, it's pretty simple (if some long tables are excluded) It defines two new font-specific encoders and that's about it. As I wrote before, this is an extension of Frank's patch to support X11Johab fonts back in 1998.
Attachment #116275 - Flags: review?(ftang) → review?(yokoyama)
nits:
1) Sorry, but I see non-ascii chars in the comments. (korean?) It is advisable not to have those chars in the code. They may cause compiler errors under a different system locale (e.g. Win-Ja). MSVC++6 is not a Unicode app. It has problems with non-system chars.
2) I think you should remove the space between '!' and 'length'. There was one other place with similar syntax. e.g. if (! length)

I don't see Frank's email address in the cc list. I'll ping ftang to see if he can review your patch in a timely manner. Your patch looks good though. Frank?
Thanks a lot, Roy, for reviewing and noticing that Frank was not on CC. I'm at a loss as to why I hadn't noticed it before. I thought I added him when I filed this bug.
> I see non-ascii chars in the comments. (korean?)
> It is advisable not to have those chars in the code
I fully agree with you on the point. I'll replace them with transliteration in US-ASCII. BTW, Win32 VC++ 6 (both EN and KO) doesn't seem to mind having UTF-8 comments. (In CP949 or EUC-KR, it's impossible to represent those characters :-))
> if (! length)
Oh, that's my habit... I'll take care of it.
Attached patch a new streamlined patch (obsolete) — Splinter Review
I removed support for the non-free 'New Gulim/Old Gulim' fonts, which I didn't feel comfortable about. In their place, I put support for a series of GPL'd fonts (UnBatang, UnGulim, UnDotum, etc.) that are currently being developed by Won-Kyu PARK. The first in the series (UnBatang) is available at http://www.i18nl10n.com/fonts/UnBatang.ttf The font is still a work in progress and its Hangul Jamo glyphs are not as refined as those of the Ogulim font, but they'll get better. With support for Ngulim gone, this patch won't increase the uconv dll/so as much as I mentioned in my earlier comment because the roughly 50kB array for Ngulim-like fonts is NOT necessary any more. Also in this patch, I got rid of nsJamoConvUtil.cpp and put all the auxiliary methods defined for it into nsUnicodeToJamoTTF.cpp as static functions. With bug 176320 about to be fixed soon (my patch was sr'd and is now waiting for review), it'd be great to get this patch in as well. This has been waiting for several months. BTW, a couple of new test pages were put up at http://jshin.net/i18n/korean/untest.html http://jshin.net/i18n/korean/hunmin-un.html
Attachment #116275 - Attachment is obsolete: true
Summary: need to have converters for rendering Old Korean text with Nxxx and Oxxx fonts → need to have converters for rendering Old Korean text with Un series fonts
Target Milestone: --- → mozilla1.4beta
Attachment #116275 - Flags: superreview?(dbaron)
Attachment #116275 - Flags: review?(yokoyama)
Comment on attachment 121104 [details] [diff] [review] a new streamlined patch Asking review. There's a little mix-up from another patch. Please, ignore 'nsEunsureUTF8' lines in nsUConvModule.*. Thanks.
Attachment #121104 - Flags: superreview?(bzbarsky)
Attachment #121104 - Flags: review?(ftang)
Not sure when I'll get to this (certainly not till Wed), but one thing that leapt out at me was that the big tables should all be const, no?
Thanks, Boris, for pointing out the const problem. I turned them all to const static. Your comment also prompted me to replace PRUnichar with PRUint8 in several const static arrays, saving about 3kB. In addition, two files (in gfx/src/gtk and gfx/src/Xlib) I forgot to include in the previous patch are included along with the patch for the fontEncoding.properties file. As for sr, thank you for letting me know your schedule. I'll ask if rbs (who wrote several converters for MathML) can review it.
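The size saving from narrowing the element type can be illustrated with a small sketch. The table contents and names here are made up, and `char16_t` and `std::uint8_t` stand in for PRUnichar and PRUint8. The point is only that a read-only table whose values all fit in 8 bits takes roughly half the space when stored as bytes, and that marking it const lets it live in read-only data.

```cpp
#include <cstddef>
#include <cstdint>

// HYPOTHETICAL table values; every entry fits in 8 bits.
static const char16_t kWideTable[] = {0, 1, 2, 3, 4, 5, 0, 1};
static const std::uint8_t kNarrowTable[] = {0, 1, 2, 3, 4, 5, 0, 1};

// Bytes saved by storing the same values in the narrower type.
std::size_t BytesSaved() {
    return sizeof(kWideTable) - sizeof(kNarrowTable);
}

// Values widen back to 16 bits on use, so callers are unaffected.
char16_t NarrowEntry(std::size_t i) {
    return static_cast<char16_t>(kNarrowTable[i]);
}
```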
Attachment #121104 - Attachment is obsolete: true
Attachment #121104 - Flags: superreview?(bzbarsky)
Attachment #121104 - Flags: review?(ftang)
Comment on attachment 121152 [details] [diff] [review] patch with static array size reduced r=ftang
Attachment #121152 - Flags: review+
Comment on attachment 121152 [details] [diff] [review] patch with static array size reduced Thank you for r, Frank. Now asking for sr.
Attachment #121152 - Flags: superreview?(rbs)
Comment on attachment 121152 [details] [diff] [review] patch with static array size reduced

sr=rbs

+#include "nsUCvKODll.h"
+#include "nsUnicodeToJamoTTF.h"
+#include "jamodefs.h"
+#include "prmem.h"

why bother with a separate, tiny, jamodefs.h file?

+#include "jclusters.h"

save people from guessing: |jamoclusters.h|
Attachment #121152 - Flags: superreview?(rbs) → superreview+
Minor nit:
-- snip --
+ *
+ * Contributor(s): Jungshik Shin <jshin@mailaps.org>
+ *
-- snip --
... can you change that to ...
-- snip --
+ *
+ * Contributor(s):
+ *   Jungshik Shin <jshin@mailaps.org>
+ *
-- snip --
in all places where it occurs (just to make automated processing of that stuff less painful :) , please ?
Thank you all. Fix just got checked in with Roger's and Roland's concerns addressed.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
FYI, the checkin for this patch broke several tinderboxes due to a basic C++ porting issue. http://www.mozilla.org/hacking/portable-cpp.html#variables_in_for
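The thread does not show the offending code, so the following is only a guess at the pattern. The linked guideline concerns the scope of a variable declared in a for statement's initializer, which compilers of that era did not agree on. Declaring the counter once, outside the loops, behaves the same under either scoping rule. `SumTwice` is a hypothetical example function.

```cpp
#include <cstddef>

// Portable under both old and standard for-scope rules: the loop counter
// is declared once in the enclosing scope and reused by both loops.
int SumTwice(const int* values, std::size_t count) {
    int total = 0;
    std::size_t i;  // not declared in the for-init, so no scope ambiguity
    for (i = 0; i < count; ++i) {
        total += values[i];
    }
    for (i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```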
There was also OS/2 bustage caused by mismatched function declarations/definitions.

E:/OS2_2.45_Clobber/mozilla/intl/uconv/ucvko/nsUnicodeToJamoTTF.cpp(875:10) : error EDC3068: Function overloading conflict between "nsresult(PRUnichar*,PRInt32*,PRInt32)" and "nsresult(PRUnichar*,PRInt32*,const PRInt32)".
E:/OS2_2.45_Clobber/mozilla/intl/uconv/ucvko/nsUnicodeToJamoTTF.cpp(939:38) : error EDC3071: Call to "ScanDecomposeSyllable" matches more than one function.
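The errors above suggest that the declaration and the definition of ScanDecomposeSyllable differed only in a top-level const on the last parameter, which that compiler treated as two overloads. A portable habit is to spell the parameter list identically in both places. In the sketch below the typedefs are stand-ins for the NSPR types, the parameter names are invented, and the body is a placeholder rather than the real function.

```cpp
#include <cstdint>

// Stand-ins for the NSPR/XPCOM types named in the error message.
typedef char16_t PRUnichar;
typedef std::int32_t PRInt32;
typedef std::uint32_t nsresult;

// Declaration: note `const PRInt32` on the last parameter.
nsresult ScanDecomposeSyllable(PRUnichar* aIn, PRInt32* aLength,
                               const PRInt32 aMaxLength);

// Definition: the parameter list matches the declaration token for token,
// so no compiler can read it as a second overload.
nsresult ScanDecomposeSyllable(PRUnichar* aIn, PRInt32* aLength,
                               const PRInt32 aMaxLength) {
    // PLACEHOLDER body: only validates arguments; 0 means success here.
    if (!aIn || !aLength || *aLength > aMaxLength) {
        return 1;
    }
    return 0;
}
```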
I'm sorry for stupid mistakes and thank you for fixing them on my behalf. I should have gone off making sure that at least Linux/MacOS/Win32 got built all right (I do, but this time I was too excited(?)..)