Closed
Bug 88944
Opened 23 years ago
Closed 23 years ago
meta bug on Korean converters
Categories
(Core :: Internationalization, defect)
Tracking
()
VERIFIED
FIXED
mozilla0.9.3
People
(Reporter: jshin, Assigned: tetsuroy)
References
()
Details
(Keywords: intl)
Attachments
(15 files)
71.19 KB,
patch
|
Details | Diff | Splinter Review | |
2.56 KB,
text/plain
|
Details | |
1.56 KB,
text/plain
|
Details | |
2.04 KB,
text/plain
|
Details | |
1.48 KB,
text/plain
|
Details | |
90.00 KB,
application/x-gzip
|
Details | |
1.19 KB,
patch
|
Details | Diff | Splinter Review | |
195.68 KB,
text/plain
|
Details | |
178.71 KB,
text/plain
|
Details | |
35.37 KB,
patch
|
Details | Diff | Splinter Review | |
63.53 KB,
text/html
|
Details | |
122.59 KB,
text/html
|
Details | |
2.02 KB,
text/html
|
Details | |
28.56 KB,
image/jpeg
|
Details | |
1.50 KB,
patch
|
Details | Diff | Splinter Review |
This meta-bug list four issues about Korean converters for which separate bugs have been filed. One may wonder why I file this meta-bug. That's because I fixed them all together and I found it hard to split my patch(which I will attach later) into four independent pieces. 1. CP949(Windows-949, Unified Hangul Code) converter in both directions : bug 7962 - filled up the placeholder files(nsUnicodeToCP949.{cpp,h} and nsCP949ToUnicode.{cpp,h}) in intl/uconv/ucvko - added u2BytesGR128Charset in intl/uconv/public/uconvutil.h - added the correspodingg scanner function uCheckAndScanAlways2ByteGR128() in intl/uconv/src/uscan.c - added the place-holder for m_generator list in ugen.c - added 'x-windows-949' related entries to intl/uconv/src/*.properties files - added two new files (u20cp949hangul.uf,u20cp949hangul.ut) to intl/uconv/ucvko directory. they do NOT include characters covered by KS X 1001. They only include 8822 Hangul syllables only available in CP949 (in pre-composed form) - added new entries for CP949 to intl/uconv/ucvko/nsUCvKOCID.h and nsUCvKoModule.cpp 2. EUC-KR -> Unicode converter enhancement (bug 9962) 8byte (4 character) sequence to represent 8822 Hangul syllables (not encoded in pre-composed form in KS X 1001 derived encodings such as EUC-KR) is recognized and properly converted to Hangul syllables in Unicode. - modified intl/uconv/ucvko/nsEUCKRToUnicode.cpp - modified intl/uconv/src/uscan.c * name changes in functions: xxxxComposedHangul -> xxxxDecomposedHangul 3. UnicodeToKSC5601.cpp (bug 9961 and bug 88922) When rendering UTF-8 (or CP949) pages with Hangul syllables not available in ksc5601.1987.0 fonts in pre-composed form, the leading 'blank' glyph of 4-character sequence had better be omitted. - modified intl/uconv/src/ugen.c * add another arguement to function uGenDecomposedHangulCommon() for the length of the sequence to generate (8byte for Unicode -> EUC-KR converter and 6byte for Unicode -> ksc5601.1987-0 font encoding) * name changes in functions: xxxxComposedHangul -> xxxDecomposedHangul 4. Hangul JOHAB encoding support (bug 71489, bug 80111, bug 88939) - renamed intl/uconv/ucvko/nsUnicodeToJohab.{cpp,h} as nsUnicodeToJohabNoAscii.{cpp,h} (as it does NOT include US-ASCII part. It's only for Sun's ksc5601.1992-3 font encoding) - renamed 'x-johab' to 'x-johab-noascii' to make it clear that it doesn't include US-ASCII. 'x-johab' will be used for the genuine JOHAB encoding. - made new intl/uconv/ucvko/nsUnicodeToJohab.{cpp,h} to support the genuine JOHAB encoding (including US-ASCII part) - added new files intl/uconv/ucvko/nsJohabToUnicode.{cpp,h} - corresponding changes in intl/uconv/ucvko/nsUCvKOCID.h and nsUCvKoModule.cpp - added 'x-johab' and 'x-johab-noascii' related entries to intl/uconv/src/*.properties files - made necessary changes in gfx/src/gtk/nsFontMetricsGTK.cpp for 'x-johab' and 'x-johab-noascii'
Reporter | ||
Comment 1•23 years ago
|
||
Reporter | ||
Comment 2•23 years ago
|
||
Reporter | ||
Comment 3•23 years ago
|
||
Reporter | ||
Comment 4•23 years ago
|
||
Reporter | ||
Comment 5•23 years ago
|
||
Reporter | ||
Comment 6•23 years ago
|
||
1. mozilla/intl/uconv/ucvko/u20cp949hangul.uf and u20cp949hangul.ut can be generated by filtering CP949.TXT (available at ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT) with egrep -v '^#' | grep 'HANGUL SYLL' | egrep -v '^0x(A[1-F]|[B-C][0-F])([B-F][0-F]|A[1-F])' and then running the result thru intl/uconv/tools/umaptable. I'm going to attach gzipped tarred file for these two. 2. nsJohabToUnicode.cpp, nsJohabToUnicode.h, nsUnicodeToJohabNoAscii.(h|cpp) are all new files.
Reporter | ||
Comment 7•23 years ago
|
||
Reporter | ||
Updated•23 years ago
|
OS: Linux → All
Comment 10•23 years ago
|
||
Jungshik Shin- most of your code looks good. However, please do the following: 1. attach u20cp949hangul.uf and u20cp949hangul.ut file as plain text 2. The diff in http://bugzilla.mozilla.org/showattachment.cgi?attach_id=40975 have big gfx changes. I think that is a mistake. You probably attach something wrong there. It looks like some code already check in. Why don't do the following. submit two patch, one include everthing inside intl/uconv . you can do this by 1. cvs add newfiles 2. cvs diff -uN intl/uconv In this way, all the new files you add (nsJohabToUnicode.cpp etc) will be included in your patch and also put a seperate patch for gfx/src/gtk And make sure you review it before you attach it. I think we need Sun team to help to QA this on unix also. katakai, can you arrange that. This is definitely too late for moz0.9.2 But I think it will be nice if we can land this into trunk ASAP (after the patch is ready)
Target Milestone: --- → mozilla0.9.3
Reporter | ||
Comment 11•23 years ago
|
||
Frank, Thank you for your advice. I'm gonna attach a new patch for gfx/src/gtk (sorry for making it a mess) and u20cp949hangul.uf and u20cp949hangul.ut in plain text As for including new files in intl/uconv patch, I'm afraid I can't because to run 'cvs add newfile' I need to have a write access to the repository, but I don't. Without 'cvs add', 'cvs diff -uN' appears not to work for new files(it complains that it doesn't know anything about new files as it should). What would you say I should do with this?
Reporter | ||
Comment 12•23 years ago
|
||
Reporter | ||
Comment 13•23 years ago
|
||
Reporter | ||
Comment 14•23 years ago
|
||
Reporter | ||
Comment 15•23 years ago
|
||
Comment 16•23 years ago
|
||
r=ftang for http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41101 http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41102 http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41103 http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41107 The new mapping table for cp949 will increase about 30K in ucvko. But we won't use them unless we need to convert to/from Korean.
Comment 17•23 years ago
|
||
jshin@pantheon.yale.edu- do you want a cvs write access? Let's get superreview review your change so you can land into trunk. Please attach test pages into this bug also.
Status: NEW → ASSIGNED
Comment 18•23 years ago
|
||
Testing: OK, I'll ask our l10n team for testing. But the testing will be done in next week as Sun offices (except Tokyo) are closed whole this week.
Reporter | ||
Comment 19•23 years ago
|
||
> jshin@pantheon.yale.edu- do you want a cvs write access? Yes, let me have it. Can you grant me a write access? > Let's get superreview review your change so you can land into trunk. Who do I have to ask for superreview? BTW, it's a little bit involved to split away JOHAB support from my patch (in addition to 4 new files, there are changes made for that in my patch). Can you review them as well? > Please attach test pages into this bug also. All right. I'll do that later today.
Reporter | ||
Comment 20•23 years ago
|
||
The page at the URL given in the URL field of this bug have had links to test pages for four encodings (EUC-KR with 8byte seq. representation, JOHAB, CP949 and UTF-8). I'm now attaching CP949, JOHAB and EUC-KR with 8byte seq. test pages (the same pages as available at the URL above).
Reporter | ||
Comment 21•23 years ago
|
||
Reporter | ||
Comment 22•23 years ago
|
||
Reporter | ||
Comment 23•23 years ago
|
||
Reporter | ||
Comment 24•23 years ago
|
||
In addition to trying test pages attached and available at the URL given above in my build of Mozilla (with Korean fonts in both ksc5601.1987-0 encoding and johabsh-1 encoding - x-x11-johab in Mozilla ), I've tested various Korean converters with 'nsconv' (intl/uconv/tests) used as a filter and found them working as expected.
Comment 25•23 years ago
|
||
sr=blizzard
Comment 26•23 years ago
|
||
yokoyama, can you land this into trunk? make sure you first land 75707, and then merged with this one and test http://warp/u/ftang/utf8test/buffer.cgi with all these charset.
Assignee: ftang → yokoyama
Status: ASSIGNED → NEW
Assignee | ||
Comment 28•23 years ago
|
||
Jungshik: I am trying to land your patches to the trunk; but can you tell me which one of patches is for u20cp949hangul.uf and for u20cp949hangul.ut? http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41102 http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41103 I am assuming 41102 is for u20cp949hangul.uf and 41103 for u20cp949hangul.ut, correct? Your gzipped patch (40984) didn't work for WinZip. :(
Reporter | ||
Comment 29•23 years ago
|
||
Roy, Yes, you're right. 41102 is u20cp949hangul.uf (CP949 FROM Unicode) and 41103 is u20cp949hangul.ut ( CP949 TO Unicode). Thank you ! Jungshik
Assignee | ||
Comment 30•23 years ago
|
||
Jungshik: I completed the patch and did a little test run. I see two new menus Korean (JOHAB) and Korean (UHC) under View/CharEncoding/More [see attached] Do they suppose to be placed under View/CharEncoding/More/East Asian?
Assignee | ||
Comment 31•23 years ago
|
||
Assignee | ||
Comment 32•23 years ago
|
||
Jungshik: I tested your test pages they look great. Created an attachment (id=41168) Johab encoding test page Created an attachment (id=41169) CP949 test page I notice few characters (only handful) shown as '?'. For example by using (41169 ) CP949 test page: 0xa2e6, 0xa2e7, 0xa2e8, 0xa2e9, 0xa2ea,... 0xa2f0, 0xa2f1, 0xa2f2, 0xa2f3,..... Are they as expected?
Whiteboard: /r=ftang; /sr=blizzard.
Assignee | ||
Comment 33•23 years ago
|
||
Assignee | ||
Comment 34•23 years ago
|
||
reminder: we need to apply the patch (id=42266) to commercial tree as well.
Comment 35•23 years ago
|
||
navigator.properties, please also change the commercial one.
Reporter | ||
Comment 36•23 years ago
|
||
Roy, Thank you for testing. Some characters (e.g. 0xa2e6 - 0xa2fe) rendered as '?' are normal. They're *empty* code points in both CP949 and EUC-KR. I should have gotten them omitted, but I forgot to do that while generating test pages with a script.
Whiteboard: /r=ftang; /sr=blizzard.
Reporter | ||
Comment 37•23 years ago
|
||
Ftang,Thank you for patching menu properties file. Roy, thank you for catching that glitch. I put new encodings along with other East Asian encodings in my local copy, but completely forgot about it when uploading patches.
Comment 38•23 years ago
|
||
Just in case, my previous response to Roy (about characters rendered as '?') was not clear, the result is *expected* and perfectly normal and that does NOT mean anything wrong with the converter. It just indicates that the test page contains some empty code points which are expected to be rendered as '?' because there's no character defined at those points.
Assignee | ||
Comment 39•23 years ago
|
||
Jungshik: Understood. We are carpooling the changes on all the platform. Since there are makefile changes, we are testing on Mac and Linux. Should be able to check into trunk very shortly.
Comment 40•23 years ago
|
||
I land the GTK change into trunk now.
Assignee | ||
Comment 41•23 years ago
|
||
checked into the trunk. one more left: we need to apply the patch (id=42266) to commercial tree.
Assignee | ||
Comment 42•23 years ago
|
||
checked-in to trunk and commercial tree.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 43•23 years ago
|
||
I wonder if these checkins caused a few noticable regressions: At http://counter.li.org/reports/short.html what used to read (Føroyar) now reads (F?royar). In addition I crash when tooltips are enabled. (Linux CVS; an hour old)
Reporter | ||
Comment 44•23 years ago
|
||
> I wonder if these checkins caused a few noticable regressions: > At http://counter.li.org/reports/short.html . In encodings that don't cover U+00F8 (latin small letter o with stroke), it's perfectly normal that it's rendered with '?'. Make sure your View|Character Coding is set to Western(ISO-8859-1 or ISO-8859-15). > In addition I crash when tooltips are enabled. (Linux CVS; an > hour old) I don't think this patch can cause a problem in tooltips. P.S. Roy, thank you for landing this on the trunk.
You need to log in
before you can comment on or make changes to this bug.
Description
•