Bug 88944: meta bug on Korean converters
Status: VERIFIED FIXED (closed)
Opened 24 years ago; closed 24 years ago
Category: Core :: Internationalization (defect)
Target milestone: mozilla0.9.3
People: Reporter jshin; Assignee tetsuroy
Keywords: intl
Attachments (15 files):
- 71.19 KB, patch
- 2.56 KB, text/plain
- 1.56 KB, text/plain
- 2.04 KB, text/plain
- 1.48 KB, text/plain
- 90.00 KB, application/x-gzip
- 1.19 KB, patch
- 195.68 KB, text/plain
- 178.71 KB, text/plain
- 35.37 KB, patch
- 63.53 KB, text/html
- 122.59 KB, text/html
- 2.02 KB, text/html
- 28.56 KB, image/jpeg
- 1.50 KB, patch
This meta-bug lists four issues with the Korean converters, each of which
already has its own bug. One may wonder why I filed this meta-bug: I fixed
them all together and found it hard to split my patch (which I will attach
later) into four independent pieces.
1. CP949 (Windows-949, Unified Hangul Code) converter in both
directions: bug 7962
- filled in the placeholder files (nsUnicodeToCP949.{cpp,h}
and nsCP949ToUnicode.{cpp,h}) in intl/uconv/ucvko
- added u2BytesGR128Charset in intl/uconv/public/uconvutil.h
- added the corresponding scanner function
uCheckAndScanAlways2ByteGR128() in intl/uconv/src/uscan.c
- added a placeholder for the m_generator list in ugen.c
- added 'x-windows-949' entries to the intl/uconv/src/*.properties files
- added two new files (u20cp949hangul.uf, u20cp949hangul.ut)
to the intl/uconv/ucvko directory. They do NOT include characters
covered by KS X 1001; they include only the 8822 Hangul syllables
that are available in precomposed form in CP949 alone.
- added new entries for CP949 to
intl/uconv/ucvko/nsUCvKOCID.h and nsUCvKoModule.cpp
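For item 1, the per-character decision a CP949 scanner has to make can be sketched as follows. This is a hypothetical helper (cp949_char_len and cp949_is_trail are illustrative names, not Mozilla's uCheckAndScanAlways2ByteGR128()), and it assumes the commonly documented CP949 layout: bytes 0x00-0x7F are US-ASCII, lead bytes are 0x81-0xFE, and trail bytes are 0x41-0x5A, 0x61-0x7A or 0x81-0xFE. It checks byte structure only, not whether a code point is assigned.

```c
#include <stddef.h>

/* Hypothetical helper, not Mozilla's uCheckAndScanAlways2ByteGR128().
 * Assumes the usual CP949 layout: bytes 0x00-0x7F are US-ASCII,
 * lead bytes are 0x81-0xFE, and trail bytes are 0x41-0x5A,
 * 0x61-0x7A or 0x81-0xFE. Only byte structure is checked;
 * whether a code point is actually assigned is not. */
static int cp949_is_trail(unsigned char b)
{
    return (b >= 0x41 && b <= 0x5A) ||
           (b >= 0x61 && b <= 0x7A) ||
           (b >= 0x81 && b <= 0xFE);
}

/* Returns the byte length (1 or 2) of the character starting at p,
 * 0 if more input is needed, or -1 for an invalid sequence. */
int cp949_char_len(const unsigned char *p, size_t avail)
{
    if (avail == 0)
        return 0;
    if (p[0] < 0x80)
        return 1;                 /* single-byte US-ASCII */
    if (p[0] == 0x80 || p[0] == 0xFF)
        return -1;                /* never a lead byte */
    if (avail < 2)
        return 0;                 /* lead byte seen, trail byte pending */
    return cp949_is_trail(p[1]) ? 2 : -1;
}
```

With this sketch, both the EUC-KR-compatible pair B0 A1 and the CP949-only pair 81 41 scan as 2-byte characters, while 81 30 is rejected because 0x30 is not a valid trail byte.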
2. EUC-KR -> Unicode converter enhancement (bug 9962)
The 8-byte (4-character) sequences that represent the 8822 Hangul
syllables not encoded in precomposed form in KS X 1001-derived encodings
(such as EUC-KR) are now recognized and properly converted to
Hangul syllables in Unicode.
- modified intl/uconv/ucvko/nsEUCKRToUnicode.cpp
- modified intl/uconv/src/uscan.c
* function names changed: xxxxComposedHangul -> xxxxDecomposedHangul
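For item 2, one way to turn such an 8-byte sequence back into a precomposed syllable is sketched below. It assumes the sequence is the Hangul filler (0xA4D4) followed by an initial, a medial and a final jamo from KS X 1001's Hangul letter row (0xA4A1-0xA4D3, in the same order as U+3131-U+3163), with the filler standing in for a missing final. The function name is hypothetical and this is not the code in nsEUCKRToUnicode.cpp; the last step is Unicode's standard syllable formula 0xAC00 + (L*21 + V)*28 + T.

```c
#include <stdint.h>

/* Hypothetical sketch, not the code in nsEUCKRToUnicode.cpp.
 * KS X 1001 Hangul letters 0xA4A1-0xA4D3 are assumed to follow the
 * order of U+3131-U+3163, with the Hangul filler at 0xA4D4. */
#define KSX_JAMO_FIRST  0xA4A1u
#define KSX_FILLER_IDX  51          /* 0xA4D4 - 0xA4A1 */
#define KSX_VOWEL_FIRST 30          /* index of U+314F A */
#define KSX_VOWEL_LAST  50          /* index of U+3163 I */

/* Compatibility consonant index (0..29) -> initial index, -1 if none. */
static const int8_t kInitial[30] = {
     0,  1, -1,  2, -1, -1,  3,  4,  5, -1, -1, -1, -1, -1, -1,
    -1,  6,  7,  8, -1,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18
};
/* Compatibility consonant index (0..29) -> final index, -1 if none. */
static const int8_t kFinal[30] = {
     1,  2,  3,  4,  5,  6,  7, -1,  8,  9, 10, 11, 12, 13, 14,
    15, 16, 17, -1, 18, 19, 20, 21, 22, -1, 23, 24, 25, 26, 27
};

/* Index of the 2-byte KS X 1001 jamo at p (0..51), or -1. */
static int ksx_jamo_index(const unsigned char *p)
{
    unsigned code = ((unsigned)p[0] << 8) | p[1];
    if (code < KSX_JAMO_FIRST || code > KSX_JAMO_FIRST + KSX_FILLER_IDX)
        return -1;
    return (int)(code - KSX_JAMO_FIRST);
}

/* Decodes filler + initial + medial + final (8 bytes) to a Unicode
 * Hangul syllable, or returns -1 if the bytes do not have that form. */
long ksx_decode_8byte_hangul(const unsigned char seq[8])
{
    int f = ksx_jamo_index(seq);
    int c = ksx_jamo_index(seq + 2);
    int v = ksx_jamo_index(seq + 4);
    int t = ksx_jamo_index(seq + 6);
    int l_idx, v_idx, t_idx;

    if (f != KSX_FILLER_IDX || c < 0 || c >= KSX_VOWEL_FIRST ||
        v < KSX_VOWEL_FIRST || v > KSX_VOWEL_LAST || t < 0)
        return -1;
    l_idx = kInitial[c];
    v_idx = v - KSX_VOWEL_FIRST;
    if (t == KSX_FILLER_IDX)
        t_idx = 0;                      /* filler: no final consonant */
    else if (t < KSX_VOWEL_FIRST)
        t_idx = kFinal[t];
    else
        t_idx = -1;                     /* a vowel cannot be a final */
    if (l_idx < 0 || t_idx < 0)
        return -1;
    /* Unicode: S = 0xAC00 + (L * 21 + V) * 28 + T */
    return 0xAC00L + ((long)l_idx * 21 + v_idx) * 28 + t_idx;
}
```

Under these assumptions, A4 D4 A4 A8 A4 C7 A4 B1 decodes to U+B620, a syllable with no precomposed code in KS X 1001.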
3. UnicodeToKSC5601.cpp (bug 9961 and bug 88922)
When rendering UTF-8 (or CP949) pages containing Hangul syllables
that are not available in precomposed form in ksc5601.1987-0 fonts,
the leading 'blank' glyph of the 4-character sequence
should be omitted.
- modified intl/uconv/src/ugen.c
* added another argument to uGenDecomposedHangulCommon()
for the length of the sequence to generate (8 bytes for the
Unicode -> EUC-KR converter and 6 bytes for the Unicode -> ksc5601.1987-0
font encoding)
* function names changed: xxxxComposedHangul -> xxxxDecomposedHangul
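For item 3, the effect of the new length argument can be sketched as a generator that writes either the full 8-byte form or the 6-byte form without the leading filler. gen_decomposed_hangul and its signature are hypothetical, not uGenDecomposedHangulCommon() itself. The sketch emits EUC-KR (GR) byte values in both modes; the bug does not describe how the font path encodes the remaining six bytes, so that detail is not modeled.

```c
#include <stddef.h>

/* Hypothetical sketch, not uGenDecomposedHangulCommon() itself.
 * Jamo are written as KS X 1001 codes 0xA4A1 + i, where i is the
 * index in U+3131-U+3164 order (51 = the Hangul filler 0xA4D4). */
#define KSX_FILLER_IDX 51

/* Unicode initial index (0..18) -> compatibility jamo index. */
static const unsigned char kInitialToCompat[19] = {
    0, 1, 3, 6, 7, 8, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
};
/* Unicode final index (0..27) -> compatibility jamo index. */
static const unsigned char kFinalToCompat[28] = {
    KSX_FILLER_IDX,                     /* 0: no final consonant */
    0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
    19, 20, 21, 22, 23, 25, 26, 27, 28, 29
};

static void put_jamo(unsigned char *out, unsigned idx)
{
    unsigned code = 0xA4A1u + idx;
    out[0] = (unsigned char)(code >> 8);
    out[1] = (unsigned char)(code & 0xFF);
}

/* Writes 8 bytes (filler + initial + medial + final) or 6 bytes
 * (the same without the leading filler) for a precomposed syllable.
 * Returns the number of bytes written, or 0 on bad input. */
size_t gen_decomposed_hangul(unsigned long ucs, unsigned char *out, size_t len)
{
    unsigned long s;

    if (ucs < 0xAC00 || ucs > 0xD7A3 || (len != 8 && len != 6))
        return 0;
    s = ucs - 0xAC00;
    if (len == 8) {
        put_jamo(out, KSX_FILLER_IDX);  /* the leading 'blank' glyph */
        out += 2;
    }
    put_jamo(out, kInitialToCompat[s / (21 * 28)]);
    put_jamo(out + 2, 30 + (unsigned)((s / 28) % 21));  /* vowels start at 30 */
    put_jamo(out + 4, kFinalToCompat[s % 28]);
    return len;
}
```

For U+B620 the 8-byte mode yields A4 D4 A4 A8 A4 C7 A4 B1, and the 6-byte mode yields the same bytes without the leading A4 D4.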
4. Hangul JOHAB encoding support (bug 71489, bug 80111, bug 88939)
- renamed intl/uconv/ucvko/nsUnicodeToJohab.{cpp,h}
to nsUnicodeToJohabNoAscii.{cpp,h}, because it does NOT include the
US-ASCII part; it is only for Sun's ksc5601.1992-3 font encoding
- renamed 'x-johab' to 'x-johab-noascii' to make it clear
that it doesn't include US-ASCII.
'x-johab' will be used for the genuine JOHAB encoding.
- created new intl/uconv/ucvko/nsUnicodeToJohab.{cpp,h}
to support the genuine JOHAB encoding (including the US-ASCII part)
- added new files intl/uconv/ucvko/nsJohabToUnicode.{cpp,h}
- made the corresponding changes in intl/uconv/ucvko/nsUCvKOCID.h
and nsUCvKoModule.cpp
- added 'x-johab' and 'x-johab-noascii' entries to the
intl/uconv/src/*.properties files
- made the necessary changes in gfx/src/gtk/nsFontMetricsGTK.cpp
for 'x-johab' and 'x-johab-noascii'
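For item 4, the Hangul part of genuine JOHAB is a 16-bit bit field rather than a table lookup: the top bit is set and three 5-bit fields hold the initial, medial and final codes. The sketch below assumes the standard JOHAB code tables (initial codes 2-20, medial codes with gaps at 8-9, 16-17 and 24-25, final code 1 meaning "no final" and 18 unused) and passes US-ASCII through as single bytes. unicode_to_johab is a hypothetical name, and JOHAB's symbol and Hanja areas are not covered.

```c
/* Hypothetical sketch of the US-ASCII and Hangul-syllable parts of a
 * Unicode -> JOHAB encoder; JOHAB's symbol and Hanja areas are not
 * handled. Assumes the standard JOHAB 5-bit code tables. */

/* Medial (vowel) codes indexed by Unicode medial index 0..20. */
static const unsigned char kJohabMedial[21] = {
     3,  4,  5,  6,  7, 10, 11, 12, 13, 14, 15,
    18, 19, 20, 21, 22, 23, 26, 27, 28, 29
};
/* Final codes indexed by Unicode final index 0..27 (1 = no final). */
static const unsigned char kJohabFinal[28] = {
     1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
    15, 16, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
};

/* Writes 1 or 2 bytes to out and returns the count,
 * or returns 0 if this sketch does not cover the character. */
int unicode_to_johab(unsigned long ucs, unsigned char out[2])
{
    if (ucs < 0x80) {                       /* US-ASCII passes through */
        out[0] = (unsigned char)ucs;
        return 1;
    }
    if (ucs >= 0xAC00 && ucs <= 0xD7A3) {
        unsigned long s = ucs - 0xAC00;
        unsigned initial_code = (unsigned)(s / (21 * 28)) + 2;  /* 2..20 */
        unsigned medial_code = kJohabMedial[(s / 28) % 21];
        unsigned final_code = kJohabFinal[s % 28];
        unsigned code = 0x8000u | (initial_code << 10)
                      | (medial_code << 5) | final_code;
        out[0] = (unsigned char)(code >> 8);
        out[1] = (unsigned char)(code & 0xFF);
        return 2;
    }
    return 0;
}
```

With these tables, U+D55C U+AE00 encodes as D0 65 8B 69 and U+AC00 as 88 61.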
Comment 1 (Reporter) • 24 years ago
Comment 2 (Reporter) • 24 years ago
Comment 3 (Reporter) • 24 years ago
Comment 4 (Reporter) • 24 years ago
Comment 5 (Reporter) • 24 years ago
Comment 6 (Reporter) • 24 years ago
1. mozilla/intl/uconv/ucvko/u20cp949hangul.uf and u20cp949hangul.ut
can be generated by filtering CP949.TXT (available at
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT)
with
    egrep -v '^#' | grep 'HANGUL SYLL' | egrep -v '^0x(A[1-F]|[B-C][0-F])([B-F][0-F]|A[1-F])'
and then running the result through intl/uconv/tools/umaptable.
I'm going to attach a gzipped tar file containing these two.
2. nsJohabToUnicode.cpp, nsJohabToUnicode.h, and nsUnicodeToJohabNoAscii.(h|cpp)
are all new files.
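The egrep filter above drops code points whose lead byte is 0xA1-0xCF and whose trail byte is 0xA1-0xFF, a block that covers the KS X 1001 Hangul syllables (0xB0A1-0xC8FE). As a cross-check of the 8822 figure from the description, the sketch below counts the valid CP949 byte pairs from 0x8141 to 0xC652 that survive that filter, and compares the result with 11172 - 2350 (all Unicode Hangul syllables minus KS X 1001's precomposed ones). The 0x8141-0xC652 range and the trail-byte ranges are assumptions taken from the published CP949 layout, not from this bug, and the function names are hypothetical.

```c
/* Hypothetical sketch: counts the CP949 byte pairs that the egrep
 * filter would keep. The 0x8141-0xC652 range and the trail-byte
 * ranges are assumed from the published CP949 layout. */

static int cp949_is_trail(unsigned b)
{
    return (b >= 0x41 && b <= 0x5A) ||
           (b >= 0x61 && b <= 0x7A) ||
           (b >= 0x81 && b <= 0xFE);
}

/* Same block as egrep -v '^0x(A[1-F]|[B-C][0-F])([B-F][0-F]|A[1-F])':
 * lead byte 0xA1-0xCF together with trail byte 0xA1-0xFF. */
static int in_excluded_block(unsigned code)
{
    unsigned lead = code >> 8, trail = code & 0xFF;
    return lead >= 0xA1 && lead <= 0xCF && trail >= 0xA1;
}

unsigned count_cp949_extension_hangul(void)
{
    unsigned count = 0, code;
    for (code = 0x8141; code <= 0xC652; code++)
        if (cp949_is_trail(code & 0xFF) && !in_excluded_block(code))
            count++;
    return count;
}

/* KS X 1001 precomposed Hangul: rows 0xB0-0xC8, 94 cells each. */
unsigned ksx1001_hangul_count(void)
{
    return (0xC8 - 0xB0 + 1) * 94;
}
```

Under these assumptions the count comes to 5696 pairs for lead bytes 0x81-0xA0, 3108 for 0xA1-0xC5 and 18 for 0xC6, i.e. 8822, matching the number of syllables in u20cp949hangul.uf/.ut.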
Comment 7 (Reporter) • 24 years ago
Updated (Reporter) • 24 years ago
OS: Linux → All
Comment 10 • 24 years ago
Jungshik Shin -
Most of your code looks good. However, please do the following:
1. Attach the u20cp949hangul.uf and u20cp949hangul.ut files as plain text.
2. The diff in http://bugzilla.mozilla.org/showattachment.cgi?attach_id=40975
has big gfx changes. I think that is a mistake; you probably attached the wrong
thing there. It looks like it contains some code that is already checked in.
Why don't you do the following: submit two patches. One should include everything
inside intl/uconv. You can do this by
1. cvs add newfiles
2. cvs diff -uN intl/uconv
This way, all the new files you add (nsJohabToUnicode.cpp etc.) will be
included in your patch.
Also put up a separate patch for gfx/src/gtk.
And make sure you review it before you attach it.
I think we need the Sun team to help QA this on Unix also. katakai, can you
arrange that? This is definitely too late for moz0.9.2, but I think it would be
nice if we could land this on the trunk ASAP (after the patch is ready).
Target Milestone: --- → mozilla0.9.3
Comment 11 (Reporter) • 24 years ago
Frank,
Thank you for your advice.
I'm going to attach a new patch for gfx/src/gtk (sorry for making a mess of it),
plus u20cp949hangul.uf and u20cp949hangul.ut in plain text.
As for including the new files in the intl/uconv patch, I'm afraid I can't:
to run 'cvs add newfile' I need write access to the repository, and I
don't have it. Without 'cvs add', 'cvs diff -uN' does not appear to work for
new files (it complains, as it should, that it doesn't know anything about
them). What would you suggest I do about this?
Comment 12 (Reporter) • 24 years ago
Comment 13 (Reporter) • 24 years ago
Comment 14 (Reporter) • 24 years ago
Comment 15 (Reporter) • 24 years ago
Comment 16 • 24 years ago
r=ftang for
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41101
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41102
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41103
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41107
The new mapping table for cp949 will increase the size of ucvko by about 30K,
but we won't use it unless we need to convert to/from Korean.
Comment 17 • 24 years ago
jshin@pantheon.yale.edu - do you want cvs write access?
Let's get a superreview of your change so you can land it on the trunk.
Please also attach test pages to this bug.
Status: NEW → ASSIGNED
Comment 18 • 24 years ago
Testing:
OK, I'll ask our l10n team for testing. But the testing will be done
next week, as Sun offices (except Tokyo) are closed this whole week.
Comment 19 (Reporter) • 24 years ago
> jshin@pantheon.yale.edu- do you want a cvs write access?
Yes, please. Can you grant me write access?
> Let's get superreview review your change so you can land into trunk.
Whom should I ask for superreview?
BTW, it's a little involved to split the JOHAB support out of my patch
(in addition to the 4 new files, my patch makes other changes for it).
Can you review those as well?
> Please attach test pages into this bug also.
All right. I'll do that later today.
Comment 20 (Reporter) • 24 years ago
The page at the URL given in this bug's URL field
has links to test pages for
four encodings (EUC-KR with the 8-byte sequence representation,
JOHAB, CP949 and UTF-8). I'm now attaching the CP949, JOHAB,
and EUC-KR (8-byte sequence) test pages (the same pages
as are available at the URL above).
Comment 21 (Reporter) • 24 years ago
Comment 22 (Reporter) • 24 years ago
Comment 23 (Reporter) • 24 years ago
Comment 24 (Reporter) • 24 years ago
In addition to trying the test pages attached here and available at the URL
given above in my build of Mozilla (with Korean fonts in both the ksc5601.1987-0
encoding and the johabsh-1 encoding, x-x11-johab in Mozilla),
I've tested various Korean converters using 'nsconv' (intl/uconv/tests)
as a filter and found them working as expected.
Comment 25 • 24 years ago
sr=blizzard
Comment 26 • 24 years ago
yokoyama, can you land this on the trunk?
Make sure you land 75707 first, then merge it with this one and test
http://warp/u/ftang/utf8test/buffer.cgi with all these charsets.
Assignee: ftang → yokoyama
Status: ASSIGNED → NEW
Comment 28 (Assignee) • 24 years ago
Jungshik:
I am trying to land your patches on the trunk, but
can you tell me which of the patches is
for u20cp949hangul.uf and which is for u20cp949hangul.ut?
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41102
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41103
I am assuming 41102 is for u20cp949hangul.uf
and 41103 for u20cp949hangul.ut, correct?
Your gzipped patch (40984) didn't work with WinZip. :(
Comment 29 (Reporter) • 24 years ago
Roy,
Yes, you're right. 41102 is u20cp949hangul.uf (CP949 FROM Unicode)
and 41103 is u20cp949hangul.ut (CP949 TO Unicode).
Thank you!
Jungshik
Comment 30 (Assignee) • 24 years ago
Jungshik:
I completed the patch and did a little test run.
I see two new menu items, Korean (JOHAB) and Korean (UHC),
under View/CharEncoding/More [see attached].
Are they supposed to be placed under View/CharEncoding/More/East Asian?
Comment 31 (Assignee) • 24 years ago
Comment 32 (Assignee) • 24 years ago
Jungshik:
I tested your test pages; they look great.
Created an attachment (id=41168) Johab encoding test page
Created an attachment (id=41169) CP949 test page
I notice a few characters (only a handful) shown as '?'.
For example, in the (41169) CP949 test page:
0xa2e6, 0xa2e7, 0xa2e8, 0xa2e9, 0xa2ea,...
0xa2f0, 0xa2f1, 0xa2f2, 0xa2f3,.....
Is that expected?
Whiteboard: /r=ftang; /sr=blizzard.
Comment 33 (Assignee) • 24 years ago
Comment 34 (Assignee) • 24 years ago
Reminder: we need to apply the patch (id=42266) to the commercial tree as well.
Comment 35 • 24 years ago
navigator.properties: please also change the commercial one.
Comment 36 (Reporter) • 24 years ago
Roy,
Thank you for testing.
Some characters (e.g. 0xa2e6 - 0xa2fe) being rendered as '?' is normal. They're
*empty* code points in both CP949 and EUC-KR. I should have omitted them,
but I forgot to do that while generating the test pages with a script.
Whiteboard: /r=ftang; /sr=blizzard.
Comment 37 (Reporter) • 24 years ago
Ftang, thank you for patching the menu properties file.
Roy, thank you for catching that glitch.
I had put the new encodings alongside the other East Asian encodings
in my local copy, but completely forgot
about it when uploading the patches.
Comment 38 • 24 years ago
Just in case my previous response to Roy (about characters rendered
as '?') was not clear: the result is *expected* and perfectly
normal, and it does NOT mean
anything is wrong with the converter. It just indicates that the test
page contains some empty code points, which are expected to be rendered
as '?' because no character is defined at those points.
Comment 39 (Assignee) • 24 years ago
Jungshik:
Understood. We are carpooling the changes on all platforms.
Since there are makefile changes, we are testing on Mac and Linux.
We should be able to check into the trunk very shortly.
Comment 40 • 24 years ago
I'm landing the GTK change on the trunk now.
Comment 41 (Assignee) • 24 years ago
Checked into the trunk.
One more left: we need to apply the patch (id=42266) to the commercial tree.
Comment 42 (Assignee) • 24 years ago
Checked in to the trunk and the commercial tree.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 43 • 24 years ago
I wonder if these checkins caused a few noticeable regressions:
At http://counter.li.org/reports/short.html what used to read (Føroyar) now
reads (F?royar). In addition, I crash when tooltips are enabled. (Linux CVS; an
hour old)
Comment 44 (Reporter) • 24 years ago
> I wonder if these checkins caused a few noticable regressions:
> At http://counter.li.org/reports/short.html .
In encodings that don't cover U+00F8 (LATIN SMALL LETTER O WITH STROKE),
it's perfectly normal for it to be rendered as '?'. Make sure your
View|Character Coding is set to Western (ISO-8859-1 or ISO-8859-15).
> In addition I crash when tooltips are enabled. (Linux CVS; an
> hour old)
I don't think this patch can cause a problem with tooltips.
P.S. Roy, thank you for landing this on the trunk.