Closed Bug 88944 Opened 23 years ago Closed 23 years ago

meta bug on Korean converters

Categories

(Core :: Internationalization, defect)

x86
All
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla0.9.3

People

(Reporter: jshin, Assigned: tetsuroy)

References

()

Details

(Keywords: intl)

Attachments

(15 files)

71.19 KB, patch
Details | Diff | Splinter Review
2.56 KB, text/plain
Details
1.56 KB, text/plain
Details
2.04 KB, text/plain
Details
1.48 KB, text/plain
Details
90.00 KB, application/x-gzip
Details
1.19 KB, patch
Details | Diff | Splinter Review
195.68 KB, text/plain
Details
178.71 KB, text/plain
Details
35.37 KB, patch
Details | Diff | Splinter Review
63.53 KB, text/html
Details
122.59 KB, text/html
Details
2.02 KB, text/html
Details
28.56 KB, image/jpeg
Details
1.50 KB, patch
Details | Diff | Splinter Review
This meta-bug list four issues about Korean converters
for which separate bugs have been filed. One may wonder why
I file this meta-bug. That's because I fixed them all together
and I found it hard to split my patch(which I will attach
later) into four independent pieces.

1. CP949(Windows-949, Unified Hangul Code) converter in both
directions : bug 7962

  - filled up the placeholder files(nsUnicodeToCP949.{cpp,h}
    and nsCP949ToUnicode.{cpp,h}) in intl/uconv/ucvko
  - added u2BytesGR128Charset in intl/uconv/public/uconvutil.h
  - added the correspodingg scanner function
    uCheckAndScanAlways2ByteGR128() in intl/uconv/src/uscan.c
  - added the place-holder for m_generator list in ugen.c
  - added 'x-windows-949' related entries to intl/uconv/src/*.properties files
  - added two new files (u20cp949hangul.uf,u20cp949hangul.ut)
    to intl/uconv/ucvko directory. they do NOT include characters
    covered by KS X 1001. They only include 8822 Hangul syllables
    only available in CP949 (in pre-composed form)
  - added new entries for CP949 to
    intl/uconv/ucvko/nsUCvKOCID.h and nsUCvKoModule.cpp


2. EUC-KR -> Unicode converter enhancement (bug 9962)
8byte (4 character) sequence to represent 8822 Hangul syllables
(not encoded in pre-composed form in KS X 1001 derived encodings
such as EUC-KR) is recognized and properly converted to
Hangul syllables in Unicode.

  - modified intl/uconv/ucvko/nsEUCKRToUnicode.cpp
  - modified intl/uconv/src/uscan.c
    * name changes in functions: xxxxComposedHangul -> xxxxDecomposedHangul

3. UnicodeToKSC5601.cpp (bug 9961 and bug 88922)

  When rendering UTF-8 (or CP949) pages with Hangul syllables
  not available in ksc5601.1987.0 fonts in pre-composed form,
  the leading 'blank' glyph of 4-character sequence
  had better be omitted.

  - modified intl/uconv/src/ugen.c
   * add another arguement to function uGenDecomposedHangulCommon()
     for the length of the sequence to generate (8byte for
     Unicode -> EUC-KR converter and 6byte for Unicode -> ksc5601.1987-0
     font encoding)
   * name changes in functions: xxxxComposedHangul -> xxxDecomposedHangul
4. Hangul JOHAB encoding support (bug 71489, bug 80111, bug 88939)

  - renamed intl/uconv/ucvko/nsUnicodeToJohab.{cpp,h}
    as nsUnicodeToJohabNoAscii.{cpp,h} (as it does NOT include
    US-ASCII part. It's only for Sun's ksc5601.1992-3 font encoding)
  - renamed 'x-johab' to 'x-johab-noascii' to make it clear
    that it doesn't include US-ASCII.
    'x-johab' will be used for the genuine JOHAB encoding.
  - made  new intl/uconv/ucvko/nsUnicodeToJohab.{cpp,h}
    to support the genuine JOHAB encoding (including US-ASCII part)
  - added new files intl/uconv/ucvko/nsJohabToUnicode.{cpp,h}
  - corresponding changes in intl/uconv/ucvko/nsUCvKOCID.h
    and nsUCvKoModule.cpp
  - added 'x-johab' and 'x-johab-noascii' related entries to
    intl/uconv/src/*.properties files
  - made necessary changes in gfx/src/gtk/nsFontMetricsGTK.cpp
    for 'x-johab' and 'x-johab-noascii'
Attached file nsJohabToUnicode.cpp
Attached file nsJohabToUnicode.h
1. mozilla/intl/uconv/ucvko/u20cp949hangul.uf and u20cp949hangul.ut
  can be generated by filtering CP949.TXT (available at 
  ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT)
  with 

    egrep -v '^#' | grep 'HANGUL SYLL' |  egrep -v  
'^0x(A[1-F]|[B-C][0-F])([B-F][0-F]|A[1-F])'

  and then running the result thru intl/uconv/tools/umaptable.

  I'm going to attach gzipped tarred file for these two. 

2. nsJohabToUnicode.cpp, nsJohabToUnicode.h, nsUnicodeToJohabNoAscii.(h|cpp)
   are all new files.


Blocks: 88939
Blocks: 88922
Blocks: 7962
Blocks: 9962
OS: Linux → All
Switching QA contact to teruko for now.
QA Contact: andreasb → teruko
Reassign to ftang.
Assignee: nhotta → ftang
Keywords: intl
Jungshik Shin-
most of your code looks good. However, please do the following:
1. attach u20cp949hangul.uf and u20cp949hangul.ut file as plain text
2. The diff in http://bugzilla.mozilla.org/showattachment.cgi?attach_id=40975
have big gfx changes. I think that is a mistake. You probably attach something
wrong there. It looks like some code already check in.
Why don't do the following. submit two patch, one include everthing inside
intl/uconv . you can do this by
1. cvs add newfiles
2. cvs diff -uN intl/uconv
In this way, all the new files you add (nsJohabToUnicode.cpp etc) will be
included in your patch

and also put a seperate patch for gfx/src/gtk
And make sure you review it before you attach it.

I think we need Sun team to help to QA this on unix also. katakai, can you
arrange that. This is definitely too late for moz0.9.2 But I think it will be
nice if we can land this into trunk ASAP (after the patch is ready)



Target Milestone: --- → mozilla0.9.3
Frank,

Thank you for your advice.

I'm gonna attach a new patch for gfx/src/gtk (sorry for making it a mess) and
u20cp949hangul.uf and u20cp949hangul.ut in plain text

As for including new files in intl/uconv patch, I'm afraid I can't
because to run 'cvs add newfile' I need to have a write access to
the repository, but I don't. Without 'cvs add', 'cvs diff -uN' 
appears not to work for new files(it complains that it doesn't know anything
about new files as it should). What would you say I should do
with this? 

jshin@pantheon.yale.edu- do you want a cvs write access?
Let's get superreview review your change so you can land into trunk.

Please attach test pages into this bug also. 
Status: NEW → ASSIGNED
Testing:

OK, I'll ask our l10n team for testing. But the testing will be done
in next week as Sun offices (except Tokyo) are closed whole this week.
> jshin@pantheon.yale.edu- do you want a cvs write access?

  Yes, let me have it. Can you grant me a write access?

> Let's get superreview review your change so you can land into trunk.

  Who do I have to ask for superreview?

 BTW, it's a little bit involved to split away JOHAB support from my patch
(in addition to 4 new files, there are changes made for that in  my patch).
Can you review them as well?

> Please attach test pages into this bug also.

  All right. I'll do that later today.
The page at the URL given in the URL field of this bug
have had links to test pages for
four encodings (EUC-KR with 8byte seq. representation,
JOHAB, CP949 and UTF-8). I'm now attaching CP949, JOHAB
and EUC-KR with 8byte seq. test pages (the same pages
as available at the URL above).

Attached file CP949 test page
 In addition to trying test pages attached and available at the URL given above
in my build of Mozilla (with Korean fonts in both ksc5601.1987-0 encoding
and johabsh-1 encoding - x-x11-johab in Mozilla ), 
I've tested various Korean converters with 'nsconv' (intl/uconv/tests)
used as a filter and found them working as expected. 
sr=blizzard
yokoyama, can you land this into trunk?
make sure you first land 75707, and then merged with this one and test 
http://warp/u/ftang/utf8test/buffer.cgi with all these charset. 
Assignee: ftang → yokoyama
Status: ASSIGNED → NEW
Accepting. I'll land this to the trunk.
Status: NEW → ASSIGNED
Jungshik: 
I am trying to land your patches to the trunk; but
can you tell me which one of patches is 
for u20cp949hangul.uf and for u20cp949hangul.ut?
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41102
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41103

I am assuming 41102 is for u20cp949hangul.uf
and 41103 for u20cp949hangul.ut, correct?
Your gzipped patch (40984) didn't work for WinZip. :(
Roy,

 Yes, you're right. 41102 is u20cp949hangul.uf (CP949 FROM Unicode)
and 41103 is u20cp949hangul.ut ( CP949 TO Unicode).

  Thank you !

   Jungshik
Jungshik: 
I completed the patch and did a little test run.

I see two new menus Korean (JOHAB) and Korean (UHC) 
under View/CharEncoding/More [see attached]
Do they suppose to be placed under View/CharEncoding/More/East Asian?

Jungshik:  

I tested your test pages they look great.
Created an attachment (id=41168)  Johab encoding test page
Created an attachment (id=41169)  CP949 test page

I notice few characters (only handful) shown as '?'.  
For example by using (41169 ) CP949 test page:
0xa2e6, 0xa2e7, 0xa2e8, 0xa2e9, 0xa2ea,...
0xa2f0, 0xa2f1, 0xa2f2, 0xa2f3,.....

Are they as expected?

Whiteboard: /r=ftang; /sr=blizzard.
reminder: we need to apply the patch (id=42266) to commercial tree as well.
navigator.properties, please also change the commercial one.
Roy,

Thank you for testing.

Some characters (e.g. 0xa2e6 - 0xa2fe) rendered as '?' are normal. They're
*empty* code
points in both CP949 and EUC-KR. I should have gotten them omitted,
but I forgot to do that while generating test pages with a script.


Whiteboard: /r=ftang; /sr=blizzard.
Ftang,Thank you for patching menu properties file.
Roy, thank you for catching that glitch.
I put new encodings along with other East Asian encodings
in my local copy, but completely forgot
about it when uploading patches.
Just in case, my previous response to Roy (about characters rendered
as '?') was not clear, the result is *expected* and perfectly 
normal and that does NOT mean
anything wrong with the converter. It just indicates that the test
page contains some empty code points which are expected to be rendered
as '?' because there's no character defined at those points.
Jungshik:  
Understood.  We are carpooling the changes on all the platform.
Since there are makefile changes, we are testing on Mac and Linux.
Should be able to check into trunk very shortly.
I land the GTK change into trunk now. 
checked into the trunk.
one more left: we need to apply the patch (id=42266) to commercial tree.
checked-in to trunk and commercial tree.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
I wonder if these checkins caused a few noticable regressions:
At http://counter.li.org/reports/short.html what used to read (Føroyar) now
reads (F?royar). In addition I crash when tooltips are enabled. (Linux CVS; an
hour old)
> I wonder if these checkins caused a few noticable regressions:
> At http://counter.li.org/reports/short.html . 

  In encodings that don't cover U+00F8 (latin small letter o with stroke),
it's perfectly normal that it's rendered with '?'. Make sure your 
View|Character Coding is set to Western(ISO-8859-1 or ISO-8859-15).

> In addition I crash when tooltips are enabled. (Linux CVS; an
> hour old)

  I don't think this patch can cause a problem in tooltips. 

P.S. Roy, thank you for landing this on the trunk. 
Keywords: nsBranch
Verified as fixed in 8-28 trunk build.
Status: RESOLVED → VERIFIED
Keywords: nsbranchnsbranch+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: