Closed Bug 88944 Opened 23 years ago Closed 23 years ago

meta bug on Korean converters

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla0.9.3

People

(Reporter: jshin, Assigned: tetsuroy)

References

(
URL
)

Details

(Keywords: intl)

Attachments

(15 files)

patch for all four issues mentioned 23 years ago Jungshik Shin 71.19 KB, patch		Details \| Diff \| Splinter Review
nsJohabToUnicode.cpp 23 years ago Jungshik Shin 2.56 KB, text/plain		Details
nsJohabToUnicode.h 23 years ago Jungshik Shin 1.56 KB, text/plain		Details
nsUnicodeToJohabNoAscii.cpp 23 years ago Jungshik Shin 2.04 KB, text/plain		Details
nsUnicodeToJohabNoAscii.h 23 years ago Jungshik Shin 1.48 KB, text/plain		Details
gzipped and tarreu u20cp949hangul.uf and u20cp949hangul.ut 23 years ago Jungshik Shin 90.00 KB, application/x-gzip		Details
a new gfx/src/grk patch 23 years ago Jungshik Shin 1.19 KB, patch		Details \| Diff \| Splinter Review
Unicode -> CP949 (non-KSX1001 part) mapping table 23 years ago Jungshik Shin 195.68 KB, text/plain		Details
CP949(non-KSX1001 part) -> Unicode mapping table 23 years ago Jungshik Shin 178.71 KB, text/plain		Details
a new patch for intl/uconv (new files NOT included, but one typo fixed and gfx diff excluded) 23 years ago Jungshik Shin 35.37 KB, patch		Details \| Diff \| Splinter Review
Johab encoding test page 23 years ago Jungshik Shin 63.53 KB, text/html		Details
CP949 test page 23 years ago Jungshik Shin 122.59 KB, text/html		Details
EUC-KR with 8byte seq. for Hangul syllables 23 years ago Jungshik Shin 2.02 KB, text/html		Details
View/CharEncoding/More menu 23 years ago Roy Yokoyama 28.56 KB, image/jpeg		Details
Thanks to ftang. menu issue is resolve by changning properties file. 23 years ago Roy Yokoyama 1.50 KB, patch		Details \| Diff \| Splinter Review

Jungshik Shin

Reporter

Description

•

23 years ago

This meta-bug list four issues about Korean converters
for which separate bugs have been filed. One may wonder why
I file this meta-bug. That's because I fixed them all together
and I found it hard to split my patch(which I will attach
later) into four independent pieces.

1. CP949(Windows-949, Unified Hangul Code) converter in both
directions : bug 7962

  - filled up the placeholder files(nsUnicodeToCP949.{cpp,h}
    and nsCP949ToUnicode.{cpp,h}) in intl/uconv/ucvko
  - added u2BytesGR128Charset in intl/uconv/public/uconvutil.h
  - added the correspodingg scanner function
    uCheckAndScanAlways2ByteGR128() in intl/uconv/src/uscan.c
  - added the place-holder for m_generator list in ugen.c
  - added 'x-windows-949' related entries to intl/uconv/src/*.properties files
  - added two new files (u20cp949hangul.uf,u20cp949hangul.ut)
    to intl/uconv/ucvko directory. they do NOT include characters
    covered by KS X 1001. They only include 8822 Hangul syllables
    only available in CP949 (in pre-composed form)
  - added new entries for CP949 to
    intl/uconv/ucvko/nsUCvKOCID.h and nsUCvKoModule.cpp


2. EUC-KR -> Unicode converter enhancement (bug 9962)
8byte (4 character) sequence to represent 8822 Hangul syllables
(not encoded in pre-composed form in KS X 1001 derived encodings
such as EUC-KR) is recognized and properly converted to
Hangul syllables in Unicode.

  - modified intl/uconv/ucvko/nsEUCKRToUnicode.cpp
  - modified intl/uconv/src/uscan.c
    * name changes in functions: xxxxComposedHangul -> xxxxDecomposedHangul

3. UnicodeToKSC5601.cpp (bug 9961 and bug 88922)

  When rendering UTF-8 (or CP949) pages with Hangul syllables
  not available in ksc5601.1987.0 fonts in pre-composed form,
  the leading 'blank' glyph of 4-character sequence
  had better be omitted.

  - modified intl/uconv/src/ugen.c
   * add another arguement to function uGenDecomposedHangulCommon()
     for the length of the sequence to generate (8byte for
     Unicode -> EUC-KR converter and 6byte for Unicode -> ksc5601.1987-0
     font encoding)
   * name changes in functions: xxxxComposedHangul -> xxxDecomposedHangul
4. Hangul JOHAB encoding support (bug 71489, bug 80111, bug 88939)

  - renamed intl/uconv/ucvko/nsUnicodeToJohab.{cpp,h}
    as nsUnicodeToJohabNoAscii.{cpp,h} (as it does NOT include
    US-ASCII part. It's only for Sun's ksc5601.1992-3 font encoding)
  - renamed 'x-johab' to 'x-johab-noascii' to make it clear
    that it doesn't include US-ASCII.
    'x-johab' will be used for the genuine JOHAB encoding.
  - made  new intl/uconv/ucvko/nsUnicodeToJohab.{cpp,h}
    to support the genuine JOHAB encoding (including US-ASCII part)
  - added new files intl/uconv/ucvko/nsJohabToUnicode.{cpp,h}
  - corresponding changes in intl/uconv/ucvko/nsUCvKOCID.h
    and nsUCvKoModule.cpp
  - added 'x-johab' and 'x-johab-noascii' related entries to
    intl/uconv/src/*.properties files
  - made necessary changes in gfx/src/gtk/nsFontMetricsGTK.cpp
    for 'x-johab' and 'x-johab-noascii'

Jungshik Shin

Reporter

Comment 1

•

23 years ago

Attached patch patch for all four issues mentioned — Details — Splinter Review

Jungshik Shin

Reporter

Comment 2

•

23 years ago

Attached file nsJohabToUnicode.cpp — Details

Jungshik Shin

Reporter

Comment 3

•

23 years ago

Attached file nsJohabToUnicode.h — Details

Jungshik Shin

Reporter

Comment 4

•

23 years ago

Attached file nsUnicodeToJohabNoAscii.cpp — Details

Jungshik Shin

Reporter

Comment 5

•

23 years ago

Attached file nsUnicodeToJohabNoAscii.h — Details

Jungshik Shin

Reporter

Comment 6

•

23 years ago

1. mozilla/intl/uconv/ucvko/u20cp949hangul.uf and u20cp949hangul.ut
  can be generated by filtering CP949.TXT (available at 
  ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT)
  with 

    egrep -v '^#' | grep 'HANGUL SYLL' |  egrep -v  
'^0x(A[1-F]|[B-C][0-F])([B-F][0-F]|A[1-F])'

  and then running the result thru intl/uconv/tools/umaptable.

  I'm going to attach gzipped tarred file for these two. 

2. nsJohabToUnicode.cpp, nsJohabToUnicode.h, nsUnicodeToJohabNoAscii.(h|cpp)
   are all new files.

Jungshik Shin

Reporter

Comment 7

•

23 years ago

Attached file gzipped and tarreu u20cp949hangul.uf and u20cp949hangul.ut — Details

Jungshik Shin

Reporter

Updated

•

23 years ago

Blocks: 88939

Jungshik Shin

Reporter

Updated

•

23 years ago

Blocks: 88922

Jungshik Shin

Reporter

Updated

•

23 years ago

Blocks: 7962

Jungshik Shin

Reporter

Updated

•

23 years ago

Blocks: 9962

Jungshik Shin

Reporter

Updated

•

23 years ago

OS: Linux → All

Andreas Becker

Comment 8

•

23 years ago

Switching QA contact to teruko for now.

QA Contact: andreasb → teruko

nhottanscp

Comment 9

•

23 years ago

Reassign to ftang.

Assignee: nhotta → ftang

Keywords: intl

Frank Tang

Comment 10

•

23 years ago

Jungshik Shin-
most of your code looks good. However, please do the following:
1. attach u20cp949hangul.uf and u20cp949hangul.ut file as plain text
2. The diff in http://bugzilla.mozilla.org/showattachment.cgi?attach_id=40975
have big gfx changes. I think that is a mistake. You probably attach something
wrong there. It looks like some code already check in.
Why don't do the following. submit two patch, one include everthing inside
intl/uconv . you can do this by
1. cvs add newfiles
2. cvs diff -uN intl/uconv
In this way, all the new files you add (nsJohabToUnicode.cpp etc) will be
included in your patch

and also put a seperate patch for gfx/src/gtk
And make sure you review it before you attach it.

I think we need Sun team to help to QA this on unix also. katakai, can you
arrange that. This is definitely too late for moz0.9.2 But I think it will be
nice if we can land this into trunk ASAP (after the patch is ready)

Target Milestone: --- → mozilla0.9.3

Jungshik Shin

Reporter

Comment 11

•

23 years ago

Frank,

Thank you for your advice.

I'm gonna attach a new patch for gfx/src/gtk (sorry for making it a mess) and
u20cp949hangul.uf and u20cp949hangul.ut in plain text

As for including new files in intl/uconv patch, I'm afraid I can't
because to run 'cvs add newfile' I need to have a write access to
the repository, but I don't. Without 'cvs add', 'cvs diff -uN' 
appears not to work for new files(it complains that it doesn't know anything
about new files as it should). What would you say I should do
with this?

Jungshik Shin

Reporter

Comment 12

•

23 years ago

Attached patch a new gfx/src/grk patch — Details — Splinter Review

Jungshik Shin

Reporter

Comment 13

•

23 years ago

Attached file Unicode -> CP949 (non-KSX1001 part) mapping table — Details

Jungshik Shin

Reporter

Comment 14

•

23 years ago

Attached file CP949(non-KSX1001 part) -> Unicode mapping table — Details

Jungshik Shin

Reporter

Comment 15

•

23 years ago

Attached patch a new patch for intl/uconv (new files NOT included, but one typo fixed and gfx diff excluded) — Details — Splinter Review

Frank Tang

Comment 16

•

23 years ago

r=ftang for 
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41101
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41102
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41103
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41107
The new mapping table for cp949 will increase about 30K in ucvko. But we won't 
use them unless we need to convert to/from Korean.

Frank Tang

Comment 17

•

23 years ago

jshin@pantheon.yale.edu- do you want a cvs write access?
Let's get superreview review your change so you can land into trunk.

Please attach test pages into this bug also.

Status: NEW → ASSIGNED

Masaki Katakai

Comment 18

•

23 years ago

Testing:

OK, I'll ask our l10n team for testing. But the testing will be done
in next week as Sun offices (except Tokyo) are closed whole this week.

Jungshik Shin

Reporter

Comment 19

•

23 years ago

> jshin@pantheon.yale.edu- do you want a cvs write access?

  Yes, let me have it. Can you grant me a write access?

> Let's get superreview review your change so you can land into trunk.

  Who do I have to ask for superreview?

 BTW, it's a little bit involved to split away JOHAB support from my patch
(in addition to 4 new files, there are changes made for that in  my patch).
Can you review them as well?

> Please attach test pages into this bug also.

  All right. I'll do that later today.

Jungshik Shin

Reporter

Comment 20

•

23 years ago

The page at the URL given in the URL field of this bug
have had links to test pages for
four encodings (EUC-KR with 8byte seq. representation,
JOHAB, CP949 and UTF-8). I'm now attaching CP949, JOHAB
and EUC-KR with 8byte seq. test pages (the same pages
as available at the URL above).

Jungshik Shin

Reporter

Comment 21

•

23 years ago

Attached file Johab encoding test page — Details

Jungshik Shin

Reporter

Comment 22

•

23 years ago

Attached file CP949 test page — Details

Jungshik Shin

Reporter

Comment 23

•

23 years ago

Attached file EUC-KR with 8byte seq. for Hangul syllables — Details

Jungshik Shin

Reporter

Comment 24

•

23 years ago

 In addition to trying test pages attached and available at the URL given above
in my build of Mozilla (with Korean fonts in both ksc5601.1987-0 encoding
and johabsh-1 encoding - x-x11-johab in Mozilla ), 
I've tested various Korean converters with 'nsconv' (intl/uconv/tests)
used as a filter and found them working as expected.

Christopher Blizzard (:blizzard)

Comment 25

•

23 years ago

sr=blizzard

Frank Tang

Comment 26

•

23 years ago

yokoyama, can you land this into trunk?
make sure you first land 75707, and then merged with this one and test 
http://warp/u/ftang/utf8test/buffer.cgi with all these charset.

Assignee: ftang → yokoyama

Status: ASSIGNED → NEW

Roy Yokoyama

Assignee

Comment 27

•

23 years ago

Accepting. I'll land this to the trunk.

Status: NEW → ASSIGNED

Roy Yokoyama

Assignee

Comment 28

•

23 years ago

Jungshik: 
I am trying to land your patches to the trunk; but
can you tell me which one of patches is 
for u20cp949hangul.uf and for u20cp949hangul.ut?
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41102
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=41103

I am assuming 41102 is for u20cp949hangul.uf
and 41103 for u20cp949hangul.ut, correct?
Your gzipped patch (40984) didn't work for WinZip. :(

Jungshik Shin

Reporter

Comment 29

•

23 years ago

Roy,

 Yes, you're right. 41102 is u20cp949hangul.uf (CP949 FROM Unicode)
and 41103 is u20cp949hangul.ut ( CP949 TO Unicode).

  Thank you !

   Jungshik

Roy Yokoyama

Assignee

Comment 30

•

23 years ago

Jungshik: 
I completed the patch and did a little test run.

I see two new menus Korean (JOHAB) and Korean (UHC) 
under View/CharEncoding/More [see attached]
Do they suppose to be placed under View/CharEncoding/More/East Asian?

Roy Yokoyama

Assignee

Comment 31

•

23 years ago

Attached image View/CharEncoding/More menu — Details

Roy Yokoyama

Assignee

Comment 32

•

23 years ago

Jungshik:  

I tested your test pages they look great.
Created an attachment (id=41168)  Johab encoding test page
Created an attachment (id=41169)  CP949 test page

I notice few characters (only handful) shown as '?'.  
For example by using (41169 ) CP949 test page:
0xa2e6, 0xa2e7, 0xa2e8, 0xa2e9, 0xa2ea,...
0xa2f0, 0xa2f1, 0xa2f2, 0xa2f3,.....

Are they as expected?

Whiteboard: /r=ftang; /sr=blizzard.

Roy Yokoyama

Assignee

Comment 33

•

23 years ago

Attached patch Thanks to ftang. menu issue is resolve by changning properties file. — Details — Splinter Review

Roy Yokoyama

Assignee

Comment 34

•

23 years ago

reminder: we need to apply the patch (id=42266) to commercial tree as well.

nhottanscp

Comment 35

•

23 years ago

navigator.properties, please also change the commercial one.

Jungshik Shin

Reporter

Comment 36

•

23 years ago

Roy,

Thank you for testing.

Some characters (e.g. 0xa2e6 - 0xa2fe) rendered as '?' are normal. They're
*empty* code
points in both CP949 and EUC-KR. I should have gotten them omitted,
but I forgot to do that while generating test pages with a script.

Whiteboard: /r=ftang; /sr=blizzard.

Jungshik Shin

Reporter

Comment 37

•

23 years ago

Ftang,Thank you for patching menu properties file.
Roy, thank you for catching that glitch.
I put new encodings along with other East Asian encodings
in my local copy, but completely forgot
about it when uploading patches.

Jungshik Shin

Comment 38

•

23 years ago

Just in case, my previous response to Roy (about characters rendered
as '?') was not clear, the result is *expected* and perfectly 
normal and that does NOT mean
anything wrong with the converter. It just indicates that the test
page contains some empty code points which are expected to be rendered
as '?' because there's no character defined at those points.

Roy Yokoyama

Assignee

Comment 39

•

23 years ago

Jungshik:  
Understood.  We are carpooling the changes on all the platform.
Since there are makefile changes, we are testing on Mac and Linux.
Should be able to check into trunk very shortly.

Frank Tang

Comment 40

•

23 years ago

I land the GTK change into trunk now.

Roy Yokoyama

Assignee

Comment 41

•

23 years ago

checked into the trunk.
one more left: we need to apply the patch (id=42266) to commercial tree.

Roy Yokoyama

Assignee

Comment 42

•

23 years ago

checked-in to trunk and commercial tree.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

R.K.Aa.

Comment 43

•

23 years ago

I wonder if these checkins caused a few noticable regressions:
At http://counter.li.org/reports/short.html what used to read (Føroyar) now
reads (F?royar). In addition I crash when tooltips are enabled. (Linux CVS; an
hour old)

Jungshik Shin

Reporter

Comment 44

•

23 years ago

> I wonder if these checkins caused a few noticable regressions:
> At http://counter.li.org/reports/short.html . 

  In encodings that don't cover U+00F8 (latin small letter o with stroke),
it's perfectly normal that it's rendered with '?'. Make sure your 
View|Character Coding is set to Western(ISO-8859-1 or ISO-8859-15).

> In addition I crash when tooltips are enabled. (Linux CVS; an
> hour old)

  I don't think this patch can cause a problem in tooltips. 

P.S. Roy, thank you for landing this on the trunk.

Cindy Roberts

Updated

•

23 years ago

Keywords: nsBranch

Teruko Kobayashi

Comment 45

•

23 years ago

Verified as fixed in 8-28 trunk build.

Status: RESOLVED → VERIFIED

msanz

Updated

•

23 years ago

Keywords: nsbranch → nsbranch+

You need to log in before you can comment on or make changes to this bug.