Closed Bug 134749 Opened 22 years ago Closed 22 years ago

two new chars to add to KS X 1001 based converters(EUC-KR,CP949,ISO-2022-KR)

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: jshin1987, Assigned: jshin1987)

Details

(Keywords: intl, Whiteboard: done)

Attachments

(2 files)

Two new characters - EURO SIGN and REGISTERED SIGN - were added
to KS X 1001 in December 1998 (KS X 1001:1998), but Mozilla's
encoding converters based on KS X 1001 (EUC-KR, CP949, ISO-2022-KR
and JOHAB) don't support them yet.
All of these converters depend on two files, u20ksc5601gl.uf and
u20ksc5601gl.ut. Adding the two characters to these files (more precisely,
adding them to the original Unicode mapping table and regenerating the
files automatically) would solve this problem. I've already generated them.

However, there's a problem with doing this. Korean true type fonts for
MS-Windows have been updated to include these two characters, but
most X11 BDF fonts (in ksc5601.1987-0) don't have these two characters.
With increasing use of TTF, this won't be an issue in the long run,
but for the time being, this could be problematic.
Keywords: intl
QA Contact: ruixu → ylong
The following note was manually added to the file automatically
generated by umaptable and some Unix filters:

  * Note added by Jungshik Shin <jshin@mailaps.org> (bug 134749)

    - More specifically, CP949.TXT at
      ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT
      was used with the following Unix filters (which remove CP949 extension
      of EUC-KR and convert EUC-KR code points to KS X 1001 GL code points).

      egrep -v '^#' CP949.TXT |  \
      egrep  '^0x(A[1-F]|[B-E][0-F]|F[0-E])(A[1-F]|[B-E][0-F]|F[0-E])' | \
      perl -pe \
      's/^0x([A-F][0-F][A-F][0-F])/"0x" . sprintf "%04X", hex($1)- 0x8080/ge' | \
      ./umaptable -ut

    - The only difference between the previous version and this version
      is the addition of two new characters: EURO SIGN (U+20AC) at row 2,
      column 70 (0x2266 in GL and 0xA2E6 in GR) and REGISTERED SIGN
      (U+00AE) at row 2, column 71 (0x2267 in GL and 0xA2E7 in GR).
      This change brings the mapping table up to the specification
      in KS X 1001:1998.
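
The filter step in the note above can be sketched in Python as follows. This is only an illustration, not the tool used for the checked-in files: the function names (gr_to_gl, row_col, filter_cp949) are made up here, the input is assumed to be the lines of CP949.TXT, and the final umaptable step that produces the .ut/.uf files is not reproduced.

```python
import re

# A KS X 1001 entry in CP949.TXT starts with a two-byte code whose lead and
# trail bytes are both in 0xA1-0xFE; everything else is ASCII or a CP949
# extension of EUC-KR.
EUC_KR_LINE = re.compile(r"^0x([A-F][0-9A-F])([A-F][0-9A-F])\b(.*)$")


def gr_to_gl(code):
    """Convert a two-byte GR (EUC-KR) code to GL by clearing both high bits."""
    return code - 0x8080


def row_col(gl_code):
    """Return the (row, column) position of a GL code in the 94x94 table."""
    return (gl_code >> 8) - 0x20, (gl_code & 0xFF) - 0x20


def filter_cp949(lines):
    """Drop comments and non-KS X 1001 entries; rewrite codes from GR to GL."""
    out = []
    for line in lines:
        if line.startswith("#"):
            continue
        m = EUC_KR_LINE.match(line)
        if not m:
            continue
        lead, trail = int(m.group(1), 16), int(m.group(2), 16)
        if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
            continue
        gl = gr_to_gl((lead << 8) | trail)
        out.append("0x%04X%s" % (gl, m.group(3)))
    return out
```

For the two new characters, gr_to_gl(0xA2E6) gives 0x2266 and row_col(0x2266) gives (2, 70), which matches the positions quoted in the note.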
Target Milestone: --- → mozilla1.0.1
FYI, we will support these characters in our future Solaris releases; patches
might be provided upon request.
BTW, it would be great if this could be included in Mozilla 1.0 or in the next
Netscape major release.
Confirming based on comments.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Status: NEW → ASSIGNED
Roy and Frank,
Could you review this?  This is a trivial bug with a simple patch. 
Whiteboard: done, waiting for review
jshin: Sorry, but u20ksc5601gl.uf and u20ksc5601gl.ut don't exist in the moz tree,
       and no file has #include "u20ksc5601gl.uf" or any other reference to such
       files.
Roy,
Thank you for looking at my patch. I made a mistake
in my bug report. The filenames for
two attachments (attachment 77133 [details] and attachment 77136 [details])
are not u20ksc5601.(ut|uf) but u20kscgl.ut and u20kscgl.uf.
They're included by nsUCvKoModule.cpp (intl/uconv/ucvko)
Could you look at them again and give them a review?

Adding just two chars in the map table shouldn't have
a huge difference in file size.  However, I see

filename           trunk           patch
-------------------------------------------------
u20kscgl.uf        582kb           420kb
u20kscgl.ut        141kb           135kb

I never used the _fromu_ nor _umaptable_; but
the trunk file says it's generated by using fromu instead of
umaptable.  Would you expound it for me?
Thank you for taking a look at my patch.

> Adding just two chars in the map table shouldn't have
> a huge difference in file size.  However, I see
> filename           trunk           patch
> -------------------------------------------------
> u20kscgl.uf        582kb           420kb
> u20kscgl.ut        141kb           135kb

> I never used the _fromu_ nor _umaptable_; but
> the trunk file says it's generated by using fromu instead of
> umaptable.  Would you expound it for me?

  I also noticed the huge difference in the size of the two
files when I first made them. I double-checked that the number
of characters being mapped is correct (that is, all the
characters in KS X 1001 are mapped).
Then I forgot about it, because with the new map files all the
test cases (http://jshin.net/i18n/koencodings.html) worked just
as well as before, and better in that the Euro and Registered signs
are rendered correctly, and mail/news messages and html files saved
in EUC-KR, CP949 and Johab contain the two new characters not as
NCRs but as their code point values in the corresponding
encodings. I conducted these tests under both
Win2k and Linux.
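
The difference between saving the two characters as NCRs and as code point values can be illustrated with a small Python sketch. It uses Python's own cp949 codec, not Mozilla's converters, and ASCII merely stands in for a converter that lacks the two characters; both choices are assumptions made for illustration.

```python
text = "\u20ac\u00ae"  # EURO SIGN, REGISTERED SIGN

# A converter that knows both characters writes them as native two-byte codes.
native = text.encode("cp949")

# A converter that lacks them has to fall back to numeric character
# references (NCRs); ASCII is used here as a stand-in for such a converter.
fallback = text.encode("ascii", errors="xmlcharrefreplace")

print(native)    # the two-byte codes in the target encoding
print(fallback)  # the NCR form
```

With the updated tables, the saved files should look like the first form rather than the second.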

 Both fromu and umaptable were written by Frank, and he'd be able to
explain the details. Apparently, fromu used a less efficient algorithm than
umaptable (which has replaced fromu), and that's the cause
of the huge difference in file size. For example,
umaptable uses far fewer format 2 mappings than
fromu (0x5a vs 0x54B). Format 2 mappings are the most space-consuming
in terms of the size of the generated map file. (Note that
the bulk of the space is used by comments for human readers.)

  umaptable has been used without a problem to generate
other mapping tables (in ucvko, the CP949 mapping table
and the Johab Jamo table were generated with it).

  I hope this is sufficient to convince you that
the new mapping tables work all right :-). Could you take
another look?

  Thank you,
Sorry for spamming. I forgot to add the URL for a short test case.
If this patch works and the Korean font used for rendering the page
has the Euro and Registered signs, they should
follow the 'Tel' sign. Under Linux, my patch to the ksc5601.1992-3.enc
file (which I submitted to XFree86) has to be applied and
MS Gulim/Batang/Dotum (as opposed to Baekmuk Gulim/Batang) has to
be used. Therefore, it's better to test this under MS-Windows.
Comment on attachment 77133 [details]
KS X 1001 GL to Unicode mapping

/r=yokoyama; thanks for your  extensive research and time.
Attachment #77133 - Flags: review+
Comment on attachment 77136 [details]
Unicode to KS X 1001 GL mapping table

/r=yokoyama; thanks for your  extensive research and time.
Attachment #77136 - Flags: review+
Comment on attachment 77133 [details]
KS X 1001 GL to Unicode mapping

sr=alecf
Attachment #77133 - Flags: superreview+
Comment on attachment 77136 [details]
Unicode to KS X 1001 GL mapping table

oops, sorry I got distracted between reviewing the first one and reviewing the
2nd one :)
sr=alecf
Attachment #77136 - Flags: superreview+
Comment on attachment 77133 [details]
KS X 1001 GL to Unicode mapping

a=asa (on behalf of drivers) for checkin to 1.1
Attachment #77133 - Flags: approval+
Comment on attachment 77136 [details]
Unicode to KS X 1001 GL mapping table

a=asa (on behalf of drivers) for checkin to 1.1
Attachment #77136 - Flags: approval+
fix checked in to the trunk
thank you all !
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Whiteboard: done, waiting for review → done
changing qa contact to myself
QA Contact: ylong → yokoyama
verified the checkin.  
Jungshik: do you have a test page so that I can verify the EURO SIGN and
REGISTERED SIGN?
> Jungshik: do you have a test page so that I can verify the EURO SIGN and
> REGISTERED SIGN?

  Yup. Please point the latest nightly at the URL given in the URL field
(http://jshin.net/i18n/euckr_newchars.html)  :-)
As I mentioned earlier, setting up the necessary fonts with the two
new chars under Linux is tricky (I've done that, but you don't
have to bother), so you'd better check it out under Win2k.
ylong:  can you verify?  The code is verified. thanks
QA Contact: yokoyama → ylong
Verified the test page in comment #20 is displayed fine with euro and reg. signs
on 07-22 trunk build / Win2k-SC.
Status: RESOLVED → VERIFIED