Closed
Bug 134749
Opened 23 years ago
Closed 22 years ago
two new chars to add to KS X 1001 based converters(EUC-KR,CP949,ISO-2022-KR)
Categories
(Core :: Internationalization, defect)
Tracking
(Status: VERIFIED FIXED, Target Milestone: mozilla1.0.1)
People
(Reporter: jshin1987, Assigned: jshin1987)
Details
(Keywords: intl, Whiteboard: done)
Attachments
(2 files)
134.64 KB, text/plain | tetsuroy: review+ | alecf: superreview+ | asa: approval+
419.18 KB, text/plain | tetsuroy: review+ | alecf: superreview+ | asa: approval+
Two new characters, EURO SIGN and REGISTERED SIGN, were added
to KS X 1001 in December 1998 (KS X 1001:1998), but Mozilla's
encoding converters based on KS X 1001 (EUC-KR, CP949, ISO-2022-KR
and JOHAB) don't support them yet.
All of these converters depend on two files, u20ksc5601gl.uf and
u20ksc5601gl.ut. Adding the two characters to these files (actually,
adding them to the original Unicode mapping table and regenerating the
files automatically) would solve this problem. I've already generated them.
However, there's a problem with doing this. Korean TrueType fonts for
MS-Windows have been updated to include these two characters, but
most X11 BDF fonts (in ksc5601.1987-0) don't have them.
With the increasing use of TTF, this won't be an issue in the long run,
but for the time being it could be problematic.
Assignee
Comment 1•23 years ago
The following note was manually added to the file automatically
generated by umaptable and some Unix filters:
* Note added by Jungshik Shin <jshin@mailaps.org> (bug 134749)
- More specifically, CP949.TXT at
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT
was used with the following Unix filters (which remove CP949 extension
of EUC-KR and convert EUC-KR code points to KS X 1001 GL code points).
egrep -v '^#' CP949.TXT | \
egrep '^0x(A[1-F]|[B-E][0-F]|F[0-E])(A[1-F]|[B-E][0-F]|F[0-E])' | \
perl -pe \
's/^0x([A-F][0-F][A-F][0-F])/"0x" . sprintf "%04X", hex($1)- 0x8080/ge' | \
./umaptable -ut
- The difference between the previous version and this version is two
newly added characters: EURO SIGN (U+20AC) at row 2, column 70
(0x2266 in GL and 0xA2E6 in GR) and REGISTERED SIGN (U+00AE) at row 2,
column 71 (0x2267 in GL and 0xA2E7 in GR). This change brings
the mapping table up to the specification in KS X 1001:1998.
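The row/column positions and the GL/GR values quoted in the note are related by simple arithmetic. The following is a minimal sketch of that relationship, not code from the Mozilla tree; the function name ksx1001_codes is made up for illustration.

```python
def ksx1001_codes(row, col):
    """Return (GL, GR) two-byte code values for a KS X 1001 row/column cell.

    Hypothetical helper for illustration: rows and columns run 1..94,
    GL bytes are 0x20 + position, GR bytes additionally have the high bit set.
    """
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in 1..94")
    gl = ((0x20 + row) << 8) | (0x20 + col)  # 7-bit form (ISO-2022-KR)
    gr = gl | 0x8080                         # 8-bit form (EUC-KR)
    return gl, gr


for name, row, col in [("EURO SIGN", 2, 70), ("REGISTERED SIGN", 2, 71)]:
    gl, gr = ksx1001_codes(row, col)
    print(f"{name}: GL=0x{gl:04X} GR=0x{gr:04X}")
```

Subtracting 0x8080 from the GR value, as the perl filter in the note does, goes the other way and recovers the GL value.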
Assignee
Comment 2•23 years ago
* Note added by Jungshik Shin <jshin@mailaps.org> (bug 134749)
- More specifically, CP949.TXT at
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT
was used with the following Unix filters (which remove CP949 extension
of EUC-KR and convert EUC-KR code points to KS X 1001 GL code points).
egrep -v '^#' CP949.TXT | \
egrep '^0x(A[1-F]|[B-E][0-F]|F[0-E])(A[1-F]|[B-E][0-F]|F[0-E])' | \
perl -pe \
's/^0x([A-F][0-F][A-F][0-F])/"0x" . sprintf "%04X", hex($1)- 0x8080/ge' | \
./umaptable -ut
- The difference between the previous version and this version is two
newly added characters: EURO SIGN (U+20AC) at row 2, column 70
(0x2266 in GL and 0xA2E6 in GR) and REGISTERED SIGN (U+00AE) at row 2,
column 71 (0x2267 in GL and 0xA2E7 in GR). This change brings
the mapping table up to the specification in KS X 1001:1998.
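For readers who find the egrep/perl filters hard to follow, here is a rough Python sketch of the same two filtering steps: drop comments and anything outside the 94x94 EUC-KR grid, then subtract 0x8080. It assumes CP949.TXT-style lines of the form "0xXXXX<TAB>0xYYYY<TAB>#name"; the function name and the sample lines are illustrative only, and the final umaptable step is not reproduced.

```python
import re

# A two-byte code followed by a Unicode value, e.g. "0xA2E6\t0x20AC\t#EURO SIGN".
_ENTRY = re.compile(r"^0x([0-9A-Fa-f]{4})\s+0x([0-9A-Fa-f]{4,6})")


def euckr_to_gl_lines(lines):
    """Yield 'GL<TAB>Unicode' strings for entries inside the 94x94 grid."""
    for line in lines:
        if line.startswith("#"):
            continue  # comment line
        m = _ENTRY.match(line)
        if not m:
            continue  # single-byte or undefined entry
        code = int(m.group(1), 16)
        lead, trail = code >> 8, code & 0xFF
        if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
            continue  # CP949 extension, not part of EUC-KR
        yield f"0x{code - 0x8080:04X}\t0x{int(m.group(2), 16):04X}"


# Illustrative input in the style of CP949.TXT (not the full file).
SAMPLE = [
    "# comment line",
    "0x41\t0x0041\t#LATIN CAPITAL LETTER A",
    "0x8141\t0xAC02\t#CP949 extension entry",
    "0xA2E6\t0x20AC\t#EURO SIGN",
    "0xA2E7\t0x00AE\t#REGISTERED SIGN",
]

for out in euckr_to_gl_lines(SAMPLE):
    print(out)
```

Only the last two sample lines survive the filter, which is the behaviour the note describes for the CP949 extension.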
Assignee
Updated•23 years ago
Target Milestone: --- → mozilla1.0.1
Comment 3•23 years ago
FYI, we will support these characters in our future Solaris releases; patches
might be provided upon request.
BTW, it would be great if this could be included in Mozilla 1.0 or in the next
Netscape major release.
Assignee
Updated•23 years ago
Status: NEW → ASSIGNED
Assignee
Comment 5•22 years ago
Roy and Frank,
Could you review this? This is a trivial bug with a simple patch.
Whiteboard: done, waiting for review
Comment 6•22 years ago
jshin: Sorry, but u20ksc5601gl.uf and u20ksc5601gl.ut don't exist in the moz tree,
and no file has #include "u20ksc5601gl.uf" or any reference to such
files.
Assignee
Comment 7•22 years ago
Roy,
Thank you for looking at my patch. I made a mistake
in my bug report. The filenames for the
two attachments (attachment 77133 [details] and attachment 77136 [details])
are not u20ksc5601.(ut|uf) but u20kscgl.ut and u20kscgl.uf.
They're included by nsUCvKoModule.cpp (intl/uconv/ucvko).
Could you look at them again and give them a review?
Comment 8•22 years ago
Adding just two chars in the map table shouldn't have
a huge difference in file size. However, I see
filename        trunk     patch
-------------------------------------------------
u20kscgl.uf     582kb     420kb
u20kscgl.ut     141kb     135kb
I never used the _fromu_ nor _umaptable_; but
the trunk file says it's generated by using fromu instead of
umaptable. Would you expound it for me?
Assignee
Comment 9•22 years ago
Thank you for taking a look at my patch.
> Adding just two chars in the map table shouldn't have
> a huge difference in file size. However, I see
> filename        trunk     patch
> -------------------------------------------------
> u20kscgl.uf     582kb     420kb
> u20kscgl.ut     141kb     135kb
> I never used the _fromu_ nor _umaptable_; but
> the trunk file says it's generated by using fromu instead of
> umaptable. Would you expound it for me?
I also noticed the huge difference in the size of the two
files when I first made them. I double-checked that the number
of characters being mapped is correct (that is, all the
characters in KS X 1001 are mapped).
Then I forgot about it, because with the new map files all the test cases
(http://jshin.net/i18n/koencodings.html) worked just as well as before, and
better in two respects: the Euro and Registered signs are rendered correctly,
and mail/news messages and HTML files saved in EUC-KR,
CP949 and Johab contain the two new characters not as
NCRs but as their code point values in the corresponding
encodings. I conducted these tests under both
Win2k and Linux.
Both fromu and umaptable were written by Frank, and he'd be able to
explain the details. Apparently, fromu used a less efficient algorithm than
umaptable (which has replaced fromu), and that's the cause
of the huge difference in file size. For example,
umaptable uses far fewer format 2 mappings than
fromu (0x5a vs 0x54B). Format 2 mapping is the most space-consuming
in terms of the size of the generated map file (note that
the bulk of the space is used by comments for human readers).
umaptable has been used without a problem to generate
other mapping tables (in ucvko, the CP949 mapping table
and the Johab Jamo table were generated with it).
I hope this is sufficient to convince you that
the new mapping tables work all right :-). Could you take
another look?
Thank you,
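To illustrate the NCR point above, here is a toy model of an encoder that falls back to numeric character references when a character is missing from its table. This is only a sketch under stated assumptions: encode_gr, OLD_TABLE and NEW_TABLE are hypothetical names, the tables are tiny excerpts, and nothing here is Mozilla's actual converter code.

```python
def encode_gr(text, uni_to_gl):
    """Encode text as EUC-KR-style bytes, using NCRs for unmapped characters.

    uni_to_gl maps Unicode code points to KS X 1001 GL values (hypothetical
    excerpt of a Unicode-to-GL mapping table).
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)                     # ASCII passes through
        elif cp in uni_to_gl:
            gr = uni_to_gl[cp] | 0x8080        # GL table value -> GR bytes
            out += gr.to_bytes(2, "big")
        else:
            out += f"&#{cp};".encode("ascii")  # numeric character reference
    return bytes(out)


# Tiny illustrative excerpts; the real tables cover all of KS X 1001.
OLD_TABLE = {0xAC00: 0x3021}
NEW_TABLE = {**OLD_TABLE, 0x20AC: 0x2266, 0x00AE: 0x2267}

print(encode_gr("\u20ac\u00ae", OLD_TABLE))  # NCR fallback
print(encode_gr("\u20ac\u00ae", NEW_TABLE))  # direct two-byte codes
```

With the old excerpt both signs come out as NCRs; with the two entries added they come out as the GR byte pairs 0xA2E6 and 0xA2E7.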
Assignee
Comment 10•22 years ago
Sorry for spamming. I forgot to add the URL for a short test case.
If this patch works and the Korean font used for rendering the page
has the Euro and Registered signs, they should
follow the 'Tel' sign. Under Linux, my patch to the ksc5601.1992-3.enc
file (which I submitted to XFree86) has to be applied and
MS Gulim/Batang/Dotum (as opposed to Baekmuk Gulim/Batang) has to
be used. Therefore, it's better to test this under MS-Windows.
Comment 11•22 years ago
Comment on attachment 77133 [details]
KS X 1001 GL to Unicode mapping
/r=yokoyama; thanks for your extensive research and time.
Attachment #77133 -
Flags: review+
Comment 12•22 years ago
Comment on attachment 77136 [details]
Unicode to KS X 1001 GL mapping table
/r=yokoyama; thanks for your extensive research and time.
Attachment #77136 -
Flags: review+
Comment 13•22 years ago
Comment on attachment 77133 [details]
KS X 1001 GL to Unicode mapping
sr=alecf
Attachment #77133 -
Flags: superreview+
Comment 14•22 years ago
Comment on attachment 77136 [details]
Unicode to KS X 1001 GL mapping table
oops, sorry I got distracted between reviewing the first one and reviewing the
2nd one :)
sr=alecf
Attachment #77136 -
Flags: superreview+
Comment 15•22 years ago
Comment on attachment 77133 [details]
KS X 1001 GL to Unicode mapping
a=asa (on behalf of drivers) for checkin to 1.1
Attachment #77133 -
Flags: approval+
Comment 16•22 years ago
Comment on attachment 77136 [details]
Unicode to KS X 1001 GL mapping table
a=asa (on behalf of drivers) for checkin to 1.1
Attachment #77136 -
Flags: approval+
Assignee
Comment 17•22 years ago
fix checked in to the trunk
thank you all !
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Whiteboard: done, waiting for review → done
Comment 19•22 years ago
verified the checkin.
Jungshik: do you have a test page so that I can verify the EURO SIGN and
REGISTERED SIGN?
Assignee
Comment 20•22 years ago
> Jungshik: do you have a test page so that I can verify the EURO SIGN and
> REGISTERED SIGN?
Yup. Please point the latest nightly at the URL given in the URL field
(http://jshin.net/i18n/euckr_newchars.html) :-)
As I mentioned earlier, setting up the necessary fonts with the two
new chars under Linux is tricky (I've done that, but you don't
have to bother), so you'd better check it out under Win2k.
Comment 21•22 years ago
ylong: can you verify? The code is verified. thanks
QA Contact: yokoyama → ylong
Comment 22•22 years ago
Verified the test page in comment #20 is displayed fine with euro and reg. signs
on 07-22 trunk build / Win2k-SC.
Status: RESOLVED → VERIFIED