Closed Bug 134749 Opened 23 years ago Closed 22 years ago

two new chars to add to KS X 1001 based converters(EUC-KR,CP949,ISO-2022-KR)

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: jshin1987, Assigned: jshin1987)

Details

(Keywords: intl, Whiteboard: done)

Attachments

(2 files)

Two new characters, EURO SIGN and REGISTERED SIGN, were added to KS X 1001 in December 1998 (KS X 1001:1998), but Mozilla's encoding converters based on KS X 1001 (EUC-KR, CP949, ISO-2022-KR and JOHAB) don't support them yet. All of these converters depend on two files, u20ksc5601gl.uf and u20ksc5601gl.ut. Adding the two characters to these files (actually, adding them to the original Unicode mapping table and regenerating the files automatically) would solve this problem. I've already generated them. However, there's a problem with doing this: Korean TrueType fonts for MS-Windows have been updated to include these two characters, but most X11 BDF fonts (in ksc5601.1987-0) don't have them. With increasing use of TTF, this won't be an issue in the long run, but for the time being it could be problematic.
Keywords: intl
QA Contact: ruixu → ylong
The following note was manually added to the file automatically generated by umaptable and some Unix filters:

* Note added by Jungshik Shin <jshin@mailaps.org> (bug 134749)
 - More specifically, CP949.TXT at ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT was used with the following Unix filters (which remove CP949 extension of EUC-KR and convert EUC-KR code points to KS X 1001 GL code points).

   egrep -v '^#' CP949.TXT | \
   egrep '^0x(A[1-F]|[B-E][0-F]|F[0-E])(A[1-F]|[B-E][0-F]|F[0-E])' | \
   perl -pe \
   's/^0x([A-F][0-F][A-F][0-F])/"0x" . sprintf "%04X", hex($1)- 0x8080/ge' | \
   ./umaptable -ut

 - Difference between the previous version and this version is two new characters added: EURO SIGN (U+20AC) at row 2, column 70 (0x2266 in GL and 0xA2E6 in GR) and REGISTERED SIGN (U+00AE) at row 2, column 71 (0x2267 in GL and 0xA2E7 in GR). This change brings the mapping table up to the specification in KS X 1001:1998.
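For readers who do not want to decode the egrep/perl chain, the filtering step in the note can be approximated in Python. This is only a sketch: the function name euckr_to_gl_lines is hypothetical, it assumes CP949.TXT data lines have the form "0xXXXX<TAB>0xYYYY<TAB>#comment", it drops the trailing comment column, and it stops before the umaptable step, which is not reproduced here.

```python
import re

# A data line whose code is four hex digits with both bytes starting at A..F.
# The exact 0xA1..0xFE range check is done numerically below.
EUC_KR_LINE = re.compile(r"^0x([A-F][0-9A-F])([A-F][0-9A-F])\s+(0x[0-9A-Fa-f]+)")

def euckr_to_gl_lines(lines):
    """Yield 'GL<TAB>Unicode' for lines in the EUC-KR (KS X 1001) range."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment lines, like egrep -v '^#'
        m = EUC_KR_LINE.match(line)
        if not m:
            continue  # single-byte codes and CP949 extensions do not match
        lead, trail = int(m.group(1), 16), int(m.group(2), 16)
        if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
            continue
        # EUC-KR (GR) to GL: clear the high bit of both bytes.
        gl = ((lead << 8) | trail) - 0x8080
        yield f"0x{gl:04X}\t{m.group(3)}"

# Illustrative input in the assumed CP949.TXT layout.
sample = [
    "# comment line",
    "0x41\t0x0041\t#LATIN CAPITAL LETTER A",
    "0x8141\t0xAC02\t#CP949 extension",
    "0xA2E6\t0x20AC\t#EURO SIGN",
    "0xA2E7\t0x00AE\t#REGISTERED SIGN",
]
gl_lines = list(euckr_to_gl_lines(sample))
```

With the sample above, only the last two lines survive the filter, and their codes come out as 0x2266 and 0x2267, matching the GL values quoted in the note.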
* Note added by Jungshik Shin <jshin@mailaps.org> (bug 134749)
 - More specifically, CP949.TXT at ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT was used with the following Unix filters (which remove CP949 extension of EUC-KR and convert EUC-KR code points to KS X 1001 GL code points).

   egrep -v '^#' CP949.TXT | \
   egrep '^0x(A[1-F]|[B-E][0-F]|F[0-E])(A[1-F]|[B-E][0-F]|F[0-E])' | \
   perl -pe \
   's/^0x([A-F][0-F][A-F][0-F])/"0x" . sprintf "%04X", hex($1)- 0x8080/ge' | \
   ./umaptable -ut

 - Difference between the previous version and this version is two new characters added: EURO SIGN (U+20AC) at row 2, column 70 (0x2266 in GL and 0xA2E6 in GR) and REGISTERED SIGN (U+00AE) at row 2, column 71 (0x2267 in GL and 0xA2E7 in GR). This change brings the mapping table up to the specification in KS X 1001:1998.
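The row/column positions and the GL/GR codes quoted in the note are related by simple arithmetic, which a short Python sketch can confirm. The helper name ksx1001_codes is hypothetical; the only assumption is the usual 94x94 layout, where each GL byte is 0x20 plus the 1-based row or column and GR sets the high bit of both bytes.

```python
def ksx1001_codes(row, col):
    """Return (GL, GR) two-byte codes for a 1-based KS X 1001 row/column."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in 1..94")
    gl = ((0x20 + row) << 8) | (0x20 + col)  # 7-bit (GL) form
    gr = gl + 0x8080                         # 8-bit (GR) form, as in EUC-KR
    return gl, gr

for name, row, col in [("EURO SIGN", 2, 70), ("REGISTERED SIGN", 2, 71)]:
    gl, gr = ksx1001_codes(row, col)
    print(f"{name}: GL 0x{gl:04X}, GR 0x{gr:04X}")
# EURO SIGN: GL 0x2266, GR 0xA2E6
# REGISTERED SIGN: GL 0x2267, GR 0xA2E7
```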
Target Milestone: --- → mozilla1.0.1
FYI, we will support these characters in our future Solaris releases; patches might be provided upon request. BTW, it would be great if this could be included in Mozilla 1.0 or in the next Netscape major release.
Confirming based on comments.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Status: NEW → ASSIGNED
Roy and Frank, Could you review this? This is a trivial bug with a simple patch.
Whiteboard: done, waiting for review
jshin: Sorry, but u20ksc5601gl.uf and u20ksc5601gl.ut don't exist in the moz tree, and no file has #include "u20ksc5601gl.uf" or any reference to such files.
Roy, thank you for looking at my patch. I made a mistake in my bug report. The filenames for the two attachments (attachment 77133 and attachment 77136) are not u20ksc5601gl.(ut|uf) but u20kscgl.ut and u20kscgl.uf. They're included by nsUCvKoModule.cpp (intl/uconv/ucvko). Could you look at them again and give them a review?
Adding just two chars in the map table shouldn't have a huge difference in file size. However, I see

filename      trunk   patch
-------------------------------------------------
u20kscgl.uf   582kb   420kb
u20kscgl.ut   141kb   135kb

I never used the _fromu_ nor _umaptable_; but the trunk file says it's generated by using fromu instead of umaptable. Would you expound it for me?
Thank you for taking a look at my patch.

> Adding just two chars in the map table shouldn't have
> a huge difference in file size. However, I see
>
> filename      trunk   patch
> -------------------------------------------------
> u20kscgl.uf   582kb   420kb
> u20kscgl.ut   141kb   135kb
>
> I never used the _fromu_ nor _umaptable_; but
> the trunk file says it's generated by using fromu instead of
> umaptable. Would you expound it for me?

I also noticed the huge difference in the size of the two files when I first made them. I double-checked that the number of characters being mapped is correct (that is, all the characters in KS X 1001 are mapped). Then I forgot about it, because with the new map files all the test cases worked just as well (http://jshin.net/i18n/koencodings.html), and better in that the Euro and Registered signs are rendered correctly and mail/news messages and HTML files saved in EUC-KR, CP949 and Johab have the two new characters not as NCRs but as their code point values in the corresponding encodings. I conducted these tests under both Win2k and Linux.

Both fromu and umaptable were written by Frank, and he'd be able to explain the details. Apparently, fromu used a less efficient algorithm than umaptable (which has replaced fromu), and that's the cause of the huge difference in file size. For example, umaptable uses far fewer format 2 mappings than fromu (0x5a vs 0x54B). Format 2 mapping is the most space-consuming in terms of the size of the generated map file (note that the bulk of the space is used by comments for human readers). umaptable has been used without a problem to generate other mapping tables (in ucvko, the CP949 mapping table and the Johab Jamo table were generated with it).

I hope this is sufficient to convince you that the new mapping tables work all right :-). Could you take another look?

Thank you,
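The "NCRs versus code point values" difference described in the reply can be illustrated with a toy table-driven encoder in Python. This is not Mozilla's converter code: encode_with_ncr and NEW_CHARS are hypothetical names, the table holds only the two GR codes given in this bug, and the fallback to a numeric character reference is an assumption about how an unmapped character ends up in a saved file.

```python
# Unicode -> EUC-KR (GR) codes for the two new characters, as given in the bug.
NEW_CHARS = {"\u20ac": 0xA2E6, "\u00ae": 0xA2E7}

def encode_with_ncr(text, table):
    """Encode text with a toy table; unmapped non-ASCII becomes an NCR."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ord(ch))                     # ASCII passes through
        elif ch in table:
            out += table[ch].to_bytes(2, "big")     # two-byte code from the table
        else:
            out += f"&#{ord(ch)};".encode("ascii")  # numeric character reference
    return bytes(out)

text = "5\u20ac \u00ae"
before = encode_with_ncr(text, {})         # table without the new characters
after = encode_with_ncr(text, NEW_CHARS)   # table with the new characters
```

Here before is b"5&#8364; &#174;" while after is b"5\xa2\xe6 \xa2\xe7", which is the behavior change the test pages are meant to show.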
Sorry for spamming. I forgot to add the URL for a short test case. If this patch works and the Korean font used for rendering the page has the Euro and Registered signs, they should follow the 'Tel' sign. Under Linux, my patch to the ksc5601.1992-3.enc file (which I submitted to XFree86) has to be applied and MS Gulim/Batang/Dotum (as opposed to Baekmuk Gulim/Batang) has to be used. Therefore, it's better to test this under MS-Windows.
Comment on attachment 77133 (KS X 1001 GL to Unicode mapping): r=yokoyama; thanks for your extensive research and time.
Attachment #77133 - Flags: review+
Comment on attachment 77136 (Unicode to KS X 1001 GL mapping table): r=yokoyama; thanks for your extensive research and time.
Attachment #77136 - Flags: review+
Comment on attachment 77133 [details] KS X 1001 GL to Unicode mapping sr=alecf
Attachment #77133 - Flags: superreview+
Comment on attachment 77136 [details] Unicode to KS X 1001 GL mapping table oops, sorry I got distracted between reviewing the first one and reviewing the 2nd one :) sr=alecf
Attachment #77136 - Flags: superreview+
Comment on attachment 77133 [details] KS X 1001 GL to Unicode mapping a=asa (on behalf of drivers) for checkin to 1.1
Attachment #77133 - Flags: approval+
Comment on attachment 77136 [details] Unicode to KS X 1001 GL mapping table a=asa (on behalf of drivers) for checkin to 1.1
Attachment #77136 - Flags: approval+
Fix checked in to the trunk. Thank you all!
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Whiteboard: done, waiting for review → done
changing qa contact to myself
QA Contact: ylong → yokoyama
verified the checkin. Jungshik: do you have a test page so that I can verify the EURO SIGN and REGISTERED SIGN?
> Jungshik: do you have a test page so that I can verify the EURO SIGN and
> REGISTERED SIGN?

Yup. Please point the latest nightly at the URL given in the URL field (http://jshin.net/i18n/euckr_newchars.html) :-) As I mentioned earlier, setting up the necessary fonts with the two new chars under Linux is tricky (I've done that, but you don't have to bother), so you'd better check it out under Win2k.
ylong: can you verify? The code is verified. thanks
QA Contact: yokoyama → ylong
Verified the test page in comment #20 is displayed fine with euro and reg. signs on 07-22 trunk build / Win2k-SC.
Status: RESOLVED → VERIFIED