Closed Bug 125407 Opened 23 years ago Closed 23 years ago

GBK converter: some GBK characters cannot be converted

Categories

(Core :: Internationalization, defect)

Sun
SunOS
defect
Not set
normal

Tracking

()

VERIFIED WONTFIX
Future

People

(Reporter: masaki.katakai, Assigned: ftang)

Details

(Keywords: intl)

Attachments

(1 file)

The following GBK code points cannot be displayed properly. It seems that the GBK converter cannot convert them to UTF-8.

    0xA6EC 0xA6ED 0xA6F3
    0xA6D9 - 0xA6DF
    0xA989 - 0xA995
    0xFE50 - 0xFEA0

    % nsconv -f gbk -t utf-8 gbk.html > utf8.html
    % nsconv -f utf-8 -t gbk utf8.html > gbk-.html

does not work.

    % native2ascii -encoding GBK gbk.html gbk.html > gbk_u.html

Java's native2ascii seems to support those code points. It seems that they are converted to the \uexxx area, but we need to support them. What is your opinion?
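For anyone who wants to reproduce the check outside Mozilla, here is a minimal sketch that probes the reported code points with Python's built-in `gbk` and `gb18030` codecs. This is an assumption for illustration only: Python's codecs are not Mozilla's converter and are not nsconv, so the results may differ from what the browser does. The names `REPORTED`, `probe`, and `is_private_use` are hypothetical helpers.

```python
# Sketch: check which of the reported two-byte codes a decoder accepts.
# Assumption: Python's built-in codecs stand in for the converter under test;
# they are NOT the Mozilla GBK converter discussed in this bug.

# Code points listed in the report, with ranges expanded.
REPORTED = [0xA6EC, 0xA6ED, 0xA6F3]
REPORTED += range(0xA6D9, 0xA6DF + 1)
REPORTED += range(0xA989, 0xA995 + 1)
REPORTED += range(0xFE50, 0xFEA0 + 1)


def probe(code, encoding="gbk"):
    """Decode one two-byte code; return the string, or None if decoding fails."""
    raw = code.to_bytes(2, "big")
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


def is_private_use(text):
    """True if text is a single character in the BMP Private Use Area."""
    return len(text) == 1 and 0xE000 <= ord(text) <= 0xF8FF


if __name__ == "__main__":
    for code in REPORTED:
        for encoding in ("gbk", "gb18030"):
            result = probe(code, encoding)
            if result is None:
                status = "not convertible"
            elif is_private_use(result):
                status = "U+%04X (Private Use Area)" % ord(result)
            else:
                status = "U+%04X" % ord(result)
            print("0x%04X  %-8s %s" % (code, encoding, status))
```

Running the script prints one line per code point and codec, which makes it easy to separate codes that fail outright from codes that land in the \uexxx area.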
Attached file test cases - gbk.tgz
I've put the files under http://users.goo.ne.jp/mkatakai/mozilla/bugs/gbk/ (but please ignore the banner codes...)
Keywords: intl
QA Contact: ruixu → teruko
The following are on page 11 of GB18030:

    0xA6EC 0xA6ED 0xA6F3 0xA6D9 - 0xA6DF

The following are on page 82 of GB18030, but they are control characters:

    0xA989 - 0xA995

The following are on page 81 of GB18030:

    0xFE50 - 0xFEA0
Assignee: yokoyama → ftang
Also, 0xA8BC, 0xA8BF, 0xA958, and 0xA95B are displayed as "?". Of these, 0xA958 and 0xA95B have no glyphs.
I posted a mail to unicode.org, and here is one answer I got:

Date: Thu, 14 Feb 2002 17:04:30 -0800
From: "Qingjiang (Brian) Yuan" <brian.yuan@sun.com>

Yung-Fong Tang wrote:
> I have additional question about GB18030
>
> the following code point in GB18030 are map to Private User Area in
> Unicode but have a glyph in the GB18030 standard. What does that mean ?
>

It means those characters/symbols are not in Unicode 3.0. The following are the characters that are not in Unicode 3.0 according to the CESI:

GB18030  Unicode (Private Use Area)
> A8BC   E7C7
> FE51   E816
> FE52   E817
> FE53   E818
> FE59   E81E
> FE61   E826
> FE66   E82B
> FE67   E82C
> FE6C   E831
> FE6D   E832
> FE76   E83B
> FE7E   E843
> FE90   E854
> FE91   E855
> FEA0   E864

But looks like there are more symbols that are not in Unicode 3.0.

Also:

Date: Fri, 15 Feb 2002 17:23:51 -0800
From: Markus Scherer <markus.scherer@jtcsv.com>

Yung-Fong Tang wrote:
> By printing a glyph for those character in the GB18030, it really DEFINED what those characters should be in Unicode- which I think is not

It does not make it a character from the point of view of the Unicode standard, but it amounts to an "agreement between sender and recipient" that if they interpret the data in a GB 18030-related context, they treat this code point as being assigned this character. It's like agreeing to have Seuss characters there and printing something with a code chart with the Seuss characters.

You may want to get a font that is designed for GB 18030 and see if it shows that glyph for that code point. Using such a font should be sufficient to enter into the above "agreement".

Similarly, when you are on a Windows or IBM or Apple machine and you display PUA code points, you will want to display the glyphs/characters that Windows respectively IBM or Apple assign there, at least with _some_ font(s).

markus
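To make the table in the first reply easier to check, here is a small sketch that records the quoted GB18030-to-Unicode pairs and confirms that every target falls in the Private Use Area. The pairs are copied from the mail above; the names `PUA_MAPPINGS` and `is_private_use` are hypothetical, and the table is only the partial list quoted there, not a complete mapping.

```python
import unicodedata

# GB18030 two-byte code -> Unicode code point, as quoted in the mail above.
# Assumption: this is only the partial list from that mail, not a full table.
PUA_MAPPINGS = {
    0xA8BC: 0xE7C7,
    0xFE51: 0xE816,
    0xFE52: 0xE817,
    0xFE53: 0xE818,
    0xFE59: 0xE81E,
    0xFE61: 0xE826,
    0xFE66: 0xE82B,
    0xFE67: 0xE82C,
    0xFE6C: 0xE831,
    0xFE6D: 0xE832,
    0xFE76: 0xE83B,
    0xFE7E: 0xE843,
    0xFE90: 0xE854,
    0xFE91: 0xE855,
    0xFEA0: 0xE864,
}


def is_private_use(code_point):
    """True if the code point has Unicode general category Co (private use)."""
    return unicodedata.category(chr(code_point)) == "Co"


if __name__ == "__main__":
    for gb_code, code_point in sorted(PUA_MAPPINGS.items()):
        label = "private use" if is_private_use(code_point) else "assigned"
        print("0x%04X -> U+%04X (%s)" % (gb_code, code_point, label))
```

Because private use code points carry no standard meaning, a font designed for GB 18030 is still needed to show the intended glyphs, as the second reply explains.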
Status: NEW → ASSIGNED
For this bug, we need to clearly define which part is a bug in our converter and which part is a GB18030 specification issue.
Marking this bug as WONTFIX since the problem is in the GB18030 standard itself.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
Target Milestone: --- → Future
Marking as VERIFIED WONTFIX according to the comments above.
Status: RESOLVED → VERIFIED
QA Contact: teruko → ylong