Bug 125407
Opened 23 years ago
Closed 23 years ago
GBK converter: some GBK characters cannot be converted
Categories
(Core :: Internationalization, defect)
Tracking
(Status: VERIFIED, Resolution: WONTFIX, Target Milestone: Future)
People
(Reporter: masaki.katakai, Assigned: ftang)
Details
(Keywords: intl)
Attachments
(1 file: 1.44 KB, application/octet-stream)
The following GBK code points cannot be displayed properly.
It seems that the GBK converter cannot convert them to UTF-8.
0xA6EC
0xA6ED
0xA6F3
0xA6D9 - 0xA6DF
0xA989 - 0xA995
0xFE50 - 0xFEA0
% nsconv -f gbk -t utf-8 gbk.html > utf8.html
% nsconv -f utf-8 -t gbk utf8.html > gbk-.html
does not work.
% native2ascii -encoding GBK gbk.html gbk.html > gbk_u.html
native2ascii of Java seems to support those code points.
It seems that those are converted to the \uexxx area, but we
need to support them. What is your opinion?
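The round trip described above can be approximated without nsconv. The following is a minimal sketch, not Mozilla code: it expands the reported ranges into two-byte sequences and checks whether each one survives a decode followed by a re-encode. The names `REPORTED_RANGES`, `expand`, `round_trips`, and `failures` are illustrative, and Python's `gbk` codec is only a stand-in for the Mozilla converter, so its results may differ.

```python
# Ranges reported in this bug, as (first, last) two-byte GBK code values.
REPORTED_RANGES = [
    (0xA6EC, 0xA6EC), (0xA6ED, 0xA6ED), (0xA6F3, 0xA6F3),
    (0xA6D9, 0xA6DF), (0xA989, 0xA995), (0xFE50, 0xFEA0),
]


def expand(first, last):
    """Yield each code value in [first, last] as a 2-byte sequence.

    Assumes both ends share one lead byte. 0x7F is skipped because it is
    not a valid GBK trail byte.
    """
    for code in range(first, last + 1):
        if code & 0xFF == 0x7F:
            continue
        yield code.to_bytes(2, "big")


def round_trips(raw, encoding="gbk"):
    """Return True if raw decodes and re-encodes to the same bytes."""
    try:
        return raw.decode(encoding).encode(encoding) == raw
    except UnicodeError:
        return False


def failures(encoding="gbk"):
    """List the reported byte sequences that do not round-trip."""
    return [
        raw
        for first, last in REPORTED_RANGES
        for raw in expand(first, last)
        if not round_trips(raw, encoding)
    ]


if __name__ == "__main__":
    for raw in failures("gbk"):
        print("0x" + raw.hex().upper())
```

Running the script prints whichever of the reported code values the chosen codec cannot round-trip. Passing another codec name to `failures` (for example `"gb18030"`) allows a side-by-side comparison.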
Comment 1 (Reporter) • 23 years ago
Comment 2 (Reporter) • 23 years ago
I've put the files under http://users.goo.ne.jp/mkatakai/mozilla/bugs/gbk/
(but please ignore the banner codes...)
Comment 3 (Assignee) • 23 years ago
The following are on page 11 of GB18030:
0xA6EC
0xA6ED
0xA6F3
0xA6D9 - 0xA6DF
The following are on page 82 of GB18030, but they are control characters:
0xA989 - 0xA995
The following are on page 81 of GB18030:
0xFE50 - 0xFEA0
Assignee: yokoyama → ftang
Also, the following are displayed as "?":
0xA8BC
0xA8BF
0xA958
0xA95B
Of these, 0xA958 and 0xA95B have no glyphs.
Comment 5 (Assignee) • 23 years ago
I posted a mail to unicode.org; here is one answer I got:
Date: Thu, 14 Feb 2002 17:04:30 -0800
From: "Qingjiang (Brian) Yuan" <brian.yuan@sun.com>
Yung-Fong Tang wrote:
> I have an additional question about GB18030
>
> The following code points in GB18030 are mapped to the Private Use Area in
> Unicode but have a glyph in the GB18030 standard. What does that mean?
>
It means those characters/symbols are not in Unicode 3.0.
The following are the characters that are not in Unicode 3.0 according
to the CESI:
  GB18030  Unicode (Private Use Area)
> A8BC E7C7
> FE51 E816
> FE52 E817
> FE53 E818
> FE59 E81E
> FE61 E826
> FE66 E82B
> FE67 E82C
> FE6C E831
> FE6D E832
> FE76 E83B
> FE7E E843
> FE90 E854
> FE91 E855
> FEA0 E864
But it looks like there are more symbols that are not in Unicode 3.0.
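The table above can be written down as a lookup, which makes it easy to confirm that every target is a Private Use code point. This is a minimal sketch, with the values copied from the mail and the names `GB18030_TO_PUA` and `is_private_use` chosen here purely for illustration:

```python
import unicodedata

# GB18030 two-byte code -> Unicode code point, copied from the table above.
GB18030_TO_PUA = {
    0xA8BC: 0xE7C7,
    0xFE51: 0xE816,
    0xFE52: 0xE817,
    0xFE53: 0xE818,
    0xFE59: 0xE81E,
    0xFE61: 0xE826,
    0xFE66: 0xE82B,
    0xFE67: 0xE82C,
    0xFE6C: 0xE831,
    0xFE6D: 0xE832,
    0xFE76: 0xE83B,
    0xFE7E: 0xE843,
    0xFE90: 0xE854,
    0xFE91: 0xE855,
    0xFEA0: 0xE864,
}


def is_private_use(code_point):
    """True if the code point has general category Co (Private Use)."""
    return unicodedata.category(chr(code_point)) == "Co"


if __name__ == "__main__":
    for gb, uni in sorted(GB18030_TO_PUA.items()):
        print(f"{gb:04X} -> U+{uni:04X} private use: {is_private_use(uni)}")
```

A converter that emits these code points is producing valid Unicode, but whether a glyph appears then depends entirely on the font in use.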
Also
Date: Fri, 15 Feb 2002 17:23:51 -0800
From: Markus Scherer <markus.scherer@jtcsv.com>
Yung-Fong Tang wrote:
> By printing a glyph for those character in the GB18030, it really DEFINED what
> those characters should be in Unicode- which I think is not
It does not make it a character from the point of view of the Unicode standard,
but it amounts to an "agreement between sender and recipient" that if they
interpret the data in a GB 18030-related context, they treat this code point as
being assigned this character.
It's like agreeing to have Seuss characters there and printing something with a
code chart with the Seuss characters.
You may want to get a font that is designed for GB 18030 and see if it shows
that glyph for that code point.
Using such a font should be sufficient to enter into the above "agreement".
Similarly, when you are on a Windows or IBM or Apple machine and you display PUA
code points, you will want to display the glyphs/characters that Windows
respectively IBM or Apple assign there, at least with _some_ font(s).
markus
Status: NEW → ASSIGNED
Comment 6 (Assignee) • 23 years ago
For this bug, we need to clearly define which cases are bugs and which are
GB18030 specification issues.
Comment 7 (Assignee) • 23 years ago
Marking this bug as WONTFIX since the problem is in the GB18030 standard itself.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
Target Milestone: --- → Future
Comment 8 • 23 years ago
Marking as VERIFIED WONTFIX according to the comments above.
Status: RESOLVED → VERIFIED
Updated • 23 years ago
QA Contact: teruko → ylong