Closed
Bug 172393
Opened 22 years ago
Closed 21 years ago
Universal charset detector should accept JIS C 6200 in iso-2022-jp
Categories
(Core :: Internationalization, defect)
RESOLVED
FIXED
People
(Reporter: jgmyers, Assigned: shanjian)
Details
(Keywords: intl)
Attachments
(2 files)
165 bytes, text/html
9.16 KB, patch (ftang: review+, roc: superreview+)
The universal charset detector fails to recognize the attached test case. The
reason it fails is that the test case contains the escape sequence ESC ( I,
which selects JIS C 6200-1969, apparently a single-width Katakana set and
definitely not permitted by the ISO-2022-JP standard. The Mozilla ISO-2022-JP
converter, however, appears to support converting it to Unicode.
It would make sense to add support for this escape sequence to the ISO-2022-JP
detector table. It should at least keep from going into the error state when it
gets this sequence.
Reporter
Comment 1•22 years ago
Attachment #101549 - Attachment mime type: text/plain → text/html
Comment 2•22 years ago
Note: only the Universal detector fails on this test case; the Japanese and
East Asian auto-detectors detect it as iso-2022-jp.
Comment 3•22 years ago
I have not heard of "JIS C 6200-1969", but the test case contains
so-called "half-width katakana", which is usually referred to as
JIS X 0201 or JIS X 0201-1997. A detector designed to detect
Japanese should always support this ESC sequence.
Assignee
Comment 4•22 years ago
auto-japanese works because all other candidates are eliminated first, when they
hit this ESC. I will update both the universal and PSM detectors for iso-2022-jp.
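A minimal sketch of the elimination behaviour described in this comment, not the actual PSM detector. It assumes the Shift_JIS and EUC-JP verifiers reject the ESC byte outright; the function names, the candidate table, and the simplified verifiers are all hypothetical.

```python
from typing import Callable, Dict, Optional

ESC = 0x1B
# Escape sequences accepted by a strict iso-2022-jp verifier.
STRICT_ESCAPES = (b"\x1b(B", b"\x1b(J", b"\x1b$@", b"\x1b$B")


def eight_bit_family_error(data: bytes) -> Optional[int]:
    """Hypothetical stand-in for Shift_JIS / EUC-JP: error at the first ESC."""
    pos = data.find(b"\x1b")
    return pos if pos >= 0 else None


def strict_iso2022jp_error(data: bytes) -> Optional[int]:
    """Offset at which a strict iso-2022-jp verifier would fail, or None."""
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            if data[i:i + 3] in STRICT_ESCAPES:
                i += 3
                continue
            # Report the error at the last byte of the rejected sequence.
            return min(i + 2, len(data) - 1)
        if b >= 0x80:
            return i  # iso-2022-jp is a 7-bit encoding
        i += 1
    return None


# Assumed candidate set for a Japanese-only detector.
CANDIDATES: Dict[str, Callable[[bytes], Optional[int]]] = {
    "Shift_JIS": eight_bit_family_error,
    "EUC-JP": eight_bit_family_error,
    "ISO-2022-JP": strict_iso2022jp_error,
}


def detect(data: bytes, candidates=CANDIDATES) -> Optional[str]:
    """Drop candidates as they fail; report the sole survivor as soon as one is left."""
    errors = {name: check(data) for name, check in candidates.items()}
    alive = set(candidates)
    for pos in range(len(data)):
        alive -= {name for name in alive if errors[name] == pos}
        if len(alive) == 1:
            return next(iter(alive))
        if not alive:
            return None
    return None
```

With these candidates, `detect(b"abc\x1b(I1\x1b(B")` settles on ISO-2022-JP at the ESC byte, before the strict verifier reaches the unsupported final byte `I`.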
Status: NEW → ASSIGNED
Assignee
Comment 5•22 years ago
Assignee
Comment 6•22 years ago
Our existing code handles only strict iso-2022-jp encoding, which includes the
escape sequences:
  ASCII:            ESC ( B
  JIS-ROMAN:        ESC ( J
  JIS C 6226-1978:  ESC $ @
  JIS X 0208-1983:  ESC $ B
There are several iso-2022-jp variations, including:
  iso-2022-jp-1, which adds JIS X 0212-1990, ESC $ ( D
  iso-2022-jp-2, which contains gb2312 and ksx.
  JIS, which adds:
    half-width katakana, ESC ( I
    JIS X 0212-1990, ESC $ ( D
    JIS X 0208-1990, ESC & @ ESC $ B
    JIS X 0208:1997, ESC & @ ESC $ B
Our iso-2022-jp converter handles JIS except for JIS X 0208-1990 and
JIS X 0208:1997. I think it makes sense to make the detector consistent with
the converter, so I made this patch.
The real change is in geniso2022jp.pl; all other files are either
generated/populated or just license updates.
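The escape sequences listed above can be sketched as lookup tables. This is an illustrative Python sketch, not the state table generated by geniso2022jp.pl; the names `STRICT`, `EXTENDED`, and `find_unknown_escape` are hypothetical, and `EXTENDED` reflects one reading of this comment (the strict set plus ESC ( I and ESC $ ( D).

```python
from typing import Dict, Optional

# Strict iso-2022-jp designations.
STRICT: Dict[bytes, str] = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS-Roman",
    b"\x1b$@": "JIS C 6226-1978",
    b"\x1b$B": "JIS X 0208-1983",
}

# Assumed extended set: strict plus the two sequences the converter also handles.
EXTENDED: Dict[bytes, str] = {
    **STRICT,
    b"\x1b(I": "half-width katakana",
    b"\x1b$(D": "JIS X 0212-1990",
}


def find_unknown_escape(data: bytes, table: Dict[bytes, str]) -> Optional[int]:
    """Return the offset of the first ESC that starts no known sequence, or None."""
    i = 0
    while True:
        i = data.find(b"\x1b", i)
        if i < 0:
            return None
        match = next((seq for seq in table if data.startswith(seq, i)), None)
        if match is None:
            return i
        i += len(match)


# JIS X 0208 text, then half-width katakana, then back to ASCII.
sample = b"\x1b$B$3\x1b(I1\x1b(B"
print(find_unknown_escape(sample, STRICT))    # offset of ESC ( I
print(find_unknown_escape(sample, EXTENDED))  # None: every escape is recognised
```

A detector built on the strict table stops at ESC ( I, which is the failure this bug reports; with the extended table the same input passes.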
Assignee
Comment 7•22 years ago
ftang, could you r=?
Updated•22 years ago
Attachment #101580 - Flags: review+
Comment 8•22 years ago
Comment on attachment 101580
patch
r=ftang
Looks right to me.
Comment 9•22 years ago
and all other files are generated by the .pl changes.
Assignee
Comment 10•22 years ago
chris, could you sr=?
Assignee
Updated•22 years ago
Attachment #101580 - Flags: superreview?(blizzard)
Comment on attachment 101580
patch
sr=roc+moz
Attachment #101580 - Flags: superreview?(blizzard) → superreview+
Reporter
Comment 13•21 years ago
Checking in intl/chardet/src/nsISO2022JPVerifier.h;
/cvsroot/mozilla/intl/chardet/src/nsISO2022JPVerifier.h,v <--
nsISO2022JPVerifier.h
new revision: 1.9; previous revision: 1.8
done
Checking in intl/chardet/tools/geniso2022jp.pl;
/cvsroot/mozilla/intl/chardet/tools/geniso2022jp.pl,v <-- geniso2022jp.pl
new revision: 1.4; previous revision: 1.3
done
Checking in intl/chardet/tools/genverifier.pm;
/cvsroot/mozilla/intl/chardet/tools/genverifier.pm,v <-- genverifier.pm
new revision: 1.4; previous revision: 1.3
done
Checking in extensions/universalchardet/src/nsEscSM.cpp;
/cvsroot/mozilla/extensions/universalchardet/src/nsEscSM.cpp,v <-- nsEscSM.cpp
new revision: 1.6; previous revision: 1.5
done
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED