Closed Bug 172393 Opened 22 years ago Closed 21 years ago

Universal charset detector should accept JIS C 6200 in iso-2022-jp

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jgmyers, Assigned: shanjian)

Details

(Keywords: intl)

Attachments

(2 files)

The universal charset detector fails to recognize the attached test case. The reason it fails is that the test case contains the escape sequence ESC ( I, which selects JIS C 6200-1969, apparently a single-width Katakana set and definitely not permitted by the ISO-2022-JP standard. The Mozilla ISO-2022-JP converter, however, appears to support converting it to Unicode. It would make sense to add support for this escape sequence to the ISO-2022-JP detector table. It should at least keep from going into the error state when it gets this sequence.
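The symptom can be reproduced outside Mozilla with a small sketch that uses Python's bundled codecs as a stand-in. This is an assumption for illustration only: `iso2022_jp` and `iso2022_jp_ext` are Python codec names, not Mozilla components, and the helper `decodes` and the sample bytes are hypothetical, not the attached test case. The strict codec rejects ESC ( I, while the extended one accepts it as half-width katakana.

```python
# Illustration only: Python's codecs stand in for a strict and a lenient
# ISO-2022-JP implementation. This is not the Mozilla detector or converter.

# ESC ( I switches to half-width katakana; ESC ( B switches back to ASCII.
SAMPLE = b"\x1b(I\x31\x32\x1b(B"


def decodes(data: bytes, codec: str) -> bool:
    """Return True if `data` decodes cleanly with `codec` in strict mode."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for name in ("iso2022_jp", "iso2022_jp_ext"):
        print(name, decodes(SAMPLE, name))
```

A detector whose table matches the strict codec fails on such input in the same way, which is the behaviour reported here.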
Attached file test case
Keywords: intl
QA Contact: ruixu → ylong
Attachment #101549 - Attachment mime type: text/plain → text/html
Keywords: nsbeta1
Note: only the Universal detector fails on this test case; auto-detect Japanese and auto-detect East Asian both detect it as iso-2022-jp.
I have not heard of "JIS C 6200-1969", but what I see in the test case is so-called "half-width katakana", which is usually referred to as JIS X 0201 or JIS X 0201-1997. A detector designed to detect Japanese should always support this ESC sequence.
Auto-detect Japanese works because all other candidates have already been eliminated by the time it hits this ESC sequence. I will update both the universal and PSM detectors for iso-2022-jp.
Status: NEW → ASSIGNED
Attached patch patch
Our existing code only handles strict iso-2022-jp, which includes these escape sequences:
  ASCII: ESC ( B
  JIS-Roman: ESC ( J
  JIS C 6226-1978: ESC $ @
  JIS X 0208-1983: ESC $ B
There are several iso-2022-jp variations, including:
  iso-2022-jp-1, which adds JIS X 0212-1990: ESC $ ( D
  iso-2022-jp-2, which contains gb2312 and ksx.
  JIS, which adds:
    half-width katakana: ESC ( I
    JIS X 0212-1990: ESC $ ( D
    JIS X 0208-1990: ESC & @ ESC $ B
    JIS X 0208:1997: ESC & @ ESC $ B
Our iso-2022-jp converter handles JIS except for JIS X 0208-1990 and JIS X 0208:1997. I think it makes sense to make the detector consistent with the converter, so I made this patch. The real change is in geniso2022jp.pl; all other files are either generated/populated or just license updates.
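The difference between the strict table and the converter-consistent one can be sketched as follows. This is a minimal illustration, not the state machine that geniso2022jp.pl generates: the names `STRICT`, `EXTENDED` and `find_unknown_escapes` are hypothetical, and the tables contain only the sequences listed in the comment above (the ESC & @ forms are left out because the converter does not handle them).

```python
ESC = 0x1B

# Bytes that follow ESC for each designation, as listed in the comment above.
STRICT = {
    b"(B": "ASCII",
    b"(J": "JIS-Roman",
    b"$@": "JIS C 6226-1978",
    b"$B": "JIS X 0208-1983",
}

# Hypothetical "converter-consistent" table: strict plus the two additions.
EXTENDED = dict(STRICT)
EXTENDED.update({
    b"(I": "half-width katakana",
    b"$(D": "JIS X 0212-1990",
})


def find_unknown_escapes(data: bytes, table: dict) -> list:
    """Return the offsets of ESC bytes not followed by a sequence in `table`."""
    unknown = []
    i = 0
    while i < len(data):
        if data[i] != ESC:
            i += 1
            continue
        for seq in table:
            if data.startswith(seq, i + 1):
                i += 1 + len(seq)  # skip ESC and the designation bytes
                break
        else:
            unknown.append(i)  # a strict detector would enter its error state here
            i += 1
    return unknown
```

With input that mixes JIS X 0208 text and an ESC ( I run, the strict table reports one unknown escape and the extended table reports none, which mirrors the behaviour change this patch is meant to make.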
ftang, could you r=?
Attachment #101580 - Flags: review+
Comment on attachment 101580 (patch): r=ftang. Looks right to me.
and all other files are generated by the .pl changes.
chris, could you sr=?
Attachment #101580 - Flags: superreview?(blizzard)
i18n triage team: nsbeta1-
Keywords: nsbeta1 → nsbeta1-
Comment on attachment 101580 (patch): sr=roc+moz
Attachment #101580 - Flags: superreview?(blizzard) → superreview+
Checking in intl/chardet/src/nsISO2022JPVerifier.h;
/cvsroot/mozilla/intl/chardet/src/nsISO2022JPVerifier.h,v <-- nsISO2022JPVerifier.h
new revision: 1.9; previous revision: 1.8
done
Checking in intl/chardet/tools/geniso2022jp.pl;
/cvsroot/mozilla/intl/chardet/tools/geniso2022jp.pl,v <-- geniso2022jp.pl
new revision: 1.4; previous revision: 1.3
done
Checking in intl/chardet/tools/genverifier.pm;
/cvsroot/mozilla/intl/chardet/tools/genverifier.pm,v <-- genverifier.pm
new revision: 1.4; previous revision: 1.3
done
Checking in extensions/universalchardet/src/nsEscSM.cpp;
/cvsroot/mozilla/extensions/universalchardet/src/nsEscSM.cpp,v <-- nsEscSM.cpp
new revision: 1.6; previous revision: 1.5
done
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED