Closed Bug 426271 Opened 17 years ago Closed 17 years ago

Auto-detect of Japanese Character Encoding does not work

Categories

(Core :: General, defect, P1)

x86
All
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: masa141421356, Assigned: smontagu)

References

Details

(Keywords: intl, jp-critical, regression)

Attachments

(2 files, 3 obsolete files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9pre) Gecko/2008033105 Minefield/3.0pre Build Identifier: Auto-detect of Japanese Character Encoding does not work on Trunk. Reproducible: Always Steps to Reproduce: 1.Set preference as user_pref("intl.accept_languages", "ja,en-us,en"); user_pref("intl.charset.default", "Shift_JIS"); user_pref("intl.charset.detector", "ja_parallel_state_machine"); user_pref("intl.charsetmenu.browser.static", "Shift_JIS, EUC-JP, ISO-2022-JP, ISO-8859-1, UTF-8"); 2.Acccess http://space.geocities.jp/alice0775/STORE/EUC-JP.html (for EUC-JP) http://space.geocities.jp/alice0775/STORE/UTF-8.html (UTF-8) 3. Actual Results: Auto-detect is failed Expected Results: Auto-detect should success. I think this issue is jp-critical and blocking1.9.
Flags: blocking1.9?
Keywords: jp-critical
At this case, both of HTTP response header and META in html contains only "Content-Type: text/html", "charset" is not contained.
Regression window is : Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b5pre) Gecko/2008032512 : work fine Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b5pre) Gecko/2008032513 : NG
Confirmed with Fx trunk 2008033105(Win-XP SP2). No problem when Auto Detect/Universal, with all EUC-JP,UTF-8,Shift_JIS test pages. > View/Character Encoding/Auto Detect/Universal > (intl.charset.detector=universal_charset_detector)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: intl, regression
Assignee: nobody → smontagu
I can reproduce with recent trunk Linux. In the case of 2 URLs in comment 0 auto-detect doesn't work at all. Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9pre) Gecko/2008040120 Firefox 3.0pre
OS: Windows XP → All
Marking blocking, based on regression... need an automated test for this.
Blocks: 424916
Flags: blocking1.9? → blocking1.9+
I tried to back the patch out of bug 424916 and it seeme to have solved this problem. I think this is a regression of bug 424916.
Attached patch Patch (obsolete) — Splinter Review
So based on comment 3 I think the way to fix this is to get rid of the old CJK parallel state machine detectors in intl/chardet and just use the universal detector with a language filter. The universal detector has been much better maintained, and it will remove a lot of duplicated data. The patch is a bit large and scary, but most of it is just moving XPCOM stuff around. Since time is short, I'm requesting code review already while I work on testcases. The patch doesn't include the cvs removes: intl/chardet/src/Big5Statistics.h intl/chardet/src/EUCJPStatistics.h intl/chardet/src/EUCKRStatistics.h intl/chardet/src/EUCTWStatistics.h intl/chardet/src/GB2312Statistics.h intl/chardet/src/nsBIG5Verifier.h intl/chardet/src/nsCP1252Verifier.h intl/chardet/src/nsEUCJPVerifier.h intl/chardet/src/nsEUCKRVerifier.h intl/chardet/src/nsEUCTWVerifier.h intl/chardet/src/nsGB18030Verifier.h intl/chardet/src/nsGB2312Verifier.h intl/chardet/src/nsHZVerifier.h intl/chardet/src/nsISO2022CNVerifier.h intl/chardet/src/nsISO2022JPVerifier.h intl/chardet/src/nsISO2022KRVerifier.h intl/chardet/src/nsPSMDetectors.cpp intl/chardet/src/nsPSMDetectors.h intl/chardet/src/nsPkgInt.h intl/chardet/src/nsSJISVerifier.h intl/chardet/src/nsUCS2BEVerifier.h intl/chardet/src/nsUCS2LEVerifier.h intl/chardet/src/nsUTF8Verifier.h intl/chardet/src/nsVerifier.h
Attachment #313966 - Flags: superreview?(dbaron)
Attachment #313966 - Flags: review?(dbaron)
Attached patch Corrected some silly typos (obsolete) — Splinter Review
Attachment #313966 - Attachment is obsolete: true
Attachment #313968 - Flags: superreview?(dbaron)
Attachment #313968 - Flags: review?(dbaron)
Attachment #313966 - Flags: superreview?(dbaron)
Attachment #313966 - Flags: review?(dbaron)
Attached patch MochitestsSplinter Review
I knew there was another review I really meant to get to yesterday... sorry about missing this one. I'm having trouble understanding the filtering stuff. It seems when FILTER_JAPANESE, etc., is used, the universal detector will consider all Japanese *plus* all non-CJK encodings, when we really only want the Japanese ones (plus UTF-8), or something like that. When FILTER_NONE is used, it seems like we'll consider all encodings except for CJK, when we really want to consider all of them.
Er, never mind about the first problem -- I didn't read the nsUniversalDetector.cpp changes closely enough. But I still think the second problem exists, and it might help to fix it by replacing FILTER_NONE by FILTER_ALL (with at least one additional constant for FILTER_NON_CJK) -- and then you could always do Latin1 and UTF8, regardless of filter (which looks like what you intended).
Attachment #313968 - Flags: superreview?(dbaron)
Attachment #313968 - Flags: superreview-
Attachment #313968 - Flags: review?(dbaron)
Attachment #313968 - Flags: review-
Version: unspecified → Trunk
Attached patch Address David's comments (obsolete) — Splinter Review
Attachment #313968 - Attachment is obsolete: true
Attachment #314495 - Flags: superreview?(dbaron)
Attachment #314495 - Flags: review?(dbaron)
The right patch this time
Attachment #314495 - Attachment is obsolete: true
Attachment #314496 - Flags: superreview?(dbaron)
Attachment #314496 - Flags: review?(dbaron)
Attachment #314495 - Flags: superreview?(dbaron)
Attachment #314495 - Flags: review?(dbaron)
Comment on attachment 314496 [details] [diff] [review] Address David's comments r+sr=dbaron
Attachment #314496 - Flags: superreview?(dbaron)
Attachment #314496 - Flags: superreview+
Attachment #314496 - Flags: review?(dbaron)
Attachment #314496 - Flags: review+
Checked in.
Status: NEW → RESOLVED
Closed: 17 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Works fine. Thank you. -> Verified.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: