Closed Bug 426271 Opened 13 years ago Closed 13 years ago

Auto-detect of Japanese Character Encoding does not work

Categories

(Core :: General, defect, P1)

x86
All
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: masa141421356, Assigned: smontagu)

References

Details

(Keywords: intl, jp-critical, regression)

Attachments

(2 files, 3 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9pre) Gecko/2008033105 Minefield/3.0pre
Build Identifier: 

Auto-detect of Japanese Character Encoding does not work on Trunk.


Reproducible: Always

Steps to Reproduce:
1.Set preference as
user_pref("intl.accept_languages", "ja,en-us,en");
user_pref("intl.charset.default", "Shift_JIS");
user_pref("intl.charset.detector", "ja_parallel_state_machine");
user_pref("intl.charsetmenu.browser.static", "Shift_JIS, EUC-JP, ISO-2022-JP,
ISO-8859-1, UTF-8");

2.Acccess
 http://space.geocities.jp/alice0775/STORE/EUC-JP.html   (for EUC-JP)
 http://space.geocities.jp/alice0775/STORE/UTF-8.html    (UTF-8)
3.
Actual Results:  
Auto-detect is failed

Expected Results:  
Auto-detect should success.

I think this issue is jp-critical and blocking1.9.
Flags: blocking1.9?
Keywords: jp-critical
At this case, both of HTTP response header and META in html contains only
"Content-Type: text/html", "charset" is not contained.
Regression window is :
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b5pre) Gecko/2008032512  :
work fine
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b5pre) Gecko/2008032513  : NG
Confirmed with Fx trunk 2008033105(Win-XP SP2).
No problem when Auto Detect/Universal, with all EUC-JP,UTF-8,Shift_JIS test pages.
> View/Character Encoding/Auto Detect/Universal
> (intl.charset.detector=universal_charset_detector)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: intl, regression
Assignee: nobody → smontagu
I can reproduce with recent trunk Linux.
In the case of 2 URLs in comment 0 auto-detect doesn't work at all.

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9pre) Gecko/2008040120 Firefox 3.0pre
OS: Windows XP → All
Marking blocking, based on regression... need an automated test for this.
Blocks: 424916
Flags: blocking1.9? → blocking1.9+
I tried to back the patch out of bug 424916
and it seeme to have solved this problem.
I think this is a regression of bug 424916.
Attached patch Patch (obsolete) — Splinter Review
So based on comment 3 I think the way to fix this is to get rid of the old CJK parallel state machine detectors in intl/chardet and just use the universal detector with a language filter. The universal detector has been much better maintained, and it will remove a lot of duplicated data.

The patch is a bit large and scary, but most of it is just moving XPCOM stuff around. Since time is short, I'm requesting code review already while I work on testcases.

The patch doesn't include the cvs removes:
intl/chardet/src/Big5Statistics.h 
intl/chardet/src/EUCJPStatistics.h 
intl/chardet/src/EUCKRStatistics.h 
intl/chardet/src/EUCTWStatistics.h 
intl/chardet/src/GB2312Statistics.h 
intl/chardet/src/nsBIG5Verifier.h 
intl/chardet/src/nsCP1252Verifier.h 
intl/chardet/src/nsEUCJPVerifier.h 
intl/chardet/src/nsEUCKRVerifier.h 
intl/chardet/src/nsEUCTWVerifier.h 
intl/chardet/src/nsGB18030Verifier.h 
intl/chardet/src/nsGB2312Verifier.h 
intl/chardet/src/nsHZVerifier.h 
intl/chardet/src/nsISO2022CNVerifier.h 
intl/chardet/src/nsISO2022JPVerifier.h 
intl/chardet/src/nsISO2022KRVerifier.h 
intl/chardet/src/nsPSMDetectors.cpp 
intl/chardet/src/nsPSMDetectors.h 
intl/chardet/src/nsPkgInt.h 
intl/chardet/src/nsSJISVerifier.h 
intl/chardet/src/nsUCS2BEVerifier.h 
intl/chardet/src/nsUCS2LEVerifier.h 
intl/chardet/src/nsUTF8Verifier.h 
intl/chardet/src/nsVerifier.h
Attachment #313966 - Flags: superreview?(dbaron)
Attachment #313966 - Flags: review?(dbaron)
Attached patch Corrected some silly typos (obsolete) — Splinter Review
Attachment #313966 - Attachment is obsolete: true
Attachment #313968 - Flags: superreview?(dbaron)
Attachment #313968 - Flags: review?(dbaron)
Attachment #313966 - Flags: superreview?(dbaron)
Attachment #313966 - Flags: review?(dbaron)
Attached patch MochitestsSplinter Review
I knew there was another review I really meant to get to yesterday... sorry about missing this one.

I'm having trouble understanding the filtering stuff.  It seems when FILTER_JAPANESE, etc., is used, the universal detector will consider all Japanese *plus* all non-CJK encodings, when we really only want the Japanese ones (plus UTF-8), or something like that.  When FILTER_NONE is used, it seems like we'll consider all encodings except for CJK, when we really want to consider all of them.
Er, never mind about the first problem -- I didn't read the nsUniversalDetector.cpp changes closely enough.  But I still think the second problem exists, and it might help to fix it by replacing FILTER_NONE by FILTER_ALL (with at least one additional constant for FILTER_NON_CJK) -- and then you could always do Latin1 and UTF8, regardless of filter (which looks like what you intended).
Attachment #313968 - Flags: superreview?(dbaron)
Attachment #313968 - Flags: superreview-
Attachment #313968 - Flags: review?(dbaron)
Attachment #313968 - Flags: review-
Version: unspecified → Trunk
Attached patch Address David's comments (obsolete) — Splinter Review
Attachment #313968 - Attachment is obsolete: true
Attachment #314495 - Flags: superreview?(dbaron)
Attachment #314495 - Flags: review?(dbaron)
The right patch this time
Attachment #314495 - Attachment is obsolete: true
Attachment #314496 - Flags: superreview?(dbaron)
Attachment #314496 - Flags: review?(dbaron)
Attachment #314495 - Flags: superreview?(dbaron)
Attachment #314495 - Flags: review?(dbaron)
Comment on attachment 314496 [details] [diff] [review]
Address David's comments

r+sr=dbaron
Attachment #314496 - Flags: superreview?(dbaron)
Attachment #314496 - Flags: superreview+
Attachment #314496 - Flags: review?(dbaron)
Attachment #314496 - Flags: review+
Checked in.
Status: NEW → RESOLVED
Closed: 13 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Works fine. Thank you.
-> Verified.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.