Closed
Bug 426271
Opened 17 years ago
Closed 17 years ago
Auto-detect of Japanese Character Encoding does not work
Categories
(Core :: General, defect, P1)
Tracking
()
VERIFIED
FIXED
People
(Reporter: masa141421356, Assigned: smontagu)
References
Details
(Keywords: intl, jp-critical, regression)
Attachments
(2 files, 3 obsolete files)
8.86 KB,
patch
|
Details | Diff | Splinter Review | |
47.38 KB,
patch
|
dbaron
:
review+
dbaron
:
superreview+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9pre) Gecko/2008033105 Minefield/3.0pre
Build Identifier:
Auto-detect of Japanese Character Encoding does not work on Trunk.
Reproducible: Always
Steps to Reproduce:
1.Set preference as
user_pref("intl.accept_languages", "ja,en-us,en");
user_pref("intl.charset.default", "Shift_JIS");
user_pref("intl.charset.detector", "ja_parallel_state_machine");
user_pref("intl.charsetmenu.browser.static", "Shift_JIS, EUC-JP, ISO-2022-JP,
ISO-8859-1, UTF-8");
2.Acccess
http://space.geocities.jp/alice0775/STORE/EUC-JP.html (for EUC-JP)
http://space.geocities.jp/alice0775/STORE/UTF-8.html (UTF-8)
3.
Actual Results:
Auto-detect is failed
Expected Results:
Auto-detect should success.
I think this issue is jp-critical and blocking1.9.
Reporter | ||
Updated•17 years ago
|
Flags: blocking1.9?
Keywords: jp-critical
Reporter | ||
Comment 1•17 years ago
|
||
At this case, both of HTTP response header and META in html contains only
"Content-Type: text/html", "charset" is not contained.
![]() |
||
Comment 2•17 years ago
|
||
Regression window is :
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b5pre) Gecko/2008032512 :
work fine
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b5pre) Gecko/2008032513 : NG
Comment 3•17 years ago
|
||
Confirmed with Fx trunk 2008033105(Win-XP SP2).
No problem when Auto Detect/Universal, with all EUC-JP,UTF-8,Shift_JIS test pages.
> View/Character Encoding/Auto Detect/Universal
> (intl.charset.detector=universal_charset_detector)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter | ||
Updated•17 years ago
|
Keywords: intl,
regression
Reporter | ||
Comment 4•17 years ago
|
||
According to Bonsai, Bug 424916 may be related.
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=PhoenixTinderbox&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2008-03-25+12%3A00%3A00&maxdate=2008-03-25+14%3A00%3A00&cvsroot=%2Fcvsroot
Assignee | ||
Updated•17 years ago
|
Assignee: nobody → smontagu
Comment 5•17 years ago
|
||
I can reproduce with recent trunk Linux.
In the case of 2 URLs in comment 0 auto-detect doesn't work at all.
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9pre) Gecko/2008040120 Firefox 3.0pre
OS: Windows XP → All
Marking blocking, based on regression... need an automated test for this.
Blocks: 424916
Flags: blocking1.9? → blocking1.9+
Priority: -- → P1
Comment 7•17 years ago
|
||
I tried to back the patch out of bug 424916
and it seeme to have solved this problem.
I think this is a regression of bug 424916.
Assignee | ||
Comment 8•17 years ago
|
||
So based on comment 3 I think the way to fix this is to get rid of the old CJK parallel state machine detectors in intl/chardet and just use the universal detector with a language filter. The universal detector has been much better maintained, and it will remove a lot of duplicated data.
The patch is a bit large and scary, but most of it is just moving XPCOM stuff around. Since time is short, I'm requesting code review already while I work on testcases.
The patch doesn't include the cvs removes:
intl/chardet/src/Big5Statistics.h
intl/chardet/src/EUCJPStatistics.h
intl/chardet/src/EUCKRStatistics.h
intl/chardet/src/EUCTWStatistics.h
intl/chardet/src/GB2312Statistics.h
intl/chardet/src/nsBIG5Verifier.h
intl/chardet/src/nsCP1252Verifier.h
intl/chardet/src/nsEUCJPVerifier.h
intl/chardet/src/nsEUCKRVerifier.h
intl/chardet/src/nsEUCTWVerifier.h
intl/chardet/src/nsGB18030Verifier.h
intl/chardet/src/nsGB2312Verifier.h
intl/chardet/src/nsHZVerifier.h
intl/chardet/src/nsISO2022CNVerifier.h
intl/chardet/src/nsISO2022JPVerifier.h
intl/chardet/src/nsISO2022KRVerifier.h
intl/chardet/src/nsPSMDetectors.cpp
intl/chardet/src/nsPSMDetectors.h
intl/chardet/src/nsPkgInt.h
intl/chardet/src/nsSJISVerifier.h
intl/chardet/src/nsUCS2BEVerifier.h
intl/chardet/src/nsUCS2LEVerifier.h
intl/chardet/src/nsUTF8Verifier.h
intl/chardet/src/nsVerifier.h
Attachment #313966 -
Flags: superreview?(dbaron)
Attachment #313966 -
Flags: review?(dbaron)
Assignee | ||
Comment 9•17 years ago
|
||
Attachment #313966 -
Attachment is obsolete: true
Attachment #313968 -
Flags: superreview?(dbaron)
Attachment #313968 -
Flags: review?(dbaron)
Attachment #313966 -
Flags: superreview?(dbaron)
Attachment #313966 -
Flags: review?(dbaron)
Assignee | ||
Comment 10•17 years ago
|
||
I knew there was another review I really meant to get to yesterday... sorry about missing this one.
I'm having trouble understanding the filtering stuff. It seems when FILTER_JAPANESE, etc., is used, the universal detector will consider all Japanese *plus* all non-CJK encodings, when we really only want the Japanese ones (plus UTF-8), or something like that. When FILTER_NONE is used, it seems like we'll consider all encodings except for CJK, when we really want to consider all of them.
Er, never mind about the first problem -- I didn't read the nsUniversalDetector.cpp changes closely enough. But I still think the second problem exists, and it might help to fix it by replacing FILTER_NONE by FILTER_ALL (with at least one additional constant for FILTER_NON_CJK) -- and then you could always do Latin1 and UTF8, regardless of filter (which looks like what you intended).
Attachment #313968 -
Flags: superreview?(dbaron)
Attachment #313968 -
Flags: superreview-
Attachment #313968 -
Flags: review?(dbaron)
Attachment #313968 -
Flags: review-
Reporter | ||
Updated•17 years ago
|
Version: unspecified → Trunk
Assignee | ||
Comment 13•17 years ago
|
||
Attachment #313968 -
Attachment is obsolete: true
Attachment #314495 -
Flags: superreview?(dbaron)
Attachment #314495 -
Flags: review?(dbaron)
Assignee | ||
Comment 14•17 years ago
|
||
The right patch this time
Attachment #314495 -
Attachment is obsolete: true
Attachment #314496 -
Flags: superreview?(dbaron)
Attachment #314496 -
Flags: review?(dbaron)
Attachment #314495 -
Flags: superreview?(dbaron)
Attachment #314495 -
Flags: review?(dbaron)
Comment on attachment 314496 [details] [diff] [review]
Address David's comments
r+sr=dbaron
Attachment #314496 -
Flags: superreview?(dbaron)
Attachment #314496 -
Flags: superreview+
Attachment #314496 -
Flags: review?(dbaron)
Attachment #314496 -
Flags: review+
Assignee | ||
Comment 16•17 years ago
|
||
Checked in.
Status: NEW → RESOLVED
Closed: 17 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
See Also: → https://launchpad.net/bugs/217613
You need to log in
before you can comment on or make changes to this bug.
Description
•