Closed
Bug 426271
Opened 15 years ago
Closed 15 years ago
Auto-detect of Japanese Character Encoding does not work
Categories
(Core :: General, defect, P1)
VERIFIED
FIXED
People
(Reporter: masa141421356, Assigned: smontagu)
References
Details
(Keywords: intl, jp-critical, regression)
Attachments
(2 files, 3 obsolete files)
8.86 KB, patch
47.38 KB, patch (dbaron: review+, dbaron: superreview+)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9pre) Gecko/2008033105 Minefield/3.0pre
Build Identifier:

Auto-detect of Japanese character encoding does not work on trunk.

Reproducible: Always

Steps to Reproduce:
1. Set the following preferences:
user_pref("intl.accept_languages", "ja,en-us,en");
user_pref("intl.charset.default", "Shift_JIS");
user_pref("intl.charset.detector", "ja_parallel_state_machine");
user_pref("intl.charsetmenu.browser.static", "Shift_JIS, EUC-JP, ISO-2022-JP, ISO-8859-1, UTF-8");
2. Access http://space.geocities.jp/alice0775/STORE/EUC-JP.html (EUC-JP) or http://space.geocities.jp/alice0775/STORE/UTF-8.html (UTF-8).

Actual Results: Auto-detect fails.
Expected Results: Auto-detect should succeed.

I think this issue is jp-critical and blocking1.9.
Reporter | ||
Updated•15 years ago
|
Flags: blocking1.9?
Keywords: jp-critical
Reporter | ||
Comment 1•15 years ago
|
||
In this case, both the HTTP response header and the META element in the HTML contain only "Content-Type: text/html"; neither includes a "charset" parameter.
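To make the condition concrete, here is a minimal sketch of how a declared charset might be pulled out of a Content-Type value. The helper name `declaredCharset` is hypothetical and this is not Gecko's parser; it does a simplified substring match rather than full parameter parsing. When it returns nothing, as for the test pages here, the encoding has to come from auto-detection or the default.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not a Gecko API): returns the charset parameter of a
// Content-Type value, or std::nullopt when no charset is declared.
std::optional<std::string> declaredCharset(std::string_view contentType) {
    // Lower-case a copy so the parameter name matches case-insensitively.
    std::string lower(contentType);
    for (char& c : lower) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }

    const std::string key = "charset=";
    const std::size_t pos = lower.find(key);
    if (pos == std::string::npos) {
        return std::nullopt;  // nothing declared: detection has to decide
    }

    const std::size_t start = pos + key.size();
    assert(start <= contentType.size());

    // The value runs until the next separator or the end of the string.
    const std::size_t end = lower.find_first_of("; \t", start);
    const std::size_t count =
        (end == std::string::npos) ? std::string::npos : end - start;
    std::string value(contentType.substr(start, count));

    // Strip optional surrounding double quotes.
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
        value = value.substr(1, value.size() - 2);
    }
    if (value.empty()) {
        return std::nullopt;
    }
    return value;
}
```

For the pages in comment 0, `declaredCharset("text/html")` yields no value, which is exactly the situation where the detector selected by intl.charset.detector is supposed to run.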
Comment 2•15 years ago
|
||
Regression window:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b5pre) Gecko/2008032512: works fine
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b5pre) Gecko/2008032513: NG (fails)
Comment 3•15 years ago
|
||
Confirmed with Fx trunk 2008033105 (Win XP SP2).
There is no problem with Auto Detect/Universal on any of the EUC-JP, UTF-8, and Shift_JIS test pages.
> View/Character Encoding/Auto Detect/Universal
> (intl.charset.detector=universal_charset_detector)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter | ||
Updated•15 years ago
|
Keywords: intl,
regression
Reporter | ||
Comment 4•15 years ago
|
||
According to Bonsai, bug 424916 may be related:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=PhoenixTinderbox&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2008-03-25+12%3A00%3A00&maxdate=2008-03-25+14%3A00%3A00&cvsroot=%2Fcvsroot
Assignee | ||
Updated•15 years ago
|
Assignee: nobody → smontagu
Comment 5•15 years ago
|
||
I can reproduce with a recent trunk build on Linux. For the 2 URLs in comment 0, auto-detect doesn't work at all.
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9pre) Gecko/2008040120 Firefox 3.0pre
OS: Windows XP → All
Marking blocking, based on regression... need an automated test for this.
Blocks: 424916
Flags: blocking1.9? → blocking1.9+
Priority: -- → P1
Comment 7•15 years ago
|
||
I tried backing out the patch from bug 424916, and that seems to have solved this problem. I think this is a regression from bug 424916.
Assignee | ||
Comment 8•15 years ago
|
||
So based on comment 3 I think the way to fix this is to get rid of the old CJK parallel state machine detectors in intl/chardet and just use the universal detector with a language filter. The universal detector has been much better maintained, and it will remove a lot of duplicated data.

The patch is a bit large and scary, but most of it is just moving XPCOM stuff around. Since time is short, I'm requesting code review already while I work on testcases.

The patch doesn't include the cvs removes:
intl/chardet/src/Big5Statistics.h
intl/chardet/src/EUCJPStatistics.h
intl/chardet/src/EUCKRStatistics.h
intl/chardet/src/EUCTWStatistics.h
intl/chardet/src/GB2312Statistics.h
intl/chardet/src/nsBIG5Verifier.h
intl/chardet/src/nsCP1252Verifier.h
intl/chardet/src/nsEUCJPVerifier.h
intl/chardet/src/nsEUCKRVerifier.h
intl/chardet/src/nsEUCTWVerifier.h
intl/chardet/src/nsGB18030Verifier.h
intl/chardet/src/nsGB2312Verifier.h
intl/chardet/src/nsHZVerifier.h
intl/chardet/src/nsISO2022CNVerifier.h
intl/chardet/src/nsISO2022JPVerifier.h
intl/chardet/src/nsISO2022KRVerifier.h
intl/chardet/src/nsPSMDetectors.cpp
intl/chardet/src/nsPSMDetectors.h
intl/chardet/src/nsPkgInt.h
intl/chardet/src/nsSJISVerifier.h
intl/chardet/src/nsUCS2BEVerifier.h
intl/chardet/src/nsUCS2LEVerifier.h
intl/chardet/src/nsUTF8Verifier.h
intl/chardet/src/nsVerifier.h
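As a rough illustration of the idea (one shared detector, narrowed by a language filter chosen from the detector pref), a sketch might look like the following. Everything here is an assumption for illustration: `LanguageFilter`, `filterForDetectorPref`, `isCandidate`, and the Japanese candidate list are hypothetical and are not taken from the attached patch. Only the two pref values come from this bug.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Hypothetical filter type; the real patch may encode this differently.
enum class LanguageFilter { Unrestricted, Japanese };

// Maps an intl.charset.detector pref value to a filter for the single
// shared detector, instead of instantiating a separate detector class.
LanguageFilter filterForDetectorPref(std::string_view pref) {
    if (pref == "ja_parallel_state_machine") {
        return LanguageFilter::Japanese;
    }
    return LanguageFilter::Unrestricted;  // e.g. "universal_charset_detector"
}

// Assumed candidate set for the Japanese filter (illustrative only).
constexpr std::array<std::string_view, 4> kJapaneseCandidates = {
    "Shift_JIS", "EUC-JP", "ISO-2022-JP", "UTF-8"};

// Returns true if the detector should consider this encoding at all.
bool isCandidate(LanguageFilter filter, std::string_view encoding) {
    assert(!encoding.empty());
    if (filter == LanguageFilter::Unrestricted) {
        return true;
    }
    for (std::string_view allowed : kJapaneseCandidates) {
        if (allowed == encoding) {
            return true;
        }
    }
    return false;
}
```

With this shape, the per-language verifier and statistics headers listed above have no remaining users, since the filter only restricts which of the universal detector's candidates are considered.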
Attachment #313966 -
Flags: superreview?(dbaron)
Attachment #313966 -
Flags: review?(dbaron)
Assignee | ||
Comment 9•15 years ago
|
||
Attachment #313966 -
Attachment is obsolete: true
Attachment #313968 -
Flags: superreview?(dbaron)
Attachment #313968 -
Flags: review?(dbaron)
Attachment #313966 -
Flags: superreview?(dbaron)
Attachment #313966 -
Flags: review?(dbaron)
Assignee | ||
Comment 10•15 years ago
|
||
Comment 11•15 years ago
|
||
I knew there was another review I really meant to get to yesterday... sorry about missing this one. I'm having trouble understanding the filtering stuff. It seems when FILTER_JAPANESE, etc., is used, the universal detector will consider all Japanese *plus* all non-CJK encodings, when we really only want the Japanese ones (plus UTF-8), or something like that. When FILTER_NONE is used, it seems like we'll consider all encodings except for CJK, when we really want to consider all of them.
Comment 12•15 years ago
|
||
Er, never mind about the first problem -- I didn't read the nsUniversalDetector.cpp changes closely enough. But I still think the second problem exists, and it might help to fix it by replacing FILTER_NONE by FILTER_ALL (with at least one additional constant for FILTER_NON_CJK) -- and then you could always do Latin1 and UTF8, regardless of filter (which looks like what you intended).
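A minimal sketch of the suggested scheme, assuming bit-flag filters: FILTER_JAPANESE, FILTER_ALL and FILTER_NON_CJK are the names from this discussion, while the bit values, the other language flags, `ProberGroup`, and `shouldProbe` are hypothetical and not taken from nsUniversalDetector.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments; only the names FILTER_JAPANESE, FILTER_ALL
// and FILTER_NON_CJK come from the review discussion.
constexpr std::uint32_t FILTER_JAPANESE = 1u << 0;
constexpr std::uint32_t FILTER_CHINESE  = 1u << 1;  // assumed
constexpr std::uint32_t FILTER_KOREAN   = 1u << 2;  // assumed
constexpr std::uint32_t FILTER_NON_CJK  = 1u << 3;
constexpr std::uint32_t FILTER_ALL =
    FILTER_JAPANESE | FILTER_CHINESE | FILTER_KOREAN | FILTER_NON_CJK;

// Hypothetical grouping of the detector's probers.
enum class ProberGroup { Japanese, Chinese, Korean, NonCJK, Latin1, UTF8 };

// Latin1 and UTF-8 are always probed; every other group needs its bit set.
constexpr bool shouldProbe(std::uint32_t filter, ProberGroup group) {
    switch (group) {
        case ProberGroup::Latin1:
        case ProberGroup::UTF8:
            return true;
        case ProberGroup::Japanese:
            return (filter & FILTER_JAPANESE) != 0;
        case ProberGroup::Chinese:
            return (filter & FILTER_CHINESE) != 0;
        case ProberGroup::Korean:
            return (filter & FILTER_KOREAN) != 0;
        case ProberGroup::NonCJK:
            return (filter & FILTER_NON_CJK) != 0;
    }
    return false;
}

// FILTER_ALL really covers every filterable group, unlike a zero-valued
// "none" constant, which would switch all of them off.
static_assert(shouldProbe(FILTER_ALL, ProberGroup::Korean), "all covers CJK");
static_assert(shouldProbe(FILTER_ALL, ProberGroup::NonCJK), "all covers non-CJK");
```

The point of the naming is that an all-bits-set constant makes the unfiltered case behave as expected, whereas a filter called "none" reads as "consider everything" but, as a mask, selects nothing.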
Updated•15 years ago
|
Attachment #313968 -
Flags: superreview?(dbaron)
Attachment #313968 -
Flags: superreview-
Attachment #313968 -
Flags: review?(dbaron)
Attachment #313968 -
Flags: review-
Reporter | ||
Updated•15 years ago
|
Version: unspecified → Trunk
Assignee | ||
Comment 13•15 years ago
|
||
Attachment #313968 -
Attachment is obsolete: true
Attachment #314495 -
Flags: superreview?(dbaron)
Attachment #314495 -
Flags: review?(dbaron)
Assignee | ||
Comment 14•15 years ago
|
||
The right patch this time
Attachment #314495 -
Attachment is obsolete: true
Attachment #314496 -
Flags: superreview?(dbaron)
Attachment #314496 -
Flags: review?(dbaron)
Attachment #314495 -
Flags: superreview?(dbaron)
Attachment #314495 -
Flags: review?(dbaron)
Comment 15•15 years ago
|
||
Comment on attachment 314496: Address David's comments
r+sr=dbaron
Attachment #314496 -
Flags: superreview?(dbaron)
Attachment #314496 -
Flags: superreview+
Attachment #314496 -
Flags: review?(dbaron)
Attachment #314496 -
Flags: review+
Assignee | ||
Comment 16•15 years ago
|
||
Checked in.
Status: NEW → RESOLVED
Closed: 15 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
See Also: → https://launchpad.net/bugs/217613