Closed Bug 23920 Opened 25 years ago Closed 24 years ago

Russian Auto detect detects character set wrong

Categories

(Core :: Internationalization, defect, P3)

x86
Windows 98
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: ezh, Assigned: ftang)

Details

(Whiteboard: need code review.)

"Auto detect" detects the page as MacCyrillic, but the page is in WIN-1251.
Status: NEW → ASSIGNED
Target Milestone: M18
It seems we need to improve the Russian detection algorithm. Marking this M18.
I think I fixed this bug several weeks ago by adding an if( > 0x80) statement.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
I verified this in the 2000022108 Win32, Mac, and Linux builds.
Status: RESOLVED → VERIFIED
Win98, build 2000022508: this bug is back.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Here is the fix:
1. We should not report a charset if we collected no 8-bit data (all the
probabilities are equal to 0).
2. We forgot to compare index 0.

Index: src/nsCyrillicDetector.cpp
===================================================================
RCS file: /m/pub/mozilla/intl/chardet/src/nsCyrillicDetector.cpp,v
retrieving revision 1.6
diff -c -r1.6 nsCyrillicDetector.cpp
*** nsCyrillicDetector.cpp      2000/02/15 09:15:13     1.6
--- nsCyrillicDetector.cpp      2000/02/28 17:14:02
***************
*** 133,145 ****
     PRUint8 j;
     if(mDone)
        return;
!    for(j=1;j<mItems;j++) {
        if(mProb[j] > max)
        {
             max = mProb[j];
             maxIdx= j;
        }
     }
  #ifdef DEBUG
     for(j=0;j<mItems;j++)
        printf("Charset %s->\t%d\n", mCharsets[j], mProb[j]);
--- 133,149 ----
     PRUint8 j;
     if(mDone)
        return;
!    for(j=0;j<mItems;j++) {
        if(mProb[j] > max)
        {
             max = mProb[j];
             maxIdx= j;
        }
     }
+
+    if( 0 == max ) // if we didn't get any 8 bits data
+      return;
+
  #ifdef DEBUG
     for(j=0;j<mItems;j++)
        printf("Charset %s->\t%d\n", mCharsets[j], mProb[j]);
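
For readers who want to try the two changes outside the Mozilla tree, the selection logic in the patch can be sketched as a standalone function. This is only an illustration under assumptions: `PickBestCharset` and its `-1` return value are hypothetical names and conventions, not part of nsCyrillicDetector.cpp, and the score type is assumed to be an unsigned integer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not taken from nsCyrillicDetector.cpp.
// Returns the index of the highest score, or -1 when every score is 0,
// i.e. when no 8-bit data was seen and no charset should be reported.
int PickBestCharset(const std::vector<std::uint32_t>& prob) {
    std::uint32_t max = 0;
    int maxIdx = -1;
    // Start at index 0; starting at 1 would never consider the first charset.
    for (std::size_t j = 0; j < prob.size(); ++j) {
        if (prob[j] > max) {
            max = prob[j];
            maxIdx = static_cast<int>(j);
        }
    }
    // maxIdx is still -1 if max stayed 0, mirroring the "0 == max" early return.
    return maxIdx;
}
```

With the old loop bound (`j=1`), a page whose best score sat at index 0 could be reported as whichever other charset scored highest, which would be consistent with the misdetection described in this bug.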
Status: REOPENED → ASSIGNED
Target Milestone: M18 → M15
Whiteboard: need code review.
I cannot check in this fix before beta because it is not one of our beta1 goals
(limited by Netscape's PDT rule). Moving it to M16.

Target Milestone: M15 → M16
*** Bug 30191 has been marked as a duplicate of this bug. ***
Blocks: 30202
*** Bug 30202 has been marked as a duplicate of this bug. ***
Changed the summary to "Russian Auto detect detects character set wrong". Moving
to M15.
Summary: Auto detect "character set" detects character set wrong → Russian Auto detect detects character set wrong
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 24 years ago
Resolution: --- → FIXED
*** Bug 31954 has been marked as a duplicate of this bug. ***
Verified as fixed in 2000041109 Win32 build.
Status: RESOLVED → VERIFIED