Closed Bug 181344 Opened 23 years ago Closed 20 years ago

Universal auto-detector detects western page as gb18030

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 177505

People

(Reporter: amyy, Assigned: smontagu)

References

()

Details

(Keywords: intl)

Attachments

(1 file)

Build: 11-20 trunk and 1.02 branch build. I randomly saw a few nba.com sub-pages not displaying properly with the Universal auto-detector.
Steps to reproduce:
1. Launch the browser with a new profile or a cleared cache.
2. Keep the default charset as Western ISO-8859-1.
3. View | Character Coding | Auto-detect, select Universal.
4. Load page: http://www.nba.com/mavericks/matchup/2001-02season_preview.html
Result: The page doesn't display the character " ' " correctly, and the charset is marked as gb18030.
Reassign to Shanjian for auto-detector issue.
Assignee: smontagu → shanjian
Keywords: intl
An explanation of this bug is found in bug 177505 comment 5.
*** Bug 227933 has been marked as a duplicate of this bug. ***
Summary: Universal auto-detector improvement: detect western page as gb18030 → Universal auto-detector detects western page as gb18030
*** Bug 205518 has been marked as a duplicate of this bug. ***
*** Bug 220555 has been marked as a duplicate of this bug. ***
There is a test page at that dupe which, when I just loaded it, was detected as ISO-8859-1. But, there is also some possibly interesting troubleshooting info, including a bit of code analysis by S.Montagu.
This bug seems to be affecting http://www.hit-haven.com/z/zero7/destiny.htm Using the universal encoding detector, the English text is interspersed with Japanese characters. The page is actually encoded as Western ISO-8859-1 (or something practically equivalent) - manually switching to this encoding shows that the Japanese characters appear in place of each right single quote (’).
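The symptom described above can be reproduced outside the browser. The following is a minimal sketch, assuming the page bytes use 0x92 for the right single quote (as windows-1252 does); the sample bytes and the helper name `decode_all` are made up for illustration and are not taken from the affected pages. It shows that the same bytes are also a plausible lead/trail pair in Shift_JIS and gb18030, which is why a misdetection swallows the quote and the following letter into one CJK character.

```python
# Hypothetical sample: "It's" with byte 0x92 as the right single quote.
SAMPLE = b"It\x92s"

def decode_all(data):
    """Decode the same bytes under the three charsets discussed in this bug."""
    charsets = ("cp1252", "shift_jis", "gb18030")
    return {name: data.decode(name, errors="replace") for name in charsets}

results = decode_all(SAMPLE)
for name, text in results.items():
    # Under cp1252 the text has four characters; under the CJK charsets
    # 0x92 and the following "s" merge into a single character.
    print(name, len(text), repr(text))
```

Because 0x92 followed by an ASCII letter is a legal byte pair in both multi-byte encodings, byte validity alone cannot rule them out for such a page.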
(In reply to comment #8) Similar shenanigans are to be found at http://icwales.icnetwork.co.uk/0100news/0200wales/ (Japanese (Shift_JIS)) and http://icwales.icnetwork.co.uk/0100news/ (Chinese Simplified (GB18030)) but not at http://icwales.icnetwork.co.uk/ (Western (ISO-8859-1)).
The Shift_JIS problem is probably bug 168526.
Blocks: 264871
*** Bug 274288 has been marked as a duplicate of this bug. ***
shanjian has not been working on Mozilla for 2 years and these bugs are still here. Marking them WONTFIX. If you want to reopen it, find a good owner first.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → WONTFIX
Mass reassign. Please excuse the spam.
Assignee: shanjian → nobody
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Assignee: nobody → smontagu
Status: REOPENED → NEW
Mass re-opening bugs Frank Tang closed on Wednesday March 02 for no reason. All the spam is his fault; feel free to tar and feather him.
Status: NEW → REOPENED
Status: REOPENED → NEW
Mass re-assigning Frank Tang's old bugs that he closed as WONTFIX and that had to be re-opened. The spam is his fault, not my own.
Assignee: smontagu → nobody
Assignee: nobody → smontagu
I'm currently trying to see what can be done to solve this bug. I intend to work on both the SJIS and the gb18030 problems. I will open a thread on netscape.public.mozilla.i18n to discuss my findings about how the detector works, and to share ideas about exactly how it should be enhanced. Shanjian says that he reports the sequence 0x81~0xfe, 0x30~0x39, 0x81~0xfe, 0x30~0x39 immediately, but I haven't seen that in the code yet, only the frequency detector, whereas what he describes sounds more like a state machine.
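To make the byte pattern in the previous comment concrete, here is a rough sketch of a state machine that recognizes the four-byte sequence 0x81~0xfe, 0x30~0x39, 0x81~0xfe, 0x30~0x39. It is not the detector's actual code; the function name `find_four_byte_sequences` is hypothetical, and the scanner only matches the pattern rather than fully decoding gb18030.

```python
def find_four_byte_sequences(data):
    """Return start offsets of byte runs matching the four-byte pattern.

    Illustrative pattern matcher only; it does not consume two-byte
    characters the way a real gb18030 decoder would.
    """
    offsets = []
    state = 0  # number of pattern bytes matched so far (0..3)
    for i, b in enumerate(data):
        is_lead = 0x81 <= b <= 0xFE
        is_digit = 0x30 <= b <= 0x39
        expects_lead = state % 2 == 0  # positions 1 and 3 are lead bytes
        if (expects_lead and is_lead) or (not expects_lead and is_digit):
            state += 1
        else:
            # Mismatch: restart, letting this byte begin a new sequence.
            state = 1 if is_lead else 0
        if state == 4:
            offsets.append(i - 3)
            state = 0
    return offsets
```

For example, `find_four_byte_sequences(b"a\x81\x30\x81\x30b")` finds one match starting at offset 1, while Western text such as `b"It\x92s"` yields no match because the byte after 0x92 is not an ASCII digit.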
Status: NEW → ASSIGNED
Blocks: 248304
Jean-Marc: did you get a chance to open that thread? I don't see it... Gerv
*** Bug 291929 has been marked as a duplicate of this bug. ***
*** Bug 312071 has been marked as a duplicate of this bug. ***
*** Bug 312691 has been marked as a duplicate of this bug. ***
*** This bug has been marked as a duplicate of 177505 ***
Status: ASSIGNED → RESOLVED
Closed: 21 years ago → 20 years ago
Resolution: --- → DUPLICATE
No longer blocks: 264871