Closed Bug 177505 Opened 23 years ago Closed 20 years ago

Autodetect=Universal misidentifies some text as GB18030, and leaves Encoding menu in wrong state

Tracking

(Not tracked)

Status:

VERIFIED FIXED

People

(Reporter: u32858, Assigned: jgmyers)

Attachments

(3 files, 2 obsolete files)

example MIME email attached as text/plain, 23 years ago, u32858, 29.71 KB, message/rfc822
mail example to prove pb with missing displayed data, 21 years ago, Franck Depierre, 1.58 KB, message/rfc822
Proposed fix, 20 years ago, John G. Myers, 2.79 KB, patch
Corrected proposed fix, 20 years ago, John G. Myers, 2.70 KB, patch
Corrected fix, 20 years ago, John G. Myers, 4.62 KB, patch, smontagu: review+, roc: superreview+

u32858

Reporter

Description

•

23 years ago

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2b) Gecko/20021029
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2b) Gecko/20021029

Emails displayed with charset iso-8859-1 are not displayed correctly.

Reproducible: Always

Steps to Reproduce:
1. Get an HTML email.

Actual Results: The £ pound signs are shown as ?? and other letters have kanji scattered occasionally throughout them.

Expected Results: The message should display in iso-8859-1 correctly. Clicking View->charset->iso-8859-1 (selecting it again) fixes this problem, but as "iso-8859-1" is already highlighted this should not happen.

First bug submitted in UTF-8.

marina

Comment 1

•

23 years ago

reporter: what locale are you on? is your auto-detect turned on or off? the email messages that are not displayed correctly are not mime encoded? could you please attach a problematic mail to this bug report? thanks.

u32858

Reporter

Comment 2

•

23 years ago

>
> ------- Additional Comments From marina@netscape.com 2002-10-30 09:23 -------
> reporter: what locale are you on? is your auto-detect turned on or off? the
> email messages that are not displayed correctly are not mime encoded? could you
> please attach a problematic mail to this bug report? thanks.

Hi Marina,

autodetect was on "universal", it selected iso-8859-1 as highlighted, if i turn it off it uses the default iso-8859-1 i set for all message display in the prefs

perhaps this is an autodetect bug.

the email was MIME encoded,

regards
JG

$ locale
LANG=en_GB.UTF-8
LC_CTYPE=ja_JP.UTF-8
LC_NUMERIC=en_GB.UTF-8
LC_TIME=en_GB.UTF-8
LC_COLLATE=en_GB.UTF-8
LC_MONETARY=en_GB.UTF-8
LC_MESSAGES=en_GB.UTF-8
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=

Received: from y01.blackstar.co.uk ([212.250.176.31]) by mail1.tay.ac.uk with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2656.59) id VTJMJC8K; Tue, 29 Oct 2002 19:33:38 -0000
Received: (qmail 4331 invoked by uid 1008); 29 Oct 2002 12:37:46 -0000
Date: 29 Oct 2002 12:37:46 -0000
Message-ID: <20021029123746.4330.qmail@y01.blackstar.co.uk>
Content-Transfer-Encoding: 7bit
Content-Type: multipart/alternative; boundary="_----------=_10358950661610159692"
MIME-Version: 1.0
X-Mailer: MIME::Lite 1.135 (B2.12; Q2.03)
From: update@blackstar.co.uk
To: 0013499@tay.ac.uk
Reply-To: service@blackstar.co.uk
Subject: Who Killed Laura Palmer?

This is a multi-part message in MIME format.

--_----------=_10358950661610159692
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
Content-Type: text/plain

u32858

Reporter

Comment 3

•

23 years ago

Attached file: example MIME email attached as text/plain


Simon Montagu :smontagu

Comment 4

•

23 years ago

shanjian, ftang said this might be related to your recent work on charsets

Shanjian Li

Comment 5

•

23 years ago

"Anything that can go wrong will go wrong." This problem is not caused by my recent change; it is a problem in the universal detector. Before doing multibyte detection, I remove all ASCII characters that are not adjacent to a high 8-bit character. The aim is to improve performance. At the time that was written, gb18030 had not been added yet. In gb18030, a byte sequence of the form 0x81~0xfe, 0x30~0x39, 0x81~0xfe, 0x30~0x39 is a four-byte character. Because such a sequence will not appear in "almost" any other encoding, I report gb18030 immediately when I see one. But in this testcase the filtering itself helps create such a sequence, namely 0xa3, 0x34, 0xa3, 0x31. That leads the universal detector to mislabel the text as gb18030. This problem can probably be fixed easily by not reporting immediately for such a sequence. Keeping an additional character after each high 8-bit character would also help eliminate gb18030 from consideration.
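To make the failure mode described above concrete, here is a minimal Python sketch. It is not the detector's actual code: filter_keep_one_ascii and looks_like_gb18030_four_byte are hypothetical helpers, and the filter assumes the behaviour described in this bug (keep every high-bit byte plus the single byte that follows it). The sample text is also made up; it simply contains two pound signs followed by digits.

```python
def filter_keep_one_ascii(data: bytes) -> bytes:
    """Keep every high-bit byte plus the one byte right after it (assumed old behaviour)."""
    out = bytearray()
    prev_high = False
    for b in data:
        if b >= 0x80:
            out.append(b)
            prev_high = True
        elif prev_high:
            # First non-high-bit byte after a high-bit byte is kept, the rest are dropped.
            out.append(b)
            prev_high = False
    return bytes(out)


def looks_like_gb18030_four_byte(seq: bytes) -> bool:
    """Check the four-byte pattern from the comment: 81-FE, 30-39, 81-FE, 30-39."""
    return (
        len(seq) == 4
        and 0x81 <= seq[0] <= 0xFE
        and 0x30 <= seq[1] <= 0x39
        and 0x81 <= seq[2] <= 0xFE
        and 0x30 <= seq[3] <= 0x39
    )


# Hypothetical ISO-8859-1 text with two pound signs (0xA3) followed by digits.
text = "£4.99 and £1.50".encode("iso-8859-1")
filtered = filter_keep_one_ascii(text)

print(filtered.hex(" "))                       # a3 34 a3 31
print(looks_like_gb18030_four_byte(filtered))  # True
```

The original bytes never contain that four-byte pattern; it only appears once the ASCII between the two pound signs has been filtered out.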

Assignee: nhotta → shanjian

Franck Depierre

Comment 6

•

21 years ago

Attached file: mail example to prove pb with missing displayed data

This bug was not present in version 1.6 of Mozilla. It appears in 1.7C1. I do not know if it was present in versions before 1.6. This mail is probably wrongly formatted, but no error message is displayed. It is probably a problem with "\n", the end-of-line characters. See the mail content and compare it with the result in Mozilla.

Mike Cowperthwaite

Updated

•

21 years ago

Attachment #104637 - Attachment mime type: text/plain → message/rfc822

Mike Cowperthwaite

Updated

•

21 years ago

Attachment #147045 - Attachment mime type: text/plain → message/rfc822

Mike Cowperthwaite

Comment 7

•

21 years ago

In attachment 104637, the charset is not specified. As noted in comment 5, the actual character set that gets used, if Autodetect=Universal, is GB18030. However, the Encoding menu indicates that ISO-8859-1 has been selected, unlike successful cases of detection (e.g. Big5) -- but see bug 163272. Bug 181344 is about the same problem, in the browser.

In attachment 147045 (from Franck Depierre), the charset is *illegally* specified, as:

Content-Type: text/plain; charset=iso.8859.1

The misdisplay of this message is a different problem, and I've opened bug 251634 for it. This message does, however, also show the problem of the Encoding menu being incorrectly updated.

Due to bug 129443, viewing a message/rfc822 file in the browser is very little help in getting to the root of the encoding problem: the display is still broken, but differently than in Mail/News.

Status: UNCONFIRMED → NEW

Ever confirmed: true

OS: Linux → All

Summary: email displayed as charset are iso-8859-1 are not displayed correctly → Autodetect=Universal misidentifies some text as GB18030, and leaves Encoding menu in wrong state

Gervase Markham [:gerv]

Comment 8

•

21 years ago

shanjian: are you still around, and able to produce a patch for this bug? It causes regular problems for UK users of Mozilla, such as myself :-) I can provide many more example URLs if you need them. Gerv

Mike Cowperthwaite

Comment 9

•

21 years ago

Bug 253849 opened for the failure of the Mail/News View|Encoding menu to update, which is a common symptom to some otherwise distinct Mail/News charset bugs. Recommend duping this bug to the generic Auto-Detect bug 181344.

Jean-Marc Desperrier

Updated

•

21 years ago

Blocks: 264871

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: MailNews → Core

Frank Tang

Comment 10

•

20 years ago

shanjian has not been working on Mozilla for 2 years and these bugs are still here. Marking them WONTFIX. If you want to reopen it, find a good owner first.

Status: NEW → RESOLVED

Closed: 20 years ago

Resolution: --- → WONTFIX

Travis Chase

Comment 11

•

20 years ago

Mass reassign. Please excuse the spam.

Assignee: shanjian → nobody

Travis Chase

Comment 12

•

20 years ago

Mass re-opening bugs Frank Tang closed on Wednesday March 02 for no reason. All the spam is his fault; feel free to tar and feather him.

Status: RESOLVED → REOPENED

Resolution: WONTFIX → ---

Travis Chase

Comment 13

•

20 years ago

Reassigning Frank's old bugs to Jungshik Shin for triage. Sorry for the spam.

Assignee: nobody → jshin1987

Status: REOPENED → NEW

John G. Myers

Assignee

Comment 14

•

20 years ago

Attached patch: Proposed fix (obsolete)

Assignee: jshin1987 → jgmyers

Status: NEW → ASSIGNED

Attachment #200801 - Flags: review?(smontagu)

Gervase Markham [:gerv]

Comment 15

•

20 years ago

jgmyers: you are da man! This bug bites me daily, whenever I hit a site with a UK currency pound sign. Gerv

John G. Myers

Assignee

Comment 16

•

20 years ago

*** Bug 181344 has been marked as a duplicate of this bug. ***

John G. Myers

Assignee

Comment 17

•

20 years ago

Attached patch: Corrected proposed fix (obsolete)

Attachment #200801 - Attachment is obsolete: true

Attachment #200808 - Flags: review?(smontagu)

Attachment #200801 - Flags: review?(smontagu)

Simon Montagu :smontagu

Comment 18

•

20 years ago

Comment on attachment 200808: Corrected proposed fix

Can you add some explanation of how this fixes the bug? Is it the approach suggested in comment 5?

John G. Myers

Assignee

Comment 19

•

20 years ago

Comment 5 has two suggestions. The patch chooses to do only the second: after a high-bit octet I feed the next two non-high-bit octets to the lower detectors. The previous code would only feed the next one non-high-bit octet. I also got rid of the malloc and made the algorithm for what gets removed be unaffected by the input buffer block boundaries.
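The approach described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patch itself: HighBitFilter and has_gb18030_four_byte are hypothetical names, the sliding-window check stands in for the real prober's state machine, and the sample text is invented. The filter keeps its counter between feed() calls, so where the input is split into blocks does not change what gets removed.

```python
class HighBitFilter:
    """Keep high-bit bytes plus up to `keep` following low bytes, across feed() calls."""

    def __init__(self, keep: int = 2) -> None:
        self.keep = keep
        self.remaining = 0  # low bytes still allowed through after the last high-bit byte

    def feed(self, block: bytes) -> bytes:
        out = bytearray()
        for b in block:
            if b >= 0x80:
                out.append(b)
                self.remaining = self.keep
            elif self.remaining > 0:
                out.append(b)
                self.remaining -= 1
        return bytes(out)


def has_gb18030_four_byte(data: bytes) -> bool:
    """Simplified check: does any 4-byte window match 81-FE, 30-39, 81-FE, 30-39?"""
    for i in range(len(data) - 3):
        a, b, c, d = data[i:i + 4]
        if (0x81 <= a <= 0xFE and 0x30 <= b <= 0x39
                and 0x81 <= c <= 0xFE and 0x30 <= d <= 0x39):
            return True
    return False


# Hypothetical ISO-8859-1 text with two pound signs (0xA3) followed by digits.
text = "£4.99 and £1.50".encode("iso-8859-1")

whole = HighBitFilter().feed(text)

# Same input split in the middle of a kept run: the result must not change.
f = HighBitFilter()
split = f.feed(text[:2]) + f.feed(text[2:])

print(whole.hex(" "))                # a3 34 2e a3 31 2e
print(split == whole)                # True
print(has_gb18030_four_byte(whole))  # False
```

With keep=1 the same input filters down to a3 34 a3 31, which does match the four-byte pattern; the second kept byte is what breaks it up.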

Simon Montagu :smontagu

Comment 20

•

20 years ago

I'm not seeing any difference in the testcases in here or the duplicate bugs.

John G. Myers

Assignee

Comment 21

•

20 years ago

They're detecting as windows-1252 for me. Perhaps you're somehow loading an old version of a shared library?

Simon Montagu :smontagu

Comment 22

•

20 years ago

That could be, because I didn't build at top level. I'll test again, but I'm afraid it won't be before Sunday.

John G. Myers

Assignee

Comment 23

•

20 years ago

Comment on attachment 200808: Corrected proposed fix

This isn't feeding 8bit data to the MBCS probers when the 8bit data isn't followed by non-8bit data.

Attachment #200808 - Attachment is obsolete: true

Attachment #200808 - Flags: review?(smontagu)

John G. Myers

Assignee

Comment 24

•

20 years ago

Attached patch: Corrected fix

Corrects an off-by-one in the buffer length when passing data to the lower probers. Sends data to lower prober when last char of buffer is 8bit. Adds my employer to Contributors list as this is work-for-hire. Includes some #ifdef DEBUG_jgmyers code from my work tree. The rest of that debugging code will come in a different patch to a different bug.

Attachment #201042 - Flags: review?(smontagu)

Simon Montagu :smontagu

Comment 25

•

20 years ago

Comment on attachment 201042: Corrected fix

r=smontagu

Attachment #201042 - Flags: review?(smontagu) → review+

John G. Myers

Assignee

Updated

•

20 years ago

Attachment #201042 - Flags: superreview?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 26

•

20 years ago

Comment on attachment 201042: Corrected fix

okay, but how hard would it be to merge the duplicate code into a single helper function?

Attachment #201042 - Flags: superreview?(roc) → superreview+

John G. Myers

Assignee

Comment 27

•

20 years ago

Fixed on trunk

Status: ASSIGNED → RESOLVED

Closed: 20 years ago → 20 years ago

Resolution: --- → FIXED

Gervase Markham [:gerv]

Comment 28

•

19 years ago

Verified using Amazon. jgmyers: British geeks thank you :-) I wonder how one goes about nominating this for checkin on the Firefox 2.0 track? Gerv

Status: RESOLVED → VERIFIED

Smokey Ardisson (offline for a while; not following bugs - do not email)

Comment 29

•

19 years ago

*** Bug 328456 has been marked as a duplicate of this bug. ***

Nobody; OK to take it and work on it

Updated

•

17 years ago

Product: Core → MailNews Core
