Closed
Bug 177505
Opened 22 years ago
Closed 19 years ago
Autodetect=Universal misidentifies some text as GB18030, and leaves Encoding menu in wrong state
Categories
(MailNews Core :: Internationalization, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: u32858, Assigned: jgmyers)
References
Details
Attachments
(3 files, 2 obsolete files)
29.71 KB, message/rfc822
1.58 KB, message/rfc822
4.62 KB, patch (smontagu: review+, roc: superreview+)
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2b) Gecko/20021029
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2b) Gecko/20021029
Emails whose charset is iso-8859-1 are not displayed correctly.
Reproducible: Always
Steps to Reproduce:
1. Get an HTML email.
Actual Results: The £ (pound) signs appear as ?? and kanji are scattered occasionally through the other letters.
Expected Results: The message should display correctly in iso-8859-1. Clicking View -> Charset -> iso-8859-1 (selecting it again) fixes the problem, but since "iso-8859-1" is already highlighted this should not be necessary.
First bug submitted in UTF-8.
reporter: what locale are you on? is your auto-detect turned on or off? the email messages that are not displayed correctly are not mime encoded? could you please attach a problematic mail to this bug report? thanks.
> ------- Additional Comments From marina@netscape.com 2002-10-30 09:23 -------
> reporter: what locale are you on? is your auto-detect turned on or off? the
> email messages that are not displayed correctly are not mime encoded? could you
> please attach a problematic mail to this bug report? thanks.
Hi Marina,
autodetect was on "universal", it selected iso-8859-1 as highlighted. If I turn it off it uses the default iso-8859-1 I set for all message display in the prefs. Perhaps this is an autodetect bug. The email was MIME encoded.
regards JG
$ locale
LANG=en_GB.UTF-8
LC_CTYPE=ja_JP.UTF-8
LC_NUMERIC=en_GB.UTF-8
LC_TIME=en_GB.UTF-8
LC_COLLATE=en_GB.UTF-8
LC_MONETARY=en_GB.UTF-8
LC_MESSAGES=en_GB.UTF-8
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
Received: from y01.blackstar.co.uk ([212.250.176.31]) by mail1.tay.ac.uk with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2656.59) id VTJMJC8K; Tue, 29 Oct 2002 19:33:38 -0000
Received: (qmail 4331 invoked by uid 1008); 29 Oct 2002 12:37:46 -0000
Date: 29 Oct 2002 12:37:46 -0000
Message-ID: <20021029123746.4330.qmail@y01.blackstar.co.uk>
Content-Transfer-Encoding: 7bit
Content-Type: multipart/alternative; boundary="_----------=_10358950661610159692"
MIME-Version: 1.0
X-Mailer: MIME::Lite 1.135 (B2.12; Q2.03)
From: update@blackstar.co.uk
To: 0013499@tay.ac.uk
Reply-To: service@blackstar.co.uk
Subject: Who Killed Laura Palmer?
This is a multi-part message in MIME format.
--_----------=_10358950661610159692
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
Content-Type: text/plain
Comment 4•22 years ago
|
||
shanjian, ftang said this might be related to your recent work on charsets
Comment 5•22 years ago
|
||
"Anything that can go wrong will go wrong." This problem is not caused by my recent change; it is a problem in the universal detector. Before doing multibyte detection, I remove all ASCII characters that are not adjacent to a high 8-bit byte. The aim is to improve performance. At that time, GB18030 had not been added yet. In GB18030, the sequence 0x81~0xfe, 0x30~0x39, 0x81~0xfe, 0x30~0x39 is a four-byte character. Because such a sequence appears in "almost" no other encoding, I report it immediately. But in this testcase the filtering itself helps create such a sequence, namely 0xa3,0x34,0xa3,0x31, which leads the universal detector to mislabel the text as GB18030. This problem can probably be fixed easily by not reporting immediately for such a sequence. Keeping an additional character after each high 8-bit byte would also help eliminate GB18030 from consideration.
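The false positive described in comment 5 can be reproduced with a small Python sketch. It is not the detector's code: `filter_keep_one` and `looks_like_gb18030_four_byte` are hypothetical helpers, the sample text is invented, and the filter is assumed to keep each high-bit byte plus the single ASCII byte that follows it.

```python
def filter_keep_one(data: bytes) -> bytes:
    """Keep high-bit bytes plus the one ASCII byte right after each (assumed old behaviour)."""
    out = bytearray()
    keep_next = False
    for b in data:
        if b & 0x80:
            out.append(b)
            keep_next = True
        elif keep_next:
            out.append(b)
            keep_next = False
    return bytes(out)


def looks_like_gb18030_four_byte(data: bytes) -> bool:
    """True if any 4-byte window matches 0x81-0xFE, 0x30-0x39, 0x81-0xFE, 0x30-0x39."""
    for i in range(len(data) - 3):
        b1, b2, b3, b4 = data[i:i + 4]
        if (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
                and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
            return True
    return False


# Invented ISO-8859-1 sample with two pound signs, each followed by a digit.
sample = "Was £4.99, now £1.99".encode("iso-8859-1")
filtered = filter_keep_one(sample)

print(filtered.hex(" "))                       # a3 34 a3 31
print(looks_like_gb18030_four_byte(sample))    # False: the raw text has no such run
print(looks_like_gb18030_four_byte(filtered))  # True: the filter manufactured one
```

The raw message never contains the four-byte pattern; it only appears once the intervening ASCII bytes are stripped, which is why ordinary English text with pound prices triggers the misdetection.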
Assignee: nhotta → shanjian
Comment 6•20 years ago
|
||
This bug was not present in Mozilla 1.6. It appears in 1.7C1. I do not know whether it was present in versions before 1.6. This mail is probably wrongly formatted, but no error message is displayed. The problem is probably with "\n", the end-of-line characters. See the mail content and compare it with the result in Mozilla.
Updated•20 years ago
|
Attachment #104637 -
Attachment mime type: text/plain → message/rfc822
Updated•20 years ago
|
Attachment #147045 -
Attachment mime type: text/plain → message/rfc822
Comment 7•20 years ago
|
||
In attachment 104637 [details], the charset is not specified. As noted in comment 5, the actual character set that gets used, if Autodetect=Universal, is GB18030. However, the Encoding menu indicates that ISO-8859-1 has been selected, unlike successful cases of detection (e.g. Big5) -- but see bug 163272. Bug 181344 is about the same problem, in the browser. In attachment 147045 [details] (from Franck Depierre), the charset is *illegally* specified, as: Content-Type: text/plain; charset=iso.8859.1 The misdisplay of this message is a different problem, and I've opened bug 251634 for this. This problem does, however, also show the problem of the Encoding menu being incorrectly updated. Due to bug 129443, viewing a message/rfc822 file in the browser is very little help in getting to the root of the encoding problem: the display is still broken, but differently than in Mail/News.
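The two header situations in comment 7 can be illustrated with Python's standard `email` package. This is only a sketch: the message bodies are invented, and it shows what a MIME parser hands to the display code, not how Mail/News processes the header.

```python
from email import message_from_string

# Case 1: no charset parameter at all (as in attachment 104637).
no_charset = message_from_string(
    "Content-Type: text/plain\n\nPrice: 4.99\n"
)

# Case 2: a charset label that is not a registered name (as in attachment 147045).
odd_charset = message_from_string(
    "Content-Type: text/plain; charset=iso.8859.1\n\nPrice: 4.99\n"
)

# With no label the reader must fall back to a default or to autodetection.
print(no_charset.get_content_charset())

# With an unregistered label the raw string is still reported, and the
# reader has to decide whether to map it, ignore it, or autodetect.
print(odd_charset.get_content_charset())
```

In the first case the detector's guess is the only source of a charset, which is where the GB18030 misdetection matters; the second case is the separate labelling problem tracked in bug 251634.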
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Summary: email displayed as charset are iso-8859-1 are not displayed correctly → Autodetect=Universal misidentifies some text as GB18030, and leaves Encoding menu in wrong state
Comment 8•20 years ago
|
||
shanjian: are you still around, and able to produce a patch for this bug? It causes regular problems for UK users of Mozilla, such as myself :-) I can provide many more example URLs if you need them. Gerv
Comment 9•20 years ago
|
||
Bug 253849 opened for the failure of the Mail/News View|Encoding menu to update, which is a common symptom to some otherwise distinct Mail/News charset bugs. Recommend duping this bug to the generic Auto-Detect bug 181344.
Updated•20 years ago
|
Product: MailNews → Core
Comment 10•19 years ago
|
||
shanjian has not worked on Mozilla for 2 years and these bugs are still open. Marking them WONTFIX. If you want to reopen one, find a good owner first.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → WONTFIX
Comment 12•19 years ago
|
||
Mass re-opening of the bugs Frank Tang closed on Wednesday March 02 for no reason. All the spam is his fault; feel free to tar and feather him.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Comment 13•19 years ago
|
||
Reassigning Frank's old bugs to Jungshik Shin for triage. Sorry for the spam.
Assignee: nobody → jshin1987
Status: REOPENED → NEW
Assignee | ||
Comment 14•19 years ago
|
||
Comment 15•19 years ago
|
||
jgmyers: you are da man! This bug bites me daily, whenever I hit a site with a UK currency pound sign. Gerv
Assignee | ||
Comment 16•19 years ago
|
||
*** Bug 181344 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 17•19 years ago
|
||
Attachment #200801 -
Attachment is obsolete: true
Attachment #200808 -
Flags: review?(smontagu)
Attachment #200801 -
Flags: review?(smontagu)
Comment 18•19 years ago
|
||
Comment on attachment 200808 [details] [diff] [review] Corrected proposed fix Can you add some explanation of how this fixes the bug? Is it the approach suggested in comment 5?
Assignee | ||
Comment 19•19 years ago
|
||
Comment 5 has two suggestions. The patch chooses to do only the second: after a high-bit octet I feed the next two non-high-bit octets to the lower detectors. The previous code would only feed the next one non-high-bit octet. I also got rid of the malloc and made the algorithm for what gets removed be unaffected by the input buffer block boundaries.
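The approach described in comment 19 can be sketched in Python as a filter that keeps state between calls. This is not the patch itself: `HighBitFilter`, `keep_after`, and the sample text are hypothetical, and the real detector code is organised differently.

```python
class HighBitFilter:
    """Keep high-bit bytes plus up to `keep_after` ASCII bytes following each one.

    The countdown lives on the instance, so the result does not depend on
    where the caller happens to split the input into blocks.
    """

    def __init__(self, keep_after: int = 2):
        self.keep_after = keep_after
        self._remaining = 0  # ASCII bytes still to pass through

    def feed(self, chunk: bytes) -> bytes:
        out = bytearray()
        for b in chunk:
            if b & 0x80:
                out.append(b)
                self._remaining = self.keep_after
            elif self._remaining > 0:
                out.append(b)
                self._remaining -= 1
        return bytes(out)


sample = "Was £4.99, now £1.99".encode("iso-8859-1")

old = HighBitFilter(keep_after=1).feed(sample)
new = HighBitFilter(keep_after=2).feed(sample)
print(old.hex(" "))  # a3 34 a3 31
print(new.hex(" "))  # a3 34 2e a3 31 2e

# Splitting the input anywhere gives the same filtered bytes.
for cut in range(len(sample) + 1):
    f = HighBitFilter(keep_after=2)
    assert f.feed(sample[:cut]) + f.feed(sample[cut:]) == new
```

With two trailing bytes kept, the `.` after each digit separates the two pound-sign pairs, so the filtered stream no longer contains a run that matches the GB18030 four-byte pattern.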
Comment 20•19 years ago
|
||
I'm not seeing any difference in the testcases in here or the duplicate bugs.
Assignee | ||
Comment 21•19 years ago
|
||
They're detecting as windows-1252 for me. Perhaps you're somehow loading an old version of a shared library?
Comment 22•19 years ago
|
||
That could be, because I didn't build at top level. I'll test again, but I'm afraid it won't be before Sunday.
Assignee | ||
Comment 23•19 years ago
|
||
Comment on attachment 200808 [details] [diff] [review] Corrected proposed fix This isn't feeding 8bit data to the MBCS probers when the 8bit data isn't followed by non-8bit data.
Attachment #200808 -
Attachment is obsolete: true
Attachment #200808 -
Flags: review?(smontagu)
Assignee | ||
Comment 24•19 years ago
|
||
Corrects an off-by-one in the buffer length when passing data to the lower probers. Sends data to lower prober when last char of buffer is 8bit. Adds my employer to Contributors list as this is work-for-hire. Includes some #ifdef DEBUG_jgmyers code from my work tree. The rest of that debugging code will come in a different patch to a different bug.
Attachment #201042 -
Flags: review?(smontagu)
Comment 25•19 years ago
|
||
Comment on attachment 201042 [details] [diff] [review] Corrected fix r=smontagu
Attachment #201042 -
Flags: review?(smontagu) → review+
Assignee | ||
Updated•19 years ago
|
Attachment #201042 -
Flags: superreview?(roc)
Comment on attachment 201042 [details] [diff] [review] Corrected fix okay, but how hard would it be to merge the duplicate code into a single helper function?
Attachment #201042 -
Flags: superreview?(roc) → superreview+
Assignee | ||
Comment 27•19 years ago
|
||
Fixed on trunk
Status: ASSIGNED → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
Comment 28•19 years ago
|
||
Verified using Amazon. jgmyers: British geeks thank you :-) I wonder how one goes about nominating this for checkin on the Firefox 2.0 track? Gerv
Status: RESOLVED → VERIFIED
*** Bug 328456 has been marked as a duplicate of this bug. ***
Updated•16 years ago
|
Product: Core → MailNews Core