Closed Bug 301915 (opened 19 years ago, closed 9 years ago)

Improve charset/encoding Auto Detect to handle ISO-8859-2

Categories: Core :: Internationalization
Type: enhancement
Priority: Not set
Severity: normal

Status: RESOLVED WONTFIX

People: Reporter: petr; Assignee: smontagu

Attachments: 1 file

User-Agent:       Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.7.10) Gecko/20050723
Build Identifier: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.7.10) Gecko/20050723

If the default character encoding is set to ISO-8859-2 and Auto-Detect is set to
Universal, messages that have no charset in their headers (for example, those
generated by Outlook Express) and are written in ISO-8859-2 are displayed as
ISO-8859-1 instead.

Reproducible: Always

Steps to Reproduce:
1. Set the default character encoding to ISO-8859-2
2. Set Auto-Detect to Universal
3. Open a newsgroup such as news://microsoft.public.cs.windows
4. Find a message that contains accented characters but lacks a correct MIME
   charset header

Actual Results:  
Accented characters are displayed as if the text were ISO-8859-1.

Expected Results:  
Accented characters should be displayed as ISO-8859-2.

To display the message in the right charset, it is necessary to change the
default charset to some other value and then back to ISO-8859-2, separately for
every message, or to switch Auto-Detect off.

The behavior is the same on Windows 98 and Windows XP.

Please find an example message, save it as a .EML file, and attach it to this 
bug (using the Create New Attachment link above).

I believe Auto Detect/Universal only identifies ISO-8859-1 vs. Win-1252 vs. 
UTF-8, as well as some subset of various Chinese, Russian and perhaps some other 
encodings.  If so, detection of the various flavors of 8859 would be an 
enhancement, and one I'm guessing would be fairly difficult to implement.
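One reason telling the 8859 flavors apart is hard: ISO-8859-1 and ISO-8859-2 are both single-byte encodings in which every byte value is defined, so decoding never fails and a detector has to judge which result looks more like real text. The sketch below uses a crude letter-frequency heuristic to illustrate that idea. It is an assumption-laden toy, not how Mozilla's Universal detector works; `guess_charset`, `score`, and the hand-picked `TYPICAL` letter sets are hypothetical.

```python
# A crude letter-frequency heuristic for telling ISO-8859-1 from ISO-8859-2.
# Illustrative only: this is NOT how Mozilla's Universal detector works.

CANDIDATES = ("iso-8859-1", "iso-8859-2")

# Assumed sets of "typical" accented letters per charset (hand-picked, incomplete).
TYPICAL = {
    "iso-8859-1": set("àáâäçèéêëíîïñòóôöùúûüß"),
    "iso-8859-2": set("áäčďéěíľňóôřšťúůýžąęłńśźż"),
}


def score(text: str, charset: str) -> int:
    """Count characters of `text` that are typical for `charset`."""
    letters = TYPICAL[charset]
    return sum(1 for ch in text.lower() if ch in letters)


def guess_charset(raw: bytes, default: str = "iso-8859-2") -> str:
    """Return the candidate charset whose decoding looks most plausible."""
    # Every byte 0x00-0xFF is defined in both charsets, so decode() never
    # fails here; only the resulting characters differ.
    scores = {cs: score(raw.decode(cs), cs) for cs in CANDIDATES}
    best = max(scores, key=scores.get)
    # On a tie (e.g. plain ASCII text), keep the user's default.
    if scores[best] == scores.get(default, -1):
        return default
    return best


czech = "Příliš žluťoučký kůň úpěl ďábelské ódy".encode("iso-8859-2")
french = "café crème à la française".encode("iso-8859-1")
print(guess_charset(czech))           # iso-8859-2
print(guess_charset(french))          # iso-8859-1
print(guess_charset(b"plain ASCII"))  # iso-8859-2 (tie, falls back to default)
```

Short or mostly-ASCII messages give such a heuristic very little evidence, which is why the tie-break to the user's default matters.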

FWIW, I run with Auto Detect turned off.

Assignee: mail → smontagu
Component: MailNews: Main Mail Window → Internationalization
OS: Windows XP → All
Product: Mozilla Application Suite → Core
QA Contact: amyy
Hardware: PC → All
Summary: Incorrect characrter encoding used for messages without character encoding in the header → Incorrect character encoding used for messages without character encoding in the header
Version: unspecified → Trunk

This message is displayed incorrectly as ISO-8859-1 when the default charset is
ISO-8859-2 and Auto-Detect is set to Universal.

(In reply to comment #1)
> Please find an example message, save it as a .EML file, and attach it to this 
> bug (using the Create New Attachment link above).
> 
Done. In this text, the ISO-8859-2 character 0xB9 (U+0161, LATIN SMALL LETTER S
WITH CARON) is displayed as 3/4 several times.
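The mapping cited above can be checked with Python's standard codecs: decode the single byte 0xB9 under each charset and print the resulting code point and Unicode name. The helper name `describe` is just for illustration. The point is that the same byte names a different character in each ISO-8859 part, so choosing the wrong part changes the letter.

```python
# Decode the single byte 0xB9 under two ISO-8859 parts using Python's codecs.
import unicodedata


def describe(byte: int, charset: str) -> str:
    """Return 'U+XXXX NAME' for `byte` decoded as `charset`."""
    ch = bytes([byte]).decode(charset)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for cs in ("iso-8859-2", "iso-8859-1"):
    print(cs, describe(0xB9, cs))
# iso-8859-2 U+0161 LATIN SMALL LETTER S WITH CARON
# iso-8859-1 U+00B9 SUPERSCRIPT ONE
```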
> I believe Auto Detect/Universal only identifies ISO-8859-1 vs. Win-1252 vs. 
> UTF-8, as well as some subset of various Chinese, Russian and perhaps some other 
> encodings.  If so, detection of the various flavors of 8859 would be an 
> enhancement, and one I'm guessing would be fairly difficult to implement.
> 
I don't know the exact concept behind the "Auto-Detect" and "Default" character
encoding settings, but I would expect that if ISO-8859-2 is my default and a
message is in ISO-8859-2 (without proper MIME headers), Auto-Detect should not
change it to ISO-8859-1. It is not necessary to distinguish between the various
flavors of ISO-8859, but then the default one should be chosen.

> FWIW, I run with Auto Detect turned off.

Of course, this is possible, but I don't think it is very practical when there is
a mix of UTF-8 and ISO-8859-2 messages.

See bug 115114, especially comments 6 and 14. However, since we now have a
Latin-1 detector, we might want to experiment with turning the Latin-2 detector
back on.

(In reply to comment #3)
> I don't know the exact concept behind the "Auto-Detect" and "Default" character
> encoding settings, but I would expect that if ISO-8859-2 is my default and a
> message is in ISO-8859-2 (without proper MIME headers), Auto-Detect should not
> change it to ISO-8859-1.

The encoding detected by autodetection has higher priority than your default
encoding.
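
The precedence stated above can be modeled as a small function. This is a sketch under stated assumptions, not Mozilla's code: it assumes an explicit MIME charset outranks both detection and the default (the bug concerns messages that lack one), and the names `choose_charset`, `LATIN_SINGLE_BYTE`, and the `prefer_default_for_latin` flag are hypothetical. The flag models the reporter's suggestion that when detection only yields some single-byte Latin charset, the user's default should be chosen; it is not an existing preference.

```python
from typing import Optional

# Hypothetical model of charset selection; NOT Mozilla's actual implementation.
# Assumed set of single-byte Latin charsets a detector might confuse.
LATIN_SINGLE_BYTE = {"iso-8859-1", "windows-1252", "iso-8859-2", "iso-8859-15"}


def choose_charset(header_charset: Optional[str],
                   detected: Optional[str],
                   default: str,
                   prefer_default_for_latin: bool = False) -> str:
    """Pick a charset: MIME header, then auto-detection, then the default."""
    # 1. An explicit charset from the MIME header wins (assumed).
    if header_charset:
        return header_charset
    # 2. The auto-detected encoding outranks the user's default...
    if detected:
        # ...unless the hypothetical flag is set and both the detected charset
        # and the default are single-byte Latin charsets (reporter's proposal).
        if (prefer_default_for_latin
                and detected in LATIN_SINGLE_BYTE
                and default in LATIN_SINGLE_BYTE):
            return default
        return detected
    # 3. Nothing better is known: fall back to the default.
    return default


# The situation in this bug: no header, detector says Latin-1, default Latin-2.
print(choose_charset(None, "iso-8859-1", "iso-8859-2"))        # iso-8859-1
print(choose_charset(None, "iso-8859-1", "iso-8859-2", True))  # iso-8859-2
```

Keeping the override behind a flag preserves the stated behavior (detection beats the default) while showing where the reporter's proposed fallback would fit; a UTF-8 detection still wins either way.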

There are other ISO-8859-x schemes for Latin-based alphabets, such as ISO-8859-15,
that might be included under this enhancement.

Severity: normal → enhancement
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Incorrect character encoding used for messages without character encoding in the header → Improve charset/encoding Auto Detect to handle ISO-8859-2
Blocks: 264871
QA Contact: amyy → i18n

The "universal" detector is gone.

Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX