Last Comment Bug 90584 - charset=... must be applied to non-MIME Subject:/From:/To:/etc. fields
: charset=... must be applied to non-MIME Subject:/From:/To:/etc. fields
Status: RESOLVED FIXED
: fixed1.8.1.15, intl
Product: MailNews Core
Classification: Components
Component: Internationalization (show other bugs)
: Trunk
: All All
: -- enhancement with 17 votes (vote)
: mozilla1.9beta4
Assigned To: Alex Strelnikov
:
Mentors:
news://news.regetsoft.com/regetsoft.p...
: 73202 97314 154948 196734 201037 214510 228133 273196 351541 356778 382886 409677 (view as bug list)
Depends on: 425945
Blocks: 176238
  Show dependency treegraph
 
Reported: 2001-07-12 18:13 PDT by Eugene Savitsky
Modified: 2010-08-30 11:58 PDT (History)
35 users (show)
asa: blocking1.4b-
mscott: blocking‑thunderbird2-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Screen shot of the newsgroup. (123.38 KB, image/jpeg)
2001-07-16 18:16 PDT, nhottanscp
no flags Details
russian newsgroup message with auto-detect=Russian (13.02 KB, image/png)
2002-04-12 06:38 PDT, Helge Hielscher
no flags Details
Typical e-mail message (655 bytes, text/plain)
2002-07-12 04:19 PDT, Max Alekseyev
no flags Details
a possible patch (1.98 KB, patch)
2003-12-13 07:56 PST, Jungshik Shin
no flags Details | Diff | Review
update (10.19 KB, patch)
2003-12-13 16:47 PST, Jungshik Shin
no flags Details | Diff | Review
the current status : screenshot (95.32 KB, image/jpeg)
2003-12-21 00:29 PST, Jungshik Shin
no flags Details
Charset patch (5.18 KB, patch)
2007-11-25 11:58 PST, Alex Strelnikov
no flags Details | Diff | Review
Charset patch #try1 (5.33 KB, patch)
2007-11-26 06:15 PST, Alex Strelnikov
no flags Details | Diff | Review
Charset patch #2 (7.22 KB, patch)
2008-01-29 11:22 PST, Alex Strelnikov
mozilla: review+
mozilla: superreview+
Details | Diff | Review
Charset patch backport for MOZILLA_1_8_BRANCH (7.47 KB, patch)
2008-02-07 02:01 PST, Alex Strelnikov
mozilla: review-
Details | Diff | Review
Charset patch backport for MOZILLA_1_8_BRANCH #2 (7.77 KB, patch)
2008-02-07 23:27 PST, Alex Strelnikov
no flags Details | Diff | Review
Charset patch backport for MOZILLA_1_8_BRANCH #3 (7.76 KB, patch)
2008-02-08 08:10 PST, Alex Strelnikov
mozilla: review+
mozilla: superreview+
samuel.sidler+old: approval1.8.1.13-
dveditz: approval1.8.1.15+
Details | Diff | Review

Description Eugene Savitsky 2001-07-12 18:13:06 PDT
1. Set character coding to Autodetect->Russian.
2. Open this news group. 
3. List thrue messages

Some subjects are not in cyrillic than in smth else. 

If you see message with (for expl: from Mikhail Lyubkin, 02.06.01 16:39) with ?????????????? in the body (see bug 90581) and the subject is also corrupted so make it manually KOI8-R and the subject will appear correctly but only in the message head.
Comment 1 nhottanscp 2001-07-13 10:16:21 PDT
Could you attach the news message?
Also please specify name of the newsgroup, thanks.
Comment 2 Eugene Savitsky 2001-07-15 13:47:27 PDT
Just look at the URL I typed-in this bug:
news://news.regetsoft.com/regetsoft.public.regetjr.support.ru

There are just some messages, so wouldn't miss it.
Comment 3 nhottanscp 2001-07-16 18:15:22 PDT
In that newsgroup, I see some messages which does not have charset specified.
On my machine they are shown as garbage (I will attach a screen shot) but that 
is expected. The user has to change charset manually per message or set the 
default charset by either pref UI or folder property UI.
Comment 4 nhottanscp 2001-07-16 18:16:00 PDT
Created attachment 42492 [details]
Screen shot of the newsgroup.
Comment 5 nhottanscp 2001-07-17 14:26:18 PDT
I did not realize that this is about auto detection.
Auto detection is not applied to message headers because usually not enough data 
available also the cost of setting up auto detection is high.
Comment 6 Helge Hielscher 2002-04-12 06:24:54 PDT
This does not only affect the subject, but all headers. Also it doesnt work when
you are actually viewing a message.
Comment 7 Helge Hielscher 2002-04-12 06:38:50 PDT
Created attachment 78891 [details]
russian newsgroup message with auto-detect=Russian
Comment 8 Helge Hielscher 2002-04-12 06:46:55 PDT
see also the same message with koi8-r encoding
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=78892
from
http://bugzilla.mozilla.org/show_bug.cgi?id=137098
Comment 9 Max Alekseyev 2002-07-12 03:46:43 PDT
Mozilla always uses codepage specified in Message Display properties instead of
specified in the message header.
This is not related to Autodetect feature.
Comment 10 Max Alekseyev 2002-07-12 03:49:05 PDT
For 'Subject' displaying Mozilla always uses codepage specified in 'Message
Display' properties instead of specified in the message header.
This is not related to 'Autodetect' feature.
Comment 11 Max Alekseyev 2002-07-12 04:19:11 PDT
Created attachment 91092 [details]
Typical e-mail message

I see the same problem on OS/2 build 2002071108, so please switch OS->All.

The problem is not only for newsgroups, but also for regular e-mails. A typical
e-mail message is attached. It has
"Content-type: text/plain; charset=windows-1251"
in the header but Mozilla does not apply this for the Subject.

I think summary like
"Codepage used for message displaying must be applied for Subject also"
will be much better.
Comment 12 Max Alekseyev 2002-07-12 21:35:49 PDT
Status changes:
OS: Windows98 -> All
Summary: 'Subject codepage is incorrect with charset auto detect is on.' ->
'Codepage used for message displaying must be applied for Subject also'
Severity: normal -> major
Comment 13 OS2 adept 2002-07-12 23:20:51 PDT
I've experienced the same problem here. Agree with comment #11 and #12
Comment 14 Wojciech Czarnowski 2002-09-17 04:17:47 PDT
I agree with comment #11 and #12.
Message subject should be displayed according to the setting in the header, eg.
"Content-type: text/plain; charset=iso-8859-2" 
in message preview panel and window and also in message folder.
Comment 15 Anton Shevchenko 2002-09-18 00:47:33 PDT
Some additional comment:

when replying to message, mozilla displays the CORRECT subject in both cases:
when subject is not correctly displayed and when subject is correctly displayed.
this problem is actual for 'sender' field also.
Comment 16 Max Alekseyev 2002-12-04 02:26:34 PST
Changing Summary according to recent comments.
Comment 17 Max Alekseyev 2003-01-12 02:04:25 PST
*** Bug 97314 has been marked as a duplicate of this bug. ***
Comment 18 Dmitry Ban 2003-03-18 21:16:55 PST
IMHO the following sequence should be used:

1) if charset of subject is specified (like 'Subject:
=?KOI8-R?Q?=F0=D2=CF=C2=CC=C5=CD=D9_=D3_=D0=CF=DE=D4=CF=CA?='), this charset
should be used to show subject;

2) if charset of subject is not specified, charset of the message (specified in
'Content-Type' header) should be used to show subject;

3) if none is specified, defaults/autodetectiion should be used.
Comment 19 Eugene Savitsky 2003-03-19 12:26:25 PST
Is this actually the same as bug 196734? Here is about news, but symptomps are
the same...
Comment 20 Max Alekseyev 2003-03-19 12:35:08 PST
*** Bug 196734 has been marked as a duplicate of this bug. ***
Comment 21 Max Alekseyev 2003-03-19 12:39:05 PST
I'm pretty sure that there is no distinction between mail and news in respect to
this bug.
Comment 22 Eugene Savitsky 2003-03-20 04:07:18 PST
CC'ing from duped bug.
Comment 23 Mike Cowperthwaite 2003-06-09 11:52:01 PDT
*** Bug 154948 has been marked as a duplicate of this bug. ***
Comment 24 Jungshik Shin 2003-10-22 23:36:28 PDT
re comment #18,

1) and 3) work in Mozilla. The issue is #2. I also listed in the international
known issues part  (of the release notes)
Comment 25 Max Alekseyev 2003-10-23 13:01:59 PDT
Jungshik, I didn't find any references to this bug in Mozilla 1.5 release notes
http://www.mozilla.org/releases/mozilla1.5/known-issues-int.html
Am I looking at a wrong place?
Comment 26 Jungshik Shin 2003-10-23 16:09:51 PDT
IT's in 'Mail' section as quoted below:

> The Character Coding menu cannot correct the thread pane display of non-MIME
encoded headers. (See > RFC 2047.) If the message displayed lacks MIME charset
information, the thread pane display obeys > the default message display charset
set via the folder properties dialog (set via the menu "Edit | > Folder
Properties ..."). Correctly setting this option to the charset most often used
in the 
> user's mail viewing helps avoid display problems of this type.

It's not as direct as you may like (my draft explicitly mentioned it if my
memory serves me right). While word-smithing, we left that detail out. For most
people, this wouldn't be a serious problem. However, I realize that for
Russians, it's rather important because multiple (legacy) character encodings
(Windows-1251, KOI8-R  and ISO-8859-5are used with more or less the same
frequency.  
Anyway, I'll refer to this bug in the document.
Comment 27 Jungshik Shin 2003-10-23 17:02:41 PDT
adding some more mailnews gurus to cc. This bug should be easier to fix than
making the header listing pane respond 'dynamically' to changes made via View |
Character Coding menu.
|Mime_DecodeMimeHeader| (that's a wrapper over
nsIMIMEHeaderParam::decodeRFC2047Header [1]) has a parameter for the default
charset.  


[1]
http://lxr.mozilla.org/seamonkey/source/netwerk/mime/src/nsMIMEHeaderParamImpl.cpp#360
Comment 28 Jungshik Shin 2003-12-11 10:43:17 PST
*** Bug 228133 has been marked as a duplicate of this bug. ***
Comment 29 Jungshik Shin 2003-12-11 10:44:11 PST
*** Bug 201037 has been marked as a duplicate of this bug. ***
Comment 30 Andrey Cherezov 2003-12-12 07:33:54 PST
From http://www.mozilla.org/releases/mozilla1.6b/known-issues-
int.html#mail: "If the message displayed lacks MIME charset information, the 
thread pane display obeys the default message display charset set via the 
folder properties dialog (set via the menu "Edit | Folder Properties ..."). 
Correctly setting this option to the charset most often used in the user's 
mail viewing helps avoid display problems of this type."

But the message NOT lacks MIME charset information! As I posted at the 
http://bugzilla.mozilla.org/show_bug.cgi?id=228133, there is charset in the 
header: "Content-type: text/plain; charset=windows-1251". And the message body 
displayed right. Thunderbird should use the _same_ charset for the headers. 
But it didn't :(

Setting the default charset of the folder to "most often used" not helps, 
because 55% of the headers have "koi8-r" charset, and 45% - "windows-1251". So 
we'll unable to read a half of the 8-bit headers. If Thunderbird will apply 
the rule "for 8bit headers use the same charset as for message body; and use 
default charset only if there no explicit charset in the message body or 
headers" - we will able to read 99.5% of the headers and messages w/o touching 
menu.

So, I agree with  Jungshik Shin 2003-10-23 16:09 :
"I realize that for Russians, it's rather important because multiple (legacy) 
character encodings (Windows-1251, KOI8-R  and ISO-8859-5 are used with more 
or less the same frequency." ISO-8859-5 is not used in the Russian/Cyrillic 
mail now, but Windows-1251 and KOI8-R used with the same frequency, and this 
is bad idea to select one of them as default. Most often charset is mentioned 
for a body, but just ignored by Mozilla/Thunderbird for a headers.

Any chance this problem will be solved in Thunderbird? This bug is the only 
reason why we still using Outlook.
Comment 31 Jungshik Shin 2003-12-12 10:49:26 PST
Please, read RFC 2047 and RFC 2822. Content-Type header is NOT for specifying
the charset of messgeage header fields BUT ONLY for specifying the charset of
the message __body__.

To specify the charset of the messgae header, you should encode your message
header according to RFC 2047 as below:

Subject: =?KOI8-R?Q?........?=

8bit characters in RFC 822/2822 (STD 11) emails are __forbidden__ forever !!!
Comment 32 Jungshik Shin 2003-12-12 11:03:14 PST
As it stands, Mozilla/Thunderbird ARE standard-compliant, but those sending
their emails with raw 8bit characters violate RFC 2822/RFC 2047. [1]

We want to 'fix' this bug not because those messages with 8bit characters are
standard-compliant but because we believe in a network maxim 'be lenient in what
you accept and be strict in what you send out'. 
Please, ask your correspondents to configure their email clients NOT to send out
8bit characters in email headers. Major email clients can be configured that way
if RFC 2047-compliance is not ON by default. With two widely used encodings,
Russians have all the more incensitve to abide by RFC 2047/2822. We'll "fix"
this bug, but in the meantime, why don't you spread the words to your fellow
Russians? 

[1] http://www.faqs.org/rfcs/rfc2047.html
    http://www.faqs.org/rfcs/rfc2822.html

Comment 33 Jungshik Shin 2003-12-12 11:31:21 PST
> 8bit characters in RFC 822/2822 (STD 11) emails are __forbidden__ forever !!!

  Ooops. I meant '8bit characters in the __header__ of RFC 822/2822(STD 11)
email messages are forbidden'. In the message _body_, needless to say,  8bit
characters can be used with Content-Transfer-Encoding: 8bit when ESMTP 8BITMIME
is supported.

As I'm now sitting in front of Windows 2k box, let me tell you how to configure
MS OE to send RFC 2047-compliant emails:

1. Go to Tools | Options. 
2. Click on 'Send' tab. 
3. Click on 'HTML settings' and make sure 'allow 8-bit characters in headers' is
NOT checked
4. Repeat the above for 'Plain text setting' for both mail and news and 'html
setting' for news

I can't find the equivalent option in Outlook, but all the  messages I received
from Ms Outlook users  are compliant to RFC 2047 so that there must be a way to
do this (or it's not user-configurable, but it always sends RFC 2047-compliant
emails, which is a bit atypical of MS  given its usual ignorance/disrespect
of/for the standard)

In case of Mozilla, RFC 2047-compliance is ON by default. Perhaps because it's
regarded as important to observe RFC 2047 (or because it's regarded as a 'bad'
UI), the checkbox that used to be available in Edit|preference|Mail|Send for
turning that off is now gone. You have to go to 'about:config' and set to
'false' 'mail.strictly_mime-headers'. Of course, I recommend you keep the
default (ON).
Comment 34 Max Alekseyev 2003-12-12 11:47:19 PST
Here is my penny. 

Personally, I'm not going to teach everybody in the net not send 8-bit headers.
But there a lot of such messages (including newsgroups), and I want Mozilla to
support them. Note that such support does NOT violate any RFC. We're not asking
about sending 8-bit headers that would be a violation. We're asking about
ability to read them.
Comment 35 Jungshik Shin 2003-12-12 12:10:03 PST
Please, read my comment #32 (the first paragraph). You're free not to teach
everybody to abide by the standard, but that kind of disrespect for the standard
hurts the internet as a whole. 
Comment 36 Jungshik Shin 2003-12-12 12:11:15 PST
This is not a bug per se, but just an enhancement request.
Comment 37 Jungshik Shin 2003-12-12 12:18:16 PST
Sorry for spamming...

> Please, read my comment #32 (the first paragraph)

 I meant the second paragraph of comment #32 (we believe in a network maxim 'be
lenient in what
you accept and be strict in what you send out)
Comment 38 Andrey Cherezov 2003-12-12 13:29:35 PST
Dear Jungshik Shin,
Mozilla browser is also standard compliant, BUT you've spend a lot of time to 
support most of non-startdart legacy html tricks, bugs, old encodings 
(including even Cyrillic DOS-encoding, not used in the internet at all ;) - 
just because you WANT TO READ (be able to read at least) all these old sites. 
And you don't want to teach each site author - "hey, you have wrong formatted 
html - there and there...". And we also unable re-teach 10 millions of russian 
internet users and 100 000 old mail scripts around the world. These scripts 
know nothing about RFC2047 and "default Cyrillic encoding". For example, now I 
will write something in Russian ('Ìàêñ, ãëÿäè-êà, îíè òóò ïðîñòî ðóññêèõ íå 
ëþáÿò :)') - and I'm sure, your modern Bugzilla will send me this mail with 
wrong encoding! :-)) Hey guys, please fix your Bugzilla scrpits now! :) They 
are not compliant with MIME RFCs!
May be you should better just fix this small issue in the Thunderbird, and it 
will become really "The e-mail and newsgroup client for 2004 and beyond" - for 
Russians too.

Thanks, 
Regards,
Comment 39 Jungshik Shin 2003-12-12 17:05:37 PST
Hey, how many times did I have to repeat? Don't you understand what I
meant by that we believe in a maxim that 'be lenient in what we accept
and be strict in what you send out'(comment #32) [1] or did you decide
to ignore that?  Please, press 'show votes for this bugs' button and
see who voted for this bug. I had voted for this bug before most of
you did. See also who wrote 'international known issues' document(press
'Document history' button at the bottom of the page). I co-authored it
with Kats and it's I who insisted that this issue had to be listed.

[1] This doesn't mean that you can keep violating the standard. Some
people believe that we must be strict in what you accept as well because
otherwise non-standard products would never be fixed. I'm not one of
them (at least in this case). I AM willing to fix this bug. If not,
I would have resolved this bug as 'INVALID or WONT FIX'.

You have to/must 'enlighten' people around you about the standard and
the importance of adhering to the standard. Your time would be much
better spent teaching one more friend/correspondent of yours and/or
authors/administrators of web mail scripts/web mail services about the
standard than making me keep saying the same thing over and over again.
A couple of years ago, I wrote a couple of times to the largest Korean
web mail service provider and was successful in persuading them to fix
their web mail service to be compliant to RFC 2047. Please, do something
instead of just saying that there are too many non-standard email messages
going around.

Incidentally, as opposed to what you wrote, Mozilla rejects quite a lot of
non-standard constructs (in html/css/dom) as invalid/illegal/non-standard.
That's why  we have 'tech-evangelism' components to teach people (web
page authors and web site administrators) what's wrong with their web
pages/sites and how to fix them.
Comment 40 Jungshik Shin 2003-12-13 07:56:16 PST
Created attachment 137356 [details] [diff] [review]
a possible patch

have yet to build a mozilla with this (my Win2k SP4 became terribly slow in
accessing my Samba server where Mozilla source is after installing SP4...)
Comment 41 Jungshik Shin 2003-12-13 16:47:23 PST
Created attachment 137375 [details] [diff] [review]
update

It still doesn't work. In the message display pane, it works, but in the
message list(thread) pane, it doesn't. I thgouth 'charset' stored in msgDB is
the body charset (for simple 'single part' messages), but it's just empty.
Comment 42 Jungshik Shin 2003-12-14 13:39:46 PST
*** Bug 73202 has been marked as a duplicate of this bug. ***
Comment 43 Jungshik Shin 2003-12-14 13:51:38 PST
It doesn't work (msgDB doesn't have the 'body charset' information stored)
because when we get to nsParseMailMessageState::ParseHeaders, |m_headers| has
only a part of the message header[1]. Especially, it doesn't have 'Content-Type'
header so that down in the method, the attempt to set |m_content_type| always
fails (actually, Mozilla never gets there [2]).

857     case 'C': case 'c':
858       if (!nsCRT::strncasecmp ("CC", buf, end - buf))
859         header = GetNextHeaderInAggregate(m_ccList);
860       else if (!nsCRT::strncasecmp ("Content-Type", buf, end - buf))
861         header = &m_content_type;
862       break;


I have to track down why m_headers.GetBuffeerPos() doesn't point to the end of
the header (excluding Content-Type and Content-Transfer-Encoding header fields)

837 int nsParseMailMessageState::ParseHeaders ()
838 {
839   char *buf = m_headers.GetBuffer();
840   char *buf_end = buf + m_headers.GetBufferPos();
841   while (buf < buf_end)
842   {

[1] http://lxr.mozilla.org/mozilla/source/mailnews/local/src/nsParseMailbox.cpp#839
[2] http://lxr.mozilla.org/mozilla/source/mailnews/local/src/nsParseMailbox.cpp#860

Assiging to myself for a better tracking (for me)
Comment 44 Max Alekseyev 2003-12-14 13:58:49 PST
btw, if charset= is not specified but autodetected for the body, it should be
applied to non-MIMEd header fields as well.

Jungshik, does your patch work this way?
Comment 45 Andrey Cherezov 2003-12-15 08:19:23 PST
Thank you very much for patch! How to test it? (I'm unable to recompile 
Mozilla Thunderbird under Windows).
Comment 46 Jungshik Shin 2003-12-19 08:47:14 PST
bienvenu and mscott,
can you help me with the problem described in comment #43?

re: comment #44. No, my patch doesn't address that. It's a different issue. And,
please 'enlighten' your friends to use standard-compliant mail clients. Not even
specifying the body charset (in C-T) is a gross violation of the standard. If
some Russian web mail services do that, file a tech-evangelism bug, put me in
the CC list and I'll write to them. If you're talking about
'multipart/alternative' and other C-T with text/* part(s), we have to deal with
it in a separate bug. Again, this shows why RFC 2047 has to be abided by and why
RFC 2047 stipulates that the character encoding be explicitly specified in each
header field. If everybody observed the standard, we'd dispense with so much
'defensive' stuffs like this. 

re: comment #45: sorry I don't have time to put up a binary build with the patch
for you to try. 
Comment 47 Jungshik Shin 2003-12-21 00:29:48 PST
Created attachment 137782 [details]
the current status : screenshot

The folder default charset is KOI8-R. Both messages in the folder have raw 8bit
characters in Subject header (the first one in KOI8-R and the second one in
Windows-1251). As you can see, the message list pane 'interprets' the subject
of the second as in KOI8-R while the subject in the message display area is
interpreted as in Windows-1251 (that of the body charset). 

I manually edited my mailbox to put 'Content-Type' header field before
'Subject' header field, but that didn't change anything.
Comment 48 Max Alekseyev 2004-03-05 21:05:14 PST
any chance to get this fixed in 1.7b ?
Comment 49 Katsuhiko Momoi 2004-03-05 23:39:00 PST
This is just a comment about the difficulty factor in adding this new behavior.
This is a nice-to-have addition but it may not be easy since the info you need
at the time you need it in the code may not be easy to get. The body thread and
header thread go in 2 different threads within the Mozilla code. See this
diagram for better understanding of the data flow:

http://wp.netscape.com/eng/intl/docs/iuc17/mail/slide05.html

and hope that Jungshik and others can figure out how to get this done
eventually. Thanks for trying it, Jungshik. 

Those who have any influence at all in the Mail and News fields of Russian,
please evangelize the use of MIME-compiant news posting software -- whether or
not this feature request gets done.
Comment 50 Andrey Cherezov 2004-03-06 03:36:26 PST
> The body thread and header thread go in 2 different threads within the 
Mozilla code.

But the required charset (for Subject) is present in the header! Not need to 
hack a body thread to fetch something.

This rule is easy: if the Subject contains 8-bit characters with unknown 
charset - get and use the charset from the 'charset=' parameter in 
the 'Content-Type' field of the message _header_. That's all !

If 'Content-Type:...charset=' in the header is missimg too - then use the 
Default charset.

> Those who have any influence at all in the Mail and News fields of Russian,
please evangelize the use of MIME-compiant news posting software -- whether or
not this feature request gets done.

You mean, we need to ask you to disable existing feature of posting messages 
with 8-bit headers in Mozilla?
Comment 51 Jungshik Shin 2004-03-09 08:13:35 PST
(In reply to comment #50)

> This rule is easy: if the Subject contains 8-bit characters with unknown 
> charset - get and use the charset from the 'charset=' parameter in 
> the 'Content-Type' field of the message _header_. That's all !

 Everybody here knows what this bug is about. You don't have to explain it again.
However, it's not that simple to implement that. Header fields in RFC (2)822 /
STD11 messages come in any order so that Content-Type heeader field come at the
very end while Subject, From come well before that. That's yet another rationale
for RFC 2047-encoding.I'm pretty sure that the authors of RFC 2047 had this in
mind because making each header field self-contained makes it the life of
implementors (e.g. Mozilla developers) so much easier.

 Anyway, what's needed here is a sort of 'callback' we can invoke to change the
header character encoding when necessary. 



> > Those who have any influence at all in the Mail and News fields of Russian,
> please evangelize the use of MIME-compiant news posting software -- whether or
> not this feature request gets done.
> 
> You mean, we need to ask you to disable existing feature of posting messages 
> with 8-bit headers in Mozilla?

It's disabled by default, isn't it? As I wrote a couple of times before, you
Russians have a more compelling reasons to abide by RFC 2047 than others because
you have two widely used character encodings.  So, why don't you evagelize that
as Kat suggested? 
 
Comment 52 Dmitriy Velikodniy 2004-09-18 13:32:17 PDT
Can anyone give an update on the status of this bug?
Comment 53 Mike Cowperthwaite 2005-02-11 06:48:41 PST
*** Bug 273196 has been marked as a duplicate of this bug. ***
Comment 54 Artem Kuroptev 2005-12-12 03:27:13 PST
I am using Thunderbird 1.5/20051111, and it seems like I have found a workaround for this bug. I am not sure if it will work with later versions, however I think it may be useful.

See demo image:
http://kitoy.ru/ak/thunderbird/thunderbird-broken-header-encoding-for-folders-workaround.gif

The workaround:
Set windows-1251 as a default encoding for Inbox.
Create a new subfolder named koi and create a filter to move all koi-encoded messages into it. Set koi8-r as a default encoding for the koi folder.
Create a view (Inbox-test Mail in the picture) which shows all messages from Inbox and koi folders. 
Done.

However, there are two another bugs:
1.	“Sort by” do not work right for views.
2.	“Get flagged messages” do not work right for views.

Sorry for my English.
Comment 55 David Fraser 2006-09-18 07:16:42 PDT
bug 351541 may be a dupe of this
Comment 56 Mike Cowperthwaite 2006-11-11 11:54:03 PST
*** Bug 214510 has been marked as a duplicate of this bug. ***
Comment 57 Eugene Savitsky 2006-11-12 17:52:02 PST
I do still experience this bug in 3.0 nightlies. Any change to fix this for 3.0?

There is no way for nominating bugs for 3.0 yet, but this affects many languages and is defenetly worth be nominated for 3.0.
Comment 58 Frédéric Buclin 2006-11-14 02:30:00 PST
Why waiting for Tb 3.0? We didn't release Tb2.0b1 yet, I see no reason to wait for 3.0.
Comment 59 Jungshik Shin 2006-11-14 03:54:37 PST
I like to fix this bug for TB 2.0, but I'm entirely tied up until the end of December. If someone can come forward and fix this before TB 2.0, it'd be great. If it's not fixed by then, I'll take another look early next year. 


Comment 60 Mike Cowperthwaite 2006-11-14 09:00:06 PST
*** Bug 351541 has been marked as a duplicate of this bug. ***
Comment 61 Mike Cowperthwaite 2007-01-08 19:55:50 PST
*** Bug 356778 has been marked as a duplicate of this bug. ***
Comment 62 Scott MacGregor 2007-01-30 17:05:29 PST
Looks like we didn't get any traction on this and time is starting to run out to make Thunderbird 2.
Comment 63 Phil Ringnalda (:philor) 2007-06-01 16:19:42 PDT
*** Bug 382886 has been marked as a duplicate of this bug. ***
Comment 64 Alex Strelnikov 2007-11-25 11:58:23 PST
Created attachment 290131 [details] [diff] [review]
Charset patch
Comment 65 Alex Strelnikov 2007-11-25 12:00:41 PST
Message charset should be used to display subject, sender and recipient correctly if specified. Otherwise default folder charset will be used.
Comment 66 Alexander L. Slovesnik 2007-11-25 12:22:22 PST
(In reply to comment #64)
> Created an attachment (id=290131) [details]
> Charset patch
Alex, you should request review for patch from specific person. See
http://developer.mozilla.org/en/docs/Getting_your_patch_in_the_tree#Getting_the_patch_reviewed
Comment 67 Alex Strelnikov 2007-11-26 06:15:16 PST
Created attachment 290213 [details] [diff] [review]
Charset patch #try1

Forgot to take into account "Charset Override", if the user changes charset manually.
Comment 68 David :Bienvenu 2007-11-27 07:33:06 PST
Jungshik, are you still tracking this, and does this look like what you had in mind?
Comment 69 WADA 2007-12-24 15:38:49 PST
*** Bug 409677 has been marked as a duplicate of this bug. ***
Comment 70 Alex Strelnikov 2008-01-29 11:22:50 PST
Created attachment 300087 [details] [diff] [review]
Charset patch #2

Synced with the latest cvs tree. Added new message alert charset fix.

base/util/nsMsgDBFolder.cpp - New message alert fix. Detect charset correctly if Content-Type header is multiline.

db/msgdb/src/nsMsgDatabase.cpp - Message list pane fix.
- mime/src/mimemsg.cpp - Message pane fix.
Use message header charset to display header fields in message panes.
Comment 71 David :Bienvenu 2008-02-04 11:39:44 PST
Comment on attachment 300087 [details] [diff] [review]
Charset patch #2

thx for the patch, Alex
Comment 72 Reed Loden [:reed] (use needinfo?) 2008-02-05 22:42:21 PST
Checking in mailnews/base/util/nsMsgDBFolder.cpp;
/cvsroot/mozilla/mailnews/base/util/nsMsgDBFolder.cpp,v  <--  nsMsgDBFolder.cpp
new revision: 1.328; previous revision: 1.327
done
Checking in mailnews/db/msgdb/src/nsMsgDatabase.cpp;
/cvsroot/mozilla/mailnews/db/msgdb/src/nsMsgDatabase.cpp,v  <--  nsMsgDatabase.cpp
new revision: 1.390; previous revision: 1.389
done
Checking in mailnews/mime/src/mimemsg.cpp;
/cvsroot/mozilla/mailnews/mime/src/mimemsg.cpp,v  <--  mimemsg.cpp
new revision: 1.60; previous revision: 1.59
done
Comment 73 Alex Strelnikov 2008-02-07 02:01:07 PST
Created attachment 301853 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH

Would be good to apply this patch to Thunderbird 2.0 (MOZILLA_1_8_BRANCH) too.
Comment 74 David :Bienvenu 2008-02-07 07:50:14 PST
Comment on attachment 301853 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH

I think you're leaking strings here:

+      err = RowCellColumnToCharPtr(row, m_messageCharSetColumnToken, &charSet);
+      if (NS_FAILED(err) || !*charSet || !PL_strcasecmp(charSet, "us-ascii") ||
+          characterSetOverride)
+      {
+        m_dbFolderInfo->GetCharPtrCharacterSet(&charSet);
+      }
+

In RowCellColumntoCharPtr allocates memory for the out string, which you're not freeing, and you're overwriting. And GetCharPtrCharacterSet also allocates memory.
Comment 75 Alex Strelnikov 2008-02-07 23:27:14 PST
Created attachment 302087 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH #2

Sorry, messed up things with svn HEAD in previous patch, added freeing code.
But actually there were 'charSet' out string leaks in nsMsgDatabase::RowCellColumnToMime2DecodedString and nsMsgDatabase::RowCellColumnToCollationKey which didn't free allocated string. Fixed too.
Comment 76 David :Bienvenu 2008-02-08 07:38:09 PST
Comment on attachment 302087 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH #2

Can you use PR_Free instead of PR_FREEIF? The difference is that PR_FREEIF nulls out charset, which you don't need (they both check for null args, ultimately).

Thx!
Comment 77 Alex Strelnikov 2008-02-08 08:10:26 PST
Created attachment 302128 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH #3

Changed patch to use PR_Free instead of PR_FREEIF. Thank for the hints.
Comment 78 David :Bienvenu 2008-02-08 10:43:57 PST
Comment on attachment 302128 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH #3

thx for the patch!

Nominating for 1.8.1.13, though it should get some trunk baking.
Comment 79 Daniel Veditz [:dveditz] 2008-02-13 11:25:55 PST
Comment on attachment 302128 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH #3

clearing branch approval request for now (not denying) -- please re-request after a week or two of trunk testing.
Comment 80 Alexander L. Slovesnik 2008-03-04 11:19:24 PST
Comment on attachment 302128 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH #3

That patch was baked on trunk for month with no visible regressions.
It fixes very old bug.
Comment 81 Samuel Sidler (old account; do not CC) 2008-03-07 11:06:35 PST
Comment on attachment 302128 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH #3

It's too late in the cycle to take this now. We'll look at it for 1.8.1.14.
Comment 82 Mike Cowperthwaite 2008-03-08 11:14:38 PST
FWIW, I've tested this on my batch of pathologically-encoded emails and it seems to work sanely in all cases.  I do have one bad message which looks worse with this patch than without, but that's because the header was hardcoded with ISO-8859-1 while the charset was ISO-2022-JP.  Nice work!
Comment 83 Daniel Veditz [:dveditz] 2008-03-26 11:33:11 PDT
Comment on attachment 302128 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH #3

approved for 1.8.1.14, a=dveditz for release-drivers
Comment 84 Reed Loden [:reed] (use needinfo?) 2008-03-28 02:00:39 PDT
MOZILLA_1_8_BRANCH:

Checking in mailnews/base/util/nsMsgDBFolder.cpp;
/cvsroot/mozilla/mailnews/base/util/nsMsgDBFolder.cpp,v  <--  nsMsgDBFolder.cpp
new revision: 1.257.2.27; previous revision: 1.257.2.26
done
Checking in mailnews/db/msgdb/src/nsMsgDatabase.cpp;
/cvsroot/mozilla/mailnews/db/msgdb/src/nsMsgDatabase.cpp,v  <--  nsMsgDatabase.cpp
new revision: 1.355.4.15; previous revision: 1.355.4.14
done
Checking in mailnews/mime/src/mimemsg.cpp;
/cvsroot/mozilla/mailnews/mime/src/mimemsg.cpp,v  <--  mimemsg.cpp
new revision: 1.55.28.1; previous revision: 1.55
done
Comment 85 Mike Kaply [:mkaply] (Out June 27-July 5) 2008-03-29 17:28:28 PDT
This appears to have caused a regression - bug 425945
Comment 86 Eyal Rozenberg 2008-03-30 11:36:34 PDT
This patch has corrected the display of some message subjects and broken the display of others:

https://bugzilla.mozilla.org/show_bug.cgi?id=425945

I don't see why the message body's charset 'must' be applied to the headers (and I couldn't find an explanation skimming through the comments here) - please explain the reason.
Comment 87 Alex Strelnikov 2008-03-31 01:09:53 PDT
In a few words message charset must be applied to the header fields because that's the only way to get messages with different charsets be displayed correctly in the one folder.
Comment 88 Eyal Rozenberg 2008-03-31 11:18:43 PDT
(In reply to comment #87)
But that's not true, the header can specify it's own encoding, e.g.

Subject: Message 4 =?windows-1255?Q?=E3=E1_=F9=F1_=E1=EE=F8=F7=E7?=
 =?windows-1255?Q?=E4=5D?=

shows up as

Message 4 דב שס במרקחה]

with or without this patch, regardless of the charset of any of the mime parts.

Also, applying the body charset to the header doesn't make messages with different charset display correctly, it makes _some_ such messages display correctly while breaking the display of others...

I therefore ask that this bug be reopened pending further discussion.
Comment 89 Andrey Cherezov 2008-03-31 11:35:04 PDT
Right. The message body's charset should be applied to headers ONLY if encoding isn't specified in the header encoding itself. I.e. only for "raw" headers, not encoded with base64 or QP.
Comment 90 Eyal Rozenberg 2008-03-31 11:46:01 PDT
I think that not even then. Since it's perfectly possible to specify a different charset for a header and for MIME parts, I see no good reason to assume raw non-ASCII headers should be interpreted using some MIME part's charset rather than the folder's default. If you were to make it a _preference_, that's something else, but that's not what you've done.

Alex: If you have some unruly non-standards-compliant messages you want to manipulate, write an extension or something, don't change the default behavior. Or how about this - why not create a folder with a different default charset and use a filter to put your messages there?
Comment 91 Frédéric Buclin 2008-03-31 11:52:58 PDT
(In reply to comment #90)
> I think that not even then. Since it's perfectly possible to specify a
> different charset for a header and for MIME parts, I see no good reason to
> assume raw non-ASCII headers should be interpreted using some MIME part's
> charset rather than the folder's default.

This doesn't make sense either. This means the same email would be displayed differently depending on *your* default folder rather than making the assumption it's encoded using the same charset as the body of the email.

I get emails with different encodings quite often, and that's probably the best assumption the email client can do, IMHO. Filtering is a bad workaround, really.
Comment 92 Eyal Rozenberg 2008-03-31 12:04:28 PDT
(In reply to comment #91)
> This doesn't make sense either. This means the same email would be displayed
> differently depending on *your* default folder rather than making the
> assumption it's encoded using the same charset as the body of the email.

But that's exactly what per-folder default charsets are for.

> I get emails with different encodings quite often, and that's probably the 
> best assumption the email client can do, IMHO.

But _I_ only get emails which this change screws up, so this change seems to me like a much worse assumption for the email client make. And the reason I said it's _much_ worse is that at least you have the 'bad workaround' of filtering;
I don't don't _any_ workaround now. A workaround isn't that bad if people can use it without causing other people trouble.

If you need this change this bad, make it a preffed-off alternative.
Comment 93 Eyal Rozenberg 2008-03-31 12:06:54 PDT
I'm sorry, I spoke too soon. I do have an alternative, I can use the charset 'forcing' mechanism.

So I'm changing my request: Make the behavior change preffed-on.
Comment 94 Andrey Cherezov 2008-03-31 12:12:55 PDT
Eyal, default charset don't work for about half of messages.

This patch will solve all encoding problems, if it applied only for raw headers with unknown encoding.

Eyal> "Since it's perfectly possible to specify a different charset for a header and for MIME parts,..."

It is possible, but if different charsets are actually used in different MIME-parts - its encodings always specified in these parts (in other case nobody will be able to read it), and in this case Thunderbird should use them. Our talks are about some old programs - most often web scripts - which correctly encode body, but forget to encode Subject (because Outlook Express applied body's encoding to headers and all non-Mozilla readers can read it!!). With this patch Thunderbird will 'emulate' Outlook Express / Windows Mail behavior, and all will be happy.
Comment 95 Alex Strelnikov 2008-03-31 12:15:25 PDT
(In reply to comment #88)
> (In reply to comment #87)
> But that's not true, the header can specify it's own encoding, e.g.
> 
> Subject: Message 4 =?windows-1255?Q?=E3=E1_=F9=F1_=E1=EE=F8=F7=E7?=
>  =?windows-1255?Q?=E4=5D?=
> 
> shows up as
> 
> Message 4 דב שס במרקחה]

Eyal, please read carefully. ...'must be applied to non-MIME fields'...
You're talking about MIME encoded subject. Fields like in your example have priority and will be displayed correctly anyway.
Comment 96 Sergey Svishchev 2008-03-31 12:18:45 PDT
(In reply to comment #88)
> Also, applying the body charset to the header doesn't make messages with
> different charset display correctly, it makes _some_ such messages display
> correctly while breaking the display of others...

It improves display of messages with _known_ charset (koi8-r, windows-1251, etc
etc).  'iso-8859-I' isn't known, perhaps it should be made an alias to
'iso-8859-8-I'.

(In reply to comment #90)
> Alex: If you have some unruly non-standards-compliant messages you want to
> manipulate, write an extension or something, don't change the default behavior.

First, there are just too many of these 'unruly' messages out there.  Yeah,
they do not comply to RFC, but it's not MUA's job to enforce these -- its job
is to let users read their mail in their language.  The user won't go seeking
the magic extension, she'll just won't use the software -- especially when the
well-known alternative (MS Outlook) handles this just fine.

> Or how about this - why not create a folder with a different default charset
> and use a filter to put your messages there?

For each charset that you may receive mail in?  Ridiculous.
Comment 97 Karsten Düsterloh 2008-03-31 12:35:18 PDT
(In reply to comment #89)
> Right. The message body's charset should be applied to headers ONLY if encoding
> isn't specified in the header encoding itself. I.e. only for "raw" headers, not
> encoded with base64 or QP.

Wait a second - you are applying some random MIME part charset to perfectly valid ASCII header? This surely will fail terribly if the applied charset is not "ASCII-synced" like eg. Unicode is, for instance if the MIME part encoding is EBCDIC...
Comment 98 Simon Montagu :smontagu 2008-03-31 12:41:51 PDT
(In reply to comment #96)
> 'iso-8859-I' isn't known, perhaps it should be made an alias to
> 'iso-8859-8-I'.

I did wonder about that when reading earlier comments. Do we have any idea how common it is and whether it only occurs when iso-8859-8-I is intended? In theory it might also be an error for iso-8859-6-I.

If it only occurs in mails from a particular forum, it's presumably a configuration error, and would be better handled by reporting to the forum admins.
Comment 99 Andrey Cherezov 2008-03-31 12:56:56 PDT
Karsten> Wait a second - you are applying some random MIME part charset to perfectly valid ASCII header?

No. ASCII is really perfectly valid and shouldn't be touched by this patch. Raw headers (Cyrillic, Unicode, etc) has non-ASCII 8-bit chars.

if(Is8bit(header))apply_patch();
Comment 100 Eyal Rozenberg 2008-03-31 13:01:36 PDT
I'd like to add a piece of info for those who aren't aware: Many MIME part body
(as opposed to header) charset issues, and practically all such issues relevant
to email in Hebrew and Arabic, are handled by an extension I co-wrote, which
does further auto-detection work over Mozilla's, and also takes care of
mixed-charset cases:

https://addons.mozilla.org/en-US/thunderbird/addon/310

so for users like me, the charset defaults for folders effectively 'strongly'
apply only to the subjects and not anything else...

Sergey Svishchev: If this is so much of a problem that you simply _must_ change
the default behavior, then you have to change it so as to accommodate what
currently works well and not break it. How? Do some auto-detection on the
subject line before deciding which charset to use for it; at least don't give
me subject lines which are all \uFFFD's are or all àéùé's (accented latin
letters) etc. Or think of something else. You can't just break some
non-RFC-compliant messages in favor of others (which, by the way, Outlook also
handles fine, and after your change you lose Outlook parity on).
Comment 101 Andrey Cherezov 2008-03-31 13:05:07 PDT
Simon> If it only occurs in mails from a particular forum, it's presumably a
configuration error, and would be better handled by reporting to the forum
admins.

No. There are millions of such forums, CMS, etc-etc (including used on your site). For example, ALL PhpBB forums. It is impossible to ask all existend admins to fix their scripts.

Thunderbird users just wants to read their mail, not to fix entire Internet bugs.
Comment 102 Eyal Rozenberg 2008-03-31 13:09:39 PDT
(In reply to comment #101)
> For example, ALL PhpBB forums. 

Hey, that's my example too :-) ... your change is messing up my messages from phpBB fora. Except mine have both the body charset wrong and the header charset unspecified.
Comment 103 Alexander L. Slovesnik 2008-03-31 13:23:56 PDT
(In reply to comment #100)
> I'd like to add a piece of info for those who aren't aware: Many MIME part body
> (as opposed to header) charset issues, and practically all such issues relevant
> to email in Hebrew and Arabic, are handled by an extension I co-wrote, which
> does further auto-detection work over Mozilla's, and also takes care of
> mixed-charset cases:
> 
> https://addons.mozilla.org/en-US/thunderbird/addon/310
> 
> so for users like me, the charset defaults for folders effectively 'strongly'
> apply only to the subjects and not anything else...
> 
Your extension managed to fix subject lines in the message header display, but doesn't managed to fix them in list of messages in a folder, as you've said in Bug 425945 comment #0
So my guess, that either your extension should be fixed to take care of subject lines in the message list, or additional patch needed to display subject lines in message header and message list consistenly.  

Comment 104 Alexander L. Slovesnik 2008-03-31 13:45:55 PDT
(In reply to comment #96)
> 'iso-8859-I' isn't known, perhaps it should be made an alias to
> 'iso-8859-8-I'.
I just tested it. I've added attachment from Bug 425945 comment #7 to Thunderbird Mail folder. Then I've opened %thunderbirdprogramfolder%/res/charsetalias.properties, found section:

# Aliases for ISO-8859-8-I
#
csiso88598i=ISO-8859-8-I
iso-8859-8i=ISO-8859-8-I

and added string:
iso-8859-i=ISO-8859-8-I

Then I've restarted Thunderbird and attachment from Bug 425945 comment #7 was shown properly.
Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.8.1.14pre) Gecko/20080329 Thunderbird/2.0.0.14pre ID:2008032903

Note You need to log in before you can comment on or make changes to this bug.