Closed Bug 90584 Opened 23 years ago Closed 16 years ago

charset=... must be applied to non-MIME Subject:/From:/To:/etc. fields

Categories

(MailNews Core :: Internationalization, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.9beta4

People

(Reporter: ezh, Assigned: lelik52)

References

(Depends on 1 open bug, Blocks 1 open bug, )

Details

(Keywords: fixed1.8.1.15, intl)

Attachments

(6 files, 6 obsolete files)

1. Set character coding to Autodetect->Russian.
2. Open this news group. 
3. List thrue messages

Some subjects are not in cyrillic than in smth else. 

If you see message with (for expl: from Mikhail Lyubkin, 02.06.01 16:39) with ?????????????? in the body (see bug 90581) and the subject is also corrupted so make it manually KOI8-R and the subject will appear correctly but only in the message head.
Could you attach the news message?
Also please specify name of the newsgroup, thanks.
Keywords: intl
Just look at the URL I typed-in this bug:
news://news.regetsoft.com/regetsoft.public.regetjr.support.ru

There are just some messages, so wouldn't miss it.
In that newsgroup, I see some messages which does not have charset specified.
On my machine they are shown as garbage (I will attach a screen shot) but that 
is expected. The user has to change charset manually per message or set the 
default charset by either pref UI or folder property UI.
I did not realize that this is about auto detection.
Auto detection is not applied to message headers because usually not enough data 
available also the cost of setting up auto detection is high.
Status: NEW → ASSIGNED
Summary: Subject codepage is incorrect → Subject codepage is incorrect with charset auto detect is on.
Target Milestone: --- → Future
This does not only affect the subject, but all headers. Also it doesnt work when
you are actually viewing a message.
Mozilla always uses codepage specified in Message Display properties instead of
specified in the message header.
This is not related to Autodetect feature.
For 'Subject' displaying Mozilla always uses codepage specified in 'Message
Display' properties instead of specified in the message header.
This is not related to 'Autodetect' feature.
Attached file Typical e-mail message
I see the same problem on OS/2 build 2002071108, so please switch OS->All.

The problem is not only for newsgroups, but also for regular e-mails. A typical
e-mail message is attached. It has
"Content-type: text/plain; charset=windows-1251"
in the header but Mozilla does not apply this for the Subject.

I think summary like
"Codepage used for message displaying must be applied for Subject also"
will be much better.
Status changes:
OS: Windows98 -> All
Summary: 'Subject codepage is incorrect with charset auto detect is on.' ->
'Codepage used for message displaying must be applied for Subject also'
Severity: normal -> major
Severity: normal → major
OS: Windows 98 → All
Hardware: Other → All
Summary: Subject codepage is incorrect with charset auto detect is on. → Codepage used for message displaying must be applied for Subject also
Target Milestone: Future → ---
I've experienced the same problem here. Agree with comment #11 and #12
I agree with comment #11 and #12.
Message subject should be displayed according to the setting in the header, eg.
"Content-type: text/plain; charset=iso-8859-2" 
in message preview panel and window and also in message folder.
Some additional comment:

when replying to message, mozilla displays the CORRECT subject in both cases:
when subject is not correctly displayed and when subject is correctly displayed.
this problem is actual for 'sender' field also.
Blocks: advocacybugs
Changing Summary according to recent comments.
Blocks: 176238
No longer blocks: advocacybugs
Summary: Codepage used for message displaying must be applied for Subject also → charset=... must be applied for Subject:/From:/To:/etc. fields
*** Bug 97314 has been marked as a duplicate of this bug. ***
Summary: charset=... must be applied for Subject:/From:/To:/etc. fields → charset=... must be applied to non-MIME Subject:/From:/To:/etc. fields
IMHO the following sequence should be used:

1) if charset of subject is specified (like 'Subject:
=?KOI8-R?Q?=F0=D2=CF=C2=CC=C5=CD=D9_=D3_=D0=CF=DE=D4=CF=CA?='), this charset
should be used to show subject;

2) if charset of subject is not specified, charset of the message (specified in
'Content-Type' header) should be used to show subject;

3) if none is specified, defaults/autodetectiion should be used.
Is this actually the same as bug 196734? Here is about news, but symptomps are
the same...
*** Bug 196734 has been marked as a duplicate of this bug. ***
I'm pretty sure that there is no distinction between mail and news in respect to
this bug.
CC'ing from duped bug.
Flags: blocking1.4b?
Flags: blocking1.4b? → blocking1.4b-
*** Bug 154948 has been marked as a duplicate of this bug. ***
re comment #18,

1) and 3) work in Mozilla. The issue is #2. I also listed in the international
known issues part  (of the release notes)
Jungshik, I didn't find any references to this bug in Mozilla 1.5 release notes
http://www.mozilla.org/releases/mozilla1.5/known-issues-int.html
Am I looking at a wrong place?
IT's in 'Mail' section as quoted below:

> The Character Coding menu cannot correct the thread pane display of non-MIME
encoded headers. (See > RFC 2047.) If the message displayed lacks MIME charset
information, the thread pane display obeys > the default message display charset
set via the folder properties dialog (set via the menu "Edit | > Folder
Properties ..."). Correctly setting this option to the charset most often used
in the 
> user's mail viewing helps avoid display problems of this type.

It's not as direct as you may like (my draft explicitly mentioned it if my
memory serves me right). While word-smithing, we left that detail out. For most
people, this wouldn't be a serious problem. However, I realize that for
Russians, it's rather important because multiple (legacy) character encodings
(Windows-1251, KOI8-R  and ISO-8859-5are used with more or less the same
frequency.  
Anyway, I'll refer to this bug in the document.
adding some more mailnews gurus to cc. This bug should be easier to fix than
making the header listing pane respond 'dynamically' to changes made via View |
Character Coding menu.
|Mime_DecodeMimeHeader| (that's a wrapper over
nsIMIMEHeaderParam::decodeRFC2047Header [1]) has a parameter for the default
charset.  


[1]
http://lxr.mozilla.org/seamonkey/source/netwerk/mime/src/nsMIMEHeaderParamImpl.cpp#360
*** Bug 228133 has been marked as a duplicate of this bug. ***
*** Bug 201037 has been marked as a duplicate of this bug. ***
From http://www.mozilla.org/releases/mozilla1.6b/known-issues-
int.html#mail: "If the message displayed lacks MIME charset information, the 
thread pane display obeys the default message display charset set via the 
folder properties dialog (set via the menu "Edit | Folder Properties ..."). 
Correctly setting this option to the charset most often used in the user's 
mail viewing helps avoid display problems of this type."

But the message NOT lacks MIME charset information! As I posted at the 
http://bugzilla.mozilla.org/show_bug.cgi?id=228133, there is charset in the 
header: "Content-type: text/plain; charset=windows-1251". And the message body 
displayed right. Thunderbird should use the _same_ charset for the headers. 
But it didn't :(

Setting the default charset of the folder to "most often used" not helps, 
because 55% of the headers have "koi8-r" charset, and 45% - "windows-1251". So 
we'll unable to read a half of the 8-bit headers. If Thunderbird will apply 
the rule "for 8bit headers use the same charset as for message body; and use 
default charset only if there no explicit charset in the message body or 
headers" - we will able to read 99.5% of the headers and messages w/o touching 
menu.

So, I agree with  Jungshik Shin 2003-10-23 16:09 :
"I realize that for Russians, it's rather important because multiple (legacy) 
character encodings (Windows-1251, KOI8-R  and ISO-8859-5 are used with more 
or less the same frequency." ISO-8859-5 is not used in the Russian/Cyrillic 
mail now, but Windows-1251 and KOI8-R used with the same frequency, and this 
is bad idea to select one of them as default. Most often charset is mentioned 
for a body, but just ignored by Mozilla/Thunderbird for a headers.

Any chance this problem will be solved in Thunderbird? This bug is the only 
reason why we still using Outlook.
Please, read RFC 2047 and RFC 2822. Content-Type header is NOT for specifying
the charset of messgeage header fields BUT ONLY for specifying the charset of
the message __body__.

To specify the charset of the messgae header, you should encode your message
header according to RFC 2047 as below:

Subject: =?KOI8-R?Q?........?=

8bit characters in RFC 822/2822 (STD 11) emails are __forbidden__ forever !!!
As it stands, Mozilla/Thunderbird ARE standard-compliant, but those sending
their emails with raw 8bit characters violate RFC 2822/RFC 2047. [1]

We want to 'fix' this bug not because those messages with 8bit characters are
standard-compliant but because we believe in a network maxim 'be lenient in what
you accept and be strict in what you send out'. 
Please, ask your correspondents to configure their email clients NOT to send out
8bit characters in email headers. Major email clients can be configured that way
if RFC 2047-compliance is not ON by default. With two widely used encodings,
Russians have all the more incensitve to abide by RFC 2047/2822. We'll "fix"
this bug, but in the meantime, why don't you spread the words to your fellow
Russians? 

[1] http://www.faqs.org/rfcs/rfc2047.html
    http://www.faqs.org/rfcs/rfc2822.html

> 8bit characters in RFC 822/2822 (STD 11) emails are __forbidden__ forever !!!

  Ooops. I meant '8bit characters in the __header__ of RFC 822/2822(STD 11)
email messages are forbidden'. In the message _body_, needless to say,  8bit
characters can be used with Content-Transfer-Encoding: 8bit when ESMTP 8BITMIME
is supported.

As I'm now sitting in front of Windows 2k box, let me tell you how to configure
MS OE to send RFC 2047-compliant emails:

1. Go to Tools | Options. 
2. Click on 'Send' tab. 
3. Click on 'HTML settings' and make sure 'allow 8-bit characters in headers' is
NOT checked
4. Repeat the above for 'Plain text setting' for both mail and news and 'html
setting' for news

I can't find the equivalent option in Outlook, but all the  messages I received
from Ms Outlook users  are compliant to RFC 2047 so that there must be a way to
do this (or it's not user-configurable, but it always sends RFC 2047-compliant
emails, which is a bit atypical of MS  given its usual ignorance/disrespect
of/for the standard)

In case of Mozilla, RFC 2047-compliance is ON by default. Perhaps because it's
regarded as important to observe RFC 2047 (or because it's regarded as a 'bad'
UI), the checkbox that used to be available in Edit|preference|Mail|Send for
turning that off is now gone. You have to go to 'about:config' and set to
'false' 'mail.strictly_mime-headers'. Of course, I recommend you keep the
default (ON).
Here is my penny. 

Personally, I'm not going to teach everybody in the net not send 8-bit headers.
But there a lot of such messages (including newsgroups), and I want Mozilla to
support them. Note that such support does NOT violate any RFC. We're not asking
about sending 8-bit headers that would be a violation. We're asking about
ability to read them.
Please, read my comment #32 (the first paragraph). You're free not to teach
everybody to abide by the standard, but that kind of disrespect for the standard
hurts the internet as a whole. 
This is not a bug per se, but just an enhancement request.
Severity: major → enhancement
Sorry for spamming...

> Please, read my comment #32 (the first paragraph)

 I meant the second paragraph of comment #32 (we believe in a network maxim 'be
lenient in what
you accept and be strict in what you send out)
Dear Jungshik Shin,
Mozilla browser is also standard compliant, BUT you've spend a lot of time to 
support most of non-startdart legacy html tricks, bugs, old encodings 
(including even Cyrillic DOS-encoding, not used in the internet at all ;) - 
just because you WANT TO READ (be able to read at least) all these old sites. 
And you don't want to teach each site author - "hey, you have wrong formatted 
html - there and there...". And we also unable re-teach 10 millions of russian 
internet users and 100 000 old mail scripts around the world. These scripts 
know nothing about RFC2047 and "default Cyrillic encoding". For example, now I 
will write something in Russian ('Ìàêñ, ãëÿäè-êà, îíè òóò ïðîñòî ðóññêèõ íå 
ëþáÿò :)') - and I'm sure, your modern Bugzilla will send me this mail with 
wrong encoding! :-)) Hey guys, please fix your Bugzilla scrpits now! :) They 
are not compliant with MIME RFCs!
May be you should better just fix this small issue in the Thunderbird, and it 
will become really "The e-mail and newsgroup client for 2004 and beyond" - for 
Russians too.

Thanks, 
Regards,
Hey, how many times did I have to repeat? Don't you understand what I
meant by that we believe in a maxim that 'be lenient in what we accept
and be strict in what you send out'(comment #32) [1] or did you decide
to ignore that?  Please, press 'show votes for this bugs' button and
see who voted for this bug. I had voted for this bug before most of
you did. See also who wrote 'international known issues' document(press
'Document history' button at the bottom of the page). I co-authored it
with Kats and it's I who insisted that this issue had to be listed.

[1] This doesn't mean that you can keep violating the standard. Some
people believe that we must be strict in what you accept as well because
otherwise non-standard products would never be fixed. I'm not one of
them (at least in this case). I AM willing to fix this bug. If not,
I would have resolved this bug as 'INVALID or WONT FIX'.

You have to/must 'enlighten' people around you about the standard and
the importance of adhering to the standard. Your time would be much
better spent teaching one more friend/correspondent of yours and/or
authors/administrators of web mail scripts/web mail services about the
standard than making me keep saying the same thing over and over again.
A couple of years ago, I wrote a couple of times to the largest Korean
web mail service provider and was successful in persuading them to fix
their web mail service to be compliant to RFC 2047. Please, do something
instead of just saying that there are too many non-standard email messages
going around.

Incidentally, as opposed to what you wrote, Mozilla rejects quite a lot of
non-standard constructs (in html/css/dom) as invalid/illegal/non-standard.
That's why  we have 'tech-evangelism' components to teach people (web
page authors and web site administrators) what's wrong with their web
pages/sites and how to fix them.
Attached patch a possible patch (obsolete) — Splinter Review
have yet to build a mozilla with this (my Win2k SP4 became terribly slow in
accessing my Samba server where Mozilla source is after installing SP4...)
Attached patch update (obsolete) — Splinter Review
It still doesn't work. In the message display pane, it works, but in the
message list(thread) pane, it doesn't. I thgouth 'charset' stored in msgDB is
the body charset (for simple 'single part' messages), but it's just empty.
Attachment #137356 - Attachment is obsolete: true
*** Bug 73202 has been marked as a duplicate of this bug. ***
It doesn't work (msgDB doesn't have the 'body charset' information stored)
because when we get to nsParseMailMessageState::ParseHeaders, |m_headers| has
only a part of the message header[1]. Especially, it doesn't have 'Content-Type'
header so that down in the method, the attempt to set |m_content_type| always
fails (actually, Mozilla never gets there [2]).

857     case 'C': case 'c':
858       if (!nsCRT::strncasecmp ("CC", buf, end - buf))
859         header = GetNextHeaderInAggregate(m_ccList);
860       else if (!nsCRT::strncasecmp ("Content-Type", buf, end - buf))
861         header = &m_content_type;
862       break;


I have to track down why m_headers.GetBuffeerPos() doesn't point to the end of
the header (excluding Content-Type and Content-Transfer-Encoding header fields)

837 int nsParseMailMessageState::ParseHeaders ()
838 {
839   char *buf = m_headers.GetBuffer();
840   char *buf_end = buf + m_headers.GetBufferPos();
841   while (buf < buf_end)
842   {

[1] http://lxr.mozilla.org/mozilla/source/mailnews/local/src/nsParseMailbox.cpp#839
[2] http://lxr.mozilla.org/mozilla/source/mailnews/local/src/nsParseMailbox.cpp#860

Assiging to myself for a better tracking (for me)
Assignee: nhottanscp → jshin
Status: ASSIGNED → NEW
btw, if charset= is not specified but autodetected for the body, it should be
applied to non-MIMEd header fields as well.

Jungshik, does your patch work this way?
Thank you very much for patch! How to test it? (I'm unable to recompile 
Mozilla Thunderbird under Windows).
bienvenu and mscott,
can you help me with the problem described in comment #43?

re: comment #44. No, my patch doesn't address that. It's a different issue. And,
please 'enlighten' your friends to use standard-compliant mail clients. Not even
specifying the body charset (in C-T) is a gross violation of the standard. If
some Russian web mail services do that, file a tech-evangelism bug, put me in
the CC list and I'll write to them. If you're talking about
'multipart/alternative' and other C-T with text/* part(s), we have to deal with
it in a separate bug. Again, this shows why RFC 2047 has to be abided by and why
RFC 2047 stipulates that the character encoding be explicitly specified in each
header field. If everybody observed the standard, we'd dispense with so much
'defensive' stuffs like this. 

re: comment #45: sorry I don't have time to put up a binary build with the patch
for you to try. 
The folder default charset is KOI8-R. Both messages in the folder have raw 8bit
characters in Subject header (the first one in KOI8-R and the second one in
Windows-1251). As you can see, the message list pane 'interprets' the subject
of the second as in KOI8-R while the subject in the message display area is
interpreted as in Windows-1251 (that of the body charset). 

I manually edited my mailbox to put 'Content-Type' header field before
'Subject' header field, but that didn't change anything.
any chance to get this fixed in 1.7b ?
This is just a comment about the difficulty factor in adding this new behavior.
This is a nice-to-have addition but it may not be easy since the info you need
at the time you need it in the code may not be easy to get. The body thread and
header thread go in 2 different threads within the Mozilla code. See this
diagram for better understanding of the data flow:

http://wp.netscape.com/eng/intl/docs/iuc17/mail/slide05.html

and hope that Jungshik and others can figure out how to get this done
eventually. Thanks for trying it, Jungshik. 

Those who have any influence at all in the Mail and News fields of Russian,
please evangelize the use of MIME-compiant news posting software -- whether or
not this feature request gets done.
> The body thread and header thread go in 2 different threads within the 
Mozilla code.

But the required charset (for Subject) is present in the header! Not need to 
hack a body thread to fetch something.

This rule is easy: if the Subject contains 8-bit characters with unknown 
charset - get and use the charset from the 'charset=' parameter in 
the 'Content-Type' field of the message _header_. That's all !

If 'Content-Type:...charset=' in the header is missimg too - then use the 
Default charset.

> Those who have any influence at all in the Mail and News fields of Russian,
please evangelize the use of MIME-compiant news posting software -- whether or
not this feature request gets done.

You mean, we need to ask you to disable existing feature of posting messages 
with 8-bit headers in Mozilla?
(In reply to comment #50)

> This rule is easy: if the Subject contains 8-bit characters with unknown 
> charset - get and use the charset from the 'charset=' parameter in 
> the 'Content-Type' field of the message _header_. That's all !

 Everybody here knows what this bug is about. You don't have to explain it again.
However, it's not that simple to implement that. Header fields in RFC (2)822 /
STD11 messages come in any order so that Content-Type heeader field come at the
very end while Subject, From come well before that. That's yet another rationale
for RFC 2047-encoding.I'm pretty sure that the authors of RFC 2047 had this in
mind because making each header field self-contained makes it the life of
implementors (e.g. Mozilla developers) so much easier.

 Anyway, what's needed here is a sort of 'callback' we can invoke to change the
header character encoding when necessary. 



> > Those who have any influence at all in the Mail and News fields of Russian,
> please evangelize the use of MIME-compiant news posting software -- whether or
> not this feature request gets done.
> 
> You mean, we need to ask you to disable existing feature of posting messages 
> with 8-bit headers in Mozilla?

It's disabled by default, isn't it? As I wrote a couple of times before, you
Russians have a more compelling reasons to abide by RFC 2047 than others because
you have two widely used character encodings.  So, why don't you evagelize that
as Kat suggested? 
 
Can anyone give an update on the status of this bug?
Product: MailNews → Core
*** Bug 273196 has been marked as a duplicate of this bug. ***
I am using Thunderbird 1.5/20051111, and it seems like I have found a workaround for this bug. I am not sure if it will work with later versions, however I think it may be useful.

See demo image:
http://kitoy.ru/ak/thunderbird/thunderbird-broken-header-encoding-for-folders-workaround.gif

The workaround:
Set windows-1251 as a default encoding for Inbox.
Create a new subfolder named koi and create a filter to move all koi-encoded messages into it. Set koi8-r as a default encoding for the koi folder.
Create a view (Inbox-test Mail in the picture) which shows all messages from Inbox and koi folders. 
Done.

However, there are two another bugs:
1.	“Sort by” do not work right for views.
2.	“Get flagged messages” do not work right for views.

Sorry for my English.
bug 351541 may be a dupe of this
*** Bug 214510 has been marked as a duplicate of this bug. ***
I do still experience this bug in 3.0 nightlies. Any change to fix this for 3.0?

There is no way for nominating bugs for 3.0 yet, but this affects many languages and is defenetly worth be nominated for 3.0.
Why waiting for Tb 3.0? We didn't release Tb2.0b1 yet, I see no reason to wait for 3.0.
Flags: blocking-thunderbird2?
I like to fix this bug for TB 2.0, but I'm entirely tied up until the end of December. If someone can come forward and fix this before TB 2.0, it'd be great. If it's not fixed by then, I'll take another look early next year. 


Assignee: jshin1987 → smontagu
QA Contact: ji → mailnews.i18n
*** Bug 351541 has been marked as a duplicate of this bug. ***
Looks like we didn't get any traction on this and time is starting to run out to make Thunderbird 2.
Flags: blocking-thunderbird2? → blocking-thunderbird2-
Attached patch Charset patch (obsolete) — Splinter Review
Message charset should be used to display subject, sender and recipient correctly if specified. Otherwise default folder charset will be used.
(In reply to comment #64)
> Created an attachment (id=290131) [details]
> Charset patch
Alex, you should request review for patch from specific person. See
http://developer.mozilla.org/en/docs/Getting_your_patch_in_the_tree#Getting_the_patch_reviewed
Attachment #290131 - Flags: review?
Attachment #290131 - Flags: review? → review?(smontagu)
Attached patch Charset patch #try1 (obsolete) — Splinter Review
Forgot to take into account "Charset Override", if the user changes charset manually.
Attachment #290131 - Attachment is obsolete: true
Attachment #290131 - Flags: review?(smontagu)
Attachment #290213 - Flags: review?
Attachment #290213 - Flags: review? → review?(bienvenu)
Assignee: smontagu → lelik52
Jungshik, are you still tracking this, and does this look like what you had in mind?
Attached patch Charset patch #2Splinter Review
Synced with the latest cvs tree. Added new message alert charset fix.

base/util/nsMsgDBFolder.cpp - New message alert fix. Detect charset correctly if Content-Type header is multiline.

db/msgdb/src/nsMsgDatabase.cpp - Message list pane fix.
- mime/src/mimemsg.cpp - Message pane fix.
Use message header charset to display header fields in message panes.
Attachment #290213 - Attachment is obsolete: true
Attachment #290213 - Flags: review?(bienvenu)
Attachment #300087 - Flags: review+
Attachment #300087 - Flags: review+ → review?
Attachment #300087 - Flags: review? → review?(bienvenu)
Comment on attachment 300087 [details] [diff] [review]
Charset patch #2

thx for the patch, Alex
Attachment #300087 - Flags: superreview+
Attachment #300087 - Flags: review?(bienvenu)
Attachment #300087 - Flags: review+
Keywords: checkin-needed
Checking in mailnews/base/util/nsMsgDBFolder.cpp;
/cvsroot/mozilla/mailnews/base/util/nsMsgDBFolder.cpp,v  <--  nsMsgDBFolder.cpp
new revision: 1.328; previous revision: 1.327
done
Checking in mailnews/db/msgdb/src/nsMsgDatabase.cpp;
/cvsroot/mozilla/mailnews/db/msgdb/src/nsMsgDatabase.cpp,v  <--  nsMsgDatabase.cpp
new revision: 1.390; previous revision: 1.389
done
Checking in mailnews/mime/src/mimemsg.cpp;
/cvsroot/mozilla/mailnews/mime/src/mimemsg.cpp,v  <--  mimemsg.cpp
new revision: 1.60; previous revision: 1.59
done
Status: NEW → RESOLVED
Closed: 16 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9beta4
Attachment #137375 - Attachment is obsolete: true
Would be good to apply this patch to Thunderbird 2.0 (MOZILLA_1_8_BRANCH) too.
Attachment #301853 - Flags: review?(bienvenu)
Comment on attachment 301853 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH

I think you're leaking strings here:

+      err = RowCellColumnToCharPtr(row, m_messageCharSetColumnToken, &charSet);
+      if (NS_FAILED(err) || !*charSet || !PL_strcasecmp(charSet, "us-ascii") ||
+          characterSetOverride)
+      {
+        m_dbFolderInfo->GetCharPtrCharacterSet(&charSet);
+      }
+

In RowCellColumntoCharPtr allocates memory for the out string, which you're not freeing, and you're overwriting. And GetCharPtrCharacterSet also allocates memory.
Attachment #301853 - Flags: review?(bienvenu) → review-
Sorry, messed up things with svn HEAD in previous patch, added freeing code.
But actually there were 'charSet' out string leaks in nsMsgDatabase::RowCellColumnToMime2DecodedString and nsMsgDatabase::RowCellColumnToCollationKey which didn't free allocated string. Fixed too.
Attachment #301853 - Attachment is obsolete: true
Attachment #302087 - Flags: superreview?
Attachment #302087 - Flags: review?
Attachment #302087 - Flags: superreview?(bienvenu)
Attachment #302087 - Flags: superreview?
Attachment #302087 - Flags: review?(bienvenu)
Attachment #302087 - Flags: review?
Comment on attachment 302087 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH #2

Can you use PR_Free instead of PR_FREEIF? The difference is that PR_FREEIF nulls out charset, which you don't need (they both check for null args, ultimately).

Thx!
Changed patch to use PR_Free instead of PR_FREEIF. Thank for the hints.
Attachment #302087 - Attachment is obsolete: true
Attachment #302087 - Flags: superreview?(bienvenu)
Attachment #302087 - Flags: review?(bienvenu)
Attachment #302128 - Flags: superreview?
Attachment #302128 - Flags: review?
Attachment #302128 - Flags: superreview?(bienvenu)
Attachment #302128 - Flags: superreview?
Attachment #302128 - Flags: review?(bienvenu)
Attachment #302128 - Flags: review?
Comment on attachment 302128 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH #3

thx for the patch!

Nominating for 1.8.1.13, though it should get some trunk baking.
Attachment #302128 - Flags: superreview?(bienvenu)
Attachment #302128 - Flags: superreview+
Attachment #302128 - Flags: review?(bienvenu)
Attachment #302128 - Flags: review+
Attachment #302128 - Flags: approval1.8.1.13?
Comment on attachment 302128 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH #3

clearing branch approval request for now (not denying) -- please re-request after a week or two of trunk testing.
Attachment #302128 - Flags: approval1.8.1.13?
Comment on attachment 302128 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH #3

That patch was baked on trunk for month with no visible regressions.
It fixes very old bug.
Attachment #302128 - Flags: approval1.8.1.13?
Comment on attachment 302128 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH #3

It's too late in the cycle to take this now. We'll look at it for 1.8.1.14.
Attachment #302128 - Flags: approval1.8.1.14?
Attachment #302128 - Flags: approval1.8.1.13?
Attachment #302128 - Flags: approval1.8.1.13-
FWIW, I've tested this on my batch of pathologically-encoded emails and it seems to work sanely in all cases.  I do have one bad message which looks worse with this patch than without, but that's because the header was hardcoded with ISO-8859-1 while the charset was ISO-2022-JP.  Nice work!
Comment on attachment 302128 [details] [diff] [review]
Charset patch backport for MOZILLA_1_8_BRANCH #3

approved for 1.8.1.14, a=dveditz for release-drivers
Attachment #302128 - Flags: approval1.8.1.14? → approval1.8.1.14+
Whiteboard: [checkin needed (1.8 branch)]
MOZILLA_1_8_BRANCH:

Checking in mailnews/base/util/nsMsgDBFolder.cpp;
/cvsroot/mozilla/mailnews/base/util/nsMsgDBFolder.cpp,v  <--  nsMsgDBFolder.cpp
new revision: 1.257.2.27; previous revision: 1.257.2.26
done
Checking in mailnews/db/msgdb/src/nsMsgDatabase.cpp;
/cvsroot/mozilla/mailnews/db/msgdb/src/nsMsgDatabase.cpp,v  <--  nsMsgDatabase.cpp
new revision: 1.355.4.15; previous revision: 1.355.4.14
done
Checking in mailnews/mime/src/mimemsg.cpp;
/cvsroot/mozilla/mailnews/mime/src/mimemsg.cpp,v  <--  mimemsg.cpp
new revision: 1.55.28.1; previous revision: 1.55
done
Whiteboard: [checkin needed (1.8 branch)]
This appears to have caused a regression - bug 425945
Depends on: 425945
This patch has corrected the display of some message subjects and broken the display of others:

https://bugzilla.mozilla.org/show_bug.cgi?id=425945

I don't see why the message body's charset 'must' be applied to the headers (and I couldn't find an explanation skimming through the comments here) - please explain the reason.
In a few words message charset must be applied to the header fields because that's the only way to get messages with different charsets be displayed correctly in the one folder.
(In reply to comment #87)
But that's not true, the header can specify it's own encoding, e.g.

Subject: Message 4 =?windows-1255?Q?=E3=E1_=F9=F1_=E1=EE=F8=F7=E7?=
 =?windows-1255?Q?=E4=5D?=

shows up as

Message 4 דב שס במרקחה]

with or without this patch, regardless of the charset of any of the mime parts.

Also, applying the body charset to the header doesn't make messages with different charset display correctly, it makes _some_ such messages display correctly while breaking the display of others...

I therefore ask that this bug be reopened pending further discussion.
Right. The message body's charset should be applied to headers ONLY if encoding isn't specified in the header encoding itself. I.e. only for "raw" headers, not encoded with base64 or QP.
I think that not even then. Since it's perfectly possible to specify a different charset for a header and for MIME parts, I see no good reason to assume raw non-ASCII headers should be interpreted using some MIME part's charset rather than the folder's default. If you were to make it a _preference_, that's something else, but that's not what you've done.

Alex: If you have some unruly non-standards-compliant messages you want to manipulate, write an extension or something, don't change the default behavior. Or how about this - why not create a folder with a different default charset and use a filter to put your messages there?
(In reply to comment #90)
> I think that not even then. Since it's perfectly possible to specify a
> different charset for a header and for MIME parts, I see no good reason to
> assume raw non-ASCII headers should be interpreted using some MIME part's
> charset rather than the folder's default.

This doesn't make sense either. This means the same email would be displayed differently depending on *your* default folder rather than making the assumption it's encoded using the same charset as the body of the email.

I get emails with different encodings quite often, and that's probably the best assumption the email client can do, IMHO. Filtering is a bad workaround, really.
(In reply to comment #91)
> This doesn't make sense either. This means the same email would be displayed
> differently depending on *your* default folder rather than making the
> assumption it's encoded using the same charset as the body of the email.

But that's exactly what per-folder default charsets are for.

> I get emails with different encodings quite often, and that's probably the 
> best assumption the email client can do, IMHO.

But _I_ only get emails which this change screws up, so this change seems to me like a much worse assumption for the email client make. And the reason I said it's _much_ worse is that at least you have the 'bad workaround' of filtering;
I don't don't _any_ workaround now. A workaround isn't that bad if people can use it without causing other people trouble.

If you need this change this bad, make it a preffed-off alternative.
I'm sorry, I spoke too soon. I do have an alternative, I can use the charset 'forcing' mechanism.

So I'm changing my request: Make the behavior change preffed-on.
Eyal, default charset don't work for about half of messages.

This patch will solve all encoding problems, if it applied only for raw headers with unknown encoding.

Eyal> "Since it's perfectly possible to specify a different charset for a header and for MIME parts,..."

It is possible, but if different charsets are actually used in different MIME-parts - its encodings always specified in these parts (in other case nobody will be able to read it), and in this case Thunderbird should use them. Our talks are about some old programs - most often web scripts - which correctly encode body, but forget to encode Subject (because Outlook Express applied body's encoding to headers and all non-Mozilla readers can read it!!). With this patch Thunderbird will 'emulate' Outlook Express / Windows Mail behavior, and all will be happy.
(In reply to comment #88)
> (In reply to comment #87)
> But that's not true, the header can specify it's own encoding, e.g.
> 
> Subject: Message 4 =?windows-1255?Q?=E3=E1_=F9=F1_=E1=EE=F8=F7=E7?=
>  =?windows-1255?Q?=E4=5D?=
> 
> shows up as
> 
> Message 4 דב שס במרקחה]

Eyal, please read carefully. ...'must be applied to non-MIME fields'...
You're talking about MIME encoded subject. Fields like in your example have priority and will be displayed correctly anyway.
(In reply to comment #88)
> Also, applying the body charset to the header doesn't make messages with
> different charset display correctly, it makes _some_ such messages display
> correctly while breaking the display of others...

It improves display of messages with _known_ charset (koi8-r, windows-1251, etc
etc).  'iso-8859-I' isn't known, perhaps it should be made an alias to
'iso-8859-8-I'.

(In reply to comment #90)
> Alex: If you have some unruly non-standards-compliant messages you want to
> manipulate, write an extension or something, don't change the default behavior.

First, there are just too many of these 'unruly' messages out there.  Yeah,
they do not comply to RFC, but it's not MUA's job to enforce these -- its job
is to let users read their mail in their language.  The user won't go seeking
the magic extension, she'll just won't use the software -- especially when the
well-known alternative (MS Outlook) handles this just fine.

> Or how about this - why not create a folder with a different default charset
> and use a filter to put your messages there?

For each charset that you may receive mail in?  Ridiculous.
(In reply to comment #89)
> Right. The message body's charset should be applied to headers ONLY if encoding
> isn't specified in the header encoding itself. I.e. only for "raw" headers, not
> encoded with base64 or QP.

Wait a second - you are applying some random MIME part charset to perfectly valid ASCII header? This surely will fail terribly if the applied charset is not "ASCII-synced" like eg. Unicode is, for instance if the MIME part encoding is EBCDIC...
(In reply to comment #96)
> 'iso-8859-I' isn't known, perhaps it should be made an alias to
> 'iso-8859-8-I'.

I did wonder about that when reading earlier comments. Do we have any idea how common it is and whether it only occurs when iso-8859-8-I is intended? In theory it might also be an error for iso-8859-6-I.

If it only occurs in mails from a particular forum, it's presumably a configuration error, and would be better handled by reporting to the forum admins.
Karsten> Wait a second - you are applying some random MIME part charset to perfectly valid ASCII header?

No. ASCII is really perfectly valid and shouldn't be touched by this patch. Raw headers (Cyrillic, Unicode, etc) has non-ASCII 8-bit chars.

if(Is8bit(header))apply_patch();
I'd like to add a piece of info for those who aren't aware: Many MIME part body
(as opposed to header) charset issues, and practically all such issues relevant
to email in Hebrew and Arabic, are handled by an extension I co-wrote, which
does further auto-detection work over Mozilla's, and also takes care of
mixed-charset cases:

https://addons.mozilla.org/en-US/thunderbird/addon/310

so for users like me, the charset defaults for folders effectively 'strongly'
apply only to the subjects and not anything else...

Sergey Svishchev: If this is so much of a problem that you simply _must_ change
the default behavior, then you have to change it so as to accommodate what
currently works well and not break it. How? Do some auto-detection on the
subject line before deciding which charset to use for it; at least don't give
me subject lines which are all \uFFFD's are or all àéùé's (accented latin
letters) etc. Or think of something else. You can't just break some
non-RFC-compliant messages in favor of others (which, by the way, Outlook also
handles fine, and after your change you lose Outlook parity on).
Simon> If it only occurs in mails from a particular forum, it's presumably a
configuration error, and would be better handled by reporting to the forum
admins.

No. There are millions of such forums, CMS, etc-etc (including used on your site). For example, ALL PhpBB forums. It is impossible to ask all existend admins to fix their scripts.

Thunderbird users just wants to read their mail, not to fix entire Internet bugs.
(In reply to comment #101)
> For example, ALL PhpBB forums. 

Hey, that's my example too :-) ... your change is messing up my messages from phpBB fora. Except mine have both the body charset wrong and the header charset unspecified.
(In reply to comment #100)
> I'd like to add a piece of info for those who aren't aware: Many MIME part body
> (as opposed to header) charset issues, and practically all such issues relevant
> to email in Hebrew and Arabic, are handled by an extension I co-wrote, which
> does further auto-detection work over Mozilla's, and also takes care of
> mixed-charset cases:
> 
> https://addons.mozilla.org/en-US/thunderbird/addon/310
> 
> so for users like me, the charset defaults for folders effectively 'strongly'
> apply only to the subjects and not anything else...
> 
Your extension managed to fix subject lines in the message header display, but doesn't managed to fix them in list of messages in a folder, as you've said in Bug 425945 comment #0
So my guess, that either your extension should be fixed to take care of subject lines in the message list, or additional patch needed to display subject lines in message header and message list consistenly.  

(In reply to comment #96)
> 'iso-8859-I' isn't known, perhaps it should be made an alias to
> 'iso-8859-8-I'.
I just tested it. I've added attachment from Bug 425945 comment #7 to Thunderbird Mail folder. Then I've opened %thunderbirdprogramfolder%/res/charsetalias.properties, found section:

# Aliases for ISO-8859-8-I
#
csiso88598i=ISO-8859-8-I
iso-8859-8i=ISO-8859-8-I

and added string:
iso-8859-i=ISO-8859-8-I

Then I've restarted Thunderbird and attachment from Bug 425945 comment #7 was shown properly.
Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.8.1.14pre) Gecko/20080329 Thunderbird/2.0.0.14pre ID:2008032903
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: