Closed Bug 46881 Opened 24 years ago Closed 24 years ago

Can't recognize charset info in multipart/alternative mail --> corrupt reply mail

Categories

(MailNews Core :: MIME, defect, P3)

x86
Windows NT
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: momoi, Assigned: mscott)

Details

(Whiteboard: [nsbeta2-][nsbeta3+])

Attachments

(3 files)

** Observed with 7/27/2000 Win32 build **

When you create rich text mail (with bold characters, special styles, etc.) in Outlook Express 5
and send it, the default seems to be multipart/alternative mail with the first part "text/plain"
and the 2nd part "text/html".
This is common for mail in any language from Outlook Express.
The problem is that we don't recognize the charset info in the Content-type
header of these messages even when the charset info is there. Instead
we fall back to the viewing default.
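
To make the expected behavior concrete, here is a minimal C++ sketch of the lookup described above: take the charset parameter from a part's Content-type value and fall back to the viewing default only when no charset is given. The helper name CharsetFromContentType and its parameters are hypothetical and are not libmime functions. It handles only a simple `charset=` parameter and is not a full MIME header parser.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, not a libmime function: returns the charset parameter
// of a Content-type header value, or viewDefault when none is present.
std::string CharsetFromContentType(const std::string& contentType,
                                   const std::string& viewDefault) {
  // Lowercase a copy so the parameter name is matched case-insensitively.
  std::string lower(contentType);
  std::transform(lower.begin(), lower.end(), lower.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });

  const std::string key = "charset=";
  std::string::size_type pos = lower.find(key);
  if (pos == std::string::npos) return viewDefault;
  pos += key.size();

  // The value may be quoted. It ends at the closing quote, or at the next
  // ';' or whitespace when it is not quoted.
  std::string::size_type end;
  if (pos < contentType.size() && contentType[pos] == '"') {
    ++pos;
    end = contentType.find('"', pos);
  } else {
    end = contentType.find_first_of("; \t\r\n", pos);
  }
  if (end == std::string::npos) end = contentType.size();

  const std::string value = contentType.substr(pos, end - pos);
  return value.empty() ? viewDefault : value;
}
```

With a viewing default of ISO-8859-1, a part declared as `text/plain; charset="iso-2022-jp"` should yield iso-2022-jp, and only a part with no charset parameter should yield the default.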

I can send you a test message if you like.
This bug was separated from Bug 41706 where the problem was reported
first.
Summary: Can't recognize charset info in multipart/alternative mail from OE5 → Can't recognize charset info in multipart/alternative mail from OE5
Mozilla does not seem to have the same problem with multipart/alternative
msgs from Communicator, however.
I think this is sort of the same problem as the last one. Please attach an 
example and I'll see what I can do.

- rhp
Actually we have the same problem for recognizing multipart/alternative 
charset info with messages sent from Mozilla. Corrected the summary line.
Summary: Can't recognize charset info in multipart/alternative mail from OE5 → Can't recognize charset info in multipart/alternative mail
Status: NEW → ASSIGNED
Target Milestone: --- → M18
Rich, it turns out that this is a major problem.
There are so many messages out there that send both plain text and
HTML portions in one mail. I re-tested msgs sent from 4.x and it turns out we have
a problem with any message that contains multipart/alternative.

This means that if the default viewing charset does not match 
the charset of the received mail of this type, we are hosed for reply and forward/inline.

In fact I just confirmed that normal mail Communicator sends out often
uses plain text and HTML both in one mail resulting in this MIME
structure. In these messages, if we use "reply", all the quoted materials
are garbage unless the default viewing charset matches.
This is a big regression. 
I'm nominating this for nsbeta2. There are too many multipart/alternative messages
out there for us to ignore this problem.
Keywords: nsbeta2
QA Contact: lchiang → momoi
Severity: normal → critical
CC'ing mscott.

Scott and Naoki, is this not related to one of the following bugs?

Bug 28869 --  Feedback of the current charset used by libmime  (nhotta)
Bug 44635 -- Charset Reflection not working for Reply/Forward-inline

I'm thinking that this could be more a bug for one of you rather than Rich. If so, I can re-open
one of the bugs above.
Summary: Can't recognize charset info in multipart/alternative mail → Can't recognize charset info in multipart/alternative mail --> corrupt reply mail
Here are the reasons I can think of which argue for why we cannot put out Beta 2
without this fix:

1. We localize into very few languages -- French, German & Japanese only.
2. There are a large number of users who use the English build outside of Western languages.
3. For these people, reply mail will sometimes succeed but other times will show corrupt 
   text quoted. Most users will never figure out that it's multipart/alternative that fails.
   (They don't look in the mail source and even if they did, very few will ever figure
    what is wrong.)
4. In a lot of mailers, if you choose HTML mail, the default is often to append Plain text
   part making such a message multipart/alternative. In some mailers like Communicator,
   both plain and HTML parts are often sent out automatically.

Thus, this problem is guaranteed to produce confusion among users.
The impact will be too big for us to ignore.
Oh, one more reason.

5. Once the text is quoted corrupt, there is no way for us to reload the composer and correct
   that corrupted text. You have to re-write it.
Kat,
This isn't happening for beta 2 unless someone else wants to pick it up. Naoki, 
can you handle this...I'm still working on address book stuff and I need to 
travel to Pennsylvania tomorrow.

Sorry, but I just don't have time to handle this one so late in the game.

- rhp
Thanks, Rich. I'm going to appeal to others I know
who might be able to handle this.
I can try to help out since rich is traveling today. but I'm confused (as 
happens a lot).

1) Can someone please post me an easy example I can use as a test case to see 
the problem.

2) Is the problem with viewing the message or replying to the message? It wasn't 
clear to me which part had the problem.

3) did this ever work in 6.0 or are we thinking this was a regression caused by 
fixes to the bugs kat mentioned above?

4) Also, in this multi-part alternative is the idea that certain parts of the 
message are in different character sets? And we aren't honoring the different 
charsets? Or is it simpler than that and just a problem where the body has a 
charset specified in the content type field and we aren't looking at it and are 
instead using the default charset. 
Thanks, Scott. 

> 1) Can someone please post me an easy example I can use as a test 
>    case to see the problem.

I will post an example after this but you can easily make a
sample this way. With 4.7x or Mozilla, create a message
under any encoding and then choose the option to send
both Plain text and HTML mail. This creates a multipart/alternative
structure with the chosen encoding placed in the header
charset line.

> 2) Is the problem with viewing the message or replying to the 
>   message? It wasn't clear to me which part had the problem.

Viewing messages does not seem to be problematical. But the charset
menu check is wrong at this point already. In other words, 
you send an ISO-8859-1 multipart alternative with some accented
characters in it, and you can view this. But when you look
at the View | Character Encoding menu, you see a check mark against
the View Default encoding. So if your view default is
Japanese (ISO-2022-JP), you will see that checkmarked
no matter what the charset parameter of the Content-type
header in the main body says.

> 3) did this ever work in 6.0 or are we thinking this was a 
>    regression caused by fixes to the bugs kat mentioned above?

I seem to recall that reflecting charset of the original
charset worked for this type of message at one point after 
Bug 28869 was fixed. But I am not 100% sure -- I will check
and report. One possibility is that the fix in Bug 28869
did not cover this type of structure well or missed it.

> 4) Also, in this multi-part alternative is the idea that certain parts of the 
>    message are in different character sets? And we aren't honoring the
>    different charsets? 

I'm talking about the simple case where the 2 charsets are the same. This type 
of message is very common. 

> Or is it simpler than that and just a problem where the body has a 
> charset specified in the content type field and we aren't looking 
> at it and are instead using the default charset. 

As described above, this seems to be the case. In displaying the original
message, display itself seems to be working though the check mark
is placed wrong. 
The problem is that when you reply to such a message, Mozilla
reflects the check marked charset and dumps the original 
content incorrectly into the composer window using that
wrong view default charset.
This is very easy to see with a Japanese message of this
type with the view default charset set to Western (ISO-8859-1)
for example.


1. Unarchive the above file and drop the resulting mail file
   into a Local mail folder.
2. Set the View Default pref to Western (ISO-8859-1)
3. Now display this msg in Messenger and confirm
    a) that it displays OK in Japanese
    b) Encoding check mark is placed against ISO-8859-1
4. Now hit the reply button. You will see that Japanese
   portion is not displaying as Japanese. Also the check mark
   is against the ISO-8859-1 instead of against ISO-2022-JP
   in the View Character Coding menu.
5. There is no way to correct the quote unless you type it in
   yourself and then correct the View Character coding
   before sending it out.
You can compare the test message above with 2 other
JPN msgs contained in the Intl Smoketest file.
The 1st one in the Smoketest folder is of singlepart
type and reply has no problem.
The 2nd one contains a JPN msg & an HTML attachment
making it a multipart/mixed. We don't seem to have
a problem in reply there, either.
Okay thanks for answering my questions and providing the example. I suspect it's
as simple as Naoki's charset pref (which is used to control the checkbox in the
menu and I also use that value when I quote the message body on reply) just
isn't getting set correctly in this case.

I bet if we tweak that this all falls into place. I'll poke around on it today. 
For multipart/alternative mail, MimeObjectChildIsMessageBody (in 
mimemult.cpp) returns false, so we never go into the code that sets a charset on 
msgWindow. That's why we don't have a correct check mark.
Scott, do we always use a msgWindow charset for quoting?
Putting on [nsbeta2-] radar.  Not critical to beta2.  Adding "relnote" keyword 
for PR2 release. 
Keywords: relnote, relnote2
Whiteboard: [nsbeta2-]
Nominating for nsbeta3.
Keywords: nsbeta3
Per i18n/mail triage meeting, this bug is now marked as
[nsbeta3+]. 
The impact of this bug is too big to be ignored. 
multipart/alternative mail is very commonly sent by all
major mailers when HTML mail is chosen. To offset the possibility
that such mail might inconvenience plain text mail readers,
an additional plain text part is added and thus multipart/alternative
mail is created. 
We are not picking up on the charset info in such messages. 
Whiteboard: [nsbeta2-] → [nsbeta2-][nsbeta3+]
Ok, looks like the problem here is that the charset that is associated with 
the folder is being used OVER the charset on the email message. I have a patch that 
I will attach to this bug report. Not sure if this is the right thing to do, 
but it fixes the test case, which tells us...well, nothing...so Naoki & mscott 
should let me know if this makes sense.

- rhp
The change disables charset override for quoted text (e.g. incorrect charset 
message cannot be quoted).
If we can get a charset for alternative case, we can set a checkmark correctly 
and message quote can use it. Currently, the checkmark code depends on 
MimeObjectChildIsMessageBody which does not work correctly for alternative 
mails.

Rich asked for help on this one so I'll take it from him, although I'll need
input from his knowledge of the multipart/alternative mime code. 
Re-assigning for real this time.
Assignee: rhp → mscott
Status: ASSIGNED → NEW
Attached patch: proposed fix
Okay, here's the fix I came up with. I'm not sure if it's the right thing to do
but it does fix the problem. In the mime multipart code, we used to only set the
charset information if we only had one part (i.e. if we had a body part). in
multipart/alternative we always have at least two parts so this code never got
executed:

PRBool isAlternativeOrRelated = PR_FALSE;
PRBool isBody = MimeObjectChildIsMessageBody(obj, &isAlternativeOrRelated);
if (isBody)
{
  // do our charset magic
}

I changed it so we enter this if block if the part is the main body or if the
part is a multipart/alternative part. In either case we then do the charset magic.
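
The change described above can be modeled with a small, self-contained C++ sketch. This is not the checked-in patch: Part, WindowCharsetAfterParse, and the patched flag are hypothetical names made up for illustration, and the two booleans on each part simply stand in for what the body check and its isAlternativeOrRelated out-parameter would report.

```cpp
#include <string>
#include <vector>

// Toy stand-in for illustration only; this is not libmime's real type.
struct Part {
  std::string charset;          // charset parameter of this part, "" if none
  bool isBody;                  // what the body check reports for this part
  bool isAlternativeOrRelated;  // models the out-parameter in the snippet
};

// Returns the charset the message window ends up with after the parts are
// walked in order. With patched == false only a lone body part counts; with
// patched == true, multipart/alternative children count as well.
std::string WindowCharsetAfterParse(const std::vector<Part>& parts,
                                    const std::string& viewDefault,
                                    bool patched) {
  std::string windowCharset = viewDefault;
  for (const Part& part : parts) {
    const bool enter = part.isBody || (patched && part.isAlternativeOrRelated);
    if (enter && !part.charset.empty()) {
      windowCharset = part.charset;  // the "charset magic" step
    }
  }
  return windowCharset;
}
```

For a message with two ISO-2022-JP alternatives and a view default of ISO-8859-1, the unpatched path leaves the window at ISO-8859-1, while the patched path ends at ISO-2022-JP. Because each matching part overwrites the previous value, the last part's charset is the one that sticks.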

This has one big side effect:
1) The menu always reflects the LAST charset encountered by the
multipart/alternative parts. So if they differ (can they??) the menu will be out
of whack.

An alternative solution (and I don't know enough about how we process
multipart/alternative) is to set this charset information after MIME has
determined WHICH of the multipart/alternative parts it's going to actually show.
Rich, do you know where that happens? I.e. if there's a text/plain part and a
text/html part, we only show one of those parts.
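
A rough C++ sketch of that alternative, under stated assumptions: the charset is taken from the part that is actually chosen for display rather than from whichever part was parsed last. Alternative and CharsetOfDisplayedPart are hypothetical names, and the selection rule (show the last text/html part, else the last part) is an assumption made for this sketch, not a description of how libmime selects a part.

```cpp
#include <string>
#include <vector>

// Toy stand-in for one alternative; not libmime's real type.
struct Alternative {
  std::string contentType;  // e.g. "text/plain" or "text/html"
  std::string charset;      // charset parameter, "" if absent
};

// Assumed display policy for this sketch: show the last text/html part, or
// the last part of any type when there is no text/html part. Returns the
// charset of that part, or viewDefault when it has none.
std::string CharsetOfDisplayedPart(const std::vector<Alternative>& parts,
                                   const std::string& viewDefault) {
  const Alternative* shown = nullptr;
  for (const Alternative& part : parts) {
    if (part.contentType == "text/html") shown = &part;
  }
  if (shown == nullptr && !parts.empty()) shown = &parts.back();
  if (shown == nullptr || shown->charset.empty()) return viewDefault;
  return shown->charset;
}
```

If a message carried text/plain in ISO-2022-JP, text/html in UTF-8, and a trailing part in US-ASCII that is not shown, a last-part-wins rule would report US-ASCII, while this sketch reports UTF-8, the charset of the part on screen.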

I'm not sure if that's necessary. I'd be willing to check this patch in and we
can try it out and see if it breaks anything major in which case it would
require more TLC....
Status: NEW → ASSIGNED
Currently, we always default to the HTML part of a text HTML message so there 
is no logic for this. The patch you proposed does look good to me for what it's 
worth :-)

- rhp
thanks for the review Rich. I checked it in.

we'll see if it causes any problems. I couldn't find any. 
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
** Checked with 8/30/2000 Win32 build **

This is now fixed. It's working well with the above build. 
The only concern is point #1 expressed by mscott.
It is unlikely at present to encounter 2 different charsets in
multipart/alternative type msgs. But this may not be true in the
future where we may conceivably send alternative parts in
native charset and Unicode. What one would hope for in such a
case is that whatever charset reply/quote is using is applied to the
correct body part.
rhp's comments seem to point to that.

Marking the fix verified as fixed.
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core
Keywords: relnote