Bug 5938 - [FEATURE] Charset override is needed for mail (reminder)
Status: VERIFIED FIXED
Whiteboard: [nsbeta2+][5/16]2 days
Product: MailNews Core
Classification: Components
Component: Internationalization
Version: Trunk
Hardware: x86 Windows NT
Importance: P3 normal
Target Milestone: M16
Assigned To: Scott MacGregor
QA Contact: Katsuhiko Momoi
Depends on: 11965
Blocks: 35851 38645
Reported: 1999-05-04 20:14 PDT by Katsuhiko Momoi
Modified: 2008-07-31 01:22 PDT
CC List: 7 users


Attachments
zipped mbox which contains three japanese messages for override test (689 bytes, application/octet-stream)
2000-05-16 14:23 PDT, nhottanscp

Description Katsuhiko Momoi 1999-05-04 20:14:43 PDT
** Observed with 5/3/99 Win32 M5 candidate build **

I'm filing this bug as a reminder of what we need to do
for a future Milestone.

Currently, if we paste Shift_JIS text into the mail
body using 5.0, it goes out in Shift_JIS. Netscape servers
will generally Base64-encode such mail. Here's what the headers
look like:

...
Content-Type: text/plain; charset=iso-2022-jp
Content-Transfer-Encoding: base64
X-MIME-Autoconverted: from 8-bit to base64 by netscape.com id TAA29656

(The Base64-encoded text follows.)

...

4.6 can display this kind of mail text but 5.0 cannot. Under 5.0,
this kind of message shows as a blank text body.

So we have to assume one of the following:

1. We don't have a working Base64 decoder for mail text.
2. We don't have a way to override the wrong charset tag, even though
   the Base64 decoder is working.

Since 5.0 can currently display a Base64-encoded attachment without a
charset label, #2 is the more likely explanation.
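Hypothesis #2 can be illustrated with a minimal Python sketch. It assumes, as in the headers above, a body that is really Shift_JIS but labeled iso-2022-jp; the sample string and the `render` helper are hypothetical and do not come from Mozilla's code. Decoding the same Base64 payload with the declared label yields garbage, while decoding it with a user-chosen override charset recovers the text.

```python
import base64

# Hypothetical sample text; the bug's actual test messages are not reproduced.
ORIGINAL = "日本語のテキスト"

# The sending client emits Shift_JIS bytes but labels them iso-2022-jp.
declared_charset = "iso-2022-jp"
raw_body = ORIGINAL.encode("shift_jis")

# A relay then converts the 8-bit body to Base64 for transport.
transfer_encoded = base64.b64encode(raw_body)


def render(body_b64: bytes, charset: str) -> str:
    """Undo the transfer encoding, then decode the octets with `charset`."""
    octets = base64.b64decode(body_b64)
    # errors="replace" mimics a viewer that shows garbage instead of failing.
    return octets.decode(charset, errors="replace")


honoring_label = render(transfer_encoded, declared_charset)
with_override = render(transfer_encoded, "shift_jis")

print(honoring_label == ORIGINAL)  # False: the label is wrong
print(with_override == ORIGINAL)   # True: the override recovers the text
```

The point of the sketch is that Base64 decoding succeeds in both cases; only the charset used for the final byte-to-text step differs.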

In any case, since nhotta seems to have a plan for the
mail charset override, I'm filing this bug as a reminder.

I also heard today from our Mozilla Slavic Mail lead, Pete Cassetta, that
many Cyrillic mail messages are still mislabeled as 'iso-8859-1',
and that charset override is a must in 5.0 Mail.
Comment 1 nhottanscp 1999-05-05 10:03:59 PDT
Regarding the first issue (Base64): we do not apply Base64 to the body. We
plan to Base64-encode HTML attachments as 4.5 does, but sending attachments is
not supported in M5. Also, if we allow sending a Shift_JIS body, it will be
sent as 8-bit, so Base64 will not be used.

Regarding the second issue: for sending, the message is simply sent in whatever
charset the menu selection specifies (no override needed). For viewing, the plan
is to add a pref that controls whether the MIME label is used (honored). If the
MIME label is not used, the display will be controlled by the charset menu.

I am accepting the bug for the viewing control by the pref. Also cc'ing
rhp@netscape.com, since he is implementing this in libmime.
Comment 2 nhottanscp 1999-05-17 10:24:59 PDT
rhp@netscape.com checked in the changes last week. The following should be
available.
There is a new pref "mail.force_user_charset". If it is true, libmime does
not do any charset conversion; the conversion is controlled by the charset
menu selection. If the flag is false, libmime applies the main body charset
when the message is multipart and its parts have no charset labels.
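The behavior of the `mail.force_user_charset` pref can be sketched as a small decision function. This is a hypothetical Python model, not libmime code: `charset_for_part` and its parameters are invented for illustration, and returning the menu charset when the pref is true stands in for "libmime does no conversion and the charset menu controls display".

```python
from typing import Optional


def charset_for_part(force_user_charset: bool,
                     menu_charset: str,
                     part_label: Optional[str],
                     main_body_charset: Optional[str]) -> Optional[str]:
    """Pick the charset used to display one MIME part (hypothetical model)."""
    if force_user_charset:
        # Pref true: libmime skips its own conversion, so the charset menu
        # selection governs every part of the message.
        return menu_charset
    if part_label:
        # Pref false: a part with its own charset label keeps that label.
        return part_label
    # Pref false and no label: reuse the main body's charset for this part.
    return main_body_charset


# A Japanese message whose part is mislabeled iso-8859-1:
print(charset_for_part(True, "iso-2022-jp", "iso-8859-1", None))  # iso-2022-jp
print(charset_for_part(False, "iso-2022-jp", None, "koi8-r"))     # koi8-r
```

Because the forced branch returns a single charset for every part, this model also shows why a multipart message whose parts use different charsets can be displayed in only one of them.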

There are a couple of remaining issues.
1) A multipart message with different charsets can only be viewed in one charset
(i.e. some attachments may not display correctly).
This should be partially resolved if we integrate auto charset detection
(could be filed as a separate bug).
2) Charset override only affects the body. Headers cannot be controlled by the
charset menu.
This requires libmime to get a charset through libnet. The current plan is to do
this with a new libnet integration in M7 or later.
Comment 3 Katsuhiko Momoi 1999-05-23 18:13:59 PDT
** Checked with 5/22/99 Win32 build **

The current fix (using the pref option) to enable the Character
Set menu control is now working. In this sense, this bug has been
fixed.
However, there is a more general issue of how mail-viewing
charset override should work even when charset-honoring
is turned on.

This part requires a spec, and I'll provide one for discussion
soon. In this latter sense, we need to keep this bug alive.

Therefore I'm going to confirm that the current fix is working for
M6 and then move the remaining issues to M7 or later. The ones
nhotta mentioned could be dealt with in M7. The general charset
override implementation could come later; this also needs to be
coordinated with the proposal to do charset override in Browser (and
later in Editor).

If not all the issues are resolved in M7, then move this bug forward to
a later Milestone. Reopening it under these conditions.
Comment 4 nhottanscp 1999-06-02 16:08:59 PDT
The charset override feature has been disabled since the META breakage in M6.
It is possible to restore the pre-breakage feature but I would like to
co-ordinate with browser to have the unified feature.
Moving to M8.
Comment 5 nhottanscp 1999-06-25 15:44:59 PDT
Moving to M10.
Comment 6 nhottanscp 1999-08-12 14:04:59 PDT
Dependency info:
There is a pending issue of passing the (override) charset from the menu, through
webshell, to libmime. It was discussed once in the mozilla mail-news newsgroup
(6/29) but not resolved. It is currently being discussed in the mozilla libnet
newsgroup.
Comment 7 nhottanscp 1999-08-16 13:44:59 PDT
Added 11965 as a dependency (see my previous comment).
Comment 8 nhottanscp 1999-08-18 13:55:59 PDT
M15
Comment 9 nhottanscp 2000-01-06 14:32:59 PST
Adding mscott for the issue of charset passing from webshell to libmime.
Comment 10 nhottanscp 2000-02-22 13:35:49 PST
I am trying to summarize the remaining issues around this bug.

The override feature is needed when a message has an incorrect charset label
(e.g. us-ascii for Japanese), which is not unusual.
The current issue is passing a charset from webshell to libmime.
nsMessenger::SetDocumentCharset gets the charset through JS. That needs to be
passed to libmime. I will reassign the bug to mscott to implement this.

Additional info about override:
In libmime, there are already two fields defined and used for override
(override_charset) and default (default_charset).
See mimetext.cpp, MimeInlineText_rotate_convert_and_parse_line().
These fields are currently never set, so they are not used.

The default charset is needed to read messages that do not specify a charset. I will
file a separate bug for this and assign it to rhp. I think this can be done by
defining a pref and using it to set default_charset in libmime.
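The precedence implied by these two fields can be summarized in a short sketch: an override charset, when set, wins over the message's MIME label, and a default charset (from the proposed pref) applies only when no label is present. The `TextPartOptions` class and `effective_charset` function are hypothetical stand-ins that merely reuse the field names `override_charset` and `default_charset`; they are not the actual structures in mimetext.cpp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextPartOptions:
    """Hypothetical stand-in; only the field names mirror libmime's."""
    override_charset: Optional[str] = None  # set from the charset menu
    default_charset: Optional[str] = None   # set from a (proposed) pref


def effective_charset(opts: TextPartOptions,
                      mime_label: Optional[str]) -> Optional[str]:
    """Return the charset to convert from, or None for no conversion."""
    if opts.override_charset:
        # The user's explicit choice beats a possibly wrong label.
        return opts.override_charset
    if mime_label:
        return mime_label
    # Only unlabeled messages fall back to the default charset.
    return opts.default_charset


opts = TextPartOptions(default_charset="iso-2022-jp")
print(effective_charset(opts, None))        # iso-2022-jp (default applies)
print(effective_charset(opts, "us-ascii"))  # us-ascii (label honored)
opts.override_charset = "shift_jis"
print(effective_charset(opts, "us-ascii"))  # shift_jis (override wins)
```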

One more feature is needed. When libmime decides which main body charset to
use, that charset name should be fed back to the user by putting a mark on the
corresponding charset menu item. I will file a separate bug for that.
Comment 11 nhottanscp 2000-03-06 09:59:55 PST
Removing 7886 from the dependency list.
Comment 12 lchiang 2000-04-04 15:40:50 PDT
bulk move to M16 per selmer.
Comment 13 leger 2000-05-08 15:14:17 PDT
Putting on [nsbeta2+][5/16] radar. Work on this feature MUST be complete by
05/16 or we may pull this feature for PR2.
Comment 14 Scott MacGregor 2000-05-15 17:02:19 PDT
I have this feature implemented in my tree. I'll be checking it in when the tree
goes green today. If you select a character set from the menu, we'll
reload the currently displayed message, passing in the new charset as the
override charset to libmime.

By the way, I noticed that it takes a *noticeable* amount of time to bring up
and tear down the I18N charset menu. Is that a known bug? It made me think that
something suspicious may be going on when we build and dismiss this
menu.

Comment 15 Scott MacGregor 2000-05-15 17:25:21 PDT
I checked in my changes for this feature. Naoki came by my cube and we verified
that it appears to be working (at least with some simple cases). You can now use
the charset menu to force a character set override for a particular message! *yeah*
Comment 16 cata 2000-05-15 21:05:15 PDT
Scott, you are right, we do something every time we build the charset menu: we 
execute a piece of JS that places the checkmark on the menu item representing 
the character set of the current document.

However, the performance impact should not be visible. I do not know yet why it
is happening; I'll have to see whether it's just the fact that this is an RDF/XUL
menu, and then what the cost of the extra work we are doing is.

There is a performance bug filed on this: bug 29552. Please feel free to add your
comments/observations/ideas there.
Comment 17 nhottanscp 2000-05-16 14:23:12 PDT
Created attachment 8746
zipped mbox which contains three japanese messages for override test
Comment 18 nhottanscp 2000-05-16 14:29:54 PDT
I did a simple test using today's win32 build 2000051609.
The second message in the attachment has a wrong charset label for Japanese
(ISO-8859-1). It shows garbage initially, but after I changed the menu to
ISO-2022-JP, it shows correct Japanese text.
Then, after I select another message and come back to the second message, it
shows garbage again. This follows the spec (override status should not
stick).
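The non-sticky rule exercised in this test can be modeled as per-selection state: choosing a charset from the menu sets an override for the message being displayed, and selecting any message clears it. `MessagePane` and its methods are hypothetical names used only for illustration; they are not Mozilla's messenger API.

```python
from typing import Optional


class MessagePane:
    """Hypothetical model of the spec: override status should not stick."""

    def __init__(self) -> None:
        self.current_message: Optional[str] = None
        self.override_charset: Optional[str] = None

    def select_message(self, message_id: str) -> None:
        # Any new selection (including returning to an earlier message)
        # discards the previous override, so the MIME label applies again.
        self.current_message = message_id
        self.override_charset = None

    def choose_menu_charset(self, charset: str) -> None:
        # A menu choice reloads the current message with this charset
        # passed to libmime as the override.
        self.override_charset = charset


pane = MessagePane()
pane.select_message("message-2")
pane.choose_menu_charset("ISO-2022-JP")  # garbage becomes Japanese text
pane.select_message("message-1")
pane.select_message("message-2")
print(pane.override_charset)             # None: the ISO-8859-1 label applies again
```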
Comment 19 nhottanscp 2000-05-16 15:16:25 PDT
I tested more combinations. Override works for attachments, and it also overrides
auto-detection.
One case where the override does not work is when an HTML attachment has a META
charset tag. Do we also want to override this? We can change mimetext.cpp not to
use META when an override charset is set.
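The open question about META tags comes down to ordering. The following is a minimal sketch under the assumption that a META charset in an HTML part currently takes priority over everything else, including the MIME label; `html_part_charset`, `label_or_detected`, and the `ignore_meta_when_overridden` flag are hypothetical names, with the flag standing in for the proposed mimetext.cpp change.

```python
from typing import Optional


def html_part_charset(override: Optional[str],
                      meta_charset: Optional[str],
                      label_or_detected: Optional[str],
                      ignore_meta_when_overridden: bool) -> Optional[str]:
    """Choose the charset for an HTML part (hypothetical model)."""
    if override and ignore_meta_when_overridden:
        # Proposed behavior: the user's override beats the META tag too.
        return override
    if meta_charset:
        # Observed behavior: a META charset tag still wins over the override
        # (and is assumed here to outrank the MIME label as well).
        return meta_charset
    if override:
        # Override already beats MIME labels and auto-detection.
        return override
    return label_or_detected


# Japanese HTML attachment whose META tag wrongly says iso-8859-1:
print(html_part_charset("iso-2022-jp", "iso-8859-1", "us-ascii", False))  # iso-8859-1
print(html_part_charset("iso-2022-jp", "iso-8859-1", "us-ascii", True))   # iso-2022-jp
```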
Comment 20 bobj 2000-05-16 19:00:32 PDT
This bug for the feature implementation is resolved.
Let's log the issue you mention as a separate bug where we can discuss if
it is valid or not.
Comment 21 Katsuhiko Momoi 2000-05-16 19:11:53 PDT
I see that there are some minor bugs associated with
enabling this feature, but these will be filed as
separate bugs.

Verified to be working on Windows with the 5/16/2000 build.
Comment 22 nhottanscp 2000-05-18 09:57:01 PDT
I found a problem when the overridden message is quoted.
Filed a separate bug 39736 - charset override has no effect on quoting.
Comment 23 Katsuhiko Momoi 2000-05-31 00:52:03 PDT
** Checked with 5/30/2000 Win32, Linux and Mac builds **

This feature is working with the above build in the 
following types of cases:

1. When the MIME charset info is absent for the main body or displayable
   attachments.
2. When the MIME charset info is erroneous for the main body or a multipart
   body.

This does not currently override an erroneous meta-charset
tag in an attached document.
We should probably file this in a separate bug.
Marking it verified as fixed.
