Closed Bug 64119 Opened 24 years ago Closed 22 years ago

Import from Eudora-J: Ja chars in the mails of Out and Trash folder are not displayed correctly

Categories

(MailNews Core :: Internationalization, defect)

x86
Windows 95
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED WORKSFORME

People

(Reporter: ji, Assigned: cavin)

References

Details

(Keywords: intl)

Attachments

(4 files)

This bug is separated from Bug 58897.

After the import finishes, an Eudora account is created. In (jushin), Out
(soushin) and Trash (Gomibako) folders, in addition to some default folders, are
created under this account. For the mails in the Out and Trash folders, the Ja 
characters in the subject are displayed as question marks in both the thread 
pane and the msg view pane. The Ja chars in the mail body are displayed as dots.
Reassign to tonyr@fbdesigns.com, cc to putterman.
Assignee: nhotta → tonyr
Keywords: intl
Mass change of QA contact to ji for the bugs filed by her.
QA Contact: momoi → ji
Scott, could you reassign if tonyr@fbdesigns.com is not available?
reassigning to cavin.
Assignee: tonyr → cavin
So, the problem does not happen to In(box), right? Or does it apply to all
folders? Just want to make sure because the summary only mentions Out and Trash
folders.

Also, can someone send me some Eudora folders with Japanese chars so that I can
import the msgs and debug the problem right away? You can just send me
everything under 'Eudora' directory in a zip file. I want to avoid installing
Eudora-J because I just visited http://www.eudora.ne.jp/ and I could not find
the download button as I can't read Japanese (I ran NJWIN viewer to show
Japanese in the browser).
OK, I can now read messages with Japanese chars on my NT machine. Somehow I was
asked to download and install 'MS Global IME for Japanese' from the browser when
I visited one of the Japanese pages, and I was able to view Japanese chars after
the installation.

So my guess is that the message in question either has the wrong charset in the
Subject field or the encoding of the string is wrong. I have to see what the
message looks like first. Can anyone provide me with a message to reproduce the
problem?
I just attached a zip file which contains a set of Eudora4.3-J mailboxes.
To reproduce the problem, you need a Japanese system or Japanese enabled English 
 W2k system:
Unzip the file and copy over the mailboxes to your Eudora directory, then run 
6.1 to import mails from Eudora.
The mailboxes include default In, Out and Trash mailbox and another user created 
mailbox. All these mailbox names are in Japanese, that's why you need a Japanese 
system or Japanese enabled English W2K system.
Only mails in In mailbox can be imported correctly from Eudora-J. The mails 
in other mailboxes, like Out, Trash and user created mailbox, can't.
So is there any way to do this on an NT system or it has to be a w2k?  I went to
Regional Settings under Control Panel I didn't see Japanese on the list. How do
I install/add it to the list?

I also looked at messages in the test folders and the message subjects are not
encoded by Eudora at all. For comparison, a properly encoded subject (in base64)
looks like the following:

  Subject: =?iso-2022-jp?B?GyRCJFgkaCQmJDMkPRsoQg==?=

We do display correctly for the above subject data.  I'll have to see what the
import does when handling 8 bit data in the subject field.
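One way to check what an encoded subject like the one above actually contains is to decode it outside the mail client. The following is a minimal sketch using Python's standard `email.header` module; it is not part of the Mozilla import code, and `decode_subject` is a hypothetical helper name used only for illustration.

```python
from email.header import decode_header, make_header


def decode_subject(value: str) -> str:
    """Decode an RFC 2047 encoded-word header value into a Unicode string."""
    return str(make_header(decode_header(value)))


# The encoded subject quoted in the comment above.
subject = "=?iso-2022-jp?B?GyRCJFgkaCQmJDMkPRsoQg==?="
print(decode_subject(subject))
```

The base64 payload carries the ISO-2022-JP escape sequences (ESC $ B ... ESC ( B), and the value decodes to the hiragana string "へようこそ".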
It has to be Japanese NT or a Japanese enabled English w2k system.
To make English W2K enabled for Japanese, please do the following steps:
1. Go to Control Panel -> Regions
2. On Language tab (not sure about the name of the tab, it's the 1st one)
you can see a language list, check on Japanese
3. On the same window, click on Input Locale,
 add Japanese, then click on Apply, it will ask you for the installation 
CD. Reboot machine, you'll have a Japanese enabled Win2K. The win2k UI 
stays in English.

My development machine runs English NT, is there any easy way to make it a
Japanese NT?  If it's too much trouble then I'll have to debug it on win2K.
It looks like you can access the testing folders on English NT, although the 
folder names are probably not displayed correctly. Since this bug is about the 
message display and we don't have problems accessing folders, I think you can debug 
on English NT for this particular bug. Naoki, what do you think?
I think the implementation is depending on OS file system charset (e.g. folder
name), so need JA system.
Hm, looks like nhotta is right as we do perform system specific conversion in
nsImportService::SystemStringToUnicode() where it uses platform specific charset
to convert the input string to unicode. On my English NT box the charset is
'windows-1252' and the conversion is wrong because I can never see the Japanese
chars displayed (I can view a correctly encoded Japanese subject on my machine).
The import code does encode the subject header in the following format though:

  Subject: =?iso-2022-jp?B?LCwsLGZlZlhmZw==?=

Here is the data conversion flow that I found from reading the code:

                charset = system's         charset = UTF-8  
  JP chars --> ConvertSysToUnicode() --> ConvertFromUnicode() 
                       ^                         ^                      
                  GetHeaderValue()           SetSubject()        
                   Import code              ComposeField code

Then the final step is to encode the subject (after doing all the above
conversions) by calling nsMsgI18NEncodeMimePartIIStr() with charset
'iso-2022-jp'.  I'll need to try it on a Japanese charset to see if the (final
step of) encoding is right or not. 
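The conversion flow described above can be approximated outside Mozilla to see why the system charset matters. This is only a sketch in Python, assuming the three steps behave like plain codec conversions; `import_subject` and the sample subject "テスト" are hypothetical and stand in for the real import code and mailbox data.

```python
from email.header import Header, decode_header, make_header


def import_subject(raw: bytes, system_charset: str,
                   mime_charset: str = "iso-2022-jp") -> str:
    # Step 1: system charset -> Unicode (the role of ConvertSysToUnicode()).
    text = raw.decode(system_charset)
    # Step 2: Unicode -> UTF-8 (the role of ConvertFromUnicode() in SetSubject()).
    utf8 = text.encode("utf-8")
    # Step 3: MIME-encode the subject (the role of nsMsgI18NEncodeMimePartIIStr()).
    return Header(utf8.decode("utf-8"), mime_charset).encode()


# Hypothetical subject stored by the mail client as raw Shift_JIS bytes.
raw = "テスト".encode("shift_jis")

# On a Japanese system the bytes survive all three steps.
good = import_subject(raw, "shift_jis")
roundtrip = str(make_header(decode_header(good)))

# On an English system step 1 reads the same bytes as windows-1252.
misread = raw.decode("cp1252")
print(roundtrip, misread)
```

With the wrong system charset, the Japanese characters are already lost after step 1 (`misread` is "ƒeƒXƒg"), so no later encoding step can recover them.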
OK, I finally got to work on it on a Japanese win2K machine and here is what I 
found.  All messages in the Trash and Out folders lack the Content-Type: 
header which contains the charset info.  As such, the charset defaults to 
"US-ASCII" when we encode the subject in function 
nsMsgI18NEncodeMimePartIIStr(). Since the charset is wrong we get an 
incorrectly encoded subject header.  For testing purposes, I hard coded the charset  
to "Shift_JIS" (i.e., the platform's charset), imported the same message again and it 
worked fine.  So I think the fix is to set the default message content charset 
to the platform's if it's missing from the message.

Another problem I found with import is that some of the Japanese chars (I guess 
those with high bit set) in the body are converted to dots and it's caused by 
the following function call:

 rv = nsMsgI18NConvertFromUnicode( theCharset, uniBody, body);

This needs to be fixed as well.  The call also causes a lot of assertion dialogs 
to be displayed.
The body problem I found was caused by the charset problem too. So if you pass 
the system/platform charset (when the msg does not specify one) to 
nsMsgI18NConvertFromUnicode() everything works fine and the assertion dialogs go 
away too. I'll submit the patch right after this.
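The fallback described above (use the platform charset when the message does not declare one) can be illustrated with Python's standard `email` package. This is a sketch of the idea only, not the actual patch; `effective_charset` and the two sample messages are hypothetical.

```python
import email


def effective_charset(raw_message: bytes, platform_charset: str) -> str:
    """Return the charset declared in Content-Type, else the platform charset."""
    msg = email.message_from_bytes(raw_message)
    # get_content_charset() returns its argument when no charset is declared.
    return msg.get_content_charset(platform_charset)


# Like the Out and Trash messages: no Content-Type header at all.
no_charset = b"Subject: test\n\nbody\n"
# Like the In messages: charset declared by the sender.
with_charset = (b"Content-Type: text/plain; charset=ISO-2022-JP\n"
                b"Subject: test\n\nbody\n")

print(effective_charset(no_charset, "shift_jis"))    # falls back to platform
print(effective_charset(with_charset, "shift_jis"))  # keeps declared charset

# Without the fallback, converting Japanese text to US-ASCII loses every char.
lost = "テスト".encode("us-ascii", errors="replace")
print(lost)
```

The last line shows the symptom from the original report: every Japanese character that cannot be represented in US-ASCII comes out as a question mark.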
Ccing sspitzer.
sr=mscott
Fix checked in.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Checked with 08/10 trunk build. It seems that mails with attachments are not 
imported properly. After import, the attachment is displayed as a path. 
Screen shot to follow.
Attached image: A screenshot.
It could be a separate issue and I need to look into it more. ji, do the Eudora
mailboxes you submitted earlier have the attachments I can test?  If not, please
attach one to the bug. Thanks.
The Eudora mailbox attached above contains two Ja mails. The first one is 
attached with a HTML file with filename in Japanese. The second one is attached 
with a txt file with Shift-JIS contents but the attachment filename is in 
English.
Filed separate bug 95613 for the attachment issue, since the attachment problem 
affects all the mailboxes including In mailbox.
Marked this as verified.
Status: RESOLVED → VERIFIED
This is broken on 10/02 0.9.4 build. The import wizard can finish w/o any error.
But Japanese mails in Out and Trash folders are not imported correctly. The mail
body is displayed garbled after import. Ja mails in In folder look okay. Reopened.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Nominating for nsbeta1.
Keywords: nsbeta1
Status: REOPENED → ASSIGNED
Keywords: nsbeta1 → nsbeta1-
Renominating for nsbeta1 since it causes data corruption.
Japanese Eudora users will come across this problem when switching to our client.
Keywords: nsbeta1- → nsbeta1
marking nsbeta1- per mail triage.
Keywords: nsbeta1 → nsbeta1-
Blocks: 157010
Blocks: 157673
remove nsbeta1. please consider this as a adt2 nsbeta1+ for m1.2final release
Keywords: nsbeta1- → nsbeta1
Naoki and I looked at this bug again yesterday. It seems that for mails sent
and received on the same machine, the charset info for the mail body is removed
after import. That caused the display problem reported in this bug. But for mails
sent and received on different machines, the charset info is preserved.
Since users usually send and receive mails using different accounts and on
different machines, they won't see this problem.
Marked this one as wfm.  
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 22 years ago
Resolution: --- → WORKSFORME
Marked it as verified.
Status: RESOLVED → VERIFIED
The above comments are about mails in the Trash folder. For the mails in the Out folder, it
looks like Eudora-J itself removes the mail body charset info whenever it copies
the mail to Out folder, a "limitation" of Eudora-J.
Depends on: 180372
No longer blocks: 157673
Product: MailNews → Core
Product: Core → MailNews Core