Closed Bug 270638 Opened 20 years ago Closed 13 years ago

Import kills 8-bit characters from subjects and addresses

Categories

(MailNews Core :: Import, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 686985

People

(Reporter: mozillabugzilla, Unassigned)

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

When importing mail into TB from Outlook 2003, subject lines and addresses that
contain 8-bit characters are destroyed. They look 100% OK in Outlook, but are
only garbage after being imported into TB.
*I* *KNOW* that 8-bit characters shouldn't appear in subject lines or in
address fields, but many programs allow them, and it helps a lot if they
are not destroyed in existing mails. They are also very important in
international mails.
PLEASE, HELP US GET OUR MAILS OUT OF THIS BLOODY PST FILE WITH AS LITTLE DAMAGE
AS POSSIBLE!

Reproducible: Always
Steps to Reproduce:
1. Import mails from Outlook into TB that contain 8-bit chars in the Subject:,
From:, To: lines etc.

Actual Results:  
The 8-bit chars contained there are destroyed and replaced with garbage.

Expected Results:  
Try to preserve the 8-bit characters, as they are very important in
internationally-coded mails.
Can you attach a sample message?
The "beforeImport.txt" file contains the "internet headers" of a short mail, as
they appear inside Outlook 2003. Notice the "Subject:" line.
The "afterImport.txt" file contains the same mail after import in TB. Notice
how the "Subject:" line has changed.
CC'ing Jshin for advice. I assume we should detect the 8-bit characters and fix
the subject so that it is correctly MIME-2 (RFC 2047) encoded...
Status: UNCONFIRMED → NEW
Ever confirmed: true
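
For illustration only, here is a minimal Python sketch of the idea in the comment above: detect non-ASCII bytes in a raw header value and re-encode it as an RFC 2047 encoded-word. It assumes the charset is already known, which is the hard part. The function names are hypothetical, and this is not the Mozilla import code.

```python
from email.header import Header, decode_header


def has_8bit(raw: bytes) -> bool:
    """Return True if any byte falls outside the 7-bit ASCII range."""
    return any(b > 0x7F for b in raw)


def encode_header_value(raw: bytes, charset: str) -> str:
    """Return a header value that is safe to write into a message.

    Pure ASCII values pass through unchanged. Values with 8-bit bytes are
    decoded with the given charset (an assumption supplied by the caller)
    and re-encoded as RFC 2047 encoded-words.
    """
    if not has_8bit(raw):
        return raw.decode("ascii")
    text = raw.decode(charset)
    return Header(text, charset).encode()


if __name__ == "__main__":
    encoded = encode_header_value(b"Gr\xfc\xdfe", "iso-8859-1")
    print(encoded)
    # Round trip: decode_header returns (bytes, charset) pairs.
    print(decode_header(encoded))
```

If the assumed charset is wrong, the decode step either raises or silently produces the wrong characters, so a real importer would need a fallback.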
Yes, that would be the best, but figuring out the character encoding used in the
header could be difficult in some cases. 

(In reply to comment #4)
> Yes, that would be the best, but figuring out the character encoding used in the
> header could be difficult in some cases. 
> 
> 

You don't need to look far, though... only two lines below the Subject: line,
the correct encoding is displayed in its full glory... Of course, if you
had to guess, it would be much more difficult.
(In reply to comment #5)

> You don't need to look far, though... only two lines below the Subject: line,
> the correct encoding is displayed in its full glory...

Gee, that's our best **guess** (note that RFC (2)822 and RFC 204[5-9] don't specify
the order of header fields), but it's not always correct (OK, in 99% of cases
it's right). More importantly, 'charset' is not present in the outermost
header if the Content-Type is not 'text/*' (e.g. 'multipart/mixed', 'multipart/alternative'),
in which case we have to look into the 'body' (one or more of the subparts).
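
A rough Python sketch of the lookup described above, using only the standard `email` package: check the outermost Content-Type first, then fall back to the first text/* subpart that declares a charset. The function name is hypothetical, and the result is still only a guess at the header encoding, as the comment says.

```python
import email
from typing import Optional


def guess_header_charset(raw_message: bytes) -> Optional[str]:
    """Guess the charset of raw 8-bit headers from the Content-Type fields.

    walk() yields the message itself first and then its subparts depth
    first, so a top-level text/* charset wins and multipart messages fall
    back to their first text/* subpart that declares one.
    """
    msg = email.message_from_bytes(raw_message)
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            charset = part.get_content_charset()
            if charset:
                return charset
    return None  # no charset declared anywhere; the caller has to guess


if __name__ == "__main__":
    sample = (
        b"Subject: Gr\xfc\xdfe\r\n"
        b"MIME-Version: 1.0\r\n"
        b'Content-Type: multipart/alternative; boundary="b"\r\n'
        b"\r\n"
        b"--b\r\n"
        b"Content-Type: text/plain; charset=ISO-8859-1\r\n"
        b"\r\n"
        b"body\r\n"
        b"--b--\r\n"
    )
    print(guess_header_charset(sample))
```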

Not a TB auto-migration bug -> Core:MailNews:Import
Assignee: mscott → nobody
Component: Migration → MailNews: Import
Product: Thunderbird → Core
Version: unspecified → Trunk
QA Contact: import
Product: Core → MailNews Core
Blocks: 157010
This is changed by the proposed patch for Bug 207156. However, this is not the solution. Instead of simply ruining the 8-bit characters, the code now tries to get the headers from Outlook in Unicode. Thus, it now relies on Outlook to guess the header encoding. I hope that this will solve the majority of cases, but not all of them, and furthermore, it may introduce new mistakes in cases where the result accidentally used to be OK.
The patch was checked in under bug 686985.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE