Closed Bug 686985 Opened 13 years ago Closed 13 years ago

Try to restore missing 8-bit header's charset at import from Outlook

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

Thunderbird 9.0

People

(Reporter: mikekaganski, Unassigned)

References

Details

(Keywords: testcase)

Attachments

(3 files, 3 obsolete files)

Message demonstrating such a problem. 13 years ago Mike Kaganski 19.50 KB, text/plain		Details
The result of importing the message in previous file under Windows with Windows-1251 default codepage. 13 years ago Mike Kaganski 19.50 KB, text/plain		Details
Revert to requesting ASCII headers from Outlook; convert 8-bit headers from the message charset before normalization 13 years ago Mike Kaganski 29.61 KB, patch		Details \| Diff \| Splinter Review
Removed incorrect comment 13 years ago Mike Kaganski 30.45 KB, patch		Details \| Diff \| Splinter Review
Previous one was messed up with incorrect line endings 13 years ago Mike Kaganski 29.59 KB, patch		Details \| Diff \| Splinter Review
de-bitrotted patch 13 years ago David :Bienvenu 25.45 KB, patch	Bienvenu : review+	Details \| Diff \| Splinter Review

Mike Kaganski

Reporter

Description

•

13 years ago

Attached file Message demonstrating such a problem. — Details

When importing from Outlook, a message that is not fully RFC 822/MIME-conformant, in the sense that it uses 8-bit characters in its headers without specifying a charset or encoding those characters, may end up with garbled headers after import.

Attached is a sample Outlook message file (MSG) whose "From:", "To:", "Subject:" and "Disposition-Notification-To" headers, as well as its body, use the KOI8-R charset. None of these headers is encoded or carries a charset indication. The "Content-Type:" header that would specify the body charset is also missing.

The current import method guesses the body charset correctly from the information that Outlook provides, so on import the missing "Content-Type:" header is recreated with the correct contents. Then the header-processing code inside the message-creating API detects that some headers (namely "From:", "To:" and "Subject:") contain 8-bit characters, converts them from the current charset to the charset specified in "Content-Type:", and encodes them. This makes these headers unreadable. Note that the result of importing this message's headers depends on which charset is the default in your OS.

The proposal is that the import code should detect this situation itself before passing the headers to the message-creating API: if 8-bit characters are detected in a header, it should try to convert that header from the "Content-Type:" charset to Unicode.
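The failure mode can be reproduced in a few lines. This is a minimal Python sketch, not the import code itself: the sample word and the function name `decode_raw_header` are illustrative assumptions, and Windows-1251 stands in for whatever the OS default codepage happens to be.

```python
# Illustration only: raw 8-bit header bytes carry no charset label,
# so the result depends entirely on which charset the reader assumes.

def decode_raw_header(raw: bytes, charset: str) -> str:
    """Decode unlabeled 8-bit header bytes using the given charset."""
    return raw.decode(charset, errors="replace")

# Hypothetical Subject text, stored the way the sample message stores it:
# KOI8-R bytes with no encoded-word wrapper and no charset indication.
original = "Привет"
raw_subject = original.encode("koi8_r")

# Wrong guess: treat the bytes as the OS default codepage (here Windows-1251).
garbled = decode_raw_header(raw_subject, "cp1251")

# Right guess: treat the bytes as the charset detected for the body.
restored = decode_raw_header(raw_subject, "koi8_r")

print(garbled)   # mojibake, differs from the original text
print(restored)  # matches the original text
```

The same bytes give readable text only when decoded with the charset the sender actually used, which is why the proposal relies on the body charset and not on the OS default.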

Mike Kaganski

Reporter

Comment 1

•

13 years ago

Attached file The result of importing the message in previous file under Windows with Windows-1251 default codepage. — Details

Mike Kaganski

Reporter

Updated

•

13 years ago

Component: Migration → Import

Product: Thunderbird → MailNews Core

Ludovic Hirlimann [:Usul]

Updated

•

13 years ago

Assignee: nobody → mikekaganski

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Ludovic Hirlimann [:Usul]

Updated

•

13 years ago

Keywords: testcase

Ludovic Hirlimann [:Usul]

Updated

•

13 years ago

Assignee: mikekaganski → nobody

Ludovic Hirlimann [:Usul]

Updated

•

13 years ago

Status: ASSIGNED → NEW

Mike Kaganski

Reporter

Comment 2

•

13 years ago

I need some advice. It may look dumb, but I'm not an RFC 822 expert, so please don't be harsh. Here are my assumptions; please check whether they are correct.

Message headers must be 7-bit ASCII text. If a header needs to include internationalized text, that text must be encoded (quoted-printable adapted for headers, with a charset specification), so it becomes 7-bit anyway. A correctly encoded header string may use a charset different from that of the other headers and the body. If a sender transmits a message whose headers contain 8-bit characters, should we always presume that those characters use the same charset as the body of the message?

If so, our procedure may look like this:

1. Get the transport headers, which may include the body charset.
2. Get the body, and do some magic to decide which charset it uses (using the information from step 1 and, if it is absent, the data reported by Outlook, the OS default charset, etc.).
3. Convert all headers from the charset found to Unicode. This will not alter valid 7-bit characters, and all 8-bit ones will hopefully be converted to their correct code points and then processed correctly by the message-creating code.

This may be wrong if a header can use 8-bit text AND explicitly specify its charset. Is that possible? Thank you for your help.
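The three steps can be sketched roughly as follows. This is a hedged Python illustration using the standard `email.header` module, not the C++ import code from the patch: the helper names (`has_8bit`, `pick_charset`, `restore_header`), the `cp1252` fallback, and the sample value are all assumptions, and the header value is treated as plain unstructured text (address headers would need more careful handling).

```python
from email.header import Header, decode_header, make_header
from typing import Optional


def has_8bit(raw: bytes) -> bool:
    """True if the raw header value contains any byte outside 7-bit ASCII."""
    return any(b >= 0x80 for b in raw)


def pick_charset(declared: Optional[str], outlook_hint: Optional[str],
                 os_default: str = "cp1252") -> str:
    """Step 2 (simplified): the first available source wins.

    The order and the cp1252 default are assumptions for this sketch.
    """
    return declared or outlook_hint or os_default


def restore_header(raw: bytes, charset: str) -> str:
    """Step 3: decode 8-bit bytes with the body charset, then encode properly."""
    if not has_8bit(raw):
        # Pure 7-bit values (including already-encoded words) pass through.
        return raw.decode("ascii")
    text = raw.decode(charset, errors="replace")
    # Emit an RFC 2047 encoded-word so the header is 7-bit again.
    return Header(text, "utf-8").encode()


# Hypothetical input: an unlabeled KOI8-R subject, charset known from Outlook.
raw_subject = "Привет".encode("koi8_r")
charset = pick_charset(declared=None, outlook_hint="koi8-r")
encoded = restore_header(raw_subject, charset)
print(encoded)
print(str(make_header(decode_header(encoded))))
```

Headers that are already 7-bit are left untouched, so a header that explicitly specifies its charset through encoded words is not affected by this conversion.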

Mike Kaganski

Reporter

Comment 3

•

13 years ago

Attached patch Revert to requesting ASCII headers from Outlook; convert 8-bit headers from the message charset before normalization (obsolete) — Details — Splinter Review

Assuming my presumptions are right, this patch takes care of such messages. Tested on 4GB of Outlook messages; seems like it cures the problem while not adding new ones.

Attachment #561081 - Flags: review?(dbienvenu)

David :Bienvenu

Comment 4

•

13 years ago

Re the assertion that Message-Ids can't fold: I'm not sure that's the case. I know the Message-Id can be on its own line when Exchange/Outlook generates the message. Mike, can you look at the info in the bug below? https://bugzilla.mozilla.org/show_bug.cgi?id=676916 The patch itself has bit-rotted slightly, but I've refreshed it and will attach the refreshed version after I check that it builds...

Mike Kaganski

Reporter

Comment 5

•

13 years ago

(In reply to David :Bienvenu from comment #4) Thank you, David. That was definitely my mistake (introduced in the patch for bug 207156); RFC 2822 allows folding and comments both before and after the Message-Id field. So that comment in the patch needs to be removed. However, is there a real need to change the code itself? The Message-Id is passed to CreateAndSendMessage (or its current replacement) to allow it to create the message successfully; we don't use the (possibly changed) value of this header from the composed message, since the header is copied as is. Will CreateAndSendMessage fail when a folded Message-Id is passed?
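For reference, unfolding a header as RFC 2822 describes it means removing each line break that is immediately followed by a space or tab. A minimal Python sketch, with a made-up Message-ID and a hypothetical helper name `unfold`:

```python
import re


def unfold(header: str) -> str:
    """Remove line breaks that are immediately followed by a space or tab."""
    return re.sub(r"\r?\n(?=[ \t])", "", header)


# Hypothetical folded header, with the id on its own continuation line.
folded = "Message-ID:\r\n <1234@example.com>"
print(unfold(folded))
```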

Mike Kaganski

Reporter

Comment 6

•

13 years ago

Attached patch Removed incorrect comment (obsolete) — Details — Splinter Review

I have checked the code of nsMsgComposeAndSend::InitCompositionFields in http://mxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#2714 (it is called from nsMsgComposeAndSend::Init). It looks like the code wouldn't mind if the Message-Id weren't passed at all: it would simply generate a new one. So here's the same patch without that incorrect comment.

Attachment #561081 - Attachment is obsolete: true

Attachment #561081 - Flags: review?(dbienvenu)

Attachment #562618 - Flags: review?(dbienvenu)

Mike Kaganski

Reporter

Comment 7

•

13 years ago

Attached patch Previous one was messed up with incorrect line endings (obsolete) — Details — Splinter Review

Sorry for noise

Attachment #562618 - Attachment is obsolete: true

Attachment #562618 - Flags: review?(dbienvenu)

Attachment #562625 - Flags: review?(dbienvenu)

David :Bienvenu

Comment 8

•

13 years ago

Attached patch de-bitrotted patch — Details — Splinter Review

This is the de-bitrotted patch I was planning on checking in. I removed the comment locally...

Attachment #562625 - Attachment is obsolete: true

Attachment #562625 - Flags: review?(dbienvenu)

Attachment #562647 - Flags: review+

Mike Kaganski

Reporter

Comment 9

•

13 years ago

Oh, excuse me. I misunderstood you: I thought you were referring to bug 676916 when you said that the patch had bit-rotted... Thank you.

Mike Kaganski

Reporter

Comment 10

•

13 years ago

By the way, do you know of a bug report about the incorrect display of the sender and subject in the message list (and sometimes in the message view window) for messages received _in the usual way_ that contain such improper 8-bit headers? I tried to find one and failed. I think that TB could use a similar technique to work around this.

Mike Kaganski

Reporter

Comment 11

•

13 years ago

This bug 686985 is a duplicate of bug 270638. I should have posted there, all the more so since I had already seen it a while ago.

David :Bienvenu

Comment 12

•

13 years ago

http://hg.mozilla.org/comm-central/rev/76cb5cca90ec fixed on trunk. Thx for the patch, Mike.

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → Thunderbird 9.0

Martin Schröder [:mschroeder]

Updated

•

13 years ago

QA Contact: migration → import
