
[import] Importing mail from Outlook 2002 imports HTML emails as plain text.

RESOLVED FIXED in Thunderbird 3.3a1

Status

MailNews Core
Import
P3
critical
RESOLVED FIXED
13 years ago
7 years ago

People

(Reporter: olly, Assigned: Jorg K (GMT+2))

Tracking

(Blocks: 1 bug, {dataloss})

unspecified
Thunderbird 3.3a1
x86
Windows XP
dataloss
Bug Flags:
blocking-thunderbird3.0a2 -
blocking-thunderbird3 -
wanted-thunderbird3 +

Thunderbird Tracking Flags

(thunderbird3.1 .5-fixed)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

13 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040707 Firefox/0.9.2
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040707 Firefox/0.9.2

When I import my Outlook 2002 mailbox into Thunderbird 0.7.2, all HTML emails
are rendered in Thunderbird as plain text.  I would expect them to be rendered
as they were in Outlook, i.e. as HTML.   

Reproducible: Always
Steps to Reproduce:
1. Open Thunderbird
2. Select Tools/Import/Mail/Outlook
3. View the email which is HTML in Outlook and it will be rendered as plain text
in Thunderbird.

Actual Results:  
The HTML email is rendered as plain text in Thunderbird.

Expected Results:  
The HTML email should have been imported correctly and displayed as HTML in
Thunderbird.
Summary: Importing mail from Outlook 2002 imports HTML emails as plain text. → [import] Importing mail from Outlook 2002 imports HTML emails as plain text.

Comment 1

13 years ago
I can confirm this bug in Outlook 2003 as well.  I imported just yesterday using
version 0.8 (20040929) which was the most recent listed on the rumbling edge.  I
believe this bug should be fixed before 1.0 as a lot of people are going to be
coming over from Outlook and everything needs to import correctly.

Comment 2

13 years ago
Confirmed. With TB 0.9.

Please folks, change the status from:
"Nobody has validated that this bug needs to be fixed."
to: NEW or better yet ASSIGNED

I consider this a 'blocker'.

Comment 3

13 years ago
please try a .9+ build -
http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-0.9/

you will need to re-import.

*** This bug has been marked as a duplicate of 199298 ***
Status: UNCONFIRMED → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → DUPLICATE

Comment 4

13 years ago
(In reply to comment #3)

> *** This bug has been marked as a duplicate of 199298 ***

This bug is definitely *not* a duplicate of 199298. That bug is about mail
incorrectly importing as an attachment. This is about mail losing its HTML
formatting on import.

It has also not been resolved. I confirmed this bug yesterday on TB 1.0
importing from Outlook 2002 on WinXP. I think this is a killer that will stop
unhappy Outlook users (like me) from switching to TB.

I believe it should be reopened.

Comment 5

13 years ago
*VALIDATION*

I consider this NOT as a blocker.
Actually, it's quite important to resolve this bug.

I've lost most of my mail because I thought the import from Outlook had been
successful.

sai
(Reporter)

Comment 6

13 years ago
This bug is still present.
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---

Comment 7

13 years ago
Imported with TB 1.0 on 14/05/05 from Outlook 2002/Windows XP. All HTML mail was
converted to plain text. Note that this is not a rendering problem; the message
source itself has been converted. Complex menus get converted into unintelligible
single-line strings of words, overwhelming the text contents.

I will not (and cannot) migrate until I can successfully import my mail archive.
I could live without the graphics and menus, but I have to be able to read the
text contents!

Comment 8

13 years ago
I got the same problem when importing Outlook 2003 email into thunderbird as 
well. On top of that, any html email that I send from Outlook 2003 to a 
Microsoft Exchange server arrives as text in Thunderbird. This is a huge 
problem. To recreate this bug, simply do the following:
  1) Send an email from Outlook with a color, various font size, etc. 
  2) Receive email through Thunderbird and all of the formatting will disappear
  
Also, the html formatting works fine if the email is sent from an online email 
source such as Yahoo, hotmail, gmail, etc. 

Please take a deeper look at this issue, as we are trying to run mhonarc on 
the mbox file format used by thunderbird.

Thanks.
This is an automated message, with ID "auto-resolve01".

This bug has had no comments for a long time. Statistically, we have found that
bug reports that have not been confirmed by a second user after three months are
highly unlikely to be the source of a fix to the code.

While your input is very important to us, our resources are limited and so we
are asking for your help in focussing our efforts. If you can still reproduce
this problem in the latest version of the product (see below for how to obtain a
copy) or, for feature requests, if it's not present in the latest version and
you still believe we should implement it, please visit the URL of this bug
(given at the top of this mail) and add a comment to that effect, giving more
reproduction information if you have it.

If it is not a problem any longer, you need take no action. If this bug is not
changed in any way in the next two weeks, it will be automatically resolved.
Thank you for your help in this matter.

The latest beta releases can be obtained from:
Firefox:     http://www.mozilla.org/projects/firefox/
Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html
Seamonkey:   http://www.mozilla.org/projects/seamonkey/
This bug has been automatically resolved after a period of inactivity (see above
comment). If anyone thinks this is incorrect, they should feel free to reopen it.
Status: UNCONFIRMED → RESOLVED
Last Resolved: 13 years ago → 12 years ago
Resolution: --- → EXPIRED

Comment 11

11 years ago
This item, 250878, was automatically closed, but I can report that the severe trouble remains in recent versions of Thunderbird!

I am transitioning from Outlook 2003, SP 1, 
to Thunderbird 1.5.0.4 (20060516).
I experience trouble that is described by 323184 and 250878.
In other words, I imported thousands of email into Thunderbird, 
deleted them from Outlook, saw many that were next to unreadable.  Now I'm sad.

But looking at bugzilla, the players seem to think of the trouble as uncorroborated, and no work targeting this issue is being attempted (?!).
So I am writing to let you know that this trouble is still present in very recently released software.
Reopening at the request of John Ruckstuhl.

Gerv
Status: RESOLVED → UNCONFIRMED
Resolution: EXPIRED → ---

Comment 13

11 years ago
I'm seeing this bug when importing from Outlook 2003 into TB 1.5.0.5.  For all imported HTML messages, every mention of a Content-Type field in the message's source looks like this:

Content-Type: text/plain; charset=windows-1252; format=flowed

Comment 14

11 years ago
This bug was reported 2+ years ago.
Current owner Scott MacGregor hasn't commented since it was reopened 2+ months ago.  Scott, any thoughts?  Hand off to someone else if it makes sense?
Thanks,
John Ruckstuhl

Comment 15

11 years ago
I can confirm this bug--just ran into this problem with TB 1.5.0.9.  I tried importing my Outlook email (Outlook 2002 SP3) and the HTML was indeed imported as plain text.

Looking at the history suggests this bug won't be fixed anytime soon--has anyone come up with a workaround?

J.R.

Comment 16

11 years ago
I can't believe this bug is marked as unconfirmed after several years. This bug stops you converting from Outlook and so is a show stopper. I just started using TB, v1.5.0.9 on XP Pro SP2. All HTML tags are stripped from HTML emails. I posted sample emails from Outlook 2002 and TB, plus more information, in this thread:

http://forums.mozillazine.org/viewtopic.php?t=526358

Can someone please work on this bug? It's been 3 years already!

I'm stuck with clunky Outlook until this problem is fixed.
QA Contact: migration

Comment 17

9 years ago
I can also confirm this bug. I imported my mail from Outlook 2002 using TB 2.0.0.12 and HTML messages were imported as plain text: hyperlinks were converted to text (URLs lost) and quotes were lost… but attachments were preserved.

Comment 18

9 years ago
Here is my opinion about this bug as a developer. End-users can directly jump to the "Options" section.

To sum things up, TB uses Simple MAPI to import Outlook mails and it seems MAPI doesn't support the HTML format, only RTF.

As stated in the "Import .pst files" [1] MozillaZine Knowledge Base (MZKB) article, TB imports Outlook mail using the Simple MAPI client interface. The bad news is that "Thunderbird's SimpleMAPI support is buggy" [3]. The MAPIReadMail function is used to "retrieve a message for reading" [2], and the message is returned through its lppMessage parameter as a MapiMessage [4] structure. Its lpszNoteText member contains the message body as text. This is very important, because it means the body MIME type is always plain text. The "Receiving a Message: Simple MAPI Sample" [7] MSDN topic supports this limitation: « the message text can be printed with a simple call to the C library function printf. » That means the text doesn't contain HTML elements and is readable by the end user, which is probably why all Outlook mail is imported as plain text messages in TB. The structure also has an lpszMessageType member, but I'm not familiar with the InterPersonal Message (IPM) class.

Regarding the IPM class: reading the "Creating a New Interpersonal (IPM) Message Class" MSDN topic, it seems TB suffers from a bad design decision:

« There are several ways to create a new message class used for person-to-person communication. By using MAPI properties to structure message content, you avoid writing code to parse message text or a binary attachment. […] Design Tasks […] Decide whether to use Simple MAPI, CMC, MAPI, or the CDO Library. See "Selecting a Client Interface" [6]. In addition to the considerations listed there, you must think about whether that interface fits well into the application framework you have chosen. »

So the question is: was it smart to choose Simple MAPI over the CMC, MAPI and CDO libraries? I think it was a wise decision because, as stated in the 6th reference, time is a valuable resource and Simple MAPI is so… simple. I don't know much about CMC, only that it's well suited to cross-platform development and has the same functions as Simple MAPI. That last point gave me pause, since TB is a cross-platform application. MAPI is COM-based, so it's far more powerful than Simple MAPI but far less accessible: dozens of interfaces, structures… a real maze for newcomers.

In the 6th reference there's also a very important section about types of messaging applications: enabled, aware and based. TB is apparently enabled/aware, the simple approach. But based applications have a few extra features:

« Messaging-based applications have more complex messaging requirements because they have more direct contact with and control over the underlying messaging system services like address books, message stores, and transports. These applications often implement a wide variety of messaging features, such as rules processing, automatic forwarding, and supporting Rich Text Format. »

RTF, Rich Text Format. Sounds like HTML? Not really. From the "Rich Text Format" Wikipedia article [8]:

« The Rich Text Format (often abbreviated RTF) is a proprietary document file format developed by Microsoft in 1987 for cross-platform document interchange. […] It should not be confused with enriched text (mimetype "text/enriched" of RFC 1896) or its predecessor Rich Text (mimetype "text/richtext" of RFC 1341 and 1521) which are completely different specifications. »

So RTF is not HTML, and it seems even based applications can't handle HTML; otherwise MSDN would mention it. I mean, it's HTML, and users and developers alike love the format. Things get even worse when you check out the "Formatted Text in MAPI" [9] MSDN topic:

« The text of a message can be stored and transmitted using plain text or formatted text. Formatted text enhances the message text by altering its appearance with, for example, one or more fonts, font sizes, or text colors. It is recommended that all clients and whenever possible, all message store providers, support formatted text. Supporting formatted text in messages adds value by improving message readability and making message handling easier and more efficient. […] MAPI defines these two message text properties and mechanisms for conversion between them so that RTF-aware clients can interoperate with clients and messaging systems that do not support formatted text. »

Does MAPI support HTML? I don't think so. Knowing how Microsoft steers developers toward its own proprietary formats, I think MAPI handles HTML mail perfectly well, probably using the Internet Explorer API, but only exposes RTF bodies to developers. So even if TB migrated from Simple MAPI to (Extended?) MAPI, it seems we would have to deal with RTF voodoo. Of course I'm only speculating, but MSDN is so obscure about message bodies that I can only guess.

Options:
* Check my assertion that the MAPIReadMail function retrieves the message as plain text. I think a simple PHP script could do the job nicely, as I don't know anything about XUL yet.
* Use the utilities listed in the first reference (I haven't tried them yet). If they can import HTML mail, we can do it too.
* Dig into MAPI and see whether it offers some magic interface or function to retrieve an HTML message for reading (IMAPIMessageSite::GetMessage, IMessage, IMAPIProp...).
* Mark this bug as resolved, close our eyes and prevent further reopenings ;)
* Import Outlook mail and edit the plain text messages one by one to fix the formatting problems: crunched quotes, MIA URLs, "we come in peace niark! niark!" texts... Winter is over, it's spring time. Let's forget about this last option :D

Sorry for this long comment but I've just spent like 2 hours working on this bug and thought my comments would interest the other developers.

References:
* [1] "Import .pst files", MozillaZine Knowledge Base: <http://kb.mozillazine.org/Import_.pst_files>
* [2] MSDN topic (MAPIReadMail): <http://msdn2.microsoft.com/en-us/library/ms529646(EXCHG.10).aspx>
* [3] "MAPI Support", MozillaZine Knowledge Base: <http://kb.mozillazine.org/MAPI_Support>
* [4] MSDN topic (MapiMessage structure): <http://msdn2.microsoft.com/en-us/library/ms529146(EXCHG.10).aspx>
* [5] MSDN topic: <http://msdn2.microsoft.com/en-us/library/ms527920(EXCHG.10).aspx>
* [6] MSDN topic ("Selecting a Client Interface"): <http://msdn2.microsoft.com/en-us/library/ms526440(EXCHG.10).aspx>
* [7] MSDN topic ("Receiving a Message: Simple MAPI Sample"): <http://msdn2.microsoft.com/en-us/library/ms527946(EXCHG.10).aspx>
* [8] "Rich Text Format", Wikipedia: <http://en.wikipedia.org/wiki/Rich_Text_Format>
* [9] MSDN topic ("Formatted Text in MAPI"): <http://msdn2.microsoft.com/en-us/library/ms531451%28EXCHG.10%29.aspx>

Comment 19

9 years ago
I'm glad to hear that a developer is looking into this problem.  I ran into it myself as I tried to switch from Outlook to TB.  This bug makes a lot of my email archives unintelligible, so I won't be able to switch to TB unless it's fixed.

IMO, as an end user, this is a _really_ important bug to fix.  I want to switch to TB because Outlook 2002 doesn't work properly with Vista.  I don't want to spend the money on a new version of Outlook, and I'm really mad that MS is refusing to support their own product through an OS upgrade.  I bet a lot of people are in the same boat.

But people _aren't_ going to switch if they lose their email archives.  If you want Thunderbird to _ever_ achieve a significant market share, this _has_ to be fixed.  

And I gotta say that I can't _believe_ that this has been open for 4 YEARS and is still unresolved.

Updated

9 years ago
Assignee: mscott → nobody
Confirming based on the various comments here.
Status: UNCONFIRMED → NEW
Ever confirmed: true

Comment 21

9 years ago
Very probable dupe (for this) bug 250878

Comment 22

9 years ago
I can confirm the problem in 2.0.0.14 with XP and Outlook 2003.  100% of imports formatted as HTML display HTML code only and are not properly rendered.

It is astounding to me that this wasn't considered a showstopper for TB 2.  This bug affects 100% of Outlook users who migrate to TB.  If you can't import from Outlook, then only support import from Outlook Express.  That is a very simple kludge and would at least represent the accurate stage of TB development.  I'm very glad I checked the imports, or I could have deleted all of my Outlook data before I discovered this bug.  Again, how is this not a showstopper?  Horribly screwing up every Outlook import can't be good for TB's reputation.

My original thread and report of the problem:
http://forums.mozillazine.org/viewtopic.php?p=3402337

Also, here is a REALLY simple non-developer solution:  Do the import like normal, then run a test: 1) strip all white space out of the body of the message; 2) scan the body of the message, and if it begins with <html>, run a backwards scan to determine whether the body ends with </html>; 3) if all those are true, set whatever flags in TB need to be set to tell it to render the message as HTML.  Finally, pray the HTML renderer is graceful enough that it won't crash on receiving malformed HTML and can render plain text somewhat legibly if something happens to slip through.  How's that?
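The begins-with/ends-with test proposed above could be sketched in C++ roughly as follows. This is an illustration under stated assumptions, not Thunderbird code: `LooksLikeHtmlBody` is a hypothetical name; it trims only leading and trailing whitespace (equivalent to stripping all of it for a prefix/suffix test); it matches the prefix `<html` so that a tag with attributes also counts; and it compares case-insensitively so Outlook's upper-case `<HTML>` matches.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Lower-cases ASCII letters so that <HTML> and <html> compare equal.
static std::string ToLowerAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Hypothetical check: after trimming leading and trailing whitespace,
// does the body start with "<html" and end with "</html>"?
bool LooksLikeHtmlBody(const std::string& body) {
  const char* const kWhitespace = " \t\r\n";
  const std::size_t first = body.find_first_not_of(kWhitespace);
  if (first == std::string::npos) return false;  // empty or whitespace only
  const std::size_t last = body.find_last_not_of(kWhitespace);
  const std::string trimmed = ToLowerAscii(body.substr(first, last - first + 1));

  const std::string open = "<html";  // prefix, so <HTML lang=...> also matches
  const std::string close = "</html>";
  if (trimmed.size() < open.size() + close.size()) return false;
  return trimmed.compare(0, open.size(), open) == 0 &&
         trimmed.compare(trimmed.size() - close.size(), close.size(), close) == 0;
}
```

As written, a body that opens with a `<!DOCTYPE ...>` declaration (as the MAPI bodies quoted later in this bug do) would not match; a real check would have to skip that declaration first.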

ovidiu: that dupe references this bug report, did you have a different bug report in mind that you meant to reference as a dupe?
Flags: blocking-thunderbird3?
Flags: blocking-thunderbird3.0a2?

Comment 23

9 years ago
(In reply to comment #21)
> Very probable dupe (for this) bug 250878
> 

silly me, that was bug 323184 I was aiming at. 


maybe related (far, not probable ..)
bug 395745
bug 415045 (far away)

Updated

9 years ago
Duplicate of this bug: 323184

Comment 25

9 years ago
Not an a2 blocker; marking blocking-thunderbird3.0a2-
Flags: blocking-thunderbird3.0a2? → blocking-thunderbird3.0a2-

Comment 26

9 years ago
I'm afraid I no longer attempt to use this "product".  
It burned me badly, and I cannot recommend it to anyone.
Magnus/Ovidiu -- I'm sure you mean well, but dispositioning this bug and closing a dupe is false progress.  No one has been paying attention to the bug list for 4 years.
Regards,
John

Comment 27

9 years ago
Two further observations:
(1) Magnus, my thinking is that it is useful for the duplicates to refer to each other, but it is generally a mistake to close one just to "clean up" the bug database.  Only the developer who completes a fix should close the dupe.  Why?  Although the root cause is very likely duplicate, there is valuable user commentary in both 250878 and 323184 threads.  That is, by closing 323184, if a developer were ever to take interest in 250878, he would have to be abnormally diligent to study the comments of 323184.  You are effectively nullifying any comments on 323184 because even though they are still readable -- they won't be read.  And 10% of the time, a helpful dupe-closer will actually be mistaken and will create confusion.  See earlier in this thread where it was helpfully declared a dupe of 199298, then, that was reversed.
(2) re my claim: "No one has been paying attention to the bug list for 4 years" -- Note that 323184 has been owned by Scott MacGregor forever... In 2006 I suggested to Scott maybe hand off to someone else, but he didn't.  In October 2007, Scott left Mozilla (see http://scott-macgregor.org/blog/ ), but his cases didn't roll over to someone else, they just dead-ended.  But this one had been dead-ended for 3 years in Scott's lap anyway.

Anyway, that's my thinking.  If you disagree, feel free to express a different point of view; sometimes I learn from that.

Regards,
John

Comment 28

9 years ago
John, marking a bug as a dupe does not nullify its claims - this bug is still open.

Think of it this way - if someone decides to fix this (anyone's free to take it up and submit a patch) and that fix for some reason doesn't fix the "other issue", that bug can be reopened. 

If we only closed bugs as duplicates after the other bug was fixed, the whole system would collapse. We do have the option to reopen bugs that weren't dupes after all. I'd say the percentage that has to be reopened is very small. (I have no numbers, but my guess would be only a few percent, if even that.) 

Comment 29

9 years ago
John, I can only say that dupes are about consolidation, because otherwise attention, testing and work can end up split, and that's worse. I have really seen that happen, and duping didn't hurt.

Yes, some bugs are really old, but this one just got a "?" (blocking request) for TB3 and the 3.0a2 block, which means it is just now getting attention.

(PS: I think bug work and general work are really getting into shape lately, and migration/import will be part of the game.)

Comment 30

9 years ago
Wouldn't block for this; blocking-thunderbird3-, but we sure would like a patch for it.

Jean-Marc: is this something you're working on/ or would like to work on?
Flags: wanted-thunderbird3+
Flags: blocking-thunderbird3?
Flags: blocking-thunderbird3-
Priority: -- → P3
Target Milestone: --- → Thunderbird 3.0rc1

Comment 31

9 years ago
(In reply to comment #30)
> Jean-Marc: is this something you're working on/ or would like to work on?

First of all sorry for the late reply. My email notifications were disabled so I completely lost track of this bug.

Magnus, I would really like to work on it, but time is an issue. Fixing this bug is likely to be very time-consuming. Moreover, as posted previously, the MAPI black box makes things worse. However, I'm optimistic and think every problem has a solution, so it can be done. I don't know when or how it will happen, but someday we'll be able to import our Outlook mail into Thunderbird. For the moment I've decided to keep Outlook so I can check my old messages from time to time.

Comment 32

9 years ago
(In reply to comment #22)
> I can confirm problem in 2.0.0.14 with XP and Outlook 2003.  100% of imports
> formated as HTML display HTML code only and are not properly rendered.

Thanks Alex. It would also be interesting to know whether the MAPI versions in Outlook 2002 and later are more open, or whether a new API is available. If MAPI really is a black box, then migrating from the 2002 release to the most recent ones (the latest is Outlook 2007) could be another solution. But we would still have to improve the import feature in Thunderbird to support the new API.
outlook experts, good analysis here and I have a recent user interested in helping. what are the prospects of working toward resolution?

this is really dataloss
Severity: major → critical
Keywords: dataloss

Comment 34

8 years ago
Hi, I think I'm that user :)
This is happening to me with 2.0.0.22 on windows xp, importing from outlook 2007.
I imported a very large pst file, and most, if not all, HTML mails are wrong. Apparently, mails with multiple Content-Type sections go wrong; 
I can fix the problem by changing the Content-Type of the first section from text/plain to text/html.
How can I help to solve this issue?
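The manual workaround described in this comment could be scripted along these lines. This is a hedged sketch, not a tested repair tool: `FirstPlainPartToHtml` is a hypothetical helper that works on the raw source of a single message, assumes the header is spelled exactly `Content-Type: text/plain` (as in the imported source quoted in comment 13), and can only help when the part still contains HTML markup, not when the tags have already been stripped.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not Thunderbird code): rewrites the first
// "Content-Type: text/plain" header in the message, whether it belongs to the
// top-level headers or to the first MIME part, to text/html. Parameters such
// as charset are kept. The match is case-sensitive.
std::string FirstPlainPartToHtml(std::string message) {
  const std::string from = "Content-Type: text/plain";
  const std::string to = "Content-Type: text/html";
  const std::size_t pos = message.find(from);
  if (pos != std::string::npos) {
    message.replace(pos, from.size(), to);
  }
  return message;  // unchanged if no text/plain header was found
}
```

Only the first match is rewritten, mirroring the "first section" in the description; later text/plain parts are left alone.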

Updated

8 years ago
Duplicate of this bug: 395745

Comment 36

8 years ago
One quick addition...

The same thing happens on both the Windows and Mac versions of T3. I did the import on Windows and tried browsing old HTML emails from both the Windows client and the Mac client... 

Something strange happened while changing the zoom (of the preview pane) in the Win T3 client, for just one second it tried to re-render the message in html, and after that "magic" second, it returned to the plain text version of the html.

I don't know why this is classified as P3.. this is something critical if you want people to adopt T3 moving their recent and old email history and take advantage of the new search features.

Comment 37

8 years ago
It's a pity that nobody cares about this critical error. I migrated to Thunderbird and imported straight away from Outlook 2002 -> all HTML tags are gone. The meta types are still the same (text/html).

Comment 38

8 years ago
http://msdn.microsoft.com/en-us/library/ff385210.aspx
http://blogs.msdn.com/interoperability/archive/2010/02/19/New-Office-Documentation-Now-Publicly-Available.aspx
Brand new documentation for Microsoft Outlook files. Data portability is increasingly important for our customers and partners as more information is stored and shared in digital formats. One particular request we’ve heard is for improved access to email, calendar, contacts, and other data generated by Microsoft Outlook. On desktops, this data is stored in Outlook Personal Folders, in a format called a .pst file. Last fall we promised to release documentation that would make it easier for developers to read, create, and interoperate with the data in .pst files across a variety of platforms, using the programming language of their choice. After seeking input on the documentation from the community, today we delivered on that promise (here's the link to the documentation on MSDN: http://msdn.microsoft.com/en-us/library/ff385210.aspx).

Comment 39

8 years ago
I'll look into the docs but without outlook installed I can't promise anything.
I use Thunderbird 3.0.3 and Outlook 2003 and I confirm the bug is still there.

But the strange part is it only occurs with some e-mails (randomly). 

Some e-mails sent by the same person, same client, same configuration, etc. show this behaviour when imported from Outlook 2003 and some others don't.
(Assignee)

Comment 41

7 years ago
Yep, this bug is still there with TB 3.0.4 and Outlook 2003.

I've been assigned some other bugs in the Outlook import process. So while I'm at it, I'll look at this one, too. I noticed it myself this week. And I agree, it needs to be fixed asap. I have 10 years worth of Outlook e-mail to import ;-)

Basically the problem is here:
http://mxr.mozilla.org/comm-central/source/mailnews/import/outlook/src/nsOutlookMail.cpp#352
and here:
http://mxr.mozilla.org/comm-central/source/mailnews/import/outlook/src/MapiMessage.cpp#454

Somehow the heuristics get it wrong. And once TB has incorrectly decided to do the message as plain text, all bets are off:

http://mxr.mozilla.org/comm-central/source/mailnews/import/outlook/src/nsOutlookCompose.cpp#596

'bodyType' is determined from what was passed in to the function, and that was determined incorrectly in the two spots shown above.

Sadly, the whole Outlook import is quite twisted as some of the other bugs show, so it will need a bit of straightening out.

Comment 42

7 years ago
um, this was first reported in 2004, right?
With all good wishes to you Jorg K., but I will not be holding my breath.
(Assignee)

Comment 43

7 years ago
Well, to my eye it's still the old Netscape code.

Don't hold your breath, especially since the MAPI interface is used to get the e-mail out of Outlook.

Anyway, I hope to improve the import to a point where I can import my own Outlook data satisfactorily.
(Assignee)

Comment 44

7 years ago
OK, I did some debugging:

Take a good look here:
http://mxr.mozilla.org/comm-central/source/mailnews/import/outlook/src/MapiMessage.cpp#466:

  if (!m_body.IsEmpty() && m_body.Find("<!-- Converted from text/plain format -->") == kNotFound)
    m_bodyIsHtml = TRUE;
  else
    ...

MAPI delivers all messages in HTML format. TB tries to tell plain text apart from HTML by looking for this string:
"<!-- Converted from text/plain format -->"

And guess what! MAPI gets this wrong. There are messages delivered with this string that were in fact HTML to start with. So we convert them to plain text.

Unless I can find a better way to tell the two apart, the only option might be to import everything as HTML. Much better than converting HTML to plain text.
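To make the failure concrete, here is a hedged C++ sketch that mirrors the quoted check, with `std::string` standing in for Mozilla's string classes. `CurrentHeuristicIsHtml` and `kPlainTextMarker` are illustrative names, not the real code. Because the marker is searched for anywhere in the body, a genuine HTML reply whose quoted original carries the marker is classified as plain text.

```cpp
#include <cassert>
#include <string>

// Marker found in HTML bodies that were generated from plain-text messages.
static const std::string kPlainTextMarker = "<!-- Converted from text/plain format -->";

// Mirrors the quoted MapiMessage.cpp logic: a non-empty body is treated as
// HTML unless the marker occurs *anywhere* in it.
bool CurrentHeuristicIsHtml(const std::string& body) {
  return !body.empty() && body.find(kPlainTextMarker) == std::string::npos;
}
```

For a reply shaped like the first message in the next comment (marker inside a BLOCKQUOTE), this returns false even though the message was HTML all along.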

To be continued.
(Assignee)

Comment 45

7 years ago
OK. Here are three messages, for your enjoyment.

The first one is falsely converted to plain text.

The following two were already plain text in Outlook, one with attachment, one without.

I think the fix is simple.

Instead of looking for 
"<!-- Converted from text/plain format -->"
anywhere in the body, we should only look for that string immediately following the <BODY> tag.
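A minimal sketch of the rule proposed above, again in C++ with `std::string` standing in for Mozilla's string classes. The names are hypothetical and the code is not taken from the attached patch: it locates the first `<body` tag case-insensitively, skips to the end of that tag and any whitespace, and only then tests for the marker, so a marker deeper in the body (for example in a quoted reply) no longer counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

static const std::string kPlainTextMarker = "<!-- Converted from text/plain format -->";

// Case-insensitive search, since the tag may be written <BODY> or <body>.
static std::size_t FindNoCase(const std::string& haystack, const std::string& needle) {
  const auto it = std::search(haystack.begin(), haystack.end(), needle.begin(), needle.end(),
                              [](unsigned char a, unsigned char b) {
                                return std::tolower(a) == std::tolower(b);
                              });
  return it == haystack.end() ? std::string::npos
                              : static_cast<std::size_t>(it - haystack.begin());
}

// True only if the marker is the first thing after the opening <BODY ...> tag,
// ignoring whitespace.
bool IsConvertedPlainText(const std::string& body) {
  const std::size_t bodyTag = FindNoCase(body, "<body");
  if (bodyTag == std::string::npos) return false;
  const std::size_t tagEnd = body.find('>', bodyTag);
  if (tagEnd == std::string::npos) return false;
  const std::size_t next = body.find_first_not_of(" \t\r\n", tagEnd + 1);
  return next != std::string::npos &&
         body.compare(next, kPlainTextMarker.size(), kPlainTextMarker) == 0;
}

// Proposed classification: a non-empty body is HTML unless it is converted plain text.
bool ProposedHeuristicIsHtml(const std::string& body) {
  return !body.empty() && !IsConvertedPlainText(body);
}
```

Applied to the three messages below, the first (marker inside a BLOCKQUOTE) would stay HTML, while the two whose body starts with the marker would still be treated as plain text.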

============ Messages follow ===================

Falsely converted to plain text:

Headers returned by MAPI (excerpts)
========================

MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="----=_NextPart_000_0071_01CA8BCD.D8AB6C30"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.3598
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3350
X-Antivirus: avast! (VPS 100101-0, 01/01/2010), Outbound message
X-Antivirus-Status: Clean


Body returned by MAPI - I marked (+++) the line with <!-- Converted from text/plain format -->
=====================

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE></TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.6000.16890" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>Hi Jorg,</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>Here is a stock list for you to experiment (free of 
charge I hope :)))&nbsp;&nbsp;</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT>&nbsp;</DIV>
<DIV><FONT face=Arial size=2>Lillian</FONT></DIV>
<BLOCKQUOTE dir=ltr 
style="PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
  <DIV style="FONT: 10pt arial">----- Original Message ----- </DIV>
  <DIV 
  style="BACKGROUND: #e4e4e4; FONT: 10pt arial; font-color: black"><B>From:</B> 
  <A title=jorgk@jorgk.com href="mailto:jorgk@jorgk.com">Jörg Knobloch</A> 
</DIV>
  <DIV style="FONT: 10pt arial"><B>To:</B> <A title=enquiry@lyrapianos.com.au 
  href="mailto:enquiry@lyrapianos.com.au">'Lyra Piano Shop'</A> </DIV>
  <DIV style="FONT: 10pt arial"><B>Sent:</B> Saturday, January 02, 2010 7:26 
  AM</DIV>
  <DIV style="FONT: 10pt arial"><B>Subject:</B> {Spam?} Pasting from MS Excel 
  into MS Frontpage</DIV>
  <DIV><BR></DIV><!-- Converted from text/plain format --> +++++++++++++++++++++++++++++++++
  <P><FONT size=2>Hello,<BR><BR>Please send me the spreadsheet with the 
  stocklist, so that I can experiment.<BR><BR>I can tell you now, that pasting 
  is a terrible idea since it reintroduces the horrible formatting which doesn't 
  even work. Here an example:<BR><BR>Excel:<BR><IMG 
  src="cid:006d01ca8b71$a4fc6490$590d8273@your7cc5e821f0"><BR><BR>IE and 
  Opera:<BR><IMG 
  src="cid:006e01ca8b71$a4fc6490$590d8273@your7cc5e821f0"><BR><BR>Firefox:<BR><IMG 
  src="cid:006f01ca8b71$a4fc6490$590d8273@your7cc5e821f0"><BR><BR>Chrome and 
  Safari:<BR><IMG 
  src="cid:007001ca8b71$a4fc6490$590d8273@your7cc5e821f0"><BR><BR>Five browsers, 
  three versions when you paste from MS Excel to MS Outlook.</FONT></P>
  <P><FONT size=2>JK.</FONT> </P></BLOCKQUOTE></BODY></HTML>


Originally already plain text without attachments:


Headers returned by MAPI (excerpts)
========================

Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-MimeOLE: Produced By Microsoft Exchange V6.5
Subject: AW: Alles Gute zum Geburtstag
Date: Wed, 7 Apr 2010 09:53:30 +0200
Message-ID: <C6949432B1C70449A3E3F73388F7398402AEF869@olg-kar01-mmx04.JUSTIZ.BWL.NET>
In-Reply-To: <103FA5B446AF4489B255E7C8CB2A6E9B@toad>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Alles Gute zum Geburtstag
Thread-Index: AcrVD59wQBqc7Y20Sr+rI1cXuIlvRQBFj7oA
X-OriginalArrivalTime: 07 Apr 2010 07:53:33.0101 (UTC) FILETIME=[6B9D25D0:01CAD627]
X-Spam-Status: No, score=-5.7
X-Spam-Score: -56
X-Spam-Bar: -----
X-Spam-Flag: NO


Body returned by MAPI
=====================

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7036.0">
<TITLE>AW: Alles Gute zum Geburtstag</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->

<P><FONT SIZE=2>Dear Jörg,<BR>
<BR>
many thanks for your birthday wishes. They arrived in good time, since I can only access my e-mails at the court anyway. Yesterday I was at a notary's office, and today I am reading them.<BR>
<BR>
We had a lovely Easter and also a nice birthday celebration, with afternoon coffee with the older folks (unfortunately my father was not here this year) and dinner with friends. Henriette and Jonathan were there too.<BR>
<BR>
Warm regards<BR>
<BR>
Yours, Rainer<BR>
<BR>
</FONT>
</P>

</BODY>
</HTML>


Originally plain text, WITH attachment:

Headers returned by MAPI (excerpt)
========================

Message-ID: <9814013.1263507478572.JavaMail.ngmail@webmail08.arcor-online.net>
Date: Thu, 14 Jan 2010 23:17:58 +0100 (CET)
Subject: Protokoll
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_20707_14932200.1263507478570"
X-ngMessageSubType: MessageSubType_MAIL
X-WebmailclientIP: 88.73.69.211
X-Spam-Status: No, score=-2.5
X-Spam-Score: -24
X-Spam-Bar: --
X-Spam-Flag: NO


Body returned by MAPI
=====================

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7036.0">
<TITLE>Protokoll</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->

<P><FONT SIZE=2>Have a nice weekend, everyone!<BR>
<BR>
Astrid</FONT>
</P>

</BODY>
</HTML>
(Assignee)

Comment 46

7 years ago
Created attachment 441438 [details] [diff] [review]
Improved heuristics for finding true plain text messages

Sadly, since the fix is in the same source file, this also contains the patch for bug 309932.

I'm awaiting input from the reviewers to tell me how to separate the two.
Attachment #441438 - Flags: review?(bienvenu)

Comment 47

7 years ago
You can use an hg queue - http://hgbook.red-bean.com/read/mercurial-queues-reference.html

... but if the fixes do not build on each other you can also just back out the other fix and start clean. (If the fixes aren't too close to each other the patches can be applied on top of each other.)
(Assignee)

Comment 48

7 years ago
I prefer not to concern myself with configuration management here ;-)

I believe I have provided a fix for a critical problem that has been open since 2004 and was scheduled for TB 3.0 (we are currently at 3.0.4).

This patch contains two sections which are independent of each other and about 250 lines apart. The first section is the patch already provided for bug 309932, since that bug is fixed in the same source file as this one here.

I am hoping that the person applying the patch will do one of the following:
1) apply this patch entirely and discard the one from bug 309932
or
2) apply the patch for bug 309932 first. Consequently, the first section of this patch will fail (since it's already applied) and the second one will succeed. If it doesn't, the person applying the patch should be able to adjust the patch so that it applies.

After all, not counting comments, we are talking about a two-line change and a one-line change.

Comment 49

7 years ago
That seems to be a suitable solution to the problem (at least for now). OK, so who will pick this up to make it available in the next nightly build or the next minor release?

We human users will be extremely happy to test it and (finally!) use it...

BTW, from my side a huge thank you to Jorg for the effort!!!!!!

Updated

7 years ago
Blocks: 564148

Comment 50

7 years ago
I can't currently test this because Outlook import isn't working on my machine... Neil, can you import from Outlook?
Does it import all folders, or just one? I can put a test message in my Outlook 2003 Inbox, but the account has a large number of messages in another folder.
(Assignee)

Comment 52

7 years ago
Imports absolutely everything, including any open personal folder files (PST).
Hmm, so the problem occurs on HTML replies to plain text messages?

I'm not sure whether bienvenu would count it as a test, but I could try setting up an Outlook profile with just a .PST file supplied by someone else.
(Assignee)

Comment 54

7 years ago
Not sure. For some HTML messages, MAPI returns the string we look for,
"<!-- Converted from text/plain format -->",
somewhere in the body of the message, and those messages are NOT plain text, as explained above.

I haven't worked out when that happens, but I do have some HTML messages here that were previously falsely converted to plain text.
David, ping for the review.

Updated

7 years ago
Attachment #441438 - Flags: review?(bienvenu) → review?(bugzilla)

Comment 56

7 years ago
I can't review this since I can't test it.  I meant to transfer the review request; I guess it didn't take.
Assignee: nobody → mozilla
Component: Migration → Import
Product: Thunderbird → MailNews Core
QA Contact: migration → import
Target Milestone: Thunderbird 3.0rc1 → ---
Comment on attachment 441438 [details] [diff] [review]
Improved heuristics for finding true plain text messages

Jorg sent me a test PST file last month, but I only just got around to testing.

The current code finds the <!-- Converted from text/plain format --> comment as expected in a plain text message. However it also finds it embedded in an HTML reply to a plain text message. The new code correctly limits the test to require the comment to appear immediately after the body element.

>-    FormatDateTime( st, str);
>+    // FormatDateTime would append the local time zone, so don't use it.
>+    // Instead, we just append +0000 for GMT/UTC here.
>+    FormatDateTime( st, str, FALSE);
>+    str += " +0000";
[This already landed as part of bug 309932.]

>+  // kind-hearted Outlook will give us html even for a plain text message.
Nit: Might as well use a capital: Kind

>+  if (!m_body.IsEmpty() &&
>+    m_body.Find("<BODY>\x0D\x0A<!-- Converted from text/plain format -->") ==
>+    kNotFound)
Can we use \r\n here instead of \x codes?
Nit: indentation is not quite right here; it should look like this:
if (!m_body.IsEmpty() &&
    m_body.Find("<BODY>\x0D\x0A<!-- Converted from text/plain format -->") ==
    kNotFound)
Attachment #441438 - Flags: review?(bugzilla) → review+
(Assignee)

Comment 58

7 years ago
Created attachment 461789 [details] [diff] [review]
new patch to satisfy review

New patch to satisfy review.

Please note: "kind-hearted" was already spelled lowercase in the original code. Now it's uppercase as requested ;-)
Attachment #441438 - Attachment is obsolete: true

Updated

7 years ago
Attachment #461789 - Flags: superreview?(bienvenu)

Updated

7 years ago
Attachment #461789 - Flags: superreview?(bienvenu) → superreview+
Attachment #461789 - Flags: review+

Comment 59

7 years ago
Created attachment 462063 [details] [diff] [review]
patch checked in.
Attachment #461789 - Attachment is obsolete: true

Comment 60

7 years ago
fix checked in, thx, Jorg.
Status: NEW → RESOLVED
Last Resolved: 12 years ago → 7 years ago
Resolution: --- → FIXED
Comment on attachment 462063 [details] [diff] [review]
patch checked in.

We've decided to take this on 1.9.2 for the next 3.1.x build, as this will hopefully improve the import from Outlook.
Attachment #462063 - Flags: approval-thunderbird3.1.5+
(Assignee)

Comment 62

7 years ago
If you want to improve the import from Outlook by 773%, you should review and accept the fix to bug 207156. That constitutes a fine rework and cleanup effort.
Checked in to 1.9.2 (sorry for forgetting to set the username):

http://hg.mozilla.org/releases/comm-1.9.2/rev/9c82665360ba
status-thunderbird3.1: --- → .5-fixed
Target Milestone: --- → Thunderbird 3.3a1

Comment 64

7 years ago
HELP.
I have been waiting a long time for this fix, downloaded 3.1.5 as soon as it was available, and imported my mail (Outlook 2003; 11.8118.8107 SP2; XP).
After it imported all my mails, it turned out that - unfortunately - the mails are still mangled!!
I.e., I see HTML in the mail messages - see below.

An unhappy user, hoping to switch from Outlook but still unable to do so... :-(

Kind regards,

Roel


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Array Seminars</TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<STYLE type=text/css>BODY {
	PADDING-RIGHT: 0px; PADDING-LEFT: 0px; PADDING-BOTTOM: 0px; MARGIN: 0px; PADDING-TOP: 0px; BACKGROUND-COLOR: #fbf0ce
}
.BackgroundShade {
	BACKGROUND: #fbf0ce
}
.Title {
	PADDING-RIGHT: 10px; PADDING-LEFT: 10px; FONT-WEIGHT: bold; FONT-SIZE: 15px; BACKGROUND: #000080; PADDING-BOTTOM: 10px; COLOR: #fff; PADDING-TOP: 10px; FONT-FAMILY: Verdana, Helvetica, Arial, sans-serif; TEXT-ALIGN: center; TEXT-DECORATION: none
}
(In reply to comment #64)
> HELP.
> Have been waiting long for this fix, downloaded 3.1.5. as soon as available,
> and imported my mail (Outlook 2003; 11.8118.8107 SP2; XP).
> After it imported all my mails, it turned out that - unfortunately - the mails
> are still mangled!!
> Ie I see HTML in the mail messages - see below.
> 

Can you file a new bug, please, with examples attached to it, so we can have a look and maybe fix some of the remaining issues we have with import?