Closed Bug 119441 Opened 18 years ago Closed 18 years ago

Problems importing Outlook & Eudora Mail with "From" in the beginning of a line

Categories

(MailNews Core :: Backend, defect, P2)

x86
Windows XP
defect

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9.9

People

(Reporter: victor.wong, Assigned: cavin)

References

(Blocks 1 open bug)

Details

(Whiteboard: nab-imp)

Attachments

(3 files, 2 obsolete files)

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461)
BuildID:    2002011003

I'm trying to import a 12,000+ message inbox from Outlook XP to Mozilla. After 
the import, when I view the list of messages in the Inbox in Mozilla, there are 
some messages without a subject and all with dates on 12/31/1969 at 4:00pm. 
After checking the original messages, I found that these were all were 
fragments of original messages. The original messages had lines in the message 
body that started like "From now on...".

Reproducible: Always
Steps to Reproduce:
1. Import mail from Outlook
2. Look at list of messages

Actual Results:  Some messages with lines in the message body starting with the 
word "From..." are misinterpreted as a separate message.

Expected Results:  No messages should be misinterpreted.

I'm guessing it's one of two things:
1. If Mozilla stores mail in the MBOX format then the import utility might have 
forgotten to perform From_ line quoting as specified in the RFC.
2. The messages are misparsed by the procedure that greps out the header 
information for the message list.
Attached file Problem email
This was one of the messages that when imported from Outlook would be
recognized as two messages. One message would have all the header information
and the first part of the message. The other message would have no subject and
have a sender of something like "CPU".
The attachment is a .msg file saved from Outlook. You can just drag it back 
into Outlook and then do the import to see the bug.
yes, this would be a bug in the import utility. I'm not sure who owns it, so I'm
cc'ing Scott and reassigning to Cavin since he's the only one I know who has
worked on import issues in the past.
Assignee: bienvenu → cavin
Status: UNCONFIRMED → NEW
Component: Mail Database → Mail Back End
Ever confirmed: true
QA Contact: esther → nbaca
Trunk build 2002-01-11-03:WinMe

Change summary from "Problems with importing mail from Outlook XP" to "Problems
with importing mail from Outlook", since I was able to reproduce the problem
using Outlook on WinMe.
Whiteboard: nab-imp
Nominating for nsbeta1.
Keywords: nsbeta1
I decided to test a bit more and found a possibly related problem in build 
200201103.

Try this:
1. Click on 'Compose' on the toolbar.
2. Type in 'From me' in the message body.
3. Close the window.
4. It will ask you if you want to save it. Click yes.
5. Select the message in the drafts folder.

You will see that the message has '>From me' as it's message body.

This looks to me as if the display code doesn't understand From_ quoting. So 
there seems to be a problem with the import code not performing From_ quoting 
and then also a problem with the display code misunderstanding the quoting.
Summary: Problems with importing mail from Outlook XP → Problems with importing mail from Outlook
Yes, it's caused by the import code not performing the "From_" line quoting (in 
the body text). Reporter, if you like, you can fix the problem now by manually 
adding '>' in front of all the "from ..." lines in the imported INBOX file. 
Then, remove the associated INBOX.msf file, restart mailnews program and reopen 
the INBOX.  Just a workaround for now until this is fixed.

For the "display code doesn't understand From_ quoting" problem, we should 
address it in a separate bug.
Changing summary to "Problems with importing mail from Outlook & Subjects
starting with "From".
Summary: Problems with importing mail from Outlook → Problems with importing mail from Outlook & Subjects starting with "From"
Sorry for the noise. Changing summary to more closely reflect reality.
Changing from 'Problems with importing mail from Outlook & Subjects starting
with "From"' to 'Problems importing Outlook Mail with "From" in the beginning of
a line'.

Note: the problem also occurs if the attachment has text that has "from" in the
begging of a line.
Summary: Problems with importing mail from Outlook & Subjects starting with "From" → Problems importing Outlook Mail with "From" in the beginning of a line
Moving to 0.9.9.
Target Milestone: --- → mozilla0.9.9
Attached patch (Av1) proposed patch (obsolete) — Splinter Review
Added code to prefix lines starting with "From " with ">" in the msg body.
Comment on attachment 65324 [details] [diff] [review]
(Av1) proposed patch

The patch looks good but I have a little concern about performance especially
on Mac with the fact your are doing a bunch of small write to disk. Maybe you
should try to write to a buffer and flush it to disk when it's full. A size of
4K should be fine. Import is not a frequent operation but it could be very time
consuming when you are importing a hudge inbox. R=ducarroz
Attachment #65324 - Flags: review+
+  if (dataLen == 1)
+  {
+    pDst->Write(start, 1, &written);
+    return NS_OK;
+  }

if (dataLen < 5)
{
  pDst->Write(start, dataLen, &written);
  return NS_OK;
}
:-) 
Also what is dataLen?
And you use NS_ENSURE_TRUE(rv) on other writes...

+      // Find a line so check for "From ".
+      *pChar = 0;
+      if ( nsCRT::strlen(start) > 4 && (*(start) == 'F') && (*(start + 1) ==
'r') &&
+           (*(start + 2) == 'o') && (*(start + 3) == 'm') && (*(start + 4) == '
') )
+          rv = pDst->Write(">", 1, &written);
+      *pChar = 0x0D;

simply if ((pChar - start) > 4 && .....)
         rv = pDst->Write(">", 1, &written);
?

+      rv = pDst->Write(start, pChar-start+2, &written);
                                           ^^^
Any chance that message has unix-style linebreaks? (Just speculating, I'm not
windows guy)
What about smth like

pChar = start;
while (start < end)
{
  while (pChar <= end && pChar != 'F')
    pChar++;

  if (pChar < end) {
    if ((pChar == body.m_pBuffer + body.m_writeOffset || (pChar[-1] == '\n' &&
pChar[-2] == '\r')) &&
        pChar[1] == 'r' && pChar[2] == 'o' && pChar[3] == 'm' && pChar[4] ==' ')
    {
      if (start < pChar)
      {
        pDst->Write(start, pChar - start, &written);
        start = pChar;
        pChar += 5;
      }
      rv = pDst->Write(">", 1, &written);
    }
  }
  else if (start < end)
  {
    // Flush out the remaining data and we're done.
    rv = pDst->Write(start, end-start, &written);
    NS_ENSURE_SUCCESS(rv,rv);
    break;
  }
}
+  if (dataLen == 1)
+  {
+    pDst->Write(start, 1, &written);
+    return NS_OK;
+  }

I guess I attached the wrong patch file.  Good catch. I'll attach the final one
and the main logic stays the same.
Attached patch (Av2) The right patch (obsolete) — Splinter Review
Attachment #65324 - Attachment is obsolete: true
Comment on attachment 66138 [details] [diff] [review]
(Av2) The right patch

R=ducarroz
Attachment #66138 - Flags: review+
Comment on attachment 66138 [details] [diff] [review]
(Av2) The right patch

some general comments:

1)  did you log a bug on us not handling ">From_" quoting?
2)  I noticed that when I composed a plain text draft message, wrote "From ",
and save it,
    it got saved as " From ".  So we appear to be doing some sort of "From "
escaping, but it doesn't
    seem like we are doing the right type.
3)  Proper "From " quoting also means that you quote ">From " as ">>From ",
">>From ", as ">>>From ", etc.
4)  What about other methods of import?  (does this bug affect Eudora import?)  

>+  char *start = body.m_pBuffer + body.m_writeOffset;
>+  char *end   = body.m_pBuffer+body.m_bytesInBuf;

end should be a const char *, since it doesn't change, right?

>+    while ((pChar <= end) && (*pChar != 0x0D) && (*(pChar+1) != 0x0A))

if pChar == end, pChar+1 might be off the end of the buffer, right?

>+      if ( nsCRT::strlen(start) > 4 && (*(start) == 'F') && (*(start + 1) == 'r') &&
>+           (*(start + 2) == 'o') && (*(start + 3) == 'm') && (*(start + 4) == ' ') )

doing a strlen here is bad for performance.  
I think you are going to need a while loop, to eat leading '>', looking for the
first 'F'.
once you find it, then you can do strncmp(ptr, "From ", 5);

you should find out where and why the compose (or local folder?) code is
quoting "From " as " From ".
We should be fixing that code as well.	Investigate if it is worth having this
code live under local,
and we call it from the import code.

The next step, after this is fixed, is to properly handle ">From" quoting on
display.  This should be
easy, as we currently handle "." escaping in various places.
Attachment #66138 - Flags: needs-work+
I think you can break it up into two bugs:

1) escaping (make sure the eudora and outlook import code works and fix the 
code that is escaping "From " -> " From " when writing to disk.  Make sure to 
see do the right thing if you have a message on the imap server, and you copy 
it to a local folder, as well as when you save drafts, templates, send later, 
etc.)  I think it will make sense to write the escaping code once (in 
mozilla/mailnews/local or in mozilla/mailnews/base/util I'm not sure yet), and 
then call it from all various places that need it.  There will be at least two.

2) unescaping (make sure that the local folder parsing code 
escapes ">From ", ">>From ", ">>>From ", etc.).  This can be a seperate bug.
I think you can break it up even further, into three bugs:

1) escaping, part 1. (make sure the eudora and outlook import code works)

It will make sense to write the escaping code once (in 
mozilla/mailnews/local or in mozilla/mailnews/base/util I'm not sure yet), and 
then call it from all various places that need it.  There will be at least two.

2) escaping, part 2.  fix the mozilla mbox code to escape the same way and fix 
the import code does.  Make sure to see do the right thing if you have a 
message on the imap server, and you copy 
it to a local folder, as well as when you save drafts, templates, send later, 
etc.)  

3) unescaping (make sure that the local folder parsing code 
escapes ">From ", ">>From ", ">>>From ", etc.).  This can be a seperate bug.
Attachment #66138 - Attachment is obsolete: true
Outlook Express has its own way of escaping "From " so I left it alone. The code 
is quite different from those of Outlook and Eudora import (looks like written 
by different people). Let me know if that needs to be changed as well (it'll 
take some more time to incorporate the util functions in the code).

ducarroz, I'm not sure if the makefile for Mac will need to be changed.
Adding "Eudora" to Summary because it has the same problem.
Summary: Problems importing Outlook Mail with "From" in the beginning of a line → Problems importing Outlook & Eudora Mail with "From" in the beginning of a line
Per comment #20, "escaping, part2" and "unescaping" are logged as bug 121946 and
bug 121947 respectively.
Status: NEW → ASSIGNED
Keywords: nsbeta1nsbeta1+
Priority: -- → P2
Comment on attachment 66559 [details] [diff] [review]
(Av3) Put the util functions in nsLocalUtils.cpp, fix Outlook & Eduora problem
[Checked in: Comment 26]

looks good.  some minor comments:

>+  if ( (*start == 'F') && (end-start > 4) && !nsCRT::strncmp(start, "From ", 5) )

do strncmp() instead of nsCRT::strncmp(), it's faster.

>+    rv = PR_TRUE;
>+  return rv;
>+}
>+
>+//
>+// This function finds all lines starting with "From " or "From " preceeding
>+// with one or more '>' (ie, ">From", ">>From", etc) in the input buffer
>+// (between 'start' and 'end') and prefix them with a ">" .
>+//
>+nsresult EscapeFromSpaceLine(nsIFileSpec *pDst, char *start, const char *end)
>+{
>+  nsresult rv;
>+  char *pChar;
>+  PRInt32 written;
>+
>+  pChar = start;
>+  while (start < end)
>+  {

this code is now not in a windows only file.  (that import code was windows
only, right?)
so is this code "correct" for all platforms?

>+    while ((pChar < end) && (*pChar != 0x0D) && (*(pChar+1) != 0x0A))
>+      pChar++;

instead of 0x0A and 0x0D, consider using nsCRT::CR and nsCRT::LF, to make the
code more readable.
(if you want to fix the rest of the import code, that would be great, but that
can be spun off to a new bug.)

these functions might be better in msgbaseutil, as local and the import
libraries (at least eudora?) already links that in,
and adding local (which is large static library) will eat up more RAM at run
time.

address those nits, make sure we are doing the line endings correct for all
platforms, and investigate moving this code to baseutil, and then sr=sspitzer
Fix checked in.

Tested the patch on Mac Eudora import without problems.  On all platforms we do 
end lines with CRLF when, for example, saving draft msgs to local folders. So 
the new function should be fine.

The new code was moved to msgbaseutil and two minor things were fixed.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Reopening

Trunk build 2002-03-01: WinMe
After importing Eudora Mail I see two messages where the Subject is blank, the
Sender state "the", and the date is 1969. I'm sending Cavin my Eudora Mail
folder to determine the cause.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Ok, I see where the problem is. Working on a patch.
Changes are:
1. Recplace pDst->Write() call with EscapeFromSpaceLine() when body text is >
8K. This is where the problem was.

2. Replace "NS_ENSURE_SUCCESS(rv,rv)" with "if (NS_SUCCEEDED(rv))" so that the
code will follow through and the source file stream will be closed in case of
errors.

Both Eudora and Outlook have the same problem.
Comment on attachment 72630 [details] [diff] [review]
(Bv1) Fixed not escaping from line problem when body text is > 8K
[Checked in: Comment 33]

R=ducarroz
Attachment #72630 - Flags: review+
Comment on attachment 72630 [details] [diff] [review]
(Bv1) Fixed not escaping from line problem when body text is > 8K
[Checked in: Comment 33]

sr=sspitzer
Attachment #72630 - Flags: superreview+
Comment on attachment 72630 [details] [diff] [review]
(Bv1) Fixed not escaping from line problem when body text is > 8K
[Checked in: Comment 33]

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #72630 - Flags: approval+
Fix checked into trunk.
Status: REOPENED → RESOLVED
Closed: 18 years ago18 years ago
Resolution: --- → FIXED
Trunk build 2002-03-22: WinMe, Mac 9.1
Verified Fixed, checked Eudora, Outlook Express and Outlook.
Status: RESOLVED → VERIFIED
Attachment #65324 - Attachment description: proposed patch, v1 → (Av1) proposed patch
Attachment #66138 - Attachment description: The right patch. → (Av2) The right patch
Attachment #72630 - Attachment description: Fixed not escaping from line problem when body text is > 8K. → (Bv1) Fixed not escaping from line problem when body text is > 8K [Checked in: Comment 33]
Attachment #66559 - Attachment description: Put the util functions in nsLocalUtils.cpp, fix Outlook & Eduora problem. → (Av3) Put the util functions in nsLocalUtils.cpp, fix Outlook & Eduora problem [Checked in: Comment 26]
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.