Last Comment Bug 484605 - TB doesn't import messages with many recipients or large msg header from Outlook Express
: TB doesn't import messages with many recipients or large msg header from Outl...
Status: RESOLVED FIXED
: dataloss, fixed-seamonkey2.0.3
Product: MailNews Core
Classification: Components
Component: Import (show other bugs)
: Trunk
: x86 Windows Vista
: -- critical (vote)
: Thunderbird 3.1a1
Assigned To: Phil Lacy
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2009-03-21 15:50 PDT by Jan Lachnitt
Modified: 2010-02-17 06:36 PST (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
.2-fixed


Attachments
record size increased for OE import-also printf %S not valid (3.94 KB, patch)
2009-12-26 16:45 PST, Phil Lacy
no flags Details | Diff | Review
log of patched import (8.11 KB, text/plain)
2009-12-26 16:46 PST, Phil Lacy
no flags Details
MSDN quote from their site on one of database structures we fetch (3.49 KB, text/plain)
2009-12-26 23:23 PST, Phil Lacy
no flags Details
reads record data from file due to undefined record node size (9.14 KB, patch)
2009-12-27 12:47 PST, Phil Lacy
no flags Details | Diff | Review
view of import with patch (88.36 KB, image/png)
2009-12-27 12:48 PST, Phil Lacy
no flags Details
ver with reading indexes into buffer and index data from file (8.94 KB, patch)
2009-12-27 18:36 PST, Phil Lacy
neil: review+
Details | Diff | Review
for checkin r+= neil@httl.net (8.95 KB, patch)
2009-12-28 22:14 PST, Phil Lacy
philbaseless-firefox: review+
mozilla: superreview+
Details | Diff | Review
checkin [Checkin: Comment 25 & 27] (8.94 KB, patch)
2010-01-06 16:44 PST, Phil Lacy
philbaseless-firefox: review+
philbaseless-firefox: superreview+
standard8: approval‑thunderbird3.0.1-
standard8: approval‑thunderbird3.0.2+
Details | Diff | Review

Description Jan Lachnitt 2009-03-21 15:50:37 PDT
User-Agent:       Opera/9.64 (Windows NT 6.0; U; cs) Presto/2.1.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.0; cs; rv:1.9.1b3pre) Gecko/20090223 Thunderbird/3.0b2

In the discussion of bug 415457, I was asked to test the mail import from Outlook Express in new TB 3 beta. I did so and the result is that not all messages have been successfully imported. I have analyzed them and found out that it obviously depends only on the number of recipients (or maybe the length of header). E.g. a message with 296 recipients has been imported but a message with 533 recipients has not.

Reproducible: Always

Steps to Reproduce:
1. Import mail from Outlook Express.
Actual Results:  
Messages with too many recipients are missing.

Expected Results:  
All messages should be imported.
Comment 1 Wayne Mery (:wsmwk, NI for questions) 2009-03-21 17:27:34 PDT
Thanks Jan
Comment 2 Wayne Mery (:wsmwk, NI for questions) 2009-12-25 19:25:52 PST
confirming...assuming Jan is correct
Comment 3 Phil Lacy 2009-12-26 16:14:45 PST
we have a max OE record of 16000.  Probably because we didn't know what size to make it. I'll post a patch and Neil can see if it's ok.

http://mxr.mozilla.org/comm-central/source/mailnews/import/oexpress/nsOE5File.cpp#268
Comment 4 Phil Lacy 2009-12-26 16:45:00 PST
Created attachment 419220 [details] [diff] [review]
record size increased for OE import-also printf %S not valid

%S for wide char in our logging doesn't work. I changed where needed here.
see log in other attachment to see if you can catch any problems with this change.
Import now works with large headers.
Comment 5 Phil Lacy 2009-12-26 16:46:19 PST
Created attachment 419221 [details]
log of patched import
Comment 6 Phil Lacy 2009-12-26 23:23:48 PST
Created attachment 419227 [details]
MSDN quote from their site on one of database structures we fetch

This is the problem table we fetch that has a lot of strings one of them is recipients but it could be any number of things to make it a rather larger record that we don't need.

We only need index 1,2,and 4 so I'll change the patch accordingly.

Reason for the extraneous info (spam) is in case there are other bugs filed wanting other things. I note we are using the create/sent date and maybe we ought to use the download date available in this header.
Comment 7 neil@parkwaycc.co.uk 2009-12-27 02:40:25 PST
Comment on attachment 419220 [details] [diff] [review]
record size increased for OE import-also printf %S not valid

>-#define  kMailboxBufferSize  0x4000
Looks like this was the value in the original Outlook Express import code.
I've got no particular objection to changing this constant from one apparently arbitrary value to another arbitrary value.

>+// records know to go to 0x7400 bug 484605
How about
// Bug 484605 has an example of a record of size 0x7400

>+#define  kMailboxBufferSize  0xffff
Nit: might as well use single spaces.

Also the logging changes look reasonable to me.
Comment 8 Phil Lacy 2009-12-27 05:46:02 PST
Comment on attachment 419220 [details] [diff] [review]
record size increased for OE import-also printf %S not valid

hang on please.
More appropriate patch coming. This one still has a max value that could be exceeded and there will be more unnecessary data loss.
Comment 9 Phil Lacy 2009-12-27 12:47:00 PST
Created attachment 419254 [details] [diff] [review]
reads record data from file due to undefined record node size

I have comments on the problem. This resolves unknown buffer size required instead of guessing.
tested.
Comment 10 Phil Lacy 2009-12-27 12:48:15 PST
Created attachment 419255 [details]
view of import with patch
Comment 11 neil@parkwaycc.co.uk 2009-12-27 13:50:21 PST
Comment on attachment 419254 [details] [diff] [review]
reads record data from file due to undefined record node size

>-#define  kMailboxBufferSize  0x4000
>+#define kMailboxBufferSize 0x4000
No point just doing the whitespace change.

>+        if (ReadBytes( pFile, pBuffer, kDontSeek, PR_MIN(recordSize, kMailboxBufferSize))) {
So, do you actually use all of the record? It looks like you're only reading the attributes, and then you read the data separately. Or would it help to read an amount of data, and try to use that to avoid a separate read?
Comment 12 Phil Lacy 2009-12-27 14:32:45 PST
the structure is
header structure
  number of indexes
index structure
  index
  data of this index or datamass[x]
<repeat for up to x23 indexes>
datamass
  length varies.

The indexes we need are originally in the first part of the datamass but after changes I'm only 99% sure so you're 99% right.....

 I don't know how changes to drafts and flags are made by oe, but I can't imagine they wouldn't write out a new record or over the old one and preserve the expected order of things. If some of the strings change then they definitely would write out a new record but that data is past our data in the datamass.

As far as read performance maybe we ought cut down the max size from 0x4000 to 0x300 or we can take a pretty good chance of getting all our data in the buffer the first time
Comment 13 neil@parkwaycc.co.uk 2009-12-27 14:37:32 PST
(In reply to comment #12)
> the structure is
> header structure
>   number of indexes
> index structure
>   index
>   data of this index or datamass[x]
> <repeat for up to x23 indexes>
> datamass
>   length varies.
And what part(s) of the structure are represented by the recordSize?
Comment 14 Phil Lacy 2009-12-27 14:56:28 PST
all, including the datamass. That's why it can grow. There's an index for recipients. A string. And there are other strings for header data.
Comment 15 neil@parkwaycc.co.uk 2009-12-27 15:57:02 PST
So the old code hoped that the entire datamass fitted in 0x4000, while the new code only really cares about the indexes? In that case I'd either only bother reading the indexes, unless you want to read the first 0x4000 of the datamass to try to avoid separate reads for the offset, flags and time.
Comment 16 Phil Lacy 2009-12-27 18:36:59 PST
Created attachment 419269 [details] [diff] [review]
ver with reading indexes into buffer and index data from file

I like that idea,
I just imported the buggy email and I also did an import of my main oe mail which has 16117. All email showed up. Spot checking, large attachments are complete, star, read, reply and forward flags check out.
Import took about 45 seconds.
Comment 17 Phil Lacy 2009-12-27 18:38:10 PST
BTW, I don't think I know for sure that indexes are sorted so I'm reading all 30h. Other sources on web have the max at 23h and MSDN shows 28h.
Comment 18 neil@parkwaycc.co.uk 2009-12-28 07:06:49 PST
Comment on attachment 419269 [details] [diff] [review]
ver with reading indexes into buffer and index data from file

> #define  kMailboxBufferSize  0x4000
>+#define  kMaxAttrQnt         0x0030
I think we can afford to spell it "Count", not "Qnt"

>+      if (marker == pIndex[i]  && numAttrs <= kMaxAttrQnt) {
Nit: extra space crept in before "&&"
Comment 19 Phil Lacy 2009-12-28 07:10:06 PST
(In reply to comment #18)
> >+#define  kMaxAttrQnt         0x0030
> I think we can afford to spell it "Count", not "Qnt"

Ha! That's what I was trying to think of, duh
Comment 20 Phil Lacy 2009-12-28 22:14:36 PST
Created attachment 419386 [details] [diff] [review]
for checkin r+= neil@httl.net
Comment 21 Mark Banner (:standard8) 2010-01-06 07:49:30 PST
Comment on attachment 419386 [details] [diff] [review]
for checkin r+= neil@httl.net

As its in mailnews/ this patch currently requires sr. I'm hoping David can give it a quick look.
Comment 22 David :Bienvenu 2010-01-06 08:15:15 PST
Comment on attachment 419386 [details] [diff] [review]
for checkin r+= neil@httl.net

looks reasonable, thx very much for the fix, Phil.
Comment 23 Phil Lacy 2010-01-06 08:33:27 PST
Comment on attachment 419386 [details] [diff] [review]
for checkin r+= neil@httl.net

>   PRUint32  flags;
>   PRUint64  time;
>+  PRUint32  dataStart;
>+
> 
>   for (PRUint32 i = 0; i < size; i++) {

don't know how Neil miss this nit!
I'll remove extra line and mark for checkin tonight
Comment 24 Phil Lacy 2010-01-06 16:44:17 PST
Created attachment 420450 [details] [diff] [review]
checkin
[Checkin: Comment 25 & 27]

The extra line got by Neil because I slipped it in after the review.
Comment 25 Serge Gautherie (:sgautherie) 2010-01-10 03:40:12 PST
Comment on attachment 420450 [details] [diff] [review]
checkin
[Checkin: Comment 25 & 27]


http://hg.mozilla.org/comm-central/rev/a3ebce1a6cb5
Comment 26 Mark Banner (:standard8) 2010-01-10 06:33:20 PST
Comment on attachment 420450 [details] [diff] [review]
checkin
[Checkin: Comment 25 & 27]

Not going to take this for 3.0.1, will let it bake on trunk a bit longer and will consider for 3.0.2.
Comment 27 Mark Banner (:standard8) 2010-02-01 02:34:22 PST
Checked into 1.9.1: http://hg.mozilla.org/releases/comm-1.9.1/rev/5f8c5816aed2
Comment 28 Ludovic Hirlimann [:Usul] 2010-02-17 06:36:10 PST
I lack a testcase to verify.

Note You need to log in before you can comment on or make changes to this bug.