TB doesn't import messages with many recipients or large msg header from Outlook Express

RESOLVED FIXED in Thunderbird 3.1a1

Status

MailNews Core
Import
--
critical
RESOLVED FIXED
8 years ago
7 years ago

People

(Reporter: Jan Lachnitt, Assigned: Phil Lacy)

Tracking

({dataloss, fixed-seamonkey2.0.3})

Trunk
Thunderbird 3.1a1
x86
Windows Vista
dataloss, fixed-seamonkey2.0.3

Thunderbird Tracking Flags

(thunderbird3.0 .2-fixed)

Details

Attachments

(1 attachment, 7 obsolete attachments)

(Reporter)

Description

8 years ago
User-Agent:       Opera/9.64 (Windows NT 6.0; U; cs) Presto/2.1.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.0; cs; rv:1.9.1b3pre) Gecko/20090223 Thunderbird/3.0b2

In the discussion of bug 415457, I was asked to test the mail import from Outlook Express in new TB 3 beta. I did so and the result is that not all messages have been successfully imported. I have analyzed them and found out that it obviously depends only on the number of recipients (or maybe the length of header). E.g. a message with 296 recipients has been imported but a message with 533 recipients has not.

Reproducible: Always

Steps to Reproduce:
1. Import mail from Outlook Express.
Actual Results:  
Messages with too many recipients are missing.

Expected Results:  
All messages should be imported.
Thanks Jan
Severity: normal → critical
Component: Migration → Import
Keywords: dataloss
Product: Thunderbird → MailNews Core
QA Contact: migration → import
Version: unspecified → Trunk
confirming...assuming Jan is correct
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Assignee)

Comment 3

8 years ago
we have a max OE record of 16000.  Probably because we didn't know what size to make it. I'll post a patch and Neil can see if it's ok.

http://mxr.mozilla.org/comm-central/source/mailnews/import/oexpress/nsOE5File.cpp#268
Assignee: nobody → philbaseless-firefox
Status: NEW → ASSIGNED
(Assignee)

Comment 4

8 years ago
Created attachment 419220 [details] [diff] [review]
record size increased for OE import-also printf %S not valid

%S for wide char in our logging doesn't work. I changed where needed here.
see log in other attachment to see if you can catch any problems with this change.
Import now works with large headers.
Attachment #419220 - Flags: review?(neil)
Attachment #419220 - Flags: approval-thunderbird3.0.1?
(Assignee)

Comment 5

8 years ago
Created attachment 419221 [details]
log of patched import
Attachment #419221 - Flags: review?(neil)
(Assignee)

Comment 6

8 years ago
Created attachment 419227 [details]
MSDN quote from their site on one of database structures we fetch

This is the problem table we fetch that has a lot of strings one of them is recipients but it could be any number of things to make it a rather larger record that we don't need.

We only need index 1,2,and 4 so I'll change the patch accordingly.

Reason for the extraneous info (spam) is in case there are other bugs filed wanting other things. I note we are using the create/sent date and maybe we ought to use the download date available in this header.

Comment 7

8 years ago
Comment on attachment 419220 [details] [diff] [review]
record size increased for OE import-also printf %S not valid

>-#define  kMailboxBufferSize  0x4000
Looks like this was the value in the original Outlook Express import code.
I've got no particular objection to changing this constant from one apparently arbitrary value to another arbitrary value.

>+// records know to go to 0x7400 bug 484605
How about
// Bug 484605 has an example of a record of size 0x7400

>+#define  kMailboxBufferSize  0xffff
Nit: might as well use single spaces.

Also the logging changes look reasonable to me.
Attachment #419220 - Flags: review?(neil) → review+
(Assignee)

Comment 8

8 years ago
Comment on attachment 419220 [details] [diff] [review]
record size increased for OE import-also printf %S not valid

hang on please.
More appropriate patch coming. This one still has a max value that could be exceeded and there will be more unnecessary data loss.
Attachment #419220 - Flags: review+
(Assignee)

Comment 9

8 years ago
Created attachment 419254 [details] [diff] [review]
reads record data from file due to undefined record node size

I have comments on the problem. This resolves unknown buffer size required instead of guessing.
tested.
Attachment #419220 - Attachment is obsolete: true
Attachment #419254 - Flags: review?(neil)
Attachment #419220 - Flags: approval-thunderbird3.0.1?
(Assignee)

Comment 10

8 years ago
Created attachment 419255 [details]
view of import with patch
Attachment #419255 - Flags: review?(neil)
Comment on attachment 419254 [details] [diff] [review]
reads record data from file due to undefined record node size

>-#define  kMailboxBufferSize  0x4000
>+#define kMailboxBufferSize 0x4000
No point just doing the whitespace change.

>+        if (ReadBytes( pFile, pBuffer, kDontSeek, PR_MIN(recordSize, kMailboxBufferSize))) {
So, do you actually use all of the record? It looks like you're only reading the attributes, and then you read the data separately. Or would it help to read an amount of data, and try to use that to avoid a separate read?
(Assignee)

Comment 12

8 years ago
the structure is
header structure
  number of indexes
index structure
  index
  data of this index or datamass[x]
<repeat for up to x23 indexes>
datamass
  length varies.

The indexes we need are originally in the first part of the datamass but after changes I'm only 99% sure so you're 99% right.....

 I don't know how changes to drafts and flags are made by oe, but I can't imagine they wouldn't write out a new record or over the old one and preserve the expected order of things. If some of the strings change then they definitely would write out a new record but that data is past our data in the datamass.

As far as read performance maybe we ought cut down the max size from 0x4000 to 0x300 or we can take a pretty good chance of getting all our data in the buffer the first time
(In reply to comment #12)
> the structure is
> header structure
>   number of indexes
> index structure
>   index
>   data of this index or datamass[x]
> <repeat for up to x23 indexes>
> datamass
>   length varies.
And what part(s) of the structure are represented by the recordSize?
(Assignee)

Comment 14

8 years ago
all, including the datamass. That's why it can grow. There's an index for recipients. A string. And there are other strings for header data.
So the old code hoped that the entire datamass fitted in 0x4000, while the new code only really cares about the indexes? In that case I'd either only bother reading the indexes, unless you want to read the first 0x4000 of the datamass to try to avoid separate reads for the offset, flags and time.
(Assignee)

Comment 16

8 years ago
Created attachment 419269 [details] [diff] [review]
ver with reading indexes into buffer and index data from file

I like that idea,
I just imported the buggy email and I also did an import of my main oe mail which has 16117. All email showed up. Spot checking, large attachments are complete, star, read, reply and forward flags check out.
Import took about 45 seconds.
Attachment #419254 - Attachment is obsolete: true
Attachment #419269 - Flags: review?(neil)
Attachment #419254 - Flags: review?(neil)
(Assignee)

Comment 17

8 years ago
BTW, I don't think I know for sure that indexes are sorted so I'm reading all 30h. Other sources on web have the max at 23h and MSDN shows 28h.
Comment on attachment 419269 [details] [diff] [review]
ver with reading indexes into buffer and index data from file

> #define  kMailboxBufferSize  0x4000
>+#define  kMaxAttrQnt         0x0030
I think we can afford to spell it "Count", not "Qnt"

>+      if (marker == pIndex[i]  && numAttrs <= kMaxAttrQnt) {
Nit: extra space crept in before "&&"
Attachment #419269 - Flags: review?(neil) → review+
(Assignee)

Comment 19

8 years ago
(In reply to comment #18)
> >+#define  kMaxAttrQnt         0x0030
> I think we can afford to spell it "Count", not "Qnt"

Ha! That's what I was trying to think of, duh
(Assignee)

Comment 20

8 years ago
Created attachment 419386 [details] [diff] [review]
for checkin r+= neil@httl.net
Attachment #419221 - Attachment is obsolete: true
Attachment #419227 - Attachment is obsolete: true
Attachment #419255 - Attachment is obsolete: true
Attachment #419269 - Attachment is obsolete: true
Attachment #419386 - Flags: review+
Attachment #419221 - Flags: review?(neil)
Attachment #419255 - Flags: review?(neil)
(Assignee)

Updated

8 years ago
Summary: TB skips messages with too many recipients when importing mail from Outlook Express → TB doesn't import messages with many recipients or large msg header from Outlook Express
(Assignee)

Updated

8 years ago
Attachment #419386 - Flags: approval-thunderbird3.0.1?
(Assignee)

Updated

8 years ago
Keywords: checkin-needed
Attachment #419386 - Flags: superreview?(bienvenu)
Comment on attachment 419386 [details] [diff] [review]
for checkin r+= neil@httl.net

As its in mailnews/ this patch currently requires sr. I'm hoping David can give it a quick look.
Keywords: checkin-needed

Comment 22

8 years ago
Comment on attachment 419386 [details] [diff] [review]
for checkin r+= neil@httl.net

looks reasonable, thx very much for the fix, Phil.
Attachment #419386 - Flags: superreview?(bienvenu) → superreview+
(Assignee)

Comment 23

8 years ago
Comment on attachment 419386 [details] [diff] [review]
for checkin r+= neil@httl.net

>   PRUint32  flags;
>   PRUint64  time;
>+  PRUint32  dataStart;
>+
> 
>   for (PRUint32 i = 0; i < size; i++) {

don't know how Neil miss this nit!
I'll remove extra line and mark for checkin tonight
(Assignee)

Comment 24

8 years ago
Created attachment 420450 [details] [diff] [review]
checkin
[Checkin: Comment 25 & 27]

The extra line got by Neil because I slipped it in after the review.
Attachment #419386 - Attachment is obsolete: true
Attachment #420450 - Flags: superreview+
Attachment #420450 - Flags: review+
Attachment #419386 - Flags: approval-thunderbird3.0.1?
(Assignee)

Updated

8 years ago
Keywords: checkin-needed
Comment on attachment 420450 [details] [diff] [review]
checkin
[Checkin: Comment 25 & 27]


http://hg.mozilla.org/comm-central/rev/a3ebce1a6cb5
Attachment #420450 - Attachment description: checkin r+ neil sr+ bienvenu → checkin [Checkin: Comment 25]
Attachment #420450 - Flags: approval-thunderbird3.0.1?
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 3.1a1
Comment on attachment 420450 [details] [diff] [review]
checkin
[Checkin: Comment 25 & 27]

Not going to take this for 3.0.1, will let it bake on trunk a bit longer and will consider for 3.0.2.
Attachment #420450 - Flags: approval-thunderbird3.0.2?
Attachment #420450 - Flags: approval-thunderbird3.0.1?
Attachment #420450 - Flags: approval-thunderbird3.0.1-
Attachment #420450 - Flags: approval-thunderbird3.0.2? → approval-thunderbird3.0.2+
Checked into 1.9.1: http://hg.mozilla.org/releases/comm-1.9.1/rev/5f8c5816aed2
status-thunderbird3.0: --- → .2-fixed
Attachment #420450 - Attachment description: checkin [Checkin: Comment 25] → checkin [Checkin: Comment 25 & 27]

Updated

8 years ago
Keywords: fixed-seamonkey2.0.3
I lack a testcase to verify.
You need to log in before you can comment on or make changes to this bug.