Closed Bug 1491228 Opened 6 years ago Closed 5 years ago

mbox to maildir conversion: allocation size overflow

Categories

(MailNews Core :: Backend, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 66.0

People

(Reporter: d4629635, Assigned: benc)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file, 14 obsolete files)

79.71 KB, patch
mkmelin
: review+
Details | Diff | Splinter Review
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0
Build ID: 20180807170231

Steps to reproduce:

User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.0
Build-ID: 20180731173940
OS: Linux 4.18.7-1-default

Have imap mbox folder structure of 595M.
Set preference mail.store_conversion_enabled.
Change message store type to maildir in Account Manager/Settings.



Actual results:

Conversion dialog shows progress stopping at 95%, conversion failed.
Error console: Error: InternalError: allocation size overflow  converterWorker.js:270:11

        


Expected results:

Completed the conversion.
Sadly the whole file is inside a try {} catch {} block so it is not clear which line shows this error. But it seems to be an internal error emitted by the JS engine. We probably try to handle a too large object.
Reading the code, it seems we read the messages in 10MB sized chunks, which seems reasonable.

What is the size of the largest message in that mbox account?
Indeed there seems to happen something at 10 MB. The largest mail I had was 38,6 MB and there were several with more than 10 MB. I deleted one attachment after another and each time the conversion failed. When there was no mail left with a size of more than 10 MB, the conversion worked. The largest mail I have now has 9,7 MB.
Thank you for the test.
So we can now focus on whether the 10MB chunking actually works.
https://dxr.mozilla.org/comm-central/source/comm/mailnews/base/util/converterWorker.js
Arai, would you know which object we may be overflowing that the JS engine can't handle it?
Code in question is in the previous comment.
Flags: needinfo?(arai.unmht)
most likely thing is string.
SpiderMonkey's string's maximum length is 256M char units.
https://searchfox.org/mozilla-central/rev/21588b2a9824e0758fe11d10065e2c01ea9f32be/js/src/jsfriendapi.h#808

the code increases the chunk if the single message is larger than 10MB.
then, depending on the encoding, 40MB message can be from 50MB to ~100MB in raw message source.
haven't checked the details about the code, but hitting 256M char units could happen.
Flags: needinfo?(arai.unmht)
Thank you for the hint. We must look into where we build such a long string.
Flags: needinfo?(mkmelin+mozilla)
Let me add another observation: I did the conversion of the mailbox from above also on an old PC which only has 3 GB of memory. It worked, but it took considerably longer than on my laptop, because the machine started swapping. In the systems monitor I could see that during the conversion thunderbird starts to consume huge amounts of memory, this peaks, goes down again and then repeats periodically until done.
Severity: normal → major
Component: Untriaged → Backend
Product: Thunderbird → MailNews Core
Hello,

I came across the same issue and did a bit further digging (added console.log output).
The problem occurs in line that states

textNew = decoder.decode(array);

I also added a logging about the current position and "noOfBytes" variable.
As Tooru Fujisawa [:arai] said it hits a limit of 256M as the error occurs exactly when noOfBytes reaches 269000000 bytes (256,5M) and then the decode function is called.
May it be that the solution is as simple as having a wrong regex in there?
I use TB 60.3.0 and found line 225 of converterWorker.js interesting. It states:
          let regexp = /^(From - )/gm;
I have no such lines in all my mailboxes. The marking line for new messages in my mbox files is simply
From 
("From" + Space + Newline)

So I changed the regexp line to
          let regexp = /^(From(?: - | $))/gm;
(because I was not sure if the  - option is maybe needed, so I added my variant as an OR.

After this the conversion works flawlessly on my side.

Best wishes from Germany.
Attached patch 1491228-maildir-regexp.patch (obsolete) — Splinter Review
I've turned Marius Burkard's suggestion into a patch. I don't know how he created mbox files with just "From ", but that seems legal.
Flags: needinfo?(mkmelin+mozilla)
Attachment #9023446 - Flags: review?(mkmelin+mozilla)
Attachment #9023446 - Flags: review?(benc)
(In reply to Jorg K (GMT+1) from comment #11)
> Created attachment 9023446 [details] [diff] [review]
> 1491228-maildir-regexp.patch
> 
> I've turned Marius Burkard's suggestion into a patch. I don't know how he
> created mbox files with just "From ", but that seems legal.

I don't know how the "From " lines occur, but I am using thunderbird since a few years now (version 1.x), so maybe there is some very old setting in there that has survived all updates until now (never really did a fresh profile, though).

There's nothing I changed manually and I have re-checked, there is no "From - " lines in my mbox files, nowhere.
(In reply to Jorg K (GMT+1) from comment #11)
> Created attachment 9023446 [details] [diff] [review]
> 1491228-maildir-regexp.patch
Oh, there are two lines, not just one that need to be changed (same regex).
(In reply to Jorg K (GMT+1) from comment #11)

> I've turned Marius Burkard's suggestion into a patch. I don't know how he
> created mbox files with just "From ", but that seems legal.

Wow, I hadn't realised what a can of worms the mbox "standard" is!

My reading is that the separator should just "^From " (no hyphen).

From https://tools.ietf.org/html/rfc4155:

-----------
Each message in the mbox database MUST be immediately preceded
        by a single separator line, which MUST conform to the following
        syntax:

           The exact character sequence of "From";

           a single Space character (0x20);

           the email address of the message sender (as obtained from the
           message envelope or other authoritative source), conformant
           with the "addr-spec" syntax from RFC 2822;

           a single Space character;

           a timestamp indicating the UTC date and time when the message
           was originally received, conformant with the syntax of the
           traditional UNIX 'ctime' output sans timezone (note that the
           use of UTC precludes the need for a timezone indicator);

           an end-of-line marker.

-----------

rfc4155 specifically doesn't detail a method for quoting "^From " lines in message bodies. It mentions a bunch of strategies which various implementations use (eg escaping by prefixing ">" to the line, ensuring the previous line is blank etc etc).
Ugh.

Without knowing what "From "-line-within-message-body handling thunderbird uses, it's hard to tell what the right splitter regexp should be.
Thanks, fixed all regex in the file now.

I mostly see From - Tue Oct  2 18:30:52 2018. I've searched for "^From $" and didn't find a single one.
Attachment #9023446 - Attachment is obsolete: true
Attachment #9023446 - Flags: review?(mkmelin+mozilla)
Attachment #9023446 - Flags: review?(benc)
Attachment #9023457 - Flags: review?(mkmelin+mozilla)
Attachment #9023457 - Flags: review?(benc)
Here's what my mbox files look like (excerpt):

From 
Return-Path: <xxxxx@yyyy.com>
Delivered-To: my@mailaddress.de
Received: from localhost (localhost [127.0.0.1])
        by mx.myserver.de (Postfix) with ESMTP id C42CC27C176C
        for <my@mailaddress.de>; Fri, 14 Apr 2017 14:39:52 +0200 (CEST)
[...] and so on [...]
Message-Id: <xxxxxx@yyyy.com>
MIME-Version: 1.0
From: Support <support@yyyy.com>
To: my@mailaddress.de
[...] and so on [...]
From 
Return-Path: [...] next message
(In reply to Jorg K (GMT+1) from comment #15)
> Created attachment 9023457 [details] [diff] [review]
> 1491228-maildir-regexp.patch (v1b)
> 
> Thanks, fixed all regex in the file now.
> 
> I mostly see From - Tue Oct  2 18:30:52 2018. I've searched for "^From $"
> and didn't find a single one.

You could try grep'ing for "^From\s+$" because in my mbox files there is \r\n line endings it seems (may be I migrated from windows to linux at some times keeping my thunderbird profile … I don't remember.)
And looking at rfc4155, Thunderbird seems to place "-" where some implementations put the email address.
I'd say Marius's mbox separators (bare "From ") were valid mbox, albeit a little spartan.

Two thoughts:

1) Is it a goal that Thunderbird should be able to read mbox files from other email clients? If so, then it needs to be really careful about parsing mbox format, and we should definitely have unit test cases with "From " lines in the middle of messages.

2) It strikes me that the mbox<->maildir conversion has no business performing mbox parsing at all. It should really be handled by the existing (and presumably well-tested and robust) implementation. I think the mbox<->maildir conversion should be handled using the plugable mailstore interface:

  create a destination mailstore
  for each message in the source store:
     read message
     feed it into the destination store

Sorry. I realise this is spiraling waaay out of the realms of a simple regex fixup, but I've thought about this stuff a bit and wanted to throw it out here :-)
Comment on attachment 9023457 [details] [diff] [review]
1491228-maildir-regexp.patch (v1b)

Review of attachment 9023457 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/base/util/converterWorker.js
@@ +207,3 @@
>              let lastPos = [];
> +            // Regular expression to find "From - " or "From " at beginning of lines.
> +            let regexpLast = /^(From(?: - | $))/gm;

I think the check should be purely for "^From "  (allowing for anything at all until the end of line).
So the real bug is the hyphen being in the regexp in the first place.
Attachment #9023457 - Flags: review?(benc) → review-
With the patch most of the causes should be gone, there's only the situation left, when a single message exceeds the 256M character length limit.

I experienced a single mailbox that had a message with 309M in it (I could delete it, so no problem here). But that should be a very rare case (one message in a profile folder of about 25G mail data here).
So like this?
Attachment #9023463 - Flags: review?(mkmelin+mozilla)
Attachment #9023463 - Flags: review?(benc)
(In reply to Marius Burkard from comment #20)
> I experienced a single mailbox that had a message with 309M in it (I could
> delete it, so no problem here). But that should be a very rare case (one
> message in a profile folder of about 25G mail data here).

It is a rare case you can push such a big message through any mail server :)
(In reply to Ben Campbell from comment #18)
> 2) It strikes me that the mbox<->maildir conversion has no business
> performing mbox parsing at all. It should really be handled by the existing
> (and presumably well-tested and robust) implementation. I think the
> mbox<->maildir conversion should be handled using the plugable mailstore
> interface:
> 
>   create a destination mailstore
>   for each message in the source store:
>      read message
>      feed it into the destination store
Without knowing Thunderbird's code internals this sounds like a good thing to me, but at least I think the conversion is one of the less-used features, isn't it? So maybe the fix-up can do it for a while until somebody has the time to implement a shiny new conversion process :-)
(In reply to Ben Campbell from comment #18)
> 1) Is it a goal that Thunderbird should be able to read mbox files from
> other email clients?

Officially probably not. The mbox file should have been created by Thundebird in the first place.
But we are relaxed in what we can read in, when the mbox file is manually crafted or copied from a different program.

> 2) It strikes me that the mbox<->maildir conversion has no business
> performing mbox parsing at all. It should really be handled by the existing
> (and presumably well-tested and robust) implementation. I think the
> mbox<->maildir conversion should be handled using the plugable mailstore
> interface:

Exactly, it is a pity this has another mbox parser of its own.
I can't yet find whether I objected to it in bug 856087 but it is definitely ugly and causes problems like this bug.

(In reply to Ben Campbell from comment #19)
> >              let lastPos = [];
> > +            // Regular expression to find "From - " or "From " at beginning of lines.
> > +            let regexpLast = /^(From(?: - | $))/gm;
> 
> I think the check should be purely for "^From "  (allowing for anything at
> all until the end of line).

Yes, we have a few occurrences of this, e.g. at
https://dxr.mozilla.org/comm-central/source/comm/mailnews/local/src/nsParseMailbox.cpp#811
https://dxr.mozilla.org/comm-central/source/comm/mailnews/local/src/nsLocalMailFolder.cpp#1946
Don't let perfect be enemy of better ;-)
(In reply to Jorg K (GMT+1) from comment #21)
> Created attachment 9023463 [details] [diff] [review]
> 1491228-maildir-regexp.patch (v2)
> 
> So like this?

I am not sure about this. I think this would lead to a lot of false positives if you would just use "^From ", I think. Thunderbird does NOT seem to handle any "From xxx" line in the mail body. I checked some of my files and found this (very new because from this list!). I don't know what result the converter would produce by parsing a message like this:

From - Wed Nov  7 23:26:51 2018
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Return-Path: <xxxxx@us-west-2.amazonses.com>
Delivered-To: my@address.de
Received: from xxx
        by yyy
        for <my@address.de>; Wed,  7 Nov 2018 23:16:12 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple;
        s=6wipwtexsysx7yw2q5tfr4556vhfkoxc; d=mozilla.org; t=1541628969;
        h=From:To:Subject:Date:References:Message-ID:In-Reply-To:Content-Type:Content-Transfer-Encoding:MIME-Version;
--
From https://tools.ietf.org/html/rfc4155:

-----------
Each message in the mbox database MUST be immediately preceded
        by a single separator line, which MUST conform to the following
        syntax:

           The exact character sequence of "From";

           a single Space character (0x20);


So the line From https://tools.ietf.org/html/rfc4155: would match the regex here.
(In reply to Marius Burkard from comment #26)
> So the line From https://tools.ietf.org/html/rfc4155: would match the regex
> here.
No, since in the file it's prefixed, like so:
>From https://tools.ietf.org/html/rfc4155:
(In reply to Jorg K (GMT+1) from comment #27)
> (In reply to Marius Burkard from comment #26)
> > So the line From https://tools.ietf.org/html/rfc4155: would match the regex
> > here.
> No, since in the file it's prefixed, like so:
> >From https://tools.ietf.org/html/rfc4155:

I thought so, but in my mbox files it is not. :-/

> grep '^From ' mx.pixcept-4.de/INBOX | grep -v -E '^From\s+$'
> From you mentioned above it seems like there gonna be a lot of works for fixing the declaration warning, more like a restructuring to me, so for the time being I am going to use this "dirty fix" in running ISPConfig with php7.1.
> From the info I got from him earlier, he has finished the basic move and =
> From the ssh-keygen manual:

Three matches in my mbox INBOX file.
I just tried creating a message with a "From " line in the body, and Thunderbird escaped it by adding a leading space.
So I think the "^From " patch should be fine, at least for messages originating in thunderbird in mbox mode...

(but since such escaping isn't handled by the mbox<->maildir conversion, I bet it's possible to screw it up by creating a message in maildir containing a "From " line in the body, then converting to mbox and back again :- )
(In reply to Marius Burkard from comment #28)
> > grep '^From ' mx.pixcept-4.de/INBOX | grep -v -E '^From\s+$'
> > From you mentioned above it seems like there gonna be a lot of works for fixing the declaration warning, more like a restructuring to me, so for the time being I am going to use this "dirty fix" in running ISPConfig with php7.1.
> > From the info I got from him earlier, he has finished the basic move and =
> > From the ssh-keygen manual:
> 
> Three matches in my mbox INBOX file.

Well, that doesn't bode well for a simple fix...

The next quick-and-dirty hacks I'd suggest:

1) expand the regexp check to make sure the preceeding line is blank, or that the "^From " is the first thing in the file (or chunk, in this case, which isn't the same thing, but as a quick-and-dirty hack...). Each message in the mbox is separated by a blank line.

2) checking for:
"^From $"
"^From <email> <timestamp>$"
...
and any other known format we're likely to encounter.
Comment on attachment 9023463 [details] [diff] [review]
1491228-maildir-regexp.patch (v2)

Review of attachment 9023463 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry, r- if Marius's "From "-infested inbox is anything to go by :-(
Attachment #9023463 - Flags: review?(benc) → review-
I'm a bit tired of writing patches for others. You fix it ;-)
Assignee: nobody → benc
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
(In reply to Ben Campbell from comment #29)
> I just tried creating a message with a "From " line in the body, and
> Thunderbird escaped it by adding a leading space.
> So I think the "^From " patch should be fine, at least for messages
> originating in thunderbird in mbox mode...
Sorry, but my thunderbird behaves different.
To say that again, I did not alter the about:config before yesterday when I enabled the mbox->maildir conversion and I did not manually edit the mbox files.

I just did a test again, I wrote a message in thunderbird and sent it to myself (different server and mail) in Mode "plaintext and HTML".
The result is that it is stored like this in the Sent mbox file:

--------------61691B057008E89A979FEEE0--
From 
To: Marius <mails@xxxx.de>
From: Marius <m.burkard@yyyy.de>
Subject: Testmail
Message-ID: <10688b37-2633-fa4b-6ada-76c50f5f0689@yyyyy.de>
Date: Thu, 8 Nov 2018 08:56:54 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.3.0
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="------------4165C32CED74A850506BDDFF"
Content-Language: de-DE

This is a multi-part message in MIME format.
--------------4165C32CED74A850506BDDFF
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

With

From im Body




(In reply to Ben Campbell from comment #30)
> (In reply to Marius Burkard from comment #28)
> > > grep '^From ' mx.pixcept-4.de/INBOX | grep -v -E '^From\s+$'
> > > From you mentioned above it seems like there gonna be a lot of works for fixing the declaration warning, more like a restructuring to me, so for the time being I am going to use this "dirty fix" in running ISPConfig with php7.1.
> > > From the info I got from him earlier, he has finished the basic move and =
> > > From the ssh-keygen manual:
> > 
> > Three matches in my mbox INBOX file.
> 
> Well, that doesn't bode well for a simple fix...
> 
> The next quick-and-dirty hacks I'd suggest:
> 
> 1) expand the regexp check to make sure the preceeding line is blank, or
> that the "^From " is the first thing in the file (or chunk, in this case,
> which isn't the same thing, but as a quick-and-dirty hack...). Each message
> in the mbox is separated by a blank line.
I'm sorry to be a spoilsport again. There is no blank lines in front of the "^From " lines in the mbox files, either.


(In reply to Jorg K (GMT+1) from comment #32)
> I'm a bit tired of writing patches for others. You fix it ;-)
I can understand you. It can be frustrating, I know it from my own experience, especially if something doesn't behave as it does with yourself.
(In reply to Ben Campbell from comment #18)
> 2) It strikes me that the mbox<->maildir conversion has no business
> performing mbox parsing at all. It should really be handled by the existing
> (and presumably well-tested and robust) implementation. I think the
> mbox<->maildir conversion should be handled using the plugable mailstore
> interface:

There are good reasons not to reach into the XPCOM to do it through the store: you can't use XPCOM in a worker => therefore the whole conversion would happen on the main thread which would be slower and also freeze up the whole UI during conversion. Also we'd have to handle all kind of internal database opening/closing problems (given the program is open) etc. And of course, it's good that the conversion is standalone(ish).
(In reply to Ben Campbell from comment #30)

> 1) expand the regexp check to make sure the preceeding line is blank, or
> that the "^From " is the first thing in the file (or chunk, in this case,
> which isn't the same thing, but as a quick-and-dirty hack...). Each message
> in the mbox is separated by a blank line.

Sadly we can't rely on the preceding blank line as TB itself is not adding the blank line e.g. in Sent folder and similar when it is creating the message. That is a known bug, which I'm not sure is already fixed (but there were some attempts). Even if it is fixed, existing folders may contain messages without it. I think such folders can be repaired via compacting or so.

So it seems Marius's 'From ' in body is properly escaped with the preceding ^ so which problem is still open?
Yes, it seems the proper message starting From is not followed by any of the usual data (<email/username> <timestamp>). What OS are we talking here? Maybe there is a problem fetching some of the data so it is not written out (due to NS_ENSURE_SUCCESS() exiting the function at some point) . But if "From " at start of the line is enough for us, we do not need to wonder.
(In reply to :aceman from comment #35)
> (In reply to Ben Campbell from comment #30)
> 
> > 1) expand the regexp check to make sure the preceeding line is blank, or
> > that the "^From " is the first thing in the file (or chunk, in this case,
> > which isn't the same thing, but as a quick-and-dirty hack...). Each message
> > in the mbox is separated by a blank line.
> 
> Sadly we can't rely on the preceding blank line as TB itself is not adding
> the blank line e.g. in Sent folder and similar when it is creating the
> message. That is a known bug, which I'm not sure is already fixed (but there
> were some attempts). Even if it is fixed, existing folders may contain
> messages without it. I think such folders can be repaired via compacting or
> so.
> 
> So it seems Marius's 'From ' in body is properly escaped with the preceding
> ^ so which problem is still open?
> Yes, it seems the proper message starting From is not followed by any of the
> usual data (<email/username> <timestamp>). What OS are we talking here?
> Maybe there is a problem fetching some of the data so it is not written out
> (due to NS_ENSURE_SUCCESS() exiting the function at some point) . But if
> "From " at start of the line is enough for us, we do not need to wonder.

That's the problem, the "From" in the message body is not escaped here with TB 60.3.0 in the mbox file.
I changed to 64b1 now and the "From" in the message body gets escaped by a leading space.
That means that the converter cannot rely on From being escaped at line's start as long as the messages were not created/stored by latest TB.

My OS is debian stretch, by the way.
So I have a suggestion.

As far as I think the message introdution "From " line must always be followed by the mail headers.
So a regex that matches only those cases would be:

/(^From(?: - .*?| )$\n^[A-Za-z\-]+: .)/

This would allow a line both with the bare scheme I experienced and the longer one, that is followed by a line with a mail header.
There should be rare conditions where such a composition occurs in a mail body and is not escaped properly.
Sorry, I forgot to add the "gm" modifier

/(^From(?: - .*?| )$\n^[A-Za-z\-]+: .)/gm
Maybe even better this one to avoid problems when \r\n instead of \n is used (or \n instead of \r\n):

/(^From(?: - .*?| )\r?\n[A-Za-z\-]+: .)/gm
I think the first approach was best: /^(From(?: - | $))/gm - What we had or "^From $". That won't break anything.
(In reply to Jorg K (GMT+1) from comment #40)
> I think the first approach was best: /^(From(?: - | $))/gm - What we had or
> "^From $". That won't break anything.

"^From $" would break the entires like you have them (and me too in TB 64b1), so it would have to be the full regex with both alternatives.
(In reply to Jorg K (GMT+1) from comment #40)
> I think the first approach was best: /^(From(?: - | $))/gm - What we had or
> "^From $". That won't break anything.

Sorry for being pedantic in previous comments, but I think I now agree with you :-)
It's not a perfect fix, but it's a simple, non-intrusive change and it's better than what's there currently.

I'll see if I can add some unit tests to your patch.
I'm also a little freaked out about the possibility that the converter will might "From " markers over multiple chunks, so I'll take a look at that too.


And just to continue the pedantic tradition :-)
Places where `/^(From(?: - | $))/gm` will fail:

1) It won't work on any mbox files which (correctly) place an email address instead of a hyphen. A problem with foreign mbox files, but I've not seen any email addresses so far in Thunderbird-generated mbox.

2) On messages containing lines beginning with "From - ", but only if saved with older versions of Thunderbird. Newer versions of Thunderbird escape those with a space and aren't a problem.
(In reply to Marius Burkard from comment #37)
> So I have a suggestion.
> 
> As far as I think the message introdution "From " line must always be
> followed by the mail headers.
> So a regex that matches only those cases would be:
> 
> /(^From(?: - .*?| )$\n^[A-Za-z\-]+: .)/

I think that's a really good heuristic for handling non-escaped mboxes (as written by older TB).
The current converter code is brittle enough that I'd be worried about introducing more problems than it'd solve, but I think longer term we'll need to do this.
(In reply to Ben Campbell from comment #42)
> (In reply to Jorg K (GMT+1) from comment #40)
> > I think the first approach was best: /^(From(?: - | $))/gm - What we had or
> > "^From $". That won't break anything.
> And just to continue the pedantic tradition :-)
> Places where `/^(From(?: - | $))/gm` will fail:
> 
> 1) It won't work on any mbox files which (correctly) place an email address
> instead of a hyphen. A problem with foreign mbox files, but I've not seen
> any email addresses so far in Thunderbird-generated mbox.
So what about
/^(From(?: - | [^ ]+@[^ ]+ | $))/gm
It is very simplified about "email", but does the trick at least:
https://regex101.com/r/khGbW0/1

> 2) On messages containing lines beginning with "From - ", but only if saved
> with older versions of Thunderbird. Newer versions of Thunderbird escape
> those with a space and aren't a problem.
Yes, or having a single line with "From " in it (don't know if anyone sends such trash).
Attached patch 1491228_better_linesep_1.patch (obsolete) — Splinter Review
Marius, I used your idea of matching the following mail-header line to handle unescaped "From " lines.
Any chance you could give this regex a whirl and let me know how you go?


The attached a patch has a whole bunch of extra testing stuff, but if you just want to try it out, the core regex is:
      /^((From $)|(From [\S]+ \S{3} \S{3} [ \d]\d \d\d:\d\d:\d\d \d{4}$))\r?\n[\x21-\x7E]+:/gm;

From the patch:
      // A regexp to match mbox separator lines.
      // We support lines like:
      // "From "
      // "From MAILER-DAEMON Fri Jul  8 12:08:34 2011"
      // "From - Fri Jul  8 12:08:34 2011"
      // "From bob@example.com Fri Jul  8 12:08:34 2011"
      // we also require a message header on the next line, in order
      // to better cope with unescaped "From " lines in the message body.

It works great with my test data mboxes (included in the patch).

We might want to ease off on the date matching a little. Not sure how standard it is.
It's not 100% percent watertight, but anyone with mboxes which screw up this regex are likely to know how to fix em ;-)

The rest of the patch is an overhaul of the unit tests to run an assortment of mbox formats.

The full patch isn't ready yet - the tests only work sometimes.
I suspect there's a timing issue in the mbox converter which is screwing things up, although it's quite likely my tests are missing some async subtleties (my first attempt at writing xpcshell unit tests).

I discovered a bunch of other issues in the existing mbox conversion:
1) it includes the "From " line in the maildir emails (pretty sure it should strip it)
2) I _think_ there's some issue with processing the last message in the mbox - I'm getting a lot of errors where the converter is trying to write into a directory it's already removed...
3) I think it collects the entire message into memory before writing it out (not that big a deal, but there's no need for it to do so).
4) it's not clear if it handles "From " lines split over multiple chunks. (I _think_ so, just hard to tell)
5) the converter worker suppresses errors, so when a unittest fails it's hard to track down what happened.

I've started tidying up and refining the mbox splittng code to sort all this out, so will aim to post an updated patch next week with robust mbox parsing and unit tests!

Sheesh. Such a can of worms :-) But I think this is important - there's not many people doing mbox->maildir conversion right now, but at some point there'll be a lot, and there are a lot of things there currently which will bite 'em.
Attachment #9025248 - Flags: feedback?(mails)
(In reply to Ben Campbell from comment #45)
> I discovered a bunch of other issues in the existing mbox conversion:
> 1) it includes the "From " line in the maildir emails (pretty sure it should
> strip it)

Bug 1135309.

> 3) I think it collects the entire message into memory before writing it out
> (not that big a deal, but there's no need for it to do so).

That's the original problem filed in this bug. The over 300M messages ;)

> 4) it's not clear if it handles "From " lines split over multiple chunks. (I
> _think_ so, just hard to tell)
> 5) the converter worker suppresses errors, so when a unittest fails it's
> hard to track down what happened.

It does output a bunch of debug information.
(In reply to Magnus Melin [:mkmelin] from comment #46)
> (In reply to Ben Campbell from comment #45)
> > 3) I think it collects the entire message into memory before writing it out
> > (not that big a deal, but there's no need for it to do so).
> That's the original problem filed in this bug. The over 300M messages ;)
Due to the problem with the From lines the limit was not about the messages but about 256M+ of an mbox file as a whole.

(In reply to Ben Campbell from comment #45)
> Marius, I used your idea of matching the following mail-header line to
> handle unescaped "From " lines.
> Any chance you could give this regex a whirl and let me know how you go?
I currently have no mbox file ready to check this (need to restore from a backup to do so), but I manually tested this using some of the combinations I remembered and it seems to do well.
(In reply to Magnus Melin [:mkmelin] from comment #46)
> (In reply to Ben Campbell from comment #45)
> > I discovered a bunch of other issues in the existing mbox conversion:
> > 1) it includes the "From " line in the maildir emails (pretty sure it should
> > strip it)
> 
> Bug 1135309.

Cool - next patch should sort this one out.
 
> > 3) I think it collects the entire message into memory before writing it out
> > (not that big a deal, but there's no need for it to do so).
> 
> That's the original problem filed in this bug. The over 300M messages ;)

No, that 300M issue came up because it wasn't catching all the "From " separators it should have been (and also possibly catching some spurious ones due to unquoted "From " lines in message bodies).
 
> > 5) the converter worker suppresses errors, so when a unittest fails it's
> > hard to track down what happened.
> 
> It does output a bunch of debug information.

Well, it tells you that an exception was thrown, but not where (just a catch containing the whole worker function).
I peppered debug dumps everywhere and the issue is that it (sometimes) moves the destination directory into it's final place _before_ writing the last message. The logic of the mbox splitting is a little tricky to follow so I'm not quite sure why it's doing that. I'm going to rearrange the splitting code a little, and should be able to address all these issues.
Attached patch fix_mbox_conversion_1.patch (obsolete) — Splinter Review
OK, hopefully this is the end of it!
Rejigged mbox parsing so that it:

1) handles all the likely variations of "From " line
2) handles most unquoted "From " lines in message content
3) no longer includes the "From " line in the maildir output (Bug 1135309)
4) writes huge (>10MB) messages out in chunks rather than accumulating them
5) fixes the sporadic write-message-into-moved-maildir issue

Also adds some test data for common mbox variants.
Attachment #9023457 - Attachment is obsolete: true
Attachment #9023463 - Attachment is obsolete: true
Attachment #9025248 - Attachment is obsolete: true
Attachment #9023457 - Flags: review?(mkmelin+mozilla)
Attachment #9023463 - Flags: review?(mkmelin+mozilla)
Attachment #9025248 - Flags: feedback?(mails)
Attachment #9025962 - Flags: review?(mkmelin+mozilla)
(In reply to Ben Campbell from comment #49)
> Created attachment 9025962 [details] [diff] [review]
> fix_mbox_conversion_1.patch
> 
> OK, hopefully this is the end of it!
> Rejigged mbox parsing so that it:
> 
> 1) handles all the likely variations of "From " line
> 2) handles most unquoted "From " lines in message content
> 3) no longer includes the "From " line in the maildir output (Bug 1135309)
> 4) writes huge (>10MB) messages out in chunks rather than accumulating them
> 5) fixes the sporadic write-message-into-moved-maildir issue
> 
> Also adds some test data for common mbox variants.
Looks very promising when reading the code.
Comment on attachment 9025962 [details] [diff] [review]
fix_mbox_conversion_1.patch

Review of attachment 9025962 [details] [diff] [review]:
-----------------------------------------------------------------

Looks pretty good, mostly some nits, and the maildir -> mbox problem

::: mailnews/base/test/unit/test_mailstoreConverter.js
@@ +17,3 @@
>    localAccountUtils.loadLocalMailAccount();
>  
> +  add_task( async function test_mbox_conversion() {

nit: no space beofre async, and naming the functions is not needed (anymore)

@@ +18,5 @@
>  
> +  add_task( async function test_mbox_conversion() {
> +    await doMboxTest("test1", "../../../data/mbox_modern", 2 );
> +    await doMboxTest("test2", "../../../data/mbox_mboxrd", 2 );
> +    await doMboxTest("test3", "../../../data/mbox_unquoted", 2 );

no spaces before )

@@ +30,5 @@
>    run_next_test();
>  }
>  
> +// Create a server, account and inbox, and install an mbox file.
> +// Returns the server.

You could use JSDoc/JavaDoc style coments for documentation, like

/**
 * Create a server, account and inbox, and install an mbox file.
 * @return {nsIMsgIncomingServer} a server.
 */

@@ +50,3 @@
>  
> +  // TODO: is there some way to make folder rescan the mbox?
> +  // We don't need it for this, but would be nice to do things properly.

something like https://searchfox.org/comm-central/rev/b9eaed88678ed8252d9997811b5648982c813690/mail/base/content/folderPane.js#2520?

@@ +50,4 @@
>  
> +  // TODO: is there some way to make folder rescan the mbox?
> +  // We don't need it for this, but would be nice to do things properly.
> +  return server

nit: missing ;

@@ +93,5 @@
> +    cnt++;
> +  }
> +
> +  Assert.equal(cnt, expectCnt, "expected number of messages (" + mboxFilename + ")");
> +

nit: remove empty extra line, and a few lines below

::: mailnews/base/util/converterWorker.js
@@ +164,5 @@
>      // not properly propagated back to the worker error handling.
>      throw new Error(e);
>    }
> +
> +

nit: double empty line

@@ +168,5 @@
> +
> +  // Splits an mbox file up into a maildir.
> +  // mboxFilename: filename of the mbox file to split
> +  // maildirName: filename of the maildir to create
> +  // progressFn: if set, will be called after each message is processed.

JSDoc style this

@@ +238,5 @@
> +        if (m.index > pos) {
> +          writeToMsg(buf.substring(pos, m.index));
> +        }
> +        pos = m.index;
> +        pos += m[1].length;  // skip the "From " line

since you're now skipping it, for the maildir -> mbox case you need to add it back

@@ +245,5 @@
> +
> +      // Deal with whatever is left in the buffer
> +      let endPos = buf.length;
> +      if (!eof) {
> +        // keep back enough to cope with separator lines crossing

Nit: capitalize keep
Attachment #9025962 - Flags: review?(mkmelin+mozilla)
Attached patch maildir_to_mbox_separators.patch (obsolete) — Splinter Review
Fixes the maildir->mbox conversion (which was brokenly assuming that maildir messages started with a "From " separator line).
Attached patch fix_mbox_conversion_2.patch (obsolete) — Splinter Review
OK, thanks Magnus - I think this patch hits most of the things you picked up in comment 51.

There aren't any unit tests for the maildir->mbox conversion, but I added a note in the comments, and my manual round-trip conversion tests went fine (with both patches present).
Attachment #9025962 - Attachment is obsolete: true
Filed bug 1508931 for adding tests to round-trip the conversion.
It now occurred to me that TB has information about start of each message in the mbox file (in the .msf index), does it intentionally not use it for the conversion and open-codes another mbox parser in the convertor?
I'm not sure how you access that, but due to the no-xpcom-in-workers, somehow obtaining or passing that info around would be cumbersome at least. Not to mention you could have corrupted .msf files. There are general advantages to being standalone given this does seem fairly limited "parsing" we have to do.
Attachment #9026612 - Flags: review?(mkmelin+mozilla)
Attachment #9026613 - Flags: review?(mkmelin+mozilla)
oops - forgot to add r? flags...
Comment on attachment 9026612 [details] [diff] [review]
maildir_to_mbox_separators.patch

Review of attachment 9026612 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/base/util/converterWorker.js
@@ +91,5 @@
> +        let encoder = new TextEncoder();
> +        // Helper to format dates
> +        // eg "Thu Jan 18 12:34:56 2018"
> +        let fmtUTC = function(d) {
> +          const dayNames = ["Sun","Mon","Tue","Wed","Thu","Fri","Sat" ];

nit: no space before the ]

@@ +96,5 @@
> +          const monthNames = ["Jan", "Feb", "Mar", "Apr",
> +            "May", "Jun", "Jul", "Aug",
> +            "Sep", "Oct", "Nov", "Dec"];
> +          return dayNames[d.getUTCDay()]
> +            + " " + monthNames[d.getUTCMonth()]

we always put + at the end of the previous line

@@ +120,1 @@
>            mboxFile.write(msgFileOpen.read());

you can just inline the new Date().

However, you're now unconditionally adding the From line. E.g. our current converted (at least) maildir files already have an mbox From line at the top. I think you want to check that before writing another
Attachment #9026612 - Flags: review?(mkmelin+mozilla)
Comment on attachment 9026613 [details] [diff] [review]
fix_mbox_conversion_2.patch

Review of attachment 9026613 [details] [diff] [review]:
-----------------------------------------------------------------

Unfortunately there's something wrong here. My testcase for mbox -> mailbox doesn't finish (stays at 65%). I'll try to find out why, since it does work on trunk.

::: mailnews/base/test/unit/test_mailstoreConverter.js
@@ +39,5 @@
> + */
> +function setupServer(srvName, mboxFilename) {
> +  // {nsIMsgIncomingServer} pop server for the test.
> +  let server = MailServices.accounts.createIncomingServer(srvName,"localhost",
> +                                                       "pop3");

nit: align "pop3" with srvName

@@ +51,4 @@
>  
> +  // install the mbox file
> +  let mboxFile = do_get_file(mboxFilename);
> +  mboxFile.copyTo( inbox.filePath.parent, inbox.filePath.leafName)

missing ;
no space after (

@@ +52,5 @@
> +  // install the mbox file
> +  let mboxFile = do_get_file(mboxFilename);
> +  mboxFile.copyTo( inbox.filePath.parent, inbox.filePath.leafName)
> +
> +  // TODO: is there some way to make folder rescan the mbox?

Not sure what you mean, but if it's a new folder that has no .msf file, the box will be rescanned when it's db is opened the first time

@@ +61,5 @@
>  
> +/**
> + * Perform an mbox->maildir conversion test.
> + *
> + * @param {string} srvName - A unique server name to use for the test.

@param {String}

@@ +63,5 @@
> + * Perform an mbox->maildir conversion test.
> + *
> + * @param {string} srvName - A unique server name to use for the test.
> + * @param {string} mboxFilename - mbox file to install and convert.
> + * @param {number} expectCnt - Number of messages expected.

@param {int} I think

@@ +76,5 @@
> +
> +  let aVal = await convertMailStoreTo(
> +    mailstoreContractId, server, new EventTarget());
> +  // NOTE: convertMailStoreTo() will suppress exceptions in it's
> +  // worker, which makes unittest failures trickier to read...

Which suppression are you referring to?
The worker can throw so that is not suppressed - https://searchfox.org/comm-central/source/mailnews/base/util/converterWorker.js#312
not sure if that try/catch is needed anymore though

@@ +86,5 @@
> +
> +  let inbox = server.rootFolder
> +    .getFolderWithFlags(Ci.nsMsgFolderFlags.Inbox);
> +  // NOTE: the conversion updates the path of the root folder,
> +  // but _not_ the path of the inbox...

You can add that, the inbox path is correct after restart.

::: mailnews/base/util/converterWorker.js
@@ +167,5 @@
> +
> +  /**
> +   * Split an mbox file up into a maildir.
> +   *
> +   * @param {string} mboxFilename - Filename of the mbox file to split.

String

@@ +226,5 @@
> +    }
> +
> +    let mboxFile = OS.File.open(mboxFilename);
> +    let buf = "";
> +    eof = false;

let eof 
[seems like eof is never declared, so do that here]
Attachment #9026613 - Flags: review?(mkmelin+mozilla) → review-
This needs some debugging. I got a reproducible test case for you though, I'll send it in an email.
(In reply to Magnus Melin [:mkmelin] from comment #60)
> This needs some debugging. I got a reproducible test case for you though,
> I'll send it in an email.

Thanks for that - I got it and did some runs with it.
The mboxes you sent look fine and converted for me without problem as local folders.
When I tried to convert them as IMAP folders I saw the problem you encountered - stalling at (say) 66% complete.
The stall occurs because the progress tracking doesn't send back the expected number of events and so the main thread hangs there waiting for events that will never arrive.
I _think_ the discrepancy is down to how the counts are calculated, but it's tricky - the count-calculating and progress-reporting protocol is a little convoluted. Different methods are used for different types of server.

I'm sorting out a patch which refactors a bunch of it and clarifies things a lot.
Attached patch mbox_convert_revamp_1.patch (obsolete) — Splinter Review
First off - I'm sorry. I really didn't intend this to be quite so... comprehensive. I say "patch" but really I've reordered so much that it's easier to just treat them as new files.

This is what I've got right now - it works solidly for me. Feel free to give it a whirl, but I want to do some more testing and cleaning up, so I'm not quite looking for a review of it yet.

The changes:

- nailed down protocol for communication between main thread and worker (there's an explicit "I've finished!" message now, so no more stalling). This'll also make unit tests much easier to write.
- I've replaced all the server-type-specific count calculation. The conversion is now treated as a filesystem-based process and the estimation is much simpler. The mailstore doesn't care what type of server it is.
- There's only one worker now. The conversion is (should be!) I/O bound, so it's simpler and less error-prone to just use the single worker.
- in general I've split up a lot of functions into smaller components, so it should be much easier to follow.
- the mbox->maildir and maildir->mbox code paths are much more distinct, which should make it more maintainable.
Attachment #9026612 - Attachment is obsolete: true
Attachment #9026613 - Attachment is obsolete: true
Comment on attachment 9029769 [details] [diff] [review]
mbox_convert_revamp_1.patch

Review of attachment 9029769 [details] [diff] [review]:
-----------------------------------------------------------------

Some quick comments:

- For the function documentations, make use of the full 80ch width, and use /** */ documentation style comments.
- {string} should be {String}, and {number}  {Number}

Some trailing/unnecessary whitespcace (shows in red in this review tool)

Just using one worker may slow things down quite a bit if you have a huge mailstore, but the "old" way is very fast so perhaps it doesn't matter as much.

::: mailnews/base/util/mailstoreConverter.jsm
@@ +23,5 @@
> + * Sets a server to use a different type of mailstore, converting
> + * all the existing data.
> + *
> + * @param {string} aMailstoreContractId - XPCOM id of new mailstore type.
> + * @param {nsIMsgServer} aServer        - server to migrate.

{nsIMsgIncomingServer}

@@ +26,5 @@
> + * @param {string} aMailstoreContractId - XPCOM id of new mailstore type.
> + * @param {nsIMsgServer} aServer        - server to migrate.
> + * @param {???} aEventTarget            - where to send progress events.
> + *
> + * @return {string} - the new root directory for the migrated server.

{Promise} is still the return value I think

@@ +99,5 @@
> +    });
> +
> +    // Kick off the worker.
> +    worker.postMessage({
> +      'srcType': srcType,

we use double quote normally
But, I think postMessage can also send just objects now so you don't need any quotes at all
Attached patch mbox_convert_revamp_2.patch (obsolete) — Splinter Review
Mostly just a little tidying up. Added in some code to sort the maildir files by timestamp when converting to mbox. Likely a little futile and pointless, but seems like the Right Thing (tm) - it's the best proxy we have for reception time without scanning the message headers.
Attachment #9029769 - Attachment is obsolete: true
Attachment #9032334 - Flags: review?(mkmelin+mozilla)
My maildir->mbox->maildir roundtrip unit test in Bug 1508931 requires this patch.
Comment on attachment 9032334 [details] [diff] [review]
mbox_convert_revamp_2.patch

Review of attachment 9032334 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/base/util/converterWorker.js
@@ +71,5 @@
> +    // Slightly cheesy hack - our byte-counting on mbox->maildir conversion
> +    // will fall slightly short (we miss counting the "From " lines).
> +    // This is still fine for providing progress feedback to the user, but
> +    // we'll just fudge one last report of 100% so the GUI isn't stuck
> +    // showing a "99% complete - conversion finished!" message.

I don't understand this comment

@@ +95,5 @@
> +   *                                update (for maildir->mbox conversion, this
> +   *                                a message count).
> +   */
> +  function maildirToMBox(maildir, mboxFilename, progressFn) {
> +

nit: redundant empty line

@@ +118,5 @@
> +      if (b.byteLength < 5) {
> +        return false;
> +      }
> +      // 'F','r','o','m',' '
> +      return(b[0]==70 && b[1]==114 && b[2]==111 && b[3]==109 && b[4]==32);

add spaces after return and aroudn the ==s

@@ +241,5 @@
>  
> +    let mboxFile = OS.File.open(mboxPath);
> +    let buf = "";
> +    eof = false;
> +    while(!eof) {

while (!eof

@@ +280,5 @@
> +    closeExistingMsg();
> +  }
> +
> +  // Returns true if name looks like a subfolder directory.
> +  function isSBD(name) {

not to get strict js errors, you need to make functions not first inside another function the let style, like

let isSBD = function(name)

@@ +395,5 @@
> +
> +    let it = new OS.File.DirectoryIterator(srcPath);
> +    try {
> +      it.forEach( function(ent) {
> +        let dest = OS.Path.join(destPath, ent.name);

no space before function

if you want, make it fat arrow intead, likt

if.forEach((ent) => {

::: mailnews/base/util/mailstoreConverter.jsm
@@ +24,5 @@
> + * all the existing data.
> + *
> + * @param {String} aMailstoreContractId - XPCOM id of new mailstore type.
> + * @param {nsIMsgServer} aServer        - server to migrate.
> + * @param {???} aEventTarget            - where to send progress events.

yeah, the <progress> element to send updates to.

@@ +37,5 @@
>  
> +  let srcType = null;
> +  let destType = null;
> +  if(aMailstoreContractId== "@mozilla.org/msgstore/maildirstore;1") {
> +    srcType = "maildir";

if (

and space before ==

@@ +67,5 @@
> +        // send the percentage completion to the GUI
> +        // XXX TODO: should probably check elapsed time, and throttle
> +        // the events to avoid spending all our time drawing!
> +        // Message conversion should be nearly I/O-bound, and I/O is
> +        // faaaast these days.

perhaps, but maybe not overcomplicate it

i think we could check to see if aEventTarget is not set though, and not dispatch for that case.
(for more general usage than the one we have in the UI now).

@@ +82,5 @@
> +        let newStoreTypeID = storeTypeIDs[destType];
> +
> +        let finalRoot = installNewRoot(aServer, destDir, newStoreTypeID);
> +        log.info("Converted dir installed as: " + finalRoot);
> +        log.info("Conversion done!");

you could combine these too messages into one

@@ +190,5 @@
> +  log.info("Path to parent folder of account root folder: " +
> +    parent.path);
> +
> +  var converterFolder = new FileUtils.File(OS.Path.join(parent.path,
> +    dir.leafName));

since you're touching it, maybe make all the vars lets instaed
For the 'F','r','o','m',' , maybe just use String.fromCharCode(b[0], b[1], b[2], b[3]).

And come to think of it, a space after?

So maybe just String.fromCharCode[...b.slice(0, 5)] == "From "
Comment on attachment 9032334 [details] [diff] [review]
mbox_convert_revamp_2.patch

Review of attachment 9032334 [details] [diff] [review]:
-----------------------------------------------------------------

I'll try this out once the nits are fixed
Attachment #9032334 - Flags: review?(mkmelin+mozilla)
Attached patch mbox_convert_revamp_3.patch (obsolete) — Splinter Review
- nit/linting fixes
- spotted and fixed an error in the error-handling code
- In the ChromeWorker, I had to move a bunch of the helper functions up into file scope. Keeping them internal, but using the `let blah = function(){...` syntax caused issues with functions being invoked before being declared.

(My unit tests ran fine - including the roundtrip one - but my build is currently suffering clang crashes, so I haven't done a manual run on this patch yet. I'm pretty confident of it, but I'll hold off the r? until I do)
Attachment #9032334 - Attachment is obsolete: true
Comment on attachment 9032853 [details] [diff] [review]
mbox_convert_revamp_3.patch

Review of attachment 9032853 [details] [diff] [review]:
-----------------------------------------------------------------

Just a few comments.
Please for all the function documentation, use JSDoc/JavaDoc style documentation, i.e.

/**
 * .........
 */

// is just for code comments, but wouldn't included in generated documentation and such

::: mailnews/base/util/converterWorker.js
@@ +66,5 @@
> +      " " + d.getUTCFullYear();
> +  };
> +
> +  // Helper to check start of byte array for "From ".
> +  let startsWithFromMarker = function(b) {

why a helper function when it's used only once?

@@ +408,5 @@
> +    } else if (srcType == "mbox" && destType == "maildir") {
> +      costFn = calcMBoxCost;
> +      convertFn = convertTreeMBoxToMaildir;
> +    } else {
> +      throw new Error("Unsupported conversion!");

perhaps add the types to the message to ease debugging if this ever happens

throw new Error(`Unsupported conversion: ${srcType} => ${destType}`);

::: mailnews/base/util/mailstoreConverter.jsm
@@ +192,5 @@
> +  log.info("Path to parent folder of account root folder: " +
> +    parent.path);
> +
> +  var converterFolder = new FileUtils.File(OS.Path.join(parent.path,
> +    dir.leafName));

var -> let
Attached patch mbox_convert_revamp_4.patch (obsolete) — Splinter Review
Attachment #9032853 - Attachment is obsolete: true
Attached patch mbox_convert_revamp_5.patch (obsolete) — Splinter Review
Attachment #9035518 - Attachment is obsolete: true
Attachment #9035530 - Flags: review?(mkmelin+mozilla)

(In reply to Magnus Melin [:mkmelin] from comment #70)

Please for all the function documentation, use JSDoc/JavaDoc style
documentation, i.e.

/**

  • .........
    */

// is just for code comments, but wouldn't included in generated
documentation and such

I probably need come clarification here on jsdoc policy. We don't want all functions included in generated documentation, right? Just the ones that are exported or otherwise visible from the outside, and not the internal helper/support functions.

What I've done is document all the top-level functions using the jsdoc conventions.
But I've only used the "/**" marker for blocks that should appear in generated API docs - ie external-facing stuff.
And I've used plain "/*" markers for internal functions, to prevent them from cluttering up the generated documentation with noise.
ie:

/**
 * ... jsdoc-style block to appear in generated docs ...
 */

vs

/*
 * ... jsdoc-style block, not appearing in docs ...
 */

Does that do what we want?

::: mailnews/base/util/converterWorker.js

  • // Helper to check start of byte array for "From ".
  • let startsWithFromMarker = function(b) {

why a helper function when it's used only once?

It started off a lot more verbose :-) I've inlined it.

@@ +408,5 @@
perhaps add the types to the message to ease debugging if this ever happens

throw new Error(Unsupported conversion: ${srcType} => ${destType});

Good idea. Done.

(In reply to Ben Campbell from comment #73)

I probably need come clarification here on jsdoc policy. We don't want all
functions included in generated documentation, right? Just the ones that are
exported or otherwise visible from the outside, and not the internal
helper/support functions.

I think it's standard practice to still use /** */ style comments for all documentation, internal or not. Internal is a vague concept anyway since we're not really doing an interface. It's all internal code.

/* */ style comments should be avoided in general, since they prevent commenting out larger blocks. Use // style instead.

Attached patch mbox_convert_revamp_6.patch (obsolete) — Splinter Review

Thanks - that sounds like a good policy.
I've converted all the "internal" jsdoc blocks back to /**.

I'm looking forward to when it becomes an issue and we find ourselves with too much internal documentation :-)

Attachment #9035530 - Attachment is obsolete: true
Attachment #9035530 - Flags: review?(mkmelin+mozilla)
Attachment #9035802 - Flags: review?(mkmelin+mozilla)
Comment on attachment 9035802 [details] [diff] [review]
mbox_convert_revamp_6.patch

Review of attachment 9035802 [details] [diff] [review]:
-----------------------------------------------------------------

For the commit message, please always use one bug, starting the message "Bug 1491228 - "
You can then include in the text that it also fixes bug 1135309, but it needs to be clear which bug to go to for any followups etc.

For the iterators, "iterator" would seem a better name than "it". 

Anyway, I'll get this landed

::: mailnews/base/util/converterWorker.js
@@ +173,5 @@
> +    progressFn(raw.byteLength);
> +  };
> +
> +  let closeExistingMsg = function() {
> +    if (outFile !== null) {

we generally just compare to falsy/thuthy if there are no reason to do something else, so

if (outFile) {
Attachment #9035802 - Flags: review?(mkmelin+mozilla) → review+

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/67c263d7544b
refactored mbox<->maildir conversion, also fixing bug 1135309 (don't use "From -" for maildir files). r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 66.0
Blocks: 1135309

Backout:
https://hg.mozilla.org/comm-central/rev/6c36a1eb832351416d0a9e6ff513ce9f1745aa43
Backed out changeset 67c263d7544b (bug 1491228) for test failures in test_converterImap.js. a=backout DONTBUILD

Magnus: I'd appreciate if you let me do the checkins since I coordinate them overall (unless it's an urgent bustage fix).
Ben: Please run tests locally or on try before presenting for review. Hint: The reviewer can run them, too. As far as I know, there are three tests covering this area, so it's easy to run those three.

I checked whether I could fix the failing test quickly, but I couldn't as there appear to be multiple problems:
First something goes wrong and then the error handling is faulty, too:
JavaScript strict warning: resource:///modules/mailstoreConverter.jsm, line 59: ReferenceError: reference to undefined property "lineno"
A came to the conclusion that fixing this secondary error wouldn't have helped, so I opted for a backout :-(

Status: RESOLVED → REOPENED
Flags: needinfo?(benc)
Resolution: FIXED → ---

(In reply to Jorg K (GMT+1) from comment #78)

Ben: Please run tests locally or on try before presenting for review. Hint: The reviewer can run them, too. As far as I know, there are three tests covering this area, so it's easy to run those three.

Sorry about that - I thought test_mailstoreConverter.js was the only one (along with new round-trip test from Bug 1508931). No excuse for not doing the try build though...

I checked whether I could fix the failing test quickly, but I couldn't as there appear to be multiple problems:
First something goes wrong and then the error handling is faulty, too:
JavaScript strict warning: resource:///modules/mailstoreConverter.jsm, line 59: ReferenceError: reference to undefined property "lineno"

I think the original error is something simple (I think the converter is assuming the source mbox file exists, which may not be the case for an unsynced imap account).
But yeah, always embarrassing when the error-handling code fails :-)

A came to the conclusion that fixing this secondary error wouldn't have helped, so I opted for a backout :-(

In any case, I need to update this patch now that the patch for bug 1259040 has landed (both affect the mailstore converter code).

Flags: needinfo?(benc)

There are three tests, all from the original bug 856087:
test_mailstoreConverter.js
test_converterImap.js
test_converterDeferredAccount.js
https://hg.mozilla.org/comm-central/rev/cb57855b5455

Attached patch mbox_convert_revamp_7.patch (obsolete) — Splinter Review

Patch updated:

  • now uses .eml extension on maildir messages (resolving clash with bug 1259040).
  • fixed up some bad exception handling which was exposed by test_converterImap.js.
  • fixed up test_converterImap.js (turns out it never worked anyway. It was tearing down its test imap account before running the test (yay async js! :-). So the conversion would fail and throw an exception which bypassed any Assert() calls which would have flagged up the test failure).
Attachment #9035802 - Attachment is obsolete: true
Attachment #9037865 - Flags: review?(mkmelin+mozilla)
Comment on attachment 9037865 [details] [diff] [review]
mbox_convert_revamp_7.patch

Review of attachment 9037865 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM if the tests are happy

::: mailnews/base/test/unit/test_mailstoreConverter.js
@@ +39,5 @@
> + */
> +function setupServer(srvName, mboxFilename) {
> +  // {nsIMsgIncomingServer} pop server for the test.
> +  let server = MailServices.accounts.createIncomingServer(srvName,"localhost",
> +                                                       "pop3");

nit: align "pop3" under srvName. and add space after "srvName,"

::: mailnews/base/util/converterWorker.js
@@ +2,5 @@
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
> +/* eslint-env mozilla/chrome-worker, node */
> +

what does this do?

::: mailnews/base/util/mailstoreConverter.jsm
@@ +209,5 @@
> +  try {
> +    dir.moveTo(parent, dir.leafName);
> +    // {nsIFile} new account root folder.
> +    log.info("Path to new account root folder: " +
> +              converterFolder.path);

should fit on 80ch line line, no?
Attachment #9037865 - Flags: review?(mkmelin+mozilla) → review+

OK, nits addressed. I'm assuming new patch requires new r+ flag, so I've marked it for review again.

Try build here:

https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=1e01a96ab3d94880328a868204b5fcf2314d6633

Attachment #9037865 - Attachment is obsolete: true
Attachment #9038442 - Flags: review?(mkmelin+mozilla)

(In reply to Magnus Melin [:mkmelin] from comment #82)

::: mailnews/base/util/converterWorker.js

+/* eslint-env mozilla/chrome-worker, node */

what does this do?

It suppresses a few undefined-symbol lint warnings (importScripts, OS) which would otherwise trigger on chrome workers.
I need to figure out if there's an equivalent eslint-env for unit tests. The xpcshell tests suffer lots of spurious undefined-symbol lint errors.

Comment on attachment 9038442 [details] [diff] [review]
mbox_convert_revamp_8.patch

Review of attachment 9038442 [details] [diff] [review]:
-----------------------------------------------------------------

If it's r+ with comments, you can just address the comments and mark the updated attachment r+
Attachment #9038442 - Flags: review?(mkmelin+mozilla) → review+

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/d156755548d3
refactored mbox<->maildir conversion (also fixes Bug 1135309). r=mkmelin

Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: