Closed Bug 1942540 Opened 1 year ago Closed 1 year ago

Support mbox files with bare CR line endings (produced on Mac pre Thunderbird 1.0)

Categories

(MailNews Core :: General, defect)

Unspecified
macOS
defect

Tracking

(thunderbird_esr128 wontfix, thunderbird136 wontfix)

RESOLVED FIXED
137 Branch
Tracking Status
thunderbird_esr128 --- wontfix
thunderbird136 --- wontfix

People

(Reporter: benc, Assigned: mkmelin)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Mbox files with CR line endings don't work. Only CRLF and LF line endings are handled.

It turns out that there was a period when Mac builds wrote mbox files with bare CRs, and we've seen such files in the wild causing problems: a user wanted to bring in mbox files from her profile on an old Mac install.
If you just copy such mbox files into your profile, the emails do not show up.

It's not clear how prevalent this is.
https://bugzilla.mozilla.org/show_bug.cgi?id=156614#c6 kind of suggests that there might have been a window of time where bare-CR mboxes were being written, but that it was fixed to use LF instead in Thunderbird 1.0 (!).

I don't really want to add bare-CR handling into the mbox parsing code - it complicates the line-end logic and creates corner-cases I'd like to avoid.

But I think there is probably an argument for an mbox import path which handles this and all kinds of other mbox variations out there in the wild (both from thunderbird and from other apps), with much looser rules and a more heuristic-based approach. For example, during import you could do multiple passes over an mbox to sniff out oddities, which isn't something you'd want to do for the core mbox code.
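As an illustration of the kind of sniffing pass described above, here is a minimal sketch (not Thunderbird code) that counts each line-ending style in an mbox file. The function name `count_line_endings` is hypothetical, and the sketch assumes `perl` is available (it ships with macOS and most Linux distributions).

```shell
# count_line_endings FILE (hypothetical helper): report how many CRLF,
# bare CR and bare LF line endings FILE contains, reading it LF-delimited.
count_line_endings() {
  perl -ne '
    $crlf += () = /\r\n/g;        # CR immediately followed by LF (Windows)
    $cr   += () = /\r(?!\n)/g;    # CR not followed by LF (classic Mac)
    $lf   += () = /(?<!\r)\n/g;   # LF not preceded by CR (Unix)
    END { printf "crlf=%d cr=%d lf=%d\n", $crlf, $cr, $lf }
  ' "$1"
}
```

A file with a non-zero `cr` count is one of the problem files. Note that perl splits its input on LF, so a file containing only bare CRs is read as a single huge "line"; that is fine for a one-off check but is one reason an importer would want its own chunked reader.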

For the benefit of others who might have a similar issue and come here searching, I'll explain what I ended up doing.

In my case, the affected emails all seem to be in a single POP account inside the profile tqdxo9l3.default. At first I thought that tr "\r" "\n" would suffice, but the mbox files I'm dealing with also contain emails that use CRLF line endings, and tr would turn each CRLF into two LFs, doubling every line break.

On a copy of the Thunderbird folder from ~/Library (on a Mac), I ran the following commands on a Linux machine:

cd <path to where the copy lies>/Thunderbird/Profiles/tqdxo9l3.default
find Mail/pop.affected.account -type f -name \*.msf \
  -exec sh -c 'test -f "`dirname "$1"`/`basename "$1" .msf`"' _ "{}" \; \
  -exec sh -c '
    MAILPATH="`dirname "$1"`/`basename "$1" .msf`"
    DEST="../../../test2/Profiles/tqdxo9l3.default/$MAILPATH"
    # Only convert mbox files with a "From " line that contains a CR.
    grep --binary-files=text -q -e "^From .*$(printf "\r")" "$MAILPATH" && (
      mkdir -p "`dirname "$DEST"`"
      dos2unix -n "$MAILPATH" "$DEST"   # CRLF -> LF, written to the copy
      mac2unix "$DEST"                  # remaining bare CR -> LF, in place
    )' _ "{}" \; -print

I assume that every mbox file has an .msf file next to it and that only mbox files have one. I also assume that an mbox file in which any line starts with "From " and contains a carriage return needs converting. I wrote the results to a separate folder to make it easy to send only the changed mbox files (the zip file is already over 4GB).
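The first assumption can be checked before running the command. A minimal sketch (the name `list_unindexed` is hypothetical) that lists files under a mail directory with no sibling `.msf`, i.e. files the `find` command above would silently skip. Expect it to also report non-mbox files that live in an account directory, such as filter rules or `popstate.dat`.

```shell
# list_unindexed DIR (hypothetical helper): print every regular file under DIR
# that is not itself an .msf file and has no "<name>.msf" next to it.
list_unindexed() {
  find "$1" -type f ! -name '*.msf' \
    -exec sh -c 'test -f "$1.msf" || printf "%s\n" "$1"' _ {} \;
}
```

Usage: `list_unindexed Mail/pop.affected.account`; any mbox that shows up would need converting separately.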

So far I've done some checks and have not found any issues with the resulting mbox files, but I haven't gone through all of them (about 300).

Summary: Support mbox files with CR line endings → Support mbox files with bar CR line endings (produced on Mac pre Thunderbird 1.0)
See Also: → 1946327

What does the Mozilla Thunderbird team plan to do to make this work again without converting?

PS: It's a pity that TB doesn't use Maildir by default; then this wouldn't have happened.

I don't think we should support bare CRs by default - it complicates the code quite a lot (with bare CRs you need a look-ahead to ensure there's no LF following, whereas with CRLF and LF the line end is always unambiguous: lines always end at an LF).

But it does seem like an automated conversion would be worthwhile - it's obviously a real issue with CR-only mbox files out there in the wild...

Quick remedy:
BBEdit lets you change the text encoding and line endings from the status bar at the bottom of the window: just open the mbox file, change them as desired, and save. When I merged some mailboxes, a few messages were broken. I saved this mbox with CRLF line endings and it worked; the broken messages were gone. I also changed all the ones I was working on to UTF-8.
Maybe it's because of their history. These messages were originally in MS Outlook 2011 for Mac. There was no direct way to move them to Postbox, so I dragged and dropped them into folders on the Mac, then dragged them from there into Postbox. Now I've imported them from Postbox into Thunderbird, with a few generations of macOS in between.

With CR nothing is displayed, with LF or CRLF it is.

This allows repairing mbox files with bare CR line endings by running Repair Folder on a local mbox folder.
If no bare CR lines were found, no action is taken.
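The same "only act if needed" behaviour can be approximated by hand when preparing files outside Thunderbird. This is a minimal sketch, not the code from the commit: `has_bare_cr` and `repair_bare_cr` are hypothetical names, it assumes `perl` is available, and it writes the converted mbox to a new file rather than modifying the original.

```shell
# has_bare_cr FILE (hypothetical): exit 0 if FILE contains a CR that is not
# immediately followed by LF, exit 1 otherwise.
has_bare_cr() {
  perl -ne 'if (/\r(?!\n)/) { $found = 1; last } END { $? = $found ? 0 : 1 }' "$1"
}

# repair_bare_cr SRC DEST (hypothetical): if SRC has bare CRs, write a copy
# with CRLF and bare CR converted to LF to DEST; otherwise leave DEST alone.
repair_bare_cr() {
  if has_bare_cr "$1"; then
    perl -pe 's/\r\n?/\n/g' "$1" > "$2"
    echo "converted: $1"
  else
    echo "no bare CRs, skipped: $1"
  fi
}
```

A CRLF-only or LF-only file is left untouched, which mirrors the behaviour described above. As with Repair Folder, the old .msf should not be reused for a converted file.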

To create a test case on Linux, run:
dos2unix /path/to/mbox
unix2mac /path/to/mbox

Remove the .msf.
Restart Thunderbird and notice that the mbox doesn't work.
Perform Repair Folder; it should work again.
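If dos2unix and unix2mac are not installed, a small bare-CR test mbox can also be produced with standard tools. This is a minimal sketch: `make_bare_cr_mbox` and the message content are made up for illustration. `tr` is safe here only because the generated input is known to contain LF line endings and no CRs.

```shell
# make_bare_cr_mbox FILE (hypothetical): write a two-message mbox whose line
# endings are all bare CR, mimicking files written by old Mac builds.
make_bare_cr_mbox() {
  printf '%s\n' \
    'From - Mon Jan 01 00:00:00 2001' \
    'From: alice@example.com' \
    'Subject: first' \
    '' \
    'Hello.' \
    '' \
    'From - Mon Jan 01 00:01:00 2001' \
    'From: bob@example.com' \
    'Subject: second' \
    '' \
    'Hi again.' \
    '' | tr '\n' '\r' > "$1"
}
```

Place the resulting file in a local mail folder directory of a test profile while Thunderbird is closed, then follow the steps above.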

Assignee: nobody → mkmelin+mozilla
Status: NEW → ASSIGNED

On macOS, dos2unix and unix2mac are available via Homebrew, Fink and MacPorts.

Target Milestone: --- → 137 Branch

Pushed by john@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/d7962b3d8592
Allow repair of classic MacOS line endings in mbox files. r=BenC

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Summary: Support mbox files with bar CR line endings (produced on Mac pre Thunderbird 1.0) → Support mbox files with bare CR line endings (produced on Mac pre Thunderbird 1.0)
OS: Unspecified → macOS
Duplicate of this bug: 1946327
See Also: 1946327