Bug 1332889 (Closed). Opened 7 years ago, closed 7 years ago.

When copying an mbox file into Local Folders (i.e. importing it manually), messages may be lost

Categories

(Thunderbird :: Folder and Message Lists, defect)

45 Branch
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1276139

People

(Reporter: rlaggren, Unassigned)

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0
Build ID: 20161212000000

Steps to reproduce:

- Copy an mbox file containing mail into the "Local Folders" directory in the profile.

- Run T-bird; the folder appears as expected.

- Click the folder to display its list of messages; some messages display as expected.


Actual results:

All messages after a certain point are missing. Messages after the line that begins as quoted below (quotes added) do not appear in T-bird:

"From " ...(additional text in line)

This line occurs in the body of one of the emails. T-bird does not display any warning or message. The mbox file in "Local Folders" is intact and complete, containing the full headers and text of all the original emails, but the messages after that line do not appear in T-bird.

The attached test file is a snippet from a much larger file containing archives of a mailing list. The problem text is on line #81. If "From " is changed to "From-", T-bird displays all messages in the file properly.
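The symptom is consistent with a reader that treats every line beginning with "From " as the start of a new message. The sketch below is a hypothetical Python illustration of that naive rule, not Thunderbird's code; `split_mbox_naive` and the sample messages are made up. It shows why a body line such as "From what I can tell..." gets split off, while the edited "From-" variant does not.

```python
def split_mbox_naive(text: str) -> list[str]:
    """Split mbox text into messages at every line starting with "From ".

    Hypothetical illustration of the naive rule; any text before the first
    "From " line is discarded.
    """
    messages: list[list[str]] = []
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            messages.append([line])  # every "From " line opens a new message
        elif messages:
            messages[-1].append(line)  # anything else joins the current message
    return ["".join(m) for m in messages]


SAMPLE = (
    "From alice@example.com Mon Jan  4 10:00:00 1999\n"
    "From: alice@example.com\n"
    "Subject: first\n"
    "\n"
    "Hello,\n"
    "From what I can tell, this is just body text.\n"
    "\n"
    "From bob@example.com Tue Jan  5 11:00:00 1999\n"
    "From: bob@example.com\n"
    "Subject: second\n"
    "\n"
    "Hi.\n"
)

if __name__ == "__main__":
    print(len(split_mbox_naive(SAMPLE)))  # 3: the body line became a "message"
    print(len(split_mbox_naive(SAMPLE.replace("From what", "From-what"))))  # 2
```

Note that this split on its own yields an extra headerless fragment rather than losing the later messages; what Thunderbird then does with such a fragment is not established by this sketch.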


Expected results:

All messages in the mbox file copied to "Local Folders" should be displayed and available in T-bird.

This bug is relevant (sorry, I didn't find it in 15 minutes of searching before opening this bug):

https://bugzilla.mozilla.org/show_bug.cgi?id=355237

It contains links to other discussions and possible fixes. Last updated in 2014.

Further clarification after more searching.

Above, I described changing the line in the message body that starts with "From " to "From-", which makes T-bird read the file correctly. After reading some of Jamie Zawinski's (and others') posts, it appears that the traditional fix for this problem ("From " in a message body) has been to "mangle" that text into ">From " (in the sending client, I guess).

So it appears that T-bird uses the "From " delimiter (possibly together with some line-feed characters) to mark the start of a new message. It assumes that some special formatting of the message body (e.g. the sending client mangling "From " into ">From ") will protect the body against those delimiter characters. T-bird does not appear to do any additional tests when it finds "From " in an mbox file; it just declares a new message at that point.
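The "mangling" convention can be sketched as follows. This is a minimal Python illustration assuming the mboxrd variant, in which any line matching `>*From ` gains one extra `>` so the quoting can be undone exactly; the older mboxo variant quotes only lines that start with "From ". The function names are hypothetical.

```python
import re

# A line that starts with zero or more '>' followed by "From ".
_ESCAPE = re.compile(r"^(>*From )", re.MULTILINE)
# A line that starts with at least one '>' followed by "From ".
_UNESCAPE = re.compile(r"^>(>*From )", re.MULTILINE)


def escape_body(body: str) -> str:
    """Quote body lines so they cannot be mistaken for mbox separators."""
    return _ESCAPE.sub(r">\1", body)


def unescape_body(body: str) -> str:
    """Undo escape_body by removing one '>' from each quoted line."""
    return _UNESCAPE.sub(r"\1", body)


if __name__ == "__main__":
    body = "Hello,\nFrom what I can tell...\n>From a quote\n"
    quoted = escape_body(body)
    print(quoted)  # lines now start with ">From " and ">>From "
    print(unescape_body(quoted) == body)  # True
```

A reader that finds ">From " inside a body knows it is text, which is the protection the reporter's file lacked at line #81.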

Would it be possible to add a couple more tests when the "From " delimiter is hit during the summary build, as a reality check? Zawinski has a rather discouraging take on this (below), but since the line is already in memory, it should not cost much to provide some added, partial protection. This would make T-bird more robust when dealing with older files, files on other servers, files from different systems, etc.

If a possibly rogue delimiter is flagged while building the summary file, a dialogue could display the associated text with a couple of radio-button options [treat it as a new message | treat it as body text], plus a "Do this for all" checkbox. Displaying several lines before and after that point would help immensely. This should happen only very occasionally, when accessing "new" mail files, and the surrounding lines would be read only for display in the problem dialogue, not as part of the regular summary build, so reading the 5 preceding and 5 following lines need not cause a performance hit for regular mail usage. The "Do this for all" option would disable both the additional reads and the dialogue, if that were needed for very large files or when every message needed fixing.
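One possible form of the proposed reality check, sketched in Python under stated assumptions: a "From " line is accepted as a separator only if it starts the file or follows a blank line, and the next line looks like a header field (`Name: value`). Anything else is flagged together with 5 lines of context on each side, for display in the dialogue. The rules, the names, and the `CONTEXT` constant are hypothetical, and as the quote below warns, such checks give only partial protection (a body line like "Note: ..." right after a "From " line would still pass).

```python
import re
from dataclasses import dataclass

# Header field name per RFC 5322: printable ASCII (0x21-0x7E) except ':'.
_HEADER = re.compile(r"^[\x21-\x39\x3b-\x7e]+:")
CONTEXT = 5  # lines shown before and after a suspect line (hypothetical)


@dataclass
class Suspect:
    line_no: int        # 0-based index of the questionable "From " line
    context: list[str]  # surrounding lines to show the user


def looks_like_separator(lines: list[str], i: int) -> bool:
    """Heuristic: blank line (or start of file) before, header line after."""
    blank_before = i == 0 or lines[i - 1].strip() == ""
    header_after = i + 1 < len(lines) and _HEADER.match(lines[i + 1]) is not None
    return blank_before and header_after


def find_suspects(text: str) -> list[Suspect]:
    """Return every "From " line that fails the heuristic, with context."""
    lines = text.splitlines()
    suspects = []
    for i, line in enumerate(lines):
        if line.startswith("From ") and not looks_like_separator(lines, i):
            lo, hi = max(0, i - CONTEXT), min(len(lines), i + CONTEXT + 1)
            suspects.append(Suspect(i, lines[lo:hi]))
    return suspects


SAMPLE = (
    "From alice@example.com Mon Jan  4 10:00:00 1999\n"
    "From: alice@example.com\n"
    "Subject: first\n"
    "\n"
    "Hello,\n"
    "From what I can tell, this is just body text.\n"
    "\n"
    "From bob@example.com Tue Jan  5 11:00:00 1999\n"
    "From: bob@example.com\n"
    "Subject: second\n"
    "\n"
    "Hi.\n"
)

if __name__ == "__main__":
    for s in find_suspects(SAMPLE):
        print(s.line_no, s.context)
```

Only the lines around a flagged candidate are collected, so the extra work is confined to the rare suspicious case, matching the performance argument above.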

Zawinski, Fri, 17 May 1996, in comp.mail.headers:

 "The random **** that has traditionally been dumped into that line is without bound; comparing the first five characters is the only safe and portable thing to do. Usually, but not always, the next token on the line after ``From '' will be a user-id, or email address, or UUCP path, and usually the next thing on the line will be a date specification, in some format, and usually there's nothing after that. But you can't rely on any of this. "

(In reply to Rufus from comment #0)
> - Copy mbox file with mail into "Local Folders" dir in profile.

Which software created this mbox file?

In the Berkeley mailbox format, a line inside a message must not begin with "From ", or TB will assume that a new message starts there.

> Which software created this mbox file?

No idea. The mail files I need to review are from the Python newsgroup archive and begin in Feb 1999. There is no software stamp in the file, at least none that I'm aware of. Also, over the years, several different applications may have touched these files.

I can see how this could be labeled an input-file problem. However, the "From " delimiter seems to be a long-standing email issue that should be handled better than by simply discarding what appears to be all messages following the problem "From " line. T-bird handles this file as expected right up to that line, so it can clearly handle the file format, except for this one particular text literal in the body.

For example, opening the file in Evolution presents a completely normal and usable mail file which retains all messages and data, with the exception (I think) of the single line with the bogus "From " text. Evolution creates a new message starting at the bogus "From " line, composed of all succeeding lines up to the beginning of the next message, i.e. the next real "From " line. It generates a brief header consisting of a Date field with the current date, a Subject field with the literal "No Subject", a Message-ID field with the current domain and other text that may be random, "MIME-Version: 1.0", "X-Evolution-Source: local", and finally a "real" "From:" (with colon). Thus it makes a placeholder message which preserves all the data in the broken message following the problem text, except (apparently, from a brief examination) the single bad line.
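The placeholder approach described for Evolution can be sketched as wrapping the orphaned lines in a minimal header block. This Python sketch only mimics the header list given above; it is not Evolution's code, the function name and default domain are hypothetical, and the trailing "From:" header is left out because its value isn't described. The caller is assumed to pass the lines that followed the bogus "From " line.

```python
from email.utils import formatdate, make_msgid


def wrap_orphan(body_lines: list[str], domain: str = "localhost") -> str:
    """Turn orphaned body lines into a stand-alone placeholder message.

    The header set follows the report's description of Evolution's output;
    this is an illustration, not Evolution's implementation.
    """
    headers = [
        f"Date: {formatdate(localtime=True)}",      # current date
        "Subject: No Subject",
        f"Message-ID: {make_msgid(domain=domain)}",  # random ID at the domain
        "MIME-Version: 1.0",
        "X-Evolution-Source: local",
    ]
    # A blank line separates the header block from the body.
    return "\n".join(headers) + "\n\n" + "\n".join(body_lines) + "\n"


if __name__ == "__main__":
    print(wrap_orphan(["...the rest of the broken message..."], "example.org"))
```

Writing such a message back into an mbox would also need a new "From " postmark line in front of it; the sketch only builds the message text.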

I haven't checked whether it changes the mail file or generates this display on the fly; I suspect it changes the file. While this might offend some data ethicists, it is completely usable and understandable, allows relatively easy access to and identification of the affected data, and that data is mostly easy for the end user to recover from within the mail client. The T-bird alternative, where all following messages appear to disappear and are simply not available (at least from within the T-bird GUI), looks pretty bad in comparison. In an ideal world, when this situation occurs (a "From " delimiter with no nearby headers), a warning dialog could pop up displaying the context and offering to combine it with the previous message.
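The "combine it with the previous message" option amounts to folding the fragment back into the message before it and quoting its first line so it stays body text. A minimal Python sketch, assuming the file has already been split into a list of message strings at every "From " line; `merge_into_previous` is a hypothetical name, not an existing Thunderbird function.

```python
def merge_into_previous(messages: list[str], index: int) -> list[str]:
    """Fold messages[index] into messages[index - 1].

    The fragment's leading "From " line is quoted as ">From " so that a
    later re-read does not split at the same place again. The input list
    is not modified.
    """
    if not 0 < index < len(messages):
        raise IndexError("no previous message to merge into")
    fragment = messages[index]
    if fragment.startswith("From "):
        fragment = ">" + fragment
    return messages[: index - 1] + [messages[index - 1] + fragment] + messages[index + 1 :]


if __name__ == "__main__":
    parts = [
        "From alice@example.com Mon Jan  4 10:00:00 1999\nSubject: first\n\nHello,\n",
        "From what I can tell, this is just body text.\n\n",
        "From bob@example.com Tue Jan  5 11:00:00 1999\nSubject: second\n\nHi.\n",
    ]
    for message in merge_into_previous(parts, 1):
        print(message, end="---\n")
```

Unlike the Evolution behaviour described above, this keeps the offending line (quoted) instead of dropping it.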

Sorry, this appears to be a duplicate; see bug 1276139 comment #4.

Since you found the very old bug 355237 (which I wasn't aware of), please go and poke that bug and ask what's happening there.
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Depends on: 355237