email split into two or more (in read mbox) (if user appends mutt's mail data to Tb's local mail folder file. mutt doesn't escape "From " line without timestamp data.)

RESOLVED WONTFIX

Status

enhancement
RESOLVED WONTFIX
9 years ago
9 years ago

People

(Reporter: redelm, Unassigned)

Tracking

SeaMonkey 2.0 Branch
x86
All

Details

Attachments

(1 attachment)

(Reporter)

Description

9 years ago
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100205 SeaMonkey/2.0.3
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100205 SeaMonkey/2.0.3


Some emails seem to be split into two (or more) fragments when read from a *.mbx file (with or without rebuilding summary).  Deleting the non-HTML "text" portion usually fixes the split.  A line beginning with "From " appears to trigger the split.

Attached is the simplest file which generates the split.

Reproducible: Always

Steps to Reproduce:
1. scp mailfile to Local\ Folders
2. open seamonkey -mail
3. click on Local\ Folders\mailfile
4. observe email split -- later fragments lack Subj: and Date: but have short From:
Actual Results:  
email split

Expected Results:  
email in one piece

splits seem to happen when text line begins with  "From  "
perhaps other mail keywords are also triggers.
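The suspected trigger can be illustrated with a short sketch. It assumes, as the report suggests, a reader that starts a new message at every line beginning with "From ". The function name `split_naive` and the sample text are hypothetical; this is not SeaMonkey's actual parser.

```python
def split_naive(mbox_text):
    """Split on every line that starts with "From ", as a simple reader might."""
    messages = []
    current = None
    for line in mbox_text.splitlines(keepends=True):
        if line.startswith("From "):
            # Any such line is taken as the start of a new message.
            if current is not None:
                messages.append("".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages


# Hypothetical one-message mbox whose body has a paragraph starting with "From ".
sample = (
    "From user Mon Mar  1 10:00:00 2010\n"
    "Subject: one message\n"
    "\n"
    "First paragraph.\n"
    "\n"
    "From here on the text is still part of the same message.\n"
)

fragments = split_naive(sample)
print(len(fragments))
```

The second fragment begins with the body line, so it has no Subject: or Date: header, which matches the symptom in step 4.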
(Reporter)

Comment 1

9 years ago
mbox file that generates split email
(Reporter)

Updated

9 years ago
Version: unspecified → SeaMonkey 2.0 Branch
(In reply to comment #0)
> splits seem to happen when text line begins with  "From  "

Mozilla (Sm, Tb) uses the Unix mbox format (please search for it on Google) for its mail folder files. If you create Mozilla's mail folder file manually, you MUST escape "From " lines yourself.
For quoted-printable mail, escaping as "=46rom " is probably a practical workaround.

Note:
Tb 3 supports importing a .eml file by drag & drop onto the thread pane or a folder, and escapes a "From " line as ">From ".
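A minimal sketch of the two escapes mentioned here, applied to the text of one message before it is stored in an mbox file. The function name and the `quoted_printable` flag are hypothetical, and the caller is assumed to know whether the body really is quoted-printable.

```python
def escape_from_lines(message, quoted_printable=False):
    """Escape lines of ONE message that start with "From ".

    With quoted_printable=True the "F" is written as "=46", which is only
    valid for a body sent as quoted-printable (the caller must check this).
    Otherwise ">" is prefixed to the line.
    """
    escaped = []
    for line in message.splitlines(keepends=True):
        if line.startswith("From "):
            line = ("=46" + line[1:]) if quoted_printable else (">" + line)
        escaped.append(line)
    return "".join(escaped)


body = "Hello,\n\nFrom here on, nothing splits.\n"
print(escape_from_lines(body), end="")
print(escape_from_lines(body, quoted_printable=True), end="")
```

Header lines are not affected, because a header field is written "From:" with a colon, not "From " with a space.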

INVALID
(Reporter)

Comment 3

9 years ago
Thank you for the quick response.
I'm well aware of Unix mbox format (I use `mutt` as primary mailreader, seamonkey to handle such HTML as considered necessary).  I do _not_ use *.eml files (MS-OutlookExpress may be MS buggiest/insecurest product ever, although MS-InternetExplorer is competing vigorously).

mutt handles `^^From ` just fine and is used to create the mbox.  I guess it has more sophisticated start-of-email detection techniques, like seeing whether following lines fit header format or not.  Perhaps mutt has some option to escape the "^^From " so less sophisticated pgms are not confused.

The thing is, starting a paragraph with "From " is neither unusual nor poor English.  I get about one such email per day.  Only about two-thirds of my email is received as quoted-printable.

This seems like a fixable & visible bug.  Mangled email will give Moz{Sm,Tb} a black eye.  I sure hope the more common POP3 access works better.
An ".eml" file is not a line-mode file specific to MS's software, even if the name ".eml" was originally defined by MS. ".eml" is the de facto standard file extension for a file that contains a mail data stream as defined by SMTP, POP3, IMAP, NNTP etc., based on the mail RFCs (RFC 2822 etc.).

> mutt handles `^^From ` just fine and is used to create the mbox.

Who wrote the "From " line into Tb's local mail folder file (a file in Unix mbox format)? You did, didn't you?

As I wrote, if you pass the file you attached to this bug to Tb with a ".eml" file extension, which indicates "content is a mail data stream", Tb imports the data and escapes any "From " line in it. Tb/Sm does the same for a mail data stream received from a POP3/IMAP/News server (or sent to an SMTP server by Tb/Sm) when it stores that stream in a mail folder file.

Known problems around escaping/unescaping of "From " lines:
- Upon draft save, Tb escapes a "From " line with a space (" From ") in some
  cases, but with ">From " in other cases. However, upon editing the draft,
  Tb currently doesn't unescape ">From ".
  A similar issue exists when Tb reads mail from its local mail folder file.
- If a "From " line is sent from the server, Tb escapes it when saving to its
  local mail folder file. However, once saved in a Unix mbox format file, it is
  impossible to know whether the original was "From " or ">From ".
  This is a well-known issue caused by the design of Unix mbox.

Note:
If the mail is digitally signed, escaping as "=46rom " can't be used even for quoted-printable mail, because any escaping of a "From " line required by the Unix mbox format loses "what is original". The simplest solution is "never use Unix mbox format". "One file per mail" is one such solution.
(Reporter)

Comment 5

9 years ago
"From " lines in mbox file were created by mutt which read mailspool, processed most mails in textmode.  A few (5-20+) were saved into this mbox for HTML rendering by Sm.  File was scp'd to machine running X ~/../Local\ Folders .  Just drop it there.  Sometimes rebuilding the Summary File (.msf) is necessary.  

Never use any import fn, nor file extension.  Works like a charm, two keystrokes (mice are an ego-hazard) and the mail is copied over and open.  mbox properly read, all emails From/Date/Subj & rendering correct, except for this pesky 'From ' false splitting.

I can see the unix mbox file design has some ambiguity around 'From '.  mutt seems to handle it.  I don't think it escapes 'From ' but can parse the same mbox without false splitting.
(In reply to comment #5)
> "From " lines in mbox file were created by mutt which read mailspool, processed most mails in textmode.

Does mutt use a "one file per mail" system (not a Unix mbox file, closer to a .eml file)? Apple Mail 2 has already moved to a "one file per mail" format called ".emlx".
See a similar enhancement request for Tb:
> Bug 58308 support qmail's maildir format

Or does mutt also use Unix mbox format files but have the capability to track escaping of "From " lines?
Even if so, recovering that tracking data from the Unix mbox file alone is impossible, unless metadata in mail header format (like X-Mozilla-Status:) is held in the Unix mbox format file.
And using an externally created Unix mbox file is very hard, because it's impossible to know whether the original was "From ", ">From ", or "?From" (with an escape character other than >).

> Sometimes rebuilding the Summary File (.msf) is necessary. 

Does it mean you append mail data to Tb's mail folder file while Tb is running?

Even if you append mail data to Tb's mail folder file while Tb is running but does not have that folder open, Tb detects the file size change and invokes an internal rebuild-index.
However, if the file size has not changed, the internal rebuild-index is not invoked even if the file timestamp has changed, unless the timestamp difference exceeds the mail.db_timestamp_leeway value. This is to support use of the same mail file by Tb on different OSes, and mail folder files on a file server, without frequent internal rebuild-index.
And if you add mail data with a "From " line, rebuild-index treats the added "From " line as a mail separator, because a line starting with "From " is exactly what the Unix mbox spec defines as the mail separator.
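The rule described above (rebuild on a size change, otherwise only when the timestamp differs by more than the leeway) can be paraphrased as a small sketch. The function and parameter names are hypothetical and the leeway of 60 seconds in the calls is an arbitrary illustration; this is not Tb's code, and the real default of mail.db_timestamp_leeway is not stated in this bug.

```python
def needs_rebuild(recorded_size, recorded_mtime, actual_size, actual_mtime,
                  leeway_seconds):
    """Decide whether the index should be rebuilt, per the description above.

    recorded_* are the values remembered with the index, actual_* are the
    values of the mail folder file now (sizes in bytes, times in seconds).
    """
    if actual_size != recorded_size:
        return True
    # Same size: only a timestamp difference beyond the leeway counts.
    return abs(actual_mtime - recorded_mtime) > leeway_seconds


print(needs_rebuild(1000, 5000, 1200, 5000, 60))   # size changed
print(needs_rebuild(1000, 5000, 1000, 5030, 60))   # same size, within leeway
print(needs_rebuild(1000, 5000, 1000, 9000, 60))   # same size, beyond leeway
```

Under this rule a replacement file that happens to have the same size and a close timestamp would keep a stale index, which may be why a manual rebuild is sometimes needed.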
Summary: email split into two or more (in read mbox) → email split into two or more (in read mbox) (if user puts "From " line in Tb's local mail folder file)
(Reporter)

Comment 7

9 years ago
1) mutt does _not_ use "one file per mail", it concatenates in Unix mbox.

2) mutt does _not_ appear to escape "From " lines in msg text.  Check the originally attached file -- it was created with mutt.  mutt does appear to have a more sophisticated email separator algorithm than just grep "\n\nFrom ".  I'm guessing it looks at following lines to see if they are known-header keywords, and if not, "\n\nFrom " is in msg-text, not a valid email separator.

3) I do not append while Sm is open.  I copy prior to opening Sm. The .msf usually correct without manual rebuild, but not always, especially if previous file was larger but new one has new emails.

-- Robert
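The guess in point 2) can be sketched as follows: a "From " line counts as a separator only if the next line looks like a header field. This illustrates the reporter's guess only and is not a description of what mutt actually does; all names and the sample are hypothetical.

```python
import re

# A header field name: printable ASCII except ":" and space, then a colon.
HEADER_LINE = re.compile(r"^[!-9;-~]+:")


def is_separator(lines, i):
    """True if lines[i] starts with "From " and the next line looks like a header."""
    if not lines[i].startswith("From "):
        return False
    return i + 1 < len(lines) and HEADER_LINE.match(lines[i + 1]) is not None


def split_with_lookahead(text):
    lines = text.splitlines(keepends=True)
    messages, current = [], []
    for i, line in enumerate(lines):
        if is_separator(lines, i) and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


sample = (
    "From user Mon Mar  1 10:00:00 2010\n"
    "Subject: first\n"
    "\n"
    "From here on this is body text.\n"
    "Still the first message.\n"
    "\n"
    "From user Mon Mar  1 10:05:00 2010\n"
    "Subject: second\n"
    "\n"
    "Bye.\n"
)
print(len(split_with_lookahead(sample)))
```

The heuristic still fails when a body line starting with "From " is followed by text such as "Note: ...", so it reduces false splits rather than eliminating them.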
(In reply to comment #7)
> 1) mutt does _not_ use "one file per mail", it concatenates in Unix mbox.
> 2) mutt does _not_ appear to escape "From " lines in msg text.
> Check the originally attached file -- it was created with mutt.

There are several versions/variants of the "Unix Mbox format":
  "From ...", ... , "From <word> <timestamp> <comment>" (some mailers put
   the message-id in <word>; Sm/Tb puts "-" in the <word> field).
   Common to all versions/variants: the line starts with "From ".
   Note:
   I saw a "From " line with <word>==null (a string of length 0, not 0x00)
   which was generated by a version of Opera 6.x in its Unix Mbox format file.
mutt may check the format of the "From ..." line.

The Mozilla family continues to use 'line starts with "From "', probably for compatibility of the local mailbox file with former versions, including the first Netscape Mail & News version, and with other tools/software that know "the Netscape/Mozilla family uses Unix Mbox files".
Introducing a pref such as mailnews.local_maildb.rebuild_index.strict_unix_mbox_separator=true may be a solution. I believe a user who knows the "Unix Mbox format" like you won't complain about incompatibility with older Sm/Tb/Mozilla/Netscape and other tools/software around the "mail separator of Unix Mbox format".
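What a "strict" separator test might look like, as a sketch: it accepts only the "From <word> <timestamp>" variant with an asctime-style timestamp, allows an empty <word> and a trailing comment, and rejects everything else. The pattern and function name are assumptions made for illustration; variants with a time zone between the time and the year would not match.

```python
import re

# "From ", an optional word, then e.g. "Mon Mar  1 10:00:00 2010".
STRICT_FROM = re.compile(
    r"^From \S* "
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"[ 0-3]\d \d\d:\d\d:\d\d \d{4}"
)


def is_strict_separator(line):
    """True only for a "From " line that carries a recognisable timestamp."""
    return STRICT_FROM.match(line) is not None


for candidate in (
    "From - Mon Mar  1 10:00:00 2010",              # "-" in the word field
    "From user@example.com Mon Mar 01 10:00:00 2010 comment",
    "From  Mon Mar  1 10:00:00 2010",               # empty word field
    "From here on the text continues.",             # ordinary body text
):
    print(is_strict_separator(candidate), repr(candidate))
```

A check like this trades compatibility for safety: old folder files whose separators carry no timestamp would no longer be split at all, which is the compatibility concern raised above.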

Note:
Sm/Tb doesn't always use the "From " line, and doesn't always reparse the "From " lines of locally held mail data.
(a) IMAP mail downloaded into the Disk Cache (IMAP folder with "offline use=off").
    As this is "one file per mail", the "From " line is not used.
(b) IMAP mail downloaded into the offline-store (IMAP folder with "offline use=on").
    Unix Mbox format is used, as for a local mail folder file, and a "From "
    line is written as the indicator of "start of a mail".
    However, the "From " lines are not reparsed, because rebuild-index
    of an IMAP folder is "discard offline-store data, then re-download whole
    mail data into offline-store". I don't know whether escaping of "From "
    lines is done for mail data in the offline-store or not.
(Reporter)

Comment 9

9 years ago
Good points!

mutt uses  From $USER `date`  , and might well check for this (using 1st line to set From $USER match pattern).

I agree stricter checking for email-sep should be optionally implemented.  I wouldn't want to be too restrictive, because I bet some people use 10+ year old mailfiles.  I do.
I believe (a) is far more important than (b) for the majority of users.
(a) compatibility of the local mail folder file with;
   - older Sm/Tb/Mozilla/Netscape
   - other tools/software that read the local mail folder files of the Mozilla family
(b) compatibility with mutt's local mail folder file
Because you are a Linux user, I believe the following is one of the simplest/easiest jobs on Linux:
  add ">" to lines starting with "From " by a command/script
  before appending mail data to Sm's local mail folder file.
A similar job is also required for the [CRLF] or [LF] issues. See bug 503271.
Please consider using a simple preprocessor before appending mail data to Sm's local mail folder file.
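A minimal sketch of such a preprocessor. It has to leave the real separators alone, so it assumes a real separator follows a blank line (or starts the file) and carries hh:mm:ss plus a four-digit year, as in mutt's "From $USER `date`" lines; every other line starting with "From " gets a ">" prefix. The names and the separator test are assumptions made for this illustration.

```python
import re

# Crude separator test (an assumption): "From ", then somewhere on the line
# hh:mm:ss followed by a four-digit year.
LOOKS_LIKE_SEPARATOR = re.compile(r"^From .*\d\d:\d\d:\d\d \d{4}")


def escape_body_from_lines(lines):
    """Yield lines with ">" added to body lines that start with "From "."""
    previous_blank = True      # start of file counts as "after a blank line"
    for line in lines:
        if line.startswith("From "):
            real = previous_blank and LOOKS_LIKE_SEPARATOR.match(line)
            if not real:
                line = ">" + line
        previous_blank = line.strip() == ""
        yield line


source = [
    "From user Mon Mar  1 10:00:00 2010\n",
    "Subject: x\n",
    "\n",
    "From here on this is body text.\n",
    "\n",
    "From user Mon Mar  1 10:05:00 2010\n",
    "Subject: y\n",
    "\n",
    "ok\n",
]
print("".join(escape_body_from_lines(source)), end="")
```

Since the function accepts any iterable of lines, it can be fed an open file object and its output written to a new file before that file is copied into Local Folders.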
Robert in Houston, are you still eager for an enhancement for "compatibility with mutt's local mail folder file"? I'm hoping developers will pay attention to important enhancements like "qmail's maildir format support" instead of such an enhancement for a small number of people.
As I wrote in Bug 377986 comment #6, if mail data from mutt's mail folder file is appended to SeaMonkey's mail folder file, "Compact Folder" has to be run twice to generate the X-Mozilla-Status:/X-Mozilla-Status2: headers (required to track deleted status, read status etc.) and the X-Mozilla-Keys: header (required to hold tags added to a mail).

A preprocessor should do at least the following three things before appending mutt's mail data to SeaMonkey's mail folder file.
(1) Escape mail text lines that start with "From ".
    - If quoted-printable, replacing with "=46rom " can be used.
    - If format=flowed, converting "[CRLF]From ..." to "[CRLF]From" +
      "[CRLF] ..." is the simplest way. It won't change the logical mail data.
    - If format=flowed, escaping as " From " is also a practical way, because
      there is no problem if the "From ..." is part of flowed text.
      Even for "[CRLF]From " (a real new line in format=flowed), a single space
      before "From" won't cause problems in mail reading.
      This is done by Thunderbird for format=flowed mail. AFAIR, Thunderbird
      doesn't unescape the space added for escaping.
    - If text/html, moving "From " to the previous line as " From"
      can be used in many cases, because a new line in HTML is treated as
      a single space in European languages.
    - If the "From ..." line is in an attached text file, escaping or moving
      "From " should be done carefully, in order not to corrupt original data.
    - I don't know "what should be done" for digitally signed mail.
(2) Convert single [LF] and single [CR] to [CRLF].
    Because the mail folder file holds a mail data stream, [CRLF] is proper data.
    Although SeaMonkey/Thunderbird can use mail data with single [LF] and/or
    single [CR] with no problem, some IMAP servers reject such data.
    See Bug 83396, Bug 301010, and Bug 402759.
    I don't know "what should be done" for digitally signed mail.
(3) Add X-Mozilla-Status:, X-Mozilla-Status2:, X-Mozilla-Keys: headers.
    If these headers are not written, mail status and added tags are lost by
    rebuild-index. To avoid this loss of information, "Compact Folder" has to
    be run twice (current design/implementation).
    If you add these headers yourself, there is no need to run
    "Compact Folder" twice.
FYI.
If you want to read mail data held by another mailer or mail server on a *nix platform, a "movemail account" may be an easy way.
> http://www-archive.mozilla.org/mailnews/movemail/index.html
Linux version of Sm/Tb may have movemail account definition support.
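Steps (2) and (3) of the preprocessor list above can be sketched for one mbox entry as follows. The function names are hypothetical, and the all-zero status values and the empty X-Mozilla-Keys: value are assumptions meaning "no flags, no tags"; the headers are placed directly after the "From " separator line.

```python
import re


def to_crlf(text):
    """Convert lone LF and lone CR to CRLF, leaving existing CRLF untouched."""
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


def prepare_entry(entry):
    """entry: one message including its leading "From " separator line."""
    entry = to_crlf(entry)
    separator, rest = entry.split("\r\n", 1)
    header_block = rest.split("\r\n\r\n", 1)[0]
    if "X-Mozilla-Status:" in header_block:
        return entry                      # headers already present
    mozilla_headers = (
        "X-Mozilla-Status: 0000\r\n"      # assumed: no status flags set
        "X-Mozilla-Status2: 00000000\r\n"
        "X-Mozilla-Keys: \r\n"            # assumed: no tags
    )
    return separator + "\r\n" + mozilla_headers + rest


raw = "From user Mon Mar  1 10:00:00 2010\nSubject: x\r\n\nbody\rmore\n"
print(repr(prepare_entry(raw)))
```

As noted in the list, changing line endings alters the bytes of the message, so this should not be applied blindly to digitally signed mail.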
(Reporter)

Comment 14

9 years ago
I can live with the situation as-is.  I just hope it does not adversely affect others.  This is an "identification" problem much like SPAM -- problems of false negatives and false positives on email boundaries.  A reasoned balance should be struck.

maildir does not interest me -- I _like_ huge mailfiles and detest clogged directories.  movemail seems heavyweight -- I can write a simple sed one-liner.
Severity: major → enhancement
Status: UNCONFIRMED → RESOLVED
Last Resolved: 9 years ago
OS: Linux → All
Resolution: --- → WONTFIX
Summary: email split into two or more (in read mbox) (if user puts "From " line in Tb's local mail folder file) → email split into two or more (in read mbox) (if user appends mutt's mail data to Tb's local mail folder file. mutt doesn't escape "From " line without timestamp data.)