Closed Bug 194382 Opened 22 years ago Closed 4 years ago

sequence "bla-bla.\nFrom" in plaintext attachments becomes "bla-bla.\n>From"

Categories

(MailNews Core :: Attachments, defect)


Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: s_i_m, Unassigned)

References

(Depends on 2 open bugs)

Details

(Whiteboard: [patchlove][needs updated patch][stalled on blockers?])

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.3b) Gecko/20030210
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.3b) Gecko/20030210

Very old bug. Existed in Netscape 4.x. If the following character sequence
"something.\nFrom something" is found in a plaintext file which is going to be
attached to the message, it appears in the attachment of a sent message as
"something.\n>From".

Reproducible: Always

Steps to Reproduce:
1. make a text file like this
---cut here---
bla-bla-bla.

From bla-bla-bla
---cut here---

2. attach this file to a test message

3. send it to yourself

Actual Results:  
you will get
---cut here---
bla-bla-bla.

>From bla-bla-bla
---cut here---

Expected Results:  
the same text as what was sent, of course!

Perhaps the bug affects not only lines beginning with "From" but also lines
beginning with other MIME keywords; I have not tested this.
Funny comment:
Have just received a confirmation e-mail from Bugzilla about registering the
bug. Looks like the bug works perfectly ^_^ LOL

hint: USE WEB TO SEE THE BUG DESCRIPTION!
"From" is the message delimiter in the mbox format and has to be escaped as
">From" if it's not an actual message delimiter....  This is one of the known
pitfalls of the mbox format and is not easily resolvable without switching to a
different mail storage format, last I checked....
bz, while that is true, I would expect mozilla to display this unescaped, even
if it is stored as >From.
How do you differentiate that from actual messages that contain the text
">From"?  That's the issue I seem to recall with mbox.... (that said, if we
handle this in message bodies but not in attachments then this is just a bug)
It must be something like "bit stuffing". In other words, if you add one ">" to
_every_ occurrence of "From" in a received message before placing that into mbox
and do the reverse procedure when you extract the message body from the box,
then this scheme should work also for the case when the message source has that
">From" already in the text (it becomes ">>From" and then changes back), etc.
That is why I think it is a bug.
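
The stuffing scheme proposed above can be sketched in a few lines of Python. This is only an illustration, not MailNews code: the names stuff() and unstuff() are made up, and the sketch assumes that a line needs quoting when it consists of zero or more ">" characters followed by "From ".

```python
import re

# A line needs quoting if it is zero or more ">" followed by "From ".
_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
# A stored line is unquoted by dropping exactly one leading ">".
_UNQUOTE = re.compile(r"^>(>*From )", re.MULTILINE)


def stuff(body: str) -> str:
    """Prepend one '>' to every 'From ', '>From ', '>>From ', ... line."""
    return _QUOTE.sub(r">\1", body)


def unstuff(stored: str) -> str:
    """Reverse stuff(): remove one '>' from every quoted line."""
    return _UNQUOTE.sub(r"\1", stored)


original = "bla-bla-bla.\n\nFrom bla-bla-bla\n>From an earlier quote\n"
stored = stuff(original)

# The stored form differs, but reading it back restores the original text.
assert unstuff(stored) == original
```

Because every matching line gains exactly one ">" and loses exactly one on the way back, text that already contained ">From" survives the round trip.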
Confirming with:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4b) Gecko/20030515

The "From " starting lines in the message body (no matter if it is a multipart
MIME type, i.e. including the attachments) are correctly escaped into ">From "
to store in mbox format but then they should be normalized upon reading.

One thing I've noticed while viewing a message source is that the line:

From - Sun May 18 13:45:37 2003

which delimits messages in an mbox file is also included. I think it should be
omitted, and "From " starting lines should be displayed normalized in the
source, i.e. as pure "message/rfc822".
Perhaps the way to address this problem is to use base64 (or some other 
encoding?) for text attachments, at least for those containing a "From " line,
rather than adding stuffing before the From.  Tweaking the text that's actually 
part of the email causes enough complaints, but an attachment should arrive in 
the same condition it went out.
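
A minimal sketch of that idea, assuming a hypothetical helper that picks the transfer encoding per attachment (needs_base64() and encode_attachment() are invented names, not an existing MailNews API):

```python
import base64


def needs_base64(data: bytes) -> bool:
    """True if any line of the attachment starts with 'From '."""
    return any(line.startswith(b"From ") for line in data.splitlines())


def encode_attachment(data: bytes):
    """Return (content_transfer_encoding, payload) for a text attachment."""
    if needs_base64(data):
        return "base64", base64.encodebytes(data)
    return "7bit", data


attachment = b"bla-bla-bla.\n\nFrom bla-bla-bla\n"
cte, payload = encode_attachment(attachment)

# The base64 alphabet contains no space, so no encoded line can begin
# with "From ", and decoding returns the attachment byte for byte.
assert base64.decodebytes(payload) == attachment
```

The cost is that the attachment is no longer readable in the raw message source, which is the usual trade-off of base64 for text.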

See also bug 169090, although it's really not too closely related to this one.
Status: UNCONFIRMED → NEW
Ever confirmed: true
It seems that there are a number of mbox formats around.

Currently MailNews uses the mboxo format, which suffers from the ambiguity
pointed out in comment #4.
There is also the mboxrd format, which disambiguates "From " and ">From " lines,
so mails can be displayed as received. This is similar to the solution
proposed in comment #5.

A mail reader that understands mboxo format can read mboxrd format, and vice
versa (excepting ambiguities).
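
The mboxo ambiguity can be shown with a short Python sketch. The function name mboxo_quote() is made up for illustration; it assumes mboxo quoting means prepending ">" only to lines that begin with "From ".

```python
import re


def mboxo_quote(body: str) -> str:
    """mboxo-style quoting: only lines starting with 'From ' get a '>'."""
    return re.sub(r"^From ", ">From ", body, flags=re.MULTILINE)


sent_a = "From now through August\n"
sent_b = ">From now through August\n"

# Two different messages end up with identical stored text, so a reader
# cannot tell whether the ">" was in the original or added by the writer.
assert mboxo_quote(sent_a) == mboxo_quote(sent_b)
```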

I propose changing MailNews to read/write in the mboxrd format.

For reference, please see the mbox man page
(e.g. http://www.die.net/doc/linux/man/man5/mbox.5.html)
Oddly enough, the mbox man page on Suse linux 8.2 only describes the mboxo format.
Assignee: mscott → mscott
Bug 119441 is related to this bug.
It describes how import for outlook and eudora were fixed to cope with
 "From " lines.
They were fixed using the mboxrd format.

In comment #20 on the bug, Seth suggests that all mboxes should use this format
and asks for bugs to be opened requesting the changes to read/write this format.

To the best of my knowledge the new bugs were not opened.
This also happens in Linux, probably in all versions.

The key question is: Is that > added when sending, or only when saving to a
folder (which is mbox)?

In the first case, this must not happen.

In the second case, see bug 121947.

pi
OS: Windows XP → All
Hardware: PC → All
I'll try to find some time to test if it's a folder-only issue or not... 
but may be confused by having the enigmail plugin installed.
Suppose I'll have to do a test with & without plugin.
Pi:

Thanks, you've uncovered the follow-up bugs to bug 119441, which I referenced
in comment #9.
There are *two* follow-up bugs: bug 121947 and bug 121946.
*** Bug 231744 has been marked as a duplicate of this bug. ***
*** Bug 260379 has been marked as a duplicate of this bug. ***
PLEASE NOTE IT IS NOT JUST A DISPLAYING ISSUE! WHEN ATTACHMENT IS STORED TO THE
LOCAL DRIVE IT KEEPS ALL OF THESE REDUNDANT ">"!

The bug is still present even in Thunderbird. If it were only a display issue it
would not be a big deal. But the most annoying feature of this bug (at
least for me) is that when I receive a message with, for example, a plain text
LaTeX file attached to it and when I store the attachment to my drive I get all
these redundant '>' symbols in the file. This is really annoying because in
scientific texts the paragraphs starting from "From..." are very common, and I
receive lots of texts prepared in LaTeX. It seems a bit stupid to zip a plain
text file of several KB size every time when I need to send it... If somebody
clears this issue it will be very much appreciated by the scientific community
:D Really.
Product: MailNews → Core
Component: MailNews: Attachments → ChatZilla
Component: ChatZilla → MailNews: Attachments
*** Bug 293389 has been marked as a duplicate of this bug. ***
A similar problem may occur in the (rare) case when an attachment with a single "." in a line is sent via an old SMTP server.
If an attachment line starts with "From ", quoted-printable is selected and "From " is encoded as "=46rom "

This patch should fix this bug, because the special handling of "From " is already implemented in the q-p encoder:
http://lxr.mozilla.org/seamonkey/source/mailnews/mime/src/mimeenc.cpp#1160

It may make sense to do this also for main body if format=flowed is not used, but I'm not sure.

The issue from comment 17 is not handled by this patch, but it could be easily extended.
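
The "=46rom " escape can be illustrated with Python's quopri module. This is a sketch of the idea only, not the logic of the attached patch or of mimeenc.cpp; qp_encode_escaping_from() is a made-up name.

```python
import quopri


def qp_encode_escaping_from(data: bytes) -> bytes:
    """Quoted-printable encode, additionally writing a leading 'F' of any
    line that starts with 'From ' as '=46' (0x46 is ASCII 'F')."""
    lines = quopri.encodestring(data).splitlines(keepends=True)
    return b"".join(
        b"=46" + line[1:] if line.startswith(b"From ") else line
        for line in lines
    )


attachment = b"bla-bla-bla.\n\nFrom bla-bla-bla\n"
encoded = qp_encode_escaping_from(attachment)

# No encoded line starts with "From ", so mbox quoting has nothing to
# touch, and any quoted-printable decoder restores the original bytes.
assert not any(l.startswith(b"From ") for l in encoded.splitlines())
assert quopri.decodestring(encoded) == attachment
```

Since "=46" is an ordinary quoted-printable escape, the receiving side needs no special handling.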
Attachment #200719 - Flags: review?(bienvenu)
(In reply to comment #17)
> A similar problem may occur in the (rare) case when a attachment with a single
> "." in a line is sent via an old SMTP server.

Bug 259564.
*** Bug 338700 has been marked as a duplicate of this bug. ***
Christopher, Nikolay, does this work for you and bug 248726, bug 121946, bug 121947, bug 259564, bug 271225?

this one WFM version 3.0a2pre (2008060403)
Assignee: mscott → nobody
QA Contact: stephend → attachments
version 3.0a2pre (2008071003) - WFM
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WORKSFORME
The steps to reproduce of this bug are simple, and I still see the reported symptom with TB 3.0a2pre-0711, Win2K.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Attached image example of message
Mike, look for an attachment - nope I don't see any problems.
Product: Core → MailNews Core
joe, rsx, do you also see this problem?
Not able to reproduce here:
Mozilla/5.0 (Windows NT 5.0; rv:8.0a1) Gecko/20110812 Thunderbird/8.0a1 ID:20110812050601
I still see this in 3.1.11 (which has an odd UA string: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.24) Gecko/20100228 Lightning/0.9 Thunderbird/2.0.0.24 Mnenhy/0.7.6.666).

N.B. You won't see this in an IMAP account, only when the message has been downloaded or copied to a POP account. I'd be very surprised if the bug has just gone away.
jwq, thanks for the info

- this is the first anyone has mentioned pop.  have you tested with imap?
- please reset your UA string using http://kb.mozillazine.org/Resetting_your_useragent_string_to_its_compiled-in_default
- please also update the bug with results when you are able to use version 5 or 6.
I tested with IMAP first, not thinking, and the problem doesn't occur.

 Then I remembered that this is a very old "bit of fossilized stupidity" [1] caused by the mbox storage format used by Thunderbird to save POP downloaded messages to disk. IMAP doesn't download messages to disk, thus doesn't exhibit this bug.

Cross-Ref: Bug 121946, Bug 121947. In particular, see Comment 8 above and Bug 121947 Comment 17 with regard to the mboxrd format.

David Bienvenu is working on Bug 402392 which allows different message storage formats to be used. When that's complete I think the iron would be hot enough to straighten out the historic bugs with use of the mbox format by changing it to mboxrd.

[1] LaTeX: A Document Preparation System, 2nd ed., Leslie Lamport, p. 34.
Using format=flowed for the attachment could solve the issue also in case the message is stored in an mbox while transiting.  The advantage of adding a space instead of a ">" is that compliant software will remove the leading space upon seeing format=flowed.  Even with non-compliant software, the result looks acceptable if space-stuffing is done consistently, i.e. adding a leading space to all lines (see Bug 609908 Comment 3.)
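
A rough sketch of the space-stuffing described above, loosely following the format=flowed rules in RFC 3676. It handles only unquoted lines and ignores flowed-line joining and quote depth; space_stuff() and space_unstuff() are illustrative names.

```python
def space_stuff(line: str) -> str:
    """Sender side: add one leading space to lines that start with a
    space, with 'From ', or with '>'."""
    if line.startswith((" ", "From ", ">")):
        return " " + line
    return line


def space_unstuff(line: str) -> str:
    """Receiver side: remove one leading space if present."""
    return line[1:] if line.startswith(" ") else line


lines = ["bla-bla-bla.", "From bla-bla-bla", " indented", ">not a quote"]
stuffed = [space_stuff(l) for l in lines]

# No stuffed line starts with "From ", and unstuffing restores the text.
assert not any(l.startswith("From ") for l in stuffed)
assert [space_unstuff(l) for l in stuffed] == lines
```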
 Alessandro, you may well be right in your suggestion that using format=flowed would solve this particular issue, but that would be treating the symptom rather than the cause of the problem. The cause is that the local message storage implementation is altering the message, which it ought not do. Fixing Bug 121947 and Bug 121946 ought to fix this bug as well. I'm going to take the liberty of placing those bugs on the depends list for this bug; I'm surprised they weren't there before.
Depends on: 121946, 121947
Whiteboard: [patchlove][needs updated patch][stalled on blockers?]
Comment on attachment 200719 [details] [diff] [review]
Patch for nsMsgAttachmentHandler to detect lines starting with "From "

I think we can assume this won't apply, so clearing review
Attachment #200719 - Flags: review?(mozilla)
http://www.qmail.org/qmail-manual-html/man5/mbox.html is now commonly
considered to be the definitive text on the definition of the mbox format:

HOW A MESSAGE IS DELIVERED
     Here is how a program appends a message to an mbox file.

     It first creates a From_ line given the  message's  envelope
     sender  and  the  current  date.   If the envelope sender is
     empty (i.e., if this is a bounce message), the program  uses
     MAILER-DAEMON  instead.   If  the  envelope  sender contains
     spaces, tabs, or newlines, the program  replaces  them  with
     hyphens.

     The program then copies the message, applying >From  quoting
     to  each  line.   >From  quoting  ensures that the resulting
     lines are not From_ lines:  the program prepends a > to  any
     From_ line, >From_ line, >>From_ line, >>>From_ line, etc.

     Finally the program appends a blank line to the message.  If
     the  last  line of the message was a partial line, it writes
     two newlines; otherwise it writes one.

HOW A MESSAGE IS READ
     A reader scans through an mbox file looking for From_ lines.
     Any From_ line marks the beginning of a message.  The reader
     should not attempt to take advantage of the fact that  every
     From_ line (past the beginning of the file) is preceded by a
     blank line.

     Once the reader finds a message,  it  extracts  a  (possibly
     corrupted)  envelope  sender  and  delivery  date out of the
     From_ line.  It then reads until the next From_ line or  end
     of  file,  whichever  comes  first.  It strips off the final
     blank line and deletes  the  quoting  of  >From_  lines  and
     >>From_ lines and so on.  The result is an RFC 822 message.

COMMON MBOX VARIANTS
     There  are  many  variants  of  mbox  format.   The  variant
     described above is mboxrd format, popularized by Rahul Dhesi
     in June 1995.

     The original mboxo  format  quotes  only  From_  lines,  not
     >From_ lines.  As a result it is impossible to tell whether

          From: djb@silverton.berkeley.edu (D. J. Bernstein)
          To: god@heaven.af.mil

          >From now through August I'll be doing beta testing.
          Thanks for your interest.

     was quoted in the original message.  An mboxrd  reader  will
     always strip off the quoting.

     mboxcl format is like mboxo format, but includes a  Content-
     Length  field  with  the  number  of  bytes  in the message.
     mboxcl2 format is like mboxcl  but  has  no  >From  quoting.
     These  formats  are used by SVR4 mailers.  mboxcl2 cannot be
     read safely by mboxrd readers.



Whenever you implement an mbox import/export function, please make sure that you use the mboxrd variant described above, otherwise there will be no round-trip compatibility and it will always be possible to create a message where a ">" is added or lost accidentally.
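
As a hedged illustration of the delivery and reading steps quoted above, here is a small Python sketch that works on an in-memory string. It is not Thunderbird or qmail code; deliver(), read() and _finish() are made-up names, and the date in the From_ line is simply the current UTC time.

```python
import re
import time

# Matches From_ lines and already quoted ones: "From ", ">From ", ...
FROM_LIKE = re.compile(r">*From ")
# Matches quoted lines only: ">From ", ">>From ", ...
QUOTED = re.compile(r">+From ")


def deliver(mbox, sender, message):
    """Append a message to mbox text, following the delivery steps."""
    sender = re.sub(r"[ \t\n]", "-", sender) or "MAILER-DAEMON"
    out = ["From %s %s\n" % (sender, time.asctime(time.gmtime()))]
    for line in message.splitlines(keepends=True):
        out.append(">" + line if FROM_LIKE.match(line) else line)
    # One newline after a complete last line, two after a partial one.
    out.append("\n" if message == "" or message.endswith("\n") else "\n\n")
    return mbox + "".join(out)


def _finish(lines):
    """Join the body lines and strip the final blank line."""
    text = "".join(lines)
    return text[:-1] if text.endswith("\n") else text


def read(mbox):
    """Split mbox text into messages and remove the >From quoting."""
    messages, current = [], None
    for line in mbox.splitlines(keepends=True):
        if line.startswith("From "):
            if current is not None:
                messages.append(_finish(current))
            current = []
        elif current is not None:
            current.append(line[1:] if QUOTED.match(line) else line)
    if current is not None:
        messages.append(_finish(current))
    return messages


first = "Subject: one\n\nFrom here on\n>From quoted\n"
second = "Subject: two\n\nplain\n"
box = deliver(deliver("", "a@example.org", first), "", second)
assert read(box) == [first, second]
```

The round trip holds here because both test messages end with a newline; per the quoted text, a message ending in a partial line comes back with a newline appended.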
Severity: major → normal

I wrote myself a tiny message to check the state of this bug with Thunderbird 60.8.0 (64-bit). My Sent folder is fed by BCC, so that's where the message arrived. For historical reasons, the IMAP folder is a local file, not a maildir. However, the message was not modified:

my ImapMail/server folder$ tail -35 Sent |hd
00000000  0d 0a 0d 0a 0d 0a 46 72  6f 6d 20 2d 20 4d 6f 6e  |......From - Mon|
00000010  20 4a 75 6c 20 32 32 20  31 32 3a 30 37 3a 35 30  | Jul 22 12:07:50|
00000020  20 32 30 31 39 0a 58 2d  4d 6f 7a 69 6c 6c 61 2d  | 2019.X-Mozilla-|
00000030  53 74 61 74 75 73 3a 20  30 30 30 31 0a 58 2d 4d  |Status: 0001.X-M|
00000040  6f 7a 69 6c 6c 61 2d 53  74 61 74 75 73 32 3a 20  |ozilla-Status2: |
00000050  30 30 30 30 30 30 30 30  0a 44 65 6c 69 76 65 72  |00000000.Deliver|
00000060  65 64 2d 54 6f 3a 20 61  6c 65 2d 73 65 6e 74 40  |ed-To: ale-sent@|
00000070  74 61 6e 61 2e 69 74 0d  0a 52 65 74 75 72 6e 2d  |tana.it..Return-|
00000080  50 61 74 68 3a 20 3c 76  65 73 65 6c 79 40 74 61  |Path: <vesely@ta|
00000090  6e 61 2e 69 74 3e 0d 0a  44 4b 49 4d 2d 53 69 67  |na.it>..DKIM-Sig|
000000a0  6e 61 74 75 72 65 3a 20  76 3d 31 3b 20 61 3d 72  |nature: v=1; a=r|
[...]
00000490  3d 75 73 2d 61 73 63 69  69 0d 0a 43 6f 6e 74 65  |=us-ascii..Conte|
000004a0  6e 74 2d 4c 61 6e 67 75  61 67 65 3a 20 65 6e 2d  |nt-Language: en-|
000004b0  55 53 0d 0a 43 6f 6e 74  65 6e 74 2d 54 72 61 6e  |US..Content-Tran|
000004c0  73 66 65 72 2d 45 6e 63  6f 64 69 6e 67 3a 20 37  |sfer-Encoding: 7|
000004d0  62 69 74 0d 0a 0d 0a 54  68 69 73 20 69 73 20 74  |bit....This is t|
000004e0  6f 20 63 68 65 63 6b 20  74 68 61 74 0d 0a 46 72  |o check that..Fr|
000004f0  6f 6d 20 64 6f 65 73 20  6e 6f 74 20 67 65 74 20  |om does not get |
00000500  71 75 6f 74 65 64 0d 0a  6f 72 20 64 6f 65 73 20  |quoted..or does |
00000510  69 74 3f 0d 0a                                    |it?..|

The format somehow provides for tolerating \nFrom_ in the middle of a message. Then, I copied the message to local folders, and, surprise surprise, the local folders have a different format. The message was modified (and DKIM Verifier complained):

my Mail/Local_Folders$ tail -35 Sent|hd
00000000  46 72 6f 6d 20 2d 20 4d  6f 6e 20 4a 75 6c 20 32  |From - Mon Jul 2|
00000010  32 20 31 32 3a 30 38 3a  35 38 20 32 30 31 39 0a  |2 12:08:58 2019.|
00000020  58 2d 4d 6f 7a 69 6c 6c  61 2d 53 74 61 74 75 73  |X-Mozilla-Status|
00000030  3a 20 30 30 30 31 0a 58  2d 4d 6f 7a 69 6c 6c 61  |: 0001.X-Mozilla|
00000040  2d 53 74 61 74 75 73 32  3a 20 30 30 30 30 30 30  |-Status2: 000000|
00000050  30 30 0a 58 2d 4d 6f 7a  69 6c 6c 61 2d 4b 65 79  |00.X-Mozilla-Key|
00000060  73 3a 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |s:              |
00000070  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
000000b0  20 20 20 0a 44 65 6c 69  76 65 72 65 64 2d 54 6f  |   .Delivered-To|
000000c0  3a 20 61 6c 65 2d 73 65  6e 74 40 74 61 6e 61 2e  |: ale-sent@tana.|
000000d0  69 74 0d 0a 52 65 74 75  72 6e 2d 50 61 74 68 3a  |it..Return-Path:|
000000e0  20 3c 76 65 73 65 6c 79  40 74 61 6e 61 2e 69 74  | <vesely@tana.it|
000000f0  3e 0d 0a 44 4b 49 4d 2d  53 69 67 6e 61 74 75 72  |>..DKIM-Signatur|
[...]
000004e0  69 6e 3b 20 63 68 61 72  73 65 74 3d 75 73 2d 61  |in; charset=us-a|
000004f0  73 63 69 69 0d 0a 43 6f  6e 74 65 6e 74 2d 4c 61  |scii..Content-La|
00000500  6e 67 75 61 67 65 3a 20  65 6e 2d 55 53 0d 0a 43  |nguage: en-US..C|
00000510  6f 6e 74 65 6e 74 2d 54  72 61 6e 73 66 65 72 2d  |ontent-Transfer-|
00000520  45 6e 63 6f 64 69 6e 67  3a 20 37 62 69 74 0d 0a  |Encoding: 7bit..|
00000530  0d 0a 54 68 69 73 20 69  73 20 74 6f 20 63 68 65  |..This is to che|
00000540  63 6b 20 74 68 61 74 0d  0a 3e 46 72 6f 6d 20 64  |ck that..>From d|
00000550  6f 65 73 20 6e 6f 74 20  67 65 74 20 71 75 6f 74  |oes not get quot|
00000560  65 64 0d 0a 6f 72 20 64  6f 65 73 20 69 74 3f 0d  |ed..or does it?.|
00000570  0a 0a                                             |..|

I also noticed a naked \n and lots of spaces around X-Mozilla-Keys.

To me it looks like this bug, at least as described in comment 0 and in several duplicates, is no longer an issue. The fix seems to be the fact that text attachments are now base64 encoded. This solution was suggested in comment 7 but I'm pretty sure base64 encoding of text attachments was not done in response to this bug but, as a side-effect, fixed this bug.

There also does not seem to be a problem in general with body lines beginning with "From " in TB-generated mbox files: on composing, "blank stuffing" is used, so no text line begins with "From "; such a line becomes " From " and the leading blank is removed before the message is displayed. But this is not really the subject of this bug.

I tried to duplicate Alessandro's comment 35 description. I composed an email with a leading "From " in the body and saved it to an IMAP mbox storage. The "stuffed blank" was present before the "From" but displayed with no blank. On copy to a Local Folder, the message was not quoted with ">" and had no leading space. The Local Folder mbox file did contain the stuffed space before the From. But, again, this is not really the subject of this bug since it doesn't involve an attachment. Or does it? Maybe I'm reading the memory dumps wrong.

So for now, I'm marking this as INVALID.

Edited P/S: Text attachments by default are not displayed inline unless pref mail.inline_attachments.text is set true. So, even without base64 encoding, text attachments will not, by default, cause a problem when lines begin with "From ".
However, 3rd party mbox or eml files that don't have "blank stuffed" or ">" escaped "From " lines will start a new message at that point when placed in a Local Folder (again, not a concern of this bug).
PP/S: I found this bug report while browsing the Import/Export Tools code here:
https://github.com/thundernest/import-export-tools-ng/blob/master/src/chrome/content/mboximport/exportTools.js#L2001

Status: REOPENED → RESOLVED
Closed: 16 years ago → 4 years ago
Resolution: --- → INVALID