TB apprently uses mboxo format, which irrecoverably corrupts mail

RESOLVED DUPLICATE of bug 121947

Status

defect
--
major
RESOLVED DUPLICATE of bug 121947
7 years ago
6 years ago

People

(Reporter: calestyo, Unassigned)

Details

Attachments

(3 attachments)

(Reporter)

Description

7 years ago
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0 Iceweasel/16.0.2
Build ID: 20121026191203



Actual results:

Not sure whether this is known already or not,... in any case I think
it's quite critical..

I recently stumbled over several MUAs/tools (e.g. Evolution, getmail)
that have their problems with the mbox format, namely by corrupting
stored or imported mail in not quoting From_ lines correctly... or by
intentionally using the mboxo format, which inherently leads to this
corruption.


What happens in short: 
Attached are two versions of the same mail: the transcript as I sent it
via SMTP and what postfix stores in the mbox.

You will note that postfix replaces any lines matching (regexp):
"^From (.*)$"
with:
">From \1"

The problem is that it does not quote From_ lines that already start
with one or more ">", as the mboxrd format requires; i.e.:
"^>(>*)From (.*)$"
would need to be replaced by
">>\1From \2"

The effect is that mails cannot be unquoted anymore, and the
corruption added above is obviously irrecoverable.
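
A minimal sketch of the two quoting rules described above, using the same regular expressions. The function names are made up for illustration and are not taken from any mail client; the point is only that the mboxrd rule has an exact inverse, while the mboxo rule leaves already-quoted lines untouched.

```python
import re

def quote_mboxo(body: str) -> str:
    # mboxo: only lines starting with "From " get a ">" prepended.
    return re.sub(r"^From ", ">From ", body, flags=re.MULTILINE)

def quote_mboxrd(body: str) -> str:
    # mboxrd: lines starting with zero or more ">" followed by "From "
    # get exactly one additional ">".
    return re.sub(r"^(>*From )", r">\1", body, flags=re.MULTILINE)

def unquote_mboxrd(stored: str) -> str:
    # Inverse of quote_mboxrd: strip exactly one ">" from such lines.
    return re.sub(r"^>(>*From )", r"\1", stored, flags=re.MULTILINE)

body = "From foo\n>From bar\n>>From baz\nplain text\n"
stored = quote_mboxrd(body)
# The mboxrd round trip restores the body exactly.
assert unquote_mboxrd(stored) == body
```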


Expected results:

Please:
a) fix the issue, by switching to one of the other formats (I guess
mboxrd is ideal, as it's backwards compatible, but allows one to fully
unquote everything correctly) 

b) ideally, warn all users via the
release-notes/change-log/-announce-mailing-list about this, IMHO, most highly
severe corruption.

It may be helpful to tell them that any mails stored into mbox via
fetchmail, that contained lines matching "^From (.*)$" may be corrupted.
Further, the way to find the places of possible corruption is by
matching:
"^>From (.*)$" (note the single > which is needed here, as THESE are the
cases that may either be quoted From_ lines or just lines that were the
text ">From ...").
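
As a rough sketch, such a scan could look like the following. The helper name `suspect_lines` is hypothetical, and the code assumes the mbox content has already been read into a string; it only reports candidates and cannot tell which of them were really quoted From_ lines.

```python
import re
from typing import Iterator, Tuple

# Exactly one ">" followed by "From ": the ambiguous case in mboxo.
SUSPECT = re.compile(r"^>From ")

def suspect_lines(mbox_text: str) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for every line that may be corrupted."""
    for lineno, line in enumerate(mbox_text.splitlines(), start=1):
        if SUSPECT.match(line):
            yield lineno, line

sample = "From a@b Mon\n\n>From x\n>>From y\ntext\n"
for lineno, line in suspect_lines(sample):
    print(lineno, line)
```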

I guess neither is too difficult to do :) and as said,... when
switching to mboxrd (instead of mboxo)... no further problems should
occur... a bit more will be quoted, which allows the quoting to be
undone correctly.
(b) Is IMHO important, to at least notify people that they may have
suffered from this over years. Looking around it seems to turn out that
most people are not aware that mbox is actually a family of formats and
that especially mboxo has its issues.
(Reporter)

Updated

7 years ago
Severity: normal → blocker
OS: Linux → All
Hardware: x86_64 → All
(Reporter)

Comment 1

7 years ago
Posted file smtp-transcript
(Reporter)

Comment 2

7 years ago
May I add.. TB is even way more harmful here than e.g. Evolution, for several reasons:

a) AFAICS, TB _always_ uses mbox as its local format, while e.g. Evolution uses maildir since 3.x.


b) TB seems to corrupt the messages even then, when they are never written to disk.
Better said, when there should never be any reason to write them to disk as mbox.
I set up two IMAP accounts, the first from which the attached test-message is read, the 2nd to which it is immediately moved via TB's mail filters.
The 2nd IMAP server uses maildir.
But the mail appears already corrupted there.
(Reporter)

Updated

7 years ago
Attachment #678185 - Attachment description: this is annoying → smtp-transcript

Comment 3

7 years ago
I can't follow you. Is TB itself corrupting the messages? If yes, how do they render after being corrupted? If no, does some interaction with Evolution/other client cause the breakage?
Component: Untriaged → Database
Product: Thunderbird → MailNews Core
(In reply to Christoph Anton Mitterer from comment #2)
> a) AFAICS, TB _always_ uses mbox as it's local format, (snip)

As you see in the Tb local mbox file you attached, and as you say, a line that starts with "From " is escaped by adding ">" because Tb uses the Unix Mbox format.

> Better said, when there should never be any reason to write them to disk as mbox.

It's the current design/implementation, which is inherited from Netscape Messenger.
Please wait for completion of "Pluggable mail stores" (Bug 402392 is the first step) and further enhancements in Tb's mail store format support.

> b) TB seems to corrupt the messages even then, when they are never written to disk.

What do you mean by "corrupt"?
Because of the "Unix Mbox format", escaping of "From " lines in held mail data is mandatory.

Please note that fetched/downloaded mail data is always written to a disk file, except when the Memory Cache is used for very small mail.
- Local mail folder(POP3/Local Folders)
    Unix Mbox format is used, so escaping of "From " is executed by Tb.
- IMAP, offline-use=Off folder, saved in Memory Cache :
    Not held in Disk. Not Unix Mbox format, so no escaping.
- IMAP, offline-use=Off folder, saved in Disk Cache :
    Held in local Disk Cache file on Disk. Not Unix Mbox format, so no escaping.
- IMAP, offline-use=On folder, saved in offline-store file.
    Held in local offline-store file on Disk.
    Unix Mbox format is used too, but escaping of "From " is not done by Tb,
    because Tb won't re-parse mail data stream once saved in offline-store file.

> I set up two IMAP accounts, the first from which the attached test-message
> is read, the 2nd to which it is immediately moved via TB's mail filters.
> The 2nd IMAP server uses maildir.
> But the mail appears already corrupted there.

Can you attach the following?
- NSPR log
  - SMTP log for initial mail send to first IMAP account,
  - IMAP log for fetch from first IMAP account,
  - IMAP log for filter move from first IMAP to second IMAP
    and fetch from second IMAP server.
- Sent mail copy in IMAP Sent folder. Save as .eml and attach it, please.
  Note: As you know, "From " is escaped in local mbox file,
        so IMAP Sent is needed.
For getting log, see bug 402793 comment #28.
  NSPR_LOG_MODULES=timestamp,smtp:5,imap:5,MsgCopyService:5

As for what you call "corruption":
Do you mean any of the following?
(a) When viewing mail in a local mail folder (POP3/Local Folders),
    the ">" added by Tb for escaping due to Unix Mbox is not removed.
(b) Escaping of the "From " line is inconsistent in Tb's mail composition.
    In composition, a space is used in some cases but ">" is used in others,
    and a typed "From " line is not escaped in some cases if format=flowed is used.
    The escaping character may not be removed by Tb upon mail send.
(c) When copying mail from a local mail folder to an IMAP folder,
    the escaping character is not removed correctly by Tb in some cases (or always).
Summary: TB apprently uses mboxo format with local(8), which irrecoverably corrupts mail → TB apprently uses mbox format with local(8), which irrecoverably corrupts mail
(Reporter)

Comment 5

7 years ago
I corrected the bug summary back:
a) The "local(8)" part was stupid and came from me copy&pasting from a similar bug in postfix

b) the trailing o in "mboxo" is intentional!
"mbox" is not a real format, but rather a family of sub-formats, namely mboxo, mboxrd, mboxcl and mboxcl2
Summary: TB apprently uses mbox format with local(8), which irrecoverably corrupts mail → TB apprently uses mboxo format, which irrecoverably corrupts mail
(Reporter)

Comment 6

7 years ago
(In reply to :aceman from comment #3)
> Is TB itself corrupting the messages? If no, does some interaction with
> Evolution/other client cause the breakage?

Yes, it's TB itself,... the others were just examples of clients that suffer(ed) from this problem, too.


> If yes, how do
> they render after being corrupted?
More in my reply to "WADA"'s comment.
(Reporter)

Comment 7

7 years ago
Okay let me see... whether I can clarify things even more =)


(In reply to WADA from comment #4)
> As you see in Tb's local mbox file you attached, and as you say, line starts
> with "From " is escaped by adding ">" because Tb uses Unix Mbox format.
Yes, the problem (again see my example file) is, that it does NOT quote lines that are already quoted.
For example:
While
"From foo" gets correctly quoted to ">From foo"
a line that is already quoted like:
">From foo" or ">>>From foo" does not get quoted to ">>From foo" respectively ">>>>From foo".

Therefore it's quite obvious that the quoting cannot be undone, which is why one gets irrecoverable corruption.
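
To make the ambiguity concrete, here is a small sketch (the function name is illustrative only). Two different original lines end up as the same stored line under the mboxo rule, so no reader can tell afterwards which one was meant.

```python
import re

def quote_mboxo(body: str) -> str:
    # mboxo: quote only lines that start with "From ".
    return re.sub(r"^From ", ">From ", body, flags=re.MULTILINE)

original_a = "From foo\n"    # unquoted line in the message body
original_b = ">From foo\n"   # line that already carried a ">" in the body

stored_a = quote_mboxo(original_a)
stored_b = quote_mboxo(original_b)

# Both originals collapse to the same stored text.
assert stored_a == stored_b == ">From foo\n"
```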


> It's current design/implementation which is inherited from Netscape
> Messenger.
> Please wait for completion of "Plugable mail stores"(Bug 402392 is first
> step) and further enhancements in Tb's mail store format support.
I guessed that this had historical problems.
In the end this just has some "side-effects" on the actual corruption issue, namely that everything is affected by the corruption I've described, while other MUAs (my example was Evolution) that do not store all their mail intermediately in mbox are at least safe when e.g. directly moving mail from (IMAP or maildir) to (IMAP or maildir)... assuming of course that the IMAP server uses a storage format that doesn't corrupt mail (i.e. not mboxo; note the trailing o).


> What do you call by "corrupt"?
> Because of "Unix Mbox format", escaping of "From " line in held mail data is
> mandatory.
Yes, one needs to somehow "quote" "From " lines in order to avoid "phantom messages" when a message body contains a "From " line.
But as I described above, the corruption comes from not quoting already quoted "From " lines, which may also easily appear in any message body. See above.
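
The "phantom message" effect can be sketched with a deliberately simplified splitter. This is an assumption-laden toy: `split_mbox` is a made-up helper that treats every line starting with "From " as a separator, whereas real readers may apply stricter checks.

```python
from typing import List

def split_mbox(mbox_text: str) -> List[str]:
    """Naively split mbox text at every line that starts with 'From '."""
    messages: List[str] = []
    current: List[str] = []
    for line in mbox_text.splitlines(keepends=True):
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages

unquoted = (
    "From alice@example.org Thu Nov  1 12:00:00 2012\n"
    "Subject: test\n"
    "\n"
    "Hello,\n"
    "From now on the body is misparsed.\n"
)
quoted = unquoted.replace("\nFrom now", "\n>From now")

assert len(split_mbox(unquoted)) == 2  # body line became a phantom message
assert len(split_mbox(quoted)) == 1    # quoting keeps the message in one piece
```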


> - Local mail folder(POP3/Local Folders)
>     Unix Mbox format is used, so escaping of "From " is executed by Tb.
=> corruption appears.

> - IMAP, offline-use=Off folder, saved in Memory Cache :
>     Not held in Disk. Not Unix Mbox format, so no escaping.
=> then I'd guess (though I haven't verified)... no corruption

> - IMAP, offline-use=Off folder, saved in Disk Cache :
>     Held in local Disk Cache file on Disk. Not Unix Mbox format, so no
> escaping.
What format are you then using?

> - IMAP, offline-use=On folder, saved in offline-store file.
>     Held in local offline-store file on Disk.
>     Unix Mbox format is used too, but escaping of "From " is not done by Tb,
Uhm? Who else would do the quoting then? It can't be the IMAP server, because it doesn't know anything about "your" local storage format.


> Can you attach followings?
Before I do (which is quite some effort, as only one of the two IMAP servers is under my control).
1) I described something incorrectly:
Actually I did not move from IMAP to IMAP, but from POP3 to IMAP. (sorry)
Not sure whether this makes a difference for TB.

So are you sure all this is really needed? As far as I understand, the corruption appears here because TB stores the mail from the first server (actually a POP3) locally as mbox, before transferring it to the 2nd IMAP server.

I can check this evening, whether it really happens when moving from IMAP to IMAP.


> For "corruption" you call.
> Followings?
> (a) When mail view of mail in local mail folder(POP3/Local Folders"),
>     "> added by Tb for escaping due to Unix Mbox" is not removed.
- first... this should be independent of POP3 or IMAP (speaking in a transport sense), as both don't need to quote "From " lines, right?!
(of course, the POP3 / IMAP servers somehow store their mail on their side.... and if mbox is used _there_ quoting has to be done correctly).

- second... with viewing I'd expect you mean "see the mail in the GUI"...
If the quotes (">") are not removed there... then this is IMHO also a bug... (because you show the user wrong content)... but it's not what I refer to in this issue,... this issue is really about only quoting parts when storing mail (and I guess when importing mboxes the problem happens, too)... and thereby irrecoverably corrupting the mail.
Right now, you wouldn't be able (even if you'd want to) to correctly unquote the mails (which IS the very problem).


> (b) Escaping of "From " line is inconsistent in mail composition of Tb.
>     In composition, space is used in some cases but ">" is used in some
> cases,
>     and typed "From "line is not escaped in some cases if format=flowed is
> used.
>     Escaping character may not be removed by Tb upon mail send.
I'm not sure what you mean here.
When composing a mail, then per se no "From " line quoting/unquoting should be needed... this always comes only when one stores to / reads from mbox files.

What however some MUAs actually do is "quoting" "From " lines at the mail level,... what I mean is, they quote it as quoted-printable.
e.g. "From " needs to become "=46rom " and of course already quoted lines like ">From " need to become e.g. "=3EFrom ".
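
A hedged sketch of that sender-side workaround: `qp_protect_from` is a hypothetical helper, and it assumes the body is otherwise already valid quoted-printable and is sent with Content-Transfer-Encoding: quoted-printable. Decoding with Python's standard `quopri` module gives back the original lines.

```python
import quopri
import re

def qp_protect_from(body: bytes) -> bytes:
    # Encode the leading "F" of "From " lines as "=46".
    body = re.sub(rb"^From ", b"=46rom ", body, flags=re.MULTILINE)
    # Encode the first ">" of already-quoted "From " lines as "=3E".
    return re.sub(rb"^>(>*From )", rb"=3E\1", body, flags=re.MULTILINE)

protected = qp_protect_from(b"From foo\n>From bar\n")
assert protected == b"=46rom foo\n=3EFrom bar\n"

# No line starts with "From " or ">From " any more, so an mbox writer
# has nothing to quote, and QP decoding restores the original text.
assert quopri.decodestring(protected) == b"From foo\n>From bar\n"
```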

But this doesn't really solve the problem... it just helps broken clients/tools/servers, which receive mail composed on such a quoting MUA, to not corrupt such messages when storing them in mbox.


> (c) When mail copy from local mail folder to IMAP folder,
>     escaping character is not removed correctly by Tb in some cases(or
> always).
Would actually be a bug too... but same as in (a): this is just a follow-up problem... right now it's even worse... even if we wanted to correctly unquote "From " lines when sending from a local folder to an IMAP folder... we couldn't, as "From " lines are only quoted on the first level.

HTH,
Chris.
(Reporter)

Updated

7 years ago
Attachment #678184 - Attachment description: thunderbird.mbox → thunderbird.mbox (as actually written by TB)
(Reporter)

Updated

7 years ago
Attachment #678184 - Attachment mime type: application/octet-stream → text/plain
(Reporter)

Comment 9

7 years ago
I've added a version of the mbox file, as TB _should_ have generated it.
diff the two, to see what I'm talking about... and yes the difference seems little, but actually this is the irrecoverable corruption... which is btw. not only a "small" change of message content... it would e.g. also break any (crypto) signatures.
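
The signature point can be illustrated with a plain hash as a stand-in. This is only a sketch: real S/MIME or PGP verification works on canonicalized content with actual keys, and the reader behaviour assumed here (strip one ">" from every ">From " line) is one common mboxo convention, not necessarily what TB does.

```python
import hashlib
import re

def mboxo_round_trip(body: str) -> str:
    # Write with the mboxo rule ...
    stored = re.sub(r"^From ", ">From ", body, flags=re.MULTILINE)
    # ... and read back by stripping one ">" from every ">From " line.
    return re.sub(r"^>From ", "From ", stored, flags=re.MULTILINE)

def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

signed_body = "See the quoted line:\n>From the old thread\n"
restored = mboxo_round_trip(signed_body)

# The body that comes back differs from what was signed,
# so any digest computed over it no longer matches.
assert restored != signed_body
assert digest(restored) != digest(signed_body)
```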

One further note regarding the SMTP transcript:
Of course, in the actual SMTP session, the lines were CRLF terminated and not just LF, as in the attached file.

Comment 10

7 years ago
This is hardly a "blocker" though (which implies preventing MailNews development) and I don't even see a dataloss condition that would justify "critical" status, unless messages are somehow split or combined as an artifact of the issue.
Severity: blocker → major
This is a duplicate of either bug 121947, if you would rather that Thunderbird unescape the 'From' in mbox files, or bug 58308, if you would rather have Thunderbird implement maildir.
(Reporter)

Comment 12

7 years ago
(In reply to rsx11m from comment #10)
> This is hardly a "blocker" though (which implies preventing MailNews
> development) and I don't even see a dataloss condition that would justify
> "critical" status, unless messages are somehow split or combined as an
> artifact of the issue.

Well given that it is impossible to reconstruct the original message even with manual intervention... and given that TB is a MUA whose core business is mail... I'd say that such corruption of this data deserves the highest possible severity.

It's as if a DBMS like Postgresql would arbitrarily change data stored in it,...
(Reporter)

Comment 13

7 years ago
(In reply to Joshua Cranmer [:jcranmer] from comment #11)
> This is a duplicate of either bug 121947, if you would rather that
> Thunderbird unescape the 'From' in mbox files,
Please read my comments from above!

I don't think it's covered by #121947. Rather #121947 is depending on this issue.
With the currently used mbox format it's simply not possible to correctly unquote mails, for whichever use (displaying, copying to IMAP, etc.)


> or bug 58308, if you would
> rather have Thunderbird implement maildir.
I don't see why it should be a duplicate of this... right maildir doesn't suffer from this problem as it doesn't need separating "From " lines, but there's a simple solution to make mbox non-corruptive, namely mboxrd.
(Reporter)

Comment 14

7 years ago
btw: These bugs you've mentioned (and the long line of duplicates over the years) are really disturbing.

(After a short glance) it seems that most simply asked that you should unquote ">From " lines when displaying mail, without noting that it's not possible to do this correctly as the quoting is incorrectly done in the first place.

But it's really disturbing that no developer ever noted that more goes on here and that by that, millions of mails of TB users were silently and irrecoverably corrupted over the years.


Arguably, most other projects (getmail being the only notable exception so far) where I reported this problem behaved rather similarly, trying to deny severity, hide the issue behind other bugs and refusing to publicly warn their users on what has happened over the years.

I mean mistakes can happen, even critical ones, but as far as it applies, I considered this behaviour from those projects really outrageous.

Mail is the core data of MUAs and mail tools,... except perhaps for cryptography problems in mail signing/encrypting, there can't be anything more severe than corruption of that core data, especially if this is not even manually reversible.
(In reply to Christoph Anton Mitterer from comment #12)
> (In reply to rsx11m from comment #10)
> > This is hardly a "blocker" though (which implies preventing MailNews
> > development) and I don't even see a dataloss condition that would justify
> > "critical" status, unless messages are somehow split or combined as an
> > artifact of the issue.
> 
> Well given that it is impossible to reconstruct the original message even
> with manual intervention... and given that TB is a MUA whose core business
> is mail... I'd say that such corruption of this data deserves the highest
> possible severity.

Funny. When trying to search for a duplicate, I found an astounding lack of high-volume complaint bugs. That suggests that changing From to >From is relatively unnoticed by most people--at most, it's a cosmetic change. Note that searching for "From" will still find the right emails, etc. The only case where it's an important issue is where octet-for-octet equivalence is strictly necessary for correctness purposes (i.e., S/MIME and PGP signatures).

(In reply to Christoph Anton Mitterer from comment #13)
> I don't think it's covered by #121947. Rather #121947 is depending on this issue.
> With the currently used mbox format it's simply not possible to correctly unquote mails,
> for whichever use (displaying, copying to IMAP, etc.)

If you morphed bug 121947 into "treat the mbox as mboxrd", I don't think anyone would complain, since it's the only reasonable way to fix that bug in a mbox world (read comment 0).

> > or bug 58308, if you would
> > rather have Thunderbird implement maildir.
> I don't see why it should be a duplicate of this... right maildir doesn't
> suffer from this problem as it doesn't need separating "From " lines, but
> there's a simple solution to make mbox non-corruptive, namely mboxrd.

How then do you propose to distinguish existing mboxo implementations from mboxrd files?

We plan to implement maildir anyways in the long run, which would make this issue relatively moot.

(In reply to Christoph Anton Mitterer from comment #14)
> But it's really disturbing that no developer ever noted that more goes on
> here and that by that, millions of mails of TB users were silently and
> irrecoverably corrupted over the years.

As far as corruption goes, this is relatively minor: it is often little more than a cosmetic nuisance. True corruption is complete loss of email contents, translation into total incomprehensible gibberish, etc.

> Arguably, most other projects (getmail being the only notable exception so
> far) where I reported this problem behaved rather similarly, trying to deny
> severity, hide the issue behind other bugs and refusing to publicly warn
> their users on what has happened over the years.

I won't deny that this is an issue. If you want to fix it, be my guest. I will point out, though, that the effort that would go into modifying all the places where we read or write mbox files to use a slightly different variant would be much better spent replacing mbox with a file-per-message system that also fixes several other major issues (like eliminating the need for mailbox compaction, probably the biggest source of true dangerous dataloss bugs).
Status: UNCONFIRMED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 121947
(Reporter)

Comment 16

7 years ago
Yeah well it's quite outrageous to see this bug hidden behind an only partially connected issue, even one of low importance.
I personally don't use TB and when I see how severe bugs are handled I can just say: thank god.

Nevertheless, I reported this issue out of courtesy towards TB users.



Given that you ignore my arguments and come up with excuses like "it's just cosmetic", which is trivially proven wrong, I don't see much use in my commenting on further technical issues here.


(In reply to Joshua Cranmer [:jcranmer] from comment #15)
> relatively unnoticed by most people--at most, it's a cosmetic change. Note
> that searching for "From" will still find the right emails, etc. The only
> case where it's an important issue is where octet-for-octet equivalence is
> strictly necessary for correctness purposes (i.e., S/MIME and PGP
> signatures).
It's not just this,.. what if you send code that should later be reused somewhere...

Anyway.. the way you argue is as if glibc found an error in pow()... and it would be fine if it's just little and no one notices...
Really,... I can't believe it...


> If you morphed bug 121947 into "treat the mbox as mboxrd", I don't think
> anyone would complain, since it's the only reasonable way to fix that bug in
> a mbox world (read comment 0).

Well it's not the only way (there's mboxcl/mboxcl2), but arguably mboxrd is probably the better choice for most systems.

Feel free of course to rewrite the bug you've mentioned.
I think my issue here described the problem in detail with solutions... so I can't quite see why I should do the same under another bug number.

And I still think that the bug (+ its duplicates) you've mentioned does not quite fit, as they're just about unquoting on showing messages.



> How then do you propose to distinguish existing mboxo implementations from
> mboxrd files?
There's no way to do so.
You can't distinguish between mboxo/mboxrd, nor between mboxcl/mboxcl2.
Even distinguishing between those two classes is rather heuristic...

But I don't think distinguishing is needed at all:
- The old mboxo messages are corrupted anyway... and there's no way to recover them other than by guessing.
- If TB now switched to mboxrd, it would of course still be impossible to regenerate the real message content from something that was previously stored as mboxo.
But we wouldn't lose anything.

Further, mboxrd (and this is the good thing about it) is compatible with all tools... those expecting mboxo, mboxrd and even those expecting mboxcl/2.

Of course, only when a program actually expects mboxrd, the original mail will be restored... in any other case (not necessarily with mboxcl (v1)) you may just end up with wrong quoting/unquoting.
But right now, with mboxo, you'd do so in _any_ case.
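
A small sketch of that compatibility claim, with made-up function names. It assumes an "mboxo-style reader" that strips one ">" only from lines starting with exactly ">From "; readers that do not unquote at all would behave differently.

```python
import re

def write_mboxrd(body: str) -> str:
    # Add one ">" to every line matching ^>*From .
    return re.sub(r"^(>*From )", r">\1", body, flags=re.MULTILINE)

def read_as_mboxrd(stored: str) -> str:
    # Strip one ">" from every line matching ^>+From .
    return re.sub(r"^>(>*From )", r"\1", stored, flags=re.MULTILINE)

def read_as_mboxo(stored: str) -> str:
    # Assumed mboxo-style reader: only ">From " lines are unquoted.
    return re.sub(r"^>From ", "From ", stored, flags=re.MULTILINE)

body = "From foo\n>From bar\n"
stored = write_mboxrd(body)

# An mboxrd-aware reader restores the original exactly.
assert read_as_mboxrd(stored) == body
# An mboxo-style reader still parses the file, but leaves the extra ">".
assert read_as_mboxo(stored) == "From foo\n>>From bar\n"
```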


> We plan to implement maildir anyways in the long run, which would make this
> issue relatively moot.
This is what Evolution does and I think it's not necessarily a good idea.
Of course maildir has its advantages (no need to change the mail, not even for status info... and no locking issues), but it also has disadvantages:
- some waste of storage
- far slower full text search
- not possible in all setups due to e.g. filesystem limitations.

So it's great if you'd support maildir... but only as a further option.


Anyway... the switch to mboxrd should be technically really simple.
So I can't quite understand what keeps you from doing it.

> As far as corruption goes, this is relatively minor: it is often little more
> than a cosmetic nuisance. True corruption is complete loss of email
> contents, translation into total incomprehensible gibberish, etc.
I disagree..
Even small changes of data can be important. Just because you personally think you can live with it in the circumstances you can imagine doesn't mean that everyone else can. Take my pow(3) example from above.

In a sense complete loss would be even better, because then people would at least notice... right now it's silent data corruption.


> I won't deny that this is an issue. If you want to fix it, be my guest. I
> will point out, though, that the effort that would go into modifying all the
> places where we read or write mbox files to use a slightly different variant
> would be much better spent replacing mbox with a file-per-message system
> that also fixes several other major issues (like eliminating the need for
> mailbox compaction, probably the biggest source of true dangerous dataloss
> bugs).

Isn't it simply grepping for all occurrences of the "From " string... and at each place where quoting occurs, extending that to also quote such "From " lines which already have one or more ">".
(And the same in the other direction on unquoting)

?

Comment 17

7 years ago
(In reply to Christoph Anton Mitterer from comment #12)
> (In reply to rsx11m from comment #10)
> > This is hardly a "blocker" though (which implies preventing MailNews
> > development) and I don't even see a dataloss condition that would justify
> > "critical" status, unless messages are somehow split or combined as an
> > artifact of the issue.
> 
> Well given that it is impossible to reconstruct the original message even
> with manual intervention... and given that TB is a MUA whose core business
> is mail... I'd say that such corruption of this data deserves the highest
> possible severity.

As I've tried to explain, there are reasonably well defined criteria for the "blocker" and "critical" severity flags. The most severe issues are those where, e.g., compilation becomes impossible; that's a "blocker". Then, crashes of the application or loss of messages or entire folders qualify for the "critical" severity. The next highest level is "major" for bugs not qualifying for the two top levels, and that is where I've moved your report, which is "the highest possible severity" in this case.

(In reply to Christoph Anton Mitterer from comment #16)
> Feel free of course to rewrite the bug you've mentioned.
> I think my issue here described the problem in detail with solutions... so I
> can't quite see why I should do the same under another bug number.

It is desirable to combine multiple bugs on the same issue into single ones to avoid that the discussion is spread across various reports. This bug and bug 121947 are not necessarily identical but related in that they try to resolve issues with the "From" quoting/escaping inherent to the mbox* format. It's not uncommon to combine related bugs into a single one as well if the solution appears to be the same in both cases.

Alternatively, this bug here could be reopened and a formal dependency established one way or the other. But, it's likely that fixing one bug would also solve the other in the process, thus duping the bug here to the older one with the suggestion to expand its scope sure is a viable approach.
(Reporter)

Comment 18

7 years ago
Hey...

(In reply to rsx11m from comment #17)
> As I've tried to explain, there are reasonably well defined criteria for the
> "blocker" and "critical" severity flags. The most severe issues are those
> where, e.g., compilation becomes impossible, that's a "blocker". Then,
> crashes of the application or loss of messages or entire folders qualify for
> the "critical" severity.
Well I guess one can always argue about severity levels and what level stands for what.
Many projects make up their own definitions for this...
Now if blocker is the highest possible one,... and it would be compilation errors.. that would sound quite strange for me... you see these errors immediately and can simply go back to the most recent working versions.

If critical is what you intend for losing messages, well, then it's just this, isn't it?
While you don't lose full messages, you lose message integrity... and it's neither up to me nor up to TB to decide whether a user wants to live with "cosmetic" corruptions or not.

Anyway...


> It is desirable to combine multiple bugs on the same issue into single ones
> to avoid that the discussion is spread across various reports. This bug and
> bug 121947 are not necessarily identical but related in that they try to
> resolve issues with the "From" quoting/escaping inherent to the mbox*
> format. It's not uncommon to combine related bugs into a single one as well
> if the solution appears to be the same in both cases.
In principle you're right,.. but what likely happens now is the following:
The previous bugs all underestimated the real severity of this problem, which is shown by this issue probably existing since... ever?
At least the first "related" bug is 10 years ago... so in principle 10 years of time that TB could have known about this corruption and 10 years of time it could have saved its users from it.

I know from experience that people tend to only look at the "first" bug of such a series of "duplicates" and most other information gets lost.

I had similar discussions with the really stubborn (not to say stupid) Evolution upstream, where more or less the same happened... marking the bug as a duplicate... and the "first" report has been even marked as notabug.


> Alternatively, this bug here could be reopened and a formal dependency
> established one way or the other.
I think that would be better.

> But, it's likely that fixing one bug would
> also solve the other in the process
Probably,... well that depends largely how TB internally processes this.

IMHO the first important point is that all exporting/moving/copying/importing from/to mbox is fixed (i.e. moved to mboxrd).

Cause this is something which people will never be able to correct.

If the "viewing" thingy is the same code... fine... if not, that would have IMHO less priority.


In the end I guess all technical information has been said, right?
- mboxo => irrecoverable corruption
- mboxrd => largely compatible, no corruption, at least if the clients/tools interpret mboxrd.

Further I've pointed out that:
- it's none of our business to decide whether users want to live with even just "cosmetic" corruptions
- and that we should broadly/publicly warn users (even when the issue should have been fixed) on what has happened over the past years with their mail

Comment 19

7 years ago
(In reply to Christoph Anton Mitterer from comment #18)
> Well I guess one can always argue about severity levels and what level
> stands for what.

While there are subjective differences in severity of an issue, those two levels are fairly well defined in the context of this bug-tracking system and should be reserved for those purposes.

> Now if blocker is the highest possible one,... and it would be compilation
> errors.. that would sound quite strange for me... you see these errors
> immediately and can simply go back to the most recent working versions.

You are underestimating the complexity of the build system. A change in Firefox may break Thunderbird and other Mozilla applications as they share code (and has done so in various instances in the past). Such bugs obviously *do* have a blocker severity and need to be addressed immediately. 

> I know from experience that people tend to only look at the "first" bug of
> such a series of "duplicates" and most other information gets lost.

Which is why I've added a comment with a 3-line summary of the discussion here so that people can read about the differences among mbox formats and where the root of the problem is. I'm not willing to drive that further.

Comment 20

7 years ago
Hi Christoph Anton Mitterer,

You may want to look at Bug 121947 comment 17.
There you may notice that it's not only you who sees the severity of the bug. Others have expressed similar thoughts and similar suggestions as well.

The only thing you seem to misunderstand (like I did at the time I submitted that comment) is that:
The "trivial fix" as you see it is absolutely impossible the way you put it. The legacy code base is very poorly architected in the first place. You cannot simply enumerate all places writing/reading the mbox data. This was never implemented like a "class" or a "lib"; rather, it was initially implemented "from scratch" in every single place (slightly exaggerating, just to clarify the point). The _only_ possible approach is to isolate the storage code in a dedicated layer, to enable further processing. And TB devs do realize this, and do address this in their Pluggable Store work: Bug 402392 is already "fixed", and it is now possible to start implementing other stores (like a clone of maildir that David had implemented himself, or mboxrd that you could try to implement as an interested volunteer). By the way, you can already switch to maildir as a temporary workaround (though it is still experimental; you may find details in Bug 58308 comment 174 and 176).
But the pluggable store work is not finished yet, and as a major change it is likely to bring a lot more follow-up work with it.

So the summary is:
it is not true that no one sees the problem (or its severity);
it is not enough to declare the "right way" of fixing the bug;
it is not wise to judge which way of handling bug reports is better for a project based on your point of view until you become a developer of this project; if you aren't going to do some coding, then your effort in filing the bug is really important and appreciated, but please let the devs handle it as is best for them.
(Reporter)

Comment 21

7 years ago
I'll reply to the most recent comments tomorrow... but in the meantime:

I made some tests with IMAP... and there things look a bit better:

When viewing message source, saving mails, or filtering mails to other IMAP servers... from an IMAP folder.... no corruption seems to occur.

Apparently mail is then _not_ stored in an mboxo format (and thereby corrupted) for temporary purposes... can anyone confirm this?

BUT:
When I check that "store mails offline" thingy (was disabled for the tests above)... things get a bit weird:
Looking at the ImapMail folder.. I see the test message in an mbox like file... _not quoted at all_.
Meaning, not even just the normal "From " lines are quoted as in mboxo.

Now I don't know how far the files in ImapMail are used, but when a user comes along and takes them as normal mbox<whatever> files... he will get an even more severe form of corruption (phantom messages).
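
For illustration, a minimal Python sketch of how such phantom messages arise. It assumes a deliberately naive reader that treats every line starting with "From " as a message separator (I don't know how TB itself parses these files); the sample data and the function name are made up.

```python
def count_messages(mbox_text: str) -> int:
    """Count messages the way a simple mbox reader would: every line
    starting with "From " is taken as the start of a new message."""
    return sum(1 for line in mbox_text.splitlines() if line.startswith("From "))


# One real message whose body contains an unquoted "From " line,
# as seen in the ImapMail cache files.
stored = (
    "From - Mon Nov 05 12:00:00 2012\n"
    "Subject: test\n"
    "\n"
    "Hello,\n"
    "From now on this line looks like a separator.\n"
    "\n"
)

print(count_messages(stored))  # a naive reader sees 2 messages, not 1
```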

Actually I'd have expected to see such phantom messages in TB itself, when I start it offline, expecting that it would use those mbox files then.
But I didn't... no idea how you determine message start/end there.

I tried further by using the exact "From - ..." lines as TB generates them... no success in seeing my phantom message... then I deleted all the .msf files... and then _no_ offline message appears at all.


Actually I found another mbox problem... (have only checked it in the offline IMAP case... in the files in ImapMail)...
Every mbox<*> format asks that an empty line is put after each message (i.e. before each "From " line except the first).
It seems TB hasn't added this newline.

In principle... mbox parsers SHOULD NOT rely on that newline (speaking in the RFC meaning of SHOULD/etc.)... but that's only a SHOULD NOT... not a MUST NOT, so some parsers may well depend on it.
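
A quick way to check a file for this is sketched below in Python. The function name is made up, and the sketch assumes the file is otherwise properly quoted, so that every line starting with "From " really is a separator.

```python
def separators_missing_blank_line(mbox_text: str) -> list[int]:
    """Return 1-based line numbers of "From " lines (other than the first
    line of the file) that are not preceded by an empty line."""
    lines = mbox_text.splitlines()
    return [
        i + 1
        for i in range(1, len(lines))
        if lines[i].startswith("From ") and lines[i - 1] != ""
    ]


sample = (
    "From - Mon Nov 05 12:00:00 2012\n"
    "Subject: first\n"
    "\n"
    "body of first message\n"
    "From - Mon Nov 05 12:01:00 2012\n"  # no empty line before this separator
    "Subject: second\n"
    "\n"
    "body of second message\n"
)

print(separators_missing_blank_line(sample))
```

On a file with unquoted body lines (like the ImapMail case above) this would also flag those lines, so the result is only a list of candidates to look at.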

I'll recheck that tomorrow.
(Reporter)

Comment 22

7 years ago
Oh btw,.. a personal question:

Can any of the TB-mbox gurus tell me (for sure) whether TB ever (in any version) used the mboxcl or mboxcl2 formats (i.e. those that use a Content-Length: mail header to determine the next message's start)?


Thanks,
Chris.
(Reporter)

Comment 23

7 years ago
(In reply to rsx11m from comment #19)
> You are underestimating the complexity of the build system. A change in
> Firefox may break Thunderbird and other Mozilla applications as they share
> code (and has done so in various instances in the past). Such bugs obviously
> *do* have a blocker severity and need to be addressed immediately. 
I guess in the end we can argue this back and forth forever,... and I'm pretty sure that you won't be able to convince me that bugs like what you describe (i.e. build problems affecting other projects or so) count higher than the user's security and/or data integrity. ;-)


(In reply to rsx11m from comment #19)
> Which is why I've added a comment with a 3-line summary of the discussion
> here so that people can read about the differences among mbox formats and
> where the root of the problem is. I'm not willing to drive that further.
(see my upcoming comment on Mike Kaganski's comment)
(Reporter)

Comment 24

7 years ago
(In reply to Mike Kaganski from comment #20)
> You may want to look at Bug 121947 comment 17.
> There you may notice that it's not only you who sees the severity of the bug.
> Others have expressed similar thoughts and similar suggestions as well.
Well this more or less shows exactly what I was talking about... I haven't read that comment in detail, but assuming you're right, this means the real problem is now known for at least nearly two years... right?
And still... nothing has really happened, has it?


> The only thing you seem to misunderstand (like I did in time when I
> submitted that comment) is that:
Well, with all due respect... and this is not about offending you or other developers...

But if such an issue is apparently known for such a long time without anything happening... then there's really something wrong in the project philosophy.

It shows that I'm not so wrong with what I said, that such an issue is hidden behind other things.


Or, even worse, it would prove that there is quite some arrogance ruling, in that TB decides which kinds of corruption are "not severe" and that users should have to live with them...


> The "trivial fix" as you see it is absolutely impossible the way you put it.
> The legacy of the old code base is very poorly architectured in the first
> place.
... even if that's the case... then IMHO this doesn't count as an excuse.
Because TB would still have had the chance to at least broadly warn users what's going on, right? Of course it wouldn't have been good publicity if one tells users: "Hey, TB currently corrupts your mail, but we think you can live with it".

As I found out yesterday... things seem to be safe with IMAP... so at least TB could have warned its users... "use IMAP, if you want your messages not to be irrecoverably modified".


> The _only_ possible approach is to
> isolate the storage code to a dedicated layer, to enable further processing.
> And TB devs do realize this, and do address this in their Pluggable Store
Fine... I have nothing against such structural improvements... but still I think, they don't justify in any way, that users are exposed to data corruption.


> So the summary is:
> it is not true that no one sees the problem (or its severity);
Ok.. fine.. but then people should really ask themselves whether this doesn't mean that things are even worse.


Don't get me wrong,.. I know this is open source and many programmers do this just in their spare time.
I don't know what the current status between Mozilla<->TB is... but at least for some time it was officially supported... and Mozilla "earns" quite some money and tries to convince people to trust them and use their products, because they're better... so from that point of view, I think there is some responsibility to quickly fix and publish such an issue.

Even when leaving out Mozilla as a big funding foundation and just talking about individuals working on TB in their spare time, I think there is a responsibility to broadly and publicly announce such an issue.


Take the Debian-OpenSSL debacle... they could have also decided to keep this more or less secret... but it affected so many users that it was simply necessary to spread the news basically everywhere.


Just my 2 cents.
(Reporter)

Comment 25

7 years ago
Updating my comment 21:

I checked these things again and the following comes out:

when _reading_ mail from POP3:
- any viewing/saving of the message is also corrupted
- if mail is moved to some other place (other local folders, other IMAP servers) even when done "immediately" with filters... it stays corrupted.

- mails are written as mboxo (corrupted)
- the empty line between a mail and the next From_ line is generated




when _reading_ mail from IMAP:
- any viewing/saving of the message is _not_ corrupted
- if mail is moved to some other place (other IMAP servers - but of course NOT other local folders) even when _not_ done "immediately" with filters... it does _not_ get corrupted.

- the mails are cached(?) in some weird mbox-like format in ./ImapMail/.. which is however not compatible with any other mbox (not even mboxo) format.
- no quoting of From_ lines happens at all, thus, if this mbox-like file were used by other programs, one would get severe corruptions and phantom messages.
- the empty line between a mail and the next From_ line is not generated
- even when offline, when I'd expect these cached-mbox-like-files are used, any viewing/saving of the message is _not_ corrupted (don't know how you parse them)



I have not checked anything with respect to "importing" files in TB... whether it handles this correctly or not.

All cases of mbox<*> formats generated by TB that I've seen have separating From_ lines that are not really "standards compliant".

Yeah I know,.. there is no "definite" mbox standard... but usually RFC 4155 and/or qmail's mbox manpage are taken as standards.
Both these documents imply roughly the following:
From <envelope address> <arrival date>

TB (in the cases I've seen) always writes:
From - <arrival date>

No idea why this happens, I don't think it's just an issue in my setup, cause other MUAs get the info correctly from my POP3 / IMAP servers (I test with many of them).

I don't think that this is very critical... nevertheless,.. for the sake of being "standards-compliant" TB should use the widespread format.
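
As a rough illustration of the two shapes, here is a Python sketch that splits a From_ line into sender and date and reports whether a real envelope address is present. The function names are made up, and the date format (asctime-style, without timezone) is an assumption based on the examples I've seen; real-world files may use other variants.

```python
import re
import time

# "From", one space, a sender token without spaces, one space, the date.
_FROM_LINE = re.compile(r"^From (\S+) (.+)$")


def parse_from_line(line: str):
    """Return (sender, struct_time) for a From_ line, or None if the line
    does not match the assumed "From <sender> <asctime date>" shape."""
    m = _FROM_LINE.match(line)
    if m is None:
        return None
    sender, date = m.groups()
    try:
        # Assumed date format, e.g. "Mon Nov 05 12:00:00 2012".
        when = time.strptime(date, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None
    return sender, when


def has_envelope_sender(line: str) -> bool:
    """True if the From_ line carries something other than the "-" placeholder."""
    parsed = parse_from_line(line)
    return parsed is not None and parsed[0] != "-"


print(has_envelope_sender("From - Mon Nov 05 12:00:00 2012"))
print(has_envelope_sender("From alice@example.org Mon Nov 05 12:00:00 2012"))
```

The address alice@example.org is just a placeholder for the example.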

Comment 26

6 years ago
I agree that today the generally recommended and by-far sanest version of the "mbox" file format is the "mboxrd" variant defined in

  http://www.qmail.org/qmail-manual-html/man5/mbox.html

It has the huge advantage over all other variants that an mbox export/import cycle never corrupts any data (added/dropped > characters), and achieves that without relying on any Content-Length headers.