imported .eml file should be preserved 1:1, but import is destructive/datalossy: TB115 appends CRLF, TB128 kills FROM (using maildir)
Categories
(MailNews Core :: Backend, defect)
Tracking
(Not tracked)
People
(Reporter: chrizilla, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: dataloss, ux-trust)
Attachments
(2 files)
summary
Importing an .eml file into Thunderbird should be lossless/non-destructive, but it is destructive/datalossy.
steps to reproduce
- use a maildir mail account (not mbox)
- import an .eml file via drag+drop from outside Thunderbird into TB's folder pane
- navigate in a file manager to TB's profile folder\Mail
- find the newly added .eml file
- hex-compare the new file (from step 3) to the original file (from step 1)
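The hex comparison in the last step can also be scripted. The following is a minimal Python sketch, not part of the original report. The function names `describe_difference` and `compare_files` are hypothetical, and the sketch only recognises the two alterations discussed in this bug (an appended CRLF and a removed leading "From " line).

```python
from pathlib import Path


def describe_difference(original: bytes, imported: bytes) -> str:
    """Classify how an imported copy differs from the original message bytes."""
    if original == imported:
        return "identical"
    # Behaviour reported for TB 115: one extra CRLF at the end of the file.
    if imported == original + b"\r\n":
        return "trailing CRLF appended"
    # Behaviour reported for TB 125/128: the leading "From " line is gone.
    if original.startswith(b"From "):
        newline = original.find(b"\n")
        if newline != -1 and imported == original[newline + 1:]:
            return "leading 'From ' line removed"
    return "other difference"


def compare_files(original_path: str, imported_path: str) -> str:
    """Read both files as raw bytes (no newline translation) and compare them."""
    return describe_difference(
        Path(original_path).read_bytes(),
        Path(imported_path).read_bytes(),
    )
```

Calling `compare_files` with the path of the original .eml and the path of the file found in the profile folder returns one of the four labels above. Both paths are whatever applies on the tester's machine.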
expected results
The import should be lossless/non-destructive.
The imported .eml file should be preserved 1:1.
The file created in TB's profile folder and the original file should be 100% identical.
actual results
The import is destructive/datalossy:
- TB 115 @ Win10/64bit:
  - appends CRLF (this is bug 1888585)
  - does not kill 1st line
- TB 125.0b4 @ Win10/64bit:
  - does not append CRLF (so fixes bug 1888585 apparently?)
  - but unlike TB 115, it kills 1st line (From). Regression to TB 115?
- TB 128.0a1 (2024-05-17) (64-bit) @ Win10/64bit:
  - same behavior as reported above for TB 125
See the screenshots in the comments below.
I double-checked all results.
reiteration / generational loss
- in TB 115: repeating the STR (i.e. re-importing the file created in step 3) appends yet another CRLF, so the file grows with each iteration!
- in TB 128: untested
relevance
- unnecessary and unintended alteration of the data without users' knowledge, who trust their files to be preserved 1:1 (ux-trust)
- data-destructive process which should be lossless
- since the affected files have changed and are no longer identical, this bug complicates/impairs:
  - file deduplication
  - incremental/differential backup
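To illustrate the deduplication and backup point: tools of this kind commonly identify files by a content hash, so even a single appended CRLF makes the re-imported copy look like a different file. A minimal sketch, assuming SHA-256 as the hash and using a made-up sample message (neither is taken from any particular backup tool):

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file contents."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample message; the second copy mimics the TB 115 behaviour.
original = b"Subject: test\r\n\r\nbody\r\n"
reimported = original + b"\r\n"

# The digests differ, so a hash-based deduplicator treats them as two files.
print(content_digest(original) == content_digest(reimported))  # False
```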
[keywords: extra newline, linefeed]
image content: screenshot of file comparison (middle section omitted for simplicity)
left side: original .eml file
right side: import to TB 115 adds CRLF
image content: screenshot of file comparison (middle section omitted for simplicity)
left side: original .eml file
right side: import to TB 128 kills 1st line (from)
Isn't the From line the TB-specific mailbox delimiter? For maildir, that should be removed so all messages follow the same format. Wouldn't that have been rectified in bug 1719121?
Comment 4•9 months ago
Agreed with comment 3, the initial From line is mbox specific and should get removed.
I guess this makes this bug WFM.
I am in the dark. I would be very thankful if you could explain to me:
- The date after "From -" is not redundant (I checked the msg source). So if the "From -" line is removed, the data is lost. Why do we want to lose data (generally speaking)?
- Unless I am mistaken, the date/time stored in the "From -" line is the moment when I received the message. I don't want to lose that information. Why is it removed? Can't we keep it?
- Until version 115, Thunderbird never removed the "From -" line. Why now? Why the change in behaviour?
- I don't understand why comments #3 and #4 both refer to mbox. As mentioned above, all 3 of my tested versions (115, 125, 128) have maildir storage, not mbox. And the email(s) I tested with were all received/exported/imported on maildir. They never touched anything mbox-related. The only explanation I can imagine: these are some mbox vestiges in TB's maildir code, and that's why you talk about mbox?
Comment 6•9 months ago
(In reply to chrizilla from comment #5)
> - The date after "From -" is not redundant (I checked the msg source). So if the "From -" line is removed, the data is lost. Why do we want to lose data (generally speaking)?
It's not meant to be data. In fact, we don't even produce the date part anymore (just "From").
> - Unless I am mistaken, the date/time stored in the "From -" line is the moment when I received the message. I don't want to lose that information. Why is it removed? Can't we keep it?
See above.
> - Until version 115, Thunderbird never removed the "From -" line. Why now? Why the change in behaviour?
"From" is an mbox-specific construct; we have now fixed the bug that made it appear in maildir.
> - I don't understand why comments #3 and #4 both refer to mbox. As mentioned above, all 3 of my tested versions (115, 125, 128) have maildir storage, not mbox. And the email(s) I tested with were all received/exported/imported on maildir. They never touched anything mbox-related. The only explanation I can imagine: these are some mbox vestiges in TB's maildir code, and that's why you talk about mbox?
Well, because "From" is mbox-specific, and any usage of it in maildir was never intended.
(In reply to Magnus Melin [:mkmelin] from comment #6)
> we don't even produce the date part anymore (just "From")
So now the date/time when an email arrives in Thunderbird (which is important information) is no longer stored?
Comment 8•9 months ago
Correct. Importance is in the eye of the beholder, but that information was never accessible unless you dug through the raw mbox looking for it.
In TB 115 it's very easily accessible by just pressing CTRL+U or view > message source!
What need is there for the removal of this (IMO important) data?
Comment 11•9 months ago
What Magnus said :-)
(In reply to chrizilla from comment #5)
> I am in the dark. I would be very thankful if you could explain to me:
> - The date after "From -" is not redundant (I checked the msg source). So if the "From -" line is removed, the data is lost. Why do we want to lose data (generally speaking)?
The initial "From " line is explicitly part of the mbox format (it separates the individual messages), and not part of the RFC5322 mail message format.
> - Unless I am mistaken, the date/time stored in the "From -" line is the moment when I received the message. I don't want to lose that information. Why is it removed? Can't we keep it?
> - Until version 115, Thunderbird never removed the "From -" line. Why now? Why the change in behaviour?
It is (usually) the reception time, but it's a kind of mbox-specific out-of-band piece of info. There's no counterpart in maildir, and the internal plumbing in TB doesn't really make use of it (e.g. it gets lost when you copy messages between folders).
It's not really safe to rely on the format of anything after the "From " - mbox is more a spiky little bundle of similar file formats and implementations rather than a single one. Even if there's an rfc 4155 to define it, you can't guarantee that's the variant in use - TB itself has been all over the place over the years, and different parts of the program actually used to write out slightly different variants anyway! So now we just don't even bother. The big mbox revamp I did tried to put all the mbox-specific quirks into one single place so nowhere else has to deal with them.
> - I don't understand why comments #3 and #4 both refer to mbox. As mentioned above, all 3 of my tested versions (115, 125, 128) have maildir storage, not mbox. And the email(s) I tested with were all received/exported/imported on maildir. They never touched anything mbox-related. The only explanation I can imagine: these are some mbox vestiges in TB's maildir code, and that's why you talk about mbox?
Yes, every time you see a leading "From " line, that's an mbox vestige. I'm sure there are a few lurking about still (although we're pretty tolerant of them). Everywhere used to assume that mbox was the only storage format. And the mbox-isms everywhere in the code made fixing a lot of maildir bugs nigh-on impossible. So we're in a much better place now.
I'm actually pretty sympathetic to the general principle that whatever gets sent to us gets stored verbatim in maildir. And mostly I think that is the case (I try really hard to make everything byte-exact).
The "From " lines are probably the main exception - I lean toward the burn-it-with-fire camp there.
"From " lines should never be seen outside an mbox file. The rest of TB shouldn't even know they exist.
Hope that helps! We're a little hobbled by these kind of historical quirks, but slowly straightening things out.
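For readers who want to see what separating the mbox-style "From " line from the message proper looks like, here is a minimal Python sketch. It is an illustration only and not Thunderbird's actual code; the function name `split_from_line` is hypothetical, and it deliberately does not try to parse anything after "From ", since that part is not reliably formatted.

```python
from typing import Optional, Tuple


def split_from_line(raw: bytes) -> Tuple[Optional[bytes], bytes]:
    """Split raw file contents into (leading "From " line or None, message bytes)."""
    if not raw.startswith(b"From "):
        # No mbox-style separator: the whole content is the message.
        return None, raw
    end = raw.find(b"\n")
    if end == -1:
        # The file consists of the separator line only.
        return raw, b""
    # Keep the separator (including its line ending) apart from the message.
    return raw[: end + 1], raw[end + 1 :]
```

Returning the separator instead of discarding it lets the caller decide whether to keep it elsewhere, for example when the reception time it usually carries is still wanted.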