Closed Bug 1265153 Opened 8 years ago Closed 8 years ago

Linux version "save file" saves .eml with names unsuitable for NTFS, leading to Windows file system errors (also an ntfs-3g driver problem)

Categories

(Thunderbird :: Untriaged, defect)

38 Branch
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: matombo, Unassigned)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20160319202018

Steps to reproduce:

On Linux I saved an e-mail via right click -> Save As onto an NTFS file system.


Actual results:

When I tried to open the file from Windows, it told me "Could not open file".
Running chkdsk tells me the index entry of that file is corrupted.


Expected results:

The file should simply open.
Very hard to believe that application software would damage the filesystem, even if you have NTFS volumes mounted in Linux.

BTW, how does CHKDSK tell you about the index entry of a particular file? I've never seen this.

Magnus, I'd mark this invalid.
Flags: needinfo?(mkmelin+mozilla)
Well, the whole filesystem is not damaged, only the individual files created by Thunderbird
(and only Windows seems to have a problem with them).

In Windows, "chkdsk d: /f" lists the corrupted files in its output.

BTW, I run Windows 10 and Manjaro Linux.
Do you understand how NTFS volumes work on Linux? Typically they are handled by the ntfs-3g driver from http://www.tuxera.com. Many Linux distributions ship outdated versions of this ntfs-3g driver. I know this because the first thing I do on any Linux installation is to upgrade those packages manually.

From a software point of view, TB talks to the Linux operating system and it writes to the disk via the respective disk driver. Under no circumstances could TB cause disk corruption. Your NTFS driver can, but we don't provide that. It is technically impossible for TB to write directly to the NTFS volume and corrupt it.

Personally I have run "chkdsk /f" many times in my life without ever seeing a specific file being reported. You could convince me otherwise if you can provide a screenshot or some sort of log entry. If you run "chkdsk" at startup, the result is logged as a system event and you can paste it from the Event Viewer.

Having said all that, I will now boot into Linux and save a .eml file onto an NTFS volume, since I am set up that way, too.
I found out what was causing the problem:
When saving multiple e-mails at once, the file names are generated from the subject field.
However, when the subject has special characters, they end up in the file name as well. That is perfectly fine on Linux but causes the problems on Windows (in my case I had some e-mails with exclamation marks in the subject).
OK, I can confirm your problem using ntfs-3g "2014.2.15.A3.3".

The problem is in the file name, CHKDSK reports this:

Deleted invalid filename [Bug 1265153] linux versions "save file" saves corrupted .eml files on ntfs.eml (53397) in directory 896043.
File 53397 has been orphaned since all its filenames were invalid
Windows will recover the file in the orphan recovery phase.
Correcting minor file name errors in file 53397.

I had saved the message with this subject:
"[Bug 1265153] linux versions "save file" saves corrupted .eml files on ntfs"

Note that no index error was reported.

We can do two things here. First, we could sanitise the file name a bit better so it complies with Windows naming conventions.

Second, and more importantly, you should report this error to Tuxera. Post the problem on their forum at http://tuxera.com/forum/ and let us know the link to your post here.

I repeat: The NTFS driver corrupts the NTFS volume, not Thunderbird. That driver should be able to handle whatever filename it is given.
Flags: needinfo?(mkmelin+mozilla)
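As an illustration of what "complying with Windows naming conventions" could mean, here is a minimal Python sketch. It is not Thunderbird code; the function name `sanitize_for_windows` and the `_` replacement character are assumptions made for this example. It replaces the characters Windows forbids in file names (the subject reproduced above contains double quotes, which are among them), drops trailing dots and spaces, and prefixes reserved device names.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Device names Windows reserves regardless of extension.
_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def sanitize_for_windows(name: str, replacement: str = "_") -> str:
    """Return a variant of `name` that follows Windows file naming rules.

    Hypothetical helper for illustration only.
    """
    cleaned = _FORBIDDEN.sub(replacement, name)
    # Windows does not keep trailing dots and spaces, so remove them up front.
    cleaned = cleaned.rstrip(" .")
    # Guard against reserved device names such as "NUL.eml".
    stem = cleaned.split(".", 1)[0]
    if stem.upper() in _RESERVED:
        cleaned = replacement + cleaned
    # Never return an empty name.
    return cleaned or replacement


subject = '[Bug 1265153] linux versions "save file" saves corrupted .eml files on ntfs'
print(sanitize_for_windows(subject + ".eml"))
```

With the subject from this bug, only the two double quotes are replaced; brackets, spaces and the inner dot are left alone because Windows accepts them.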
I dug through this a bit more. The .eml file got "rescued" into a found.000\file0000.chk.
I can open that file and set the original .eml content.

Can you PLEASE report that at Tuxera!
(Or if you don't, I will. I know the guy behind ntfs-3g, he will attend to any real problem.)
Typo: I can open that file and *see* the original .eml content (after fixing the access rights).

Do you think we should sanitise file names a bit better on Linux so they comply with Windows rules?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(acelists)
Summary: linux versions "save file" saves corrupted .eml files on ntfs → Linux version "save file" saves .eml with names unsuitable for NTFS, leading to Windows file system errors (also an ntfs-3g driver problem)
(In reply to Jorg K (GMT+2) from comment #7)
> Typo: I can open that file and *see* the original .eml content (after fixing
> the access rights).
> 
> Do you think we should sanitise file names a bit better on Linux so they
> comply with Windows rules?

But how do you know that you are writing to an NTFS filesystem? Or do we sanitize every time, thus punishing ext* users whose filesystem could handle more of the special characters? Do we then keep a list of hacks for different filesystems?

I think the bug is in the ntfs driver. It should have rejected a filename that can't be safely saved to the filesystem.

But I would not object to creating a sanitizer (acting regardless of filesystem) that would strip all special characters (anything outside a-zA-Z0-9-_) and could be enabled by a pref. If a user tried to export 1000 msgs and the FS rejected half of them, that could be a mess.
Flags: needinfo?(acelists)
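The strict, filesystem-independent sanitizer suggested in this comment might look roughly like the sketch below. The function name `export_name`, the `strict` flag standing in for the preference, and the `"message"` fallback are all assumptions for illustration; nothing like this exists in Thunderbird as far as this bug shows.

```python
import re

# Everything outside a-z, A-Z, 0-9, "-" and "_" is treated as special.
_SPECIAL = re.compile(r"[^A-Za-z0-9_-]")


def export_name(subject: str, strict: bool, extension: str = ".eml") -> str:
    """Build an export file name from a subject.

    `strict` stands in for the hypothetical preference; when it is off,
    the subject is used unchanged.
    """
    if not strict:
        return subject + extension
    stripped = _SPECIAL.sub("", subject)
    # Fall back to a fixed name if nothing survives the stripping.
    return (stripped or "message") + extension


print(export_name("Re: Hello, world!", strict=True))
print(export_name("Re: Hello, world!", strict=False))
```

Stripping this aggressively makes name collisions more likely (different subjects can collapse to the same string), so a real implementation would also need some way to keep names unique.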
(In reply to :aceman from comment #9)
> But how do you get that you are writing to a NTFS filesystem? 
We don't (go through the effort to find out).

> Or do we
> sanitize everytime, thus punishing ext* users that could handle more of the
> special characters?
I'd say so. Just a suggestion.

> I think the bug is in the ntfs driver. It should have rejected a filename
> that can't be safely saved to the filesystem.
Sure, I logged a bug with Tuxera. They should sanitise and resolve conflicts.
We already have the problem with lowercase and uppercase. "aaa.pdf" will clobber "AAA.pdf".

> But I would not object to creating a sanitizer (that acted regardless of
> filesystem) that would strip all special characters (outside of a-zA-Z0-9-_)
> that could be enabled by a pref.
Perhaps Magnus has an opinion.

> The user could try to export 1000 msgs and
> the FS rejecting half of them could be a mess.
Indeed.

We can also decide to do nothing.
Flags: needinfo?(mkmelin+mozilla)
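The lowercase/uppercase clobbering mentioned in this comment ("aaa.pdf" versus "AAA.pdf") could in principle be avoided on the exporting side. The following is a rough sketch, assuming a hypothetical helper `resolve_case_collisions` and a numeric suffix scheme; Python's `str.casefold()` is used as an approximation of the case-insensitive comparison, which is not identical to the rules NTFS applies.

```python
def resolve_case_collisions(names):
    """Rename entries that differ from an earlier entry only by letter case.

    Hypothetical helper: a "-N" suffix is added before the extension
    until the name is unique under case-insensitive comparison.
    """
    seen = set()
    result = []
    for name in names:
        stem, dot, ext = name.rpartition(".")
        if not dot:
            # No extension present: the whole name is the stem.
            stem, ext = name, ""
        candidate = name
        counter = 1
        while candidate.casefold() in seen:
            candidate = f"{stem}-{counter}{dot}{ext}"
            counter += 1
        seen.add(candidate.casefold())
        result.append(candidate)
    return result


print(resolve_case_collisions(["aaa.pdf", "AAA.pdf", "Aaa.pdf"]))
# ['aaa.pdf', 'AAA-1.pdf', 'Aaa-2.pdf']
```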
OK, we got a reply at the Tuxera Forum.
http://tuxera.com/forum/viewtopic.php?f=2&t=31101&sid=aec6023d187ea7fe4496e0d2707dba02&p=39106#p39106

You need to set the "windows_names" mount option.

We won't implement sanitisation since, as pointed out, that would punish all users.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(mkmelin+mozilla)
Resolution: --- → INVALID
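For anyone who wants to check whether their own NTFS mounts already use this option, here is a small sketch. It assumes the volumes are mounted through /etc/fstab with the type `ntfs` or `ntfs-3g`; the helper name and the sample entries (including the UUIDs) are invented for the example.

```python
def ntfs_entries_missing_windows_names(fstab_text: str):
    """Return mount points of NTFS fstab entries that lack `windows_names`."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type not in ("ntfs", "ntfs-3g"):
            continue
        if "windows_names" not in options.split(","):
            missing.append(mount_point)
    return missing


# Made-up sample entries; a real check would read /etc/fstab instead.
sample = """\
# <device>      <mount point>  <type>   <options>               <dump> <pass>
UUID=1111-AAAA  /mnt/data      ntfs-3g  defaults,windows_names  0 0
UUID=2222-BBBB  /mnt/backup    ntfs-3g  defaults                0 0
UUID=3333-CCCC  /              ext4     defaults                0 1
"""
print(ntfs_entries_missing_windows_names(sample))  # ['/mnt/backup']
```

In this sample, the `/mnt/data` line shows what an entry with the option set could look like, while `/mnt/backup` is reported as missing it.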
Thank you very much for looking into this.

I also would like an optional strict sanitizer.
(In reply to Jorg K (GMT+2) from comment #11)
> You need to set the "windows_names" mount option.

One could certainly argue it's totally ridiculous for such an option not to be the default on NTFS. Why would someone use it if not for Windows compatibility?
(In reply to Matombo from comment #12)
> I also would like an optional strict sanitizer
It makes more sense to have the NTFS driver do this.
I don't think that we would consider your wish if it were an enhancement request.
But should a driver really change filenames?
I think its job is rather to reject such files (with a meaningful error message),
and then the user can try to save the file under a new filename.

The problem here is that when saving multiple e-mails from Thunderbird, the user never gets to enter any filenames (and even if they could, it would be tedious for 100+ e-mails).

I think a switch somewhere to select the default naming convention would be nice to have, as some kind of stretch goal.

Should I still create an enhancement request to discuss this under a suitable topic?
(In reply to Matombo from comment #15)
> should i still create a enhancement request to discuss this under a suitable
> topic?
Sure, but no guarantee it will get any attention.

BTW, why are you saving those .eml files? I do it once in a while to transfer a single message to some testing profile, but never in production use.
For backup reasons.
Then I can handle my e-mails like any other text file.
And you do the backup to an NTFS volume. Hmm. The mbox file is text, too, you know.
If you want one message per file, you should use maildir storage.
(In reply to Matombo from comment #15)
> problem here is when saving multiple e-Mail from thunderbird the user can
> never enter any filenames (and even if, it would be tedious for 100+ e-Mails)
Yes, that is what I meant.

> I think a switch somewhere to select the default naming conventions would be
> nice to have as some kind of stretch goal?
> 
> should i still create a enhancement request to discuss this under a suitable
> topic?

Yes please. I would support such a sanitizer, provided it worked on all filesystems (and based on a preference). Some users (like me) may want filenames without special characters even if the FS supports them.
@jorg
NTFS, because I want to be able to access it from Windows as well (dual-boot system), and sometimes I open old e-mails from my backups.
I guess there are smarter solutions, but this one was the easiest and most natural I could come up with xD
@aceman
vote here: https://bugzilla.mozilla.org/show_bug.cgi?id=1265255
See Also: → 1265255