Open attachment uses wrong filename encoding (Latin-1/15 instead of UTF-8) if LANG set to something not UTF-8



OS Integration
3 years ago
2 years ago


(Reporter: heinz.repp, Unassigned)


31 Branch

Firefox Tracking Flags

(Not tracked)



(1 attachment)



3 years ago
Using Thunderbird 31.3.0 on Ubuntu Trusty 14.04 (32-bit). This bug has affected me for many versions; I finally decided to file it now.

Steps to reproduce:

Open an attachment of a received mail whose file name contains German umlauts. The source view shows that the filename is encoded in UTF-8 (2 bytes for every umlaut). The attachment part has these headers:
> Content-Type: application/octet-stream;
>  	name="<name with umlauts in utf-8>.pdf"
> Content-Transfer-Encoding: base64
> Content-Disposition: attachment;
> 	filename="<name with umlauts in utf-8>.pdf";
> 	size="<size>";

whereas the mail itself has type
> Content-Type: multipart/mixed;
and no encoding specified.

Actual results:

The PDF viewer, qpdfview in my case, opens and reports that it could not open the file <name with umlauts omitted>. When I point it to the /tmp directory manually, it opens the file just fine; this is my usual workaround.

Expected results:

The helper application should open the file without problems.

What's wrong?

It turns out the file itself is saved to the /tmp directory with UTF-8 umlauts, but the command line of the PDF viewer (/proc/<proc-id>/cmdline) has all umlauts in Latin-1 (only one byte each: 'ä' is 0xe4, 'ü' is 0xfc, and so on). As Ubuntu switched to UTF-8 everywhere long ago, this clearly cannot work. But why doesn't Thunderbird use the UTF-8 filename that is already used in the mail and when saving the attachment, instead of re-encoding the file name in Latin-1 (or ISO-8859-15) when calling the helper application?
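To make the check above repeatable, here is a minimal Python sketch for inspecting a helper's command line on Linux. The function names `helper_argv` and `guess_encoding` are made up for this illustration and are not part of Thunderbird or any tool; the classification is only a heuristic (bytes that are not valid UTF-8 are merely "possibly" Latin-1).

```python
from pathlib import Path


def helper_argv(pid: int) -> list[bytes]:
    """Return the raw argv of a process from /proc/<pid>/cmdline (Linux only)."""
    raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    # Arguments are separated by NUL bytes; drop the empty trailing entry.
    return [arg for arg in raw.split(b"\0") if arg]


def guess_encoding(arg: bytes) -> str:
    """Heuristically classify the bytes of one argument."""
    try:
        arg.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        arg.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8 (possibly latin-1)"


# The same umlaut is one byte in Latin-1 and two bytes in UTF-8.
assert "ä".encode("latin-1") == b"\xe4"
assert "ä".encode("utf-8") == b"\xc3\xa4"
```

Usage would be along the lines of `for arg in helper_argv(pid): print(arg, guess_encoding(arg))`, with `pid` being the process id of the PDF viewer.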

By the way, maybe related: bug 848365 - there a UTF-8 user name is also re-encoded as Latin-1, on a Windows system and when sending credentials, so it seems i18n is broken somewhere deep inside ...

Comment 1

3 years ago
Can you provide a sample? Save as .eml

Comment 2

3 years ago
Created attachment 8536012 [details]
Just a test.eml

email with attachment with file name containing umlauts

Comment 3

3 years ago
Pretty odd: when opening this email with the default locale "de_DE.UTF-8" and then opening the attachment, I get the error popup "Warning: Konnte »/tmp/jsttst.pdf« nicht öffnen." (Alert: Could not open "/tmp/jsttst.pdf"), as described in the bug report.

But when starting Thunderbird with the C locale (LANG=C thunderbird), opening the .eml and trying to open the attached PDF, the program is not even able to save the file. It shows this error twice: "Alert: Unable to save the attachment. Please check your file name and try again later."

What's going on here? The system and all mass storage use UTF-8, the mail has the name UTF-8-encoded, but Thunderbird is not even able to save an attachment with a perfectly legal file name?

Even odder: when trying to save it manually to the /tmp directory, Thunderbird recognizes that a file with this name already exists and asks whether it should be replaced, but clicking "Replace" again shows the "unable to save ..." error twice. So checking for duplicates works, but saving does not? One code path uses UTF-8 and the other does not?
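One possible explanation, offered only as an assumption and not as a description of Thunderbird's actual code: if some step converts the file name to the locale's native charset, then under LANG=C (plain ASCII) the conversion of a name with umlauts has no valid result at all. The Python sketch below illustrates that idea; `try_native_name` is a hypothetical helper written for this comment.

```python
from typing import Optional


def try_native_name(filename: str, charset: str) -> Optional[bytes]:
    """Convert a file name to the given charset, or return None if impossible."""
    try:
        return filename.encode(charset)
    except UnicodeEncodeError:
        # The charset has no representation for at least one character.
        return None


name = "jüstätöst.pdf"
for charset in ("utf-8", "latin-1", "ascii"):
    print(charset, try_native_name(name, charset))
```

With "utf-8" and "latin-1" the conversion succeeds but yields different byte strings, so the two results name different files on disk; with "ascii" there is no result, which would match a save that fails outright.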

Comment 4

3 years ago
Opening and saving the PDF works fine for me. It does show a title of "jüstätöst - jÃ¼st... something", but that seems to happen regardless of which app opens it.
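A title in which each umlaut turns into a two-character sequence is what one typically gets when UTF-8 bytes are interpreted as Latin-1. Whether that is what happens with this PDF title is an assumption; the short Python sketch below only reproduces the effect, and `misread_as_latin1` is a name invented for the illustration.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


garbled = misread_as_latin1("jüstätöst")
print(garbled)

# The mistake is reversible as long as no bytes were lost along the way.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored)
```

Each umlaut occupies two bytes in UTF-8, and each of those bytes is shown as its own Latin-1 character, so the garbled string is three characters longer than the original.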

Comment 5

3 years ago
OK, on your system it works; I would guess you have a Finnish locale. Could you also test with LANG=C and LANG=de_DE.UTF-8?

Comment 6

3 years ago
Yes, I can confirm there's a problem if I do export LANG=C.
I suspect it's a core issue though. I don't have a UTF-8-named PDF handy, but Firefox can't even open your sample PDF if I save it to disk and set LANG=C.
Summary: Open attachment uses wrong filename encoding (Latin-1/15 instead of UTF-8) → Open attachment uses wrong filename encoding (Latin-1/15 instead of UTF-8) if LANG set to something not UTF-8

Comment 7

3 years ago
Just in case it helps, as it looks to be very closely related and may give another clue about the cause: when the attachment name contains any non-ASCII character, this is what happens on my system:

1. When trying to open the attachment, *two* files are created in /tmp: one with the Latin-1 encoded name, with size 0, and another with the UTF-8 encoded name, with the actual content.
2. Then the helper application is called with the Latin-1 name (the file that has no content at all), so it fails to open it.

So there seems to be a two-step mistake here: one part of the code uses the Latin-1 (native, in this case) encoding for the file name, so that file gets created (maybe as a default behavior), while the code that actually saves the attachment uses the UTF-8 file name. Maybe the code that creates the Latin-1 file name is the very same code that calls the helper app (a misbehavior in that case), or the name comes from a previous step.

My system does not use UTF-8 file naming (it uses ISO-8859-1, i.e. Latin-1), with LANG=es_ES and LC_COLLATE=C. Thunderbird is 38.2.0, but this has been happening for a long time.
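For anyone who wants to check whether their system shows the same two-file pattern, here is a small Python sketch that looks for such pairs in a list of raw file names. `find_encoding_twins` is a hypothetical helper written for this comment; it only compares names and says nothing about which code created the files.

```python
def find_encoding_twins(names: list[bytes]) -> list[tuple[bytes, bytes]]:
    """Find (latin1_name, utf8_name) pairs that spell the same text."""
    present = set(names)
    twins = []
    for name in names:
        try:
            text = name.decode("utf-8")
        except UnicodeDecodeError:
            continue  # not valid UTF-8, so it cannot be the UTF-8 half
        if text.isascii():
            continue  # pure ASCII names are identical in both encodings
        try:
            latin1 = text.encode("latin-1")
        except UnicodeEncodeError:
            continue  # contains characters that Latin-1 cannot represent
        if latin1 in present:
            twins.append((latin1, name))
    return twins
```

On Linux, passing a bytes path to `os.listdir` returns raw byte names, so a call such as `find_encoding_twins(os.listdir(b"/tmp"))` (after `import os`) right after trying to open the attachment should list the empty Latin-1 file together with its UTF-8 counterpart, if both exist.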