messages.getAttachmentFile() returns incorrect file data for UTF-8 encoded text attachments
Categories
(Thunderbird :: Add-Ons: Extensions API, defect)
Tracking
(Not tracked)
People
(Reporter: admin, Unassigned)
Details
Attachments
(1 file)
2.65 KB,
message/rfc822
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:136.0) Gecko/20100101 Firefox/136.0
Steps to reproduce:
Used messages.getAttachmentFile() to retrieve the file contents of an attachment with a content type of text/xml; charset=utf-8.
Actual results:
The file data did not match the original attachment contents; it appears to have been converted from UTF-8 to another charset (ANSI).
Expected results:
The correct file data should have been returned.
I will provide supporting materials shortly, and will do further testing against TB beta and 115, as well as testing against a similar text/plain type document.
Saving the attachment directly produces the correct file; likewise, copying the base64 string representing the attachment file from the message source and decoding it using an appropriate utility (such as https://www.base64decode.org/) results in a correctly encoded file.
getAttachmentFile() appears to be performing a conversion to another charset, which I will attempt to specifically identify. The size of the result file in my tests has been smaller than that of the original attachment, which implies conversion.
Note that this only occurs when characters outside of the ASCII range are involved, such as European characters with diacritic marks.
Sample email file containing an .xml file attachment; the contents of the xml file are simply the declaration, a single root element, and the character ö (xF6 of ISO-8859-1).
The problem occurs in TB 128.8.0esr and 136/137; it does not occur in TB 115 (most recent minor version).
The base-64 encoded string representing the attachment file data is as follows:
PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4NCjxhPsO2PC9hPg==
Decoding this results in a 49 byte sequence with the ö character being represented by two bytes (xC3 xB6).
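For anyone wanting to confirm the expected bytes without an online decoder, the following is a minimal sketch that decodes the base64 string quoted above and prints the length and the two bytes of the ö character. The helper names (base64ToBytes, toHex) are made up for this example; atob and Uint8Array are standard globals in browsers and in recent Node.js versions.

```javascript
// Base64 payload copied verbatim from the message source in this report.
const ATTACHMENT_BASE64 =
  "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4NCjxhPsO2PC9hPg==";

// Hypothetical helper: decode base64 into raw bytes, with no charset
// interpretation. atob() yields one character per byte (code units 0..255).
function base64ToBytes(base64) {
  const binary = atob(base64);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// Hypothetical helper: render bytes as space-separated lowercase hex.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

const expectedBytes = base64ToBytes(ATTACHMENT_BASE64);
console.log(expectedBytes.length);               // 49
console.log(toHex(expectedBytes.slice(43, 45))); // c3 b6
```

Offsets 43 and 44 follow the 38-byte XML declaration, the CRLF, and the opening `<a>` tag.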
When executing messages.getAttachmentFile() against the same attachment, the result data is 48 bytes in length and the same character is represented by a single byte (xF6).
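A rough sketch of how the returned bytes can be inspected from an extension is given here. It is not the exact test code used for this report. It assumes the extension has the messagesRead permission and that `messagesApi` is `browser.messages`; passing the API object in as a parameter, and the helper names, are choices made only for this example.

```javascript
// Hypothetical helper: render bytes as space-separated lowercase hex.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

// Fetch one attachment through the WebExtension API and return its raw bytes.
// `messagesApi` is assumed to be `browser.messages`.
async function getAttachmentBytes(messagesApi, messageId, partName) {
  const file = await messagesApi.getAttachmentFile(messageId, partName);
  return new Uint8Array(await file.arrayBuffer());
}

// Log name, content type, byte length and hex dump of each attachment.
async function dumpAttachments(messagesApi, messageId) {
  const attachments = await messagesApi.listAttachments(messageId);
  for (const att of attachments) {
    const bytes = await getAttachmentBytes(messagesApi, messageId, att.partName);
    console.log(att.name, att.contentType, bytes.length, toHex(bytes));
  }
}

// Usage inside an extension (messageId obtained elsewhere):
//   await dumpAttachments(browser.messages, messageId);
```

Reading the File through arrayBuffer() rather than text() keeps the comparison at the byte level, so no further decoding step can mask the difference.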
Again this appears to be a case of the method converting the file from UTF-8, although the attachment content type header clearly indicates UTF-8:
Content-Type: text/xml; charset=UTF-8; name="utf-8_o_umlaut.xml"
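To illustrate the suspected conversion, this sketch decodes the original bytes as UTF-8 and writes each character back out as a single byte. This is only a simulation of the hypothesis, not a claim about what Thunderbird does internally; encodeSingleByte is a made-up helper.

```javascript
// The original attachment bytes, decoded from the base64 in the message source.
const utf8Bytes = Uint8Array.from(
  atob("PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4NCjxhPsO2PC9hPg=="),
  (ch) => ch.charCodeAt(0)
);

// Step 1: interpret the bytes as UTF-8 text (ö becomes one character, U+00F6).
const text = new TextDecoder("utf-8").decode(utf8Bytes);

// Step 2 (hypothetical helper): write each character back as a single byte,
// which only works for code points up to 0xFF.
function encodeSingleByte(str) {
  return Uint8Array.from(str, (ch) => {
    const codePoint = ch.codePointAt(0);
    if (codePoint > 0xff) {
      throw new RangeError("character does not fit in one byte: " + ch);
    }
    return codePoint;
  });
}

const converted = encodeSingleByte(text);
console.log(utf8Bytes.length, converted.length); // 49 48
console.log(converted[43].toString(16));         // f6
```

The simulated output matches the observed result (48 bytes, ö as xF6). Note that xF6 encodes ö in both ISO-8859-1 and Windows-1252, so this byte alone does not identify which single-byte charset is involved.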