libmime passes badly encoded char* into JS land as AUTF8Strings, results in complete garbage

RESOLVED FIXED in Thunderbird 8.0

Status

MailNews Core
MIME
RESOLVED FIXED
7 years ago
7 years ago

People

(Reporter: protz, Assigned: protz)

Tracking

unspecified
Thunderbird 8.0
x86_64
Linux

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 2 obsolete attachments)

(Assignee)

Description

7 years ago
Created attachment 548733 [details]
The message that's causing the issue

In mimemsg.cpp:156 we create a "munged subject", i.e. we record the subject of a message/rfc822 part in its MimeHeaders structure so that we can re-use it later to give a meaningful name to the attachment.

In mimemoz2.cpp:251, we use that information to assign a real_name to the nsMsgAttachmentData (e.g. if the attached message's subject was "Check out this lolcat", then the attachment name is "Check out this lolcat.eml"; smart, huh?).

In mimemoz2.cpp:752 we pass the real_name (which is a char*) as a AUTF8String into JS-land  through XPConnect. This is the call to mimeEmitterStartAttachment. At this stage, doing printf("%s", tmp->real_name) yields (in my console):
N�o perca esta - confira a fidelidade dos mesmo antes de excluir - N�o � SPAM.eml

In jsmimeemitter.js:343, we obtain the real_name as the first parameter. At this stage, aName is the char* I've been talking about since the beginning.

    let re = /[\u0000-\u0008\u000b-\u000c\u000e-\u001f]/g;
    dump("\u001b[01;36mName : "+aName+"\u001b[00m\n\n");
    dump("\u001b[01;36mName : "+aName.replace(re, "")+"\u001b[00m\n\n");

The first dump yields "N" only. Where it gets interesting is the second line displays:

N㵳灁汰ⵥ瑳汹ⵥ灳湡派牡敫楴杮嘠物污ⴠ朦㭴㰠⁁%40gmail.com@imap.googlemail.com/&Jw0- caml-list

Looks like someone is reading into some random memory location because the ASCII chars near the end of the string are the name of a folder that's not even the one the message is in.

I don't know if this is the regexp engine's fault, or XPConnect, or whatever. I guess on the Thunderbird side, we should just make sure munged_subject is properly encoded.

I'm CCing mrbkap because that does like something suspicious which shouldn't happen.
(Assignee)

Comment 1

7 years ago
Created attachment 548734 [details]
How the string when used in JS looks like
(Assignee)

Comment 2

7 years ago
I'm trying to reproduce from a sample webpage, but I think the problem lies within XPConnect. Here's the result of

  for (let i = 0; i < aName.length; ++i)
      dump(aName.charCodeAt(i)+", ");

[78, 0, 15731, 28737, 27760, 11621, 29811, 27769, 11621, 28787,
    28257, 27966, 29281, 25963, 26996, 26478, 22048, 29289, 27745, 11552, 26406,
    15220, 15392, 8257, 10, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 58860, 0,
    1200, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0];
(Assignee)

Comment 3

7 years ago
Created attachment 548825 [details] [diff] [review]
Do a decoding of the munged subject according to the charset specified in the content-type header

David, here's a patch that I think should fix the issue. I'm being very conservative, and I'm refusing to trust the name if there's no charset provided. Given the effects that this entails, I'd tend to lean on the strict side. If there's a charset provided, I try to decode the munged_subject after we've parsed all the headers of the attached message. If the decoding fails, I'm also discarding the munged subject (this results in a "Forwarded Message.eml" attachment).

Let me know what you think about it!
Assignee: nobody → jonathan.protzenko
Status: NEW → ASSIGNED
Attachment #548825 - Flags: review?(dbienvenu)
(Assignee)

Comment 4

7 years ago
Created attachment 548827 [details] [diff] [review]
The right patch

Sorry, uploaded the wrong file.
Attachment #548825 - Attachment is obsolete: true
Attachment #548825 - Flags: review?(dbienvenu)
Attachment #548827 - Flags: review?(dbienvenu)

Comment 5

7 years ago
Comment on attachment 548827 [details] [diff] [review]
The right patch

Looks OK in general - code might be cleaner w.r.t. memory management if you could use nsCStrings and Adopt - it might be nice to do that, though libmime so far does not.

A unit test (mozmill or xpcshell test) would be helpful as well.
(Assignee)

Comment 6

7 years ago
I'll write the test. Could you clarify what you have in mind concerning Adopt? I'm not sure where that would be helpful...

Comment 7

7 years ago
ns[C]String.Adopt assumes management of allocated memory, so you don't have to have explicit PR_Free's. So, this code:

+    char *charset;
+    char *contentType = MimeHeaders_get(msg->hdrs, HEADER_CONTENT_TYPE, PR_FALSE, PR_FALSE);
+    if (contentType) {
+      charset = MimeHeaders_get_parameter(contentType, "charset", nsnull, nsnull);
+      PR_Free(contentType);
+    }
+
+    // If we've got a charset, use nsMsgI18NConvertToUnicode to magically decode
+    // the munged subject.
+    if (charset) {
+      nsresult rv = nsMsgI18NConvertToUnicode(charset, orig, dest);
+      PR_Free(charset);

could be

nsCString charset;
nsCString contentType;
contentType.Adopt(MimeHeaders_get(msg->hdrs, HEADER_CONTENT_TYPE, PR_FALSE, PR_FALSE);
if (!contentType.IsEmpty())
  charset.Adopt(MimeHeaders_get_parameter(contentType, "charset", nsnull, nsnull);
   // If we've got a charset, use nsMsgI18NConvertToUnicode to magically decode
   // the munged subject.
   if (!charset.IsEmpty()) {
+      nsresult rv = nsMsgI18NConvertToUnicode(charset, orig, dest);

You could do something similar with munged_subject.

Comment 8

7 years ago
Comment on attachment 548827 [details] [diff] [review]
The right patch

clearing review request
Attachment #548827 - Flags: review?(dbienvenu)
(Assignee)

Comment 9

7 years ago
Created attachment 549349 [details]
Use nsCStrings + test

Here's an updated patch.
- I'm now using nsCStrings for strings that are local to the new chunk in mimemsg.cpp. Since I'm not managing munged_subject personally, I figured out it would be too much trouble to change all the uses to use the new string API.
- I also wrote a test. I believe it's correct, since dumping all the charCodes from the resulting JS strings returns garbage when the patch is not applied, and returns the correct charCodes when the patch is applied.

Ideally, people should use the MIME syntax for encoding subjects (e.g. =?windows-1252?q?S=E9minaire_des?= doctorants) but having an improperly encoded subject should not send us off the rails like this, so I still think it's worthwhile to give decoding a shot before discarding the subject for the attachment name.
Attachment #548827 - Attachment is obsolete: true
Attachment #549349 - Flags: review?(dbienvenu)

Updated

7 years ago
Attachment #549349 - Flags: review?(dbienvenu) → review+
(Assignee)

Comment 10

7 years ago
http://hg.mozilla.org/comm-central/rev/6e04b4cf944f

I'm still confused as to why, when printing out the string from js, it would go print some random parts of memory, but well...
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 8.0
Depends on: 676882
You need to log in before you can comment on or make changes to this bug.