Closed Bug 674488 Opened 13 years ago Closed 13 years ago

libmime passes badly encoded char* into JS land as AUTF8Strings, results in complete garbage

Categories

(MailNews Core :: MIME, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 8.0

People

(Reporter: protz, Assigned: protz)

References

Details

Attachments

(3 files, 2 obsolete files)

In mimemsg.cpp:156 we create a "munged subject", i.e. we record the subject of a message/rfc822 part in its MimeHeaders structure so that we can re-use it later to give a meaningful name to the attachment.

In mimemoz2.cpp:251, we use that information to assign a real_name to the nsMsgAttachmentData (e.g. if the attached message's subject was "Check out this lolcat", then the attachment name is "Check out this lolcat.eml"; smart, huh?).

In mimemoz2.cpp:752 we pass the real_name (which is a char*) as a AUTF8String into JS-land  through XPConnect. This is the call to mimeEmitterStartAttachment. At this stage, doing printf("%s", tmp->real_name) yields (in my console):
N�o perca esta - confira a fidelidade dos mesmo antes de excluir - N�o � SPAM.eml

In jsmimeemitter.js:343, we obtain the real_name as the first parameter. At this stage, aName is the char* I've been talking about since the beginning.

    let re = /[\u0000-\u0008\u000b-\u000c\u000e-\u001f]/g;
    dump("\u001b[01;36mName : "+aName+"\u001b[00m\n\n");
    dump("\u001b[01;36mName : "+aName.replace(re, "")+"\u001b[00m\n\n");

The first dump yields "N" only. Where it gets interesting is the second line displays:

N㵳灁汰ⵥ瑳汹ⵥ灳湡派牡敫楴杮嘠物污ⴠ朦㭴㰠⁁%40gmail.com@imap.googlemail.com/&Jw0- caml-list

Looks like someone is reading into some random memory location because the ASCII chars near the end of the string are the name of a folder that's not even the one the message is in.

I don't know if this is the regexp engine's fault, or XPConnect, or whatever. I guess on the Thunderbird side, we should just make sure munged_subject is properly encoded.

I'm CCing mrbkap because that does like something suspicious which shouldn't happen.
I'm trying to reproduce from a sample webpage, but I think the problem lies within XPConnect. Here's the result of

  for (let i = 0; i < aName.length; ++i)
      dump(aName.charCodeAt(i)+", ");

[78, 0, 15731, 28737, 27760, 11621, 29811, 27769, 11621, 28787,
    28257, 27966, 29281, 25963, 26996, 26478, 22048, 29289, 27745, 11552, 26406,
    15220, 15392, 8257, 10, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 58860, 0,
    1200, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0];
David, here's a patch that I think should fix the issue. I'm being very conservative, and I'm refusing to trust the name if there's no charset provided. Given the effects that this entails, I'd tend to lean on the strict side. If there's a charset provided, I try to decode the munged_subject after we've parsed all the headers of the attached message. If the decoding fails, I'm also discarding the munged subject (this results in a "Forwarded Message.eml" attachment).

Let me know what you think about it!
Assignee: nobody → jonathan.protzenko
Status: NEW → ASSIGNED
Attachment #548825 - Flags: review?(dbienvenu)
Attached patch The right patch (obsolete) — Splinter Review
Sorry, uploaded the wrong file.
Attachment #548825 - Attachment is obsolete: true
Attachment #548825 - Flags: review?(dbienvenu)
Attachment #548827 - Flags: review?(dbienvenu)
Comment on attachment 548827 [details] [diff] [review]
The right patch

Looks OK in general - code might be cleaner w.r.t. memory management if you could use nsCStrings and Adopt - it might be nice to do that, though libmime so far does not.

A unit test (mozmill or xpcshell test) would be helpful as well.
I'll write the test. Could you clarify what you have in mind concerning Adopt? I'm not sure where that would be helpful...
ns[C]String.Adopt assumes management of allocated memory, so you don't have to have explicit PR_Free's. So, this code:

+    char *charset;
+    char *contentType = MimeHeaders_get(msg->hdrs, HEADER_CONTENT_TYPE, PR_FALSE, PR_FALSE);
+    if (contentType) {
+      charset = MimeHeaders_get_parameter(contentType, "charset", nsnull, nsnull);
+      PR_Free(contentType);
+    }
+
+    // If we've got a charset, use nsMsgI18NConvertToUnicode to magically decode
+    // the munged subject.
+    if (charset) {
+      nsresult rv = nsMsgI18NConvertToUnicode(charset, orig, dest);
+      PR_Free(charset);

could be

nsCString charset;
nsCString contentType;
contentType.Adopt(MimeHeaders_get(msg->hdrs, HEADER_CONTENT_TYPE, PR_FALSE, PR_FALSE);
if (!contentType.IsEmpty())
  charset.Adopt(MimeHeaders_get_parameter(contentType, "charset", nsnull, nsnull);
   // If we've got a charset, use nsMsgI18NConvertToUnicode to magically decode
   // the munged subject.
   if (!charset.IsEmpty()) {
+      nsresult rv = nsMsgI18NConvertToUnicode(charset, orig, dest);

You could do something similar with munged_subject.
Comment on attachment 548827 [details] [diff] [review]
The right patch

clearing review request
Attachment #548827 - Flags: review?(dbienvenu)
Attached file Use nsCStrings + test
Here's an updated patch.
- I'm now using nsCStrings for strings that are local to the new chunk in mimemsg.cpp. Since I'm not managing munged_subject personally, I figured out it would be too much trouble to change all the uses to use the new string API.
- I also wrote a test. I believe it's correct, since dumping all the charCodes from the resulting JS strings returns garbage when the patch is not applied, and returns the correct charCodes when the patch is applied.

Ideally, people should use the MIME syntax for encoding subjects (e.g. =?windows-1252?q?S=E9minaire_des?= doctorants) but having an improperly encoded subject should not send us off the rails like this, so I still think it's worthwhile to give decoding a shot before discarding the subject for the attachment name.
Attachment #548827 - Attachment is obsolete: true
Attachment #549349 - Flags: review?(dbienvenu)
Attachment #549349 - Flags: review?(dbienvenu) → review+
http://hg.mozilla.org/comm-central/rev/6e04b4cf944f

I'm still confused as to why, when printing out the string from js, it would go print some random parts of memory, but well...
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 8.0
Depends on: 676882
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: