Closed Bug 674488 Opened 13 years ago Closed 13 years ago

libmime passes badly encoded char* into JS land as AUTF8Strings, results in complete garbage

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

Thunderbird 8.0

People

(Reporter: protz, Assigned: protz)

References

Details

Attachments

(3 files, 2 obsolete files)

The message that's causing the issue 13 years ago Jonathan Protzenko [:protz] 25.42 KB, message/rfc822		Details
How the string when used in JS looks like 13 years ago Jonathan Protzenko [:protz] 13.50 KB, image/png		Details
Do a decoding of the munged subject according to the charset specified in the content-type header 13 years ago Jonathan Protzenko [:protz] 4.32 KB, patch		Details \| Diff \| Splinter Review
The right patch 13 years ago Jonathan Protzenko [:protz] 1.93 KB, patch		Details \| Diff \| Splinter Review
Use nsCStrings + test 13 years ago Jonathan Protzenko [:protz] 5.29 KB, text/plain	Bienvenu : review+	Details

Jonathan Protzenko [:protz]

Assignee

Description

•

13 years ago

Attached file The message that's causing the issue — Details

In mimemsg.cpp:156 we create a "munged subject", i.e. we record the subject of a message/rfc822 part in its MimeHeaders structure so that we can re-use it later to give a meaningful name to the attachment.

In mimemoz2.cpp:251, we use that information to assign a real_name to the nsMsgAttachmentData (e.g. if the attached message's subject was "Check out this lolcat", then the attachment name is "Check out this lolcat.eml"; smart, huh?).

In mimemoz2.cpp:752 we pass the real_name (which is a char*) as a AUTF8String into JS-land  through XPConnect. This is the call to mimeEmitterStartAttachment. At this stage, doing printf("%s", tmp->real_name) yields (in my console):
N�o perca esta - confira a fidelidade dos mesmo antes de excluir - N�o � SPAM.eml

In jsmimeemitter.js:343, we obtain the real_name as the first parameter. At this stage, aName is the char* I've been talking about since the beginning.

    let re = /[\u0000-\u0008\u000b-\u000c\u000e-\u001f]/g;
    dump("\u001b[01;36mName : "+aName+"\u001b[00m\n\n");
    dump("\u001b[01;36mName : "+aName.replace(re, "")+"\u001b[00m\n\n");

The first dump yields "N" only. Where it gets interesting is the second line displays:

N㵳灁汰ⵥ瑳汹ⵥ灳湡派牡敫楴杮嘠物污ⴠ朦㭴㰠⁁%40gmail.com@imap.googlemail.com/&Jw0- caml-list

Looks like someone is reading into some random memory location because the ASCII chars near the end of the string are the name of a folder that's not even the one the message is in.

I don't know if this is the regexp engine's fault, or XPConnect, or whatever. I guess on the Thunderbird side, we should just make sure munged_subject is properly encoded.

I'm CCing mrbkap because that does like something suspicious which shouldn't happen.

Jonathan Protzenko [:protz]

Assignee

Comment 1

•

13 years ago

Attached image How the string when used in JS looks like — Details

Jonathan Protzenko [:protz]

Assignee

Comment 2

•

13 years ago

I'm trying to reproduce from a sample webpage, but I think the problem lies within XPConnect. Here's the result of

  for (let i = 0; i < aName.length; ++i)
      dump(aName.charCodeAt(i)+", ");

[78, 0, 15731, 28737, 27760, 11621, 29811, 27769, 11621, 28787,
    28257, 27966, 29281, 25963, 26996, 26478, 22048, 29289, 27745, 11552, 26406,
    15220, 15392, 8257, 10, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 58860, 0,
    1200, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0];

Jonathan Protzenko [:protz]

Assignee

Comment 3

•

13 years ago

Attached patch Do a decoding of the munged subject according to the charset specified in the content-type header (obsolete) — Details — Splinter Review

David, here's a patch that I think should fix the issue. I'm being very conservative, and I'm refusing to trust the name if there's no charset provided. Given the effects that this entails, I'd tend to lean on the strict side. If there's a charset provided, I try to decode the munged_subject after we've parsed all the headers of the attached message. If the decoding fails, I'm also discarding the munged subject (this results in a "Forwarded Message.eml" attachment).

Let me know what you think about it!

Assignee: nobody → jonathan.protzenko

Status: NEW → ASSIGNED

Attachment #548825 - Flags: review?(dbienvenu)

Jonathan Protzenko [:protz]

Assignee

Comment 4

•

13 years ago

Attached patch The right patch (obsolete) — Details — Splinter Review

Sorry, uploaded the wrong file.

Attachment #548825 - Attachment is obsolete: true

Attachment #548825 - Flags: review?(dbienvenu)

Attachment #548827 - Flags: review?(dbienvenu)

David :Bienvenu

Comment 5

•

13 years ago

Comment on attachment 548827 [details] [diff] [review]
The right patch

Looks OK in general - code might be cleaner w.r.t. memory management if you could use nsCStrings and Adopt - it might be nice to do that, though libmime so far does not.

A unit test (mozmill or xpcshell test) would be helpful as well.

Jonathan Protzenko [:protz]

Assignee

Comment 6

•

13 years ago

I'll write the test. Could you clarify what you have in mind concerning Adopt? I'm not sure where that would be helpful...

David :Bienvenu

Comment 7

•

13 years ago

ns[C]String.Adopt assumes management of allocated memory, so you don't have to have explicit PR_Free's. So, this code:

+    char *charset;
+    char *contentType = MimeHeaders_get(msg->hdrs, HEADER_CONTENT_TYPE, PR_FALSE, PR_FALSE);
+    if (contentType) {
+      charset = MimeHeaders_get_parameter(contentType, "charset", nsnull, nsnull);
+      PR_Free(contentType);
+    }
+
+    // If we've got a charset, use nsMsgI18NConvertToUnicode to magically decode
+    // the munged subject.
+    if (charset) {
+      nsresult rv = nsMsgI18NConvertToUnicode(charset, orig, dest);
+      PR_Free(charset);

could be

nsCString charset;
nsCString contentType;
contentType.Adopt(MimeHeaders_get(msg->hdrs, HEADER_CONTENT_TYPE, PR_FALSE, PR_FALSE);
if (!contentType.IsEmpty())
  charset.Adopt(MimeHeaders_get_parameter(contentType, "charset", nsnull, nsnull);
   // If we've got a charset, use nsMsgI18NConvertToUnicode to magically decode
   // the munged subject.
   if (!charset.IsEmpty()) {
+      nsresult rv = nsMsgI18NConvertToUnicode(charset, orig, dest);

You could do something similar with munged_subject.

David :Bienvenu

Comment 8

•

13 years ago

Comment on attachment 548827 [details] [diff] [review]
The right patch

clearing review request

Attachment #548827 - Flags: review?(dbienvenu)

Jonathan Protzenko [:protz]

Assignee

Comment 9

•

13 years ago

Attached file Use nsCStrings + test — Details

Here's an updated patch.
- I'm now using nsCStrings for strings that are local to the new chunk in mimemsg.cpp. Since I'm not managing munged_subject personally, I figured out it would be too much trouble to change all the uses to use the new string API.
- I also wrote a test. I believe it's correct, since dumping all the charCodes from the resulting JS strings returns garbage when the patch is not applied, and returns the correct charCodes when the patch is applied.

Ideally, people should use the MIME syntax for encoding subjects (e.g. =?windows-1252?q?S=E9minaire_des?= doctorants) but having an improperly encoded subject should not send us off the rails like this, so I still think it's worthwhile to give decoding a shot before discarding the subject for the attachment name.

Attachment #548827 - Attachment is obsolete: true

Attachment #549349 - Flags: review?(dbienvenu)

David :Bienvenu

Updated

•

13 years ago

Attachment #549349 - Flags: review?(dbienvenu) → review+

Jonathan Protzenko [:protz]

Assignee

Comment 10

•

13 years ago

http://hg.mozilla.org/comm-central/rev/6e04b4cf944f

I'm still confused as to why, when printing out the string from js, it would go print some random parts of memory, but well...

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → Thunderbird 8.0

Mark Banner (:standard8)

Updated

•

13 years ago

Depends on: 676882

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

libmime passes badly encoded char* into JS land as AUTF8Strings, results in complete garbage

Categories

(MailNews Core :: MIME, defect)

Tracking

(Not tracked)

People

(Reporter: protz, Assigned: protz)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(3 files, 2 obsolete files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Comment 10

Updated

Attachment

General

Description

File Name

Content Type