Closed Bug 423768 Opened 13 years ago Closed 10 years ago

mailto URI parsing - hnames are not decoded before checking for a match

Categories

(MailNews Core :: Composition, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 7.0

People

(Reporter: shadow2531, Assigned: shadow2531)

References

()

Details

Attachments

(2 files, 6 obsolete files)

User-Agent:       Opera/9.50 (Windows NT 5.1; U; en)
Build Identifier: version 3.0a1pre (2008031802)

A mailto URI might contain %HH representing chars that don't really need to be percent-encoded (as an edge case). When parsing the mailto URI and looking for to, cc, bcc, subject and body values etc., Thunderbird needs to decode the hname before it does its case-insensitive comparison.

Passing the above mailto URI to Thunderbird should generate a compose window like the following:

To: to
Cc: cc
Bcc: bcc
Subject: subject

body

However, in Thunderbird, the compose window is blank.

Thunderbird should also decode the hnames anyway just in case support is added for a keyword that has a wide char in it (or utf-8 sequence representing a wide char since it looks like Thunderbird uses a char* in this part). That way, the %HH sequence representing the wide char in the hname will be decoded before checking for a match.

See http://lxr.mozilla.org/mailnews/source/mailnews/compose/src/nsSmtpUrl.cpp#100 for example. You can see that it doesn't decode before comparing.

Reproducible: Always

Steps to Reproduce:
1.
2.
3.
Uh, that mailto URL is not exactly correct... Parameters and arguments the wrong way for one thing (or something??), and a question mark that doesn't belong there.
The mailto URI is supposed to be like that.

1. The TO hvalue between 'mailto:' and '?' is optional.

2. In the query string, each of the hnames (the value before the =) are percent-encoded even though they don't have to be in this case. (They only contain non-reserved ascii characters in this case, but I percent-encoded them anyway to show that Thunderbird doesn't handle that.)

That URI, when decoded, looks like this:
mailto:?sUbJeCt=subject&tO=to&cC=cc&bCc=bcc&bOdY=body
, which Thunderbird handles fine and produces the correct result.

Thunderbird just doesn't handle URIs made by authors that go crazy with the percent-encoding.

Also, just like with http query strings, when getting the value of an hname, the server-side script decodes it first. It doesn't just decode the hvalue. It decodes both.

For example, try this:
<?php 
    header("Content-Type: text/plain; charset=utf-8");
    foreach ($_GET as $hname => $hvalue) {
        echo strtolower($hname) . "=" . $hvalue . "\n";
    }
?>
With these:
mailto.php?%73%55%62%4A%65%43%74=subject&%74%4F=to&%63%43=cc&%62%43%63=bcc&%62%4F%64%59=body
mailto.php?%E2%88%9A=%E2%88%9A

, to see what I mean.

Thunderbird doesn't decode the hname before doing the case-insensitive check to see if the hname is to, cc, bcc, subject or body etc.

Hope that helps.
Ah, yes, another thing is why on earth someone would do such a thing;)
(In reply to comment #3)
> Ah, yes, another thing is why on earth someone would do such a thing;)
>

True, but that's more of an example to just show the problem.

Consider this:

I think "test>me" would be a valid header name. "test>me" in a mailto URI would look like this "mailto:?test%3Eme=value". 

*IF* Thunderbird was looking for a "test>me" hname in a mailto URI, it wouldn't find it unless it decoded "test%3Eme" before checking. Some extension might use some special hname also.

FWIW, for comparison, Opera's built-in mail client decodes hnames before checking for a match. Try in <http://my.opera.com/desktopteam/blog/2008/03/18/happy-easter> for example. I don't believe they're doing any specical-casing. I believe they just realize that in addition to hvalues, hnames in a mailto URI can be percent-encoded too.

Does Thunderbird have something like a PL_decoded_strcasecmp (token, "bcc")?
bug 441075 has been checked to trunk.
Could you see if this has any effect on your problem.
(In reply to comment #5)
> bug 441075 has been checked to trunk.
> Could you see if this has any effect on your problem.
> 

version 3.0a2pre (2008071003) - No change.
Product: Core → MailNews Core
Component: Message Compose Window → MailNews: Composition
OS: Windows XP → All
Product: Thunderbird → Core
QA Contact: message-compose → composition
Hardware: PC → All
Michael A. Puls, do you still see this in version 3.1?
Whiteboard: [closeme 2011-06-01]
Yes. Running this command:

thunderbird "mailto:?%73%55%62%4A%65%43%74=subject&%74%4F=to&%63%43=cc&%62%43%63=bcc&%62%4F%64%59=body"

doesn't put anything in any of the fields.

The result should be:

to field: to
cc field: cc
bcc field: bcc
subject field: subject
body field: body

Using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10

Works fine with Opera's built-in mail client.
You can see at <http://mxr.mozilla.org/comm-central/source/mailnews/compose/src/nsSmtpUrl.cpp#120> that it's doing PL_strcasecmp (token, "bcc") for example to see if the hname is "bcc". But, it doesn't percent-decode token first.
I don't know what I'm doing with any of the code or even if I have the right functions etc. But, looking at things:

From <http://mxr.mozilla.org/comm-central/source/mailnews/compose/src/nsSmtpUrl.cpp#120>, maybe it could be something like this:

case 'B':
    nsCString decoded_token;
    MsgUnescapeString(token, 0, decoded_token);
    if (!PL_strcasecmp (decoded_token.c_str(), "bcc"))  {
    
    }

Then, do that type of thing inside each case block.
    
However, I think it'd make much more sense to create an nsCSTRING of token once, lowercase it and just do if statements (instead of the switch(char) mess). Something like:


nsCString hfname;
MsgUnescapeString(token, 0, hfname);
ToLowerCase(hfname);

if (hfname == "bcc") {

} else if (hfname == "body") {

} else if (hfname == "cc") {

}

etc.

Hopefully you get what I mean.
Attached patch Proposed Idea (obsolete) — Splinter Review
This patch won't work because I don't know how to use nsCString properly. But, the idea should be clear. If you can show me how to fix it, that'd help.
Attachment #536326 - Attachment is obsolete: true
I just need to test for regressions and such.
Attachment #536338 - Attachment is obsolete: true
I see use of mimeConverter after every use of MsgUnescapeString() at <http://mxr.mozilla.org/comm-central/source/mailnews/compose/src/nsSmtpUrl.cpp#220> to convert values to utf-8.

I'm using MsgUnescapeString to get the decoded hfname, but not using mimeConverter after it to convert to utf-8. I'm only comparing to "to", "cc" etc., so I'm not sure decoded_token needs to be converted to utf-8 with mimeConverter. If it is needed, let me know.
Assignee: nobody → shadow
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [closeme 2011-06-01]
Attachment #536394 - Flags: review?(dbienvenu)
Comment on attachment 536585 [details] [diff] [review]
Better variable name and got rid of '_' in it to match other variables. Using mimeConverter to convert decodedName to UTF-8.

thx for the patch - in order for us to take this patch, you'd need to extend \mailnews\compose\test\unit\test_mailtoURL.js to test the kinds of urls that you're now handling with this patch.
O.K.
Comment on attachment 536585 [details] [diff] [review]
Better variable name and got rid of '_' in it to match other variables. Using mimeConverter to convert decodedName to UTF-8.

please re-request review when you've added some test cases to the existing test. Thx!
Attachment #536585 - Flags: review?(dbienvenu)
(In reply to comment #20)
> please re-request review when you've added some test cases to the existing
> test. Thx!

Will do. How do I run the tests though?
(In reply to comment #22)
> https://developer.mozilla.org/en/Writing_xpcshell-based_unit_tests
> https://developer.mozilla.org/en/MailNews_xpcshell-tests

Thanks. That's a lot to figure out. But, through trial and error on what directories to run the commands in, I got this to work:

make -C ~/comm-central/objdir-tb-release/mailnews/compose/test
cd ~/comm-central/mailnews
make xpcshell-tests > test_results.txt

Then, I open ~/comm-central/objdir-tb-release/test_results.txt

(should I be running "make xpcshell-tests > test_results.txt" in ~/comm-central/objdir-tb-release/mailnews instead?)

I get (excluding the other tests):

TEST-INFO | c:\Users\user\comm-central\objdir-tb-release\mozilla\_tests\xpcshell\mailnews\compose\test\unit\test_mailtoURL.js | running test ...
TEST-PASS | c:\Users\user\comm-central\objdir-tb-release\mozilla\_tests\xpcshell\mailnews\compose\test\unit\test_mailtoURL.js | test passed

INFO | Result summary:
INFO | Passed: 246
INFO | Failed: 0

Is that the correct way to do it?

If so, I'll add my tests to test_mailtoURL.js. But, if I do, it's comm-central/mailnews/compose/test/unit/test_mailtoURL.js that I modify and produce a diff against correct? And, if I modify that test_mailtoURL.js again, I need to run make -C again too, correct?

Also, I'm not getting how to modify the commands above to use the check-one option to just test test_mailtoURL.js. Could you give explicit, exact commands for what I need to do?

Thanks.
(In reply to comment #23)
> make -C ~/comm-central/objdir-tb-release/mailnews/compose/test
> cd ~/comm-central/mailnews
> make xpcshell-tests > test_results.txt
> 
> Then, I open ~/comm-central/objdir-tb-release/test_results.txt
> 
> (should I be running "make xpcshell-tests > test_results.txt" in
> ~/comm-central/objdir-tb-release/mailnews instead?)

Nevermind on that part. I guess I did run it in objdir-tb-release/mailnews and not ~/comm-central/mailnews
Attachment #538906 - Flags: review?(dbienvenu)
Attachment #536585 - Flags: review?(dbienvenu)
With my fixes, one thing I'm not liking is that if I represent "subject" as "=?utf-8?B?c3ViamVjdA==?=" and percent-encode it like so:

mailto:?%3D%3Futf-8%3FB%3Fc3ViamVjdA%3D%3D%3F%3D=subject

"subject" ends up in subject field.

This is because in addition to using MsgUnescapeString() to percent-decode the hname, I'm using mimeconverter to convert it to UTF-8. But, mimeconverter understands "=?utf-8?B?base64_data?=". I don't want to hnames encoded like that to match. It's not a big deal though as no one is going to author a mailto URI like that. But, I'd like to fix that anyway.

So, I'll remove my use of mimeconverter.
Attachment #536585 - Flags: review?(dbienvenu)
No changes in the tests I added are needed.
Attachment #536585 - Attachment is obsolete: true
Attachment #539163 - Flags: review?(dbienvenu)
Comment on attachment 538906 [details] [diff] [review]
Here are the added tests to test_mailtoURL.js

this test does fail w/o the other patch, and looks OK. I'm just re-building with the core change now.
Attachment #538906 - Flags: review?(dbienvenu) → review+
Comment on attachment 539163 [details] [diff] [review]
Got rid of my use of mimeConverter. Just using MsgUnescapeString()

Thx for the new patch.

I think you want nsDependentCString(token), not nsCString(token).

this line is too long - it should be broken into two lines:
+      else if (decodedName.LowerCaseEqualsLiteral("html-part") || decodedName.LowerCaseEqualsLiteral("html-body"))

Can't you just change the switch to be something like switch (NS_ToUpper(decodedName.First()) and keep the rest of the logic the same, just replacing PL_strcasecmp(token, ...) with decodedName.LowerCaseEqualsLiteral(...). Otherwise, you should at least check for the more likely tokens first (like to:, cc:, etc).
Attachment #539163 - Flags: review?(dbienvenu) → review-
(In reply to comment #29)
> Comment on attachment 539163 [details] [diff] [review] [review]
> Got rid of my use of mimeConverter. Just using MsgUnescapeString()
> 
> Thx for the new patch.
> 
> I think you want nsDependentCString(token), not nsCString(token).

O.K. I see <https://developer.mozilla.org/en/nsDependentCString> and what it says it's for.  If I use that though, it looks like I won't be able to use MsgUnescapeString() on it. Is that correct?

> this line is too long - it should be broken into two lines:
> +      else if (decodedName.LowerCaseEqualsLiteral("html-part") ||
> decodedName.LowerCaseEqualsLiteral("html-body"))

O.K.

> Can't you just change the switch to be something like switch
> (NS_ToUpper(decodedName.First()) and keep the rest of the logic the same,
> just replacing PL_strcasecmp(token, ...) with
> decodedName.LowerCaseEqualsLiteral(...).

O.K. I can try that. I don't see the point though of doing a switch(char) just to group (case 'B' for example) the other conditions (like body and bcc) to just check again for "body" and "bcc". I think it makes sense to just simply check for what we want. I've never seen mailto URI parsing being a performance problem where something like that is needed. But, I'll still try it.

> Otherwise, you should at least
> check for the more likely tokens first (like to:, cc:, etc).

I just kept the order of how it was. But, to, cc, subject and body (in that order) being first might be the best if you want the condition chain to exit the quickest in the most common situations.
As requested, changed things to use the original switch() with use of decodedName.First() and use of LowerCaseEqualsLiteral() in the conditions. Didn't use nsDependentCString though as it didn't look like it'd work in this case according to the documentation. My added tests are still valid and all tests pass with this code.
Attachment #539163 - Attachment is obsolete: true
Attachment #540697 - Flags: review?(dbienvenu)
(In reply to comment #31)

> Didn't use nsDependentCString though as it didn't look like it'd work in
> this case according to the documentation. 

Can you elaborate on that, perhaps with a link to the doc? It should work fine, just as well as nsCString(), without the extra allocation overhead. 

>My added tests are still valid and
> all tests pass with this code.

The tests all pass with nsDependentCstring as well...
(In reply to comment #32)
> (In reply to comment #31)
> 
> > Didn't use nsDependentCString though as it didn't look like it'd work in
> > this case according to the documentation. 
> 
> Can you elaborate on that, perhaps with a link to the doc? It should work
> fine, just as well as nsCString(), without the extra allocation overhead. 

I'm sorry. I thought you meant to use nsDependentCString for decodedName too. I see you meant to *just* replace MsgUnescapeString(nsCString(token), 0, decodedName); with MsgUnescapeString(nsDependentCString(token), 0, decodedName);.

I'll make the change.
(In reply to comment #33)
I see you meant to *just* replace MsgUnescapeString(nsCString(token),
> 0, decodedName); with MsgUnescapeString(nsDependentCString(token), 0,
> decodedName);.
> 
> I'll make the change.
No need, I've made it locally and I can land the patch for you tomorrow.
(In reply to comment #34)
> (In reply to comment #33)
> I see you meant to *just* replace MsgUnescapeString(nsCString(token),
> > 0, decodedName); with MsgUnescapeString(nsDependentCString(token), 0,
> > decodedName);.
> > 
> > I'll make the change.
> No need, I've made it locally and I can land the patch for you tomorrow.

Thanks. If you need me to change anything else, let me know.
http://hg.mozilla.org/comm-central/rev/3205136dcb66
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 7.0
Comment on attachment 540697 [details] [diff] [review]
Made changes as requested.

I think David forgot to set r+ here, so setting it on his behalf. This is already landed so it doesn't matter too much anyway.
Attachment #540697 - Flags: review?(dbienvenu) → review+
You need to log in before you can comment on or make changes to this bug.