Closed Bug 415491 Opened 12 years ago Closed 5 years ago

Make UnEscapeURIForUI escape bidirectional formatting characters

Categories

(Core :: Internationalization, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: Gavin, Assigned: mats)

References

(Depends on 1 open bug, Blocks 3 open bugs)

Details

(Keywords: sec-low, Whiteboard: [sg:low] (prevent whack-a-mole in callers))

Attachments

(5 files, 2 obsolete files)

The patch for bug 388372 made Firefox's nsIXULBrowserWindow::setOverLink implementation re-encode any bidi formatting characters, per RFC 3987 sections 3.2 and 4.1 paragraph 6. Should this be done by UnEscapeURIForUI instead? That would fix this problem in a more general way.
Yes, it absolutely should!  The existing behavior causes security problems unless we play whack-a-mole with each of the 10+ callers of UnEscapeURIForUI in the tree...
Flags: blocking1.9.1?
Flags: blocking1.9.0.4?
Flags: blocking1.9.0.4? → blocking1.9.0.5?
Blocks: 436517
I think it should also escape the other characters that RFC 3987 lists as unsafe in IRIs.
Status: NEW → ASSIGNED
Summary: Should UnEscapeURIForUI escape bidirectional formatting characters? → Make UnEscapeURIForUI escape bidirectional formatting characters
Attached patch Patch (obsolete) — Splinter Review
Attachment #345013 - Flags: review?(bzbarsky)
Hmm.  Can we only escape the unsafe stuff
We could if we rolled our own escaper, but I think it makes just as much sense not to, parallel to the way that we display illegal IRIs in punycode rather than literally.
Probably not a true blocker, but we'd love a branch fix when the trunk is done. Please ask for approval on the reviewed patch.
Flags: blocking1.9.0.5? → wanted1.9.0.x+
Sorry for the lag; I wasn't cced.  :(
Comment on attachment 345013 [details] [diff] [review]
Patch

Please add -p to your default diff options?

>+++ b/intl/uconv/src/nsTextToSubURI.cpp
>@@ -235,16 +238,41 @@
>   // in case of failure, return escaped URI

Are we guaranteed that this won't have bidi control chars or whatnot in the hostname in unescaped form, even when it's the escaped hostname?  I'm not sure what guarantees that...

>+    // if there are any characters that are unsafe for IRIs, reescape.
>+    if (mUnsafeChars.IsEmpty()) {

Would it make sense to key off IsVoid() instead and make sure the string is void in the constructor, and is set non-void when we do the pref get, even if we get no pref value?  Or are we likely to hit this code during startup before our prefs are sane or some such mess?

>+      // Note that this reescapes all non-ASCII characters in the URI, not just
>+      // the unsafe characters
>+      NS_EscapeURL(PromiseFlatCString(unescapedSpec), esc_OnlyNonASCII,
>+                   reescapedSpec);

Shouldn't we escape _retval, not unescapedSpec?  Otherwise, if the charset involved is something like UTF-7, unescapedSpec might not in fact have any non-ASCII in it, while after convertURItoUnicode happens it has random non-ASCII stuff.

The behavior should be documented in the idl, right?

I'm assuming your all.js change is correct; I'm not qualified to review it.

A related question: doesn't UnEscapeURIForUI mess up if aCharset is UTF-16, say, due to not unescaping %00?  Or am I missing something there?
Comment on attachment 345013 [details] [diff] [review]
Patch

r- pending response to my question
Attachment #345013 - Flags: review?(bzbarsky) → review-
Flags: wanted1.9.1+
Flags: blocking1.9.1?
Flags: blocking1.9.1-
bz: Just to poke this bug awake, is the follow a correct summary of what you wanted?

administrivia:
 - add -p option to the diff
 - document the behavior in the idl

questions:
1) If the unescaping fails what guarantees that all dangerous characters were escaped in the first place? Could an attack sneak a bidi char in there by making sure there was also something that would trigger this to fail?

2) Would it be better to distinguish an unset list from an empty pref value? (Personally I think we should hardcode in the default list to be used if prefs aren't loaded or some embeddor forgot to set this one)

3) If unsafes are found should we escape _retval instead of unescapedSpec?

4) Does UnEscapeURIForUI mess up if aCharset is UTF-16 due to not unescaping %00?

The all.js change looks fine, it's what was there before plus the characters moved from browser.js. any super-reviewer can bless that, doesn't need a prefs module owner. Unless I'm missing bz's point
Whiteboard: [sg:low] (prevent whack-a-mole in callers)
Comment 10 sounds right.  My only point about prefs.js is that I have no idea whether all the chars we want added have been added.
I can grab this after I get bug 493975 in if this isn't being actively worked on.
I'm going to attach 4 patches that tries to address the points outlined
in dveditz summary above.

(In reply to Daniel Veditz [:dveditz] from comment #10)
>  - add -p option to the diff

Fixed.

>  - document the behavior in the idl

I've added/corrected documentation in a few places; in this patch
and those that follows.

> 2) Would it be better to distinguish an unset list from an empty pref value?
> (Personally I think we should hardcode in the default list to be used if
> prefs aren't loaded or some embeddor forgot to set this one)

Fixed (both the IsVoid and static fallback points).

NOTE
I have made one exception to the list in network.IDN.blacklist_chars:
I excluded SPACE (0x20) because there are numerous tests that fails
if we escape that here.  I'm not sure what the problem is, I'm guessing
we have code that does stuff based on word boundaries and %20 would mess
that up.  Also, %20 looks kind of ugly in UI IMO.  For example if
you type a few space separated words in the URL bar you will see:
Search Google for a%20few%20words
in the popup menu under the URL bar.

FYI, here's the Try failures:
https://tbpl.mozilla.org/?tree=Try&rev=0672d7599421
Attachment #345013 - Attachment is obsolete: true
Attachment #8515629 - Flags: review?(bzbarsky)
(In reply to Daniel Veditz [:dveditz] from comment #10)
> 3) If unsafes are found should we escape _retval instead of unescapedSpec?


This is the first part of the fix for that, the second part comes next.
Attachment #8515630 - Flags: review?(bzbarsky)
(In reply to Daniel Veditz [:dveditz] from comment #10)
> 1) If the unescaping fails what guarantees that all dangerous characters
> were escaped in the first place? Could an attack sneak a bidi char in there
> by making sure there was also something that would trigger this to fail?

IIUC, always checking and escaping the final result should fix that?

https://tbpl.mozilla.org/?tree=Try&rev=446f1af8e77c
Attachment #8515632 - Flags: review?(bzbarsky)
> 4) Does UnEscapeURIForUI mess up if aCharset is UTF-16 due to not unescaping
> %00?

An input value containing %00 (escaped NUL) will pass through
UnEscapeURIForUI as is.  That seems like the desired behavior
to me.  If you disagree, could you elaborate on the problem
and suggest a solution please?
Flags: needinfo?(dveditz)
Flags: needinfo?(bzbarsky)
Blocks: 456375
> That seems like the desired behavior to me.

Not when aCharset is UTF-16.  Because then "a" would be encoded in escaped form as "a%00" (or "%00a" depending on whether it's LE or BE), no?  Or am I missing something?
Flags: needinfo?(bzbarsky) → needinfo?(mats)
I guess I misunderstood your question in comment 8 - I thought by
%00 you meant an already escaped NUL in the input, i.e. the three
ASCII characters '%' '0' '0' (that's why I pointed that out in my
last comment).  I guess that by writing %00 you meant a byte with
the value zero?

So you want to know what the output is for the input string:
        nsAutoCString in;
        in.Append('a');
        in.Append(char(0x0));

for various utf-16 charsets, is that correct?
Flags: needinfo?(mats) → needinfo?(bzbarsky)
I meant, what happens if the input, which is an _escaped_ URI, contains the string "a%00" and claims to be escaped UTF-16LE?

It seems to me that correctly unescaping that should result in the Unicode string "a", no?
Flags: needinfo?(bzbarsky)
As I said in comment 18, "%00" in the input just pass through UnEscapeURIForUI
as is.  That's because we use 'esc_SkipControl' in the NS_UnescapeURL call:
http://mxr.mozilla.org/mozilla-central/source/intl/uconv/nsTextToSubURI.cpp?rev=3c98330df76f#207

If I remove the 'esc_SkipControl' there, we get an unescaped NUL at that point,
but convertURItoUnicode treats that as ASCII here:
http://mxr.mozilla.org/mozilla-central/source/intl/uconv/nsTextToSubURI.cpp?rev=3c98330df76f#159
('isStatefulCharset' is false and IsASCII on the octets 0x61 and 0x00 is true.)
http://mxr.mozilla.org/mozilla-central/source/xpcom/string/nsReadableUtils.cpp#466

So the end result, with 'esc_SkipControl' removed, is the octets 0x61 0x00 0x00 0x00.
Which shows up in the UI as 'a' followed by a hex-glyph for zero (I suspect that
the glyph for NUL might be font-dependent though).
Flags: needinfo?(dveditz)
> "%00" in the input just pass through UnEscapeURIForUI as is.

Yes, that behavior is what my question was about!

> we get an unescaped NUL at that point, but convertURItoUnicode treats that as ASCII here:

That seems like a bug to me, in the case of UTF-16...

All of this is preexisting, so dealing with it in a followup seems fine to me, but it's pretty clearly buggy.  :(
Sounds good to me - filed bug 1094318.
Comment on attachment 8515629 [details] [diff] [review]
part 1, Make UnEscapeURIForUI escape the URI if it contains any 'network.IDN.blacklist_chars' characters (except space)

r=me
Attachment #8515629 - Flags: review?(bzbarsky) → review+
Comment on attachment 8515629 [details] [diff] [review]
part 1, Make UnEscapeURIForUI escape the URI if it contains any 'network.IDN.blacklist_chars' characters (except space)

Though, I'd prever IsEmpty() to IsVoid() for unsafeChars...
To be clear, I mean we should use "if (mUnsafeChars.IsVoid())" to decide whether to get the pref and we should use "mUnsafeChars.IsEmpty() ? sNetworkIDNBlacklistChars : mUnsafeChars" to compute unsafeChars.
Comment on attachment 8515630 [details] [diff] [review]
part 2 - Implement a char16_t version of NS_EscapeURL

>+  aBuffer[i++] = hexChars[(aChar >> 4) & 0xF]; // high nibble

Why do you need the "& 0xF" bit there?

>+  0                                                               // 8x .. Fx

The ".. Fx" part makes no sense to me.  The DEL that was there earlier makes more sense....

-    if (tempBufferPos >= sizeof(tempBuffer) - 4) {
+    if (tempBufferPos >= mozilla::ArrayLength(tempBuffer) - ENCODE_MAX_LEN) {

The old max encode length was 3, and we subtracted 4.  Now it's 6 and we subtract 6.  Why the difference by 1?  Because we no longer try to null-terminate tempBuffer?

r=me
Attachment #8515630 - Flags: review?(bzbarsky) → review+
Comment on attachment 8515631 [details] [diff] [review]
part 3 - Use NS_EscapeURL of _retval as the result

r=me
Attachment #8515631 - Flags: review?(bzbarsky) → review+
Comment on attachment 8515632 [details] [diff] [review]
part 4 - Make UnEscapeURIForUI always escape the result if it contains blacklisted characters (wdiff, for review only)

r=me
Attachment #8515632 - Flags: review?(bzbarsky) → review+
Oh, and we still need docs in the IDL.
(In reply to Boris Zbarsky [:bz] from comment #27)
> To be clear, I mean we should use "if (mUnsafeChars.IsVoid())" to decide
> whether to get the pref and we should use "mUnsafeChars.IsEmpty() ?
> sNetworkIDNBlacklistChars : mUnsafeChars" to compute unsafeChars.

Is there a reason we shouldn't honor an empty string as the pref value?
I don't feel strongly either way; it just seems odd to not honor it
so it might be worth a code comment explaining why we don't do that.

(In reply to Boris Zbarsky [:bz] from comment #31)
> Oh, and we still need docs in the IDL.

I did document the added functionality for unEscapeURIForUI in nsITextToSubURI.idl
(in part 1).  Is there something else that I missed?
Flags: needinfo?(bzbarsky)
(In reply to Boris Zbarsky [:bz] from comment #28)
> Why do you need the "& 0xF" bit there?

Right, I removed it.

> The ".. Fx" part makes no sense to me.  The DEL that was there earlier makes
> more sense....

Well, the "8x  DEL" in the old comment was simply wrong (DEL == 7F).
I changed the "8x .. Fx" to "80 to FF are zero" - does it make sense now?

> The old max encode length was 3, and we subtracted 4.  Now it's 6 and we
> subtract 6.  Why the difference by 1?  Because we no longer try to
> null-terminate tempBuffer?

Exactly.
Attachment #8515630 - Attachment is obsolete: true
> Is there a reason we shouldn't honor an empty string as the pref value?

Mostly because I can't think of a reason anyone would set that except by mistake.

> Is there something else that I missed?

No, I just missed it.  Looks good.

>I changed the "8x .. Fx" to "80 to FF are zero" - does it make sense now?

Yes, much better.
Flags: needinfo?(bzbarsky)
Comment on attachment 8515629 [details] [diff] [review]
part 1, Make UnEscapeURIForUI escape the URI if it contains any 'network.IDN.blacklist_chars' characters (except space)

> nsTextToSubURI::nsTextToSubURI()
> {
>+  mUnsafeChars.SetIsVoid(true);
> }
...
>+  nsXPIDLString mUnsafeChars;

FYI nsXPIDLString defaults to F_VOIDED in its constructor.
Depends on: 1143644
Depends on: 1248812
You need to log in before you can comment on or make changes to this bug.