Closed Bug 415491 Opened 17 years ago Closed 10 years ago

Make UnEscapeURIForUI escape bidirectional formatting characters

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla36

People

(Reporter: Gavin, Assigned: MatsPalmgren_bugz)

References

(Blocks 2 open bugs)

Details

(Keywords: sec-low, Whiteboard: [sg:low] (prevent whack-a-mole in callers))

Attachments

(5 files, 2 obsolete files)

Patch 16 years ago Simon Montagu :smontagu 6.03 KB, patch	bzbarsky : review-	Details \| Diff \| Splinter Review
part 1, Make UnEscapeURIForUI escape the URI if it contains any 'network.IDN.blacklist_chars' characters (except space) 10 years ago Mats Palmgren (inactive) 8.66 KB, patch	bzbarsky : review+	Details \| Diff \| Splinter Review
part 2 - Implement a char16_t version of NS_EscapeURL 10 years ago Mats Palmgren (inactive) 14.31 KB, patch	bzbarsky : review+	Details \| Diff \| Splinter Review
part 3 - Use NS_EscapeURL of _retval as the result 10 years ago Mats Palmgren (inactive) 1.36 KB, patch	bzbarsky : review+	Details \| Diff \| Splinter Review
part 4 - Make UnEscapeURIForUI always escape the result if it contains blacklisted characters (wdiff, for review only) 10 years ago Mats Palmgren (inactive) 1.90 KB, patch	bzbarsky : review+	Details \| Diff \| Splinter Review
(part 4, for checkin) 10 years ago Mats Palmgren (inactive) 3.30 KB, patch		Details \| Diff \| Splinter Review
part 2 - Implement a char16_t version of NS_EscapeURL 10 years ago Mats Palmgren (inactive) 14.32 KB, patch		Details \| Diff \| Splinter Review

:Gavin Sharp [email: gavin@gavinsharp.com]

Reporter

Description

•

17 years ago

The patch for bug 388372 made Firefox's nsIXULBrowserWindow::setOverLink implementation re-encode any bidi formatting characters, per RFC 3987 sections 3.2 and 4.1 paragraph 6. Should this be done by UnEscapeURIForUI instead? That would fix this problem in a more general way.

Boris Zbarsky [:bzbarsky]

Comment 1

•

16 years ago

Yes, it absolutely should! The existing behavior causes security problems unless we play whack-a-mole with each of the 10+ callers of UnEscapeURIForUI in the tree...

Flags: blocking1.9.1?

Flags: blocking1.9.0.4?

Daniel Veditz [:dveditz]

Updated

•

16 years ago

Flags: blocking1.9.0.4? → blocking1.9.0.5?

Simon Montagu :smontagu

Updated

•

16 years ago

Blocks: 436517

Simon Montagu :smontagu

Comment 2

•

16 years ago

I think it should also escape the other characters that RFC 3987 lists as unsafe in IRIs.

Status: NEW → ASSIGNED

Summary: Should UnEscapeURIForUI escape bidirectional formatting characters? → Make UnEscapeURIForUI escape bidirectional formatting characters

Simon Montagu :smontagu

Comment 3

•

16 years ago

Attached patch Patch (obsolete) — Details — Splinter Review

Attachment #345013 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 4

•

16 years ago

Hmm. Can we only escape the unsafe stuff

Simon Montagu :smontagu

Comment 5

•

16 years ago

We could if we rolled our own escaper, but I think it makes just as much sense not to, parallel to the way that we display illegal IRIs in punycode rather than literally.

Daniel Veditz [:dveditz]

Comment 6

•

16 years ago

Probably not a true blocker, but we'd love a branch fix when the trunk is done. Please ask for approval on the reviewed patch.

Flags: blocking1.9.0.5? → wanted1.9.0.x+

Boris Zbarsky [:bzbarsky]

Comment 7

•

16 years ago

Sorry for the lag; I wasn't cced. :(

Boris Zbarsky [:bzbarsky]

Comment 8

•

16 years ago

Comment on attachment 345013 [details] [diff] [review] Patch Please add -p to your default diff options? >+++ b/intl/uconv/src/nsTextToSubURI.cpp >@@ -235,16 +238,41 @@ > // in case of failure, return escaped URI Are we guaranteed that this won't have bidi control chars or whatnot in the hostname in unescaped form, even when it's the escaped hostname? I'm not sure what guarantees that... >+ // if there are any characters that are unsafe for IRIs, reescape. >+ if (mUnsafeChars.IsEmpty()) { Would it make sense to key off IsVoid() instead and make sure the string is void in the constructor, and is set non-void when we do the pref get, even if we get no pref value? Or are we likely to hit this code during startup before our prefs are sane or some such mess? >+ // Note that this reescapes all non-ASCII characters in the URI, not just >+ // the unsafe characters >+ NS_EscapeURL(PromiseFlatCString(unescapedSpec), esc_OnlyNonASCII, >+ reescapedSpec); Shouldn't we escape _retval, not unescapedSpec? Otherwise, if the charset involved is something like UTF-7, unescapedSpec might not in fact have any non-ASCII in it, while after convertURItoUnicode happens it has random non-ASCII stuff. The behavior should be documented in the idl, right? I'm assuming your all.js change is correct; I'm not qualified to review it. A related question: doesn't UnEscapeURIForUI mess up if aCharset is UTF-16, say, due to not unescaping %00? Or am I missing something there?

Boris Zbarsky [:bzbarsky]

Comment 9

•

16 years ago

Comment on attachment 345013 [details] [diff] [review] Patch r- pending response to my question

Attachment #345013 - Flags: review?(bzbarsky) → review-

Benjamin Smedberg

Updated

•

16 years ago

Flags: wanted1.9.1+

Flags: blocking1.9.1?

Flags: blocking1.9.1-

Daniel Veditz [:dveditz]

Comment 10

•

16 years ago

bz: Just to poke this bug awake, is the follow a correct summary of what you wanted? administrivia: - add -p option to the diff - document the behavior in the idl questions: 1) If the unescaping fails what guarantees that all dangerous characters were escaped in the first place? Could an attack sneak a bidi char in there by making sure there was also something that would trigger this to fail? 2) Would it be better to distinguish an unset list from an empty pref value? (Personally I think we should hardcode in the default list to be used if prefs aren't loaded or some embeddor forgot to set this one) 3) If unsafes are found should we escape _retval instead of unescapedSpec? 4) Does UnEscapeURIForUI mess up if aCharset is UTF-16 due to not unescaping %00? The all.js change looks fine, it's what was there before plus the characters moved from browser.js. any super-reviewer can bless that, doesn't need a prefs module owner. Unless I'm missing bz's point

Whiteboard: [sg:low] (prevent whack-a-mole in callers)

Boris Zbarsky [:bzbarsky]

Comment 11

•

16 years ago

Comment 10 sounds right. My only point about prefs.js is that I have no idea whether all the chars we want added have been added.

Shawn Wilsher :sdwilsh

Comment 12

•

16 years ago

I can grab this after I get bug 493975 in if this isn't being actively worked on.

Curtis Koenig [:curtisk-use curtis.koenig+bzATgmail.com]]

Updated

•

13 years ago

Keywords: sec-low

Mats Palmgren (inactive)

Assignee

Comment 13

•

10 years ago

Attached patch part 1, Make UnEscapeURIForUI escape the URI if it contains any 'network.IDN.blacklist_chars' characters (except space) — Details — Splinter Review

I'm going to attach 4 patches that tries to address the points outlined in dveditz summary above. (In reply to Daniel Veditz [:dveditz] from comment #10) > - add -p option to the diff Fixed. > - document the behavior in the idl I've added/corrected documentation in a few places; in this patch and those that follows. > 2) Would it be better to distinguish an unset list from an empty pref value? > (Personally I think we should hardcode in the default list to be used if > prefs aren't loaded or some embeddor forgot to set this one) Fixed (both the IsVoid and static fallback points). NOTE I have made one exception to the list in network.IDN.blacklist_chars: I excluded SPACE (0x20) because there are numerous tests that fails if we escape that here. I'm not sure what the problem is, I'm guessing we have code that does stuff based on word boundaries and %20 would mess that up. Also, %20 looks kind of ugly in UI IMO. For example if you type a few space separated words in the URL bar you will see: Search Google for a%20few%20words in the popup menu under the URL bar. FYI, here's the Try failures: https://tbpl.mozilla.org/?tree=Try&rev=0672d7599421

Attachment #345013 - Attachment is obsolete: true

Attachment #8515629 - Flags: review?(bzbarsky)

Mats Palmgren (inactive)

Assignee

Comment 14

•

10 years ago

Attached patch part 2 - Implement a char16_t version of NS_EscapeURL (obsolete) — Details — Splinter Review

(In reply to Daniel Veditz [:dveditz] from comment #10) > 3) If unsafes are found should we escape _retval instead of unescapedSpec? This is the first part of the fix for that, the second part comes next.

Attachment #8515630 - Flags: review?(bzbarsky)

Mats Palmgren (inactive)

Assignee

Comment 15

•

10 years ago

Attached patch part 3 - Use NS_EscapeURL of _retval as the result — Details — Splinter Review

Attachment #8515631 - Flags: review?(bzbarsky)

Mats Palmgren (inactive)

Assignee

Comment 16

•

10 years ago

Attached patch part 4 - Make UnEscapeURIForUI always escape the result if it contains blacklisted characters (wdiff, for review only) — Details — Splinter Review

(In reply to Daniel Veditz [:dveditz] from comment #10) > 1) If the unescaping fails what guarantees that all dangerous characters > were escaped in the first place? Could an attack sneak a bidi char in there > by making sure there was also something that would trigger this to fail? IIUC, always checking and escaping the final result should fix that? https://tbpl.mozilla.org/?tree=Try&rev=446f1af8e77c

Attachment #8515632 - Flags: review?(bzbarsky)

Mats Palmgren (inactive)

Assignee

Comment 17

•

10 years ago

Attached patch (part 4, for checkin) — Details — Splinter Review

Mats Palmgren (inactive)

Assignee

Comment 18

•

10 years ago

> 4) Does UnEscapeURIForUI mess up if aCharset is UTF-16 due to not unescaping > %00? An input value containing %00 (escaped NUL) will pass through UnEscapeURIForUI as is. That seems like the desired behavior to me. If you disagree, could you elaborate on the problem and suggest a solution please?

Flags: needinfo?(dveditz)

Flags: needinfo?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 19

•

10 years ago

> That seems like the desired behavior to me. Not when aCharset is UTF-16. Because then "a" would be encoded in escaped form as "a%00" (or "%00a" depending on whether it's LE or BE), no? Or am I missing something?

Flags: needinfo?(bzbarsky) → needinfo?(mats)

Mats Palmgren (inactive)

Assignee

Comment 20

•

10 years ago

I guess I misunderstood your question in comment 8 - I thought by %00 you meant an already escaped NUL in the input, i.e. the three ASCII characters '%' '0' '0' (that's why I pointed that out in my last comment). I guess that by writing %00 you meant a byte with the value zero? So you want to know what the output is for the input string: nsAutoCString in; in.Append('a'); in.Append(char(0x0)); for various utf-16 charsets, is that correct?

Flags: needinfo?(mats) → needinfo?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 21

•

10 years ago

I meant, what happens if the input, which is an _escaped_ URI, contains the string "a%00" and claims to be escaped UTF-16LE? It seems to me that correctly unescaping that should result in the Unicode string "a", no?

Flags: needinfo?(bzbarsky)

Mats Palmgren (inactive)

Assignee

Comment 22

•

10 years ago

As I said in comment 18, "%00" in the input just pass through UnEscapeURIForUI as is. That's because we use 'esc_SkipControl' in the NS_UnescapeURL call: http://mxr.mozilla.org/mozilla-central/source/intl/uconv/nsTextToSubURI.cpp?rev=3c98330df76f#207 If I remove the 'esc_SkipControl' there, we get an unescaped NUL at that point, but convertURItoUnicode treats that as ASCII here: http://mxr.mozilla.org/mozilla-central/source/intl/uconv/nsTextToSubURI.cpp?rev=3c98330df76f#159 ('isStatefulCharset' is false and IsASCII on the octets 0x61 and 0x00 is true.) http://mxr.mozilla.org/mozilla-central/source/xpcom/string/nsReadableUtils.cpp#466 So the end result, with 'esc_SkipControl' removed, is the octets 0x61 0x00 0x00 0x00. Which shows up in the UI as 'a' followed by a hex-glyph for zero (I suspect that the glyph for NUL might be font-dependent though).

Flags: needinfo?(dveditz)

Boris Zbarsky [:bzbarsky]

Comment 23

•

10 years ago

> "%00" in the input just pass through UnEscapeURIForUI as is. Yes, that behavior is what my question was about! > we get an unescaped NUL at that point, but convertURItoUnicode treats that as ASCII here: That seems like a bug to me, in the case of UTF-16... All of this is preexisting, so dealing with it in a followup seems fine to me, but it's pretty clearly buggy. :(

Mats Palmgren (inactive)

Assignee

Comment 24

•

10 years ago

Sounds good to me - filed bug 1094318.

Boris Zbarsky [:bzbarsky]

Comment 25

•

10 years ago

Comment on attachment 8515629 [details] [diff] [review] part 1, Make UnEscapeURIForUI escape the URI if it contains any 'network.IDN.blacklist_chars' characters (except space) r=me

Attachment #8515629 - Flags: review?(bzbarsky) → review+

Boris Zbarsky [:bzbarsky]

Comment 26

•

10 years ago

Boris Zbarsky [:bzbarsky]

Comment 27

•

10 years ago

To be clear, I mean we should use "if (mUnsafeChars.IsVoid())" to decide whether to get the pref and we should use "mUnsafeChars.IsEmpty() ? sNetworkIDNBlacklistChars : mUnsafeChars" to compute unsafeChars.

Boris Zbarsky [:bzbarsky]

Comment 28

•

10 years ago

Comment on attachment 8515630 [details] [diff] [review] part 2 - Implement a char16_t version of NS_EscapeURL >+ aBuffer[i++] = hexChars[(aChar >> 4) & 0xF]; // high nibble Why do you need the "& 0xF" bit there? >+ 0 // 8x .. Fx The ".. Fx" part makes no sense to me. The DEL that was there earlier makes more sense.... - if (tempBufferPos >= sizeof(tempBuffer) - 4) { + if (tempBufferPos >= mozilla::ArrayLength(tempBuffer) - ENCODE_MAX_LEN) { The old max encode length was 3, and we subtracted 4. Now it's 6 and we subtract 6. Why the difference by 1? Because we no longer try to null-terminate tempBuffer? r=me

Attachment #8515630 - Flags: review?(bzbarsky) → review+

Boris Zbarsky [:bzbarsky]

Comment 29

•

10 years ago

Comment on attachment 8515631 [details] [diff] [review] part 3 - Use NS_EscapeURL of _retval as the result r=me

Attachment #8515631 - Flags: review?(bzbarsky) → review+

Boris Zbarsky [:bzbarsky]

Comment 30

•

10 years ago

Comment on attachment 8515632 [details] [diff] [review] part 4 - Make UnEscapeURIForUI always escape the result if it contains blacklisted characters (wdiff, for review only) r=me

Attachment #8515632 - Flags: review?(bzbarsky) → review+

Boris Zbarsky [:bzbarsky]

Comment 31

•

10 years ago

Oh, and we still need docs in the IDL.

Mats Palmgren (inactive)

Assignee

Comment 32

•

10 years ago

(In reply to Boris Zbarsky [:bz] from comment #27) > To be clear, I mean we should use "if (mUnsafeChars.IsVoid())" to decide > whether to get the pref and we should use "mUnsafeChars.IsEmpty() ? > sNetworkIDNBlacklistChars : mUnsafeChars" to compute unsafeChars. Is there a reason we shouldn't honor an empty string as the pref value? I don't feel strongly either way; it just seems odd to not honor it so it might be worth a code comment explaining why we don't do that. (In reply to Boris Zbarsky [:bz] from comment #31) > Oh, and we still need docs in the IDL. I did document the added functionality for unEscapeURIForUI in nsITextToSubURI.idl (in part 1). Is there something else that I missed?

Flags: needinfo?(bzbarsky)

Mats Palmgren (inactive)

Assignee

Comment 33

•

10 years ago

Attached patch part 2 - Implement a char16_t version of NS_EscapeURL — Details — Splinter Review

(In reply to Boris Zbarsky [:bz] from comment #28) > Why do you need the "& 0xF" bit there? Right, I removed it. > The ".. Fx" part makes no sense to me. The DEL that was there earlier makes > more sense.... Well, the "8x DEL" in the old comment was simply wrong (DEL == 7F). I changed the "8x .. Fx" to "80 to FF are zero" - does it make sense now? > The old max encode length was 3, and we subtracted 4. Now it's 6 and we > subtract 6. Why the difference by 1? Because we no longer try to > null-terminate tempBuffer? Exactly.

Attachment #8515630 - Attachment is obsolete: true

Boris Zbarsky [:bzbarsky]

Comment 34

•

10 years ago

> Is there a reason we shouldn't honor an empty string as the pref value? Mostly because I can't think of a reason anyone would set that except by mistake. > Is there something else that I missed? No, I just missed it. Looks good. >I changed the "8x .. Fx" to "80 to FF are zero" - does it make sense now? Yes, much better.

Flags: needinfo?(bzbarsky)

Mats Palmgren (inactive)

Assignee

Comment 35

•

10 years ago

OK, I changed it to IsEmpty() and added a code comment to say it's intentional. https://hg.mozilla.org/integration/mozilla-inbound/rev/2024df052302 https://hg.mozilla.org/integration/mozilla-inbound/rev/2a2ffce90d15 https://hg.mozilla.org/integration/mozilla-inbound/rev/927204e2bd36 https://hg.mozilla.org/integration/mozilla-inbound/rev/f884c46fa07d

Assignee: smontagu → mats

neil@parkwaycc.co.uk

Comment 36

•

10 years ago

Comment on attachment 8515629 [details] [diff] [review] part 1, Make UnEscapeURIForUI escape the URI if it contains any 'network.IDN.blacklist_chars' characters (except space) > nsTextToSubURI::nsTextToSubURI() > { >+ mUnsafeChars.SetIsVoid(true); > } ... >+ nsXPIDLString mUnsafeChars; FYI nsXPIDLString defaults to F_VOIDED in its constructor.

Carsten Book [:Tomcat]

Comment 37

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/2024df052302 https://hg.mozilla.org/mozilla-central/rev/2a2ffce90d15 https://hg.mozilla.org/mozilla-central/rev/927204e2bd36 https://hg.mozilla.org/mozilla-central/rev/f884c46fa07d

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla36

Alice0775 White

Updated

•

10 years ago

Depends on: 1143644

Masatoshi Kimura [:emk]

Updated

•

9 years ago

Depends on: 1248812

Masatoshi Kimura [:emk]

Updated

•

4 years ago

Regressions: 1680139

Mathew Hodson

Updated

•

3 years ago

Blocks: 1248812, 1143644

No longer depends on: 1248812, 1143644

Mathew Hodson

Updated

•

3 years ago

No longer blocks: 1143644

Regressions: 1143644

You need to log in before you can comment on or make changes to this bug.