Last Comment Bug 505991 - Location bar should not decode precent encoded overlong UTF-8 sequence
: Location bar should not decode precent encoded overlong UTF-8 sequence
Status: VERIFIED FIXED
[sg:low] [qa!]
: verified-aurora, verified-beta
Product: Firefox
Classification: Client Software
Component: Location Bar (show other bugs)
: Trunk
: All All
: -- normal (vote)
: Firefox 8
Assigned To: Masahiro YAMADA
:
: Marco Bonardo [::mak]
Mentors:
http://www.google.co.jp/search?&q=%E0...
Depends on: 511859 520095
Blocks:
  Show dependency treegraph
 
Reported: 2009-07-23 06:59 PDT by Masahiro YAMADA
Modified: 2012-06-04 04:21 PDT (History)
10 users (show)
dveditz: blocking1.9.0.14-
dveditz: wanted1.9.0.x+
sdwilsh: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed
fixed
fixed
wontfix
-
wanted


Attachments
patch rev.1.0 for Trunk (1.21 KB, patch)
2009-07-23 07:02 PDT, Masahiro YAMADA
dao+bmo: review+
Details | Diff | Splinter Review
fixed patch (1.21 KB, patch)
2009-07-25 00:45 PDT, Masahiro YAMADA
no flags Details | Diff | Splinter Review
Fixed patch rev.2 (1.21 KB, patch)
2009-07-25 01:23 PDT, Masahiro YAMADA
no flags Details | Diff | Splinter Review

Description Masahiro YAMADA 2009-07-23 06:59:24 PDT
If URL contains percent encoded invalid UTF-8 byte sequence,
location bar converts it as U+FFFD using function losslessDecodeURI(aURI) in "browser.js".
But U+FFFD means decoding failure. If decoded result contains U+FFFD, we should use original URL string, not decoded result.

It can be used to spoof URL.

For example:
Steps to reproduce
1. Set "ja-JP" to browser's default accetable language
2. Go to "http://www.google.co.jp/search?&q=%E0%80%A2"
3. It will show serach result of "烙「".
   ("烙「" is result of decoding byte sequence [E0][80][A2] by Shift_JIS)
4.Copy URL displayed in Location Bar
5.Paste it to other text editor

Expected result
Pasted text is "http://www.google.co.jp/search?&q=%E0%80%A2"

Actual Result:
Pasted result is "http://www.google.co.jp/search?&q=%EF%BF%BD"
(%EF%BF%BD is percent encoded UTF-8 representation of U+FFFD)
Comment 1 Masahiro YAMADA 2009-07-23 07:02:43 PDT
Created attachment 390221 [details] [diff] [review]
patch rev.1.0 for Trunk
Comment 2 Shawn Wilsher :sdwilsh 2009-07-23 07:14:12 PDT
This will need a test too.
Comment 3 Daniel Veditz [:dveditz] 2009-07-23 08:49:18 PDT
How is this a spoofing security problem? Your description sounds more like copy/paste dataloss, I don't see how anyone is fooled about anything.
Comment 4 Samuel Sidler (old account; do not CC) 2009-07-23 10:28:03 PDT
If we block on this for 1.9.0.13, we'll block on it for 1.9.1.3, but we really need an answer to comment 3 first.
Comment 5 Masahiro YAMADA 2009-07-23 14:18:49 PDT
(In reply to comment #3)
> How is this a spoofing security problem? Your description sounds more like
> copy/paste dataloss, I don't see how anyone is fooled about anything.

No, it is not only copy/paste problem. Is it example.
This is data loss of decodeURI().

Almostly, decodeURI throws exception when string does not seems to be UTF-8.
But, if byte sequence matches some malformed multibyte representation (It is not prohibited at RFC 2279, but prohibited at RFC 3929), decodeURI() decodes it as U+FFFD, not throwing exception.

list of malformed otetects

   Char. number range  |        UTF-8 octet sequence
      (hexadecimal)    |              (binary)
   --------------------+---------------------------------------------
   0000 0000-0000 007F | 1100000x 10xxxxxx
   0000 0000-0000 07FF | 11100000 100xxxxx 10xxxxxx
   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
   0000 0000-0000 FFFF | 11110000 10xxxxxx 10xxxxxx 10xxxxxx

This problem can also break History, Autocomplete, Text editing on location bar.

And, At Trunk, It might be better to fix decodeURI() of JavaScript to throw exeception.
Comment 6 Masahiro YAMADA 2009-07-24 14:05:29 PDT
This problem depends on How decodeURI() handle overlong UTF-8 sequences.
When broken UTF-8 sequence other than overlong sequences, decodeURI() throws exception.
But, if broken UTF-8 sequence is overlong sequence, it is decoded as U+FFFD.
(See also bug 172699.)

This problem breaks autocomplete of locationbar because autocomplete uses decoded URL, not original URL.

after visiting http://www.google.co.jp/search?&q=%E0%80%A2 , 
when typing "http://www.google.co.jp/search?&q=%E0%80",
autocomplete shows "http://www.google.co.jp/search?&q=%E0%80%A2",
 but, completed URL is decoded URL.
It is "http://www.google.co.jp/search?&q=%EF%BF%BD"
So, attacker can spoof URL using autocomplete.
Comment 7 Daniel Veditz [:dveditz] 2009-07-24 15:43:48 PDT
"spoofing" seems too strong a term if you're limited to uFFFD characters. We'll approve a reviewed patch on the branch to fix the dataloss but not blocking.
Comment 8 :Gavin Sharp [email: gavin@gavinsharp.com] 2009-07-24 16:02:45 PDT
Comment on attachment 390221 [details] [diff] [review]
patch rev.1.0 for Trunk

comment should be:

// Presence of the replacement character means the decoding failed

Looks OK to me, but lets see what Dao has to say.
Comment 9 Dão Gottwald [:dao] 2009-07-25 00:27:17 PDT
Comment on attachment 390221 [details] [diff] [review]
patch rev.1.0 for Trunk

>+  //Exissance of replacement character, means failure of decode.

Please rewrite as suggested by Gavin (including the space after //). And if you think decodeURI should throw instead of using the replacement character, please file a bug on that and add the bug number to this code comment.

>+  if (value.indexOf("\ufffd")>=0) {
>+     value = aURI.spec
>+  }

Remove { and }, add a space before and after >=, remove a space before "value = aURI.spec" and append a semicolon.
Comment 10 Masahiro YAMADA 2009-07-25 00:45:19 PDT
Created attachment 390630 [details] [diff] [review]
fixed patch
Comment 11 Dão Gottwald [:dao] 2009-07-25 00:46:50 PDT
Comment on attachment 390630 [details] [diff] [review]
fixed patch

>+  if (value.indexOf("\ufffd")>=0)

spaces around >= are still missing
Comment 12 Masahiro YAMADA 2009-07-25 01:23:09 PDT
Created attachment 390633 [details] [diff] [review]
Fixed patch rev.2
Comment 13 Shawn Wilsher :sdwilsh 2009-07-25 08:15:37 PDT
Hate feeling like I'm in an echo chamber, but test *please*?  This won't make it on branch without one I bet...
Comment 14 Dão Gottwald [:dao] 2009-07-30 23:46:31 PDT
Can this land?
Comment 15 Masahiro YAMADA 2009-08-21 05:56:01 PDT
Some overlong UTF-8 sequence is not converted to U+FFFD.
It is also needed to fix bug 511859.
Comment 16 Masahiro YAMADA 2009-10-09 07:59:37 PDT
As result of fix of bug 511859, left problems are only two patterns
(%EF%BF%BE  = U+FFFE,  %EF%BF%BF = U+FFFF).

At current implementation, decodeURI decodes them to U+FFFD.

If decodeURI() decodes them as U+FFFE and U+FFFF, patch is not needed.
(See also :bug 520095)
Comment 17 Masahiro YAMADA 2009-10-09 08:05:40 PDT
> 
> If decodeURI() decodes them as U+FFFE and U+FFFF, patch is not needed.
> (See also :bug 520095)

Oops, What I want to say is, 
If decodeURI() decodes them as U+FFFE and U+FFFF, current patch will be obsolete.
But,I don't know how location bar will show U+FFFE and U+FFFF.
If showing U+FFFE and U+FFFF causes another problem , new patch will be needed.
Comment 18 Masahiro YAMADA 2011-07-21 23:10:20 PDT
Now all cases are fixed by bug 511859 and bug 520095.
-->FIXED.
Comment 19 Alex Keybl [:akeybl] 2011-10-27 09:28:10 PDT
This bug (or rather 520095) does not appear critical enough to take the chance of regression in 3.6. status1.9.2->wontfix
Comment 20 Paul Silaghi, QA [:pauly] 2011-11-14 08:03:06 PST
The issue is no longer reproducible on:
Mozilla/5.0 (Windows NT 5.1; rv:11.0a1) Gecko/20111113 Firefox/11.0a1
Mozilla/5.0 (Windows NT 5.1; rv:10.0a2) Gecko/20111113 Firefox/10.0a2
Mozilla/5.0 (Windows NT 5.1; rv:9.0) Gecko/20100101 Firefox/9.0

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0) Gecko/20100101 Firefox/9.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0a2) Gecko/20111113 Firefox/10.0a2
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0a1) Gecko/20111113 Firefox/11.0a1

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:9.0) Gecko/20100101 Firefox/9.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:10.0a2) Gecko/20111113 Firefox/10.0a2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0a1) Gecko/20111113 Firefox/11.0a1

Mozilla/5.0 (X11; Linux x86_64; rv:11.0a1) Gecko/20111113 Firefox/11.0a1
Mozilla/5.0 (X11; Linux x86_64; rv:10.0a2) Gecko/20111114 Firefox/10.0a2
Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0

IMO this is verified fixed.
Comment 21 Mihaela Velimiroviciu (:mihaelav) 2011-11-14 08:04:53 PST
Marking VERIFIED per previous comment.
Comment 22 Masahiro YAMADA 2011-11-14 22:43:44 PST
As result of bug 511859, bug 660612,  all known problem of invalid UTF-8 is fixed.
I agree to mark this bug as VERIFIED.
Comment 23 Paul Silaghi, QA [:pauly] 2012-05-28 09:27:25 PDT
Is this the right behavior - bug 750266 ?
Comment 24 Masahiro YAMADA 2012-06-04 04:21:38 PDT
(In reply to Paul Silaghi [QA] from comment #23)
> Is this the right behavior - bug 750266 ?

I think , bug 750266 is Copy & Paste problem.
Target of this bug is INVALID UTF-8 sequence, not valid UTF-8 sequence.
URL written in bug 750266 contains only valid UTF-8 sequence.

Note You need to log in before you can comment on or make changes to this bug.