Location bar should not decode precent encoded overlong UTF-8 sequence

VERIFIED FIXED in Firefox 8

Status

()

Firefox
Location Bar
VERIFIED FIXED
8 years ago
5 years ago

People

(Reporter: Masahiro YAMADA, Assigned: Masahiro YAMADA)

Tracking

({verified-aurora, verified-beta})

Trunk
Firefox 8
verified-aurora, verified-beta
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.9.0.14 -
wanted1.9.0.x +
in-testsuite ?

Firefox Tracking Flags

(firefox8 fixed, firefox9 fixed, firefox10 fixed, status1.9.2 wontfix, blocking1.9.1 -, status1.9.1 wanted)

Details

([sg:low] [qa!], URL)

Attachments

(2 attachments, 1 obsolete attachment)

(Assignee)

Description

8 years ago
If URL contains percent encoded invalid UTF-8 byte sequence,
location bar converts it as U+FFFD using function losslessDecodeURI(aURI) in "browser.js".
But U+FFFD means decoding failure. If decoded result contains U+FFFD, we should use original URL string, not decoded result.

It can be used to spoof URL.

For example:
Steps to reproduce
1. Set "ja-JP" to browser's default accetable language
2. Go to "http://www.google.co.jp/search?&q=%E0%80%A2"
3. It will show serach result of "烙「".
   ("烙「" is result of decoding byte sequence [E0][80][A2] by Shift_JIS)
4.Copy URL displayed in Location Bar
5.Paste it to other text editor

Expected result
Pasted text is "http://www.google.co.jp/search?&q=%E0%80%A2"

Actual Result:
Pasted result is "http://www.google.co.jp/search?&q=%EF%BF%BD"
(%EF%BF%BD is percent encoded UTF-8 representation of U+FFFD)
(Assignee)

Comment 1

8 years ago
Created attachment 390221 [details] [diff] [review]
patch rev.1.0 for Trunk
Attachment #390221 - Flags: review?(gavin.sharp)
This will need a test too.

Updated

8 years ago
Flags: in-testsuite?
(Assignee)

Updated

8 years ago
blocking1.9.1: --- → ?
Flags: blocking1.9.0.13?
How is this a spoofing security problem? Your description sounds more like copy/paste dataloss, I don't see how anyone is fooled about anything.
Whiteboard: [sg:needinfo]
If we block on this for 1.9.0.13, we'll block on it for 1.9.1.3, but we really need an answer to comment 3 first.
status1.9.1: --- → ?
Flags: wanted1.9.0.x?
(Assignee)

Comment 5

8 years ago
(In reply to comment #3)
> How is this a spoofing security problem? Your description sounds more like
> copy/paste dataloss, I don't see how anyone is fooled about anything.

No, it is not only copy/paste problem. Is it example.
This is data loss of decodeURI().

Almostly, decodeURI throws exception when string does not seems to be UTF-8.
But, if byte sequence matches some malformed multibyte representation (It is not prohibited at RFC 2279, but prohibited at RFC 3929), decodeURI() decodes it as U+FFFD, not throwing exception.

list of malformed otetects

   Char. number range  |        UTF-8 octet sequence
      (hexadecimal)    |              (binary)
   --------------------+---------------------------------------------
   0000 0000-0000 007F | 1100000x 10xxxxxx
   0000 0000-0000 07FF | 11100000 100xxxxx 10xxxxxx
   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
   0000 0000-0000 FFFF | 11110000 10xxxxxx 10xxxxxx 10xxxxxx

This problem can also break History, Autocomplete, Text editing on location bar.

And, At Trunk, It might be better to fix decodeURI() of JavaScript to throw exeception.
(Assignee)

Comment 6

8 years ago
This problem depends on How decodeURI() handle overlong UTF-8 sequences.
When broken UTF-8 sequence other than overlong sequences, decodeURI() throws exception.
But, if broken UTF-8 sequence is overlong sequence, it is decoded as U+FFFD.
(See also bug 172699.)

This problem breaks autocomplete of locationbar because autocomplete uses decoded URL, not original URL.

after visiting http://www.google.co.jp/search?&q=%E0%80%A2 , 
when typing "http://www.google.co.jp/search?&q=%E0%80",
autocomplete shows "http://www.google.co.jp/search?&q=%E0%80%A2",
 but, completed URL is decoded URL.
It is "http://www.google.co.jp/search?&q=%EF%BF%BD"
So, attacker can spoof URL using autocomplete.
"spoofing" seems too strong a term if you're limited to uFFFD characters. We'll approve a reviewed patch on the branch to fix the dataloss but not blocking.
Flags: wanted1.9.0.x?
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.13?
Flags: blocking1.9.0.13-
Comment on attachment 390221 [details] [diff] [review]
patch rev.1.0 for Trunk

comment should be:

// Presence of the replacement character means the decoding failed

Looks OK to me, but lets see what Dao has to say.
Attachment #390221 - Flags: review?(gavin.sharp) → review?(dao)
blocking1.9.1: ? → -
status1.9.1: ? → wanted

Updated

8 years ago
Attachment #390221 - Flags: review?(dao) → review+

Comment 9

8 years ago
Comment on attachment 390221 [details] [diff] [review]
patch rev.1.0 for Trunk

>+  //Exissance of replacement character, means failure of decode.

Please rewrite as suggested by Gavin (including the space after //). And if you think decodeURI should throw instead of using the replacement character, please file a bug on that and add the bug number to this code comment.

>+  if (value.indexOf("\ufffd")>=0) {
>+     value = aURI.spec
>+  }

Remove { and }, add a space before and after >=, remove a space before "value = aURI.spec" and append a semicolon.
(Assignee)

Comment 10

8 years ago
Created attachment 390630 [details] [diff] [review]
fixed patch
Comment on attachment 390630 [details] [diff] [review]
fixed patch

>+  if (value.indexOf("\ufffd")>=0)

spaces around >= are still missing
(Assignee)

Comment 12

8 years ago
Created attachment 390633 [details] [diff] [review]
Fixed patch rev.2
Attachment #390630 - Attachment is obsolete: true
Hate feeling like I'm in an echo chamber, but test *please*?  This won't make it on branch without one I bet...
(Assignee)

Updated

8 years ago
Summary: URL spoofing using percent encoded illegal UTF-8 byte sequence → Location bar should not decode precent encoded overlong UTF-8 sequence
Can this land?
Assignee: nobody → masa141421356
(Assignee)

Comment 15

8 years ago
Some overlong UTF-8 sequence is not converted to U+FFFD.
It is also needed to fix bug 511859.
Depends on: 511859
(Assignee)

Comment 16

8 years ago
As result of fix of bug 511859, left problems are only two patterns
(%EF%BF%BE  = U+FFFE,  %EF%BF%BF = U+FFFF).

At current implementation, decodeURI decodes them to U+FFFD.

If decodeURI() decodes them as U+FFFE and U+FFFF, patch is not needed.
(See also :bug 520095)

Updated

8 years ago
Depends on: 520095
(Assignee)

Comment 17

8 years ago
> 
> If decodeURI() decodes them as U+FFFE and U+FFFF, patch is not needed.
> (See also :bug 520095)

Oops, What I want to say is, 
If decodeURI() decodes them as U+FFFE and U+FFFF, current patch will be obsolete.
But,I don't know how location bar will show U+FFFE and U+FFFF.
If showing U+FFFE and U+FFFF causes another problem , new patch will be needed.
(Assignee)

Comment 18

6 years ago
Now all cases are fixed by bug 511859 and bug 520095.
-->FIXED.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
This bug (or rather 520095) does not appear critical enough to take the chance of regression in 3.6. status1.9.2->wontfix
status1.9.2: --- → wontfix
Group: core-security
status-firefox10: --- → fixed
status-firefox8: --- → fixed
status-firefox9: --- → fixed
Whiteboard: [sg:needinfo] → [sg:low]
Target Milestone: --- → Firefox 8
Group: core-security
Group: core-security
The issue is no longer reproducible on:
Mozilla/5.0 (Windows NT 5.1; rv:11.0a1) Gecko/20111113 Firefox/11.0a1
Mozilla/5.0 (Windows NT 5.1; rv:10.0a2) Gecko/20111113 Firefox/10.0a2
Mozilla/5.0 (Windows NT 5.1; rv:9.0) Gecko/20100101 Firefox/9.0

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0) Gecko/20100101 Firefox/9.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0a2) Gecko/20111113 Firefox/10.0a2
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0a1) Gecko/20111113 Firefox/11.0a1

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:9.0) Gecko/20100101 Firefox/9.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:10.0a2) Gecko/20111113 Firefox/10.0a2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0a1) Gecko/20111113 Firefox/11.0a1

Mozilla/5.0 (X11; Linux x86_64; rv:11.0a1) Gecko/20111113 Firefox/11.0a1
Mozilla/5.0 (X11; Linux x86_64; rv:10.0a2) Gecko/20111114 Firefox/10.0a2
Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20100101 Firefox/9.0

IMO this is verified fixed.
Marking VERIFIED per previous comment.
Status: RESOLVED → VERIFIED
Keywords: verified-aurora, verified-beta
Whiteboard: [sg:low] → [sg:low] [qa!]
(Assignee)

Comment 22

5 years ago
As result of bug 511859, bug 660612,  all known problem of invalid UTF-8 is fixed.
I agree to mark this bug as VERIFIED.
Is this the right behavior - bug 750266 ?
(Assignee)

Comment 24

5 years ago
(In reply to Paul Silaghi [QA] from comment #23)
> Is this the right behavior - bug 750266 ?

I think , bug 750266 is Copy & Paste problem.
Target of this bug is INVALID UTF-8 sequence, not valid UTF-8 sequence.
URL written in bug 750266 contains only valid UTF-8 sequence.
You need to log in before you can comment on or make changes to this bug.