Alternate URL encodings are not the same (normalization)

NEW
Unassigned

Core
Networking: Cookies
P5
normal
9 years ago
9 months ago

People

(Reporter: dr.kral, Unassigned)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [necko-would-take])

(Reporter)

Description

9 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.4) Gecko/20091016 Firefox/3.5.4 (.NET CLR 3.5.30729)

Alternate ways of specifying URL are not treated the same even when the location bar display is the same.

Example -- www.asstr.com/%7Ename  and  www.asstr.com/~name are both displayed in the latter form, but they are treated as different paths in the cookies list.



Reproducible: Always

Steps to Reproduce:
1. Create a web page with a ~ in the name that writes a cookie.
2. Access it using both ~ and %7E in the url.
3. Note that the url in the location bar always shows ~ once the page is accessed.
4. Note that there are two cookies with different paths in the cookie list.
Actual Results:  
I get two cookies based on "different" paths that look the same to me in the location bar since the encoding is always converted back to simple characters.

Expected Results:  
The url is displayed the same so should have the same cookie path.

Also the same problem in Opera and Safari(win)

IE7 does this correctly.
Component: General → Networking: Cookies
Product: Firefox → Core
QA Contact: general → networking.cookies
Version: unspecified → 1.9.1 Branch
This seems to be intended, if I read bug 269751 comment #1 correctly.
(Reporter)

Comment 2

9 years ago
I have found a reference which I believe supports my case that when the given URL is equivalent to the one displayed the path name should also be the same.  Intuitively it is the way that people operate.

Please see:  http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html  which is part of "Hypertext Transfer Protocol -- HTTP/1.1" 

The relevant section is:
<quote>

3.2.3 URI Comparison

When comparing two URIs to decide if they match or not, a client SHOULD use a case-sensitive octet-by-octet comparison of the entire URIs, with these exceptions:

      - A port that is empty or not given is equivalent to the default
        port for that URI-reference;

      - Comparisons of host names MUST be case-insensitive;

      - Comparisons of scheme names MUST be case-insensitive;

      - An empty abs_path is equivalent to an abs_path of "/".

Characters other than those in the "reserved" and "unsafe" sets (see RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.

For example, the following three URIs are equivalent:

      http://abc.com:80/~smith/home.html
      http://ABC.com/%7Esmith/home.html
      http://ABC.com:/%7esmith/home.html

</quote>
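
To make the quoted rules concrete, here is a minimal Python sketch of the comparison. It is only an illustration of section 3.2.3, not any browser's code. The names (normalize, equivalent, KEEP_ESCAPED) are made up for this example, and the set of characters left escaped is an assumption based on the "reserved", "delims" and "unwise" sets of RFC 2396. IPv6 host literals are not handled.

```python
import re
from urllib.parse import urlsplit

# Assumption: characters that stay escaped (RFC 2396 reserved, delims, unwise, space).
KEEP_ESCAPED = set(";/?:@&=+$,") | set('<>#%" {}|\\^[]`')
DEFAULT_PORTS = {"http": 80, "https": 443}


def _decode_safe_escapes(text):
    """Decode %HH only for printable ASCII characters outside KEEP_ESCAPED."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        if ch in KEEP_ESCAPED or not (0x20 < ord(ch) < 0x7F):
            # Leave the escape in place, but normalize the hex digits' case.
            return "%" + match.group(1).upper()
        return ch
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)


def normalize(uri):
    """Reduce a URI to a tuple that can be compared octet by octet."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()                # scheme is case-insensitive
    hostport = parts.netloc.rpartition("@")[2]   # drop any userinfo
    host, _, port_text = hostport.partition(":")
    # An empty or missing port means the default port for the scheme.
    port = int(port_text) if port_text else DEFAULT_PORTS.get(scheme)
    path = _decode_safe_escapes(parts.path) or "/"   # empty path equals "/"
    query = _decode_safe_escapes(parts.query)
    return (scheme, host.lower(), port, path, query)


def equivalent(a, b):
    return normalize(a) == normalize(b)
```

With this sketch the three example URIs from the RFC compare as equivalent, while an escaped "/" (%2F) is still treated as different from a literal "/".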

Comment 3

9 years ago
We already perform normalization of the host before comparison - UTF8 characters are normalized to ACE, for instance, and everything is lowercased.

My gut reaction is that we should unescape the path. CC'ing dveditz and Adam Barth in case they have thoughts on this.
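
As a rough illustration of what "unescape the path" could mean for the comparison, here is a Python sketch. The function name path_matches and the unescape flag are hypothetical, and the prefix rule is the usual cookie path-matching convention rather than the actual cookie service code. Note that unquote() decodes every escape, including %2F, which a real implementation would probably want to leave alone.

```python
from urllib.parse import unquote


def path_matches(cookie_path, request_path, unescape=True):
    """Return True if a cookie stored for cookie_path applies to request_path."""
    if unescape:
        # Compare the unescaped forms, so "%7E" and "~" are the same.
        cookie_path = unquote(cookie_path)
        request_path = unquote(request_path)
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # Only match on a path-segment boundary.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False
```

Here path_matches("/~name", "/%7Ename/page.htm") is true with unescaping and false with unescape=False, which is the difference the reporter is seeing.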

Comment 4

9 years ago
I can write a test for the http-state test suite and give you a survey of what other browsers do here.  I'll be able to do that in the next couple of days.
(Reporter)

Comment 5

9 years ago
A simple test writing a cookie with URL= host/~path and then attempting to read with URL = host/%7epath yields the following

The cookie is found:
     IE7 and Chrome3  -- Yes
     FF3, Opera10 and Safari4(windows) -- No

The URL in the address bar is changed to <host/~path> from <host/%7epath>:
     IE7, Chrome3, FF3, and Opera10  -- Yes
     Safari4(windows) -- No

Comment 6

9 years ago
I agree with Dan and the reporter that we should follow IE and the RFC here.

Comment 7

9 years ago
(In reply to comment #5)
> The URL in the address bar is changed to <host/~path> from <host/%7epath>:
>      IE7, Chrome3, FF3, and Opera10  -- Yes
>      Safari4(windows) -- No

Hmm. So Safari4 just doesn't set (or otherwise doesn't render accessible) the cookie for <host/~path> at all? Curious.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows Vista → All
Hardware: x86 → All
Version: 1.9.1 Branch → Trunk
(Reporter)

Comment 8

9 years ago
(In reply to comment #7)
Safari considers <host/~path> and <host/%7Epath> as different as <host/~path2>.  I don't have the tools to determine if Safari or the host server does the unescaping to fetch the correct page.

Permit me to explain in more detail.  Consider a web page at host/~abc/file.htm with a relative link to secondfile.htm.

===================
In IE7
start with url=host/~abc/file.htm and write a cookie I get
  address bar = host/~abc/file.htm
  cookie host = host
  cookie path = ~abc
  link page in address bar = host/~abc/secondfile.htm and cookie with path= ~abc is available

start with url=host/%7Eabc/file.htm and write a cookie I get
  address bar = host/~abc/file.htm
  cookie host = host
  cookie path = ~abc
  link page in address bar = host/~abc/secondfile.htm and cookie with path= ~abc is available

It appears that IE unescapes the URL at the beginning and there is never any ambiguity. (I think that this is the correct way and in accordance with the RFC previously quoted.)

======================
In FF 3.5
start with url=host/~abc/file.htm and write a cookie I get
  address bar = host/~abc/file.htm
  cookie host = host
  cookie path = ~abc
  link page in address bar = host/~abc/secondfile.htm and cookie with path= ~abc is available

start with url=host/%7Eabc/file.htm and write a cookie I get
  address bar = host/~abc/file.htm
  cookie host = host
  cookie path = %7Eabc
  link page in address bar = host/~abc/secondfile.htm and cookie with path= %7Eabc is available

It appears that FF unescapes the URL for display but not internally, so that relative links are still escaped, as is the cookie path.  (I think that this is confusing behavior.)

=======================
In Safari4(windows) 
start with url=host/~abc/file.htm and write a cookie I get
  address bar = host/~abc/file.htm
  cookie host = host
  cookie path = ~abc
  link page in address bar = host/~abc/secondfile.htm and cookie with path= ~abc is available

start with url=host/%7Eabc/file.htm and write a cookie I get
  address bar = host/%7Eabc/file.htm
  cookie host = host
  cookie path = %7Eabc
  link page in address bar = host/%7Eabc/secondfile.htm and cookie with path= %7Eabc is available

It appears that Safari never unescapes the URL so that the two are different in all places.  (I think that this is consistent but incorrect.)
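
The three behaviors above can be summarized as a toy model in Python. This only restates my observations; it is not how any of the browsers is implemented, and the names (BROWSERS, observe) and the two flags are invented for the illustration.

```python
from urllib.parse import unquote

# (unescape before storing the location, unescape for the address bar)
# Assumption: these two flags are enough to describe what was observed.
BROWSERS = {
    "IE7": (True, True),
    "FF 3.5": (False, True),
    "Safari4(windows)": (False, False),
}


def observe(browser, url):
    """Return (address bar text, cookie path) for a url like host/%7Eabc/file.htm."""
    unescape_stored, unescape_shown = BROWSERS[browser]
    stored = unquote(url) if unescape_stored else url
    shown = unquote(stored) if unescape_shown else stored
    # The cookie path is taken from the stored location, not the displayed one.
    cookie_path = stored.split("/")[1]
    return shown, cookie_path
```

For host/%7Eabc/file.htm this gives (host/~abc/file.htm, ~abc) for IE7, (host/~abc/file.htm, %7Eabc) for FF 3.5 and (host/%7Eabc/file.htm, %7Eabc) for Safari4, matching the lists above.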

Comment 9

9 years ago
Here's my test case.  The server responds with a redirect and the following headers:

Set-Cookie: foo=bar; path=/cookie-parser-result/f%6Fo/bar
Location: /cookie-parser-result/foo/bar?path0028

When the browser (IE8, Firefox 3.5, Safari 4, Chrome 4) arrives at /cookie-parser-result/foo/bar?path0028, the cookie is not visible.

The only browser I've tested that behaves as you describe (i.e., unescapes the path before comparing against the path) is Opera 10.

I've added this to the http-state test suite as path0028.
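
For anyone who wants to try this locally, a small stand-in server could look like the Python sketch below. It is not the http-state test harness. The /start entry point, the respond and make_server helpers, and the port are all assumptions made for the illustration; only the two header values come from the test case above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

START = "/start"  # hypothetical entry point, not part of the real test suite


def respond(path, cookie_header):
    """Return (status, headers, body) for a request path and Cookie header."""
    if path == START:
        headers = [
            ("Set-Cookie", "foo=bar; path=/cookie-parser-result/f%6Fo/bar"),
            ("Location", "/cookie-parser-result/foo/bar?path0028"),
        ]
        return 302, headers, b""
    # Any other URL just reports which Cookie header the browser sent.
    text = "Cookie header received: " + (cookie_header or "(none)")
    return 200, [("Content-Type", "text/plain; charset=utf-8")], text.encode("utf-8")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers, body = respond(self.path, self.headers.get("Cookie"))
        self.send_response(status)
        for name, value in headers:
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), Handler)
```

Calling make_server().serve_forever() and visiting /start in a browser then shows whether the cookie set for the escaped path is sent back to the unescaped path.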

Can you supply a working test case (e.g., a server) that reproduces the issue you're seeing?
(Reporter)

Comment 10

9 years ago
(In reply to comment #9)
I hope I have not misled you.  I don't have the tools to examine what is happening INSIDE the browser or to determine who/when the unescaping is done or the exchanges between the browser and server.  I can only see the display in the location bar and the path given in the cookie list.

I looked at my cookie list and noticed that all but one are from domains that are private (i.e., single company) and all set the path to "/".  The one exception I found is a shared domain, i.e., www.asstr.org, where users are given subdomains beginning with "~".  These are users that rarely use cookies.

I also do not know how many levels of path are significant.

> When the browser (IE8, Firefox 3.5, Safari 4, Chrome 4) arrives at
> /cookie-parser-result/foo/bar?path0028, the cookie is not visible.
If you set the cookie starting with /cookie-parser-result/f%6Fo/bar that is what I expect except in IE.


> Can you supply a working test case (e.g., a server) that reproduces the issue
> you're seeing?

Here is a location that you can try.   
      http://www.asstr.org/~YLeeCoyote/yourway.htm

Go to the location and set some parameters.  I suggest setting a color at the option "Set a starting color for:" (as it is easy to see later) and then save the cookie.

Change the location bar to 
     http://www.asstr.org/%7EYLeeCoyote/yourway.htm 
and repeat with different value(s).  Note in IE the first cookie is found but not in FF.

Check the cookie list and see that there are two cookies in FF but only one in IE.  Both IE and FF show the unescaped form of the url in the location bar.

Reload the location bar with each form and restore saved data as prompted.  See that you get the last saved values in IE and two different ones in FF.

Comment 11

9 years ago
> Change the location bar to 
>      http://www.asstr.org/%7EYLeeCoyote/yourway.htm 
> and repeat with different value(s).  Note in IE the first cookie is found but
> not in FF.

I understand now.  I suspect the issue is not whether the path comparison algorithm in the cookie manager is sensitive or insensitive to URL escaping, the issue is that Firefox is not canonicalizing URLs entered in the location bar.

Can you reproduce the issue without typing in the location bar (e.g., by following hyperlinks or redirects)?
(Reporter)

Comment 12

9 years ago
(In reply to comment #11)
> Can you reproduce the issue without typing in the location bar (e.g., by
> following hyperlinks or redirects)?

The short answer is YES.  In fact, that is how I found the problem for I followed a link that was escaped.

If I enter a url such as .../%7Epath.... or follow such a link I get the same results.  The location bar display is .../~path... but the cookie path is /%7Epath.

If the page has a relative link, then the linked (second) page is  .../%7Epath/secondfile...  rather than .../~path/secondfile....   But if the first page has a base tag with href=".../~path..." then the second page is now .../~path/secondfile... and uses the other cookie.

With the confidence of great ignorance of internals, I suggest that the original url is stored as the "location" and then it is unescaped for display in the location bar.  The original location is used for the cookie path and for base for relative links.

I suggest that the original url should be unescaped BEFORE it is stored as the "location" and then all would be consistent.
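
The relative-link effect can be reproduced outside the browser with a short Python sketch. This uses the standard urljoin() only as a stand-in for whatever the browser does internally, and the function name follow_relative_link and its unescape_first flag are invented here. unquote() decodes every escape, so a real fix would need to be more selective.

```python
from urllib.parse import unquote, urljoin


def follow_relative_link(location, href, unescape_first=False):
    """Resolve href against the stored location of the current page."""
    if unescape_first:
        # Suggested behavior: unescape once, before the location is stored.
        location = unquote(location)
    return urljoin(location, href)
```

Resolving secondfile.htm against http://host/%7Eabc/file.htm keeps the %7E in the result, while with unescape_first=True the result uses ~ and so lines up with the other cookie.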

Comment 13

9 years ago
> If the page has a relative link,

Oh, I meant by following hyperlinks from a URL that doesn't have this issue.  It's clear that once you confuse Firefox by typing this kind of URL into the location bar that Firefox can remain confused.  The question is can it get confused without typing in the location bar.
(Reporter)

Comment 14

9 years ago
(In reply to comment #13)
YES.

You can see this by using the example I gave in #10 above.  Go to both links and set different parameters so that you can tell which cookie is being used.  Then go back and use the link on that page "return to directory" (another page that uses the cookie data) to see that the %7E form is retained as the "internal location" when you start with the second link even though it is displayed with the ~.

Comment 15

9 years ago
I see.  Yeah, this behavior is definitely wrong.  The problem seems to be that the cookie manager is using the unescaped URL instead of the canonicalized URL.  Thanks for being patient with me.
(Reporter)

Comment 16

9 years ago
(In reply to comment #15)
>  Thanks for being patient with me.

Not a problem.  Sometimes it is difficult to clearly explain things without knowing about the internals and having the correct vocabulary.  You have also made me think more about the overall issue.

I think that the problem is more pervasive than just the cookie manager using the unescaped URL -- with the exception of the location bar display EVERYTHING (that I can examine) uses the unescaped URL.  By everything I mean the cookie manager, the bookmark manager, the history manager and most importantly, the internal location string which is used for relative links from the page.

I have tried using relative links and the unescaped url is used.

I have tried a bookmark and it saves and shows the unescaped url.

I have tried getting the DOM location with JavaScript and it shows the unescaped url.  
Just use: <script type="text/javascript">alert(window.location.href)</script>

and, of course, the cookie manager uses the unescaped url.

============

Please make any changes to the title and classification that are necessary.  I'm not qualified to do that, but certainly more than 'networking cookies' is involved.

When this is corrected, it probably would be good to clean up the bookmark, history and cookies files.  The first two I think would be easy and safe just to edit and unescape.  The cookies could provide a conflict if there are both escaped and unescaped versions that are different.
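
A sketch of how such conflicts might be detected during a cleanup, in Python. The (host, path, name, value) tuple layout and the function name find_conflicts are assumptions for the illustration and do not reflect the real cookies file format.

```python
from urllib.parse import unquote


def find_conflicts(cookies):
    """Return keys whose escaped and unescaped path variants hold different values.

    cookies is an iterable of (host, path, name, value) tuples.
    """
    seen = {}
    conflicts = []
    for host, path, name, value in cookies:
        # Cookies are considered the same if they agree after unescaping the path.
        key = (host, unquote(path), name)
        if key in seen and seen[key] != value and key not in conflicts:
            conflicts.append(key)
        seen.setdefault(key, value)
    return conflicts
```

Cookies that are not reported could simply be rewritten with the unescaped path; the reported ones would need some rule (for example, keep the most recently used) that I am not in a position to pick.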
Whiteboard: [necko-would-take]
Summary: Alternate URL encodings are not the same → Alternate URL encodings are not the same (normalization)
Duplicate of this bug: 665851

Comment 18

2 years ago
There is a circular "duplicate" mark between this bug and 665851.

Comment 19

2 years ago
Are there any plans for fixing this issue?

I'm asking because it's *not* only an annoyance. It makes Firefox incompatible with some websites as soon as you need session cookies.