Closed Bug 483304 Opened 16 years ago Closed 9 years ago

location.hash getter returns the hash value unescaped ("%7C" turns into "|")

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1093611

People

(Reporter: szabobela1987, Unassigned)

References

Details

(Keywords: html5, testcase)

Attachments

(4 files, 4 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5

The encodeURIComponent doesn't encode the "|" character to "%7C", yet all the major browsers (Chrome, IE, Opera, Safari) does, this way makes cross-browser development harder.

Reproducible: Always

Steps to Reproduce:
1. Created a variable containing a hash (the "|" character's role in the hash is significant)
2. Used the mentioned function on a the variable.
3. Replaced the current hash with the variable
Actual Results:  
The current URL changed to this -> "blog.php#page:1|1"

Expected Results:  
The current URL should have changed to this -> "blog.php#page:1%7C1"
Assignee: nobody → general
Component: General → JavaScript Engine
OS: Windows 7 → All
Product: Firefox → Core
QA Contact: general → general
Hardware: x86 → All
Summary: encodeURIComponent function error → encodeURIComponent doesn't encode "|"
Works for me:
js1.9.0> encodeURIComponent("|")
%7C

js1.9.1> encodeURIComponent("|")
%7C

js1.9.2> encodeURIComponent("|")
%7C

Do you have a testcase that demonstrates the problem? Perhaps the difference in behavior is at some other level, or under different circumstances.
Need a testcase. Sounds like this involves URL processing by Necko or Gecko or even the Location Bar. It's not JS engine as Gavin shows.

/be
Assignee: general → nobody
Component: JavaScript Engine → General
QA Contact: general → general
Sorry, I just realized that the problem has a different nature: the function
works correctly but if I enter "%7C" to the URL bar it will convert it back to
"|".
We display unescaped values in the location bar on purpose - it's much more user friendly. That shouldn't have any impact on what the server sees, though. Are you suggesting it does?
Summary: encodeURIComponent doesn't encode "|" → "%7C" displayed unescaped ("|") in the location bar
No, on the server side it works correctly, my problem with it that I store my AJAX variables in the hash and separate them with the "|" character for example:
"blog.php#page:3|user:2|category:5"

This is only problematic when the variable itself contains the "|", in that case it should be stored as "%7C", so it doesn't cut the variable's content in half.

Actually this can be considered quite rare, but since I'm working on a CMS project I would like to know if this will change in the near future or I should remove the "|" characters from the variables if the user agent is Firefox?
The difference shouldn't be visible to page scripts either.

Can you provide example code that behaves differently in Firefox because of this difference?
I've uploaded an example script. I've tested it with Chrome, Firefox 3, IE 6/7/8beta, Opera 9 and Safari, all return the same result except Firefox.
Difference between RFC 2396 and RFC 3986 is relevant?
> http://www.faqs.org/rfcs/rfc2396.html
>   2.4.3. Excluded US-ASCII Characters
>   (snip)
>   unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
>   Data corresponding to excluded characters must be escaped in order to
>   be properly represented within a URI.
> http://www.faqs.org/rfcs/rfc3986.html
>   No special description on "|" in RFC 3986.

Bugs which are open && ( summary contains rfc && (2396 || 3986) ) :  
> Bug 275689 same-document references should work according to RFC 3986
> Bug 309671 Support %-escaped hostnames per RFC 3986 (3.2.2) / Cannot open IDN from other applications(e.g., from Thunderbird)
> Bug 394537 Update the encodeURI and the decodeURI JavaScript functions to reflect the new reserved characters in RFC 3986
Ah, this is because we unescape the value in GetHash explicitly:


There's a relevant discussion in bug 135309, starting at bug 135309 comment 30.
Status: UNCONFIRMED → NEW
Component: General → DOM
Ever confirmed: true
QA Contact: general → general
Summary: "%7C" displayed unescaped ("|") in the location bar → location.hash getter returns the hash value unescaped ("%7C" turns into "|")
(In reply to comment #9)
> Ah, this is because we unescape the value in GetHash explicitly:

http://hg.mozilla.org/mozilla-central/annotate/29a714ae9d11/dom/base/nsLocation.cpp#l399
Attached file testcase
This testcase passes in:
Opera 10.00 Alpha 6166
Safari Version 3.2.1 (5525.27.1)
IE 7

It fails in Firefox trunk/branch.
Attachment #367493 - Attachment is obsolete: true
Severity: minor → normal
Keywords: testcase
Yeah, this is just a duplicate of bug 135309, no?

Until necko stops escaping refs on url parse, we can't really change this.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE
(In reply to comment #12)
> Yeah, this is just a duplicate of bug 135309, no?

Except that that bug is FIXED...
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Status: REOPENED → NEW
That bug should have been marked "wontfix"; it got hijacked, basically.
This is also a problem with .hash on the <a> element. HTML5 doesn't prescribe unescaping here.
Keywords: html5
gsnedders says Google Wave expects the current behavior from Firefox (browser sniffing going on).
ok, 8 years later and we're still waiting for a fix for this. any estimates on when we can expect this to be fixed, guys?
Probably right after you fix it?  ;)

Seriously, just changing this behavior breaks other things; to change it one has to change how URI refs are handled throughout Gecko wholesale.  In particular, it involves the mess in bug 436006.  So once someone steps up and spends a few weeks of full-time work on it, it'll be fixed.  Given the amount of work involved and the low impact, it's not a high priority for me personally at the moment.
(In reply to Boris Zbarsky (:bz) from comment #21)
> Seriously, just changing this behavior breaks other things; to change it one
> has to change how URI refs are handled throughout Gecko wholesale.  In
> particular, it involves the mess in bug 436006.  So once someone steps up
> and spends a few weeks of full-time work on it, it'll be fixed.  Given the
> amount of work involved and the low impact, it's not a high priority for me
> personally at the moment.

Is it not possible to fix the public interfaces of the <a> DOM element and the window/document.location object to expose the Right Thing? That sounds like it should cover the web-exposed side of this, be much easier, more contained, and take much less work. Or would that patch not get taken because it's Not the Right Thing? :-)
> Is it not possible to fix the public interfaces of the <a> DOM element and the
> window/document.location object to expose the Right Thing

No, because the "right thing" requires changing our internal storage of the URI.  Right now we have to unescape because the URI is stored escaped.  And escaping is always lossy.

And if we change the internal URI storage, then you end up with a cascade of other things that need to be fixed, as I said in the comment you quoted.
What about
1. storing the URI without escaping,
2. escaping the URI on retrieving it via nsIURI or nsIURL, and
3. adding a new interface nsIUnescapedURL or something and use it from Location and others expecting unescaped representation
to minimize the impact?
I do not want to add yet another hackaround for a new iid. Actual serialization format change will be in the following patch.
Assignee: nobody → VYV03354
Status: NEW → ASSIGNED
Attachment #650387 - Flags: review?(bzbarsky)
Attached patch Part 4: Implement nsIURI.hash (obsolete) — Splinter Review
Attachment #650391 - Flags: review?(bzbarsky)
Attachment #650392 - Flags: review?(bzbarsky)
I'm not going to get to these reviews at least until the week starting August 27, unfortunately.
But just so I'm clear on what's going on, is the .hash thing just working around the fact that <a name> stuff doesn't follow the spec right now?  Or are there other consumers you're worried about?
I'm planning to implement the URL standard.
http://dvcs.w3.org/hg/url/raw-file/tip/Overview.html
nsURI.hash will be used from URL.hash.
Blocks: 781400
I guess the real question is why we're adding .hash instead of changing behavior of .ref.
For compatibility with existing consumers. Note that ref (mRef) is used as a part of nsIURI comparison (by Equals method).
Right, but the question is what those consumers are.  It's just a bit confusing to have two properties that basically do the same thing, and people will keep using the wrong one...  I can live with it if we have to, but if we can avoid it that would be ideal.

mRef can be kept as an implementation detail, of course; it just doesn't have to be exposed in the API.  Though comparing on escaped representations if the unescaped ones are supposed to be used can screw things up (e.g. not scrolling on anchor navigation even though two different anchors are pointed to), no?
<a name="è"> and <a name="%C3%A8"> are different per HTML5 spec while "#è" and "#%C3%A8" are supposed to be the same per IRI spec (RFC 3987). So we need to keep unescaped representation for DOM even if they are considered identical as URI/IRI. I added mHash for this purpose.
> while "#è" and "#%C3%A8" are supposed to be the same per IRI spec (RFC 3987)

Last I checked, this was one of the parts of RFC 3987 that doesn't quite match reality... In practice, we clearly can't canonicalize the former to the latter if they need to have different behavior!
IRI equivalence depends on the comparison level.
https://tools.ietf.org/html/rfc3987#section-5.3.1
https://tools.ietf.org/html/rfc3987#section-5.3.2.3
Some consumers may require IRI-to-URI mapping and some others may not. So the different properties will be required. No?
Hmm.  Maybe.  ccing some people who're actually up on all this network stuff.
This really needs another reviewer.  I don't know when I'll have a chance to dig into all this gunk, if ever, and the necko change should really get review from someon who's an active necko peer.
Attachment #650387 - Flags: review?(bzbarsky) → review?(benjamin)
Attachment #650388 - Flags: review?(bzbarsky) → review?(jduell.mcbugs)
Attachment #650390 - Flags: review?(bzbarsky) → review?(jduell.mcbugs)
Attachment #650391 - Flags: review?(bzbarsky) → review?(jduell.mcbugs)
Attachment #650392 - Flags: review?(bzbarsky) → review?(jduell.mcbugs)
Comment on attachment 650393 [details] [diff] [review]
Part 6: Use nsIURI.hash from DOM interfaces

r=me if the previous stuff is all OK, I guess.
Attachment #650393 - Flags: review?(bzbarsky) → review+
Attachment #650388 - Flags: review?(jduell.mcbugs) → review+
Comment on attachment 650391 [details] [diff] [review]
Part 4: Implement nsIURI.hash

OK, this attribute certainly needs more documentation on how it differs from .ref: http://mxr.mozilla.org/mozilla-central/source/netwerk/base/public/nsIURI.idl#220

How does it differ from ref?
Attachment #650391 - Flags: review-
hmm so after reading some comments... .hash is the escaped version of .ref...?
(In reply to Christian :Biesinger (don't email me, ping me on IRC) from comment #46)
> hmm so after reading some comments... .hash is the escaped version of
> .ref...?
More precisely, .hash will not be escaped regardless of "network.standard-url.escape-utf8" state. If the input string is already escaped, it will be kept as-is.
I consider removing this pref because I doubt we can change the setting.
Comment on attachment 650390 [details] [diff] [review]
Part 3: Fix inconsistency between nsSimpleURI::SetSpec and nsSimpleURI::SetRef and add automated tests for non-ASCII hash

I'm hoping :biesi can take this--I've talked with him about it on IRC.  Really only :bz or he are equipped to take this on without some learning curve, and I don't have time for it.
Attachment #650390 - Flags: review?(jduell.mcbugs) → review?(cbiesinger)
Attachment #650391 - Flags: review?(jduell.mcbugs) → review?(cbiesinger)
Attachment #650392 - Flags: review?(jduell.mcbugs) → review?(cbiesinger)
Entering URL in the address bar of Iceweasel 10.0.7 on Debian unstable and using Web Console:

1. document.location.search

a. https://www.google.ca/search?q=test|  => Enter =>  https://www.google.ca/search?q=test|

document.location.search   =>   "?q=test|"


b. https://www.google.ca/search?q=test%7C  => Enter =>  https://www.google.ca/search?q=test|

document.location.search   =>   "?q=test%7C"


c. https://www.google.ca/search?q=testè  ==> Enter ==>  https://www.google.ca/search?q=testè  (cut-and-paste shows %C3%A8)

document.location.search   =>   "?q=test%C3%A8"

d. https://www.google.ca/search?q=test%C3%A8  ==> Enter ==> https://www.google.ca/search?q=testè  (cut-and-paste shows %C3%A8)

document.location.search   =>   "?q=test%C3%A8"

e. https://www.google.ca/search?q=test%3D  ==>  Enter  ==>  https://www.google.ca/search?q=test%3D

document.location.search   =>   "?q=test%3D"


Firefox's safe-percent-decoding-of-queries in the address bar does not degenerate when re-submitting queries (b) thanks to preserving "%3F" "?", "%3D" "=" and "%26" "&"  in user input (e).  Only javascripts will see difference in .search between submitting the original input in the address bar and re-submitting the address bar contents converted by Firefox.



2. document.location.hash

a. https://www.google.ca/search?q=test#%7C  ==> Enter ==>  https://www.google.ca/search?q=test|

document.location.hash   =>   "#|"

b. https://www.google.ca/search?q=test#|  ==> Enter ==>  https://www.google.ca/search?q=test|

document.location.hash   =>   "#|"

c. https://www.google.ca/search?q=test#è  ==> Enter ==>  https://www.google.ca/search?q=test#è (cut-and-paste shows "#%C3%A8")

document.location.hash   =>   "#è"

d. https://www.google.ca/search?q=test#%C3%A8  ==> Enter ==> https://www.google.ca/search?q=test#è (cut-and-paste shows "#%C3%A8")

document.location.hash   =>   "#è"

e. https://www.google.ca/search?q=test#%3D  ==> Enter ==>  https://www.google.ca/search?q=test#%3D

document.location.hash   =>   "#="


3.  Questions.

a. How does this disagree with other browsers?

b. Regardless of existing implementations, what disagrees with expectations?  Why would we expect other results?

c. How do standards such as W3C URL Living Draft disagree with the current behaviour?
Comment on attachment 650387 [details] [diff] [review]
Part 1: Add iid parameter to nsISeralizable::Read to determine a serialization format and add a type-safe utility function to nsIObjectInputStream.idl

I'm going to clear this request until biesi has looked at the other parts of this bug. I pinged him about it in IRC. Please rerequest review when appropriate and I'll do it quickly ;-)
Attachment #650387 - Flags: review?(benjamin)
So, re. question b:

I'm writing a web application with ember.js, which stores part of its state in an url fragment. It goes like this:

location.hash = '#/graphs/' + encodeURIComponent(JSON.stringify(state));

This makes for bookmarkable URLs. At some later time, ember may parse the URL into routes and parameters - here the route is /graphs/:graph_json, and ember passes encoded data into my de-serialization function for deserialization.

That works great in Chrome. Due to this bug, it *fails horribly* in Firefox, because there's every chance that the data I'm serializing contains slashes.

I'm working around the problem by browser sniffing and double-escaping my URL fragments, which sort-of-but-not-quite works. Sometimes they end up double-unescaped, sometimes someone manages to get a double-escaped version in a chat message and send it to a Chrome user, where it'll fail, often the double-escaping means the URL is too long for the web server.

Basically, when I assign some properly escaped value to location.hash, I want to get the same value back later.

Please fix.
Keywords: dev-doc-needed
Blocks: 887364
No mater what you guys thing about the correctness of encoding the fragment part of the URL, this behavior causes the `hash` to be changed after page reload, which is ridiculous and is definitely a BUG that requires a fix.

I have seen this discussion started like 7 years ago, right? So I think it's about time to fix it, stop looking for excuses and move on.
In case anyone had still some doubts, please consider the following example:

    window.loaction.hash = "#%3F" // or "%3F", whatever

The url becomes "/#%3F" which is the expected behavior. However,

    window.location.hash === "#?"

which I could understood if it wasn't for the fact that after I hit `Ctrl-R`, the url becomes "/#?". WHAT? Really? Can anyone explain to me what is the logic behind this, and what are the reasons for not considering it as a BUG?
This above behaviour in Firefox is wrong per URL spec (and not only this):
http://url.spec.whatwg.org/#fragment-state

Other borwser do this correct (test Chrome37 and IE11), so right now (and for a long time) Firefox for developers is problematic. 

set: 
window.loaction.hash = "#%3F"
get: 
window.loaction.hash

Result:
Firefox: "#?"
Chrome and IE and Opera (Presto): "#%3F"
Hello. I also got some troubles with hash. See that: https://github.com/putaindecode/fix-all-the-things/issues/11
Assignee: VYV03354 → nobody
Status: ASSIGNED → NEW
The unescaping is the real issue with the current implementation:

window.location.hash = '%3F?';
console.log(window.location.hash);
> #??

This way I have to use the window.location.href to get back the original value, since the getter cannot return the same string I set with the setter.
The testcase passes in current nightly versions.

I think this bug was fixed by bug 1093611.
Sure.
If I copied a url with "#%7C" from the location bar or press Enter on the location bar, it will turn into "#|", but it is a different (Firefox) bug.
Status: NEW → RESOLVED
Closed: 16 years ago9 years ago
Resolution: --- → FIXED
Attachment #650390 - Attachment is obsolete: true
Attachment #650390 - Flags: review?(cbiesinger)
Attachment #650391 - Attachment is obsolete: true
Attachment #650391 - Flags: review?(cbiesinger)
Attachment #650392 - Attachment is obsolete: true
Attachment #650392 - Flags: review?(cbiesinger)
The bug seems to be wrongly marked as "RESOLVED FIX", the testcase still fails in Firefox 38.0.5 as well es in Nightly 41.0a1 (2015-06-08).
(In reply to Malte Wedel from comment #60)
> The bug seems to be wrongly marked as "RESOLVED FIX", the testcase still
> fails in Firefox 38.0.5 as well es in Nightly 41.0a1 (2015-06-08).

This was fixed by bug 1093611, which then got backed out because too much internet broke. I'm not sure it makes sense to keep both bugs open here.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
See Also: → 1093611
guys - could this please be fixed at some point soon? "because too much internet broke" as an explanation simply isn't good enough - Firefox is just one out of many browsers and by far not the one with the biggest market share. Some poorly written code that only works in Firefox should not stop progress.
Workaround (I think it is worth repeating idea from comment #57, which is commonly known and used, but wasn't clearly stated here):

For now consider `location.hash` write-only for cross-browser implementations. *Never ever READ its value*. So DO NOT  USE constructs like `document.location.hash.slice(1)`.

Instead, for retrieval get it from `document.location.href` like with `document.location.href.split('#')[1]`.
a more fail-safe version would be:

var location_hash  = location.href.split("#");
location_hash.shift();
location_hash = "#"+location_hash.join("#"); // equals location.hash as implemented in other browsers

the #-character could be in location.hash more than once. But yes, Michal's version should be sufficient for most cases.
Is this bug still happening? I can't seem to reproduce it in Fx 41.
current stable is 39 - can be reproduced in that version.
As :gijs wrote in #61 this was fixed by the resolution of bug 1093611 which was included in v41 - though as he states the fix was then removed again(?)
Ah yeah, thanks, I didn't read this comment properly.

Looks like Bug 1093611 was relanded! (Note it was originally backed out because it produced incorrect results -- not because "the internet broke" ;) ).
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → DUPLICATE
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: