firefox incorrectly decodeURIComponent's window.location.hash

RESOLVED DUPLICATE of bug 483304

Product: Core
Component: DOM: Core & HTML
Version: Trunk
Platform: x86, Windows XP
Reporter: Piers Haken
Assignee: Unassigned

(Reporter)

Description

11 years ago
User-Agent:       Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; WOW64; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506; InfoPath.2)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3

the test URL contains the equivalent of:

    "a=" + encodeURIComponent ("b=c")



Reproducible: Always

Actual Results:  
Firefox sets window.location.hash to 'a=b=c'

Expected Results:  
It should set it to the passed value: 'a=b%3dc'
IE & Safari both act as expected.

This has serious implications for the ability to pass parameterized information in the URL hash (used for SEO and Ajax history).
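
For illustration, a minimal sketch of the string handling involved. It runs without a browser; buildFragment is a hypothetical helper name, and the decoded value only models what this report says Firefox returns from location.hash.

```javascript
// Build the fragment from the report: "a=" + encodeURIComponent("b=c").
// buildFragment is a hypothetical helper, not part of any browser API.
function buildFragment(key, value) {
  return key + "=" + encodeURIComponent(value);
}

const fragment = buildFragment("a", "b=c");
// encodeURIComponent emits uppercase hex, so this is "a=b%3Dc"
// (equivalent to the 'a=b%3dc' written in the report).
console.log(fragment);

// Model of the reported Firefox behaviour: the fragment comes back decoded.
const decoded = decodeURIComponent(fragment);
console.log(decoded); // "a=b=c"

// The encoded form splits cleanly on "="; the decoded form does not.
console.log(fragment.split("=").length); // 2
console.log(decoded.split("=").length);  // 3
```

Once the value has been decoded, the "=" that was data can no longer be told apart from the "=" that was a delimiter.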

Comment 1

11 years ago
(In reply to comment #0)
> Actual Results:  
> firefox sets windows.location.hash to 'a=b=c'
> Expected Results:  
> it should set it to the passed value: 'a=b%3dc'
> IE & Safari both act as expected.

RFC 2396 "2.4.1. Escaped Encoding" defines "An escaped octet" as follows.
> An escaped octet is encoded as a character triplet, consisting of the
> percent character "%" followed by the two hexadecimal digits
> representing the octet code. For example, "%20" is the escaped
> encoding for the US-ASCII space character.
>     escaped     = "%" hex hex
>     hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
>                           "a" | "b" | "c" | "d" | "e" | "f"
And RFC 2396 "2.4.2. When to Escape and Unescape" clearly says:
> Because the percent "%" character always has the reserved purpose of
> being the escape indicator, it must be escaped as "%25" in order to
> be used as data within a URI.
( http://www.faqs.org/rfcs/rfc2396.html )

Sounds INVALID...

How is the URI string passed to Firefox? Via <a href> in HTML? Via a DOM element generated by JavaScript? Or via window.location.hash='...' etc.?
Should unescaping be suppressed in your case? (Which RFC says so?)
Or is Firefox doing excessive or needless unescaping?
Do you know what IE7 does in your case?

(Reporter)

Comment 2

11 years ago
In this case, the hash is passed as part of the URL in the address bar, although this incorrect behaviour is also exhibited when the URL is given as an <a href> attribute, or in window.open(), etc...

There's no RFC that specifically defines the mapping. http://www.faqs.org/rfcs/rfc1738.html and http://www.faqs.org/rfcs/rfc1866.html (sec 7.4) define Fragment Identifiers, but do NOT state that they should be encoded/decoded.

Also, http://wp.netscape.com/eng/mozilla/3.0/handbook/javascript/ref_h-l.htm states that "The hash property specifies a portion of the URL". NOTE: it DOES NOT say "a decoded portion of the URL".

As I said in the original report: IE6, IE7 & Safari DO NOT decode the fragment.

Even if that doesn't convince you it's broken, consider this: why is 'location.hash' decoded, but 'location.search' isn't?

Comment 3

11 years ago
(In reply to comment #2)
> even if that doesn't convince you it's broken, consider this: why is
> 'location.hash' decoded, but 'location.search' isn't?

I still think this is the Web designer's fault: the Web designer should escape "%" as "%25" when writing a URI in HTML.
About the difference between location.hash and location.search:
This is possibly a result of quirks in how the search string handles non-ASCII characters.
The behavior may also be affected by network.standard-url.encode-utf8 and/or network.standard-url.escape-utf8 or some other preferences; I'm not sure, though.

Comment 4

11 years ago
This is very inconvenient in the following scenario (in addition to being inconsistent with the behaviors of IE, Opera and Safari):
Imagine a history management system that wants to set the hash to "a={0}", where {0} will be replaced by a string specified by the application that consumes the system (so the system doesn't know it in advance), encoded using encodeURIComponent.
If the string being passed in is, for example, "b=c", the hash will be set to "a=b%3dc" on all browsers, and on all browsers but Firefox, reading window.location.hash after that will return "a=b%3dc". On Firefox you'll get "a=b=c". If the system doesn't want to do browser detection (and hope this never gets fixed in Firefox!), it will have to always call decodeURIComponent.
Where this unfortunately breaks is when the string is "b%3dc", as on all browsers it will be encoded as "b%253dc".
Then, on all browsers but Firefox, you'll decode that and get "b%3dc", which is the expected result.
On Firefox, you'll get "b%3dc" from window.location.hash, which you'll then decode into "b=c", which is not the string that was set.

In other words, save for browser detection, there is no way of figuring out if decoding should be performed or not on the hash, which leads to some serious bugs in applications.

Please consider making this consistent with other browser vendors.
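
The failure mode described above can be sketched without a browser. readRaw and readDecoded are hypothetical stand-ins for the two location.hash behaviours reported in this bug, and roundTrip models a history system that always decodes.

```javascript
// Hypothetical models of what reading location.hash returns (without the "#").
const readRaw = (fragment) => fragment;                         // IE, Opera, Safari as described
const readDecoded = (fragment) => decodeURIComponent(fragment); // Firefox as described

// Store "a=" + encodeURIComponent(value), read it back, then decode
// unconditionally, as a system that avoids browser detection would.
function roundTrip(value, readHash) {
  const stored = "a=" + encodeURIComponent(value);
  const seen = readHash(stored);
  return decodeURIComponent(seen.slice("a=".length));
}

console.log(roundTrip("b=c", readRaw));       // "b=c"
console.log(roundTrip("b=c", readDecoded));   // "b=c"
console.log(roundTrip("b%3dc", readRaw));     // "b%3dc"
console.log(roundTrip("b%3dc", readDecoded)); // "b=c", not the string that was set
```

Only the last case goes wrong, which is why the problem tends to surface late: it needs a value that already contains a percent sequence.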
(Reporter)

Comment 5

10 years ago
Here's the code from rsh (google's really simple history):

NOTE: this function essentially emulates the correct behaviour of window.location.hash. This bug is the only reason this function exists. It has effectively rendered window.location.hash unusable: any application that uses it is broken on Firefox.


    /* Public: Manually parse the current URL for a hash; tip of the hat to YUI */
    getCurrentHash: function() {
        var r = window.location.href;
        var i = r.indexOf("#");
        return (i >= 0
            ? r.substr(i + 1)
            : ""
        );
    },
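
A standalone variant of the same workaround, as a sketch: fragmentFromHref is a hypothetical name, and it takes the href as an argument (in a page you would pass window.location.href) so it can be tried outside a browser.

```javascript
// Return everything after the first "#" in href, or "" if there is none.
// Nothing is decoded, so the result is the fragment exactly as written.
function fragmentFromHref(href) {
  const i = href.indexOf("#");
  return i >= 0 ? href.slice(i + 1) : "";
}

console.log(fragmentFromHref("http://example.com/page#a=b%3dc")); // "a=b%3dc"
console.log(fragmentFromHref("http://example.com/page"));          // ""
```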

Comment 6

10 years ago
If I may have my say: 
I don't agree that this is a bug, but neither do I agree that it is a web designer's fault.

Let me explain. First of all, the latest standard for URIs is RFC 3986 
(which obsoletes the older ones), so we'd better rely on this RFC.

<<<[RFC 3986 => 2.4. When to Encode or Decode]
   Under normal circumstances, the only time when octets within a URI
   are percent-encoded is during the process of producing the URI from
   its component parts.  This is when an implementation determines which
   of the reserved characters are to be used as subcomponent delimiters
   and which can be safely used as data.  Once produced, a URI is always
   in its percent-encoded form.

   When a URI is dereferenced, the components and subcomponents
   significant to the scheme-specific dereferencing process (if any)
   must be parsed and separated before the percent-encoded octets within
   those components can be safely decoded, as otherwise the data may be
   mistaken for component delimiters.
<<<

1 => Every URI (stated as is) should always be percent-encoded, except when it is being dereferenced (e.g. parsed into chunks).

<<<[RFC 3986 => 3.5. Fragment]
  fragment    = *( pchar / "/" / "?" )

   The semantics of a fragment identifier are defined by the set of
   representations that might result from a retrieval action on the
   primary resource.  The fragment's format and resolution is therefore
   dependent on the media type [RFC2046] of a potentially retrieved
   representation, even though such a retrieval is only performed if the
   URI is dereferenced.  If no such representation exists, then the
   semantics of the fragment are considered unknown and are effectively
   unconstrained.
<<<

2 => The fragment semantics depend on the media type, here text/html.


<<<[HTML 4.01 => 2.1.2 Fragment identifiers]
Some URIs refer to a location within a resource. This kind of URI ends with "#" followed by an anchor identifier (called the fragment identifier).
<<<[HTML 4.01 => 12.1.1 Visiting a linked resource]
The destination anchor of a link may be an element within an HTML document. The destination anchor must be given an anchor name and any URI addressing this anchor must include the name as its fragment identifier.
Destination anchors in HTML documents may be specified either by the A element (naming it with the name attribute), or by any other element (naming with the id attribute).
<<<[HTML 4.01 => 12.2 The A element]
  name        CDATA          #IMPLIED  -- named link end --
<<<[HTML 4.01 => 6.2 SGML basic types]
CDATA is a sequence of characters from the document character set and may include character entities. User agents should interpret attribute values as follows:
    * Replace character entities with characters,
    * Ignore line feeds,
    * Replace each carriage return or tab with a single space.
<<<

3 => In HTML a fragment is a simple sequence of characters.


Every browser is free to implement JavaScript/JScript/ECMAScript as it wants.
Usually they try to stay in sync with ECMA-262 and the DOM reference specification, 
but they aren't obliged to. Moreover, the DOM specification doesn't define the 
window object. As far as I know, there is ongoing work to define the window object and its properties, but the hash property is still unclear
(see http://www.w3.org/TR/Window/).

So the only usable reference here is the Mozilla JavaScript specification:
<<<[http://developer.mozilla.org/en/docs/DOM:window.location]
hash : the part of the URL that follows the # symbol, including the # symbol.
href : the entire URL.
search : the part of the URL that follows the ? symbol, including the ? symbol.
<<<

4 => hash and search are described as URI parts, so we may assume the original URI 
has already been dereferenced and hence percent-decoded.
- search can't be percent-decoded as a whole; decoding is only possible after the arguments are separated.
- hash may be percent-decoded because in HTML it's a simple string.
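
A small sketch of the point about search: the components have to be separated before they are decoded. parseQuery is a hypothetical helper, not a browser API, and the sample string is made up for illustration.

```javascript
// Split on "&" and on the first "=" BEFORE decoding each piece.
function parseQuery(raw) {
  const result = {};
  for (const pair of raw.split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const name = eq >= 0 ? pair.slice(0, eq) : pair;
    const value = eq >= 0 ? pair.slice(eq + 1) : "";
    result[decodeURIComponent(name)] = decodeURIComponent(value);
  }
  return result;
}

const sample = "a=b%3Dc&x=1%262";

// Separate first, decode second: both values survive intact.
const good = parseQuery(sample);
console.log(good.a, good.x); // "b=c" "1&2"

// Decode first, separate second: the escaped "&" becomes a delimiter.
const bad = parseQuery(decodeURIComponent(sample));
console.log(bad.x, "2" in bad); // "1" true
```

The same reasoning applies to any fragment that carries delimiter-separated parameters, which is the use case in the original report.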


CONCLUSION:

Firefox's behavior is correct, but IE's is also correct. It's only a matter of choice.

So instead of using window.location.hash, web designers should read the fragment identifier 
from the href for cross-browser compatibility. This is just one more hack to know about.


RELATED BUG:

I noticed a related issue concerning the address bar: it does percent-decoding
for display purposes but not for retrieval.
- enter http://www.google.f%72 in the address bar
- you will receive a failure:
'Firefox can't find the server at www.google.f%72.'
- while in the address bar it will show http://www.google.fr

Enjoy and sorry for this long post...

Comment 7

10 years ago
Hi,

While it's always interesting to know the context (thank you for providing that), I still think it would be in everyone's best interest that this becomes consistent with the behavior on other browsers (IE is not the only one, Safari does the same thing). location.hash is currently broken because of this bug and pretty much too unsafe to use. The resolution of this bug shouldn't be to emulate the feature by rewriting it in JavaScript.

But yes, manually extracting the hash part will work in the meantime (not that we have a choice :) ).

Thanks,
Bertrand

Comment 8

10 years ago
I do agree that it would be one less burden for JavaScript developers to have a uniform way of accessing the hash part (and I am one of them :).

(But actually there are already so many cross-browser JavaScript issues that I wouldn't mind adding this hack to my list of known JavaScript issues either ...)

As I said, it's only a matter of choice, so it is up to the Firefox developers to make that choice, because both behaviors of the hash property are perfectly valid.

Comment 9

10 years ago
Hi again... I just found out that the Window Object specification is no longer maintained, because the HTML5.0 specification is meant to cover the window object.
So I read the HTML5.0 draft, which is still unfinished but apparently supports this bug report.
Please see http://www.w3.org/html/wg/html5 
=> 2.3.5 Interfaces for URL manipulation

As I understand it, each time you get the hash you receive the percent-encoded version (the fragment part of the URL), and each time you set it, the characters that are considered invalid in the fragment context are percent-encoded.

Thus if you set it to a=b%3dc, nothing should change, because that is a valid RFC 3986 fragment. And each time you get it, it should return #a=b%3dc

So if HTML5.0 does not change in this respect before its release, Firefox's behavior would be invalid.
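
For comparison, a sketch using the URL class found in current browsers and Node.js. This API postdates the draft discussed here, so treat it as an illustration of the get/set behaviour described above rather than as what the draft specified. The example.com address is a placeholder.

```javascript
// Assumes a runtime that provides the global URL class.
const url = new URL("http://example.com/page#a=b%3dc");

// Getting the hash returns the fragment still percent-encoded.
console.log(url.hash); // "#a=b%3dc"

// Setting an already valid fragment leaves it unchanged.
url.hash = "a=b%3dc";
console.log(url.hash); // "#a=b%3dc"

// Setting a fragment with a character that is not allowed (a space)
// percent-encodes that character.
url.hash = "a b";
console.log(url.hash); // "#a%20b"
```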





Comment 10

10 years ago
(In reply to comment #9)
> So If there aren't any change in the HTML5.0 before its release then Firefox
> behavior would be invalid.

"2.3.1 Terminology" in HTML5 spec says;
> 2.3.1 Terminology
> A URL is a string used to identify a resource. A URL is always associated with
> a Document, either explicitly when the URL is created or defined; or through
> a DOM node, in which case the associated Document is the node's Document; or
> through a script, in which case the associated Document is
> the script's script document context. 
And the link for "script document context" says:
> 5.4.1 Script execution contexts
>(snip)
> Every script whose script execution context is a Window object
> is also associated with a Document object, known as its script document context.

"2.3.5" of the HTML5 spec refers to "Document" only (SRC attributes etc. in HTML). HTML5 doesn't refer to the window.location object and its attributes, which were defined by so-called DOM-0 (which has neither a clear spec nor clear rules).
So it still can't be said that "Firefox's behavior is invalid".

But I think changing window.location.hash as follows would be better, in order to avoid claims/confusion of "HTML5 violation!" once HTML5 is released.
 - similar to document.xxx.src.fragment (if implemented; I am guessing at this attribute)
 - same as other browsers
 - same as window.location.search (original/unmodified version == escaped format)
If the reason for the current "unescaped version in window.location.hash" is the following,
  - failure to jump to the specified (escaped format) fragment
    upon load/reload, or when window.location.hash is changed by script,
and if some logic expects the "unescaped version in window.location.hash", that logic should also be modified.

Comment 11

10 years ago
I've found the very old Bug 135309 for the same problem, which is marked as FIXED.
 - According to Bug 135309 Comment #22, the unescaping was added by Bug 124042.
 - The status of Bug 135309 is FIXED, but Bug 135309 Comment #27 says the following. 
   > Marking fixed, though perhaps "wontfix" is more appropriate
 - Discussion appears to have continued in the bug even after it was marked FIXED on 2004-05-20.
FYI.

Comment 12

10 years ago
Bug 135309 Comment #16 says: 
> the jumping code doesn't use location.hash.
So no problem should arise in Fx's jumping, even if window.location.hash is changed to return the original (not unescaped) string from the URI.

Comment 13

10 years ago
(In reply to comment #10)
> "2.3.5" of HTML5 spec refers to "Document" only(SRC attributes etc. in HTML).
> HTML5 doesn't refer to window.location object and it's attributes which was
> defined by so called DOM-0(who doesn't have clear spec nor clear rules on it).
> So, it still can't be said "Firefox's behavior is invalid".

HTML5.0 is certainly not the reference document, but I mentioned it because it gives a good clue as to how a URL should be decoded and what tomorrow's web script engines will be expected to return from window.location.hash.
I totally agree that Firefox's behavior is correct, as there is no reference document for the currently implemented "DOM-0"...

(In reply to comment #12)
> So no problem will arise in jumping by Fx, even if window.location.hash will be changed to original(not-unescaped) string in URI.
> 
It sounds good to me. If Fx's jumping code is not affected by this change, then by making the change we would only break scripts that rely on the unescaped hash. And I can hardly think of scripts relying on this, knowing that it won't work reliably on other browsers (in the worst case they will have to use browser sniffing to fix it).






Changing to Core/DOM: Level 0 (same as Bug 135309)
Component: General → DOM: Level 0
Product: Firefox → Core
QA Contact: general → general
Version: unspecified → Trunk
Setting a dependency on Bug 135309 instead of duping to or reopening Bug 135309, because I don't know which action is appropriate.
Depends on: 135309
Status: UNCONFIRMED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 483304