Closed Bug 1093611 Opened 10 years ago Closed 9 years ago

AnchorElement.hash should be the encoded version of the href attribute's fragment

Categories

(Core :: Networking, defect)

33 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox38 --- disabled
firefox39 --- disabled
firefox40 --- disabled
firefox41 --- fixed

People

(Reporter: raymond, Assigned: valentin)

References

(Blocks 1 open bug, )

Details

(Keywords: dev-doc-complete, site-compat, testcase)

Attachments

(7 files, 4 obsolete files)

7.67 KB, patch
smaug
: review+
Details | Diff | Splinter Review
10.50 KB, patch
mayhemer
: review+
Details | Diff | Splinter Review
4.89 KB, patch
mayhemer
: review+
smaug
: review+
Details | Diff | Splinter Review
888 bytes, text/html
Details
1.89 KB, patch
smaug
: review+
Details | Diff | Splinter Review
7.59 KB, patch
smaug
: review+
Details | Diff | Splinter Review
2.13 KB, patch
jmaher
: review+
Details | Diff | Splinter Review
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36

Steps to reproduce:

Clicking on an anchor link with an encoded hash e.g. #/search/%23important will cause everything from the encoded hash onward to be removed from the URI.

Clicking on #/search/%23important puts #/search/ in the URL bar.

Pasting #/search/%23important into the URL bar works.


Actual results:

When clicking on an anchor such as #/search/%23important, %23important is removed from the URL bar instead of remaining as a proper url.


Expected results:

#/search/%23important should have been the new location.

Correct behavior is seen in all other browsers, only Firefox is affected and this only started happening recently.
Can't reproduce with a simple testcase:

http://jsbin.com/ruqirijezo

Can you elaborate as to when/how/where you're seeing this, and in the best case, provide a testcase that fails for you?
Flags: needinfo?(raymond)
Go to https://www.wunderlist.com signup or login add a task in your inbox with a tag, e.g. "#bug tags are broken on Firefox"

Clicking on the link doesn't update the URL bar correctly.  The link is a simple href with the tag (#bug) encoded as part of the URI component.

Works as expected in other browsers.
Flags: needinfo?(raymond)
(In reply to Raymond from comment #2)
> Go to https://www.wunderlist.com signup or login add a task in your inbox
> with a tag, e.g. "#bug tags are broken on Firefox"
>
> Clicking on the link doesn't update the URL bar correctly.  The link is a
> simple href with the tag (#bug) encoded as part of the URI component.

I can reproduce this. Note to self: by tag you mean any word preceded by a hash, not an actual tag field or something like that...

However, I violently disagree with your "it's a simple href". Otherwise, why would my testcase work and wunderlist break?

The issue is that wunderlist adds a gazillion click handlers all over the place, all of which go through some kind of proxy function that's the same everywhere, so it's hard to see what's going on. However, I will bet money that it's not just "a simple href" at play here.

It looks like instead this depends on them accessing link.hash, which looks like it's unescaped and breaks stuff later on.
Yes, it's going through a proxy and looking at element.hash which is unescaped for some reason now.

I created a jsbin for looking at element.hash and firefox is the only one that returns false;

http://jsbin.com/gevuloyeri/edit?html,js,console
I can work around it, with an if Firefox re-encode it block, but I shouldn't need to write FF specific hacks like this.
Our document.querySelector('a').hash has been returning the unescaped variant since at least 29, though, so that's not regressed...
So this is because eventually, wunderlist hits history.navigate, which calls getFragment, which does a:

e.replace(/^[#\/]|\s+$/g, "")

which obviously ends up replacing the initial hash.

It then does another:

if (e = e.replace(O, ""), this.fragment !== e) {

where O = /#.*$/

which ends up replacing the last part and causes the bug here.

I don't know why that second replace is even necessary, but that's up to wunderlist (should I just be saying 'you' here?).

Now, I'm happy to accept that we should be returning the escaped hash here... but that hasn't changed recently. Obviously I have no insight in what changed on wunderlist's side...
(In reply to Raymond from comment #5)
> I can work around it, with an if Firefox re-encode it block, but I shouldn't
> need to write FF specific hacks like this.

You shouldn't need to, no, but if you do chose to workaround in some way, please be aware that we might actually fix the browser bug, and double-escaping will make everyone sad too. :-)
Relevant IRC conversation:

<annevk> Gijs: per https://url.spec.whatwg.org/#fragment-state it should be encoded
<annevk> Gijs: browsers are a disaster in terms of interop though
<annevk> :/
<Gijs> annevk: heh, interesting. It seems that at least for anchorelement.hash, we decode, and chrome/ie do not
<annevk> Gijs: you might want to file bugs
<annevk> I suspect changing this will be extremely hard

New testcase:

http://jsbin.com/tanesazape

link goes red when broken, green when OK, hash is echoed afterwards for inspection (should have %20, doesn't in Firefox, does in Chrome and IE11).
Blocks: url
Status: UNCONFIRMED → NEW
Component: Untriaged → Networking
Ever confirmed: true
OS: Mac OS X → All
Product: Firefox → Core
Hardware: x86 → All
Summary: Firefox no longer properly handles hash links with encoded hashes → AnchorElement.hash should be the encoded version of the href attribute's fragment
We changed how we deal with internal app links recently, which due to a difference in FF element.hash, resulted in a regression in Wunderlist.
It seems we actively decode the %23 for Link elements, which causes this behaviour.
The fix is fairly easy. If try is green I'll post the patch and we could fix this rather quickly.
Assignee: nobody → valentin.gosu
Do we know what kind of "browsers are a disaster in terms of interop though" we're dealing with here.
What does IE do? What about Chrome? Do they both follow the spec?
(In reply to Olli Pettay [:smaug] from comment #15)
> Do we know what kind of "browsers are a disaster in terms of interop though"
> we're dealing with here.
> What does IE do? What about Chrome? Do they both follow the spec?

Latest versions of both do both follow the spec for element.hash. I've not looked at these patches but I know that we've had issues before with location.hash, and the story there might be different/harder, if that is also affected (see e.g. bug 483304). If at all possible, I think we should tread carefully and only change behaviour for element.hash.
Hm, except location.hash apparently is expected to also be encoded, and if the comments on that bug are to be believed, it also is so in IE11/Chrome.

How many workarounds/hacks we end up breaking if we make our behaviour there spec-conformant, no idea - probably decidedly more than for element.hash, though. Still, considering that that is the spec, presumably it's worth fixing at some point, and I don't know that waiting will make the situation any better.
Regarding this issue (that .hash is always percent encoded) yes, Chrome and IE seem to follow the spec ( I tested with http://intertwingly.net/projects/pegurl/liveview.html )
If we're worried that this change would break existing websites (some of our tests were also relying on this assumption) maybe we could hide this changes behind a pref? Just in case it breaks stuff once it hits the release channel (as it happened with bug 237623).
Comment on attachment 8521124 [details] [diff] [review]
AnchorElement.hash should be the encoded version of the href attribute's fragment

So how is network.standard-url.escape-utf8 pref related to this?
Could we perhaps remove that pref?

Have you tested this all with some more unusual urls?
I guess window.history.pushState(1, document.title, '#q=♥'); does that.
Do other browsers have the same behavior there? Please test also url.hash and link.hash.
> If at all possible, I think we should
> tread carefully and only change behaviour for element.hash.

This seems like a bad idea. We want these to work the same. Making it harder for web developers is non-goal.

It looks like both network.standard-url prefs can be removed. Should I file a bug?
(In reply to Anne (:annevk) from comment #21)
> > If at all possible, I think we should
> > tread carefully and only change behaviour for element.hash.
> 
> This seems like a bad idea. We want these to work the same. Making it harder
> for web developers is non-goal.
Agree. We want consistency here.


> It looks like both network.standard-url prefs can be removed. Should I file
> a bug?
Please :)
Comment on attachment 8521124 [details] [diff] [review]
AnchorElement.hash should be the encoded version of the href attribute's fragment

So please use a pref for now for this stuff, so that we can easily switch back to the old behavior.
Attachment #8521124 - Flags: review?(bugs) → review-
Comment on attachment 8521125 [details] [diff] [review]
Add/modify tests making sure Link,URL,nsLocation::GetHash don't unescape characters

>+  window.history.pushState(1, document.title, '#q=â¥');
Please add this kind # tests for all the relevant APIs.
Attachment #8521125 - Flags: review?(bugs) → review-
(In reply to Olli Pettay [:smaug] from comment #24) 
> So please use a pref for now for this stuff, so that we can easily switch
> back to the old behavior.
And probably use a AddBoolVarCache just in one place and add a simpler getter to get the current
pref value.
I added tests for all of url/link/location, and moved them to the same test file.
Attachment #8524668 - Flags: review?(bugs)
Attachment #8521125 - Attachment is obsolete: true
Attachment #8521124 - Attachment is obsolete: true
Attachment #8524668 - Flags: review?(bugs) → review+
Comment on attachment 8524669 [details] [diff] [review]
AnchorElement.hash should be the encoded version of the href attribute's fragment

(Somehow we need to merge all the url handling of link/location/url.
 Perhaps they all could inherit some templated class URLUtils, which
 implements all the interesting stuff, but knows how to get the nsIURI from the
 inheriting class)

( I wonder why dom/base/nsLocation.cpp does so much more ... looks like because of bug 135309. )

Have you tested this doesn't break page internal anchors? It shouldn't, but better to make sure.
Attachment #8524669 - Flags: review?(bugs) → review+
Valentin, Olli, what behavior does this actually produce for Location?  See the discussion in bug 483304?
The testcase in bug 483304 is a PASS, so the | in the hash will be escaped to %7C
It seems these patches broke the http://graphs.mozilla.org/graph.html#tests=[[251,131,25]] format we use on talos.
Not only that, but it seems that there are more websites than expected that insert JSON into the hash part of the URL.

After talking to Anne, it seems that Chrome and IE don't do any percent encoding/decoding on the hash.
Safari is spec compliant, and always percent encodes the hash.
Firefox, at the moment, is kind of in between. If you call url.href you see the hash is encoded, but url.hash does percent decoding on the hash. That's a bit inconsistent.

The best course of action is probably to change the spec to IE/Chrome's behaviour, since we don't want to break scripts that rely on this assumption (including some of our internal code)

This issue is tracked at: https://www.w3.org/Bugs/Public/show_bug.cgi?id=27252
I'm waiting for the spec change before I post the updated patch (with no encoding/decoding of url.hash)
The specification change is delayed due to the toolchain that generates the specification being broken :-( If you want you can study the change I wanted to make to resolve this here: https://github.com/whatwg/url/commit/3c792a2a951751f78fcd878c86636dffc3211ba0
Comment on attachment 8537130 [details] [diff] [review]
nsStandardURL::SetRef only does percent-encoding if dom.url.encode_decode_hash is true

The patch disables percent encoding of the ref/hash, conditioned on the pref (which was renamed to dom.url.encode_decode_hash)
Attachment #8537130 - Flags: review?(honzab.moz)
Attachment #8524668 - Attachment is obsolete: true
Comment on attachment 8537138 [details] [diff] [review]
Add/modify tests making sure Link,URL,nsLocation::GetHash don't unescape characters

Updated the tests for this feature, so the input always matches the output for the hash.
Attachment #8537138 - Flags: review?(honzab.moz)
Attachment #8537138 - Flags: review?(bugs)
Attachment #8537138 - Flags: review?(bugs) → review+
Comment on attachment 8537130 [details] [diff] [review]
nsStandardURL::SetRef only does percent-encoding if dom.url.encode_decode_hash is true

Review of attachment 8537130 [details] [diff] [review]:
-----------------------------------------------------------------

Should you also update http://hg.mozilla.org/mozilla-central/annotate/b3f84cf78dc2/netwerk/base/src/nsStandardURL.cpp#l676 ?

r+ with that answered.
Attachment #8537130 - Flags: review?(honzab.moz) → review+
Comment on attachment 8537138 [details] [diff] [review]
Add/modify tests making sure Link,URL,nsLocation::GetHash don't unescape characters

Review of attachment 8537138 [details] [diff] [review]:
-----------------------------------------------------------------

test_hash_encoded.html should also include some json in it (more than just a single forms).
Attachment #8537138 - Flags: review?(honzab.moz) → review+
(In reply to Honza Bambas (:mayhemer) from comment #43)
> Comment on attachment 8537130 [details] [diff] [review]
> nsStandardURL::SetRef only does percent-encoding if
> dom.url.encode_decode_hash is true
> 
> Review of attachment 8537130 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Should you also update
> http://hg.mozilla.org/mozilla-central/annotate/b3f84cf78dc2/netwerk/base/src/
> nsStandardURL.cpp#l676 ?
> 
> r+ with that answered.

>>  if (mRef.mLen >= 0) {
>>      buf[i++] = '#';
>>      i = AppendSegmentToBuf(buf, i, spec, mRef, &encRef, useEncRef);
>>  }

No, I don't think that's necessary. useEncRef will be false, so AppendSegmentToBuf will append spec+mRef.mPos and encRef will stay unused.
Release Note Request (optional, but appreciated)
[Why is this notable]:
This change makes us spec compatible and more consistent overall.
Whereas before url.hash would return a percent decoded string, and url.href contained the percent encoded hash, now url.hash (together with link.hash and location.hash) contain the exact string they are set to.
[Suggested wording]:
The URL parser has been changed in order to avoid doing percent encoding when setting the Fragment part of the URL, and percent decoding when getting the Fragment. This is in line with the URL spec.
[Links (documentation, blog post, etc)]:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=27252 (spec change)
https://url.spec.whatwg.org/#fragment-state (spec)
relnote-firefox: --- → ?
Keywords: dev-doc-needed
Depends on: 1122436
Quick question so that I'm sure I understand well before updating the docs. For a Web dev, this change (avoid doing percent encoding) the behavior of hash() on HTMLAnchorElement, HTMLAreaElement and on Location (and its worker counterpart WorkerLocation). Is this correct?

(Thank you in advance)
Flags: needinfo?(valentin.gosu)
Blocks: 1122948
Yes. That is correct. Starting with this changeset the URL fragment part (url.hash) will contain the exact string it is set to, for HTMLAnchorElement, HTMLAreaElement and on Location.

Thanks for pointing out WorkerLocation. That one still displays the old behaviour. I'm writing a patch for that one too. Filed Bug 1122948.
Flags: needinfo?(valentin.gosu)
Valentin, since Jean-Yves didn't list it, is new URL() covered?
Yes, it is.
So, the changes were made to URL.cpp, Link.cpp, nsLocation.cpp. Both of HTMLAnchorElement and HTMLAreaElement's implementations of Get/SetHash call into Link.cpp. (GetHash was a problem in these files, because it did percent decoding)
Second patch touches nsStandardURL.cpp, so it doesn't do percent decoding during parsing or in SetRef().
Updated https://developer.mozilla.org/en-US/docs/Web/API/URLUtils.hash (URL.hash, HTMLAnchorElement.hash, HTMLAreaElement.hash, Location.hash redirects to it)
and to
https://developer.mozilla.org/en-US/Firefox/Releases/38

I'll update WorkerLocation via the other bug.
Have added this to Aurora/DevEd 38 release notes as:


URL parser avoids doing percent encoding when setting the Fragment part of the URL, and percent decoding when getting the Fragment in line with the URL spec
Attached file testcase
Google Translate is currently broken in Firefox 36 (and 37b1) on some URLs (e.g. https://translate.google.com/?sl=auto&tl=de#en/de/Test%E3%80%82), a reduced testcase is attached.

The interesting thing is: this was fixed in Firefox Nightly by this bug (or at least in the range https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3d846527576f&tochange=63006936ab99 and I think this is the only change that seems logical).

The question is therefore: Any chance to get the patch into Firefox 37?

Also the attached testcase might be useful as regression test.
Hi Andreas,
I think it's best if this change went through the usual release process.
6 more weeks don't seem like that much of a burden, and the extra test time we get is quite valuable.
Depends on: 1129855
Depends on: 1149913
I don't see anchor.href encoded in all cases; in fact, sometimes even href gets unescaped. Try:

var a = document.createElement('a');
a.href = 'http://example.com/#zażółćGęśląJaźń'
console.log(a.href);
console.log(a.hash);

Firefox 37.0.1 logs:
http://example.com/#za%C5%BC%C3%B3%C5%82%C4%87G%C4%99%C5%9Bl%C4%85Ja%C5%BA%C5%84
#zażółćGęśląJaźń

Firefox Nightly (40.0a1 (2015-04-07)) logs:
http://example.com/#zażółćGęśląJaźń
#zażółćGęśląJaźń
This change hasn't hit the release channel yet, so that's why it's not in Firefox 37.
The bug's title is a bit misleading, as in the bug we decided not to encode/decode the hash at all, to be compatible with other browsers.
In any case, we are currently considering reverting the patch in bug 1149913, because of a few regressions.
Backed out by bug 1149913.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
See Also: → 483304
Valentin, what's the current state?
Flags: needinfo?(valentin.gosu)
This fell off my radar. I'll post a patch today.
Switching back to dev-doc-needed so that we are notified when this is resolved.
Attachment #8617773 - Flags: review?(bugs) → review+
Comment on attachment 8617774 [details] [diff] [review]
Stage 2 - Make hash getters not do percent decoding if dom.url.getters_decode_hash is false

>    * Returns true if URL setters should percent encode the Hash/Ref segment
>    * and getters should return the percent decoded value of the segment
>    */
>   static bool EncodeDecodeURLHash()
>   {
>     return sEncodeDecodeURLHash;
>   }
> 
>+  /*
>+   * Returns true if URL getters should percent decode the value of the segment
>+   */
>+  static bool GettersDecodeURLHash()
>+  {
>+    return sGettersDecodeURLHash;
>+  }
I don't know understand the setup.
EncodeDecodeURLHash() says      "Returns true if URL... getters should return the percent decoded value of the segment "
and the new GettersDecodeURLHash() "Returns true if URL getters should percent decode the value of the segment"
So what exactly? How are those methods related?

And looks like every time you use GettersDecodeURLHash you also use EncodeDecodeURLHash.
So it isn't clear when one should use EncodeDecodeURLHash(), when GettersDecodeURLHash() and when both.
Attachment #8617774 - Flags: review?(bugs) → review-
> >+    return sGettersDecodeURLHash;
> >+  }
> I don't know understand the setup.
> EncodeDecodeURLHash() says      "Returns true if URL... getters should
> return the percent decoded value of the segment "
> and the new GettersDecodeURLHash() "Returns true if URL getters should
> percent decode the value of the segment"
> So what exactly? How are those methods related?
> 
> And looks like every time you use GettersDecodeURLHash you also use
> EncodeDecodeURLHash.
> So it isn't clear when one should use EncodeDecodeURLHash(), when
> GettersDecodeURLHash() and when both.

if EncodeDecodeURLHash() returns true, both the getters and the setters will perform percent encoding/decoding of the hash.
If GettersDecodeURLHash() is true (only) the getters perform percent decoding.

The initial plan - to never do percent encoding/decoding of the hash - didn't pan out, due to "is having spaces in a URL good for the web" concerns.
What this patch intends is to stop doing percent decoding in the getters, so that for "http://a.com/#%C8%99" url.hash returns "#%C8%99" instead of the percent decoded "ș"
I could separate the prefs so that there is one for encoding in hash setters, and one for decoding in hash getters, but only encoding in hash setters doesn't really make sense.

Do you have any suggestions on how I could make this clearer?
Flags: needinfo?(valentin.gosu) → needinfo?(bugs)
Well, at least make GettersDecodeURLHash() check that also EncodeDecodeURLHash() returns true, so
that the caller doesn't need to always use them both?
Flags: needinfo?(bugs)
Attachment #8617774 - Attachment is obsolete: true
Attachment #8622469 - Flags: review?(bugs) → review+
Comment on attachment 8624542 [details] [diff] [review]
Unescape location.hash, since it may contain escaped characters

jmaher is a better reviewer for talos related changes.
Attachment #8624542 - Flags: review?(bob) → review?(jmaher)
Comment on attachment 8624542 [details] [diff] [review]
Unescape location.hash, since it may contain escaped characters

Review of attachment 8624542 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!
Attachment #8624542 - Flags: review?(jmaher) → review+
can this be closed now?
I'll land the other patches on m-c.
Thanks Joel!
https://hg.mozilla.org/mozilla-central/rev/f83a1a7c2ffb
https://hg.mozilla.org/mozilla-central/rev/9a04997eec28
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Depends on: 1184131
Depends on: 1184589
:valentin For the purpose of documentation, when this was relanded, did it reintroduce the same behavior as before the back-out (keeping percent encoding values in *.hash() and URL()) ?
Flags: needinfo?(valentin.gosu)
Depends on: 1210416
(In reply to Jean-Yves Perrier [:teoli] from comment #83)
> :valentin For the purpose of documentation, when this was relanded, did it
> reintroduce the same behavior as before the back-out (keeping percent
> encoding values in *.hash() and URL()) ?

This re-landed on 2015-06-22, in Firefox 41.
The behaviour is now consistent, meaning url.hash returns the same percent encoded string which is present in url.href
I can confirm that the compat doc is correct.
Flags: needinfo?(valentin.gosu)
Depends on: 1210927
Depends on: 1210095
Depends on: 1212060
Depends on: 1213870
Depends on: 1207742
Updated: https://developer.mozilla.org/en-US/docs/Web/API/URLUtils/hash (removed note and indicated the bug as a comment in the compat table.
and
https://developer.mozilla.org/en-US/Firefox/Releases/41#Miscellaneous

[ Note that the change URLUtils -> HTMLHyperlinkElementUtils is in bug 1213815 ]
Depends on: 1234249
Depends on: 1209876
Depends on: 1273225
See Also: → 1294361
You need to log in before you can comment on or make changes to this bug.