"~" character converting to "%7E" in href attribute



16 years ago
8 years ago


(Reporter: tim young, Unassigned)






16 years ago
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.9) Gecko/20020311
BuildID:    20020311

The "~" character converts to "%7E" when I look at the source.

The problem occurs when the "~" is in a LINK address. 

**** The problem occurs on both the Windows and Linux versions (I've tested on both).

Reproducible: Always
Steps to Reproduce:
[what I enter in as the link] --> 

Actual Results:  [what the source code displays] -->

Expected Results:  [this should be the source code] -->

Thankfully, the link does work correctly, even with the "%7E".  But it makes for
confusing source code.

Comment 1

16 years ago
It seems all kinds of links are affected lately; bug 130079 was the first instance.


16 years ago
Hardware: PC → All
Summary: "~" character in Composer converting to "%7E" → "~" character in Composer converting to "%7E" in href attribute

Comment 2

16 years ago
I don't think this is a problem; I recommend WONTFIX

Comment 3

16 years ago
marking WONTFIX per brade
Last Resolved: 16 years ago
Resolution: --- → WONTFIX

Comment 4

16 years ago

Comment 5

15 years ago
*** Bug 146615 has been marked as a duplicate of this bug. ***

Comment 6

15 years ago
It seems this has been set to WONTFIX. I can't follow the reasoning behind this; I
consider it to be a bug. Please tell me why it will not be fixed.

Comment 7

15 years ago
"Escaping" invalid characters in URLs is necessary for the browser to work right;
"%7E" is correct, e.g., in the browser's URL bar. HTML source is simply the
output HTML that would be produced when writing to a file, so it is correct to
"escape" the "~" there. Ideally, we should present the characters unescaped
in the user interface, such as in the location input field in the Link dialog.
A much smarter HTML source editor would also do the same, so I'll reopen and
future this so we remember to address this issue if we do improve HTML source.
Resolution: WONTFIX → ---

Comment 8

15 years ago
reassign to me and future this.
Assignee: syd → cmanske
Depends on: 69329
Summary: "~" character in Composer converting to "%7E" in href attribute → Unescape characters when editing HTML Source
Target Milestone: --- → Future

Comment 9

15 years ago
This concept frightens me.  Escaping/Unescaping of characters can be lossy if
you don't do it in the right way and under appropriate circumstances.

If we are talking about html source, I think we should be showing the actual
source that we are going to save or publish (and not something that is unescaped).

I think this bug should be WONTFIX.
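The lossiness worried about here can be made concrete with a small Python sketch. It decodes a hypothetical href in two orders (split into path/query/fragment first and then decode, versus decode first and then split) and reports whether the results disagree. The helper names and href values are illustrative only, not Composer code.

```python
from urllib.parse import unquote, urlsplit


def components(uri: str) -> tuple:
    """Split a URI reference into (path, query, fragment)."""
    parts = urlsplit(uri)
    return (parts.path, parts.query, parts.fragment)


def unescape_changes_structure(href: str) -> bool:
    """True if decoding every %XX in href moves a component boundary."""
    before = tuple(unquote(c) for c in components(href))  # split first, then decode
    after = components(unquote(href))                     # decode first, then split
    return before != after


print(unescape_changes_structure("/%7Euser/index.html"))  # False: "~" is not a delimiter
print(unescape_changes_structure("/faq%3Fv2.html"))       # True: "%3F" turns into a "?" query delimiter
```

This check only looks at the path/query/fragment split; an encoded "/" (%2F) inside a path changes path segments instead and is not detected here.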

Comment 10

15 years ago
I'm sorry, but I don't see what the problem is here. HTML files with ~user work
fine in Mozilla and every other browser I have ever used.
I can edit files with ~user fine, but when I save, they are converted to %7euser.
When I click on the URLs, they display as %7e and then convert to ~ a second later.

Mozilla is "corrupting" my HTML files, which makes editing them by hand difficult.
See my bug 146615 for further explanation.


Comment 11

15 years ago
Here are some relevant urls:
  http://ietf.org/rfc/rfc2396.txt?number=2396 (see section 2.3)

note that we do the escaping to deal with this particular issue:

Any "fixes" desired should not regress the above.  

Comment 12

15 years ago
I may be missing something, but I'm not seeing anything in any of those
references that says that the tilde character (ASCII 126, hex 7E) isn't a legal
character in a URL.  And certainly it's a very common one -- many ISPs (roughly
half of the ones I've used) use the http://site/~user form for home pages, for
users who don't have their own domain.

I agree that we should continue to escape ampersands and nonascii characters if
we encounter them, but tilde seems to be what this bug is about (clarifying
summary and confirming).
Ever confirmed: true
Summary: Unescape characters when editing HTML Source → "~" character converting to "%7E" in href attribute
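For reference, the arithmetic behind "%7E": percent-encoding writes a byte as "%" followed by two hex digits, and "~" is the single ASCII byte 126 (0x7E). A quick Python check:

```python
# "~" is one ASCII byte; its percent-encoding is "%" plus two uppercase hex digits.
code_point = ord("~")
print(code_point)                     # 126
print(hex(code_point))                # 0x7e
print("%{:02X}".format(code_point))   # %7E
print(repr(chr(127)))                 # '\x7f'  (127 is DEL, a control character, not tilde)
```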

Comment 13

15 years ago
A little off topic, but may help somebody who comes across this when using the
mozilla editor to edit files that work with PHP.

Composer will encode the HTML
<a href="testing.html?foo=1&bar=2">
as
<a href="testing.html?foo=1&amp;bar=2">

This will cause a default PHP installation to return an association of 'amp;bar'
=> 2. This can be fixed in PHP by changing the php.ini setting
arg_separator.input = ";&"
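The mechanism can be reproduced in Python, standing in for PHP here (this is not PHP's parser). An HTML parser turns "&amp;" in an attribute back into "&" before the URL is used; the "amp;bar" key only appears when something parses the raw attribute text without decoding the entity. The outputs assume a current Python 3, whose parse_qs splits only on "&" by default.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit

# The href value as Composer writes it into the saved file.
attr = "testing.html?foo=1&amp;bar=2"

# Decode the character reference first, as an HTML parser does.
url = unescape(attr)
print(url)                               # testing.html?foo=1&bar=2
print(parse_qs(urlsplit(url).query))     # {'foo': ['1'], 'bar': ['2']}

# Parse the raw attribute text without decoding "&amp;".
print(parse_qs(urlsplit(attr).query))    # {'foo': ['1'], 'amp;bar': ['2']}
```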

Comment 14

15 years ago
Removing dependency on bug 69329 since that makes no sense to me.  HTML Source
view doesn't need to be used to encounter this bug.
No longer depends on: 69329

Comment 15

15 years ago
BTW, <a href="testing.html?foo=1&bar=2"> is invalid HTML.

Comment 16

14 years ago
*** Bug 208599 has been marked as a duplicate of this bug. ***

Comment 17

14 years ago
208559 was my dupe, sorry about that.

However, I'm still puzzled.  Yes, Composer should escape illegal characters in
URLs.  However, I have NEVER heard that ~ is an illegal character.  It has been
used for years to represent a user directory, and Apache documentation directs
you to use this syntax, as it's the default.

To put it simply:
Illegal chars should be converted
Legal chars should not be converted

Unless someone can show that ~ is an illegal URL character, it should not get
converted. (As a side note, ? is a legal character as well; look at the URL
for this bug.)
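The rule above ("illegal chars should be converted, legal chars should not") maps onto a percent-encoder that treats RFC 3986's reserved and unreserved characters as safe. A minimal Python sketch, assuming Python 3.7 or later (where urllib.parse.quote stopped encoding "~") and assuming any %XX already in the input is an intentional escape; escape_illegal is a hypothetical helper, not Composer's code.

```python
from urllib.parse import quote

# RFC 3986 section 2.2 reserved characters: gen-delims followed by sub-delims.
RESERVED = ":/?#[]@" + "!$&'()*+,;="


def escape_illegal(href: str) -> str:
    """Percent-encode only the characters that may not appear literally in a URI.

    quote() always leaves letters, digits and "_.-~" alone (Python 3.7+).
    Reserved characters are passed as safe so delimiters keep their meaning,
    and "%" is passed as safe on the assumption that existing %XX escapes
    in href are intentional.
    """
    return quote(href, safe=RESERVED + "%")


print(escape_illegal("http://example.com/~user/café menu.html?a=1&b=2"))
# http://example.com/~user/caf%C3%A9%20menu.html?a=1&b=2
print(escape_illegal("/~user/{draft}.html"))
# /~user/%7Bdraft%7D.html
```

Non-ASCII characters are encoded as their UTF-8 bytes, which is quote()'s default.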

Comment 18

14 years ago
This is an annoyance, and confusing to viewers of pages created in Composer. 

It should be fixed.

1. Nobody has cited any evidence that tilde is an invalid character in a URL. 
Just like underscore or hyphen, it's perfectly legal! 
2. We don't escape it in page text, why do we in links?
3. We don't escape it in the status bar when you mouse over the link.
4. It's confusing to the user when they click the link and the %7E shows up in
the URL bar. 
5. This should be a very easy thing to fix.
6. Stop trying to tell me how to write or format my HTML!

CCing glazman@netscape.com since he's doing a lot of work on Composer

Comment 19

13 years ago
(In reply to comment #18)
> 1. Nobody has cited any evidence that tilde is an invalid character in a URL. 
> Just like underscore or hyphen, it's perfectly legal! 

RFC 1738, Section 2.2:
Other characters are unsafe because gateways and other transport agents are
known to sometimes modify such characters. These characters are "{", "}", "|",
"\", "^", "~", "[", "]", and "`".
Product: Browser → Seamonkey
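It helps to compare the quoted list from RFC 1738 section 2.2 with the character classes of RFC 3986 (January 2005). The Python sketch below does that with plain sets: tilde is the only character on the old "unsafe" list that RFC 3986 classes as unreserved, "[" and "]" became reserved delimiters, and the rest still have to be percent-encoded.

```python
import string

# The characters RFC 1738 section 2.2 calls "unsafe", as quoted in this comment.
RFC1738_UNSAFE = set("{}|\\^~[]`")

# RFC 3986: unreserved characters (section 2.3) and gen-delims (section 2.2).
RFC3986_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
RFC3986_GEN_DELIMS = set(":/?#[]@")

print(sorted(RFC1738_UNSAFE & RFC3986_UNRESERVED))   # ['~']
print(sorted(RFC1738_UNSAFE & RFC3986_GEN_DELIMS))   # ['[', ']']
# Neither reserved nor unreserved in RFC 3986, so still percent-encoded:
print(sorted(RFC1738_UNSAFE - RFC3986_UNRESERVED - RFC3986_GEN_DELIMS))
# ['\\', '^', '`', '{', '|', '}']
```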

Comment 20

12 years ago
  For example, "%7e" is sometimes used instead of "~" in an http URL
  path, but the two are equivalent for an http URL.
http://www.ietf.org/rfc/rfc2396.txt, Section 2.4.2

Comment 21

11 years ago
*** Bug 360410 has been marked as a duplicate of this bug. ***

Comment 22

11 years ago
Characters other than those in the "reserved" and "unsafe" sets (see
   RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.

   For example, the following three URIs are equivalent:
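
      http://abc.com:80/~smith/home.html
      http://ABC.com/%7Esmith/home.html
      http://ABC.com:/%7esmith/home.html
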
RFC 2616, section 3.2.3, June 1999
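RFC 2616 section 3.2.3 lists the exceptions to octet-by-octet URI comparison: an empty or absent port equals the default port, host and scheme names compare case-insensitively, and an empty abs_path equals "/". A minimal Python sketch of a comparison key built on those rules plus the %XX equivalence, run against the three example URIs that section gives. It assumes the http scheme only (default port 80), leaves the query as-is, and http_comparison_key is a hypothetical name.

```python
import re
from urllib.parse import urlsplit

# RFC 3986 unreserved characters; their %XX forms are safe to decode.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def decode_unreserved(text: str) -> str:
    """Decode %XX only where it encodes an unreserved character."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)


def http_comparison_key(uri: str) -> tuple:
    """Key under which equivalent http URIs compare equal (sketch only)."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()                          # scheme is case-insensitive
    host = (parts.hostname or "").lower()                  # host is case-insensitive
    port = parts.port if parts.port is not None else 80    # empty/absent port -> 80
    path = decode_unreserved(parts.path) or "/"            # empty abs_path -> "/"
    return (scheme, host, port, path, parts.query)


uris = [
    "http://abc.com:80/~smith/home.html",
    "http://ABC.com/%7Esmith/home.html",
    "http://ABC.com:/%7esmith/home.html",
]
print({http_comparison_key(u) for u in uris})
# {('http', 'abc.com', 80, '/~smith/home.html', '')}
```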

Comment 23

11 years ago
I can't disagree with what the specs say. The thing we have to remember is that Composer and the browser need to be useful and clear to a user. The browser is compatible with '~' (tilde) being used, so why not make an exception to the specs in the interest of simplicity and usability?

FYI: Entering http://abc.go.com/~smith/home.html in Firefox on Kubuntu 6.06 shows the URL with the '~' tilde intact.  So we just need other Mozilla components to follow this.


Comment 24

11 years ago
(In reply to comment #21)
> *** Bug 360410 has been marked as a duplicate of this bug. ***

The real problem for me was this: if I cut and paste the link with "%7E" in it, open the page in the browser, edit the same page in Composer, and then "publish" it, the URL path is no longer recognized.

If '~' and '%7E' are truly equivalent, shouldn't there at least be a reverse conversion, so that publishing the link back to its original site handles the difference? Thanks.

Comment 25

10 years ago
For fans of this bug, I may have solid and decisive info.

RFC 3986 
Updates: 1738
Obsoletes: 2732, 2396, 1808
Category: Standards Track
and was co-authored by the Web's creator, T. Berners-Lee (W3C/MIT), and is rather recent (January 2005).

Sections 2.3 and 2.4 apply to this bug perfectly:

2.3 Unreserved Characters

    Characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde.

          unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

   URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource. However, URI comparison implementations do not always perform normalization prior to comparison (see Section 6). For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers. 
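The last sentence of section 2.3 describes a normalization step. A minimal Python sketch of it follows; it also uppercases the hex digits of any remaining escapes, as RFC 3986 section 6.2.2.1 recommends. normalize_percent_encoding and the example.com URL are illustrative, not taken from Mozilla code.

```python
import re

# RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

# One percent-encoded octet, e.g. "%7E" or "%7e".
PCT_ENCODED = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(uri: str) -> str:
    """Decode %XX that stands for an unreserved character; uppercase the hex
    digits of every other %XX (RFC 3986 section 6.2.2.1). Nothing else changes."""

    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char                        # "%7E" -> "~"
        return "%" + match.group(1).upper()    # "%2f" -> "%2F", still encoded

    return PCT_ENCODED.sub(fix, uri)


print(normalize_percent_encoding("http://example.com/%7Euser/a%2fb%20c.html"))
# http://example.com/~user/a%2Fb%20c.html
```

Applied to an href before saving or publishing, a step like this would keep "~" literal while leaving delimiters such as an encoded "/" encoded.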


2.4 When to Encode or Decode


When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters. The only exception is for percent-encoded octets corresponding to characters in the unreserved set, which can be decoded at any time. For example, the octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation.
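The difference between decoding unreserved and reserved octets shows up as soon as a path is split. In this Python sketch (helper names and URL are hypothetical), "%7E" decodes to the same segment whichever order is used, while "%2F" only survives as data if the path is split on "/" before decoding.

```python
from typing import List
from urllib.parse import unquote, urlsplit


def segments_parse_then_decode(uri: str) -> List[str]:
    """Split the path on "/" first, then decode each segment (the order RFC 3986 asks for)."""
    path = urlsplit(uri).path
    return [unquote(segment) for segment in path.split("/")[1:]]


def segments_decode_then_parse(uri: str) -> List[str]:
    """Wrong order: decode the whole path, then split on "/"."""
    path = unquote(urlsplit(uri).path)
    return path.split("/")[1:]


uri = "http://example.com/%7Esmith/a%2Fb"
print(segments_parse_then_decode(uri))   # ['~smith', 'a/b']
print(segments_decode_then_parse(uri))   # ['~smith', 'a', 'b']
```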

So Jonathan Stewart was right when saying
> Illegal chars should be converted
> Legal chars should not be converted
> Unless someone can show that ~ is an illegal URL character, it 
> should not get converted.

and Jim Booth is also right when saying
> Nobody has cited any evidence that tilde is an invalid character in a URL.
> Just like underscore or hyphen, it's perfectly legal!

Assignee: cmanske → nobody
QA Contact: sujay → composer
Target Milestone: Future → ---

Comment 26

8 years ago
Bug bounty: £250 to get this fixed in Firefox for the next Ubuntu release.

Other bug bounties for Firefox here: http://jguk.org/2009/09/firefox-fixfox-bounties-for-scaled.html

Comment 27

8 years ago
This is also relevant for Firefox, not only SeaMonkey. Could someone add Firefox as a product?