Bug 130615 - "~" character converting to "%7E" in href attribute
Status: NEW
Product: SeaMonkey
Classification: Client Software
Component: Composer
Version: Trunk
Platform: All All
Importance: -- normal
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
Duplicates: 146615 208599 360410
Depends on:
Reported: 2002-03-13 08:35 PST by tim young
Modified: 2010-01-27 10:52 PST
15 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---


Description tim young 2002-03-13 08:35:13 PST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.9) Gecko/20020311
BuildID:    20020311

The "~" character converts to "%7E" when I look at the source.

The problem occurs when the "~" is in a LINK address. 

**** The problem occurs on both the Windows and Linux versions (I've tested on both).

Reproducible: Always
Steps to Reproduce:
[what I enter in as the link] -->  

Actual Results:  [what the source code displays] -->

Expected Results:  [this should be the source code] -->

Thankfully, the link does work correctly, even with the "%7E".  But it makes for
confusing source code.
Comment 1 R.K.Aa. 2002-03-13 11:17:20 PST
It seems all kinds of links are affected lately; bug 130079 was the first instance.
Comment 2 Kathleen Brade 2002-03-15 07:55:45 PST
I don't think this is a problem; I recommend WONTFIX
Comment 3 Syd Logan 2002-04-14 19:35:24 PDT
marking WONTFIX per brade
Comment 4 sujay 2002-04-15 10:05:29 PDT
Comment 5 R.K.Aa. 2002-05-23 22:16:35 PDT
*** Bug 146615 has been marked as a duplicate of this bug. ***
Comment 6 jg 2002-05-23 22:23:34 PDT
It seems this has been set to WONTFIX. I can't follow the reasoning behind this;
I consider it a bug. Please tell me why it will not be fixed.
Comment 7 Charles Manske 2002-05-24 10:57:01 PDT
"Escaping" invalid characters in URLs is necessary for the browser to work right;
"%7E" is correct, e.g., in the URL bar of the browser. HTML source is simply the
output HTML that would be produced when writing to a file, so it is correct to
"escape" the "~" then. Ideally, we should present the characters as unescaped
in the user interface, such as in the location input field in the Link dialog.
A much smarter HTML source editor would also do the same, so I'll reopen and
future this so we remember to address this issue if we do improve HTML source.
Comment 8 Charles Manske 2002-05-24 11:57:25 PDT
reassign to me and future this.
Comment 9 Kathleen Brade 2002-05-24 12:17:30 PDT
This concept frightens me.  Escaping/Unescaping of characters can be lossy if
you don't do it in the right way and under appropriate circumstances.

If we are talking about html source, I think we should be showing the actual
source that we are going to save or publish (and not something that is unescaped).

I think this bug should be WONTFIX.
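
The lossiness concern can be illustrated with a small Python sketch. This is not Composer's code, and the example path is made up for illustration. Blanket unescaping turns an escaped "/" into a real path separator, and escaping the result again does not bring the original back:

```python
from urllib.parse import quote, unquote

# Hypothetical path: "a%2Fb" is ONE segment containing an escaped slash.
original = "/files/a%2Fb/%7Euser"

# Blanket unescape: every percent-encoded octet is decoded.
decoded = unquote(original)

# Escape again with the default rules ("/" is left alone by default).
re_encoded = quote(decoded)

print(decoded)                 # the escaped slash is now a path separator
print(re_encoded == original)  # the round trip does not restore the input
```

Decoding "%7E" to "~" is harmless here; decoding "%2F" is what changes the structure of the path. Any fix would need to tell the two cases apart.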
Comment 10 jg 2002-05-27 00:30:08 PDT
I'm sorry, but I don't see what the problem is here. HTML files with ~user work
fine in Mozilla and every other browser I have ever used.
I can edit files with ~user fine, but when I save, they are converted to %7euser.
When I click on the URLs, the address displays as %7e, then converts to ~ about a second later.

Mozilla is "corrupting" my HTML files, which makes editing them by hand difficult.
See my bug 146615 for further explanation.

Comment 11 Kathleen Brade 2002-05-29 08:36:44 PDT
Here are some relevant urls: (see section 2.3)

note that we do the escaping to deal with this particular issue:

Any "fixes" desired should not regress the above.  
Comment 12 Akkana Peck 2002-05-29 10:01:41 PDT
I may be missing something, but I'm not seeing anything in any of those
references that says that the tilde character (ASCII 126, hex 7E) isn't a legal
character in a URL.  And certainly it's a very common one -- many ISPs (roughly
half of the ones I've used) use the http://site/~user form for home pages, for
users who don't have their own domain.

I agree that we should continue to escape ampersands and non-ASCII characters if
we encounter them, but tilde seems to be what this bug is about (clarifying
summary and confirming).
Comment 13 Alan Knowles 2002-11-18 20:01:45 PST
A little off topic, but this may help somebody who comes across this bug when using the
Mozilla editor to edit files that work with PHP.

Composer will encode the HTML
<a href="testing.html?foo=1&bar=2">
as
<a href="testing.html?foo=1&amp;bar=2">

This will cause a default PHP installation to return an association of 'amp;bar'
=> 2. This can be fixed in PHP by changing the php.ini setting to
arg_separator.input = ";&"
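
One way the 'amp;bar' key can arise is when the attribute text is used without HTML decoding. The Python sketch below is an illustration only: split_query is a hypothetical, simplified stand-in for a server-side parser that splits on "&" alone, not PHP's actual implementation.

```python
from html import unescape

def split_query(query):
    """Split a query string on "&" only (simplified, hypothetical parser)."""
    pairs = {}
    for part in query.split("&"):
        key, _, value = part.partition("=")
        pairs[key] = value
    return pairs

attribute_as_written = "testing.html?foo=1&amp;bar=2"

# An HTML parser decodes character references in attribute values,
# so the URL derived from the parsed attribute contains a bare "&".
parsed_query = unescape(attribute_as_written).split("?", 1)[1]

# Text taken from the source without HTML decoding still contains "&amp;".
raw_query = attribute_as_written.split("?", 1)[1]

print(split_query(parsed_query))
print(split_query(raw_query))
```

The first result has the keys foo and bar; the second has foo and amp;bar, which matches the symptom described above.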
Comment 14 Kathleen Brade 2003-04-03 06:35:04 PST
Removing dependency on bug 69329 since that makes no sense to me.  HTML Source
view doesn't need to be used to encounter this bug.
Comment 15 2003-04-03 06:42:29 PST
BTW, <a href="testing.html?foo=1&bar=2"> is invalid HTML.
Comment 16 Jo Hermans 2003-06-07 03:27:50 PDT
*** Bug 208599 has been marked as a duplicate of this bug. ***
Comment 17 Jonathan Stewart 2003-06-07 14:02:32 PDT
208599 was my dupe, sorry about that.

However, I'm still puzzled.  Yes, Composer should escape illegal characters in
URLs.  However, I have NEVER heard that ~ is an illegal character.  It has been
used for years to represent a user directory, and the Apache documentation directs
you to use this syntax, as it's the default.

To put it simply:
Illegal chars should be converted
Legal chars should not be converted

Unless someone can show that ~ is an illegal URL character, it should not get
converted. (As a side note, ? is a legal character as well... look at the URL
for this bug.)
Comment 18 Jim Booth 2003-07-04 09:55:37 PDT
This is an annoyance, and confusing to viewers of pages created in Composer. 

It should be fixed.

1. Nobody has cited any evidence that tilde is an invalid character in a URL. 
Just like underscore or hyphen, it's perfectly legal! 
2. We don't escape it in page text, why do we in links?
3. We don't escape it in the status bar when you mouse over the link.
4. It's confusing to the user when they click the link and the %7E shows up in
the URL bar. 
5. This should be a very easy thing to fix.
6. Stop trying to tell me how to write or format my HTML!

CCing since he's doing a lot of work on Composer
Comment 19 Ric Gates 2004-04-24 00:58:40 PDT
(In reply to comment #18)
> 1. Nobody has cited any evidence that tilde is an invalid character in a URL. 
> Just like underscore or hyphen, it's perfectly legal!
Section 2.2
Other characters are unsafe because gateways and other transport agents are
known to sometimes modify such characters. These characters are "{", "}", "|",
"\", "^", "~", "[", "]", and "`".
Comment 20 Gérard Talbot 2005-10-01 22:50:11 PDT
  For example, "%7e" is sometimes used instead of "~" in an http URL
  path, but the two are equivalent for an http URL.
Section 2.4.2
Comment 21 Gérard Talbot 2006-11-11 22:27:07 PST
*** Bug 360410 has been marked as a duplicate of this bug. ***
Comment 22 Gérard Talbot 2006-11-11 22:45:47 PST
Characters other than those in the "reserved" and "unsafe" sets (see
   RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.

   For example, the following three URIs are equivalent:

RFC 2616, section 3.2.3, June 1999
Comment 23 jg 2006-11-12 10:55:22 PST
I can't disagree with what the specs say. The thing we have to remember is that Composer and the browser need to be useful and clear to a user. The browser is compatible with '~' (tilde) being used, so why not make an exception to the specs in the interest of simplicity and usability?

FYI: Entering in Firefox on Kubuntu 6.06 shows the URL with the '~' tilde intact.  So we just need other Mozilla components to follow this.

Comment 24 joesyu 2006-11-13 11:36:21 PST
(In reply to comment #21)
> *** Bug 360410 has been marked as a duplicate of this bug. ***

The real problem for me was this: if I cut and paste the link with "%7E" in it, open the page in the browser, edit the same page in Composer, and then "publish" it, the URL path is no longer recognized.

If "~" and "%7E" are truly equivalent, shouldn't there at least be a reverse conversion, so that the difference is handled when publishing the link back to its original site? Thanks.
Comment 25 Gérard Talbot 2007-04-07 22:30:15 PDT
For fans of this bug, I may have solid and decisive info.

RFC 3986
Updates: 1738
Obsoletes: 2732, 2396, 1808
Category: Standards Track
and is written by the Web's creator, T. Berners-Lee (W3C/MIT) and is rather recent (January 2005).

Section 2.3
apply to this bug perfectly:

2.3 Unreserved Characters

    Characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde.

          unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

   URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource. However, URI comparison implementations do not always perform normalization prior to comparison (see Section 6). For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers. 


2.4 When to Encode or Decode


When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters. The only exception is for percent-encoded octets corresponding to characters in the unreserved set, which can be decoded at any time. For example, the octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation.

So Jonathan Stewart was right when saying
> Illegal chars should be converted
> Legal chars should not be converted
> Unless someone can show that ~ is an illegal URL character, it 
> should not get converted.

and Jim Booth is also right when saying
> Nobody has cited any evidence that tilde is an invalid character in a URL.
> Just like underscore or hyphen, it's perfectly legal!
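
The normalization rule quoted from Section 2.3 can be sketched in a few lines of Python. This is a minimal illustration, not Mozilla's code: decode_unreserved is a hypothetical helper name, and the example.org URL is made up. It decodes only percent-encoded octets that map to unreserved characters and leaves everything else untouched:

```python
import re

# Unreserved set from RFC 3986, Section 2.3:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

PERCENT_OCTET = re.compile(r"%([0-9A-Fa-f]{2})")

def decode_unreserved(uri):
    """Decode only the percent-encoded octets that are unreserved characters."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        # Reserved and non-ASCII octets keep their percent-encoded form.
        return char if char in UNRESERVED else match.group(0)
    return PERCENT_OCTET.sub(replace, uri)

print(decode_unreserved("http://example.org/%7Euser/a%2Fb%5Fc.html"))
```

Here "%7E" and "%5F" are decoded to "~" and "_", while "%2F" stays encoded because "/" is a reserved character. Two links could then be compared with decode_unreserved(a) == decode_unreserved(b), which would treat the "~" and "%7E" forms as the same.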

Comment 26 jg 2009-12-13 10:41:51 PST
Bug bounty, £250 to get this fix into Firefox for next Ubuntu release.

Other bug bounties for Firefox here:
Comment 27 NatanaelA 2010-01-27 10:52:16 PST
This is also relevant for Firefox, not only SeaMonkey. Could someone add Firefox as a product?
