Bug 130615 - "~" character converting to "%7E" in href attribute
Status: NEW
Product: SeaMonkey
Classification: Client Software
Component: Composer
Version: Trunk
Platform: All All
Importance: -- normal
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
Mentors:
Duplicates: 146615 208599 360410
Depends on:
Blocks:
Reported: 2002-03-13 08:35 PST by tim young
Modified: 2010-01-27 10:52 PST
CC List: 15 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---


Description tim young 2002-03-13 08:35:13 PST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.9) Gecko/20020311
BuildID:    20020311

The "~" character is converted to "%7E" when I look at the source.

The problem occurs when the "~" is in a link address.

The problem occurs on both the Windows and Linux versions (I've tested on both
platforms).

Reproducible: Always
Steps to Reproduce:
[what I enter as the link] -->
http://www.netaxs.com/~youtim73/powered_by_redhat.gif

Actual Results:  [what the source code displays] -->
http://www.netaxs.com/%7Eyoutim73/powered_by_redhat.gif

Expected Results:  [this should be the source code] -->
http://www.netaxs.com/~youtim73/powered_by_redhat.gif

Thankfully, the link does work correctly, even with the "%7E".  But it makes for
confusing source code.
Comment 1 R.K.Aa. 2002-03-13 11:17:20 PST
It seems all kinds of links are affected lately; bug 130079 was the first
instance reported.
Comment 2 Kathleen Brade 2002-03-15 07:55:45 PST
I don't think this is a problem; I recommend WONTFIX
Comment 3 Syd Logan 2002-04-14 19:35:24 PDT
marking WONTFIX per brade
Comment 4 sujay 2002-04-15 10:05:29 PDT
verified.
Comment 5 R.K.Aa. 2002-05-23 22:16:35 PDT
*** Bug 146615 has been marked as a duplicate of this bug. ***
Comment 6 jg 2002-05-23 22:23:34 PDT
Hello,
It seems this has been set to WONTFIX. I can't follow the reasoning behind this;
I consider it to be a bug. Please tell me why it will not be fixed.
JG
Comment 7 Charles Manske 2002-05-24 10:57:01 PDT
"Escaping" invalid characters in urls is necessary for the browser to work right;
"%7E" is correct, e.g., in the url bar of browser. HTML source is simply the
output HTML that would be produced when writing to a file, so it is correct to
"escape" the "~" then. Ideally, we should present the characters as unescaped 
in the user interface, such as in the location input field in the Link dialog.
A much smarter HTML source editor would also do the same, so I'll reopen and 
future this so we remember to address this issue if we do improve HTML source.
Comment 8 Charles Manske 2002-05-24 11:57:25 PDT
reassign to me and future this.
Comment 9 Kathleen Brade 2002-05-24 12:17:30 PDT
This concept frightens me.  Escaping/Unescaping of characters can be lossy if
you don't do it in the right way and under appropriate circumstances.

If we are talking about html source, I think we should be showing the actual
source that we are going to save or publish (and not something that is unescaped).

I think this bug should be WONTFIX.
Comment 10 jg 2002-05-27 00:30:08 PDT
Hi,
I'm sorry, but I don't see what the problem is here. HTML files with ~user work
fine in Mozilla and every other browser I have ever used.
I can edit files with ~user fine, but when I save, they are converted to %7euser.
When I click on the URLs, the address displays as %7e and then converts to ~ 1 second later.

Mozilla is "corrupting" my HTML files, which makes editing them by hand difficult.
See my bug 146615 for further explanation.

JG
Comment 11 Kathleen Brade 2002-05-29 08:36:44 PDT
Here are some relevant urls:
  http://www.w3.org/TR/html4/types.html#h-6.4
  http://ietf.org/rfc/rfc2396.txt?number=2396 (see section 2.3)
  http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars

note that we do the escaping to deal with this particular issue:
  http://www.w3.org/TR/html4/appendix/notes.html#ampersands-in-uris

Any "fixes" desired should not regress the above.  
Comment 12 Akkana Peck 2002-05-29 10:01:41 PDT
I may be missing something, but I'm not seeing anything in any of those
references that says that the tilde character (ASCII 126, or hex 7E) isn't a legal
character in a URL.  And certainly it's a very common one -- many ISPs (roughly
half of the ones I've used) use the http://site/~user form for home pages, for
users who don't have their own domain.

I agree that we should continue to escape ampersands and nonascii characters if
we encounter them, but tilde seems to be what this bug is about (clarifying
summary and confirming).
Comment 13 Alan Knowles 2002-11-18 20:01:45 PST
A little off topic, but may help somebody who comes across this when using the
mozilla editor to edit files that work with PHP.

Composer will encode the html
<a href="testing.html?foo=1&bar=2">
to
<a href="testing.html?foo=1&amp;bar=2">

This will cause a default PHP installation to return an association of 'amp;bar'
=> 2. This can be fixed in PHP by changing the php.ini setting
arg_separator.input = ";&"
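
A minimal sketch of why the 'amp;bar' key shows up, written in Python rather than PHP. The helper name split_query is hypothetical and only imitates a naive split on "&"; it is not PHP's actual parser. An HTML parser decodes &amp; before the URL is requested, so the stray key should only appear when the raw source text is treated as the URL.

```python
import html

# Attribute value exactly as Composer writes it into the HTML source.
raw_href = "testing.html?foo=1&amp;bar=2"

def split_query(url):
    """Naively split the query string on '&' into a dict (hypothetical helper)."""
    query = url.split("?", 1)[1]
    return dict(pair.split("=", 1) for pair in query.split("&"))

# What an HTML parser hands on after reading the attribute: the entity is decoded.
decoded_href = html.unescape(raw_href)

params_decoded = split_query(decoded_href)  # keys: foo, bar
params_raw = split_query(raw_href)          # keys: foo, amp;bar
```

Adding ";" to the separators, as the php.ini setting above does, makes the raw form split into foo, amp and bar instead, which hides the symptom rather than decoding the entity.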
Comment 14 Kathleen Brade 2003-04-03 06:35:04 PST
Removing dependency on bug 69329 since that makes no sense to me.  HTML Source
view doesn't need to be used to encounter this bug.
Comment 15 neil@parkwaycc.co.uk 2003-04-03 06:42:29 PST
BTW, <a href="testing.html?foo=1&bar=2"> is invalid HTML.
Comment 16 Jo Hermans 2003-06-07 03:27:50 PDT
*** Bug 208599 has been marked as a duplicate of this bug. ***
Comment 17 Jonathan Stewart 2003-06-07 14:02:32 PDT
208599 was my dupe, sorry about that.

However, I'm still puzzled.  Yes, Composer should escape illegal characters in
URLs.  However, I have NEVER heard that ~ is an illegal character.  It has been
used for years to represent a user directory, and Apache documentation directs
you to use this syntax, as it's the default.

To put it simply:
Illegal chars should be converted
Legal chars should not be converted

Unless someone can show that ~ is an illegal URL character, it should not get
converted. (as a side note, ? is a legal character as well... look at the URL
for this bug)
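
The rule above can be sketched in Python. This is only an illustration, not Composer's code: escape_path is a hypothetical helper, and the choice of "/" and "~" as characters to leave alone is an assumption made for this example.

```python
from urllib.parse import quote

def escape_path(path):
    """Percent-encode a URL path, leaving '/' and '~' untouched (hypothetical helper)."""
    # Letters, digits, '_', '.' and '-' are never escaped by quote();
    # the 'safe' argument adds '/' and '~' to that set explicitly.
    return quote(path, safe="/~")

# A legal path passes through unchanged.
legal = escape_path("/~youtim73/powered_by_redhat.gif")

# A path containing spaces gets only the spaces escaped.
illegal = escape_path("/~youtim73/powered by redhat.gif")
```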
Comment 18 Jim Booth 2003-07-04 09:55:37 PDT
This is an annoyance, and confusing to viewers of pages created in Composer. 

It should be fixed.

1. Nobody has cited any evidence that tilde is an invalid character in a URL. 
Just like underscore or hyphen, it's perfectly legal! 
2. We don't escape it in page text, why do we in links?
3. We don't escape it in the status bar when you mouse over the link.
4. It's confusing to the user when they click the link and the %7E shows up in
the URL bar. 
5. This should be a very easy thing to fix.
6. Stop trying to tell me how to write or format my HTML!

CCing glazman@netscape.com since he's doing a lot of work on Composer
Comment 19 Ric Gates 2004-04-24 00:58:40 PDT
(In reply to comment #18)
> 1. Nobody has cited any evidence that tilde is an invalid character in a URL. 
> Just like underscore or hyphen, it's perfectly legal! 

http://www.ietf.org/rfc/rfc1738.txt
Section 2.2
Other characters are unsafe because gateways and other transport agents are
known to sometimes modify such characters. These characters are "{", "}", "|",
"\", "^", "~", "[", "]", and "`".
Comment 20 Gérard Talbot 2005-10-01 22:50:11 PDT
{
  For example, "%7e" is sometimes used instead of "~" in an http URL
  path, but the two are equivalent for an http URL.
}
http://www.ietf.org/rfc/rfc2396.txt, Section 2.4.2
Comment 21 Gérard Talbot 2006-11-11 22:27:07 PST
*** Bug 360410 has been marked as a duplicate of this bug. ***
Comment 22 Gérard Talbot 2006-11-11 22:45:47 PST
{
Characters other than those in the "reserved" and "unsafe" sets (see
   RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.

   For example, the following three URIs are equivalent:

      http://abc.com:80/~smith/home.html
      http://ABC.com/%7Esmith/home.html
      http://ABC.com:/%7esmith/home.html
}

RFC 2616, section 3.2.3, June 1999
http://www.ietf.org/rfc/rfc2616.txt
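
As a rough check of the equivalence quoted above, the three URIs can be reduced to a comparison key in Python. The helper comparison_key is hypothetical, and it takes a shortcut: unquote() decodes every escape, including reserved characters, which a real comparison would have to leave alone. It is good enough for these three examples only.

```python
from urllib.parse import urlsplit, unquote

def comparison_key(url):
    """Reduce an http URL to a tuple for equivalence checks (hypothetical helper)."""
    parts = urlsplit(url)
    host, _, port = parts.netloc.partition(":")
    # An empty or missing port means the default port 80 for http.
    port = int(port) if port else 80
    # Simplification: unquote() decodes every escape, reserved ones included.
    path = unquote(parts.path) or "/"
    return (parts.scheme, host.lower(), port, path)

uris = [
    "http://abc.com:80/~smith/home.html",
    "http://ABC.com/%7Esmith/home.html",
    "http://ABC.com:/%7esmith/home.html",
]

# All three URIs collapse to the same key.
keys = {comparison_key(u) for u in uris}
```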
Comment 23 jg 2006-11-12 10:55:22 PST
I can't disagree with what the specs say. The thing we have to remember is that Composer and the browser need to be useful and clear to a user. The browser is compatible with '~' (tilde) being used, so why not make an exception to the specs in the interest of simplicity and usability?

FYI: Entering http://abc.go.com/~smith/home.html in Firefox 1.5.0.7 on Kubuntu 6.06 shows the URL with the '~' tilde intact.  So we just need other Mozilla components to follow this.

Cheers
Jon
Comment 24 joesyu 2006-11-13 11:36:21 PST
(In reply to comment #21)
> *** Bug 360410 has been marked as a duplicate of this bug. ***
> 

The real problem for me was that if I cut and paste the link with "%7E" in it, open the page in the browser, edit the same page in Composer, and then "publish" it, the URL path is no longer recognized.

If "~" and "%7E" are truly equivalent, shouldn't there at least be a reverse conversion, so that the differences are handled when publishing the link back to its original site? Thanks.
Comment 25 Gérard Talbot 2007-04-07 22:30:15 PDT
For fans of this bug, I may have solid and decisive info.

RFC 3986 
http://www.apps.ietf.org/rfc/rfc3986.html
Updates: 1738
Obsoletes: 2732, 2396, 1808
Category: Standards Track
It is co-authored by the Web's creator, T. Berners-Lee (W3C/MIT), and is rather recent (January 2005).

Section 2.3 
http://www.apps.ietf.org/rfc/rfc3986.html#sec-2.3
applies to this bug perfectly:

{
2.3 Unreserved Characters

    Characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde.

          unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

   URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource. However, URI comparison implementations do not always perform normalization prior to comparison (see Section 6). For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers. 

(...)

2.4 When to Encode or Decode

(...)

When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters. The only exception is for percent-encoded octets corresponding to characters in the unreserved set, which can be decoded at any time. For example, the octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation.
}

So Jonathan Stewart was right when saying
> Illegal chars should be converted
> Legal chars should not be converted
> Unless someone can show that ~ is an illegal URL character, it 
> should not get converted.

and Jim Booth is also right when saying
> Nobody has cited any evidence that tilde is an invalid character in a URL.
> Just like underscore or hyphen, it's perfectly legal!
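
The normalization that Section 2.3 asks for can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a proposed patch: decode_unreserved is a hypothetical name, and the sketch touches only percent-escapes that stand for unreserved characters, leaving every other escape exactly as written.

```python
import re
import string

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"   (RFC 3986, section 2.3)
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def decode_unreserved(uri):
    """Decode only the percent-escapes that stand for unreserved characters."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        # Escapes for reserved or other characters stay exactly as written.
        return char if char in UNRESERVED else match.group(0)
    return _ESCAPE.sub(replace, uri)

# The URL from the original report gets its tilde back.
fixed = decode_unreserved("http://www.netaxs.com/%7Eyoutim73/powered_by_redhat.gif")

# Escaped '&' and space are left alone; only the tilde is decoded.
kept = decode_unreserved("testing.html?q=a%26b%20c%7e")
```

Because the escaped ampersand and other reserved characters are never decoded, a step like this would not regress the ampersand handling discussed in comment 11.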

Comment 26 jg 2009-12-13 10:41:51 PST
Bug bounty, £250 to get this fix into Firefox for next Ubuntu release.

Other bug bounties for Firefox here: http://jguk.org/2009/09/firefox-fixfox-bounties-for-scaled.html
Comment 27 NatanaelA 2010-01-27 10:52:16 PST
This is also relevant for Firefox, not only SeaMonkey. Could someone add Firefox as a product?
