Open
Bug 130615
Opened 23 years ago
Updated 4 years ago
"~" character converting to "%7E" in href attribute
Categories
(SeaMonkey :: Composer, defect)
Tracking
(Not tracked)
NEW
People
(Reporter: youtim73, Unassigned)
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.9) Gecko/20020311
BuildID: 20020311
The "~" character converts to "%7E" when I look at the source.
The problem occurs when the "~" is in a LINK address.
**** The problem occurs on both Windows and Linux versions (I've tested on both
platforms).
Reproducible: Always
Steps to Reproduce:
[what I enter in as the link] -->
http://www.netaxs.com/~youtim73/powered_by_redhat.gif
Actual Results: [what the source code displays] -->
http://www.netaxs.com/%7Eyoutim73/powered_by_redhat.gif
Expected Results: [this should be the source code] -->
http://www.netaxs.com/~youtim73/powered_by_redhat.gif
Thankfully, the link does work correctly, even with the "%7E". But it makes for
confusing source code.
It seems all kinds of links are affected lately; bug 130079 was the first instance
reported.
Updated•23 years ago
Hardware: PC → All
Summary: "~" character in Composer converting to "%7E" → "~" character in Composer converting to "%7E" in href attribute
Comment 2•23 years ago
I don't think this is a problem; I recommend WONTFIX
marking WONTFIX per brade
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
*** Bug 146615 has been marked as a duplicate of this bug. ***
Hello
It seems this has been set to WONTFIX. I can't follow the reasoning behind this; I
consider it to be a bug. Please tell me why it will not be fixed.
JG
Comment 7•23 years ago
"Escaping" invalid characters in urls is necessary for the browser to work right;
"%7E" is correct, e.g., in the url bar of browser. HTML source is simply the
output HTML that would be produced when writing to a file, so it is correct to
"escape" the "~" then. Ideally, we should present the characters as unescaped
in the user interface, such as in the location input field in the Link dialog.
A much smarter HTML source editor would also do the same, so I'll reopen and
future this so we remember to address this issue if we do improve HTML source.
Status: VERIFIED → UNCONFIRMED
Resolution: WONTFIX → ---
Comment 8•23 years ago
reassign to me and future this.
Assignee: syd → cmanske
Depends on: 69329
Summary: "~" character in Composer converting to "%7E" in href attribute → Unescape characters when editing HTML Source
Target Milestone: --- → Future
Comment 9•23 years ago
This concept frightens me. Escaping/Unescaping of characters can be lossy if
you don't do it in the right way and under appropriate circumstances.
If we are talking about html source, I think we should be showing the actual
source that we are going to save or publish (and not something that is unescaped).
I think this bug should be WONTFIX.
URL: http://N/A
Comment 10•23 years ago
Hi,
I'm sorry, but I don't see what the problem is here. HTML files with ~user work
fine in Mozilla and every other browser I have ever used.
I can edit files with ~user fine, but when I save, they are converted to %7euser.
When I click on the URLs, they display as %7e, then convert to ~ one second later.
Mozilla is "corrupting" my HTML files, which makes editing them by hand difficult.
See my bug 146615 for further explanation.
JG
Comment 11•23 years ago
Here are some relevant urls:
http://www.w3.org/TR/html4/types.html#h-6.4
http://ietf.org/rfc/rfc2396.txt?number=2396 (see section 2.3)
http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars
note that we do the escaping to deal with this particular issue:
http://www.w3.org/TR/html4/appendix/notes.html#ampersands-in-uris
Any "fixes" desired should not regress the above.
Comment 12•23 years ago
I may be missing something, but I'm not seeing anything in any of those
references that says that the tilde character (ASCII 126, or 0x7E) isn't a legal
character in a URL. And certainly it's a very common one -- many ISPs (roughly
half of the ones I've used) use the http://site/~user form for home pages, for
users who don't have their own domain.
I agree that we should continue to escape ampersands and non-ASCII characters if
we encounter them, but tilde seems to be what this bug is about (clarifying
summary and confirming).
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Unescape characters when editing HTML Source → "~" character converting to "%7E" in href attribute
Comment 13•22 years ago
A little off topic, but may help somebody who comes across this when using the
mozilla editor to edit files that work with PHP.
Composer will encode the HTML
<a href="testing.html?foo=1&bar=2">
to
<a href="testing.html?foo=1&amp;bar=2">
This will cause a default PHP installation to return an association of 'amp;bar'
=> 2. This can be fixed in PHP by changing the php.ini setting
arg_separator.input = ";&"
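To illustrate the two readings of that href, here is a minimal Python sketch (not Composer or PHP code). It assumes the first case goes through an HTML parser, which decodes `&amp;` in attribute values, and the second case is a query string that reaches the server with the entity still in it. The names `HrefCollector` and `href_query` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit


class HrefCollector(HTMLParser):
    """Collects href values of <a> tags; the parser unescapes entities in attributes."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if tag == "a" and name == "href":
                self.hrefs.append(value)


def href_query(markup):
    """Return the parsed query of the first <a href> found in the markup."""
    collector = HrefCollector()
    collector.feed(markup)
    collector.close()
    query = urlsplit(collector.hrefs[0]).query
    return parse_qs(query, separator="&")


# Parsed as HTML: &amp; is decoded to & before the URL is used.
decoded = href_query('<a href="testing.html?foo=1&amp;bar=2">')

# Parsed as a raw query string: the entity is never decoded.
raw = parse_qs("foo=1&amp;bar=2", separator="&")

print(decoded)
print(raw)
```

The second result is the 'amp;bar' key described above; the first is what a client sends when it parses the escaped markup as HTML.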
Comment 14•22 years ago
Removing dependency on bug 69329 since that makes no sense to me. HTML Source
view doesn't need to be used to encounter this bug.
No longer depends on: 69329
Comment 15•22 years ago
BTW, <a href="testing.html?foo=1&bar=2"> is invalid HTML.
Comment 16•22 years ago
*** Bug 208599 has been marked as a duplicate of this bug. ***
Comment 17•22 years ago
208559 was my dupe, sorry about that.
However, I'm still puzzled. Yes, Composer should escape illegal characters in
URLs. However, I have NEVER heard that ~ is an illegal character. It has been
used for years to represent a user directory, and Apache documentation directs
you to use this syntax, as it's the default.
To put it simply:
Illegal chars should be converted
Legal chars should not be converted
Unless someone can show that ~ is an illegal URL character, it should not get
converted. (as a side note, ? is a legal character as well... look at the URL
for this bug)
Comment 18•22 years ago
This is an annoyance, and confusing to viewers of pages created in Composer.
It should be fixed.
1. Nobody has cited any evidence that tilde is an invalid character in a URL.
Just like underscore or hyphen, it's perfectly legal!
2. We don't escape it in page text, why do we in links?
3. We don't escape it in the status bar when you mouse over the link.
4. It's confusing to the user when they click the link and the %7E shows up in
the URL bar.
5. This should be a very easy thing to fix.
6. Stop trying to tell me how to write or format my HTML!
CCing glazman@netscape.com since he's doing a lot of work on Composer
Comment 19•21 years ago
(In reply to comment #18)
> 1. Nobody has cited any evidence that tilde is an invalid character in a URL.
> Just like underscore or hyphen, it's perfectly legal!
http://www.ietf.org/rfc/rfc1738.txt
Section 2.2
Other characters are unsafe because gateways and other transport agents are
known to sometimes modify such characters. These characters are "{", "}", "|",
"\", "^", "~", "[", "]", and "`".
Updated•20 years ago
Product: Browser → Seamonkey
Comment 20•19 years ago
{
For example, "%7e" is sometimes used instead of "~" in an http URL
path, but the two are equivalent for an http URL.
}
http://www.ietf.org/rfc/rfc2396.txt, Section 2.4.2
Comment 21•18 years ago
*** Bug 360410 has been marked as a duplicate of this bug. ***
Comment 22•18 years ago
{
Characters other than those in the "reserved" and "unsafe" sets (see
RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
For example, the following three URIs are equivalent:
http://abc.com:80/~smith/home.html
http://ABC.com/%7Esmith/home.html
http://ABC.com:/%7esmith/home.html
}
RFC 2616, section 3.2.3, June 1999
http://www.ietf.org/rfc/rfc2616.txt
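The equivalence of those three URIs can be checked with a short Python sketch. This is an illustration, not a full RFC 2616 comparison: `comparison_key` and `DEFAULT_PORTS` are names invented here, and the path is decoded with a plain `unquote`, which is fine for these examples but would also decode reserved characters such as `%2F`, so it should not be used to rewrite stored URLs.

```python
from urllib.parse import unquote, urlsplit

# Assumed default ports, used when the port is missing or empty.
DEFAULT_PORTS = {"http": 80, "https": 443}


def comparison_key(uri):
    """Build a tuple that is equal for URIs the quoted rules treat as equivalent."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()           # host names compare case-insensitively
    port = parts.port or DEFAULT_PORTS.get(scheme)  # empty or absent port means the default
    path = unquote(parts.path) or "/"               # %7E and %7e both become ~
    return (scheme, host, port, path)


uris = [
    "http://abc.com:80/~smith/home.html",
    "http://ABC.com/%7Esmith/home.html",
    "http://ABC.com:/%7esmith/home.html",
]
keys = {comparison_key(u) for u in uris}
print(keys)
```

All three URIs collapse to a single key, which matches the claim in the quoted passage.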
Comment 23•18 years ago
I can't disagree with what the specs say. The thing we have to remember is that Composer and the browser need to be useful and clear to a user. The browser is compatible with '~' (tilde) being used, so why not make an exception to the specs in the interest of simplicity and usability?
FYI: Entering http://abc.go.com/~smith/home.html in Firefox 1.5.0.7 on Kubuntu 6.06 shows the URL with the '~' tilde intact. So we just need other Mozilla components to follow this.
Cheers
Jon
Comment 24•18 years ago
(In reply to comment #21)
> *** Bug 360410 has been marked as a duplicate of this bug. ***
>
The real problem for me was, if I cut-and-paste the link with "%7E" in it, open the page from browser, edit the same page from composer, then "publish" it, the url path was not recognized anymore.
If '~' and "%7E" are truly equivalent, shouldn't there at least be a reverse conversion, so that the differences are handled when publishing the link back to its original site? Thanks.
Comment 25•18 years ago
For fans of this bug, I may have solid and decisive info.
RFC 3986
http://www.apps.ietf.org/rfc/rfc3986.html
Updates: 1738
Obsoletes: 2732, 2396, 1808
Category: Standards Track
and was co-written by the Web's creator, T. Berners-Lee (W3C/MIT), and is rather recent (January 2005).
Section 2.3
http://www.apps.ietf.org/rfc/rfc3986.html#sec-2.3
applies to this bug perfectly:
{
2.3 Unreserved Characters
Characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource. However, URI comparison implementations do not always perform normalization prior to comparison (see Section 6). For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers.
(...)
2.4 When to Encode or Decode
(...)
When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters. The only exception is for percent-encoded octets corresponding to characters in the unreserved set, which can be decoded at any time. For example, the octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation.
}
So Jonathan Stewart was right when saying
> Illegal chars should be converted
> Legal chars should not be converted
> Unless someone can show that ~ is an illegal URL character, it
> should not get converted.
and Jim Booth is also right when saying
> Nobody has cited any evidence that tilde is an invalid character in a URL.
> Just like underscore or hyphen, it's perfectly legal!
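The normalization described in Section 2.3 can be sketched in a few lines of Python. This is only an illustration of the rule, not how Composer works or how a fix would be implemented; `decode_unreserved` and `UNRESERVED` are hypothetical names. It decodes a percent-encoded octet only when it stands for an unreserved character, so escapes such as `%2F` or `%26` are left alone and nothing is lost.

```python
import re
import string

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"  (RFC 3986, Section 2.3)
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

_PERCENT_OCTET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri):
    """Replace percent-encoded unreserved characters with the characters themselves."""

    def replace(match):
        char = chr(int(match.group(1), 16))
        # Keep the escape unless it encodes an unreserved character.
        return char if char in UNRESERVED else match.group(0)

    return _PERCENT_OCTET.sub(replace, uri)


fixed = decode_unreserved("http://www.netaxs.com/%7Eyoutim73/powered_by_redhat.gif")
kept = decode_unreserved("testing.html?q=a%2Fb%26c%7e")
print(fixed)
print(kept)
```

Applied to the URL from the original report, this restores the "~" while reserved escapes in other URLs stay encoded.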
Updated•17 years ago
Assignee: cmanske → nobody
QA Contact: sujay → composer
Target Milestone: Future → ---
Comment 26•15 years ago
Bug bounty, £250 to get this fix into Firefox for next Ubuntu release.
Other bug bounties for Firefox here: http://jguk.org/2009/09/firefox-fixfox-bounties-for-scaled.html
Comment 27•15 years ago
This is also relevant for Firefox, not only SeaMonkey. Could someone add Firefox as a product?