Open Bug 130615 Opened 23 years ago Updated 4 years ago

"~" character converting to "%7E" in href attribute

Categories

(SeaMonkey :: Composer, defect)

defect
Not set
normal

Tracking

(Not tracked)

People

(Reporter: youtim73, Unassigned)

References

Details

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.9) Gecko/20020311
BuildID: 20020311
The "~" character converts to "%7E" when I look at the source. The problem occurs when the "~" is in a LINK address.
**** The problem occurs on both Windows and Linux versions (I've tested on both platforms).
Reproducible: Always
Steps to Reproduce:
[what I enter in as the link] --> http://www.netaxs.com/~youtim73/powered_by_redhat.gif
Actual Results:
[what the source code displays] --> http://www.netaxs.com/%7Eyoutim73/powered_by_redhat.gif
Expected Results:
[this should be the source code] --> http://www.netaxs.com/~youtim73/powered_by_redhat.gif
Thankfully, the link does work correctly, even with the "%7E". But it makes for confusing source code.
It seems all kinds of links have been affected lately; bug 130079 was the first instance reported.
Hardware: PC → All
Summary: "~" character in Composer converting to "%7E" → "~" character in Composer converting to "%7E" in href attribute
I don't think this is a problem; I recommend WONTFIX
marking WONTFIX per brade
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
verified.
Status: RESOLVED → VERIFIED
*** Bug 146615 has been marked as a duplicate of this bug. ***
Hello. It seems this has been set to WONTFIX. I can't follow the reasoning behind this; in my opinion it is a bug. Please tell me why it will not be fixed. JG
"Escaping" invalid characters in URLs is necessary for the browser to work right; "%7E" is correct, e.g., in the browser's URL bar. HTML Source is simply the output HTML that would be produced when writing to a file, so it is correct to "escape" the "~" then. Ideally, we should present the characters unescaped in the user interface, such as in the location input field in the Link dialog. A much smarter HTML source editor would also do the same, so I'll reopen and future this so we remember to address this issue if we do improve HTML Source.
Status: VERIFIED → UNCONFIRMED
Resolution: WONTFIX → ---
reassign to me and future this.
Assignee: syd → cmanske
Depends on: 69329
Summary: "~" character in Composer converting to "%7E" in href attribute → Unescape characters when editing HTML Source
Target Milestone: --- → Future
This concept frightens me. Escaping/Unescaping of characters can be lossy if you don't do it in the right way and under appropriate circumstances. If we are talking about html source, I think we should be showing the actual source that we are going to save or publish (and not something that is unescaped). I think this bug should be WONTFIX.
Hi, I'm sorry, but I don't see what the problem is here. HTML files with ~user work fine in Mozilla and every other browser I have ever used. I can edit files with ~user fine, but when I save, they are converted to %7euser. When I click on the URLs, the address displays as %7e and then converts to ~ one second later. Mozilla is "corrupting" my HTML files, which makes editing them by hand difficult. See my bug 146615 for further explanation. JG
Here are some relevant URLs:
http://www.w3.org/TR/html4/types.html#h-6.4
http://ietf.org/rfc/rfc2396.txt?number=2396 (see section 2.3)
http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars
Note that we do the escaping to deal with this particular issue:
http://www.w3.org/TR/html4/appendix/notes.html#ampersands-in-uris
Any "fixes" desired should not regress the above.
I may be missing something, but I'm not seeing anything in any of those references that says that the tilde character (ASCII 126, or 0x7E) isn't a legal character in a URL. And certainly it's a very common one -- many ISPs (roughly half of the ones I've used) use the http://site/~user form for home pages, for users who don't have their own domain. I agree that we should continue to escape ampersands and non-ASCII characters if we encounter them, but tilde seems to be what this bug is about (clarifying summary and confirming).
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Unescape characters when editing HTML Source → "~" character converting to "%7E" in href attribute
A little off topic, but this may help somebody who comes across this bug when using the Mozilla editor to edit files that work with PHP. Composer will encode the HTML <a href="testing.html?foo=1&bar=2"> as <a href="testing.html?foo=1&amp;bar=2">. This will cause a default PHP installation to return an association of 'amp;bar' => 2. This can be fixed in PHP by changing the php.ini setting: arg_separator.input = ";&"
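To make the ampersand case concrete, here is a minimal Python sketch (not Composer or PHP code) of the two ways that href can be read. It assumes a server that splits the query string on "&" only; `href_in_source`, `href_as_requested`, `params` and `raw_params` are names made up for this illustration. An HTML parser decodes the character reference before the URL is used, so the 'amp;bar' key should only appear when the raw source text is used as the URL without decoding.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit

# Attribute value as it appears in the saved HTML source.
href_in_source = "testing.html?foo=1&amp;bar=2"

# An HTML parser decodes character references in attribute values,
# so the URL that is actually requested contains a bare "&".
href_as_requested = unescape(href_in_source)

# Query parameters when the entity has been decoded first.
params = parse_qs(urlsplit(href_as_requested).query, separator="&")

# Query parameters when the raw source text is used as the URL.
raw_params = parse_qs(urlsplit(href_in_source).query, separator="&")

print(href_as_requested)  # testing.html?foo=1&bar=2
print(params)             # {'foo': ['1'], 'bar': ['2']}
print(raw_params)         # {'foo': ['1'], 'amp;bar': ['2']}
```

The explicit `separator="&"` argument needs a Python release that supports it (3.9.2 or later, or a patched earlier release).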
Removing dependency on bug 69329 since that makes no sense to me. HTML Source view doesn't need to be used to encounter this bug.
No longer depends on: 69329
BTW, <a href="testing.html?foo=1&bar=2"> is invalid HTML.
*** Bug 208599 has been marked as a duplicate of this bug. ***
208559 was my dupe, sorry about that. However, I'm still puzzled. Yes, Composer should escape illegal characters in URLs. However, I have NEVER heard that ~ is an illegal character. It has been used for years to represent a user directory, and the Apache documentation directs you to use this syntax, as it's the default. To put it simply:
Illegal chars should be converted
Legal chars should not be converted
Unless someone can show that ~ is an illegal URL character, it should not get converted. (As a side note, ? is a legal character as well... look at the URL for this bug.)
This is an annoyance, and confusing to viewers of pages created in Composer. It should be fixed.
1. Nobody has cited any evidence that tilde is an invalid character in a URL. Just like underscore or hyphen, it's perfectly legal!
2. We don't escape it in page text, why do we in links?
3. We don't escape it in the status bar when you mouse over the link.
4. It's confusing to the user when they click the link and the %7E shows up in the URL bar.
5. This should be a very easy thing to fix.
6. Stop trying to tell me how to write or format my HTML!
CCing glazman@netscape.com since he's doing a lot of work on Composer
(In reply to comment #18)
> 1. Nobody has cited any evidence that tilde is an invalid character in a URL.
> Just like underscore or hyphen, it's perfectly legal!
http://www.ietf.org/rfc/rfc1738.txt
Section 2.2
Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".
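An escaper that follows the RFC 1738 list quoted above would produce exactly the output in the original report. The following is a rough Python sketch, not Composer's actual code: `escape_unsafe` and `RFC1738_UNSAFE_QUOTED` are hypothetical names, and only the nine characters from the quoted sentence are handled.

```python
# The characters named as "unsafe" in the RFC 1738 sentence quoted above.
RFC1738_UNSAFE_QUOTED = set('{}|\\^~[]`')


def escape_unsafe(url: str) -> str:
    """Percent-encode only the characters in RFC1738_UNSAFE_QUOTED."""
    return "".join(
        f"%{ord(ch):02X}" if ch in RFC1738_UNSAFE_QUOTED else ch
        for ch in url
    )


entered = "http://www.netaxs.com/~youtim73/powered_by_redhat.gif"
print(escape_unsafe(entered))
# http://www.netaxs.com/%7Eyoutim73/powered_by_redhat.gif
```

Hyphen and underscore pass through untouched here, which is why only the tilde stands out in the saved source.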
Product: Browser → Seamonkey
{ For example, "%7e" is sometimes used instead of "~" in an http URL path, but the two are equivalent for an http URL. } http://www.ietf.org/rfc/rfc2396.txt, Section 2.4.2
*** Bug 360410 has been marked as a duplicate of this bug. ***
{
Characters other than those in the "reserved" and "unsafe" sets (see RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding. For example, the following three URIs are equivalent:
http://abc.com:80/~smith/home.html
http://ABC.com/%7Esmith/home.html
http://ABC.com:/%7esmith/home.html
}
RFC 2616, section 3.2.3, June 1999
http://www.ietf.org/rfc/rfc2616.txt
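The three URIs from that RFC 2616 example can be checked with a short Python sketch. `comparison_key` is a hypothetical helper written for this illustration, and it simplifies in two ways that are assumptions rather than the RFC's exact rule: it ignores userinfo and IPv6 hosts, and it decodes every escape inside a path segment, which is broader than the quoted rule (an escaped reserved character would also compare equal to its literal form).

```python
from urllib.parse import unquote, urlsplit


def comparison_key(uri: str):
    """Build a tuple that is equal for URIs this sketch treats as equivalent."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    # Naive host/port split: no userinfo or IPv6 literal support.
    host, _, port = parts.netloc.partition(":")
    # An empty or missing port means the default port, 80 for http.
    if scheme == "http" and port == "":
        port = "80"
    # Split on "/" before decoding so an escaped slash never becomes a separator.
    # An empty path is treated like "/".
    segments = tuple(unquote(seg) for seg in (parts.path or "/").split("/"))
    # Host names compare case-insensitively; the query is compared as written.
    return (scheme, host.lower(), port, segments, parts.query)


uris = [
    "http://abc.com:80/~smith/home.html",
    "http://ABC.com/%7Esmith/home.html",
    "http://ABC.com:/%7esmith/home.html",
]
keys = {comparison_key(u) for u in uris}
print(len(keys))  # 1
```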
I can't disagree with what the specs say. The thing we have to remember is that Composer and the browser need to be useful and clear to a user. The browser is compatible with '~' (tilde) being used, so why not make an exception to the specs in the interest of simplicity and usability? FYI: Entering http://abc.go.com/~smith/home.html in Firefox 1.5.0.7 on Kubuntu 6.06 shows the URL with the '~' tilde intact. So we just need other Mozilla components to follow this. Cheers Jon
(In reply to comment #21)
> *** Bug 360410 has been marked as a duplicate of this bug. ***
The real problem for me was this: if I cut and paste the link with "%7E" in it, open the page from the browser, edit the same page in Composer, and then "publish" it, the URL path is no longer recognized. If "~" and "%7E" are truly equivalent, shouldn't there at least be a reverse conversion, so that the difference is handled when publishing the link back to its original site? Thanks.
For fans of this bug, I may have solid and decisive info.
RFC 3986
http://www.apps.ietf.org/rfc/rfc3986.html
Updates: 1738
Obsoletes: 2732, 2396, 1808
Category: Standards Track
It is written by the Web's creator, T. Berners-Lee (W3C/MIT), and is rather recent (January 2005). Section 2.3
http://www.apps.ietf.org/rfc/rfc3986.html#sec-2.3
applies to this bug perfectly:
{
2.3 Unreserved Characters
Characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource. However, URI comparison implementations do not always perform normalization prior to comparison (see Section 6). For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers.
(...)
2.4 When to Encode or Decode
(...)
When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters. The only exception is for percent-encoded octets corresponding to characters in the unreserved set, which can be decoded at any time. For example, the octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation.
}
So Jonathan Stewart was right when saying
> Illegal chars should be converted
> Legal chars should not be converted
> Unless someone can show that ~ is an illegal URL character, it
> should not get converted.
and Jim Booth is also right when saying
> Nobody has cited any evidence that tilde is an invalid character in a URL.
> Just like underscore or hyphen, it's perfectly legal!
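The normalizer behaviour described in the quoted section 2.3 (decode percent-encoded unreserved characters, leave everything else alone) is small enough to sketch in Python. This is an illustration under stated assumptions, not a patch for Composer: `decode_unreserved`, `UNRESERVED` and `_PCT` are names invented here.

```python
import re
import string

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"   (RFC 3986, section 2.3)
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# One percent-encoded octet: "%" followed by two hex digits.
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri: str) -> str:
    """Decode escapes of unreserved characters; keep all other escapes as written."""
    def replace(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else match.group(0)

    # A single pass, so "%2541" stays "%2541" and is never decoded twice.
    return _PCT.sub(replace, uri)


print(decode_unreserved("http://www.netaxs.com/%7Eyoutim73/powered_by_redhat.gif"))
# http://www.netaxs.com/~youtim73/powered_by_redhat.gif
print(decode_unreserved("testing.html?q=a%26b%2Fc"))
# testing.html?q=a%26b%2Fc
```

Because only unreserved characters are decoded, escapes such as %26 and %2F that stand for delimiters survive, which is the lossiness concern raised earlier in this bug.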
Assignee: cmanske → nobody
QA Contact: sujay → composer
Target Milestone: Future → ---
Bug bounty, £250 to get this fix into Firefox for next Ubuntu release. Other bug bounties for Firefox here: http://jguk.org/2009/09/firefox-fixfox-bounties-for-scaled.html
This is also relevant for Firefox, not only SeaMonkey. Could someone add Firefox as a product?