Closed Bug 434211 Opened 16 years ago Closed 8 years ago

sites not supporting URL containing escaped apostrophe

Categories

(Core :: Networking, defect)

defect
Not set
major

RESOLVED WONTFIX

People

(Reporter: jeroen, Unassigned)

Details

(Keywords: regression)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008051206 Firefox/3.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008051206 Firefox/3.0

Page contains a link to another page on the same site with an apostrophe in the URL: http://www.vrijparkeren.nl/rubriek/casino's.html

The apostrophe is a legal character here, but in the GET request that FF sends out, it replaces the apostrophe character with %27.

On this site that causes the intended page not to appear: Apache (with Joomla CMS) returns a 404 (Page not found).

This is a regression as earlier released versions (FF 1 and 2) retrieve the page
correctly, as do MSIE 6 and 7. FF should not %-encode URLs where it's not needed (i.e. for characters that are legal).



Reproducible: Always

Steps to Reproduce:
1. Go to http://www.vrijparkeren.nl
2. Click on the link Casino's in either the top or left navigation bar

Actual Results:  
GET /rubriek/casino%27s.html HTTP/1.1

Expected Results:  
GET /rubriek/casino's.html HTTP/1.1

RFC 1738 (ftp://ftp.isi.edu/in-notes/rfc1738.txt), section 2.2 has the
text:

   Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
   reserved characters used for their reserved purposes may be used
   unencoded within a URL.


And RFC 2396 (ftp://ftp.isi.edu/in-notes/rfc2396.txt) says:

2.3. Unreserved Characters

   Data characters that are allowed in a URI but do not have a reserved
   purpose are called unreserved.  These include upper and lower case
   letters, decimal digits, and a limited set of punctuation marks and
   symbols.

      unreserved  = alphanum | mark

      mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

   Unreserved characters can be escaped without changing the semantics
   of the URI, but this should not be done unless the URI is being used
   in a context that does not allow the unescaped character to appear.
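
To make the equivalence concrete, here is a minimal Python sketch (an illustration only, not Firefox's code) using the standard `urllib.parse` module. It shows the same path with and without the apostrophe escaped, and that both forms decode to the same string, which is what a server is expected to do before matching the path.

```python
from urllib.parse import quote, unquote

path = "/rubriek/casino's.html"

# quote() escapes the apostrophe unless it is listed in `safe`.
escaped = quote(path)               # "/" is safe by default
unescaped = quote(path, safe="/'")  # additionally treat "'" as safe

print(escaped)    # /rubriek/casino%27s.html
print(unescaped)  # /rubriek/casino's.html

# Both request forms decode to the same path.
assert unquote(escaped) == unquote(unescaped) == path
```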
Flags: blocking-firefox3?
Keywords: regression
Version: unspecified → 3.0 Branch
Would be great to get a regression date here so we can figure out why this was changed.

It does sound like the last paragraph says that we are allowed to do this per spec though. But if it does indeed break servers then we should of course avoid doing so.
Looks like this change was made in bug 376844
Blocks: 376844
Yeah, the change was intentional, and I don't think it's likely to change back.  It's a little more pain to deal with, but realistically, you should have been dealing with it already, and there's some win in helping to avoid certain classes of security exploits.  I'd say "sorry, it sucks, but there's an overall win due to safety gains" and WONTFIX.

Is this an Apache bug or a Joomla! bug?  I'm hoping for the latter since that's probably easier to fix, but either way we should figure out where the bug lies and evangelise to get a fix upstream, since this is a spec bug on the part of one or the other (or something else) on your server.
Apache is certainly not the issue here. Looking at Joomla, it looks like it is doing urldecode correctly (at least I see that in JURI::getInstance) but with an extensible application like this it is hard to tell for sure. My guess would be some Joomla extension failing to process the input correctly. Plain Joomla doesn't seem to use human-readable URLs, so one of the extensions on http://extensions.joomla.org/index.php?option=com_mtree&task=listcats&cat_id=1803&Itemid=35 must be in use here.
Checked out sh404SEF, it always decodes the URL path as of version 1.3.1. So much for guessing, will wait for Jeroen to give us any details on the Joomla extension used.
Yes, it is a bug in a Joomla extension used on the site (SEF advance in this case). I've fixed it on the site now, so it will no longer return a 404.

But here's the tricky part: this change may affect a lot of PHP sites out there. For the reason, look at this excerpt from the PHP documentation:

http://www.php.net/addslashes

The PHP directive  magic_quotes_gpc is on by default, and it essentially runs addslashes() on all GET, POST, and COOKIE data. Do not use addslashes() on strings that have already been escaped with magic_quotes_gpc as you'll then do double escaping. The function get_magic_quotes_gpc() may come in handy for checking this.

So a lot of PHP code will do something like this:

$string = urldecode($string);
if (!get_magic_quotes_gpc()) {
    $string = addslashes($string);
}

Now if the URL contains casino's this will result in casino\'s but if the URL contains casino%27s it will become casino's.
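
The difference can be modelled in a few lines. This is a rough Python simulation, not PHP itself: `addslashes` and `handle` are hypothetical stand-ins, and the sketch assumes the magic-quotes step sees the value while it is still percent-encoded, which is the scenario described above.

```python
from urllib.parse import unquote


def addslashes(s):
    """Hypothetical stand-in for PHP's addslashes(): escape ', ", \\ and NUL."""
    out = []
    for ch in s:
        if ch in ("'", '"', "\\"):
            out.append("\\" + ch)
        elif ch == "\0":
            out.append("\\0")
        else:
            out.append(ch)
    return "".join(out)


def handle(raw, magic_quotes_on=True):
    """Model of the snippet above, under the stated assumption."""
    # Assumption: the automatic quoting runs on the still-encoded value.
    s = addslashes(raw) if magic_quotes_on else raw
    s = unquote(s)                 # urldecode($string)
    if not magic_quotes_on:        # if (!get_magic_quotes_gpc())
        s = addslashes(s)
    return s


print(handle("casino's"))    # casino\'s  (apostrophe escaped)
print(handle("casino%27s"))  # casino's   (apostrophe left unescaped)
```

Under this model, decoding first and escaping afterwards (the `magic_quotes_on=False` path) gives the same escaped result for both forms of the URL.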

Of course we can argue that the PHP applications should enhance their logic, but please note that

- RFC 2396 says about FF's changed behavior "this should not be done unless the URI is being used in a context that does not allow the unescaped character to appear"
- "should" in an RFC usually means that "there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course."
- in the example given above this causes an apostrophe to end up unquoted in something that's to be used in a database query

So I fail to see how the changed behavior would make things more secure? It may be PHP's problem and PHP isn't my language of choice either, but I do think the changed behavior is a regression instead of a security improvement. What exactly is the safety gain here?
Component: General → Networking
Flags: blocking-firefox3?
Product: Firefox → Core
QA Contact: general → networking
Version: 3.0 Branch → Trunk
Summary: Incorrect GET request sent for URL containing apostrophe with FF3.0 RC1 → sites not supporting URL containing escaped apostrophe
OS: Windows XP → All
Hardware: PC → All
Respectfully, this is not the bug page for Joomla or PHP.

This is the bug page for firefox.

Did you want to make firefox 3 incompatible with a large section of the internet as a new feature? If not, then please reconsider restoring functionality supported by previous versions and by IE6 and IE7.

Yes, web sites have to handle %27. But there is nothing in the spec that says they may not handle it by redirecting to the corrected page name, which uses the apostrophe. This is in fact what I have done in a server I wrote, and my server is fully compliant with RFC 2396 (ftp://ftp.isi.edu/in-notes/rfc2396.txt).

For exactly the reasons you have quoted, I am unwilling to alter my web server code: It works, and is RFC compliant, and altering it may introduce new bugs.

However, in my case, the assertion is factual and reasonable. In yours, it is of dubious factual basis, and is not reasonable: you do not even know what was the cause of the change in behaviour, which is in fact a reason for you to audit this code.

I believe my stance is the correct one: while it should be acceptable for firefox to request a page using %27, eg if the link includes this string, it should ALSO ACCORDING TO THE SAME RFC be acceptable for it to request a page including an apostrophe in the uri, eg if the link or a redirect includes that character.

The safest behaviour is the often simplest. The simplest behaviour for firefox in requesting a uri, should be to request it *as parsed*, without change. If given an instruction to fetch http://test.com/this'page it should send just that in the request. If instructed to fetch http://test.com/this%27page it should send just that.

For my web server, it is desirable that the more human-readable form is used in uris that are human readable, ie the one that the user arrives at. For this reason I redirect the user to the form which uses the LEGAL apostrophe character.

I assume my argument is the same one that would be put by authors of many web applications.

I further assume that there is a vast, huge, massive resource of existing internet pages which use the apostrophe, and which you are saying should not be viewed by firefox users.

Thanks in advance for restoring the correct function of firefox, ie that which has historically been supported and is supported by other browsers, and is legal under the appropriate RFC.
a similar bug report (370579) has been filed, perhaps you can view that as 'confirmation' of the existence and relevance of this bug report? https://bugzilla.mozilla.org/show_bug.cgi?id=370579
btw perhaps i should describe for you the current behaviour of firefox when this error is encountered with my web app in particular, there is an infinite redirect loop, because my app redirects viewers to a legal uri, then firefox transforms that uri, and tries to load it ... web app alters it, and redirects ff there, ff transforms and tries to load ....... the user sees a long pause, followed by an unfamiliar error about too many redirects ... hardly appealing. If they try in a different browser, the page works, they imagine ff must be dodgy.
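
The loop described above can be reproduced with a toy model. This is a sketch under assumptions, not the actual server or Firefox code: `server`, `client_fetch` and the `MAX_REDIRECTS` value are hypothetical names and numbers chosen for illustration.

```python
MAX_REDIRECTS = 20  # assumed redirect limit, illustrative only


def server(path):
    """Hypothetical server: redirect any path containing %27 to the
    form with a literal apostrophe, otherwise serve the page."""
    if "%27" in path:
        return 302, path.replace("%27", "'")
    return 200, path


def client_fetch(path, escape_apostrophe):
    """Hypothetical client: optionally rewrites ' to %27 before each request."""
    for hops in range(MAX_REDIRECTS + 1):
        wire = path.replace("'", "%27") if escape_apostrophe else path
        status, location = server(wire)
        if status == 200:
            return "ok", hops
        path = location  # follow the redirect
    return "too many redirects", MAX_REDIRECTS


print(client_fetch("/this'page", escape_apostrophe=True))     # ('too many redirects', 20)
print(client_fetch("/this'page", escape_apostrophe=False))    # ('ok', 0)
print(client_fetch("/this%27page", escape_apostrophe=False))  # ('ok', 1)
```

A client that sends the URL as given terminates after at most one redirect; a client that always re-escapes the apostrophe never reaches the 200 response.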
(In reply to comment #8)
OK that was my mistake .. bug report related to ff 1.5
back to the bug ... at the very least, firefox should display in the url bar what it has requested. it doesn't. the url bar displays the apostrophe character, while firefox silently transforms this into the %27 and requests that instead ... this information is visible in the debug stream that goes to console, when i cancel the looping redirect, as well as on my web server debug stream.

ie, if FF is asked to go to either http://example.com/this%27page or http://example.com/this'page, it will in fact request http://example.com/this%27page in both cases, while in BOTH cases it will tell the user that it is asking for http://example.com/this'page ... ie it lies to the user, as well as the problematic req behaviour.
here there are duplicate bugs: https://bugzilla.mozilla.org/show_bug.cgi?id=407172 ... reported three years ago in the beta ...
The w3c XML 1.1 spec reiterates what the RFC (http://tools.ietf.org/html/rfc3986#section-2.2) says:

"Since escaping is not always a fully reversible process, it MUST be performed only when absolutely necessary and as late as possible in a processing chain." -- http://www.w3.org/TR/xml11/#sec-external-ent

"Escaping apostrophes is hard" doesn't seem to qualify as 'absolutely necessary'. As late as possible in the chain would be on the destination web server, not on the requesting client.
Depends on: 407172
Oh come on, this is ridiculous! You're hectically policing the internet! 'Sir or madam, please step aside and upgrade your web-app to the latest Mozilla open-RFC standard, we have critical infrastructure changes to make for you' won't rub for multimillion dollar bespoke corporate solutions, but also ... people put code on read-only media, have you ever noticed? It's no good complaining that oh, upstream have done it again (done what? failed to upgrade to our new anti-standard!) when upstream is a multimedia title already distributed to libraries and customers on CDROM.

Nor are complaints that 'Firefox's URI implementation is based on RFC 2396, not 3986 yet' ( https://bugzilla.mozilla.org/show_bug.cgi?id=407172#c32 ) convincing, when in fact Firefox's URI implementation WAS correct, as is the behaviour of other browsers, AND RFC 2396 strongly hinted that the newer behaviour was not correct.

Having broken the code you received like a hand-me-down sweater, or the keys to Dad's car, or perhaps some inheritance from a rich Uncle, you show your immaturity with the flippant declaration that 'user must upgrade within 10 seconds to comply'. You meddle with you know not what powers, oh Mozillascape scriptkiddies...

How hard can it be to undo this change? Sheesh, am I talking to myself?
Status: UNCONFIRMED → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX