User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
Quick Search does not URL-encode plus signs (+).
Searching for "c++" (without quotes), the %s (replaced portion of the URL) appears in the URL as:
The encoding (++) being decoded by the server as two spaces.
Steps to Reproduce:
1) Create a bookmark with the address:
2) Assign a keyword to the bookmark, i.e: 'g' (without quotes).
3) Type "g c++" (without quotes) in the address bar and press Enter.
The plus signs (+) do not get URL-encoded.
+ should be encoded as %2B
This may be a regression as it is reported as bug 319169 and bug 316863 (and an number of others mentioned therein). The status of these bug reports have been set to "Resolved".
But this feature has never worked for me since Firefox 1.0 and (mostly) automatic update versions up to 2.0.
This is a fresh install of Firefox 2.0 (no previous version) on a fresh install of Windows.
Just a confirmation: this bug is also present on (a clean install of) Ubuntu Linux (Edgy Eft version: 6.10).
Firefox version string:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1) Gecko/20060601 Firefox/2.0 (Ubuntu-edgy)
Since this bug was reported as "Resolved" on multiple occasions, I'm starting to think that I'm making some sort of naive mistake - like an about:config setting.
Confirming on Windows trunk (I don't have a 2.0 build ATM).
Interestingly, this seems fixed for keyword.URL (I use a standard Google Search, and the +s get escaped properly), but not for bookmark keywords. I could probably figure this out if I had a lot of time to look at Firefox stuff lately.
People interested in fixing this should look at https://bugzilla.mozilla.org/attachment.cgi?id=226851 , specifically the very last change, which escapes using XPAlphas. My suspicion is that somehow, the keyword.URL mangling is making it to this line while the bookmark keyword mangling is not, and probably should be. If these are made to go through the same code path, this should all start working properly.
I don't know why I marked bug 316863 fixed when it's clearly not (I suspect I tested with keyword.URL and not a bookmark keyword), but now that we have this bug, it's a bit pointless to reopen. I will remove the "regression" keyword since this isn't a regression, it's just something that wasn't ever properly fixed when it should have been.
Resummarizing for clarity.
If anyone has time for this, the patch might be safe enough for a 2.x release, and would be nice-to-have.
The patch for bug 258223 made us use escape() if there is a specified character set, and encodeURIComponent if there isn't. I'm not sure why that discrepancy was introduced, but it would explain the "+" signs not being escaped (escape() doesn't escape them, encodeURIComponent() does).
*** Bug 370067 has been marked as a duplicate of this bug. ***
*** Bug 373754 has been marked as a duplicate of this bug. ***
*** Bug 441201 has been marked as a duplicate of this bug. ***
*** Bug 482593 has been marked as a duplicate of this bug. ***
Not blocking, but cc'ing Mardak who, I think, dealt with escaping/unescaping stuff in the past.
WFM. There's even a testcase for it! ;)
This bug was originally filed for Firefox 2... I checked on Firefox 3, Shiretoko, and Minefield and all work.
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090311 Minefield/3.2a1pre
I still can't do a keyword search for c++. At least, it goes to http://www.google.nl/search?hl=nl&q=c++&meta=
On trunk, it shows you the query as an option in the autocomplete. What you see there is unescaped. If you select the entry you'll see that it's %2B. Even if I don't select the entry, I see the resulting page go to %2B%2B.
I can't reproduce the fix. I tried a brand new profile and the following STR:
- go to www.google.com
- right click on the input field -> Add a Keyword for this Search
- give Name Google and Keyword g
- type g c++
it goes to http://www.google.nl/search?hl=nl&q=c++&meta=
But I see that it works through the Search Engine manager.
I filed a duplicate bug about this yesterday from FF 3.0.6 (sorry about that; I swear I spent a fair amount of time looking), and I also can still reproduce the bug, to the point of being able to do it after downloading and running the latest nightly from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-3.2a1pre.en-US.linux-i686.tar.bz2.
The steps Ria Klaassen posted still reproduce the bug for me.
"But I see that it works through the Search Engine manager."
I didn't know you could set keyword searches that way; thanks, that will work as a workaround. It'd be nice to fix for arbitrary searches though.
This is very strange.. While the autocomplete entry works fine, if you just hit enter without selecting anything, it *sometimes* doesn't convert + to %2B.
When I tried it on my profile, all the keywords were correctly converting + to %2B. But when I just tried adding a new keyword using the "Add a Keyword for this Search", I run into problems. But, if I manually create a bookmark, it changes + to %2B.
Some digging around in my places.sqlite, I noticed the ones that *were* working correctly (+ -> %2B) had *no* moz_anno with that bookmark's place_id.
The ones that were created through "Add a Keyword for this Search" do have an annotation and some reason prevents + from changing to %2B. ??
So as another way to work around this issue...
1) Perform the search you want
2) Copy the url from the resulting page
3) Create a bookmark with that url replacing your search term with %s
Presumably the difference is which of the two branches end up being taken in getShortcutOrURI. escape() doesn't escape "+", but encodeURIComponent does.
that is probably the charset anno, if a charset is known for the page we choose another way there
and when we add a keyword we save the page charset iirc
(In reply to comment #18)
> Presumably the difference is which of the two branches end up being taken in
> getShortcutOrURI. escape() doesn't escape "+", but encodeURIComponent does.
I think this also happens to bookmarklets; that would explain why when I tried to generate an is.gd URL from an is.gd bookmarklet, the "+" didn't go through.
*** Bug 503089 has been marked as a duplicate of this bug. ***
I quickly tested the problem on a mac today at a friends place. Atleast then it seemed that it worked properly on a mac, but would like to have someone confirm it.
As far as I've used this feature, this bug has been fixed for me for a while. Now using Firefox 3.5.2 on Windows. Though I'd read that for some this bug was resolved for a long time (on what might've been even the 1+ series).
slavkop: Comment #16 and #17 explain what works and what doesn't; near as I can tell those comments are still accurate.
If you manually create a bookmark, substituting %s in the appropriate place, then there is no bug. (This is what your original bug report did.) However, if you use the "add keyword for this search" feature (right click on, say, Google's text box), the bug still exists in FF 3.5.3/Win.
(In reply to comment #4)
> The patch for bug 258223 made us use escape() if there is a specified character
> set, and encodeURIComponent if there isn't. I'm not sure why that discrepancy
> was introduced, but it would explain the "+" signs not being escaped (escape()
> doesn't escape them, encodeURIComponent() does).
From what I can tell, its because encodeURIComponent() always returns UTF-8. Although, the result of encodeURIComponent() can just be converted to the relevant character set, and all the problems should just magically go away.
Created attachment 405365 [details] [diff] [review]
That patch is equivalent to just removing the escape() call. I don't understand how that matches your comment, or why it would be OK.
Created attachment 405404 [details] [diff] [review]
(In reply to comment #28)
> That patch is equivalent to just removing the escape() call. I don't understand
> how that matches your comment, or why it would be OK.
Whoops... Don't know how I did that.
+ encodedParam = convertFromUnicode(charset, param);
+ encodedParam = convertFromUnicode(charset, encodedParam);
This patch should be what I intend.
*** Bug 528876 has been marked as a duplicate of this bug. ***
*** Bug 548660 has been marked as a duplicate of this bug. ***
*** Bug 558605 has been marked as a duplicate of this bug. ***
*** Bug 480727 has been marked as a duplicate of this bug. ***
*** Bug 613917 has been marked as a duplicate of this bug. ***
*** Bug 631010 has been marked as a duplicate of this bug. ***
*** Bug 631301 has been marked as a duplicate of this bug. ***
Created attachment 509929 [details] [diff] [review]
The previously proposed patch in attachment 405404 [details] [diff] [review] above does not handle other charsets correctly. The resulting string is the uri-encoded utf-8 version of the string. Thus, the string is using %escapes to represent the utf-8 octets in ascii, and a charset encoding of that would not produce the correct result. For example, consider the swedish character å (å) and a latin-1 charset:
escape(convertFromUnicode("ISO-8859-1", "å")) /* => "%E5" (correct) */
convertFromUnicode("ISO-8859-1", encodeURIComponent("å")) /* => "%C3%A5" (incorrect) */
I have attached another patch (untested) that changes the old behavior in two ways:
1. If the charset is "UTF-8", the encodeURIComponent code path is taken. This should be the appropriate solution for any site that accepts "UTF-8" charsets.
2. For another charset, escape is used first, and then any remaining +/@ are escaped using encodeURIComponent. This should maintain correct handling of other charsets, while ensuring that the +/@ characters are correctly escaped.
So you said that using another encoding takes another path through the code that can encode the plus sign, while the default path does not. Wouldn't that mean that if I specify an encoding, the plus sign would be encoded correctly already? I tried with this URL:
But it still didn't work. The plus sign is *never* encoded correctly. What's going on here, has somebody not understood URL encoding? In a web browser?!
(In reply to comment #38)
> So you said that using another encoding takes another path through the code
> that can encode the plus sign, while the default path does not. Wouldn't that
> mean that if I specify an encoding, the plus sign would be encoded correctly
The current code will work perfectly for hosts that expect UTF-8, if there is _no_ charset specified. However, if you specify the charset to anything (even UTF-8), the symbols +/@ will not be escaped. Note that specifying the charset can be done explicitly through the mozcharset parameter, or it can be fetched implicitly from the places database. Unfortunately, I don't know of any way of checking the places database except manually running sqlite3 $profile/places.sqlite on the command line, and then issuing lengthy sql queries.
The first bullet I mentioned in comment #37, was that the patch in attachment 509929 [details] [diff] [review] modifies the if-condition to handle the case where the charset is specified as UTF-8 as if there was no charset specified at all. This would solve all issues with the google queries, since google expects UTF-8 in its input.
> But it still didn't work. The plus sign is *never* encoded correctly. What's
> going on here, has somebody not understood URL encoding? In a web browser?!
Umm, I didn't really understand that, but it doesn't work for me where you say it should. Probably my other bug 631301 is about something else then and not a duplicate of this one.
(In reply to comment #40)
> Umm, I didn't really understand that, but it doesn't work for me where you say
> it should. Probably my other bug 631301 is about something else then and not a
> duplicate of this one.
Did you actually try the patch in attachment 509929 [details] [diff] [review]?
Can you provide me a build with that patch?
Created attachment 510150 [details] [diff] [review]
Updated the patch by extending the browser_getshortcutoruri.js test with a test that fails for the original code, and passes with the patch applied. I've also tested that "Add Keyword for this Search..." works on google.com, with the patch applied.
Hey Gustav, sorry for the review delay here - saw your ping on IRC. Patch looks promising from a quick glance, but I'll need to spend a bit more time thinking about this to review, and Firefox 4 blockers are keeping me busy at the moment. Don't hesitate to poke me again if I'm taking too much time.
Comment on attachment 405404 [details] [diff] [review]
As Gustav pointed out, convertFromUnicode does not unescape, so this won't work.
Comment on attachment 510150 [details] [diff] [review]
I thought that we could perhaps make use of escape's secret "mask" argument that allows you to control which characters get escaped (see http://hg.mozilla.org/mozilla-central/annotate/52b6489a3140/js/src/jsstr.cpp#l523 ). But that doesn't quite allow us to get the behaviour that we want, and isn't part of any spec, and isn't documented, so I guess that won't work :)
pkasting's hunch in comment 2 was on the right track - we want to try to match the keyword and form submission behaviour. nsFSURLEncoded::URLEncode and nsDefaultURIFixup::KeywordToURI use nsEscape() with url_XPAlphas, and it differs from escape() in that it additionally encodes "+@/". So this patch is correct. The test is also great. I have only one comment: it would be good to explicitly mention in a comment that we're trying to match the behaviour of the platform code (nsEscape() with url_XPAlphas), in case it should ever change for whatever reason.
Thanks for the great work here, Gustav, and I'm sorry that it took so long to get responses. I'm glad you were patient enough to persevere.
Created attachment 528951 [details] [diff] [review]
Here's a slightly modified patch that tweaks the comment as I suggested, ready for check-in.
I'm trying to verify this fix but I'm not able to reproduce the non-encoded issue even in Firefox 4. I followed the steps from comment 0 but not sure what I miss to make this issue appear. Gavin, can you help?
For some reason, Firefox 4 no longer (always) assumes that the destination encoding matches the encoding on the search page and therefore does not save the encoding when creating a new quicksearch.
The result, in any case, is that no encoding is stored for the quicksearch, and thus the faulty code path is not executed unless provoked. The easiest way to provoke it is to add a &mozcharset= to the end of the URL, like this:
1. Show all Bookmarks
2. Create new bookmark with
- location: http://search.hp.com/query.html?charset=iso-8859-1&qt=%s&mozcharset=ISO-8859-1
- keyword: test
3. Enter: "hp c++" into the location bar and observe that it only searches for c
When creating a new quicksearch from the search box on the resulting page, it sometimes seems to get stored without encoding and sometimes with. When it is stored without encoding then "+" works correctly, but for instance swedish letters "åäö" are wrongly turned to UTF-8. When encoding is stored correctly, then the letters "åäö" are handled correctly, but the + signs disappear.
Thanks Gustav. That information helped me a lot to verify this fix on mozilla-central with Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:6.0a1) Gecko/20110501 Firefox/6.0a1