Last Comment Bug 359809 - Bookmark Keyword searches are not encoded properly (though keyword.URL is)
: Bookmark Keyword searches are not encoded properly (though keyword.URL is)
Status: VERIFIED FIXED
:
Product: Firefox
Classification: Client Software
Component: Location Bar (show other bugs)
: 2.0 Branch
: All All
: -- normal with 5 votes (vote)
: Firefox 6
Assigned To: Gustav Munkby
:
Mentors:
http://bonsai.mozilla.org/cvsblame.cg...
: 370067 373754 441201 480727 482593 503089 528876 548660 558605 613917 631010 631301 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-11-07 07:57 PST by slavkop
Modified: 2011-05-11 04:19 PDT (History)
24 users (show)
mbeltzner: blocking‑firefox3.5-
mbeltzner: blocking‑firefox3.6-
mbeltzner: wanted‑firefox3.6+
mak77: in‑testsuite+
hskupin: in‑litmus-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch v1 (1.11 KB, patch)
2009-10-08 17:07 PDT, Blair McBride [:Unfocused] (mostly unavailable, needinfo open, reviews not)
no flags Details | Diff | Review
Patch v1.1 (1.12 KB, patch)
2009-10-08 21:48 PDT, Blair McBride [:Unfocused] (mostly unavailable, needinfo open, reviews not)
gavin.sharp: review-
Details | Diff | Review
Patch 2 (790 bytes, patch)
2011-02-04 15:54 PST, Gustav Munkby
no flags Details | Diff | Review
Patch v2.2 (1.75 KB, patch)
2011-02-06 12:35 PST, Gustav Munkby
gavin.sharp: review+
Details | Diff | Review
updated patch (2.98 KB, patch)
2011-04-28 13:57 PDT, :Gavin Sharp [email: gavin@gavinsharp.com]
gavin.sharp: review+
Details | Diff | Review

Description slavkop 2006-11-07 07:57:11 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0

Quick Search does not URL-encode plus signs (+). 

Searching for "c++" (without quotes), the %s (replaced portion of the URL) appears in the URL as:
c++
Instead of:
c%2B%2B

The encoding (++) being decoded by the server as two spaces. 


Reproducible: Always

Steps to Reproduce:
1) Create a bookmark with the address:
http://www.google.com.au/search?hl=en&q=%s&btnG=Google+Search&meta=

2) Assign a keyword to the bookmark, i.e: 'g' (without quotes). 

3) Type "g c++" (without quotes) in the address bar and press Enter. 

Actual Results:  
The plus signs (+) do not get URL-encoded. 

Expected Results:  
+ should be encoded as %2B

This may be a regression as it is reported as bug 319169 and bug 316863 (and an number of others mentioned therein). The status of these bug reports have been set to "Resolved". 

But this feature has never worked for me since Firefox 1.0 and (mostly) automatic update versions up to 2.0. 

This is a fresh install of Firefox 2.0 (no previous version) on a fresh install of Windows.
Comment 1 slavkop 2006-11-09 04:40:10 PST
Just a confirmation: this bug is also present on (a clean install of) Ubuntu Linux (Edgy Eft version: 6.10). 

Firefox version string:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1) Gecko/20060601 Firefox/2.0 (Ubuntu-edgy)

Since this bug was reported as "Resolved" on multiple occasions, I'm starting to think that I'm making some sort of naive mistake - like an about:config setting. 
Comment 2 Peter Kasting 2006-11-09 10:16:28 PST
Confirming on Windows trunk (I don't have a 2.0 build ATM).

Interestingly, this seems fixed for keyword.URL (I use a standard Google Search, and the +s get escaped properly), but not for bookmark keywords.  I could probably figure this out if I had a lot of time to look at Firefox stuff lately.

People interested in fixing this should look at https://bugzilla.mozilla.org/attachment.cgi?id=226851 , specifically the very last change, which escapes using XPAlphas.  My suspicion is that somehow, the keyword.URL mangling is making it to this line while the bookmark keyword mangling is not, and probably should be.  If these are made to go through the same code path, this should all start working properly.

I don't know why I marked bug 316863 fixed when it's clearly not (I suspect I tested with keyword.URL and not a bookmark keyword), but now that we have this bug, it's a bit pointless to reopen.  I will remove the "regression" keyword since this isn't a regression, it's just something that wasn't ever properly fixed when it should have been.
Comment 3 Peter Kasting 2006-11-09 10:18:07 PST
Resummarizing for clarity.

If anyone has time for this, the patch might be safe enough for a 2.x release, and would be nice-to-have.
Comment 4 :Gavin Sharp [email: gavin@gavinsharp.com] 2006-11-09 14:16:47 PST
The patch for bug 258223 made us use escape() if there is a specified character set, and encodeURIComponent if there isn't. I'm not sure why that discrepancy was introduced, but it would explain the "+" signs not being escaped (escape() doesn't escape them, encodeURIComponent() does).
Comment 5 :Gavin Sharp [email: gavin@gavinsharp.com] 2007-02-11 11:22:14 PST
*** Bug 370067 has been marked as a duplicate of this bug. ***
Comment 6 Ria Klaassen (not reading all bugmail) 2007-03-13 05:35:20 PDT
*** Bug 373754 has been marked as a duplicate of this bug. ***
Comment 7 Uri Bernstein (Google) 2008-06-23 13:44:43 PDT
*** Bug 441201 has been marked as a duplicate of this bug. ***
Comment 8 Ria Klaassen (not reading all bugmail) 2009-03-11 02:12:48 PDT
*** Bug 482593 has been marked as a duplicate of this bug. ***
Comment 9 Mike Beltzner [:beltzner, not reading bugmail] 2009-03-11 10:49:44 PDT
Not blocking, but cc'ing Mardak who, I think, dealt with escaping/unescaping stuff in the past.
Comment 10 Ed Lee :Mardak 2009-03-11 13:13:44 PDT
WFM. There's even a testcase for it! ;)

http://hg.mozilla.org/mozilla-central/file/a15164388bf1/toolkit/components/places/tests/autocomplete/test_keyword_search.js#l37

This bug was originally filed for Firefox 2... I checked on Firefox 3, Shiretoko, and Minefield and all work.
Comment 11 Ria Klaassen (not reading all bugmail) 2009-03-11 14:17:55 PDT
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090311 Minefield/3.2a1pre

I still can't do a keyword search for c++. At least, it goes to http://www.google.nl/search?hl=nl&q=c++&meta=
Comment 12 Ed Lee :Mardak 2009-03-11 14:22:51 PDT
On trunk, it shows you the query as an option in the autocomplete. What you see there is unescaped. If you select the entry you'll see that it's %2B. Even if I don't select the entry, I see the resulting page go to %2B%2B.
Comment 13 Ria Klaassen (not reading all bugmail) 2009-03-11 14:40:18 PDT
I can't reproduce the fix. I tried a brand new profile and the following STR:

- go to www.google.com
- right click on the input field -> Add a Keyword for this Search
- give Name Google and Keyword g
- type g c++

it goes to http://www.google.nl/search?hl=nl&q=c++&meta=
Comment 14 Ria Klaassen (not reading all bugmail) 2009-03-11 14:47:21 PDT
But I see that it works through the Search Engine manager.
Comment 15 Evan 2009-03-11 15:20:36 PDT
I filed a duplicate bug about this yesterday from FF 3.0.6 (sorry about that; I swear I spent a fair amount of time looking), and I also can still reproduce the bug, to the point of being able to do it after downloading and running the latest nightly from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-3.2a1pre.en-US.linux-i686.tar.bz2.

The steps Ria Klaassen posted still reproduce the bug for me.

"But I see that it works through the Search Engine manager."

I didn't know you could set keyword searches that way; thanks, that will work as a workaround. It'd be nice to fix for arbitrary searches though.
Comment 16 Ed Lee :Mardak 2009-03-11 15:39:02 PDT
This is very strange.. While the autocomplete entry works fine, if you just hit enter without selecting anything, it *sometimes* doesn't convert + to %2B.

When I tried it on my profile, all the keywords were correctly converting + to %2B. But when I just tried adding a new keyword using the "Add a Keyword for this Search", I run into problems. But, if I manually create a bookmark, it changes + to %2B.

Some digging around in my places.sqlite, I noticed the ones that *were* working correctly (+ -> %2B) had *no* moz_anno with that bookmark's place_id.

The ones that were created through "Add a Keyword for this Search" do have an annotation and some reason prevents + from changing to %2B. ??
Comment 17 Ed Lee :Mardak 2009-03-11 15:50:31 PDT
So as another way to work around this issue...

1) Perform the search you want
2) Copy the url from the resulting page
3) Create a bookmark with that url replacing your search term with %s
Comment 18 :Gavin Sharp [email: gavin@gavinsharp.com] 2009-03-11 20:04:19 PDT
Presumably the difference is which of the two branches end up being taken in getShortcutOrURI. escape() doesn't escape "+", but encodeURIComponent does.

http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#1891
Comment 19 Marco Bonardo [::mak] 2009-03-13 06:19:41 PDT
that is probably the charset anno, if a charset is known for the page we choose another way there
Comment 20 Marco Bonardo [::mak] 2009-03-13 06:20:19 PDT
and when we add a keyword we save the page charset iirc
Comment 21 Mike Beltzner [:beltzner, not reading bugmail] 2009-03-17 14:14:48 PDT
(In reply to comment #18)
> Presumably the difference is which of the two branches end up being taken in
> getShortcutOrURI. escape() doesn't escape "+", but encodeURIComponent does.
> 
> http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#1891

I think this also happens to bookmarklets; that would explain why when I tried to generate an is.gd URL from an is.gd bookmarklet, the "+" didn't go through.
Comment 22 Jo Hermans 2009-07-08 16:38:33 PDT
*** Bug 503089 has been marked as a duplicate of this bug. ***
Comment 23 Yellowish_blob 2009-07-08 17:59:13 PDT
I quickly tested the problem on a mac today at a friends place. Atleast then it seemed that it worked properly on a mac, but would like to have someone confirm it.
Comment 24 slavkop 2009-09-04 10:01:45 PDT
As far as I've used this feature, this bug has been fixed for me for a while. Now using Firefox 3.5.2 on Windows. Though I'd read that for some this bug was resolved for a long time (on what might've been even the 1+ series).

Just created a quick bookmarklet (bookmark location: javascript:alert("%s")) and assigned a keyword and this works, correctly displaying "C++" [unquoted] for a keyword-based search for "C++" [unquoted].
Comment 25 Evan 2009-10-01 22:24:49 PDT
slavkop: Comment #16 and #17 explain what works and what doesn't; near as I can tell those comments are still accurate.

If you manually create a bookmark, substituting %s in the appropriate place, then there is no bug. (This is what your original bug report did.) However, if you use the "add keyword for this search" feature (right click on, say, Google's text box), the bug still exists in FF 3.5.3/Win.
Comment 26 Blair McBride [:Unfocused] (mostly unavailable, needinfo open, reviews not) 2009-10-08 17:06:12 PDT
(In reply to comment #4)
> The patch for bug 258223 made us use escape() if there is a specified character
> set, and encodeURIComponent if there isn't. I'm not sure why that discrepancy
> was introduced, but it would explain the "+" signs not being escaped (escape()
> doesn't escape them, encodeURIComponent() does).

From what I can tell, its because encodeURIComponent() always returns UTF-8. Although, the result of encodeURIComponent() can just be converted to the relevant character set, and all the problems should just magically go away.
Comment 27 Blair McBride [:Unfocused] (mostly unavailable, needinfo open, reviews not) 2009-10-08 17:07:48 PDT
Created attachment 405365 [details] [diff] [review]
Patch v1
Comment 28 :Gavin Sharp [email: gavin@gavinsharp.com] 2009-10-08 18:28:12 PDT
That patch is equivalent to just removing the escape() call. I don't understand how that matches your comment, or why it would be OK.
Comment 29 Blair McBride [:Unfocused] (mostly unavailable, needinfo open, reviews not) 2009-10-08 21:48:14 PDT
Created attachment 405404 [details] [diff] [review]
Patch v1.1

(In reply to comment #28)
> That patch is equivalent to just removing the escape() call. I don't understand
> how that matches your comment, or why it would be OK.

Whoops... Don't know how I did that.

This line:
+      encodedParam = convertFromUnicode(charset, param);

Should be:
+      encodedParam = convertFromUnicode(charset, encodedParam);

This patch should be what I intend.
Comment 30 [retired] 2009-11-16 02:44:39 PST
*** Bug 528876 has been marked as a duplicate of this bug. ***
Comment 31 Jo Hermans 2010-04-11 04:05:55 PDT
*** Bug 548660 has been marked as a duplicate of this bug. ***
Comment 32 Jo Hermans 2010-04-11 04:06:04 PDT
*** Bug 558605 has been marked as a duplicate of this bug. ***
Comment 33 Jo Hermans 2010-04-11 04:06:44 PDT
*** Bug 480727 has been marked as a duplicate of this bug. ***
Comment 34 John P Baker 2010-11-22 07:31:56 PST
*** Bug 613917 has been marked as a duplicate of this bug. ***
Comment 35 John P Baker 2011-02-03 05:27:05 PST
*** Bug 631010 has been marked as a duplicate of this bug. ***
Comment 36 John P Baker 2011-02-04 00:16:46 PST
*** Bug 631301 has been marked as a duplicate of this bug. ***
Comment 37 Gustav Munkby 2011-02-04 15:54:06 PST
Created attachment 509929 [details] [diff] [review]
Patch 2

The previously proposed patch in attachment 405404 [details] [diff] [review] above does not handle other charsets correctly. The resulting string is the uri-encoded utf-8 version of the string. Thus, the string is using %escapes to represent the utf-8 octets in ascii, and a charset encoding of that would not produce the correct result. For example, consider the swedish character å (å) and a latin-1 charset:

escape(convertFromUnicode("ISO-8859-1", "å")) /* => "%E5" (correct) */
convertFromUnicode("ISO-8859-1", encodeURIComponent("å")) /* => "%C3%A5" (incorrect) */

I have attached another patch (untested) that changes the old behavior in two ways:

1. If the charset is "UTF-8", the encodeURIComponent code path is taken. This should be the appropriate solution for any site that accepts "UTF-8" charsets.
2. For another charset, escape is used first, and then any remaining +/@ are escaped using encodeURIComponent. This should maintain correct handling of other charsets, while ensuring that the +/@ characters are correctly escaped.
Comment 38 Yves Goergen 2011-02-05 02:39:24 PST
So you said that using another encoding takes another path through the code that can encode the plus sign, while the default path does not. Wouldn't that mean that if I specify an encoding, the plus sign would be encoded correctly already? I tried with this URL:

http://www.google.de/search?&q=%s&mozcharset=UTF-8

But it still didn't work. The plus sign is *never* encoded correctly. What's going on here, has somebody not understood URL encoding? In a web browser?!
Comment 39 Gustav Munkby 2011-02-05 03:14:10 PST
(In reply to comment #38)
> So you said that using another encoding takes another path through the code
> that can encode the plus sign, while the default path does not. Wouldn't that
> mean that if I specify an encoding, the plus sign would be encoded correctly
> already?

The current code will work perfectly for hosts that expect UTF-8, if there is _no_ charset specified. However, if you specify the charset to anything (even UTF-8), the symbols +/@ will not be escaped. Note that specifying the charset can be done explicitly through the mozcharset parameter, or it can be fetched implicitly from the places database. Unfortunately, I don't know of any way of checking the places database except manually running sqlite3 $profile/places.sqlite on the command line, and then issuing lengthy sql queries.

The first bullet I mentioned in comment #37, was that the patch in attachment 509929 [details] [diff] [review] modifies the if-condition to handle the case where the charset is specified as UTF-8 as if there was no charset specified at all. This would solve all issues with the google queries, since google expects UTF-8 in its input.

> But it still didn't work. The plus sign is *never* encoded correctly. What's
> going on here, has somebody not understood URL encoding? In a web browser?!

The problem is that the input parameter is a Unicode string, and for transmission to a certain host it needs to be encoded both with a charset encoding and a URI encoding. Theoretically, the problem should be easily solved using the the JavaScript function encodeURIComponent, but in addition to URI encoding it also forces the charset encoding to be UTF-8, which (for historical reasons) does not work with all hosts.

The escape function in JavaScript does not force charset encoding to be UTF-8, and can thus be used for any charset, but unfortunately does not replace the symbols +/@. This is why the second part of the patch in attachment 509929 [details] [diff] [review] extends the code-path for non-UTF-8 charset encodings to also replace the ASCII characters +/@ with their respective URI-encoded representations. Since these are ASCII characters, they are represented as-is in UTF-8, and thus we can use encodeURIComponent for encoding these characters (independent of target charset encoding).
Comment 40 Yves Goergen 2011-02-05 05:02:58 PST
Umm, I didn't really understand that, but it doesn't work for me where you say it should. Probably my other bug 631301 is about something else then and not a duplicate of this one.
Comment 41 Gustav Munkby 2011-02-05 05:25:04 PST
(In reply to comment #40)
> Umm, I didn't really understand that, but it doesn't work for me where you say
> it should. Probably my other bug 631301 is about something else then and not a
> duplicate of this one.

Did you actually try the patch in attachment 509929 [details] [diff] [review]?
Comment 42 Yves Goergen 2011-02-05 05:54:07 PST
Can you provide me a build with that patch?
Comment 43 Gustav Munkby 2011-02-06 12:35:16 PST
Created attachment 510150 [details] [diff] [review]
Patch v2.2

Updated the patch by extending the browser_getshortcutoruri.js test with a test that fails for the original code, and passes with the patch applied. I've also tested that "Add Keyword for this Search..." works on google.com, with the patch applied.
Comment 44 :Gavin Sharp [email: gavin@gavinsharp.com] 2011-02-21 18:19:47 PST
Hey Gustav, sorry for the review delay here - saw your ping on IRC. Patch looks promising from a quick glance, but I'll need to spend a bit more time thinking about this to review, and Firefox 4 blockers are keeping me busy at the moment. Don't hesitate to poke me again if I'm taking too much time.
Comment 45 :Gavin Sharp [email: gavin@gavinsharp.com] 2011-04-28 13:38:12 PDT
Comment on attachment 405404 [details] [diff] [review]
Patch v1.1

As Gustav pointed out, convertFromUnicode does not unescape, so this won't work.
Comment 46 :Gavin Sharp [email: gavin@gavinsharp.com] 2011-04-28 13:52:07 PDT
Comment on attachment 510150 [details] [diff] [review]
Patch v2.2

I thought that we could perhaps make use of escape's secret "mask" argument that allows you to control which characters get escaped (see http://hg.mozilla.org/mozilla-central/annotate/52b6489a3140/js/src/jsstr.cpp#l523 ). But that doesn't quite allow us to get the behaviour that we want, and isn't part of any spec, and isn't documented, so I guess that won't work :)

pkasting's hunch in comment 2 was on the right track - we want to try to match the keyword and form submission behaviour. nsFSURLEncoded::URLEncode and nsDefaultURIFixup::KeywordToURI use nsEscape() with url_XPAlphas, and it differs from escape() in that it additionally encodes "+@/". So this patch is correct. The test is also great. I have only one comment: it would be good to explicitly mention in a comment that we're trying to match the behaviour of the platform code (nsEscape() with url_XPAlphas), in case it should ever change for whatever reason.

Thanks for the great work here, Gustav, and I'm sorry that it took so long to get responses. I'm glad you were patient enough to persevere.
Comment 47 :Gavin Sharp [email: gavin@gavinsharp.com] 2011-04-28 13:57:19 PDT
Created attachment 528951 [details] [diff] [review]
updated patch

Here's a slightly modified patch that tweaks the comment as I suggested, ready for check-in.
Comment 48 Dão Gottwald [:dao] 2011-04-29 03:54:44 PDT
http://hg.mozilla.org/mozilla-central/rev/f4c738892af6
Comment 49 Henrik Skupin (:whimboo) 2011-05-02 01:41:11 PDT
I'm trying to verify this fix but I'm not able to reproduce the non-encoded issue even in Firefox 4. I followed the steps from comment 0 but not sure what I miss to make this issue appear. Gavin, can you help?
Comment 50 Gustav Munkby 2011-05-02 08:20:35 PDT
For some reason, Firefox 4 no longer (always) assumes that the destination encoding matches the encoding on the search page and therefore does not save the encoding when creating a new quicksearch.

The result, in any case, is that no encoding is stored for the quicksearch, and thus the faulty code path is not executed unless provoked. The easiest way to provoke it is to add a &mozcharset= to the end of the URL, like this:

1. Show all Bookmarks
2. Create new bookmark with
    - location: http://search.hp.com/query.html?charset=iso-8859-1&qt=%s&mozcharset=ISO-8859-1
    - keyword: test
3. Enter: "hp c++" into the location bar and observe that it only searches for c

When creating a new quicksearch from the search box on the resulting page, it sometimes seems to get stored without encoding and sometimes with. When it is stored without encoding then "+" works correctly, but for instance swedish letters "åäö" are wrongly turned to UTF-8. When encoding is stored correctly, then the letters "åäö" are handled correctly, but the + signs disappear.
Comment 51 Henrik Skupin (:whimboo) 2011-05-03 01:47:38 PDT
Thanks Gustav. That information helped me a lot to verify this fix on mozilla-central with Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:6.0a1) Gecko/20110501 Firefox/6.0a1

Note You need to log in before you can comment on or make changes to this bug.