359809 - Bookmark Keyword searches are not encoded properly (though keyword.URL is)

Reporter

Description

•

19 years ago

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 Quick Search does not URL-encode plus signs (+). Searching for "c++" (without quotes), the %s (replaced portion of the URL) appears in the URL as: c++ Instead of: c%2B%2B The encoding (++) being decoded by the server as two spaces. Reproducible: Always Steps to Reproduce: 1) Create a bookmark with the address: http://www.google.com.au/search?hl=en&q=%s&btnG=Google+Search&meta= 2) Assign a keyword to the bookmark, i.e: 'g' (without quotes). 3) Type "g c++" (without quotes) in the address bar and press Enter. Actual Results: The plus signs (+) do not get URL-encoded. Expected Results: + should be encoded as %2B This may be a regression as it is reported as bug 319169 and bug 316863 (and an number of others mentioned therein). The status of these bug reports have been set to "Resolved". But this feature has never worked for me since Firefox 1.0 and (mostly) automatic update versions up to 2.0. This is a fresh install of Firefox 2.0 (no previous version) on a fresh install of Windows.

slavkop

Reporter

Updated

•

19 years ago

Keywords: regression

slavkop

Reporter

Comment 1

•

19 years ago

Just a confirmation: this bug is also present on (a clean install of) Ubuntu Linux (Edgy Eft version: 6.10). Firefox version string: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1) Gecko/20060601 Firefox/2.0 (Ubuntu-edgy) Since this bug was reported as "Resolved" on multiple occasions, I'm starting to think that I'm making some sort of naive mistake - like an about:config setting.

Peter Kasting

Comment 2

•

19 years ago

Confirming on Windows trunk (I don't have a 2.0 build ATM). Interestingly, this seems fixed for keyword.URL (I use a standard Google Search, and the +s get escaped properly), but not for bookmark keywords. I could probably figure this out if I had a lot of time to look at Firefox stuff lately. People interested in fixing this should look at https://bugzilla.mozilla.org/attachment.cgi?id=226851 , specifically the very last change, which escapes using XPAlphas. My suspicion is that somehow, the keyword.URL mangling is making it to this line while the bookmark keyword mangling is not, and probably should be. If these are made to go through the same code path, this should all start working properly. I don't know why I marked bug 316863 fixed when it's clearly not (I suspect I tested with keyword.URL and not a bookmark keyword), but now that we have this bug, it's a bit pointless to reopen. I will remove the "regression" keyword since this isn't a regression, it's just something that wasn't ever properly fixed when it should have been.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Keywords: regression

Peter Kasting

Comment 3

•

19 years ago

Resummarizing for clarity. If anyone has time for this, the patch might be safe enough for a 2.x release, and would be nice-to-have.

Status: NEW → ASSIGNED

Summary: Quick Search: Plus Sign not Encoded → Bookmark Keyword searches are not encoded properly (though keyword.URL is)

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 4

•

19 years ago

The patch for bug 258223 made us use escape() if there is a specified character set, and encodeURIComponent if there isn't. I'm not sure why that discrepancy was introduced, but it would explain the "+" signs not being escaped (escape() doesn't escape them, encodeURIComponent() does).

URL: http://bonsai.mozilla.org/cvsblame.cg...

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

19 years ago

Status: ASSIGNED → NEW

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

19 years ago

OS: Windows XP → All

Hardware: PC → All

Ria Klaassen (not reading all bugmail)

Updated

•

17 years ago

Flags: blocking-firefox3.2?

Flags: blocking-firefox3.1?

Mike Beltzner [:beltzner, not reading bugmail]

Comment 9

•

17 years ago

Not blocking, but cc'ing Mardak who, I think, dealt with escaping/unescaping stuff in the past.

Flags: blocking-firefox3.1? → blocking-firefox3.1-

Ed Lee :Mardak

Comment 10

•

17 years ago

WFM. There's even a testcase for it! ;) http://hg.mozilla.org/mozilla-central/file/a15164388bf1/toolkit/components/places/tests/autocomplete/test_keyword_search.js#l37 This bug was originally filed for Firefox 2... I checked on Firefox 3, Shiretoko, and Minefield and all work.

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → WORKSFORME

Ria Klaassen (not reading all bugmail)

Comment 11

•

17 years ago

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090311 Minefield/3.2a1pre I still can't do a keyword search for c++. At least, it goes to http://www.google.nl/search?hl=nl&q=c++&meta=

Ed Lee :Mardak

Comment 12

•

17 years ago

On trunk, it shows you the query as an option in the autocomplete. What you see there is unescaped. If you select the entry you'll see that it's %2B. Even if I don't select the entry, I see the resulting page go to %2B%2B.

Ria Klaassen (not reading all bugmail)

Comment 13

•

17 years ago

I can't reproduce the fix. I tried a brand new profile and the following STR: - go to www.google.com - right click on the input field -> Add a Keyword for this Search - give Name Google and Keyword g - type g c++ it goes to http://www.google.nl/search?hl=nl&q=c++&meta=

Ria Klaassen (not reading all bugmail)

Comment 14

•

17 years ago

But I see that it works through the Search Engine manager.

Evan

Comment 15

•

17 years ago

I filed a duplicate bug about this yesterday from FF 3.0.6 (sorry about that; I swear I spent a fair amount of time looking), and I also can still reproduce the bug, to the point of being able to do it after downloading and running the latest nightly from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-3.2a1pre.en-US.linux-i686.tar.bz2. The steps Ria Klaassen posted still reproduce the bug for me. "But I see that it works through the Search Engine manager." I didn't know you could set keyword searches that way; thanks, that will work as a workaround. It'd be nice to fix for arbitrary searches though.

Ed Lee :Mardak

Comment 16

•

17 years ago

This is very strange.. While the autocomplete entry works fine, if you just hit enter without selecting anything, it *sometimes* doesn't convert + to %2B. When I tried it on my profile, all the keywords were correctly converting + to %2B. But when I just tried adding a new keyword using the "Add a Keyword for this Search", I run into problems. But, if I manually create a bookmark, it changes + to %2B. Some digging around in my places.sqlite, I noticed the ones that *were* working correctly (+ -> %2B) had *no* moz_anno with that bookmark's place_id. The ones that were created through "Add a Keyword for this Search" do have an annotation and some reason prevents + from changing to %2B. ??

Status: RESOLVED → REOPENED

Resolution: WORKSFORME → ---

Ed Lee :Mardak

Comment 17

•

17 years ago

So as another way to work around this issue... 1) Perform the search you want 2) Copy the url from the resulting page 3) Create a bookmark with that url replacing your search term with %s

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 18

•

17 years ago

Presumably the difference is which of the two branches end up being taken in getShortcutOrURI. escape() doesn't escape "+", but encodeURIComponent does. http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#1891

Marco Bonardo [:mak]

Comment 19

•

17 years ago

that is probably the charset anno, if a charset is known for the page we choose another way there

Marco Bonardo [:mak]

Comment 20

•

17 years ago

and when we add a keyword we save the page charset iirc

Mike Beltzner [:beltzner, not reading bugmail]

Comment 21

•

17 years ago

(In reply to comment #18) > Presumably the difference is which of the two branches end up being taken in > getShortcutOrURI. escape() doesn't escape "+", but encodeURIComponent does. > > http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#1891 I think this also happens to bookmarklets; that would explain why when I tried to generate an is.gd URL from an is.gd bookmarklet, the "+" didn't go through.

Yellowish_blob

Comment 23

•

16 years ago

I quickly tested the problem on a mac today at a friends place. Atleast then it seemed that it worked properly on a mac, but would like to have someone confirm it.

Mike Beltzner [:beltzner, not reading bugmail]

Updated

•

16 years ago

Flags: wanted-firefox3.6+

Flags: blocking-firefox3.6?

Flags: blocking-firefox3.6-

slavkop

Reporter

Comment 24

•

16 years ago

As far as I've used this feature, this bug has been fixed for me for a while. Now using Firefox 3.5.2 on Windows. Though I'd read that for some this bug was resolved for a long time (on what might've been even the 1+ series). Just created a quick bookmarklet (bookmark location: javascript:alert("%s")) and assigned a keyword and this works, correctly displaying "C++" [unquoted] for a keyword-based search for "C++" [unquoted].

Evan

Comment 25

•

16 years ago

slavkop: Comment #16 and #17 explain what works and what doesn't; near as I can tell those comments are still accurate. If you manually create a bookmark, substituting %s in the appropriate place, then there is no bug. (This is what your original bug report did.) However, if you use the "add keyword for this search" feature (right click on, say, Google's text box), the bug still exists in FF 3.5.3/Win.

Blair McBride [:Unfocused] (UNAVAILABLE)

Comment 26

•

16 years ago

(In reply to comment #4) > The patch for bug 258223 made us use escape() if there is a specified character > set, and encodeURIComponent if there isn't. I'm not sure why that discrepancy > was introduced, but it would explain the "+" signs not being escaped (escape() > doesn't escape them, encodeURIComponent() does). From what I can tell, its because encodeURIComponent() always returns UTF-8. Although, the result of encodeURIComponent() can just be converted to the relevant character set, and all the problems should just magically go away.

Assignee: nobody → bmcbride

Blair McBride [:Unfocused] (UNAVAILABLE)

Comment 27

•

16 years ago

Attached patch Patch v1 (obsolete) — Details — Splinter Review

Attachment #405365 - Flags: review?(gavin.sharp)

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 28

•

16 years ago

That patch is equivalent to just removing the escape() call. I don't understand how that matches your comment, or why it would be OK.

Blair McBride [:Unfocused] (UNAVAILABLE)

Comment 29

•

16 years ago

Attached patch Patch v1.1 (obsolete) — Details — Splinter Review

(In reply to comment #28) > That patch is equivalent to just removing the escape() call. I don't understand > how that matches your comment, or why it would be OK. Whoops... Don't know how I did that. This line: + encodedParam = convertFromUnicode(charset, param); Should be: + encodedParam = convertFromUnicode(charset, encodedParam); This patch should be what I intend.

Attachment #405365 - Attachment is obsolete: true

Attachment #405404 - Flags: review?(gavin.sharp)

Attachment #405365 - Flags: review?(gavin.sharp)

Gustav Munkby

Assignee

Comment 37

•

15 years ago

Attached patch Patch 2 (obsolete) — Details — Splinter Review

The previously proposed patch in attachment 405404 [details] [diff] [review] above does not handle other charsets correctly. The resulting string is the uri-encoded utf-8 version of the string. Thus, the string is using %escapes to represent the utf-8 octets in ascii, and a charset encoding of that would not produce the correct result. For example, consider the swedish character å (å) and a latin-1 charset: escape(convertFromUnicode("ISO-8859-1", "å")) /* => "%E5" (correct) */ convertFromUnicode("ISO-8859-1", encodeURIComponent("å")) /* => "%C3%A5" (incorrect) */ I have attached another patch (untested) that changes the old behavior in two ways: 1. If the charset is "UTF-8", the encodeURIComponent code path is taken. This should be the appropriate solution for any site that accepts "UTF-8" charsets. 2. For another charset, escape is used first, and then any remaining +/@ are escaped using encodeURIComponent. This should maintain correct handling of other charsets, while ensuring that the +/@ characters are correctly escaped.

Yves Goergen [:ygoe]

Comment 38

•

15 years ago

So you said that using another encoding takes another path through the code that can encode the plus sign, while the default path does not. Wouldn't that mean that if I specify an encoding, the plus sign would be encoded correctly already? I tried with this URL: http://www.google.de/search?&q=%s&mozcharset=UTF-8 But it still didn't work. The plus sign is *never* encoded correctly. What's going on here, has somebody not understood URL encoding? In a web browser?!

Gustav Munkby

Assignee

Comment 39

•

15 years ago

(In reply to comment #38) > So you said that using another encoding takes another path through the code > that can encode the plus sign, while the default path does not. Wouldn't that > mean that if I specify an encoding, the plus sign would be encoded correctly > already? The current code will work perfectly for hosts that expect UTF-8, if there is _no_ charset specified. However, if you specify the charset to anything (even UTF-8), the symbols +/@ will not be escaped. Note that specifying the charset can be done explicitly through the mozcharset parameter, or it can be fetched implicitly from the places database. Unfortunately, I don't know of any way of checking the places database except manually running sqlite3 $profile/places.sqlite on the command line, and then issuing lengthy sql queries. The first bullet I mentioned in comment #37, was that the patch in attachment 509929 [details] [diff] [review] modifies the if-condition to handle the case where the charset is specified as UTF-8 as if there was no charset specified at all. This would solve all issues with the google queries, since google expects UTF-8 in its input. > But it still didn't work. The plus sign is *never* encoded correctly. What's > going on here, has somebody not understood URL encoding? In a web browser?! The problem is that the input parameter is a Unicode string, and for transmission to a certain host it needs to be encoded both with a charset encoding and a URI encoding. Theoretically, the problem should be easily solved using the the JavaScript function encodeURIComponent, but in addition to URI encoding it also forces the charset encoding to be UTF-8, which (for historical reasons) does not work with all hosts. The escape function in JavaScript does not force charset encoding to be UTF-8, and can thus be used for any charset, but unfortunately does not replace the symbols +/@. This is why the second part of the patch in attachment 509929 [details] [diff] [review] extends the code-path for non-UTF-8 charset encodings to also replace the ASCII characters +/@ with their respective URI-encoded representations. Since these are ASCII characters, they are represented as-is in UTF-8, and thus we can use encodeURIComponent for encoding these characters (independent of target charset encoding).

Yves Goergen [:ygoe]

Comment 40

•

15 years ago

Umm, I didn't really understand that, but it doesn't work for me where you say it should. Probably my other bug 631301 is about something else then and not a duplicate of this one.

Gustav Munkby

Assignee

Comment 41

•

15 years ago

(In reply to comment #40) > Umm, I didn't really understand that, but it doesn't work for me where you say > it should. Probably my other bug 631301 is about something else then and not a > duplicate of this one. Did you actually try the patch in attachment 509929 [details] [diff] [review]?

Yves Goergen [:ygoe]

Comment 42

•

15 years ago

Can you provide me a build with that patch?

Gustav Munkby

Assignee

Comment 43

•

15 years ago

Attached patch Patch v2.2 (obsolete) — Details — Splinter Review

Updated the patch by extending the browser_getshortcutoruri.js test with a test that fails for the original code, and passes with the patch applied. I've also tested that "Add Keyword for this Search..." works on google.com, with the patch applied.

Attachment #509929 - Attachment is obsolete: true

Attachment #510150 - Flags: review?

Gustav Munkby

Assignee

Updated

•

15 years ago

Attachment #510150 - Flags: review? → review?(gavin.sharp)

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 44

•

15 years ago

Hey Gustav, sorry for the review delay here - saw your ping on IRC. Patch looks promising from a quick glance, but I'll need to spend a bit more time thinking about this to review, and Firefox 4 blockers are keeping me busy at the moment. Don't hesitate to poke me again if I'm taking too much time.

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 45

•

15 years ago

Comment on attachment 405404 [details] [diff] [review] Patch v1.1 As Gustav pointed out, convertFromUnicode does not unescape, so this won't work.

Attachment #405404 - Flags: review?(gavin.sharp) → review-

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

15 years ago

Attachment #405404 - Attachment is obsolete: true

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 46

•

15 years ago

Comment on attachment 510150 [details] [diff] [review] Patch v2.2 I thought that we could perhaps make use of escape's secret "mask" argument that allows you to control which characters get escaped (see http://hg.mozilla.org/mozilla-central/annotate/52b6489a3140/js/src/jsstr.cpp#l523 ). But that doesn't quite allow us to get the behaviour that we want, and isn't part of any spec, and isn't documented, so I guess that won't work :) pkasting's hunch in comment 2 was on the right track - we want to try to match the keyword and form submission behaviour. nsFSURLEncoded::URLEncode and nsDefaultURIFixup::KeywordToURI use nsEscape() with url_XPAlphas, and it differs from escape() in that it additionally encodes "+@/". So this patch is correct. The test is also great. I have only one comment: it would be good to explicitly mention in a comment that we're trying to match the behaviour of the platform code (nsEscape() with url_XPAlphas), in case it should ever change for whatever reason. Thanks for the great work here, Gustav, and I'm sorry that it took so long to get responses. I'm glad you were patient enough to persevere.

Attachment #510150 - Flags: review?(gavin.sharp) → review+

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 47

•

15 years ago

Attached patch updated patch — Details — Splinter Review

Here's a slightly modified patch that tweaks the comment as I suggested, ready for check-in.

Attachment #510150 - Attachment is obsolete: true

Attachment #528951 - Flags: review+

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

15 years ago

Assignee: bmcbride → grddev

Keywords: checkin-needed

Dão Gottwald [:dao]

Comment 48

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/f4c738892af6

Status: REOPENED → RESOLVED

Closed: 17 years ago → 15 years ago

Keywords: checkin-needed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 6

Marco Bonardo [:mak]

Updated

•

15 years ago

Flags: in-testsuite+

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 49

•

15 years ago

I'm trying to verify this fix but I'm not able to reproduce the non-encoded issue even in Firefox 4. I followed the steps from comment 0 but not sure what I miss to make this issue appear. Gavin, can you help?

Flags: in-litmus-

Gustav Munkby

Assignee

Comment 50

•

15 years ago

For some reason, Firefox 4 no longer (always) assumes that the destination encoding matches the encoding on the search page and therefore does not save the encoding when creating a new quicksearch. The result, in any case, is that no encoding is stored for the quicksearch, and thus the faulty code path is not executed unless provoked. The easiest way to provoke it is to add a &mozcharset= to the end of the URL, like this: 1. Show all Bookmarks 2. Create new bookmark with - location: http://search.hp.com/query.html?charset=iso-8859-1&qt=%s&mozcharset=ISO-8859-1 - keyword: test 3. Enter: "hp c++" into the location bar and observe that it only searches for c When creating a new quicksearch from the search box on the resulting page, it sometimes seems to get stored without encoding and sometimes with. When it is stored without encoding then "+" works correctly, but for instance swedish letters "åäö" are wrongly turned to UTF-8. When encoding is stored correctly, then the letters "åäö" are handled correctly, but the + signs disappear.

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 51

•

15 years ago

Thanks Gustav. That information helped me a lot to verify this fix on mozilla-central with Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:6.0a1) Gecko/20110501 Firefox/6.0a1

Status: RESOLVED → VERIFIED

Version: unspecified → 2.0 Branch

Patch v1 16 years ago Blair McBride [:Unfocused] (UNAVAILABLE) 1.11 KB, patch		Details \| Diff \| Splinter Review
Patch v1.1 16 years ago Blair McBride [:Unfocused] (UNAVAILABLE) 1.12 KB, patch	Gavin : review-	Details \| Diff \| Splinter Review
Patch 2 15 years ago Gustav Munkby 790 bytes, patch		Details \| Diff \| Splinter Review
Patch v2.2 15 years ago Gustav Munkby 1.75 KB, patch	Gavin : review+	Details \| Diff \| Splinter Review
updated patch 15 years ago :Gavin Sharp [email: gavin@gavinsharp.com] 2.98 KB, patch	Gavin : review+	Details \| Diff \| Splinter Review