User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.1) Gecko/20021120 Chimera/0.6+
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.1) Gecko/20021120 Chimera/0.6+
The keywords are not correctly encoded when using the new bookmark keyword functionality and a word with non-ascii characters.
Reproducible: Always
Steps to Reproduce:
1. Create a Google keyword bookmark with the URL http://www.google.com/search?q=%s and a keyword of your choice, I use "g".
2. Write "g ääliö" or "g mañana" or some other keyword with non-ascii characters.
3. Press enter and an incorrect list of results will be shown.
Actual Results:
Using the Finnish word "ääliö", keyword bookmark took me to http://www.google.com/search?q=%8A%8Ali%9A
With the word "mañana", Chimera opened the page http://www.google.com/search?q=ma%96ana
Expected Results:
The pages should've been:
"ääliö" http://www.google.com/search?q=%E4%E4li%F6
"mañana" http://www.google.com/search?q=ma%F1ana
That's odd; I thought it was unicode all the way. I'll have a look.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
This is actually hard to fix. I'm going to have to convert the search term into the character set of the page before dispatching it, but I don't know what the page charset is at this point. Maybe converting into ISO-8859-1 is OK?
Mozilla has the same problem, FWIW.
OmniWeb has problems with this too. If the search engine had a UTF-8 version of their page, we should be OK, but I could not find one for Google.
...and IE has this problem with their "search from the location bar" functionality. You should be able to switch Google to UTF-8 with http://www.google.com/search?q=%s&ie=UTF-8&oe=UTF-8 but non-ascii characters still don't work.
Specifically, the problem is that the hexadecimal numbers in the query string encode MacRoman bytes rather than UTF-8 bytes.
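To illustrate the point about MacRoman bytes, here is a minimal Python sketch that substitutes a search term into the `%s` template under three different encodings. It is not Chimera's code; the helper name `expand` is hypothetical, and it relies only on the standard library's `urllib.parse.quote`.

```python
from urllib.parse import quote

TEMPLATE = "http://www.google.com/search?q=%s"

def expand(term, charset):
    """Percent-encode `term` as bytes of `charset` and substitute it for %s."""
    return TEMPLATE.replace("%s", quote(term, safe="", encoding=charset))

# The same word yields different escapes depending on the byte encoding.
for charset in ("mac_roman", "iso-8859-1", "utf-8"):
    print(charset, expand("ääliö", charset))
```

The MacRoman variant reproduces the URL from the "Actual Results", the ISO-8859-1 variant reproduces the "Expected Results", and the UTF-8 variant is what a server would expect when told `ie=UTF-8`.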
Even if they were UTF-8 bytes, we still can't guarantee that the search page uses UTF-8 in their search URLs.
Some sites still want something like ISO-8859-1, but it is inadequate for many languages, so using it as the only possible encoding would be bad. If only one encoding had to be chosen, choosing UTF-8 would make sense since it can encode all of Unicode and Google supports it with ie=UTF-8. I think having a per-bookmark encoding setting would be ugly in terms of UI.
Is comment 5 correct? Bookmarking the URL http://www.google.com/search?q=%s&ie=UTF-8&oe=UTF-8 and searching for ™, é, ü, etc. works fine. Wouldn't it just be satisfactory to assume UTF-8 encoding?
Sorry for the spam... apparently bugzilla doesn't display HTML entities. Searching for ™, é, ü, î, ñ, etc. works fine.
Searching for HTML encoded strings doesn't work for me. But: it looks like searching with non-ascii characters (without encoding them) works now when you set Google to use UTF-8! Using Camino build 2003041105.
Keyword-based searches now use UTF-8. Cool. This bug has been fixed (either intentionally or unintentionally).
this still applies when using camino's "search the web" function. specifically, searching for "camino search bug" generates: http://www.google.com/search?q=camino search bug which is then translated to: http://www.google.com/search?q=camino%20search%20bug it looks like the translation is being done server-side by google; it should be done client-side, no? (should i spin this into a separate bug, or is this the right place for it?)
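On the question of client-side escaping: a raw space is not valid in a URL, so a client would normally escape it before sending the request. Below is a small Python sketch of the two common forms, using only the standard library. It illustrates the escaping itself and makes no claim about what Camino's "search the web" code actually does.

```python
from urllib.parse import quote, quote_plus

BASE = "http://www.google.com/search?q="
term = "camino search bug"

# Generic percent-encoding: a space becomes %20.
percent_form = BASE + quote(term, safe="")

# Form-style query encoding: a space becomes "+".
plus_form = BASE + quote_plus(term)

print(percent_form)
print(plus_form)
```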
It looks like bookmark keywords are screwed up again: using http://www.google.com/search?q=%s to search for any word containing Polish characters (say żąść) causes the searched word to be mangled into ???. It doesn't work in Mozilla or Firebird under Linux and Windows (I do not have Camino to test). I can get this to work when explicitly using windows-1250 (Windows) or iso-8859-2 (Linux) as the input charset, e.g. http://www.google.com/search?q=%s&ie=windows-1250&oe=UTF-8 (on Windows) gives correct results. It looks like bookmark searches are not using UTF-8 anymore.
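One plausible way to end up with question marks (an assumption, not a confirmed diagnosis of the Mozilla code) is a lossy conversion into a charset that lacks the characters in question. The Python sketch below shows that effect, and why naming a matching input charset avoids it. The sample word and the helper `encode_term` are illustrative only.

```python
from urllib.parse import quote, unquote

def encode_term(term, charset):
    """Percent-encode `term`; characters missing from `charset` become '?'."""
    return quote(term, safe="", encoding=charset, errors="replace")

word = "żąść"  # sample Polish letters, none of which exist in ISO-8859-1

# Lossy: every character is replaced by '?' before escaping.
lossy = encode_term(word, "iso-8859-1")
print("iso-8859-1 ->", lossy, "->", unquote(lossy, encoding="iso-8859-1"))

# Lossless: these charsets can represent the word, so it survives a
# round trip as long as the receiver decodes with the same charset.
for charset in ("iso-8859-2", "windows-1250", "utf-8"):
    escaped = encode_term(word, charset)
    print(charset, "->", escaped, "->", unquote(escaped, encoding=charset))
```

The round trip only works when sender and receiver agree on the charset, which is what the `ie=` parameter communicates to Google.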
Is this bug also used to track problems with typing special characters into the URL, or should that be a different bug? For example, typing plus characters does not work. The originally reported problem still seems to be broken in Firefox too. I tried searching for Cyrillic characters on Google, and ended up with just question marks... Should the product classification be changed too?
I feel that bug 123006 is somewhat related to this one, but I'm not too familiar with dependency tree management, and I'm also not sure whether this bug is a complete duplicate of that one. After all, I have never hacked on Camino code. So I'll leave it to the bug reporter or someone else who has enough rights to manage this bug. Follow the above link and make a wise decision.
Bug 264406 is about similar problems in Mozilla. Reading the comments here, I wonder if there's actually any difference between the behaviour of Mozilla and Camino. Bug 123006 is about encoding "special" ASCII characters; I think this bug should stay dedicated to the problem of i18n (non-US-ASCII) characters. For anyone who tries to implement a solution, I would like to point out that one method to always get the charset right when constructing the request URL would be to use the LAST_CHARSET info of the bookmark entry to select the charset. The patch for bug 123006, as it was checked in to FF, resulted in FF always using UTF-8, and this might be considered sufficient. It conforms to current standards about URLs (there's an RFC that says they ought to always be encoded in UTF-8).
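The LAST_CHARSET idea could look roughly like the Python sketch below. The function name, its parameters, and the fallback rules are all hypothetical; it only illustrates "use the bookmark's recorded charset if there is one, otherwise UTF-8" and is not taken from any Mozilla or Camino code.

```python
import codecs
from urllib.parse import quote

def expand_keyword_url(template, term, last_charset=None):
    """Substitute `term` for %s, encoded in the bookmark's recorded charset.

    Falls back to UTF-8 when no charset is recorded, the charset name is
    unknown, or the term cannot be represented in that charset.
    """
    charset = last_charset or "utf-8"
    try:
        codecs.lookup(charset)  # raises LookupError for unknown names
        encoded = quote(term, safe="", encoding=charset, errors="strict")
    except (LookupError, UnicodeEncodeError):
        encoded = quote(term, safe="", encoding="utf-8")
    return template.replace("%s", encoded)

url = "http://www.google.com/search?q=%s"
print(expand_keyword_url(url, "mañana", "iso-8859-1"))  # recorded charset
print(expand_keyword_url(url, "mañana"))                # no charset: UTF-8
```

Falling back to UTF-8 rather than substituting question marks trades a possibly misread query for one that at least carries the original characters.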
See also bug 258223 where I made a patch with a different approach.
Wow, so... this WORKSFORME. The steps in comment 0 work perfectly for me. In addition, the "expected results" are now incorrect. Did something change?
Oh, I should give the URIs which Google now creates. For ääliö: http://www.google.com/search?q=%C3%A4%C3%A4li%C3%B6 For mañana: http://www.google.com/search?q=ma%C3%B1ana
This WFM for me, too. I just deleted all of my Google cookies and tried with Arabic for a fancier test. Reading the comments, it seems like this is one of those bugs that breaks and gets fixed periodically on the trunk (comment 14, comment 16); perhaps bug 123006 fixed it most recently?
We don't use the code from bug 123006, but if this is WFM, then great!
Status: ASSIGNED → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → WORKSFORME
Summary: Keywords not correctly encoded → Bookmark Keywords not correctly encoded