Closed
Bug 181243
Opened 22 years ago
Closed 19 years ago
Bookmark Keywords not correctly encoded
Categories
(Camino Graveyard :: Bookmarks, defect, P3)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
Camino1.0
People
(Reporter: visa, Assigned: sfraser_bugs)
References
Details
(Keywords: intl)
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.1) Gecko/20021120 Chimera/0.6+
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.1) Gecko/20021120 Chimera/0.6+
The keywords are not correctly encoded when using the new bookmark keyword functionality and a word with non-ASCII characters.
Reproducible: Always
Steps to Reproduce:
1. Create a Google keyword bookmark with the URL http://www.google.com/search?q=%s and a keyword of your choice; I use "g".
2. Write "g ääliö" or "g mañana" or some other keyword with non-ASCII characters.
3. Press enter and an incorrect list of results will be shown.
Actual Results:
Using the Finnish word "ääliö", the keyword bookmark took me to http://www.google.com/search?q=%8A%8Ali%9A
With the word "mañana", Chimera opened the page http://www.google.com/search?q=ma%96ana
Expected Results:
The pages should've been:
"ääliö" http://www.google.com/search?q=%E4%E4li%F6
"mañana" http://www.google.com/search?q=ma%F1ana
Assignee
Comment 1•22 years ago
That's odd; I thought it was unicode all the way. I'll have a look.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Assignee
Comment 2•22 years ago
This is actually hard to fix. I'm going to have to convert the search term into the character set of the page before dispatching it, but I don't know what the page charset is at this point. Maybe converting into ISO-8859-1 is OK?
Assignee
Comment 3•22 years ago
Mozilla has the same problem, FWIW.
Assignee
Comment 4•22 years ago
OmniWeb has problems with this too. If the search engine had a UTF-8 version of their page, we should be OK, but I could not find one for Google.
...and IE has this problem with their "search from the location bar" functionality. You should be able to switch Google to UTF-8 with http://www.google.com/search?q=%s&ie=UTF-8&oe=UTF-8 but non-ascii characters still don't work.
Comment 6•22 years ago
Specifically, the problem is that the hexadecimal numbers in the query string encode MacRoman bytes rather than UTF-8 bytes.
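The byte-level difference described here can be reproduced with a short Python sketch. The helper name `encode_term` is made up for illustration; the only assumption is that the browser percent-encodes the search term after converting it to bytes in some charset. The two words and the resulting escapes are the ones quoted in this bug.

```python
from urllib.parse import quote


def encode_term(term: str, charset: str) -> str:
    """Percent-encode `term` after converting it to bytes in `charset`.

    Hypothetical helper for illustration, not Camino's actual code.
    """
    return quote(term, safe="", encoding=charset)


# Compare the three encodings discussed in this bug.
for term in ("ääliö", "mañana"):
    for charset in ("mac_roman", "iso-8859-1", "utf-8"):
        print(f"{term:8} {charset:11} {encode_term(term, charset)}")
```

MacRoman yields the escapes from the "Actual Results" in the report, ISO-8859-1 yields the originally expected ones, and UTF-8 yields the multi-byte form.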
Assignee
Comment 7•22 years ago
Even if they were UTF-8 bytes, we still can't guarantee that the search page uses UTF-8 in their search URLs.
Comment 8•22 years ago
Some sites still want something like ISO-8859-1, but it is inadequate for many languages, so using it as the only possible encoding would be bad. If only one encoding had to be chosen, choosing UTF-8 would make sense since it can encode all of Unicode and Google supports it with ie=UTF-8. I think having a per-bookmark encoding setting would be ugly in terms of UI.
Comment 9•21 years ago
It seems the JavaScript function "escape" does the correct conversion. Using the following location for a Google keyword search does the trick for German umlauts (can someone else try with more "foreign" characters?):
javascript:location.href='http://www.google.de/search?client=googlet&q='+escape('%s')
It would be nice to add that to the keyword search functionality by piping %s through escape. The problem here is: what happens when JavaScript is off?
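For anyone wanting to try more "foreign" characters without a browser, here is a rough Python emulation of the legacy escape() function as ECMAScript defines it. The name `legacy_escape` is made up for this sketch. It suggests why umlauts come out right (they fit in one byte and get a Latin-1-style %XX escape) while characters above U+00FF get a non-standard %uXXXX form that a server may not understand.

```python
# Characters that the legacy escape() function leaves untouched.
UNESCAPED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789@*_+-./"
)


def legacy_escape(text: str) -> str:
    """Emulate the legacy JavaScript escape() (illustrative sketch only)."""
    # escape() operates on UTF-16 code units, so walk them two bytes at a time.
    units = text.encode("utf-16-be")
    out = []
    for i in range(0, len(units), 2):
        code = (units[i] << 8) | units[i + 1]
        ch = chr(code)
        if ch in UNESCAPED:
            out.append(ch)                # left as-is
        elif code < 256:
            out.append(f"%{code:02X}")    # one-byte form, matches ISO-8859-1
        else:
            out.append(f"%u{code:04X}")   # non-standard %uXXXX form
    return "".join(out)


print(legacy_escape("Bärbel"))
print(legacy_escape("ż"))
```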
Comment 10•21 years ago
Is comment 5 correct? Bookmarking the URL http://www.google.com/search?q=%s&ie=UTF-8&oe=UTF-8 and searching for ™, é, ü, etc. works fine. Wouldn't it just be satisfactory to assume UTF-8 encoding?
Comment 11•21 years ago
According to the Google API, UTF-8 is enabled by default. Besides that, the user interface would be rather bad, because people would have to write the HTML encoding into their search strings. For instance when I have to search for "Bärbel" I would have to use Bärbel for the search string. Is this correct or did I get something wrong? My version using the JavaScript seems to solve this better.
Comment 12•21 years ago
Sorry for the spam... apparently bugzilla doesn't display HTML entities. Searching for ™, é, ü, î, ñ, etc. works fine.
Reporter
Comment 13•21 years ago
Searching for HTML encoded strings doesn't work for me. But: it looks like searching with non-ascii characters (without encoding them) works now when you set Google to use UTF-8! Using Camino build 2003041105.
Comment 14•21 years ago
Keyword-based searches now use UTF-8. Cool. This bug has been fixed (either intentionally or unintentionally).
Comment 15•21 years ago
This still applies when using Camino's "search the web" function. Specifically, searching for "camino search bug" generates:
http://www.google.com/search?q=camino search bug
which is then translated to:
http://www.google.com/search?q=camino%20search%20bug
It looks like the translation is being done server-side by Google; it should be done client-side, no? (Should I spin this into a separate bug, or is this the right place for it?)
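As a rough illustration of what client-side escaping would produce for this query, here is a minimal Python sketch. `search_url` and `BASE` are names made up for the example; the base URL is the one quoted above.

```python
from urllib.parse import quote, quote_plus

BASE = "http://www.google.com/search?q="


def search_url(term: str) -> str:
    """Build the search URL with the term escaped before it is sent."""
    return BASE + quote(term, safe="")


print(search_url("camino search bug"))
# Form-style encoding uses "+" for spaces instead of "%20".
print(BASE + quote_plus("camino search bug"))
```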
Comment 16•21 years ago
It looks like bookmark keywords are screwed up again: using http://www.google.com/search?q=%s to search for any word containing Polish characters (say żąść) causes the searched word to be mangled into ???. It doesn't work in Mozilla or Firebird under Linux and Windows (I do not have Camino to test). I can get this to work when explicitly using windows-1250 (Windows) or iso-8859-2 (Linux) as the input charset, i.e. http://www.google.com/search?q=%s&ie=windows-1250&oe=UTF-8 (on Windows) gives correct results. It looks like bookmark searches are not using UTF-8 anymore.
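The platform dependence reported here can be illustrated with a small Python sketch. The sample string "żąść" and the helper name `encode_lossy` are chosen for illustration only; the sketch assumes the browser encodes the term in the platform charset and substitutes "?" for anything that charset cannot represent.

```python
from urllib.parse import quote


def encode_lossy(term: str, charset: str) -> str:
    """Percent-encode `term` in `charset`; unencodable characters become '?'."""
    return quote(term, safe="", encoding=charset, errors="replace")


sample = "żąść"  # illustrative Polish letters
for charset in ("iso-8859-2", "windows-1250", "iso-8859-1", "utf-8"):
    print(f"{charset:13} {encode_lossy(sample, charset)}")
```

The two Central European charsets disagree on some bytes, so the `ie=` parameter has to match the charset the browser actually used. ISO-8859-1 cannot represent these letters at all and degrades to question marks (%3F).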
Comment 17•20 years ago
Is this bug also used to track problems with typing special characters into the URL, or should that be a different bug? For example, typing plus characters does not work. The originally reported problem still seems to be broken in Firefox too. I tried searching for Cyrillic characters on Google, and ended up with just question marks... Should the product classification be changed too?
Comment 18•20 years ago
I feel that bug 123006 is somewhat related to this one, but I'm not too familiar with dependency tree management, and I'm not sure also whether this bug is an entire duplicate of the above mentioned. After all, I never hacked Camino code. So I'll leave it to bug reporter or someone else who has enough rights to manage this bug. Follow the above link and make a wise decision.
Comment 19•20 years ago
Bug 264406 is about similar problems in Mozilla; reading the comments here, I wonder if there's actually any difference between the behaviour of Mozilla and Camino. Bug 123006 is about encoding "special" ASCII characters; I think this bug should stay dedicated to the problem of i18n (non-US-ASCII) characters. For anyone who tries to implement a solution: one method to always get the charset right when constructing the request URL would be to use the LAST_CHARSET info of the bookmark entry to select the charset. The patch to bug 123006 that was checked in to FF resulted in FF always using UTF-8, and this might be considered sufficient. It conforms to current standards about URLs (there's an RFC that says they ought to be always encoded in UTF-8).
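A minimal Python sketch of the LAST_CHARSET idea, assuming the bookmark record exposes its last known charset as a string. The function name `expand_keyword_url` and the `last_charset` parameter are hypothetical and do not correspond to actual Camino or Mozilla code. The sketch falls back to UTF-8 when no charset is recorded, when the charset name is unknown, or when the term cannot be represented in it.

```python
from typing import Optional
from urllib.parse import quote


def expand_keyword_url(template: str, term: str,
                       last_charset: Optional[str] = None) -> str:
    """Substitute `term` for %s in `template`, encoded in the bookmark's charset.

    `last_charset` stands in for the bookmark's LAST_CHARSET value
    (a hypothetical parameter for this sketch).
    """
    charset = last_charset or "utf-8"
    try:
        encoded = quote(term, safe="", encoding=charset, errors="strict")
    except (UnicodeEncodeError, LookupError):
        # Unencodable term or unknown charset name: fall back to UTF-8.
        encoded = quote(term, safe="", encoding="utf-8")
    return template.replace("%s", encoded)


url = "http://www.google.com/search?q=%s"
print(expand_keyword_url(url, "mañana"))
print(expand_keyword_url(url, "mañana", "iso-8859-1"))
print(expand_keyword_url(url, "żąść", "iso-8859-1"))  # falls back to UTF-8
```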
Comment 20•20 years ago
See also bug 258223 where I made a patch with a different approach.
Keywords: intl
Assignee
Updated•19 years ago
Priority: -- → P3
Target Milestone: --- → Camino1.0
Comment 21•19 years ago
Wow, so... this WORKSFORME. The steps in comment 0 work perfectly for me. In addition, the "expected results" are now incorrect. Did something change?
Comment 22•19 years ago
Oh, I should give the URIs which Google now creates. For ääliö: http://www.google.com/search?q=%C3%A4%C3%A4li%C3%B6 For mañana: http://www.google.com/search?q=ma%C3%B1ana
This WFM for me, too. I just deleted all of my Google cookies and tried with Arabic for a fancier test. Reading the comments, it seems like this is one of those bugs that breaks and gets fixed periodically on the trunk (comment 14, comment 16); perhaps bug 123006 fixed it most recently?
Assignee
Comment 24•19 years ago
We don't use the code from bug 123006, but if this is WFM, then great!
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → WORKSFORME
Summary: Keywords not correctly encoded → Bookmark Keywords not correctly encoded