Bookmark Keywords not correctly encoded

RESOLVED WORKSFORME

Status

Camino Graveyard
Bookmarks
P3
normal
RESOLVED WORKSFORME
16 years ago
11 years ago

People

(Reporter: Visa Kopu, Assigned: Simon Fraser)

Tracking

(Blocks: 1 bug, {intl})

unspecified
Camino1.0
PowerPC
Mac OS X

(Reporter)

Description

16 years ago
User-Agent:       Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.1) Gecko/20021120 Chimera/0.6+
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.1) Gecko/20021120 Chimera/0.6+

The keywords are not correctly encoded when using the new bookmark keyword
functionality and a word with non-ascii characters.

Reproducible: Always

Steps to Reproduce:
1. Create a Google keyword bookmark with the URL
http://www.google.com/search?q=%s and a keyword of your choice, I use "g".
2. Write "g ääliö" or "g mañana" or some other keyword with non-ascii characters.
3. Press enter and an incorrect list of results will be shown.
Actual Results:  
Using the Finnish word "ääliö", keyword bookmark took me to
http://www.google.com/search?q=%8A%8Ali%9A

With the word "mañana", Chimera opened the page
http://www.google.com/search?q=ma%96ana

Expected Results:  
The pages should've been:

"ääliö" http://www.google.com/search?q=%E4%E4li%F6
"mañana" http://www.google.com/search?q=ma%F1ana
(Assignee)

Comment 1

16 years ago
That's odd; I thought it was unicode all the way. I'll have a look.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
(Assignee)

Comment 2

16 years ago
This is actually hard to fix. I'm going to have to convert the search term into
the character set of the page before dispatching it, but I don't know what the
page charset is at this point. Maybe converting into ISO-8859-1 is OK?
(Assignee)

Comment 3

16 years ago
Mozilla has the same problem, FWIW.
(Assignee)

Comment 4

16 years ago
Omniweb has problems with this too.

If the search engine had a UTF-8 version of their page, we should be OK, but I
could not find one for Google.
(Reporter)

Comment 5

16 years ago
...and IE has this problem with their "search from the location bar" functionality.

You should be able to switch Google to UTF-8 with
http://www.google.com/search?q=%s&ie=UTF-8&oe=UTF-8 but non-ascii characters
still don't work.
Specifically, the problem is that the hexadecimal numbers in the query string
encode MacRoman bytes rather than UTF-8 bytes.
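
One way to check the MacRoman claim is to decode the observed query string under each candidate charset and see which one gives back the typed word. This is only a sketch using Python's standard library; the variable names are made up for illustration and nothing here reflects Camino's actual code.

```python
from urllib.parse import unquote

# Query value Chimera produced for the Finnish word in the description.
observed = "%8A%8Ali%9A"

# Interpreting the percent-escaped bytes as MacRoman recovers the word.
as_mac_roman = unquote(observed, encoding="mac_roman")

# The same bytes are not valid UTF-8, so decoding yields replacement characters.
as_utf8 = unquote(observed, encoding="utf-8", errors="replace")

print(as_mac_roman)
print(as_utf8)
```

If the MacRoman decoding returns the original word and the UTF-8 decoding does not, the bytes in the URL were MacRoman.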
(Assignee)

Comment 7

16 years ago
Even if they were UTF-8 bytes, we still can't guarantee that the search page
uses UTF-8 in their search URLs.
Some sites still want something like ISO-8859-1, but it is inadequate for many
languages, so using it as the only possible encoding would be bad. If only one
encoding had to be chosen, choosing UTF-8 would make sense since it can encode
all of Unicode and Google supports it with ie=UTF-8.

I think having a per-bookmark encoding setting would be ugly in terms of UI.

Comment 9

16 years ago
It seems the JavaScript function "escape" does the correct conversion. Using
the following Location for a Google keyword search does the trick for German
umlauts (can someone else try with more "foreign" characters?).

javascript:location.href='http://www.google.de/search?client=googlet&q='+escape('%s')

It would be nice to add that to the keyword search functionality by piping %s
through escape. The problem here is: what happens when JavaScript is off?
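
To see why escape() handles German umlauts but may not generalise, here is a rough Python emulation of the legacy JavaScript escape() rules as described in ECMAScript Annex B. The function name js_escape is hypothetical and this is an approximation for illustration, not browser code.

```python
# Characters that the legacy escape() function leaves untouched.
UNESCAPED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789@*_+-./"
)


def js_escape(text: str) -> str:
    """Approximate JavaScript's escape(), working on UTF-16 code units."""
    data = text.encode("utf-16-be")
    out = []
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        ch = chr(unit)
        if ch in UNESCAPED:
            out.append(ch)
        elif unit < 256:
            # Code units below 256 become %XX, which matches ISO-8859-1 bytes.
            out.append(f"%{unit:02X}")
        else:
            # Everything else becomes %uXXXX, which is not standard
            # URL percent-encoding.
            out.append(f"%u{unit:04X}")
    return "".join(out)


print(js_escape("Bärbel"))
print(js_escape("ż"))
```

Characters up to U+00FF come out as single ISO-8859-1 style escapes, which is why umlauts work. Anything above that range is written in the %uXXXX form, which a server may not decode.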

Comment 10

16 years ago
Is comment 5 correct?  Bookmarking the URL
http://www.google.com/search?q=%s&ie=UTF-8&oe=UTF-8 and searching for ™,
é, ü, etc. works fine. 

 Wouldn't it just be satisfactory to assume UTF-8 encoding?

Comment 11

16 years ago
According to the Google API UTF-8 is enabled by default. Besides that the user
interface would be rather bad, because people have to write the HTML-Encoding
into their search strings. For instance, to search for "Bärbel" I
would have to use "B&auml;rbel" as the search string. Is this correct or did I
get something wrong? My version using the javascript seems to solve this better.

Comment 12

16 years ago
Sorry for the spam... apparently bugzilla doesn't display HTML entities.
Searching for ™, é, ü, î, ñ, etc. works fine.
(Reporter)

Comment 13

16 years ago
Searching for HTML encoded strings doesn't work for me. But: it looks like
searching with non-ascii characters (without encoding them) works now when you
set Google to use UTF-8! Using Camino build 2003041105.
Keyword-based searches now use UTF-8. Cool.

This bug has been fixed (either intentionally or unintentionally).

Comment 15

15 years ago
this still applies when using camino's "search the web" function. specifically,
searching for "camino search bug" generates:

http://www.google.com/search?q=camino search bug

which is then translated to:

http://www.google.com/search?q=camino%20search%20bug

it looks like the translation is being done server-side by google; it should be
done client-side, no? (should i spin this into a separate bug, or is this the
right place for it?)
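
For reference, client-side escaping of a search term with spaces or other reserved ASCII characters could look like the sketch below. The helper name encode_query_term is hypothetical, and this is not how Camino's "search the web" feature is known to work.

```python
from urllib.parse import quote


def encode_query_term(term: str) -> str:
    """Percent-encode a search term so it is safe inside a query string."""
    # safe="" also escapes "/", "&" and "+", which have meaning in URLs.
    return quote(term, safe="")


url = "http://www.google.com/search?q=" + encode_query_term("camino search bug")
print(url)
```

Doing this before dispatching the request means the URL is already valid when it leaves the browser, rather than relying on the server to tolerate raw spaces.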

Comment 16

15 years ago
It looks like bookmark keywords are screwed up again: 
using http://www.google.com/search?q=%s to search for any word containing Polish
characters (say żąść) causes the searched word to be mangled into ??? 

It doesn't work in Mozilla or Firebird under Linux and Windows (I do not have
Camino to test). I can get this to work when explicitly using windows-1250
(Windows) or iso-8859-2 (Linux) as the input charset:
ie: http://www.google.com/search?q=%s&ie=windows-1250&oe=UTF-8 (on Windows)
gives correct results. 
It looks like bookmark searches are not using UTF-8 anymore.
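
One plausible way to end up with question marks is a lossy conversion to a charset that has no Polish letters. The sketch below shows that effect with Python's standard library. The sample letters are chosen for illustration, and the thread does not confirm that this is what the browsers actually did.

```python
from urllib.parse import quote

term = "żąść"  # sample Polish letters, chosen for illustration

# ISO-8859-1 has none of these letters, so a lossy conversion substitutes "?".
latin1 = quote(term, safe="", encoding="iso-8859-1", errors="replace")

# ISO-8859-2 covers Polish, so each letter becomes one escaped byte.
latin2 = quote(term, safe="", encoding="iso-8859-2")

# UTF-8 can represent every character, at two bytes per letter here.
utf8 = quote(term, safe="", encoding="utf-8")

print(latin1)
print(latin2)
print(utf8)
```

The ISO-8859-2 form only gives correct results if the server is told to expect that charset, which is what the ie= parameter above does.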

Comment 17

14 years ago
Is this bug also used to track problems with typing special characters into the
URL, or should it be a different bug? For example, typing plus characters does
not work.

The originally reported problem still seems to be broken in Firefox too. I tried
searching for Cyrillic characters on Google, and ended up with just question
marks...

Should the product classification be changed too?
I feel that bug 123006 is somewhat related to this one, but I'm not too familiar
with dependency tree management, and I'm also not sure whether this bug is a
complete duplicate of that one. After all, I never hacked Camino code.

So I'll leave it to bug reporter or someone else who has enough rights to manage
this bug. Follow the above link and make a wise decision.

Updated

14 years ago
Depends on: 123006

Comment 19

14 years ago
Bug 264406 is about similar problems in Mozilla. Reading the comments here, I
wonder whether there is actually any difference between the behaviour of
Mozilla and Camino.

Bug 123006 is about encoding "special" ASCII characters; I think this bug should
stay dedicated to the problem of i18n (non-US-ASCII) characters.

For anyone who tries to implement a solution: one method that would always get
the charset right when constructing the request URL is to use the LAST_CHARSET
info of the bookmark entry to select the charset.

The patch for bug 123006 that was checked in to FF resulted in FF always using
UTF-8, and this might be considered sufficient. It conforms to current standards
about URLs (there's an RFC that says they ought to always be encoded in UTF-8).
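
The two approaches in this comment (use the bookmark's recorded charset, otherwise assume UTF-8) can be combined. The following is a minimal sketch of that idea only. build_keyword_url and its last_charset parameter are hypothetical names, and it does not reproduce the Mozilla, Firefox or Camino implementation.

```python
from typing import Optional
from urllib.parse import quote


def build_keyword_url(template: str, term: str,
                      last_charset: Optional[str] = None) -> str:
    """Substitute %s in a keyword bookmark with the percent-encoded term.

    last_charset stands in for the bookmark's recorded charset; when it is
    missing, unknown, or cannot represent the term, UTF-8 is used instead.
    """
    charset = last_charset or "utf-8"
    try:
        encoded = quote(term, safe="", encoding=charset)
    except (UnicodeEncodeError, LookupError):
        encoded = quote(term, safe="", encoding="utf-8")
    return template.replace("%s", encoded, 1)


template = "http://www.google.com/search?q=%s"
print(build_keyword_url(template, "mañana"))
print(build_keyword_url(template, "mañana", "iso-8859-1"))
```

Falling back to UTF-8 rather than substituting "?" avoids silently mangling the term, at the cost of sending bytes the site may not expect.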

Comment 20

14 years ago
See also bug 258223 where I made a patch with a different approach. 
Keywords: intl
(Assignee)

Updated

13 years ago
Priority: -- → P3
Target Milestone: --- → Camino1.0
(Assignee)

Updated

13 years ago
Blocks: 301740
Wow, so... this WORKSFORME.

The steps in comment 0 work perfectly for me. In addition, the "expected
results" are now incorrect.

Did something change?
Oh, I should give the URIs which Google now creates.

For ääliö: http://www.google.com/search?q=%C3%A4%C3%A4li%C3%B6

For mañana: http://www.google.com/search?q=ma%C3%B1ana
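
The three sets of URLs quoted in this report (the original actual results, the original expected results, and the ones above) can be reproduced by percent-encoding the same words under three charsets. This is an illustrative sketch using Python's standard library; percent_encode is a hypothetical helper name.

```python
from urllib.parse import quote


def percent_encode(term: str, charset: str) -> str:
    """Convert term to bytes in the given charset, then percent-encode them."""
    return quote(term, safe="", encoding=charset)


for word in ("ääliö", "mañana"):
    for charset in ("mac_roman", "iso-8859-1", "utf-8"):
        print(f"{word:8} {charset:11} {percent_encode(word, charset)}")
```

MacRoman corresponds to the originally reported output, ISO-8859-1 to the originally expected output, and UTF-8 to the URIs listed above.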
This WFM for me, too.  I just deleted all of my Google cookies and tried with
Arabic for a fancier test.

Reading the comments, it seems like this is one of those bugs that breaks and
gets fixed periodically on the trunk (comment 14, comment 16); perhaps bug
123006 fixed it most recently?
(Assignee)

Comment 24

13 years ago
We don't use the code from bug 123006, but if this is WFM, then great!
Status: ASSIGNED → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → WORKSFORME

Updated

11 years ago
Summary: Keywords not correctly encoded → Bookmark Keywords not correctly encoded