181243 - Bookmark Keywords not correctly encoded

Reporter

Description

22 years ago

User-Agent:       Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.1) Gecko/20021120 Chimera/0.6+
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.1) Gecko/20021120 Chimera/0.6+

The keywords are not correctly encoded when using the new bookmark keyword
functionality with a word that contains non-ASCII characters.

Reproducible: Always

Steps to Reproduce:
1. Create a Google keyword bookmark with the URL
http://www.google.com/search?q=%s and a keyword of your choice (I use "g").
2. Write "g ääliö" or "g mañana" or some other search term with non-ASCII characters.
3. Press Enter; an incorrect list of results will be shown.

Actual Results:
Using the Finnish word "ääliö", keyword bookmark took me to
http://www.google.com/search?q=%8A%8Ali%9A

With the word "mañana", Chimera opened the page
http://www.google.com/search?q=ma%96ana

Expected Results:  
The pages should've been:

"ääliö" http://www.google.com/search?q=%E4%E4li%F6
"mañana" http://www.google.com/search?q=ma%F1ana

Simon Fraser [no longer active]

Assignee

Comment 1

22 years ago

That's odd; I thought it was Unicode all the way. I'll have a look.

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Simon Fraser [no longer active]

Assignee

Comment 2

22 years ago

This is actually hard to fix. I'm going to have to convert the search term into
the character set of the page before dispatching it, but I don't know what the
page charset is at this point. Maybe converting into ISO-8859-1 is OK?

Simon Fraser [no longer active]

Assignee

Comment 3

22 years ago

Mozilla has the same problem, FWIW.

Simon Fraser [no longer active]

Assignee

Comment 4

22 years ago

Omniweb has problems with this too.

If the search engine had a UTF-8 version of their page, we should be OK, but I
could not find one for Google.

Visa Kopu

Reporter

Comment 5

22 years ago

...and IE has this problem with their "search from the location bar" functionality.

You should be able to switch Google to UTF-8 with
http://www.google.com/search?q=%s&ie=UTF-8&oe=UTF-8 but non-ASCII characters
still don't work.

Henri Sivonen (:hsivonen)

Comment 6

22 years ago

Specifically, the problem is that the hexadecimal numbers in the query string
encode MacRoman bytes rather than UTF-8 bytes.
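
To make the mismatch concrete, here is a small Python sketch (not Camino's code) that percent-encodes the two test words from the description under three charsets. `urllib.parse.quote` only stands in for whatever the browser does internally, and `encode_query` is a name made up for this illustration.

```python
from urllib.parse import quote

# Test words taken from the original report.
WORDS = ["ääliö", "mañana"]


def encode_query(term: str, charset: str) -> str:
    """Convert `term` to bytes in `charset`, then percent-encode those bytes."""
    return quote(term, safe="", encoding=charset)


for word in WORDS:
    for charset in ("mac_roman", "iso-8859-1", "utf-8"):
        print(f"{word:8} {charset:11} {encode_query(word, charset)}")
```

The MacRoman rows reproduce the "Actual Results" URLs, the ISO-8859-1 rows reproduce the "Expected Results", and the UTF-8 rows are what a UTF-8-aware search page would want.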

Simon Fraser [no longer active]

Assignee

Comment 7

22 years ago

Even if they were UTF-8 bytes, we still can't guarantee that the search page
uses UTF-8 in their search URLs.

Henri Sivonen (:hsivonen)

Comment 8

22 years ago

Some sites still want something like ISO-8859-1, but it is inadequate for many
languages, so using it as the only possible encoding would be bad. If only one
encoding had to be chosen, choosing UTF-8 would make sense since it can encode
all of Unicode and Google supports it with ie=UTF-8.

I think having a per-bookmark encoding setting would be ugly in terms of UI.

Martin Girschick

Comment 9

21 years ago

It seems the JavaScript function "escape" does the correct conversion. Using
the following Location for a Google keyword search does the trick for German
umlauts (can someone else try with more "foreign" characters?).

javascript:location.href='http://www.google.de/search?client=googlet&q='+escape('%s')

It would be nice to add that to the keyword search functionality by piping %s
through escape. The problem here is: what happens when JavaScript is off?
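
For anyone wondering why escape() happens to work for umlauts, here is a rough Python emulation of the legacy JavaScript escape() function, written from memory of its ECMAScript definition. `js_escape` is a hypothetical name, not anything in the browser. Code points below 0x100 come out as %XX, which coincides with ISO-8859-1 percent-encoding; anything above that comes out as %uXXXX, which is not standard URL encoding.

```python
# Characters that JavaScript's legacy escape() leaves untouched.
_UNESCAPED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789@*_+-./"
)


def js_escape(text: str) -> str:
    """Approximate JavaScript escape(): %XX below 0x100, %uXXXX otherwise."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in _UNESCAPED:
            out.append(ch)
        elif code < 0x100:
            out.append(f"%{code:02X}")
        elif code < 0x10000:
            out.append(f"%u{code:04X}")
        else:
            # escape() operates on UTF-16 code units, so split into a surrogate pair.
            code -= 0x10000
            high = 0xD800 + (code >> 10)
            low = 0xDC00 + (code & 0x3FF)
            out.append(f"%u{high:04X}%u{low:04X}")
    return "".join(out)


print(js_escape("Bärbel"))  # umlaut falls in the Latin-1 range
print(js_escape("™"))       # outside Latin-1, gets the %u form
```

So the trick is effectively "assume ISO-8859-1", with a non-standard fallback for everything else.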

Prachi Gauriar

Comment 10

21 years ago

Is comment 5 correct?  Bookmarking the URL
http://www.google.com/search?q=%s&ie=UTF-8&oe=UTF-8 and searching for &trade;,
&eacute;, &uuml;, etc. works fine. 

Wouldn't it just be satisfactory to assume UTF-8 encoding?

Martin Girschick

Comment 11

21 years ago

According to the Google API, UTF-8 is enabled by default. Besides that, the user
interface would be rather bad, because people would have to write the HTML
encoding into their search strings. For instance, to search for "Bärbel" I
would have to use B&auml;rbel as the search string. Is this correct or did I
get something wrong? My version using JavaScript seems to solve this better.

Prachi Gauriar

Comment 12

21 years ago

Sorry for the spam... apparently Bugzilla doesn't display HTML entities.
Searching for ™, é, ü, î, ñ, etc. works fine.

Visa Kopu

Reporter

Comment 13

21 years ago

Searching for HTML-encoded strings doesn't work for me. But: it looks like
searching with non-ASCII characters (without encoding them) works now when you
set Google to use UTF-8! Using Camino build 2003041105.

Henri Sivonen (:hsivonen)

Comment 14

21 years ago

Keyword-based searches now use UTF-8. Cool.

This bug has been fixed (either intentionally or unintentionally).

louis bennett

Comment 15

21 years ago

this still applies when using camino's "search the web" function. specifically,
searching for "camino search bug" generates:

http://www.google.com/search?q=camino search bug

which is then translated to:

http://www.google.com/search?q=camino%20search%20bug

it looks like the translation is being done server-side by google; it should be
done client-side, no? (should i spin this into a separate bug, or is this the
right place for it?)
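
For reference, a minimal sketch of what client-side escaping of that query could look like, using Python's `urllib.parse` purely as an illustration (this is not Camino's code, and the variable names are made up):

```python
from urllib.parse import quote, quote_plus

BASE = "http://www.google.com/search?q="
term = "camino search bug"

# Percent-encode the space as %20, matching the translated URL above.
url_pct = BASE + quote(term, safe="")

# Form-style encoding, where a space becomes "+", is the other common choice.
url_plus = BASE + quote_plus(term)

print(url_pct)
print(url_plus)
```

Either form avoids sending a raw space in the URL.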

Peter Bartecki

Comment 16

21 years ago

It looks like bookmark keywords are screwed up again:
using http://www.google.com/search?q=%s to search for any word containing Polish
characters (say żąść) causes the search term to be mangled into ???.

It doesn't work in Mozilla or Firebird under Linux and Windows (I do not have
Camino to test). I can get this to work by explicitly using windows-1250
(Windows) or iso-8859-2 (Linux) as the input charset;
e.g. http://www.google.com/search?q=%s&ie=windows-1250&oe=UTF-8 (on Windows)
gives correct results.
It looks like bookmark searches are not using UTF-8 anymore.
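
One plausible way to end up with question marks is converting the term to a charset that cannot represent it. The Python sketch below is only an illustration of that failure mode, not a diagnosis of the actual Mozilla code; the sample letters "żąść" and the helper name `encode_lossy` are assumptions made for the example.

```python
from urllib.parse import quote

TERM = "żąść"  # arbitrary sample of Polish letters (an assumption)


def encode_lossy(term: str, charset: str) -> str:
    """Percent-encode `term`, replacing unencodable characters with '?'."""
    return quote(term, safe="", encoding=charset, errors="replace")


for charset in ("iso-8859-1", "iso-8859-2", "windows-1250", "utf-8"):
    print(f"{charset:13} {encode_lossy(TERM, charset)}")
```

ISO-8859-1 has none of these letters, so every one of them turns into an encoded "?" (%3F). The Central European charsets and UTF-8 all keep the letters, but with different bytes, which is why the ie= parameter has to match what the browser actually sent.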

denis

Comment 17

20 years ago

Is this bug also used to track problems with typing special characters into the
URL, or should it be a different bug? For example, typing plus characters does
not work.

The originally reported problem still seems to be broken in Firefox too. I tried
searching for Cyrillic characters on Google, and ended up with just question
marks...

Should the product classification be changed too?

Sergey Sokoloff

Comment 18

20 years ago

I feel that bug 123006 is somewhat related to this one, but I'm not too familiar
with dependency tree management, and I'm also not sure whether this bug is a
complete duplicate of that one. After all, I have never hacked on Camino code.

So I'll leave it to the bug reporter or someone else who has enough rights to manage
this bug. Follow the above link and make a wise decision.

louis bennett

Updated

20 years ago

Depends on: 123006

Jean-Marc Desperrier

Comment 19

20 years ago

Bug 264406 is about similar problems in Mozilla. Reading the comments here, I
wonder whether there's actually any difference between the behaviour of
Mozilla and Camino.

Bug 123006 is about encoding "special" ASCII characters. I think this bug should
stay dedicated to the problem of i18n (non-US-ASCII) characters.

For anyone who tries to implement a solution: one way to always get the charset
right when constructing the request URL would be to use the LAST_CHARSET info of
the bookmark entry to select the charset.

The patch for bug 123006 that was checked in to FF resulted in FF always using
UTF-8, and this might be considered sufficient. It conforms to current standards
for URLs (there's an RFC that says they ought to always be encoded in UTF-8).
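
As a rough sketch of the LAST_CHARSET idea, under the assumption that the bookmark entry exposes its last known charset as a plain string: the function name `build_keyword_url` and its parameters are hypothetical and do not correspond to any actual Mozilla or Camino API.

```python
from typing import Optional
from urllib.parse import quote


def build_keyword_url(template: str, term: str,
                      last_charset: Optional[str] = None) -> str:
    """Substitute `term` for %s, encoded in the bookmark's charset if known."""
    charset = last_charset or "utf-8"  # no charset recorded: assume UTF-8
    try:
        encoded = quote(term, safe="", encoding=charset, errors="strict")
    except (UnicodeEncodeError, LookupError):
        # Term not representable in that charset, or charset name unknown.
        encoded = quote(term, safe="", encoding="utf-8")
    return template.replace("%s", encoded)


print(build_keyword_url("http://www.google.com/search?q=%s", "mañana", "iso-8859-1"))
print(build_keyword_url("http://www.google.com/search?q=%s", "mañana"))
```

Falling back to UTF-8 when the term does not fit the recorded charset is a design choice of this sketch; the site may still misread the bytes, but at least nothing is silently turned into question marks.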

Jungshik Shin

Comment 20

20 years ago

See also bug 258223 where I made a patch with a different approach.

Keywords: intl

Simon Fraser [no longer active]

Assignee

Updated

19 years ago

Priority: -- → P3

Target Milestone: --- → Camino1.0

Simon Fraser [no longer active]

Assignee

Updated

19 years ago

Blocks: 301740

Samuel Sidler (old account; do not CC)

Comment 21

19 years ago

Wow, so... this WORKSFORME.

The steps in comment 0 work perfectly for me. In addition, the "expected
results" are now incorrect.

Did something change?

Samuel Sidler (old account; do not CC)

Comment 22

19 years ago

Oh, I should give the URIs which Google now creates.

For ääliö: http://www.google.com/search?q=%C3%A4%C3%A4li%C3%B6

For mañana: http://www.google.com/search?q=ma%C3%B1ana
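
A quick way to check which charset a given URI uses is to decode it back. The Python sketch below does that with `urllib.parse.unquote`; `decode_query` is just a helper name for this example.

```python
from urllib.parse import unquote


def decode_query(value: str, charset: str = "utf-8") -> str:
    """Percent-decode `value`, interpreting the bytes in `charset`."""
    return unquote(value, encoding=charset, errors="strict")


# The URIs above decode cleanly as UTF-8 ...
print(decode_query("%C3%A4%C3%A4li%C3%B6"))
print(decode_query("ma%C3%B1ana"))

# ... while the "expected results" from the description are ISO-8859-1.
print(decode_query("%E4%E4li%F6", "iso-8859-1"))
```

This is why the "expected results" in the description no longer match: they were written for ISO-8859-1, while these URIs are UTF-8.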

Smokey Ardisson (offline for a while; not following bugs - do not email)

Comment 23

19 years ago

This WFM, too.  I just deleted all of my Google cookies and tried with
Arabic for a fancier test.

Reading the comments, it seems like this is one of those bugs that breaks and
gets fixed periodically on the trunk (comment 14, comment 16); perhaps bug
123006 fixed it most recently?

Simon Fraser [no longer active]

Assignee

Comment 24

19 years ago

We don't use the code from bug 123006, but if this is WFM, then great!

Status: ASSIGNED → RESOLVED

Closed: 19 years ago

Resolution: --- → WORKSFORME

benc

Updated

17 years ago

Summary: Keywords not correctly encoded → Bookmark Keywords not correctly encoded