Closed Bug 181243 Opened 19 years ago Closed 17 years ago

Bookmark Keywords not correctly encoded

Categories

(Camino Graveyard :: Bookmarks, defect, P3)

PowerPC
macOS
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME
Camino1.0

People

(Reporter: visa, Assigned: sfraser_bugs)

References

(Blocks 1 open bug)

Details

(Keywords: intl)

User-Agent:       Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.1) Gecko/20021120 Chimera/0.6+
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.1) Gecko/20021120 Chimera/0.6+

The keywords are not correctly encoded when using the new bookmark keyword
functionality and a word with non-ascii characters.

Reproducible: Always

Steps to Reproduce:
1. Create a Google keyword bookmark with the URL
http://www.google.com/search?q=%s and a keyword of your choice (I use "g").
2. Write "g ääliö" or "g mañana" or some other keyword with non-ascii characters.
3. Press enter and an incorrect list of results will be shown.
Actual Results:  
Using the Finnish word "ääliö", keyword bookmark took me to
http://www.google.com/search?q=%8A%8Ali%9A

With the word "mañana", Chimera opened the page
http://www.google.com/search?q=ma%96ana

Expected Results:  
The pages should've been:

"ääliö" http://www.google.com/search?q=%E4%E4li%F6
"mañana" http://www.google.com/search?q=ma%F1ana
That's odd; I thought it was Unicode all the way. I'll have a look.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
This is actually hard to fix. I'm going to have to convert the search term into
the character set of the page before dispatching it, but I don't know what the
page charset is at this point. Maybe converting into ISO-8859-1 is OK?
Mozilla has the same problem, FWIW.
Omniweb has problems with this too.

If the search engine had a UTF-8 version of their page, we should be OK, but I
could not find one for Google.
...and IE has this problem with their "search from the location bar" functionality.

You should be able to switch Google to UTF-8 with
http://www.google.com/search?q=%s&ie=UTF-8&oe=UTF-8 but non-ascii characters
still don't work.
Specifically, the problem is that the hexadecimal numbers in the query string
encode MacRoman bytes rather than UTF-8 bytes.
Even if they were UTF-8 bytes, we still can't guarantee that the search page
uses UTF-8 in their search URLs.
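
To make the byte-level difference concrete, here is a small Python sketch. It is not Camino or Gecko code, and the helper name `keyword_url` is made up for illustration. It substitutes a search term into the bookmark URL from comment 0 after encoding the term in a given charset, assuming Python's standard `mac_roman`, `latin-1` and `utf-8` codecs.

```python
from urllib.parse import quote

TEMPLATE = "http://www.google.com/search?q=%s"

def keyword_url(template: str, term: str, charset: str) -> str:
    """Replace %s in a keyword bookmark URL with the percent-encoded term.

    Hypothetical helper: the term is encoded to bytes in `charset`,
    then every byte outside the unreserved ASCII set becomes %XX.
    """
    return template.replace("%s", quote(term, safe="", encoding=charset))

for word in ("ääliö", "mañana"):
    for charset in ("mac_roman", "latin-1", "utf-8"):
        print(f"{word:7} {charset:10} {keyword_url(TEMPLATE, word, charset)}")
```

With `mac_roman` this reproduces the URLs from the actual results in comment 0 (`%8A%8Ali%9A`, `ma%96ana`), and with `latin-1` it gives the ones listed there as expected.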
Some sites still want something like ISO-8859-1, but it is inadequate for many
languages, so using it as the only possible encoding would be bad. If only one
encoding had to be chosen, choosing UTF-8 would make sense since it can encode
all of Unicode and Google supports it with ie=UTF-8.
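
A quick way to see why ISO-8859-1 alone is inadequate is to try encoding terms from different scripts. The sketch below is only an illustration in Python; `try_encode` is a hypothetical helper and the sample words are arbitrary.

```python
from typing import Optional
from urllib.parse import quote

def try_encode(term: str, charset: str) -> Optional[str]:
    """Percent-encode `term` in `charset`, or return None if it cannot be encoded."""
    try:
        return quote(term, safe="", encoding=charset, errors="strict")
    except UnicodeEncodeError:
        # The charset has no byte sequence for at least one character.
        return None

for charset in ("iso-8859-1", "utf-8"):
    for term in ("mañana", "привет"):
        result = try_encode(term, charset)
        print(charset, term, result if result is not None else "(not representable)")
```

The Cyrillic sample has no ISO-8859-1 representation at all, while UTF-8 can encode both terms.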

I think having a per-bookmark encoding setting would be ugly in terms of UI.
It seems the JavaScript function "escape" does the correct conversion. Using
the following Location for a Google keyword search does the trick for German
umlauts (can someone else try with more "foreign" characters?).

javascript:location.href='http://www.google.de/search?client=googlet&q='+escape('%s')

It would be nice to add that to the keyword search functionality by piping %s
through escape. The problem here is: what happens when JavaScript is off?
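
On the question of more "foreign" characters: as far as I understand the legacy escape() function, it emits %XX for code units below U+0100 and %uXXXX for everything else. The Python approximation below is a sketch of that behaviour for illustration, not the browser's implementation; `js_escape` is a made-up name.

```python
# Characters that the legacy JavaScript escape() leaves untouched.
UNESCAPED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789@*_+-./"
)

def js_escape(text: str) -> str:
    """Approximate JavaScript's escape(): %XX below U+0100, %uXXXX otherwise."""
    out = []
    utf16 = text.encode("utf-16-be")
    # escape() works on UTF-16 code units, so walk the string two bytes at a time.
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        ch = chr(unit)
        if ch in UNESCAPED:
            out.append(ch)
        elif unit < 0x100:
            out.append(f"%{unit:02X}")
        else:
            out.append(f"%u{unit:04X}")
    return "".join(out)

print(js_escape("Bärbel"))
print(js_escape("żółć"))
```

If this is right, the trick works for German umlauts because they fall in the Latin-1 range, but characters outside that range come out in the %uXXXX form, which a search server may not decode.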
Is comment 5 correct?  Bookmarking the URL
http://www.google.com/search?q=%s&ie=UTF-8&oe=UTF-8 and searching for ™,
é, ü, etc. works fine. 

 Wouldn't it just be satisfactory to assume UTF-8 encoding?
According to the Google API UTF-8 is enabled by default. Besides that the user
interface would be rather bad, because people have to write the HTML-Encoding
into their search strings. For instance when I have to search for "Bärbel" I
would have to use "B&auml;rbel" for the search string. Is this correct or did I
get something wrong? My version using the javascript seems to solve this better.
Sorry for the spam... apparently bugzilla doesn't display HTML entities.
Searching for ™, é, ü, î, ñ, etc. works fine.
Searching for HTML encoded strings doesn't work for me. But: it looks like
searching with non-ascii characters (without encoding them) works now when you
set Google to use UTF-8! Using Camino build 2003041105.
Keyword-based searches now use UTF-8. Cool.

This bug has been fixed (either intentionally or unintentionally).
this still applies when using camino's "search the web" function. specifically,
searching for "camino search bug" generates:

http://www.google.com/search?q=camino search bug

which is then translated to:

http://www.google.com/search?q=camino%20search%20bug

it looks like the translation is being done server-side by google; it should be
done client-side, no? (should i spin this into a separate bug, or is this the
right place for it?)
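
For reference, a client-side escape of the spaces could look like the Python sketch below. This only illustrates the two common conventions and makes no claim about what Camino does internally.

```python
from urllib.parse import quote, quote_plus

query = "camino search bug"
base = "http://www.google.com/search?q="

# Generic URL escaping: each space becomes %20.
print(base + quote(query, safe=""))

# Form-style escaping: each space becomes +.
print(base + quote_plus(query))
```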
It looks like bookmarks keywords are screwed up again: 
using http://www.google.com/search?q=%s to search for any word containing Polish
characters (say żąść) causes the searched word to be mangled into ???

It doesn't work in Mozilla or Firebird under Linux and Windows (I do not have
Camino to test). I can get this to work when explicitly using windows-1250
(Windows) or iso-8859-2 (Linux) as the input charset:
ie: http://www.google.com/search?q=%s&ie=windows-1250&oe=UTF-8 (on Windows)
gives correct results. 
It looks like bookmark searches are not using UTF-8 anymore.
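
The ie= workaround above relies on the bytes in the URL matching the declared input charset. A rough Python sketch of that pairing follows; `search_url` is a hypothetical helper, the sample term is arbitrary, and the codec names are Python's, not the browser's.

```python
from urllib.parse import quote

TEMPLATE = "http://www.google.com/search?q=%s&ie={ie}&oe=UTF-8"

def search_url(term: str, python_codec: str, ie_label: str) -> str:
    """Encode `term` with `python_codec` and declare it via the ie= parameter."""
    encoded = quote(term, safe="", encoding=python_codec)
    return TEMPLATE.replace("%s", encoded).replace("{ie}", ie_label)

term = "żąść"  # arbitrary sample of Polish letters
for codec, label in (("cp1250", "windows-1250"),
                     ("iso8859_2", "iso-8859-2"),
                     ("utf-8", "UTF-8")):
    print(search_url(term, codec, label))
```

Each line carries different bytes for the same word, so the ie= value only helps when it names the charset the browser actually used.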
Is this bug also used to track problems with typing special characters into the
URL, or should it be a different bug? For example, typing plus characters does
not work.
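
On the plus characters: in a form-style query string a raw + is read as a space, so a literal plus has to be sent as %2B. A short Python illustration of that convention (not a statement about the Camino code path):

```python
from urllib.parse import parse_qs, quote

# A raw + in the query is decoded as a space by form-style parsers.
print(parse_qs("q=c++"))

# Escaping it as %2B preserves the literal plus sign.
print(parse_qs("q=" + quote("c++", safe="")))
```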

The originally reported problem still seems to be broken in Firefox too. I tried
searching for Cyrillic characters on Google, and ended up with just question
marks...

Should the product classification be changed too?
I feel that bug 123006 is somewhat related to this one, but I'm not too familiar
with dependency tree management, and I'm also not sure whether this bug is a
complete duplicate of it. After all, I've never hacked on Camino code.

So I'll leave it to the bug reporter or someone else who has enough rights to
manage this bug. Follow the above link and make a wise decision.
Depends on: 123006
Bug 264406 is about similar problems in Mozilla. Reading the comments here, I
wonder if there's actually any difference between the behaviour of Mozilla and
Camino.

Bug 123006 is about encoding "special" ASCII characters; I think this bug should
stay dedicated to the problem of i18n (non-US-ASCII) characters.

I would like to point out to anyone who tries to implement a solution that one
method to always get the charset right when constructing the request URL would
be to use the LAST_CHARSET info of the bookmark entry to select the charset.
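
A rough sketch of that idea in Python, purely as an assumption about how it might look: the `Bookmark` class, its `last_charset` field (standing in for LAST_CHARSET) and the UTF-8 fallback are all hypothetical, not taken from the Mozilla or Camino sources.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

@dataclass
class Bookmark:
    url: str                            # keyword URL containing %s
    last_charset: Optional[str] = None  # hypothetical stand-in for LAST_CHARSET

def resolve(bookmark: Bookmark, term: str) -> str:
    """Substitute `term` into the bookmark URL using its remembered charset."""
    charset = bookmark.last_charset or "utf-8"
    try:
        encoded = quote(term, safe="", encoding=charset, errors="strict")
    except (UnicodeEncodeError, LookupError):
        # Term not representable, or charset name unknown to Python: use UTF-8.
        encoded = quote(term, safe="", encoding="utf-8")
    return bookmark.url.replace("%s", encoded)

print(resolve(Bookmark("http://www.google.com/search?q=%s", "iso-8859-1"), "mañana"))
print(resolve(Bookmark("http://www.google.com/search?q=%s"), "mañana"))
```

The fallback is a design choice of this sketch; a real implementation would have to decide what to do when the remembered charset cannot represent the term.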

The patch to bug 123006 that was checked in to FF resulted in FF always using
UTF-8, and this might be considered sufficient. It conforms to current standards
for URLs (there's an RFC that says they ought to always be encoded in UTF-8).
See also bug 258223 where I made a patch with a different approach. 
Keywords: intl
Priority: -- → P3
Target Milestone: --- → Camino1.0
Blocks: 301740
Wow, so... this WORKSFORME.

The steps in comment 0 work perfectly for me. In addition, the "expected
results" are now incorrect.

Did something change?
Oh, I should give the URIs which Google now creates.

For ääliö: http://www.google.com/search?q=%C3%A4%C3%A4li%C3%B6

For mañana: http://www.google.com/search?q=ma%C3%B1ana
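
To check which charset a given query string is in, one can decode the escaped bytes. The Python sketch below (with a made-up helper `decodes_as`) shows that the URIs generated now are valid UTF-8, while the "expected" value from comment 0 is not.

```python
from urllib.parse import unquote_to_bytes

def decodes_as(escaped: str, charset: str) -> bool:
    """Return True if the percent-escaped bytes are valid text in `charset`."""
    try:
        unquote_to_bytes(escaped).decode(charset)
        return True
    except UnicodeDecodeError:
        return False

old_expected = "%E4%E4li%F6"      # "expected" query value from comment 0
current = "%C3%A4%C3%A4li%C3%B6"  # query value generated now

print(decodes_as(old_expected, "utf-8"))
print(decodes_as(current, "utf-8"))
print(unquote_to_bytes(current).decode("utf-8"))
```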
This WFM for me, too.  I just deleted all of my Google cookies and tried with
Arabic for a fancier test.

Reading the comments, it seems like this is one of those bugs that breaks and
gets fixed periodically on the trunk (comment 14, comment 16); perhaps bug
123006 fixed it most recently?
We don't use the code from bug 123006, but if this is WFM, then great!
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → WORKSFORME
Summary: Keywords not correctly encoded → Bookmark Keywords not correctly encoded