Closed
Bug 119825
Opened 23 years ago
Closed 21 years ago
URL (location) bar Search Feature ignores national encoding (google)
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: M.Hankus, Assigned: jshin1987)
References
Details
(Keywords: fixed1.6, intl)
Attachments
(1 file)
patch (1.15 KB) | smontagu: review+ | brendan: superreview+ | asa: approval1.6+
Linux build 2002011108
I use the search feature of the URL bar, and I noticed that the URL bar ignores
the national encoding of the entered text. As an example, I use ISO-8859-2, with
Google as the preferred search engine. When I enter something in the URL bar and
select search, Mozilla queries Google with
http://www.google.com/search?q=%3F%F3%3F%3F&sourceid=mozilla-search
but when I open Google and enter the same text in its form, I get the query string
http://www.google.com/search?q=%BF%F3%B3%E6&hl=pl&btnG=Szukaj+z+Google
so the results are completely different.
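The two query strings above can be compared byte by byte. The following is a minimal Python sketch, not Mozilla code. It assumes, purely as an illustration, that the URL bar converted the text to ISO-8859-1 and replaced every unrepresentable character with `?`; the variable names are made up for this example.

```python
from urllib.parse import quote, unquote_to_bytes

# Query value produced by the Google form (taken from the report above).
form_query = "%BF%F3%B3%E6"

# Recover the text the reporter typed, reading the bytes as ISO-8859-2.
text = unquote_to_bytes(form_query).decode("iso-8859-2")

# Percent-encode the same text as ISO-8859-2 (what the form sent).
as_latin2 = quote(text.encode("iso-8859-2"))

# Assumption: convert to ISO-8859-1, replacing unmappable characters with '?'.
as_latin1 = quote(text.encode("iso-8859-1", errors="replace"))

print(text, as_latin2, as_latin1)
```

Under that assumption the second value reproduces the `%3F%F3%3F%3F` seen in the URL bar query: only the character that also exists in ISO-8859-1 survives.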
Comment 1•23 years ago
|
||
*** Bug 118339 has been marked as a duplicate of this bug. ***
Reporter | ||
Comment 2•23 years ago
|
||
It might be more general, because the Search tab in the Sidebar behaves in the
same way as the URL bar. In the case of ISO-8859-2, all non-ASCII chars are
converted to %3F.
Comment 3•23 years ago
|
||
*** Bug 131126 has been marked as a duplicate of this bug. ***
Comment 4•23 years ago
|
||
It bothers me on Win2k too.
Comment 5•23 years ago
|
||
can someone reproduce this on 1.0RC1 ?
Comment 6•23 years ago
|
||
It disappeared in Win2K(was in 0.9.9)
Reporter | ||
Comment 7•23 years ago
|
||
I can reproduce it in 2002041903 on Windows 98SE. I have not tested RC1.
Reporter | ||
Comment 8•23 years ago
|
||
It is reproducible on the Linux RC1 build, and also in 2002042121 (Linux).
*** Bug 124588 has been marked as a duplicate of this bug. ***
Comment 10•23 years ago
|
||
*** Bug 141393 has been marked as a duplicate of this bug. ***
Comment 11•23 years ago
|
||
*** Bug 141841 has been marked as a duplicate of this bug. ***
Comment 12•23 years ago
|
||
Verified with Hebrew characters and BeOS (1.0 RC1.0 - 2002050509)
Searching using the Google homepage worked fine, giving 16000 results:
http://www.google.com/search?hl=en&q=%26%231496%3B%26%231511%3B%26%231505%3B%26%231496%3B&btnG=Google+Search
The URL search for the same string returned no results:
http://www.google.com/search?q=%3F%3F%3F%3F&sourceid=mozilla-search
Request to change OS from Linux to All
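For readers comparing the two URLs in this comment: the Google form sent the Hebrew text as numeric character references (NCRs), while the URL bar sent only question marks. A small Python sketch that decodes both query values (the variable names are illustrative):

```python
from html import unescape
from urllib.parse import unquote

# 'q' value sent by the Google homepage form (from the URL above).
form_q = "%26%231496%3B%26%231511%3B%26%231505%3B%26%231496%3B"

# First undo the URL escaping, which yields numeric character references.
ncr_text = unquote(form_q)

# Then resolve the NCRs to the actual Hebrew characters.
hebrew = unescape(ncr_text)

# 'q' value sent by the URL bar search: nothing but question marks.
url_bar_q = unquote("%3F%3F%3F%3F")

print(ncr_text, hebrew, url_bar_q)
```

The form query still carries the four original characters, whereas the URL bar query has already lost them before it reaches Google.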
Comment 13•23 years ago
|
||
Can confirm this bug on Win2K in RC3, using Cyrillic.
Searching with the Google form is fine; when searching through the URL bar, all
characters are sent as %3F, which obviously screws up the search.
Comment 14•23 years ago
|
||
*** Bug 143838 has been marked as a duplicate of this bug. ***
Comment 15•23 years ago
|
||
*** Bug 136858 has been marked as a duplicate of this bug. ***
Comment 16•23 years ago
|
||
related: bug 102984, bug 83277
Comment 17•23 years ago
|
||
changing component
Assignee: hewitt → yokoyama
Component: URL Bar → Internationalization
QA Contact: claudius → ruixu
Comment 18•23 years ago
|
||
can we assume this has been confirmed then? :)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 20•23 years ago
|
||
*** Bug 152065 has been marked as a duplicate of this bug. ***
Comment 21•23 years ago
|
||
*** Bug 153487 has been marked as a duplicate of this bug. ***
Comment 22•23 years ago
|
||
The search description file defaults to ISO-8859-1.
http://lxr.mozilla.org/seamonkey/source/xpfe/components/search/datasets/google.src
Adding bobj to cc. He was trying to send UTF-8 for google search.
Reporter | ||
Comment 23•23 years ago
|
||
It looks like it is fixed now (it works for me) build 2002071911 Linux.
Comment 24•23 years ago
|
||
*** Bug 144939 has been marked as a duplicate of this bug. ***
Comment 25•23 years ago
|
||
cc nhotta
Reporter | ||
Comment 27•23 years ago
|
||
I'm not sure if anything has changed but linux build 2002080321 worked fine,
and 2002080508 is not working (I just installed latest build).
Updated•23 years ago
|
Summary: URL bar Search Feature ignores national encoding → URL (location) bar Search Feature ignores national encoding (google)
Comment 28•23 years ago
|
||
*** Bug 155386 has been marked as a duplicate of this bug. ***
Comment 29•22 years ago
|
||
*** Bug 128224 has been marked as a duplicate of this bug. ***
Comment 30•22 years ago
|
||
*** Bug 149029 has been marked as a duplicate of this bug. ***
Comment 31•22 years ago
|
||
So many DUPS here.
The latest one is 155386, which was reported on 07/02/2002.
As Mirek mentioned in comment #27, Mirek tested 2002080321. It works.
I tested the 2002101805 build. It works also.
Mirek: Could you please test on the latest?
Reporter | ||
Comment 32•22 years ago
|
||
for me it is working fine for some time (also 2002121922 linux build)
Comment 33•22 years ago
|
||
Some time?
Not all the time?
Comment 34•22 years ago
|
||
Since many bugs have been merged into this one, I have to describe all my
observations, although I really doubt that all of them are a single bug.
I'm using yesterday's nightly build (English) for Windows, running on English W2K.
1) If you search for "中文" in the sidebar, it returns no results.
2) If you search for "中文" in the address bar, for example with
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=中文&btnG=Google+Search
, it translates "中文" to "%D6%D0%CE%C4" and gets no results back. It should
translate it to "%E4%B8%AD%E6%96%87" and get plenty of results.
3) If you highlight "中文" in the browser, right-click and select Web Search, it
translates to
http://www.google.com/search?q=%3F%3F&sourceid=mozilla-search&start=0&start=0
In short, all Mozilla-based Chinese searches failed :(
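The three escaped forms listed above correspond to three different byte encodings of the same two characters. A short Python sketch, assuming the `%3F%3F` case comes from a conversion to ISO-8859-1 with `?` as the replacement character (the thread does not confirm the exact code path):

```python
from urllib.parse import quote

keyword = "中文"

# Case 2 as observed: the bytes are GB2312, although the URL says ie=UTF-8.
gb2312_escaped = quote(keyword.encode("gb2312"))

# Case 2 as expected: the bytes should have been UTF-8.
utf8_escaped = quote(keyword.encode("utf-8"))

# Case 3 (assumed mechanism): lossy conversion to ISO-8859-1.
replaced = quote(keyword.encode("iso-8859-1", errors="replace"))

print(gb2312_escaped, utf8_escaped, replaced)
```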
Comment 35•22 years ago
|
||
Has bug 145375 affected this one?
Assignee | ||
Comment 36•22 years ago
|
||
It appears that search in both the sidebar and the URL bar is affected by the
Preference | Navigator | Language setting. However, they're affected differently.
In the URL-bar, what's entered by a user is correctly converted to UTF-8 no
matter what language is at the top of the pref. lang list. That is, if I type
U+AC00 and U+AC01,
the url of the search result shown in the URL bar contains '%ea%b0%80%ea%b0%81'
(url-escaped UTF-8 representation of <U+AC00><U+AC01>). Moreover, what I can
type in the URL bar is NOT restricted by the repertoire of the locale charset
(at least under Win2k. I guess the same is true of Moz-Linux at least under
ll_CC.UTF-8 locale).
However, the search result is all mangled (the result itself appears correct,
though, if the actual search engine used - as opposed to the meta search server -
supports UTF-8).
Changing the character coding (EUC-KR) doesn't help. With 'ko' at the top of
the list, the search result (for a Korean word) is properly rendered.
Given this, the problem is not on Mozilla's side but on the server side.
It is converting the search result into the legacy MIME charset that is
primarily associated with the language at the top of the list. If it's English,
the 'search server' assumes the result is in ISO-8859-1 although it's
actually in EUC-KR. Converting EUC-KR to UTF-8 on the assumption that it's
ISO-8859-1 leads to a lot of question marks. Two things have to be done: 1. The
'meta search server' should store everything in UTF-8 in its DB. 2. When sending
back the result, it should just hand over the result without any conversion,
regardless of the preferred language setting. These will make multilingual
search possible.
The search in the sidebar behaves differently, and for this Mozilla is also to
blame because Mozilla is not converting the input to UTF-8. It only works when
the language of the keywords entered matches the language at the top of the
preferred lang. list.
This is definitely an item for the I18N release notes. To make search in language
'X' work correctly, that language has to be at the top of the preferred language
list in Pref|Navigator|Language.
Matt, can you move up zh(-CN) or zh-TW to the top and see what you get?
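The escaped string quoted in this comment, and the effect of mislabelling EUC-KR data as ISO-8859-1, can be illustrated with a few lines of Python. This is only a sketch of the encoding arithmetic; how the server ends up with question marks is not visible from the client side, so that step is not modelled here.

```python
from urllib.parse import quote

keyword = "\uac00\uac01"  # U+AC00, U+AC01

# URL-escaped UTF-8, lower-cased to match the form shown in the URL bar.
escaped = quote(keyword.encode("utf-8")).lower()

# The same word in the legacy Korean encoding.
euc_kr_bytes = keyword.encode("euc-kr")

# What a converter sees if it wrongly assumes those bytes are ISO-8859-1.
mislabelled = euc_kr_bytes.decode("iso-8859-1")

print(escaped, euc_kr_bytes, mislabelled)
```

Once the bytes have been decoded under the wrong charset, converting the result to UTF-8 preserves the wrong characters, so the original Korean text cannot be displayed.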
Assignee | ||
Comment 37•22 years ago
|
||
re: comment #12
> On BeOS...
> The URL search for the same string returned no results:
> http://www.google.com/search?q=%3F%3F%3F%3F&sourceid=mozilla-search
Is it still the case that Hebrew characters typed in the URL bar turn to '?'
(U+003F) even with Hebrew at the top of your preferred lang. list?
What's the locale under which you run Mozilla (if BeOS has such a thing..)?
It might have to do with Unicode-based system (Win2k/XP and Linux with UTF-8
locale) vs legacy encoding based system (Win9x/ME and Linux with locales using
legacy encodings).
Assignee | ||
Comment 38•22 years ago
|
||
With the Google Sherlock file updated, the search sidebar works perfectly well for
Google regardless of what's at the top of the preferred lang. list.
I tested en-US Mozilla under Win2k (KO) with zh-CN at the top of the pref.
lang. list. Both a Korean word and a Greek word worked well with Google. The
Greek word I tried, 'Καλωσήλθατε', contains a letter NOT representable in
EUC-KR: CJK legacy character sets cover modern Greek letters without diacritic
marks, but not those with diacritic marks such as 'ή' (U+03AE, eta with tonos).
However, search in the location(URL) bar doesn't work so well. When I typed
'가각' (U+AC00, U+AC01. set View|Character Coding to UTF-8 to see the word)
in the location bar with zh-CN at the top of the pref. lang list, I got no
result with the URL in the location bar that reads:
http://search-intl.netscape.com/zh-cn/google.tmpl?
cp=clkzhcnsrp&charset=UTF-8&search=%EA%B0%80%EA%B0%81&
lr=lang_zh-CN
'%EA%B0%80%EA%B0%81' is the correct UTF-8 representation of '가각'(U+AC00,
U+AC01) so that the URL seems to be right. It's most likely that google.tmpl
at http://search-intl.netscape.com is to blame. It's assuming that
lang=zh-CN means that the character repertoire should be restricted to
that of GB2312.
With 'ko' at the top, I expected '가각' in the location bar
to work fine. I was surprised to find that it does not. Note that the url
below has a different format from the one that appeared with zh-CN as the most
preferred lang. Notably, 'ko/' is missing before 'google.tmpl' and '&lr=lang_ko'
is missing after search.
http://search-intl.netscape.com/google.tmpl?
cp=clkkosrp&charset=UTF-8&all=yes&cat=World/Korean
&search=%EA%B0%80%EA%B0%81
When I manually fixed up the url as follows, it worked.
http://search-intl.netscape.com/ko/google.tmpl?
cp=clkkosrp&charset=UTF-8&cat=World/Korean&search=%EA%B0%80%EA%B0%81&lr=lang_ko
So, this problem with Korean has to be fixed on the Mozilla's side.
Next I put Greek(el) at the top of my pref. lang. list and tried
'Καλωσήλθατε'. The search result seems to be correct, but the result
looked totally garbled. The URL used was
http://search.netscape.com/nscp_results.adp?
query=%ce%9a%ce%b1%ce%bb%cf%89%cf%83%ce%ae%ce%bb%ce%b8%ce%b1%cf%84%ce%b5
&source=NSCPRedirect
The url-escaped string after query= is the correct representation of
'Καλωσήλθατε'.
http://search-intl.netscape.com/el/google.tmpl?
cp=clkelsrp&charset=UTF-8
&search=%ce%9a%ce%b1%ce%bb%cf%89%cf%83%ce%ae%ce%bb%ce%b8%ce%b1%cf%84%ce%b5&
lr=lang_el
Greek was not so lucky and fixing up the url like the above didn't work.
So, this is another 'meta search server' issue. There's no
'el/google.tmpl' for Greek. I don't know why 'meta search server' cannot simply
fall back to English version if the localized version of 'greek.tmpl' is not
available on the server. Google supports a large number of languages and 'meta
search server' should be able to be a bridge between google's multilingual
search and the location bar.
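The claims in this comment about the Greek URL can be checked mechanically. The Python sketch below decodes the `search` parameter of the URL quoted above and then tests which of its characters the EUC-KR codec can represent; the helper `representable` is a hypothetical name introduced only for this example.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "http://search-intl.netscape.com/el/google.tmpl?"
    "cp=clkelsrp&charset=UTF-8"
    "&search=%ce%9a%ce%b1%ce%bb%cf%89%cf%83%ce%ae%ce%bb%ce%b8%ce%b1%cf%84%ce%b5"
    "&lr=lang_el"
)

# Decode the query string, treating the escaped bytes as UTF-8.
params = parse_qs(urlsplit(url).query, encoding="utf-8")
keyword = params["search"][0]

def representable(text, charset):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Characters of the keyword that a server restricted to EUC-KR would lose.
unencodable = [ch for ch in keyword if not representable(ch, "euc-kr")]

print(keyword, unencodable)
```

A server that restricts the repertoire to a legacy charset in this way has to drop or replace the accented letter, which matches the question marks described in the thread.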
Assignee | ||
Comment 39•22 years ago
|
||
> With 'ko' at the top, I expected '가각' in the location bar
> to work fine. I was suprised to find that it does not.
Somehow it began to work (with /ko/google.tmpl?....)
> So, this problem with Korean has to be fixed on the Mozilla's side
This turned out to be wrong. Most, if not all, fixes have to be done on the
server side (keyword.netscape.com). keyword.netscape.com determines which 'meta
server' to call with what parameters depending on the value of Accept-Lang http
header (that comes from the pref. lang. list of a client) and maybe other
parameters handed over from Mozilla.
I don't know how keyword.netscape.com determines which meta-search server to
redirect incoming requests to based on accept-lang. (can it be configurable on
the client side?). There seem to be three classes of 'meta search servers':
1. http://search-intl.netscape.com/ll-CC/google.tmpl : This one works well if
'll-CC' matches the first element in Accept-Lang. However, this one seems to be
used only when one of CJK lang. is at the top of the pref. lang. list. Even when
that's the case, there's a problem. It makes an invalid association between
ll-CC and MIME charset and replaces characters outside the repertoire of the
associated MIME charset with question marks. That is, when I include eta with
tonos (ή) with ko as my pref. language, it becomes '?'. This one should be
easiest to fix because google supports multilingual search very well and the
sidebar search already works well. Perhaps, this is a server-side complement of
the fix for bug 145375 (which is done on the client-side.)
2. The second category is completely broken:
www.netscape.fr (used with fr as my pref. language) and suche.netscape.de (for
German). They seem to interpret a UTF-8 sequence as a Windows-1252 sequence: when I
gave '가' (U+AC00 : 0xEA 0xB0 0x80), it searched for U+00EA, U+00B0, U+20AC
(ê°€) instead. This means that they don't even work for French and German
keywords if there's even a single character outside US-ASCII. I just tried
Österreich with 'de' as my pref. language, and suche.netscape.de looked for
Ã–sterreich instead. Note that Ö in UTF-8 is 0xC3 0x96, which turns into Ã– when
interpreted as Windows-1252.
3. The third category is search.netscape.com/nscp_results.adp. It appears that
it's used when the first element in Accept-Lang is English or other languages
for which there's no dedicated meta-search server. At the moment, the latter
group includes Russian and Greek among many other languages. This is a curious case.
a. With Russian or Greek as my pref. lang.
When I give keywords not covered by US-ASCII, the search script running there
interprets incoming UTF-8 sequences correctly as UTF-8, judging from the fact
that the pre-filled search box (for retry) in the result page preserves the
input string intact. It also comes up with some relevant hits. For instance, it
returns sites like http://www.vienna.at for Österreich with 'ru' as my pref.
lang. For 'Καλωσήλθατε' with Greek, some Greek sites are returned. However,
characters outside US-ASCII are all rendered with question marks. If I try a
Chinese/Japanese/Korean keyword, a couple of hits in the first page are relevant
while others appear to be off the mark.
A really funny thing happened when I gave 'Österreich' with Russian pref. and
manually switched to Windows-1252. The prefilled keyword for retry turned from
Österreich to Ã–sterreich, which is perfectly understandable. The strange thing
is that there is a mix of hits, some with Österreich and others with Ã–sterreich.
Apparently, what's stored in the DB for search.netscape.com is a mixture of data
in UTF-8 (or a legacy encoding with the proper encoding tag) and data in a legacy
encoding (with no or a wrong encoding tag).
The simplest fix (at least when google is the preferred search engine for the
sidebar search) may be to make keyword.netscape.com redirect all keyword search
to search-intl.netscape.com/xx/google.tmpl instead of lang-specific ones (that
don't even work for target languages) and search.netscape.com/nscp_results.adp
And, needless to say, google.tmpl script should not restrict the repertoire to
that of legacy encodings. Instead, it should allow any character in Unicode.
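The behaviour described for the second category of servers (UTF-8 bytes read as Windows-1252) is easy to reproduce. A minimal Python sketch; the function name is made up for illustration, and nothing here is taken from the servers' actual code:

```python
def misread_as_cp1252(text):
    """Encode text as UTF-8, then decode the bytes as if they were Windows-1252."""
    return text.encode("utf-8").decode("cp1252")

# The German and Korean examples from this comment.
german = misread_as_cp1252("Österreich")
korean = misread_as_cp1252("\uac00")

print(german, korean)
```

Any keyword with a character outside US-ASCII is altered by this misreading, which is why these servers fail even for French and German input.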
Comment 40•21 years ago
|
||
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6b) Gecko/20031120
I have a probably related problem:
When I try to search (both Sidebar and address bar, but not the additional
MozillaPL XUL applet)
for a word with Polish diacritical chars in it, it gets messed up:
word: moździerz
Address bar/Sidebar (broken)
http://www.google.com/search?q=mo%25u017Adzierz&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8
Google XUL applet (works)
http://www.google.com/search?q=mo%C5%BAdzierz&ie=utf8&oe=utf8&sourceid=mozilla-xul
Comment 41•21 years ago
|
||
No problem in Mozilla Firebird (with default-charset set to iso8859-2):
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031206 Firebird/0.7+
Got:
http://www.google.com/search?q=mo%C5%BAdzierz&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8
Comment 42•21 years ago
|
||
Sorry for spam - I triplechecked and the browser had ISO8859-1 set as default
charset.
Setting it to ISO8859-2 doesn't cause the problem to show up though.
Assignee | ||
Comment 43•21 years ago
|
||
As I wrote in comment #39, it's still broken (in some cases it works, while in
other cases it doesn't) NOT because Mozilla (as a client) does anything wrong
BUT because keyword.netscape.com (search.netscape.com) is broken. Presumably,
search.netscape.com/keyword.netscape.com is not under the control of mozilla.org
anymore.
asa, I'm sorry to bother you, but what's mozilla.org's plan for the keyword
server(s)? There's nothing we can do on the 'client side' and a relatively
simple fix on the server side would fix the problem (comment #39). I'm tempted
to change the product field to 'mozilla.org'. For a better tracking, I'm
assigning to myself, but should be reassigned to someone who can fix things on
the server-side eventually.
P.S. Everyone who wants to post to this bug has to set the character coding in
View menu to UTF-8 _before_ posting to avoid characters outside the repertoire
of the current character encoding turn to NCRs (〹) as in comment #34.
re: comment #40. That was a 'transitive' bug. We fixed our escape/unescape to be
compliant with the ECMAScript standard (bug 44272), but hadn't fixed all our
__misuses__ of escape/unescape (bug 225695). Those problems have since been
addressed, so 1.6b should be fine with that.
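The broken URL in comment #40 (`mo%25u017Adzierz`) is consistent with a string that was first passed through an ECMAScript-style `escape()` and then percent-encoded a second time. The Python sketch below emulates that; `js_escape` is a simplified stand-in written for this example (it handles only characters in the Basic Multilingual Plane), and the two-step sequence is an assumption about what happened, not a trace of the Mozilla code.

```python
from urllib.parse import quote

# Characters that ECMAScript escape() leaves untouched.
_UNESCAPED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./"
)

def js_escape(text):
    """Approximate ECMAScript escape() for BMP characters only."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in _UNESCAPED:
            out.append(ch)
        elif code < 256:
            out.append("%{:02X}".format(code))
        else:
            out.append("%u{:04X}".format(code))
    return "".join(out)

word = "moździerz"

# Assumed misuse: escape() output is URL-escaped again, so '%' becomes '%25'.
double_escaped = quote(js_escape(word))

# Expected form: plain percent-encoded UTF-8.
utf8_escaped = quote(word)

print(double_escaped, utf8_escaped)
```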
Assignee: nhottanscp → jshin
Status: ASSIGNED → NEW
Assignee | ||
Comment 44•21 years ago
|
||
> to avoid characters outside the repertoire
> of the current character encoding turn to NCRs (〹) as in comment #34.
to avoid turning characters outside the repertoire of the current character
encoding to NCRs (匼) as in comment #34.
Status: NEW → ASSIGNED
Assignee | ||
Comment 45•21 years ago
|
||
Because we're not sure of the value of setting up a separate keyword server at
mozilla.org and it's too late for 1.6 even if we decide to do that, we'd better
take a simple way out by setting 'keyword.URL' to google.
Had we better use 'google feeling lucky' as firebird does? In this patch, I'm
using 'the plain google search'.
Assignee | ||
Comment 46•21 years ago
|
||
I think this should be fixed in both 1.4.2 and 1.6.
chofmann, what do you think? I guess you favor setting up our own server, but
as you wrote it's too late for 1.6. As for fixing things on AOL servers, I can
only guess it's a rather simple fix, but can't be sure because I have never seen
the code on that side. Therefore, making the default keyword.URL point to google
seems to be an easy way out.
Flags: blocking1.6?
Flags: blocking1.4.2?
Assignee | ||
Comment 47•21 years ago
|
||
Comment on attachment 137625 [details] [diff] [review]
a patch
asking for r/sr.
I can't quite decide who to ask for r/sr... (I would have asked smontagu for r,
but he's on vacation).
This is kinda just filling the hole, but should be a lot better than what we
have now.
Attachment #137625 -
Flags: superreview?(brendan)
Attachment #137625 -
Flags: review?(chofmann)
Comment 48•21 years ago
|
||
This would not block the release. Please request approval when you have the
necessary reviews and drivers will consider the fix for inclusion in 1.6.
Flags: blocking1.6?
Flags: blocking1.6-
Flags: blocking1.4.2?
Flags: blocking1.4.2-
Comment 49•21 years ago
|
||
Comment on attachment 137625 [details] [diff] [review]
a patch
Someone test this heavily; code review is not the thing here.
/be
Attachment #137625 -
Flags: superreview?(brendan) → superreview+
Assignee | ||
Comment 50•21 years ago
|
||
Thanks for sr.
All the test cases mentioned here (Greek, Russian, Polish, German, Korean,
Japanese, Chinese) and some others I just made up work well as far as I can
tell. Others can test it by setting 'keyword.URL' to
'http://www.google.com/search?ie=UTF-8&oe=utf-8&q=' in about:config and enabling
'keyword' in Edit|Preference|Navigator|Smart Browsing.
See http://www.mozilla.org/docs/end-user/internet-keywords.html for details.
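To see what a search URL built from this 'keyword.URL' value looks like, here is a small Python sketch. It assumes the browser simply appends the percent-encoded UTF-8 form of the typed text to the prefix; how Mozilla escapes spaces and other reserved characters is not stated in this bug, and `keyword_search_url` is a hypothetical helper.

```python
from urllib.parse import quote

# Value suggested for the 'keyword.URL' preference in this comment.
KEYWORD_URL = "http://www.google.com/search?ie=UTF-8&oe=utf-8&q="

def keyword_search_url(term):
    """Append the percent-encoded UTF-8 form of term to the keyword prefix."""
    return KEYWORD_URL + quote(term, safe="")

# Two of the test words mentioned earlier in this bug.
polish_url = keyword_search_url("moździerz")
german_url = keyword_search_url("Österreich")

print(polish_url)
print(german_url)
```

Because `ie=UTF-8` tells Google how to read the query bytes, the result no longer depends on the preferred language list or on an intermediate keyword server.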
Comment 51•21 years ago
|
||
the patch works for me for French language, thanks
Assignee | ||
Comment 52•21 years ago
|
||
Comment on attachment 137625 [details] [diff] [review]
a patch
asking the module owner for review
Attachment #137625 -
Flags: review?(chofmann) → review?(smontagu)
Comment 53•21 years ago
|
||
Comment on attachment 137625 [details] [diff] [review]
a patch
r=smontagu.
This seems to work well enough out of the box, but I see the %3Fs can still
resurface if the default search engine is reset from the search sidebar, e.g.
to AskJeeves. There may not be much we can do about that.
Attachment #137625 -
Flags: review?(smontagu) → review+
Assignee | ||
Comment 54•21 years ago
|
||
fix checked into the trunk.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 55•21 years ago
|
||
Comment on attachment 137625 [details] [diff] [review]
a patch
asking for a1.6
Attachment #137625 -
Flags: approval1.6?
Comment 56•21 years ago
|
||
Comment on attachment 137625 [details] [diff] [review]
a patch
a=asa (on behalf of drivers) for checkin to 1.6
Attachment #137625 -
Flags: approval1.6? → approval1.6+
Comment 57•21 years ago
|
||
forgot to comment; checked in to 1.6 branch this afternoon.