Closed
Bug 387723
Opened 18 years ago
Closed 17 years ago
smart/keyword/bookmark searches don't work with non-ascii (russian) characters with the new urlbar binding
Categories
(Firefox :: Search, defect, P3)
Tracking
()
VERIFIED
FIXED
Firefox 3 beta3
People
(Reporter: asqueella, Assigned: dao)
References
Details
(Keywords: intl, regression)
Attachments
(1 obsolete file)
I have a keyword search for google:
- keyword is "g"
- URL is http://www.google.com/search?&q=%s
When the new location bar binding is used, searching for russian terms via this keyword doesn't work.
I'm not sure how would you go about reproducing it, but try the following. Visit this URL and copy the page's header to the clipboard:
http://ru.wikipedia.org/wiki/%D0%98%D1%81%D0%BA%D1%83%D1%81%D1%81%D1%82%D0%B2%D0%BE
then (assuming you have the aforementioned keyword search set up), put "g <pasted text>" in the location bar and hit "Go". You should see the garbled text being searched for.
The regression range is 2007-07-05-04 - 2007-07-06-04, and if I enable the urlbar binding in the 20070705 build, I can reproduce the bug too, so I'm guessing it's a bug in the new binding.
Flags: blocking-firefox3?
Assignee | ||
Comment 1•18 years ago
|
||
Works for me.
My original URL for "g" is "http://www.google.de/search?q=%s". I entered "g ", pasted "Искусство", hit enter ->
http://www.google.de/search?q=%D0%98%D1%81%D0%BA%D1%83%D1%81%D1%81%D1%82%D0%B2%D0%BE (displayed as http://www.google.de/search?q=Искусство).
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a7pre) Gecko/2007071008 Minefield/3.0a7pre
Reporter | ||
Comment 2•18 years ago
|
||
I just started with a clean profile and it doesn't work here, neither on the build you're using, nor on the latest trunk build. I have everything set to Russian in Regional settings in the control panel and it's an english version of Windows (XP).
If you could tell me where to put the tracing statements to debug this, without having to understand the code, I could try to debug.
Comment 3•18 years ago
|
||
WFM with Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a7pre) Gecko/2007071104 Minefield/3.0a7pre, en_GB.UTF-8 on Debian GNU/Linux 4.0etch+some_lenny.
Assignee | ||
Comment 4•18 years ago
|
||
(In reply to comment #2)
> If you could tell me where to put the tracing statements to debug this, without
> having to understand the code, I could try to debug.
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/browser/base/content/urlbarBindings.xml&rev=1.13&mark=211-216#line_210
Reporter | ||
Comment 5•18 years ago
|
||
Heh, thanks, but too late :) I blame the insomnia and mozilla addiction.
What I see happening here is:
1. In canonizeUrl() the call to getShortcutOrURI() returns a correctly escaped URL.
2. That URL gets assigned to gURLBar.value.
3. Then any code that retrieves gURLBar.value gets the actual value in the text field, which is of the form "http://www.google.com/search?&q=<the query text in Unicode>". I.e. not the escaped URI that was assigned to gURLBar.value in step http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/browser/base/content/browser.js&rev=1.813&mark=2317,2320#2317
4. That leads to BrowserLoadURL calling loadURI with that broken param:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/browser/base/content/browser.js&rev=1.813&mark=1933#1932
which ends up calling
getWebNavigation().loadURI("http://www.google.com/search?&q=Искусство",
nsIWebNavigation.LOAD_FLAGS_ALLOW_THIRD_PARTY_FIXUP,null, null, null);
...which, despite the nsIWebNavigation::loadURI IDL comments about characters being converted to UTF-8, encodes the URI assuming system charset (win-1251 in my case):
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/docshell/base/nsDocShell.cpp&rev=1.841&mark=2775#2766 ends up here:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/docshell/base/nsDefaultURIFixup.cpp&rev=1.51&mark=256#253
Reporter | ||
Comment 6•18 years ago
|
||
In this particular case I wonder if using the system charset is really necessary in this case, as the frontend does all the necessary charset manipulations already.
But generally to avoid breaking a bunch of code reading gURLBar.value, I'd make it return the escaped value (and make the code that needs the displayed value use another property). Of course it's 4am here, so take my suggestions with a grain of salt :)
Assignee | ||
Comment 7•18 years ago
|
||
(In reply to comment #5)
> getWebNavigation().loadURI("http://www.google.com/search?&q=Искусство",
> nsIWebNavigation.LOAD_FLAGS_ALLOW_THIRD_PARTY_FIXUP,null, null, null);
>
> ...which, despite the nsIWebNavigation::loadURI IDL comments about characters
> being converted to UTF-8, encodes the URI assuming system charset (win-1251 in
> my case):
getShortcutOrURI() encodes the typed value only if it's a keyword search. Doesn't that mean that entering "http://www.google.com/search?&q=Искусство" without the new binding leads to the wrong page as well?
Reporter | ||
Comment 8•18 years ago
|
||
Yes.
Assignee | ||
Comment 9•18 years ago
|
||
As I see it, the options are: a) disallow third-party fixup or b) make sure it uses UTF-8. I don't know if either of those could be harmful.
Reporter | ||
Comment 10•18 years ago
|
||
Both those options will cause a behavior change from Fx2. I'm not sure it won't break people.
As I see it, there are at least the following cases involving non-ASCII characters in the location bar:
1. The user enters non-ASCII characters for a keyword search. The keyword searches have a well-defined behavior wrt charsets - for %s at least they default to encoding in UTF-8 and support a parameter to override that. So in this case the encoded string from getShortcutOrURI should be passed to loadURI.
2. The user enters a URL with escaped non-ASCII characters. The URL should be loaded as is. I suspect (can't test right now) the binding may try to decode that as UTF-8 and pass to loadURI as unicode, letting it do the encoding with the system charset.
3. The user enters a URL with unescaped non-ASCII characters. The old code escaped the characters using the system encoding in this case.
Reporter | ||
Comment 11•18 years ago
|
||
Thinking about it, in the third case it would probably indeed be better, in the perfect world, to encode in UTF-8, but it might break some sites for people with the system encoding matching the one the site expects. I think it might help to look why it was decided to use the system encoding when fixing up the URL.
Updated•18 years ago
|
Flags: blocking-firefox3? → blocking-firefox3+
Comment 12•18 years ago
|
||
I installed the latest trunc and I could access previously unaccessible sites because of the bug.
Updated•18 years ago
|
Target Milestone: --- → Firefox 3 M8
Comment 13•18 years ago
|
||
Just to confirm, it does not work with clean profile, regional settings = Lithuanian, letters = ąčęėįšųūž. However, search bar works without problems, so I assume it is possible to fix.
Assignee | ||
Comment 15•17 years ago
|
||
It's beyond me why CreateFixupURI thinks it should use the system charset (http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&root=/cvsroot&subdir=/mozilla/docshell/base&command=DIFF_FRAMESET&file=nsDefaultURIFixup.cpp&rev2=1.20&rev1=1.19). As Nickolay noted, LoadURI utilizing this violates the interface. That's not my bug though, so here's a frontend band-aid.
Attachment #277693 -
Flags: review?(gavin.sharp)
Comment 16•17 years ago
|
||
Well, bug 130393 has at least part of the rationale... If you really think it's wrong, could you file a bug on it, and CC me?
Reporter | ||
Comment 17•17 years ago
|
||
So it was done for IE and possibly web compatibility. At least here, IE (6) doesn't encode the typed URL using the system charset (http://google.com/фыва -> http://www.google.com/%D1%84%D1%8B%D0%B2%D0%B0).
Assignee | ||
Updated•17 years ago
|
Attachment #277693 -
Flags: review?(gavin.sharp)
Comment 18•17 years ago
|
||
(In reply to comment #17)
> So it was done for IE and possibly web compatibility. At least here, IE (6)
> doesn't encode the typed URL using the system charset
Do you have the "always send as UTF-8" pref that was mentioned in bug 130393 comment 0 set? Is that on by default, or off?
Assignee | ||
Comment 19•17 years ago
|
||
filed bug 393246
Reporter | ||
Comment 20•17 years ago
|
||
(In reply to comment #18)
> Do you have the "always send as UTF-8" pref that was mentioned in bug 130393
> comment 0 set? Is that on by default, or off?
>
Ah. It is set, but I certainly didn't tweak any IE preferences.
Updated•17 years ago
|
Target Milestone: Firefox 3 M8 → Firefox 3 M9
Comment 21•17 years ago
|
||
moving out bugs that don't need to block b1
Target Milestone: Firefox 3 M9 → Firefox 3 M10
Assignee | ||
Comment 22•17 years ago
|
||
Can somebody tell me if setting network.standard-url.encode-query-utf8 = true fixes this bug?
Comment 23•17 years ago
|
||
If you mean "network.standard-url.encode-utf8 = true", then it absolutely does.
Assignee | ||
Updated•17 years ago
|
Attachment #277693 -
Attachment is obsolete: true
Assignee | ||
Updated•17 years ago
|
Whiteboard: [has patch] via bug 393246
Updated•17 years ago
|
Priority: -- → P3
Target Milestone: Firefox 3 M10 → Firefox 3 M11
Assignee | ||
Comment 24•17 years ago
|
||
This should be fixed by my secret plan B, bug 397815. However, bug 393246 should still be fixed in order to make "http://www.google.com/search?&q=Искусство" work just like "g Искусство".
Status: NEW → RESOLVED
Closed: 17 years ago
Depends on: 397815
Resolution: --- → FIXED
Whiteboard: [has patch] via bug 393246
Reporter | ||
Comment 25•17 years ago
|
||
Seems to work, thank you!
Assignee | ||
Comment 26•17 years ago
|
||
(In reply to comment #25)
> Seems to work, thank you!
I take this as a verification -- less work for our QA folks.
Status: RESOLVED → VERIFIED
Reporter | ||
Comment 27•17 years ago
|
||
OK, though I actually wanted someone else to verify this on a different system - these intl issues are rather tricky.
Since you took my comment as verification, I'll add more detail: it works on an XP system with everything in Regional & Language settings set to Russian and the following Firefox build: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b3pre) Gecko/2008011205 Minefield/3.0b3pre
Thanks again!
You need to log in
before you can comment on or make changes to this bug.
Description
•