Closed Bug 387723 Opened 13 years ago Closed 12 years ago

smart/keyword/bookmark searches don't work with non-ascii (russian) characters with the new urlbar binding

Categories

(Firefox :: Search, defect, P3)

x86
Windows XP
defect

Tracking

()

VERIFIED FIXED
Firefox 3 beta3

People

(Reporter: asqueella, Assigned: dao)

References

Details

(Keywords: intl, regression)

Attachments

(1 obsolete file)

I have a keyword search for google:
- keyword is "g"
- URL is http://www.google.com/search?&q=%s

When the new location bar binding is used, searching for russian terms via this keyword doesn't work.

I'm not sure how would you go about reproducing it, but try the following. Visit this URL and copy the page's header to the clipboard:
http://ru.wikipedia.org/wiki/%D0%98%D1%81%D0%BA%D1%83%D1%81%D1%81%D1%82%D0%B2%D0%BE

then (assuming you have the aforementioned keyword search set up), put "g <pasted text>" in the location bar and hit "Go". You should see the garbled text being searched for.

The regression range is 2007-07-05-04 - 2007-07-06-04, and if I enable the urlbar binding in the 20070705 build, I can reproduce the bug too, so I'm guessing it's a bug in the new binding.
Flags: blocking-firefox3?
Works for me.
My original URL for "g" is "http://www.google.de/search?q=%s". I entered "g ", pasted "Искусство", hit enter ->
http://www.google.de/search?q=%D0%98%D1%81%D0%BA%D1%83%D1%81%D1%81%D1%82%D0%B2%D0%BE (displayed as http://www.google.de/search?q=Искусство).

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a7pre) Gecko/2007071008 Minefield/3.0a7pre
I just started with a clean profile and it doesn't work here, neither on the build you're using, nor on the latest trunk build. I have everything set to Russian in Regional settings in the control panel and it's an english version of Windows (XP).

If you could tell me where to put the tracing statements to debug this, without having to understand the code, I could try to debug.
WFM with Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a7pre) Gecko/2007071104 Minefield/3.0a7pre, en_GB.UTF-8 on Debian GNU/Linux 4.0etch+some_lenny.
(In reply to comment #2)
> If you could tell me where to put the tracing statements to debug this, without
> having to understand the code, I could try to debug.

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/browser/base/content/urlbarBindings.xml&rev=1.13&mark=211-216#line_210
Heh, thanks, but too late :) I blame the insomnia and mozilla addiction.

What I see happening here is:

1. In canonizeUrl() the call to getShortcutOrURI() returns a correctly escaped URL.
2. That URL gets assigned to gURLBar.value.
3. Then any code that retrieves gURLBar.value gets the actual value in the text field, which is of the form "http://www.google.com/search?&q=<the query text in Unicode>". I.e. not the escaped URI that was assigned to gURLBar.value in step http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/browser/base/content/browser.js&rev=1.813&mark=2317,2320#2317
4. That leads to BrowserLoadURL calling loadURI with that broken param:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/browser/base/content/browser.js&rev=1.813&mark=1933#1932
which ends up calling

getWebNavigation().loadURI("http://www.google.com/search?&q=Искусство",
nsIWebNavigation.LOAD_FLAGS_ALLOW_THIRD_PARTY_FIXUP,null, null, null);

...which, despite the nsIWebNavigation::loadURI IDL comments about characters being converted to UTF-8, encodes the URI assuming system charset (win-1251 in my case):
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/docshell/base/nsDocShell.cpp&rev=1.841&mark=2775#2766 ends up here:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/docshell/base/nsDefaultURIFixup.cpp&rev=1.51&mark=256#253
In this particular case I wonder if using the system charset is really necessary in this case, as the frontend does all the necessary charset manipulations already.

But generally to avoid breaking a bunch of code reading gURLBar.value, I'd make it return the escaped value (and make the code that needs the displayed value use another property). Of course it's 4am here, so take my suggestions with a grain of salt :)
(In reply to comment #5)
> getWebNavigation().loadURI("http://www.google.com/search?&q=Искусство",
> nsIWebNavigation.LOAD_FLAGS_ALLOW_THIRD_PARTY_FIXUP,null, null, null);
> 
> ...which, despite the nsIWebNavigation::loadURI IDL comments about characters
> being converted to UTF-8, encodes the URI assuming system charset (win-1251 in
> my case):

getShortcutOrURI() encodes the typed value only if it's a keyword search. Doesn't that mean that entering "http://www.google.com/search?&q=Искусство" without the new binding leads to the wrong page as well?
Yes.
As I see it, the options are: a) disallow third-party fixup or b) make sure it uses UTF-8. I don't know if either of those could be harmful.
Both those options will cause a behavior change from Fx2. I'm not sure it won't break people.

As I see it, there are at least the following cases involving non-ASCII characters in the location bar:
1. The user enters non-ASCII characters for a keyword search. The keyword searches have a well-defined behavior wrt charsets - for %s at least they default to encoding in UTF-8 and support a parameter to override that. So in this case the encoded string from getShortcutOrURI should be passed to loadURI.
2. The user enters a URL with escaped non-ASCII characters. The URL should be loaded as is. I suspect (can't test right now) the binding may try to decode that as UTF-8 and pass to loadURI as unicode, letting it do the encoding with the system charset.
3. The user enters a URL with unescaped non-ASCII characters. The old code escaped the characters using the system encoding in this case.
Thinking about it, in the third case it would probably indeed be better, in the perfect world, to encode in UTF-8, but it might break some sites for people with the system encoding matching the one the site expects. I think it might help to look why it was decided to use the system encoding when fixing up the URL.
Flags: blocking-firefox3? → blocking-firefox3+
I installed the latest trunc and I could access previously unaccessible sites because of the bug.
Target Milestone: --- → Firefox 3 M8
Just to confirm, it does not work with clean profile, regional settings = Lithuanian, letters = ąčęėįšųūž. However, search bar works without problems, so I assume it is possible to fix.
Dao, this is your regression, any idea on an ETA?
Assignee: nobody → dao
Attached patch fix (obsolete) — Splinter Review
It's beyond me why CreateFixupURI thinks it should use the system charset (http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&root=/cvsroot&subdir=/mozilla/docshell/base&command=DIFF_FRAMESET&file=nsDefaultURIFixup.cpp&rev2=1.20&rev1=1.19). As Nickolay noted, LoadURI utilizing this violates the interface. That's not my bug though, so here's a frontend band-aid.
Attachment #277693 - Flags: review?(gavin.sharp)
Well, bug 130393 has at least part of the rationale... If you really think it's wrong, could you file a bug on it, and CC me?
So it was done for IE and possibly web compatibility. At least here, IE (6) doesn't encode the typed URL using the system charset (http://google.com/фыва -> http://www.google.com/%D1%84%D1%8B%D0%B2%D0%B0).
Attachment #277693 - Flags: review?(gavin.sharp)
(In reply to comment #17)
> So it was done for IE and possibly web compatibility. At least here, IE (6)
> doesn't encode the typed URL using the system charset

Do you have the "always send as UTF-8" pref that was mentioned in bug 130393 comment 0 set? Is that on by default, or off?
Depends on: 393246
filed bug 393246
(In reply to comment #18)
> Do you have the "always send as UTF-8" pref that was mentioned in bug 130393
> comment 0 set? Is that on by default, or off?
> 
Ah. It is set, but I certainly didn't tweak any IE preferences.
Target Milestone: Firefox 3 M8 → Firefox 3 M9
moving out bugs that don't need to block b1
Target Milestone: Firefox 3 M9 → Firefox 3 M10
Can somebody tell me if setting network.standard-url.encode-query-utf8 = true fixes this bug?
If you mean "network.standard-url.encode-utf8 = true", then it absolutely does.
Attachment #277693 - Attachment is obsolete: true
Whiteboard: [has patch] via bug 393246
Priority: -- → P3
Target Milestone: Firefox 3 M10 → Firefox 3 M11
This should be fixed by my secret plan B, bug 397815. However, bug 393246 should still be fixed in order to make "http://www.google.com/search?&q=Искусство" work just like "g Искусство".
Status: NEW → RESOLVED
Closed: 12 years ago
Depends on: 397815
Resolution: --- → FIXED
Whiteboard: [has patch] via bug 393246
Seems to work, thank you!
(In reply to comment #25)
> Seems to work, thank you!

I take this as a verification -- less work for our QA folks.
Status: RESOLVED → VERIFIED
OK, though I actually wanted someone else to verify this on a different system - these intl issues are rather tricky.

Since you took my comment as verification, I'll add more detail: it works on an XP system with everything in Regional & Language settings set to Russian and the following Firefox build: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b3pre) Gecko/2008011205 Minefield/3.0b3pre

Thanks again!
You need to log in before you can comment on or make changes to this bug.