Closed
Bug 393246
Opened 17 years ago
Closed 11 years ago
CreateFixupURI should use UTF-8, not the system charset
Categories
(Core :: DOM: Navigation, defect)
Core
DOM: Navigation
Tracking
()
RESOLVED
FIXED
mozilla28
Tracking | Status | |
---|---|---|
firefox28 | --- | fixed |
People
(Reporter: dao, Assigned: hsivonen)
References
Details
(Keywords: intl, Whiteboard: [qa-])
Attachments
(1 file, 2 obsolete files)
6.51 KB,
patch
|
bzbarsky
:
review+
|
Details | Diff | Splinter Review |
Five years ago, bug 130393 comment 0:
> Try to type any DBCS URL in the location bar and enter.
>
> You will get "file not found" error.
>
> It's b/c mozilla encode/escape URL typed in location bar to UTF-8 b/f sending
> it to server.
>
> For IE, it sends in system charset when "always send URLs as UTF-8" is off,
> and it sends in UTF-8 encoding when the option is on,
> which increases the possibility to find the file in the server.
>
> According to research done in bugs related to non-ASCII URLs, most modern
> servers are not ready to receive URLs in UTF-8 encoding, but expects them in
> their file system charset.
>
> Therefore, sending URLs in system charset will increase the possibility of
> fidning the file wanted, even though it would not work when the server charset
> does not match with client machine's charset.
Chances are that "most modern servers" are now ready to accept UTF-8 encoded URLs and that using the system charset actually produces more errors.
Since CreateFixupURI is utilized by LoadURI, the current behavior also violates the interface, which states: "For HTTP and FTP URLs and possibly others, characters above U+007F will be converted to UTF-8 and then URL- escaped per the rules of RFC 2396."
Note that "always send URLs as UTF-8" is on by default in IE6.
Flags: blocking1.9?
Reporter | ||
Updated•17 years ago
|
Whiteboard: [parity-opera] [parity-IE]
Reporter | ||
Updated•17 years ago
|
Summary: Always send typed URLs as UTF-8 → Always send URLs as UTF-8
Updated•17 years ago
|
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → DUPLICATE
Reporter | ||
Comment 2•17 years ago
|
||
How can this be a duplicate of bug 261929? That one claims to be fixed.
Comment 3•17 years ago
|
||
It looks like the patch for bug added a "network.standard-url.encode-query-utf8" pref that defaults to false, and then changed the default value of network.standard-url.encode-utf8 to true. According to the comments there this makes us match IE...
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Summary: Always send URLs as UTF-8 → Always encode query string values as UTF-8 (network.standard-url.encode-query-utf8 = true)
Comment 4•17 years ago
|
||
Dão, is this fixed?
Reporter | ||
Comment 5•17 years ago
|
||
Bug 387723 is still open, which depends on this one (at least if we don't want the dirty fix). I think what should happen here is to revert the patch in bug 130393.
Comment 6•17 years ago
|
||
Comment 7•17 years ago
|
||
Unclear what the issues here are. Bug summary sounds like it's just flipping a pref anyway, so this shouldn't be a deal breaker.
Flags: blocking1.9? → blocking1.9-
Reporter | ||
Comment 8•17 years ago
|
||
I was told this would really fix bug 387723.
Attachment #284307 -
Flags: superreview?(cbiesinger)
Attachment #284307 -
Flags: review?(cbiesinger)
Comment 9•17 years ago
|
||
So what exactly has changed since the fix for bug 261929?
Reporter | ||
Comment 10•17 years ago
|
||
Bookmark keywords broke because the location bar decodes utf8-encoded addresses which are then encoded with the system charset.
bug 387723 comment 5:
> getWebNavigation().loadURI("http://www.google.com/search?&q=Искусство",
> nsIWebNavigation.LOAD_FLAGS_ALLOW_THIRD_PARTY_FIXUP,null, null, null);
>
> ...which, despite the nsIWebNavigation::loadURI IDL comments about characters
> being converted to UTF-8, encodes the URI assuming system charset (win-1251 in
> my case):
bug 387723 comment 7:
> getShortcutOrURI() encodes the typed value only if it's a keyword search.
> Doesn't that mean that entering
> "http://www.google.com/search?&q=Искусство" without the new binding
> leads to the wrong page as well?
bug 387723 comment 8:
> Yes.
Comment 11•17 years ago
|
||
> Bookmark keywords broke because the location bar
That wasn't my question. What's changed in web content that would allow us to break compat with IE here? If bookmark keywords have a problem, maybe we should fix it without changing the behavior exposed to the web at large.
> despite the nsIWebNavigation::loadURI IDL comments
Maybe we should be fixing whatever is going on here, instead of blanket-changing the behavior of necko?
For example, having whatever NS_NewURI call loadURI does here pass in "UTF8" as the origin charset will make Necko use that for encoding URIs coming through loadURI, no matter what the preference values are or what the system encoding is.
That said, the code in LoadURI passes a null origin charset, which means UTF-8. So what's _really_ going on here?
Comment 12•17 years ago
|
||
In particular, please breakpoint in
2754 nsDocShell::LoadURI(const PRUnichar * aURI,
2755 PRUint32 aLoadFlags,
2756 nsIURI * aReferringURI,
2757 nsIInputStream * aPostStream,
2758 nsIInputStream * aHeaderStream)
step down to the NS_NewURI call on line 2777 (this is rev 1.861 of nsDocShell.cpp). When you step across that, what's the mCharset of the resulting URI? What is it encoded as?
Reporter | ||
Comment 13•17 years ago
|
||
(In reply to comment #11)
> > Bookmark keywords broke because the location bar
>
> That wasn't my question. What's changed in web content that would allow us to
> break compat with IE here? If bookmark keywords have a problem, maybe we
> should fix it without changing the behavior exposed to the web at large.
Why should typing "g Искусство" be different from "http://www.google.com/search?&q=Искусство"?
> That said, the code in LoadURI passes a null origin charset, which means UTF-8.
> So what's _really_ going on here?
As I wrote earlier, I think another possible fix is to back out bug 130393. It really seems to break the interface.
Comment 14•17 years ago
|
||
> Why should typing "g Искусство" be different
No reason it should. But if a site has a URI pointing to itself and uses the page encoding on the server, we need to use the page encoding when sending the URI, for compat. Therefore, blanket-changing the necko behavior is not acceptable.
> another possible fix is to back out bug 130393.
Aha! So this is purely a URI fixup issue, eh? Bug 387723 comment 7 made i sound like a pure loadURI issue.
I think that we should in fact back out bug 130393 if IE now defaults to having "always send URLs as UTF-8" on by default, which it sounds like it does.
Reporter | ||
Updated•17 years ago
|
Attachment #284307 -
Attachment is obsolete: true
Attachment #284307 -
Flags: superreview?(cbiesinger)
Attachment #284307 -
Flags: review?(cbiesinger)
Comment 15•17 years ago
|
||
I agree with Boris. IE*5* turned on "always send URLs as UTF-8" by default.
Also, we don't have to consider file system charset about "file:" URL because it now always uses UTF-8.
Reporter | ||
Updated•17 years ago
|
Summary: Always encode query string values as UTF-8 (network.standard-url.encode-query-utf8 = true) → CreateFixupURI should use UTF-8, not the system charset
Whiteboard: [parity-opera] [parity-IE]
Reporter | ||
Updated•17 years ago
|
Flags: blocking1.9- → blocking1.9?
Reporter | ||
Comment 16•17 years ago
|
||
Currently untested, feel free to wait with the review until I get that done.
Attachment #285020 -
Flags: superreview?(bzbarsky)
Attachment #285020 -
Flags: review?(bzbarsky)
Comment 17•17 years ago
|
||
Comment on attachment 285020 [details] [diff] [review]
patch
I think you should ask biesi for review. I can try to do it if he can't, but it will be _very_ laggy.
Attachment #285020 -
Flags: superreview?(bzbarsky)
Attachment #285020 -
Flags: review?(bzbarsky)
Reporter | ||
Comment 18•17 years ago
|
||
Comment on attachment 285020 [details] [diff] [review]
patch
Tested and appears to be working. My system charset is UTF-8, though.
Attachment #285020 -
Flags: superreview?(cbiesinger)
Attachment #285020 -
Flags: review?(cbiesinger)
Reporter | ||
Updated•17 years ago
|
Status: REOPENED → ASSIGNED
Updated•17 years ago
|
Flags: blocking1.9? → blocking1.9-
Comment 19•17 years ago
|
||
Should this be [wanted-1.9]?
Comment 20•17 years ago
|
||
Please don't ask this on every single bug we triage blocking1.9-. If we intend to mark something wanted-1.9, we will. That marking isn't going to make much difference in whether something gets fixed anyway.
Comment 21•17 years ago
|
||
Frankly, I think we should be blocking on this bug. It's got a patch; we just need to make sure it doesn't fall by the wayside and actually goes in.
Comment 22•17 years ago
|
||
(In reply to comment #20)
> Please don't ask this on every single bug we triage blocking1.9-. If we intend
> to mark something wanted-1.9, we will. That marking isn't going to make much
> difference in whether something gets fixed anyway.
I'm sorry, I didn't mean to spam; I was just a little puzzled because I thought
that usually when something is marked blocking1.9-, it is marked as
[wanted-1.9] because although it has been decided that it would not block the
release, it is still targeted for 1.9. Thank you for correcting me though.
Comment 23•17 years ago
|
||
(In reply to comment #21)
> Frankly, I think we should be blocking on this bug.
We've asked for a concise explanation and we haven't received one, and it's not clear that this a regression.
Comment 24•17 years ago
|
||
It's not a regression from 1.8, except insofar as more and more servers don't work with our code as they switch to UTF-8. And they're switching, because IE switched.
We really need a way to track "bugs that shouldn't block but that shouldn't miss the train just because a reviewer is being lame", which is where this bug could easily end up.
Comment 25•17 years ago
|
||
We've got a front end blocker depending on this, so I'll try to make sure it gets reviewed and landed.
Reporter | ||
Comment 26•17 years ago
|
||
Bug 387723 got a different fix. However, this should still be fixed in order to make "http://www.google.com/search?&q=Искусство" work just like "g Искусство".
Updated•17 years ago
|
Status: ASSIGNED → NEW
Updated•17 years ago
|
Attachment #285020 -
Flags: superreview?(cbiesinger)
Attachment #285020 -
Flags: superreview+
Attachment #285020 -
Flags: review?(cbiesinger)
Attachment #285020 -
Flags: review+
Reporter | ||
Comment 27•17 years ago
|
||
Comment on attachment 285020 [details] [diff] [review]
patch
Pretty late now, and I think this could wait until after 1.9 (now that bug 387723 got a different fix), but Boris thought this should be blocking, thus requesting approval.
Attachment #285020 -
Flags: approval1.9?
Comment 28•17 years ago
|
||
Comment on attachment 285020 [details] [diff] [review]
patch
a1.9=beltzner
Attachment #285020 -
Flags: approval1.9? → approval1.9+
Reporter | ||
Updated•17 years ago
|
Keywords: checkin-needed
Updated•17 years ago
|
Assignee: nobody → dao
Comment 29•17 years ago
|
||
Checking in docshell/base/nsDefaultURIFixup.cpp;
/cvsroot/mozilla/docshell/base/nsDefaultURIFixup.cpp,v <-- nsDefaultURIFixup.cpp
new revision: 1.52; previous revision: 1.51
done
Checking in docshell/base/nsDefaultURIFixup.h;
/cvsroot/mozilla/docshell/base/nsDefaultURIFixup.h,v <-- nsDefaultURIFixup.h
new revision: 1.12; previous revision: 1.11
done
Status: NEW → RESOLVED
Closed: 17 years ago → 17 years ago
Component: Networking → Embedding: Docshell
Keywords: checkin-needed
QA Contact: networking → docshell
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9beta5
Comment 30•17 years ago
|
||
See bug 432836
This actually breaks web services (like Yahoo Calendar) that don't handle UTF-8.
And we do NOT work like IE. Even with the IE8 preference turned on (UTF8), they still interpret native characters correctly. They don't convert them to UTF8.
If you have native characters, there is no way to convert them to an encoding that Yahoo Calendar understands, since escape makes them escaped unicode, encodeURIComponent makes them UTF-8, and the default is basically to call encodeURIComponent under the covers.
Comment 31•17 years ago
|
||
What happens with that site in IE?
Comment 32•17 years ago
|
||
All I can determine from IE8 is that the native characters stay native in the URL bar and appear correctly on the page.
I believe I have a better testcase in bug 432836.
I think the problem is going to be servers that use PHP urldecode. It does not handle UTF-8.
Comment 33•17 years ago
|
||
Boris/MConnor are you ok with backing this out to fix bug 432836?
Comment 34•17 years ago
|
||
Backed out for now. Think this is probably the right thing, but the change came a little too late for us to be comfortable shipping it now.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 35•17 years ago
|
||
I'm fine with backing it out, esp. since the information provided in this bug about IE behavior was incorrect...
Reporter | ||
Comment 36•17 years ago
|
||
The confusion about IE is unfortunate, but it's not like bug 432836 would be very clear either. Seems to have mixed up two issues and didn't convince me at all that we were doing something wrong or harmful. The lack of other regression reports since beta 5 is most notable, so I definitely think we could re-land this in the not-so-far future. I won't push it again, though, leaving it to biesi, Boris or whoever feels responsible what to do with this bug.
Assignee: dao → nobody
Status: REOPENED → NEW
Comment 37•17 years ago
|
||
We WERE doing something wrong though. I'm investigating this further on IE and I finally figured out what the UTF-8 preference does - as someone else guessed on IRC, it only affects the path part of the URL. it does NOT affect the query part. We were converting the entire URL.
For this change to be correct (or at least consistent with IE), it should have only converted the path to UTF8, not the entire query, and it should probably have a preference just like IE does to cover legacy servers that still use ASCII encoding for path part (The existing preference network.standard-url.encode-utf8 is not good enough since it encodes the entire URL)
If anyone wants to mess around with this, I've created:
http://www.kaply.com/work/äëïöü/?string=äëïöü
In IE, you can not navigate to this URL unless you have UTF-8 URLs turned on. But even if you have UTF-8 URLs turned on, the string part is NOT encoded to UTF-8.
In Firefox 2, you cannot navigate to this URL unless you turn on network.standard-url.encode-utf8 which then converts the query to UTF-8 incorrectly.
In Firefox 3 Beta 5, you can navigate to the URL but if converts the query to UTF-8 incorrectly.
So to reiterate - this patch was based on incorrect information about IE's behavior and changed the behavior to actually be incompatible to with IE.
It made more sense to back it out then to create a new third behavior that is different IE.
And incidentally RFC 2396 does NOT mandate UTF-8. To quote section 2.1:
It is expected that a systematic treatment of character encoding
within URI will be developed as a future modification of this
specification.
Reporter | ||
Comment 38•17 years ago
|
||
(In reply to comment #37)
> In Firefox 3 Beta 5, you can navigate to the URL but if converts the query to
> UTF-8 incorrectly.
You keep claiming it's incorrect, but I don't see why. The system charset can be anything, including UTF-8. Why is using that to do the encoding better for the server?
Comment 39•17 years ago
|
||
(In reply to comment #37)
> For this change to be correct (or at least consistent with IE), it should have
> only converted the path to UTF8, not the entire query, and it should probably
> have a preference just like IE does to cover legacy servers that still use
> ASCII encoding for path part
FWIW, that's the behaviour you'd get by backing out this patch.
Also note the network.standard-url.encode-query-utf8 pref (http://lxr.mozilla.org/seamonkey/source/modules/libpref/src/init/all.js#721)
Comment 40•17 years ago
|
||
oh right, the patch did get backed out. So yeah, what we do now is encode the path as UTF-8, the query in the platform charset.
Updated•17 years ago
|
Whiteboard: [not needed for 1.9]
Comment 42•11 years ago
|
||
I found Firefox always converts a URL in the URL bar to UTF-8 regardless whether the user has typed in the URL bar since Firefox 10(!)
I propose to obsolete FIXUP_FLAG_USE_UTF8 and just always use UTF-8 because:
- This is already shipped by accident in two years, but nobody (including :mkaply) complained about the "breakage" of legacy Web pages.
- At least Yahoo! Calendar migrated to UTF-8. Bug 432836 shows no other Web pages in the real world. An artificially crafted test page is not a good reason to keep the behavior.
- We do not invent yet another new behavior because Chrome already does. It is not also minor anymore. (Chrome even beats our market share according to some statistics.)
Comment 43•11 years ago
|
||
support this motion. Assuming the useragents were true, I have had numerous visits from mainland China on my website using Firefox that have incorrectly encoded the query params in GBK...
Assignee | ||
Comment 44•11 years ago
|
||
Also, Mac and most Linux configurations have been using UTF-8 here for a long time.
Assignee | ||
Comment 45•11 years ago
|
||
Assignee | ||
Comment 46•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Target Milestone: mozilla1.9beta5 → ---
Comment 48•11 years ago
|
||
(In reply to Masatoshi Kimura [:emk] from comment #42)
> I found Firefox always converts a URL in the URL bar to UTF-8 regardless
> whether the user has typed in the URL bar since Firefox 10(!)
Do we know what checkin caused this? Are we sure that it doesn't depend on prefs?
Comment 49•11 years ago
|
||
(In reply to Simon Montagu :smontagu from comment #48)
> (In reply to Masatoshi Kimura [:emk] from comment #42)
> > I found Firefox always converts a URL in the URL bar to UTF-8 regardless
> > whether the user has typed in the URL bar since Firefox 10(!)
>
> Do we know what checkin caused this? Are we sure that it doesn't depend on
> prefs?
That's <https://hg.mozilla.org/mozilla-central/rev/5fdf8f720c7c>. In other words, it has never worked as expected from the start.
Comment 50•11 years ago
|
||
> That's <https://hg.mozilla.org/mozilla-central/rev/5fdf8f720c7c>. In other
> words, it has never worked as expected from the start.
The patch had two problems.
1. In loadCurrent(), "this" will no longer point gURLBar. So this.valueIsTyped will always be undefined.
2. Even if 1. is fixed, |this.value = url;| will clear .valueIsTyped in the setter function.
https://mxr.mozilla.org/mozilla-central/source/browser/base/content/urlbarBindings.xml?rev=dfb4db3ac446&mark=285-285#285
So .valueIsTyped will always be false.
Assignee | ||
Updated•11 years ago
|
Attachment #8339136 -
Flags: review?(bzbarsky)
Comment 51•11 years ago
|
||
Comment on attachment 8339136 [details] [diff] [review]
Patch that addresses bitrot
Can we also nix FIXUP_FLAG_USE_UTF8 and its consumers, please?
Attachment #8339136 -
Flags: review?(bzbarsky) → review+
Assignee | ||
Comment 52•11 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #51)
Thanks for the r+.
https://hg.mozilla.org/integration/mozilla-inbound/rev/9b4c3cd28738
> Can we also nix FIXUP_FLAG_USE_UTF8 and its consumers, please?
Bug 945627.
Comment 53•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 17 years ago → 11 years ago
Resolution: --- → FIXED
Whiteboard: [not needed for 1.9]
Target Milestone: --- → mozilla28
status-firefox28:
--- → fixed
Whiteboard: [qa-]
You need to log in
before you can comment on or make changes to this bug.
Description
•