Closed Bug 138780 Opened 23 years ago Closed 23 years ago

Redirect with non-ASCII in URL fails

Categories

(Core :: Networking: HTTP, defect, P2)

Sun
SunOS
defect

Tracking

()

VERIFIED FIXED
mozilla1.0

People

(Reporter: Dan.Oscarsson, Assigned: darin.moz)

References

Details

(Keywords: intl, topembed, Whiteboard: [i18n] [fixed-trunk] [adt1])

Attachments

(1 file)

In Mozilla 1.0rc1 a redirect (HTTP 301) containing a URL with non-ASCII characters fails because the non-ASCII characters are removed. This worked in version 0.9.8. In my case, when I enter a URL in the location field with /tjänster (ISO 8859-1), the server does a redirect to /tjänster/. But this fails, and the web server log shows that Mozilla requested the URL /tjster/ in response to the redirect. So somewhere in the new code, non-ASCII characters are removed. While you can argue that it is wrong for a web server to return a URL in a header containing non-ASCII, the client should handle this in a friendly manner. As HTTP is an 8-bit transport, there is no problem sending headers using 8-bit characters. In all cases where Mozilla detects non-ASCII characters in URLs, Mozilla should handle them, not delete them. It may convert them into UTF-8, but not remove them. The standards for non-ASCII in URLs are far behind the real world. Mozilla needs to read, write, send, receive and display URLs containing non-ASCII in a user-friendly manner.
how should I know, that this char is the same as the server thinks..?!? voting for invalid.
> In all cases where Mozilla detects non-ASCII character in URLs, Mozilla
> should handle them

When we detect a non-ASCII char, we assume that it is encoded in UTF-8, since that's the standard non-ASCII encoding for URIs. Your URI is encoded as ISO-8859-1, not UTF-8. In UTF-8, an 8-bit-set char _must_ be followed by another 8-bit-set char. So we take the following char, discover it does not have the 8th bit set, and the UTF-8 decoder discards both chars and goes on.

Summary: We _do_ handle non-ASCII URLs, but you have to properly encode them in UTF-8.
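The decoding behaviour described above can be illustrated with a small Python sketch. This is not Mozilla's code: `lossy_utf8_decode` is a hypothetical helper that only mimics the described rule (a byte with the high bit set must be followed by continuation bytes, otherwise the lead byte and the offending byte are both dropped). Under that assumption it reproduces the /tjster/ request seen in the server log.

```python
def lossy_utf8_decode(data: bytes) -> str:
    """Hypothetical decoder: on a malformed sequence, drop the lead byte
    and the byte that failed the continuation check, then carry on."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # plain ASCII passes through
            out.append(chr(b))
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:             # lead byte of a 2-byte sequence
            need = 1
        elif 0xE0 <= b <= 0xEF:           # lead byte of a 3-byte sequence
            need = 2
        elif 0xF0 <= b <= 0xF4:           # lead byte of a 4-byte sequence
            need = 3
        else:                             # stray continuation or invalid lead
            i += 1
            continue
        j = i + 1
        while j <= i + need and j < n and 0x80 <= data[j] <= 0xBF:
            j += 1
        if j == i + need + 1:             # all continuation bytes were present
            out.append(data[i:j].decode("utf-8", errors="ignore"))
            i = j
        else:                             # discard lead byte and offending byte
            i = j + 1
    return "".join(out)


latin1_path = "/tjänster/".encode("iso-8859-1")   # b"/tj\xe4nster/"
utf8_path = "/tjänster/".encode("utf-8")          # b"/tj\xc3\xa4nster/"

print(lossy_utf8_decode(latin1_path))   # the "ä" and the "n" after it are lost
print(lossy_utf8_decode(utf8_path))     # decodes cleanly
```

In ISO 8859-1, "ä" is the single byte 0xE4, which a UTF-8 decoder reads as the start of a multi-byte sequence; the following "n" is plain ASCII, so the sequence is malformed.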
There is no standard encoding of non-ASCII yet, but there is a proposal. If you look in: http://www.w3.org/International/2002/draft-w3c-i18n-iri-00.txt in section 4.6 you will find info about handling non-ASCII. True, the proposed standard is UTF-8, but you should handle older software sending other encodings.

I have changed my web server to send redirects using UTF-8 to see what happens. In the location field I get:
In Mozilla 0.9.8: /Tj%C3%A4nster/
In Netscape 4: /Tjänster/ (before change to UTF-8 I got /Tjänster/)
In Mozilla 1.0rc1: /Tj%E4nster/

As my web server can handle both raw and %-encoded UTF-8 as well as local ISO 8859-1, all things do work. Yes, Mozilla 1.0 does handle UTF-8 in redirects. The display is wrong. In Netscape 4 the display was correct before switching to UTF-8. At least MS IE 5 on Unix will do the same display as Netscape 4 does. A redirect using ISO 8859-1 will work and be displayed correctly in MS IE.

So while it is the right way to go to handle URLs internally as UTF-8 and expect them to be sent/received as UTF-8, Mozilla should still be able to handle URLs with non-ASCII not encoded using UTF-8, as it will take time before old software is fixed.
Whiteboard: [possible dupe of bug 138877]
Dan: there is absolutely no way for mozilla to know for certain what charset the non-ASCII characters belong to. this is a problem with redirects because unlike links contained in a document, there is no charset context. and, fwiw, the HTTP spec does not allow the transmission of non-ASCII characters in the raw. they have to be properly escaped. that said, mozilla could work around this problem by simply escaping what the server failed to escape. i believe my patch for bug 138877 might actually fix this problem. see bug 138877 comment #25 for details.
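As a rough illustration of the workaround suggested above (escaping what the server failed to escape), the following Python sketch percent-escapes every byte with the high bit set and leaves ASCII alone. It is not the attached patch; `escape_non_ascii` is a hypothetical name, and the sketch assumes the Location value is available as raw bytes. Because it works byte by byte, it needs no knowledge of the charset.

```python
def escape_non_ascii(raw: bytes) -> str:
    """Percent-escape bytes >= 0x80; pass ASCII bytes through unchanged."""
    return "".join(chr(b) if b < 0x80 else "%{:02X}".format(b) for b in raw)


# Location path sent raw in ISO 8859-1 and in UTF-8 respectively
print(escape_non_ascii(b"/tj\xe4nster/"))       # /tj%E4nster/
print(escape_non_ascii(b"/tj\xc3\xa4nster/"))   # /tj%C3%A4nster/
```

The escaped bytes go back to the server exactly as it sent them, so the server sees its own encoding regardless of what that encoding was.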
Attached patch v1 patch
actually, after some more thought... i think this is what is needed.
*** Bug 112305 has been marked as a duplicate of this bug. ***
Status: NEW → ASSIGNED
Priority: -- → P2
Whiteboard: [possible dupe of bug 138877] → [i18n]
Target Milestone: --- → mozilla1.0
Attachment #80614 - Flags: review+
fixed-on-trunk gagan: i think we should consider this one for the branch.
Keywords: adt1.0.0
Whiteboard: [i18n] → [i18n] [fixed-trunk]
This sounds like a pretty bad regression, nsbeta1+/adt1.
Keywords: nsbeta1 → nsbeta1+
Whiteboard: [i18n] [fixed-trunk] → [i18n] [fixed-trunk] [adt2]
Keywords: intl
Comment on attachment 80614 [details] [diff] [review] v1 patch a=asa (on behalf of drivers) for checkin to the 1.0 branch
Attachment #80614 - Flags: approval+
raising impact and adding adt1.0.0+. Please check this into the branch as soon as possible and add the fixed1.0.0 keyword.
Keywords: adt1.0.0 → adt1.0.0+
Whiteboard: [i18n] [fixed-trunk] [adt2] → [i18n] [fixed-trunk] [adt1]
fixed-on-branch
Keywords: fixed1.0.0
marking FIXED
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
cc'ing benc
Tom and I were talking about this... one question: would the fix here fix the problem originally stated? (thinks: no)...
benc: yes, it should fix the problem as originally stated.
Yes, I have checked and it does work. Though I think it will have to be reworked soon. When I tried it, I first had my patch in my web server for doing redirects using UTF-8. This resulted in URLs containing mixed UTF-8 and ISO 8859-1 (my local character set) due to the new URI handling code. This works in my web server, but I suspect other servers are not that advanced. Going back to doing redirects using ISO 8859-1 works as before. There are several difficult areas now that we are switching to UTF-8 on the protocol level. And there are several problems with the current URI handling in Mozilla. I think I will create a new bug report and try to describe what the problems are and what needs to be done. But it is not easy to find all places in the code where URIs are handled.
dan: can you simply provide a live testcase?
Sorry, I have no web server on the Internet so I cannot give a live testcase. If it is the case with mixed UTF-8 and ISO 8859-1, I can explain why it can happen: From what I can see, Mozilla now stores URIs internally using the local character set instead of always using UTF-8. When I enter a URL in the location field, it is internally stored as ISO 8859-1 (which is my local character set). If you get a redirect giving a UTF-8 encoded URI, it is %-encoded and stored as %-encoded UTF-8 in the URI and displayed %-encoded in the location field. If I then add another path segment with non-ASCII to the displayed URI, that segment ends up as %-encoded ISO 8859-1. This way I get a URI with both UTF-8 and ISO 8859-1 in it. Most of these problems would go away if URIs were always (or at least as often as possible) stored internally as UTF-8 strings and were only converted to other character sets, with or without %-encoding, where needed. For example, when doing the HTTP call, the URI could be converted into the local character set or UTF-8 depending on the user's preferences (or identified web server preferences).
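The mixed-encoding situation described above can be reproduced outside the browser. The Python sketch below builds a path whose first segment is %-encoded UTF-8 (as stored after the redirect) and whose second segment is %-encoded ISO 8859-1 (as typed into the location field), then checks each segment. The second segment name "företag" and the helper `segment_encoding` are made up for illustration; only the standard `urllib.parse` functions are real.

```python
from urllib.parse import quote, unquote_to_bytes


def segment_encoding(segment: str) -> str:
    """Classify a %-encoded path segment by whether its bytes are valid UTF-8."""
    raw = unquote_to_bytes(segment)
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8 (possibly ISO 8859-1)"


# Segment as stored after a UTF-8 redirect
redirected = "/" + quote("Tjänster", encoding="utf-8") + "/"
# Hypothetical segment added by hand and escaped in the local charset
typed = quote("företag", encoding="iso-8859-1")

mixed = redirected + typed
print(mixed)
for seg in mixed.split("/"):
    if seg:
        print(seg, "->", segment_encoding(seg))
```

A server that expects a single encoding for the whole path cannot decode such a URI consistently, which is why storing URIs internally in one encoding avoids the problem.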
Dan: thanks for the additional information ... i think the bug you are now describing is a bit different than the original bug. can you file a separate bug on the mixed encodings issue... please assign it to internationalization. thx! nhotta: see Dan's previous comment.
verified trunk and branch, 05/28/02 builds, winNT4, linux rh6, mac osX
Status: RESOLVED → VERIFIED
Keywords: verified1.0.0
forgot to remove fixed1.0.0 keyword so doing so now
Keywords: fixed1.0.0