Closed Bug 138780 Opened 23 years ago Closed 23 years ago

Redirect with non-ASCII in URL fails

Status:

VERIFIED FIXED

Milestone:

mozilla1.0

People

(Reporter: Dan.Oscarsson, Assigned: darin.moz)

Details

(Keywords: intl, topembed, Whiteboard: [i18n] [fixed-trunk] [adt1])

Attachments

(1 file)

v1 patch, 23 years ago, Darin Fisher, 786 bytes, patch (andreas.otte: review+, rpotts: superreview+, asa: approval+)

Dan.Oscarsson

Reporter

Description

•

23 years ago

In Mozilla 1.0rc1 a redirect (HTTP 301) containing a URL with non-ASCII characters fails because the non-ASCII characters are removed. This worked in version 0.9.8.

In my case, when I enter a URL in the location field with /tjänster (ISO 8859-1), the server redirects to /tjänster/. But this fails, and the web server log shows that Mozilla requested the URL /tjster/ in response to the redirect. So somewhere in the new code, non-ASCII characters are removed.

While you can argue that it is wrong for a web server to return a URL in a header containing non-ASCII, the client should handle this in a friendly manner. As HTTP is an 8-bit transport, there is no problem sending headers using 8-bit characters. In all cases where Mozilla detects non-ASCII characters in URLs, Mozilla should handle them, not delete them. It may convert them into UTF-8, but not remove them.

The standards for non-ASCII are far behind the real world. Mozilla needs to read, write, send, receive and display URLs containing non-ASCII in a user-friendly manner.

Kai Lahmann (is there, where MNG is)

Comment 1

•

23 years ago

How should I know that this char is the same one the server thinks it is? Voting for invalid.

Boris Zbarsky [:bzbarsky]

Comment 2

•

23 years ago

> In all cases where Mozilla detects non-ASCII characters in URLs, Mozilla
> should handle them

When we detect a non-ascii char, we assume that it is encoded in UTF-8, since that's the standard non-ascii encoding for URIs. Your URI is encoded as ISO-8859-1, not UTF-8. In UTF-8, an 8-bit-set char _must_ be followed by another 8-bit-set char. So we take the following char, discover it does not have the 8th bit set, and the UTF decoder discards both chars and goes on.

Summary: We _do_ handle non-ascii URLs but you have to properly encode them in UTF-8.
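The decoding behaviour described in this comment can be reproduced with a toy decoder. This is only a sketch under stated assumptions: the function name `lossy_utf8_decode` is hypothetical, it handles ASCII and two-byte sequences only, and it is not Mozilla's actual decoder. It does show how an ISO 8859-1 "ä" (byte 0xE4) followed by "n" turns /tjänster/ into the /tjster/ seen in the server log.

```python
def lossy_utf8_decode(data: bytes) -> str:
    """Toy decoder (not Mozilla's code): ASCII and two-byte UTF-8 only.

    A byte with the high bit set always consumes the byte after it.
    If the pair is not a valid two-byte UTF-8 sequence, both are dropped.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(chr(b))
            i += 1
            continue
        nxt = data[i + 1] if i + 1 < len(data) else None
        if 0xC2 <= b <= 0xDF and nxt is not None and 0x80 <= nxt <= 0xBF:
            # Valid two-byte sequence: combine the payload bits.
            out.append(chr(((b & 0x1F) << 6) | (nxt & 0x3F)))
        # Valid or not, two bytes have been consumed.
        i += 2
    return "".join(out)


latin1_redirect = "/tjänster/".encode("latin-1")  # b"/tj\xe4nster/"
utf8_redirect = "/tjänster/".encode("utf-8")      # b"/tj\xc3\xa4nster/"

print(lossy_utf8_decode(latin1_redirect))
print(lossy_utf8_decode(utf8_redirect))
```

With the ISO 8859-1 bytes, both the 0xE4 byte and the following "n" disappear; with the UTF-8 bytes, the path survives intact.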

Dan.Oscarsson

Reporter

Comment 3

•

23 years ago

There is no standard encoding of non-ASCII yet, but there is a proposal. If you look at http://www.w3.org/International/2002/draft-w3c-i18n-iri-00.txt, in section 4.6 you will find info about handling non-ASCII. True, the proposed standard is UTF-8, but you should handle older software sending other encodings.

I have changed my web server to send redirects using UTF-8 to see what happens. In the location field I get:

Mozilla 0.9.8: /Tj%C3%A4nster/
Netscape 4: /TjÃ¤nster/ (before the change to UTF-8 I got /Tjänster/)
Mozilla 1.0rc1: /Tj%E4nster/

As my web server can handle both raw and %-encoded UTF-8 as well as local ISO 8859-1, all of these work. Yes, Mozilla 1.0 does handle UTF-8 in redirects. The display is wrong. In Netscape 4 the display was correct before switching to UTF-8. At least MS IE 5 on Unix shows the same display as Netscape 4. A redirect using ISO 8859-1 works and is displayed correctly in MS IE.

So while handling URLs internally as UTF-8 and expecting them to be sent/received as UTF-8 is the right way to go, Mozilla should still be able to handle URLs with non-ASCII not encoded using UTF-8, as it will take time before old software is fixed.
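For reference, the three forms of the path quoted in this comment correspond to two different byte encodings of the same "ä". The following is a small sketch using Python's standard `urllib.parse.quote`; it only illustrates how the strings relate to each other and makes no claim about which code path in each browser produced them.

```python
from urllib.parse import quote

path = "/Tjänster/"

# "ä" is two bytes (C3 A4) in UTF-8 and one byte (E4) in ISO 8859-1.
utf8_escaped = quote(path, encoding="utf-8")
latin1_escaped = quote(path, encoding="latin-1")

# Raw UTF-8 bytes shown as if they were ISO 8859-1 give the garbled form.
mojibake = path.encode("utf-8").decode("latin-1")

print(utf8_escaped)    # /Tj%C3%A4nster/
print(latin1_escaped)  # /Tj%E4nster/
print(mojibake)        # /TjÃ¤nster/
```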

Darin Fisher

Assignee

Updated

•

23 years ago

Whiteboard: [possible dupe of bug 138877]

Darin Fisher

Assignee

Comment 4

•

23 years ago

Dan: there is absolutely no way for mozilla to know for certain what charset the non-ASCII characters belong to. this is a problem with redirects because unlike links contained in a document, there is no charset context. and, fwiw, the HTTP spec does not allow the transmission of non-ASCII characters in the raw. they have to be properly escaped. that said, mozilla could work around this problem by simply escaping what the server failed to escape. i believe my patch for bug 138877 might actually fix this problem. see bug 138877 comment #25 for details.
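The workaround of "escaping what the server failed to escape" can be sketched as follows. This is an illustration only, not the attached patch (whose contents are not shown here); the function name `escape_non_ascii` is hypothetical. It percent-escapes every byte with the high bit set and leaves everything else, including existing %-escapes, untouched, so no charset guess is needed.

```python
def escape_non_ascii(location: bytes) -> str:
    """Percent-escape every byte >= 0x80; pass ASCII bytes through as-is.

    Hypothetical helper: works on the raw header bytes, so it does not
    need to know which charset the server used.
    """
    return "".join(
        f"%{b:02X}" if b >= 0x80 else chr(b)
        for b in location
    )


# Raw ISO 8859-1 redirect target, as in the original report.
print(escape_non_ascii(b"/tj\xe4nster/"))    # /tj%E4nster/
# Already-escaped input is left alone.
print(escape_non_ascii(b"/Tj%C3%A4nster/"))  # /Tj%C3%A4nster/
```

Because the bytes are preserved exactly, the server receives back the same octets it sent, whatever encoding it had in mind.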

Darin Fisher

Assignee

Comment 5

•

23 years ago

Attached patch: v1 patch

actually, after some more thought... i think this is what is needed.

Darin Fisher

Assignee

Comment 6

•

23 years ago

*** Bug 112305 has been marked as a duplicate of this bug. ***

Darin Fisher

Assignee

Updated

•

23 years ago

Status: NEW → ASSIGNED

Keywords: mozilla1.0, nsbeta1, topembed

Priority: -- → P2

Whiteboard: [possible dupe of bug 138877] → [i18n]

Target Milestone: --- → mozilla1.0

Andreas Otte

Comment 7

•

23 years ago

Comment on attachment 80614 (v1 patch): looks good to me, r=andreas.otte@debitel.net

Attachment #80614 - Flags: review+

rpotts (gone)

Comment 8

•

23 years ago

Comment on attachment 80614 (v1 patch): sr=rpotts@netscape.com

Attachment #80614 - Flags: superreview+

Darin Fisher

Assignee

Comment 9

•

23 years ago

fixed-on-trunk gagan: i think we should consider this one for the branch.

Keywords: adt1.0.0

Whiteboard: [i18n] → [i18n] [fixed-trunk]

Jaime Rodriguez, Jr.

Comment 10

•

23 years ago

This sounds like a pretty bad regression, nsbeta1+/adt1.

Keywords: nsbeta1 → nsbeta1+

Whiteboard: [i18n] [fixed-trunk] → [i18n] [fixed-trunk] [adt2]

Frank Tang

Updated

•

23 years ago

Keywords: intl

Asa Dotzler [:asa]

Comment 11

•

23 years ago

Comment on attachment 80614 (v1 patch): a=asa (on behalf of drivers) for checkin to the 1.0 branch

Attachment #80614 - Flags: approval+

scottputterman

Comment 12

•

23 years ago

raising impact and adding adt1.0.0+. Please check this into the branch as soon as possible and add the fixed1.0.0 keyword.

Keywords: adt1.0.0 → adt1.0.0+

Whiteboard: [i18n] [fixed-trunk] [adt2] → [i18n] [fixed-trunk] [adt1]

Darin Fisher

Assignee

Comment 13

•

23 years ago

fixed-on-branch

Keywords: fixed1.0.0

Darin Fisher

Assignee

Comment 14

•

23 years ago

marking FIXED

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Tom Everingham

Comment 15

•

23 years ago

cc'ing benc

benc

Comment 16

•

23 years ago

Tom and I were talking about this... one question: would the fix here fix the problem originally stated? (thinks: no)...

Darin Fisher

Assignee

Comment 17

•

23 years ago

benc: yes, it should fix the problem as originally stated.

Dan.Oscarsson

Reporter

Comment 18

•

23 years ago

Yes, I have checked and it does work. Though I think it will have to be reworked soon. When I tried it, I first had my patch in my web server for doing redirects using UTF-8. This resulted in URLs containing mixed UTF-8 and ISO 8859-1 (my local character set) due to the new URI handling code. This works in my web server, but I suspect other servers are not that advanced. Going back to doing redirects using ISO 8859-1 works as before.

There are several difficult areas now that we are switching to UTF-8 on the protocol level. And there are several problems with the current URI handling in Mozilla. I think I will create a new bug report and try to describe what the problems are and what needs to be done. But it is not easy to find all the places in the code where URIs are handled.

Darin Fisher

Assignee

Comment 19

•

23 years ago

dan: can you simply provide a live testcase?

Dan.Oscarsson

Reporter

Comment 20

•

23 years ago

Sorry, I have no web server on the Internet so I cannot give a live testcase. If it is the case with mixed UTF-8 and ISO 8859-1, I can explain why it can happen:

From what I can see, Mozilla now stores URIs internally using the local character set instead of always using UTF-8. When I enter a URL in the location field, it is internally stored as ISO 8859-1 (which is my local character set). If you get a redirect giving a UTF-8 encoded URI, it is %-encoded and stored as %-encoded UTF-8 in the URI and displayed %-encoded in the location field. If I then add another path segment with non-ASCII to the displayed URI, that segment ends up as %-encoded ISO 8859-1. This way I get a URI containing both UTF-8 and ISO 8859-1.

Most of these problems would go away if URIs were always (or at least as often as possible) stored internally as UTF-8 strings and were only converted to other character sets, with or without %-encoding, where needed. For example, when doing the HTTP call, the URI could be converted into the local character set or UTF-8 depending on the user's preferences (or identified web server preferences).
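The mixed-encoding situation described in this comment can be reproduced in a few lines. This is a sketch under assumptions: the extra segment "blåbär" and the helper name `decodes_as_utf8` are made up for illustration, and the code models only the resulting URI string, not Mozilla's internals.

```python
from urllib.parse import quote, unquote_to_bytes

# Path as it arrives from a UTF-8 redirect, stored %-encoded.
redirected = quote("/Tjänster/", encoding="utf-8")

# Hypothetical segment typed by the user, escaped in the local charset.
typed_segment = quote("blåbär", encoding="latin-1")

mixed = redirected + typed_segment
print(mixed)  # /Tj%C3%A4nster/bl%E5b%E4r


def decodes_as_utf8(uri: str) -> bool:
    """Return True if the unescaped bytes of the URI are valid UTF-8."""
    try:
        unquote_to_bytes(uri).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(redirected))  # True
print(decodes_as_utf8(mixed))       # False
```

The mixed URI is not valid UTF-8 as a whole, and reading it as ISO 8859-1 garbles the first segment, so a server has to handle each segment separately to make sense of it.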

Darin Fisher

Assignee

Comment 21

•

23 years ago

Dan: thanks for the additional information ... i think the bug you are now describing is a bit different than the original bug. can you file a separate bug on the mixed encodings issue... please assign it to internationalization. thx! nhotta: see Dan's previous comment.

Tom Everingham

Comment 22

•

23 years ago

verified trunk and branch, 05/28/02 builds, winNT4, linux rh6, mac osX

Status: RESOLVED → VERIFIED

Keywords: verified1.0.0

benc

Comment 23

•

23 years ago

forgot to remove fixed1.0.0 keyword so doing so now

Keywords: fixed1.0.0
