Closed
Bug 138780
Opened 23 years ago
Closed 23 years ago
Redirect with non-ASCII in URL fails
Categories
(Core :: Networking: HTTP, defect, P2)
VERIFIED
FIXED
mozilla1.0
People
(Reporter: Dan.Oscarsson, Assigned: darin.moz)
References
Details
(Keywords: intl, topembed, Whiteboard: [i18n] [fixed-trunk] [adt1])
Attachments
(1 file)
786 bytes, patch | andreas.otte: review+, rpotts: superreview+, asa: approval+
In Mozilla 1.0rc1 a redirect (HTTP 301) containing a URL with non-ASCII
characters fails due to the non-ASCII characters being removed.
This worked in version 0.9.8.
In my case, when I enter a URL in the location field with /tjänster (ISO 8859-1)
the server does a redirect to /tjänster/. But this fails and the web server
log shows that Mozilla requested the URL /tjster/ in response to the redirect.
So in the new code somewhere, non-ASCII characters are removed.
While you can argue that it is wrong for a web server to return
a URL in a header containing non-ASCII, the client should handle this in
a friendly manner. As HTTP is an 8-bit transport, there is no problem
sending headers using 8-bit characters.
In all cases where Mozilla detects non-ASCII character in URLs, Mozilla
should handle them, not delete them. It may convert them into UTF-8, but
not remove them. The standards for non-ASCII in URLs are far behind the
real world. Mozilla needs to read, write, send, receive and display
URLs containing non-ASCII in a user friendly manner.
Comment 1•23 years ago
how should I know, that this char is the same as the server thinks..?!? voting
for invalid.
Comment 2•23 years ago
> In all cases where Mozilla detects non-ASCII character in URLs, Mozilla
> should handle them
When we detect a non-ascii char, we assume that it is encoded in UTF-8, since
that's the standard non-ascii encoding for URIs. Your URI is encoded as
ISO-8859-1, not UTF8. In UTF8, an 8-bit-set char _must_ be followed by another
8-bit-set char. So we take the following char, discover it does not have the
8-th bit set, and the UTF-decoder discards both chars and goes on.
Summary: We _do_ handle non-ascii URLs but you have to properly encode them in
UTF-8.
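The behaviour described in this comment can be reproduced with a small sketch. This is not Mozilla's decoder; `lossy_utf8_decode` is a hypothetical, simplified function (it only understands 2-byte UTF-8 sequences) that drops a high-bit byte together with the byte after it when the pair is not valid UTF-8, as the comment describes.

```python
def lossy_utf8_decode(data: bytes) -> str:
    """Hypothetical lossy decoder, for illustration only.

    Keeps ASCII bytes, decodes valid 2-byte UTF-8 sequences, and discards
    any other high-bit byte together with the byte that follows it.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(chr(b))
            i += 1
            continue
        nxt = data[i + 1] if i + 1 < len(data) else None
        # A 2-byte UTF-8 sequence is a lead byte 0xC2..0xDF followed by a
        # continuation byte 0x80..0xBF.
        if 0xC2 <= b <= 0xDF and nxt is not None and 0x80 <= nxt <= 0xBF:
            out.append(bytes([b, nxt]).decode("utf-8"))
        # Otherwise both bytes are silently dropped.
        i += 2
    return "".join(out)


latin1_path = "/tjänster/".encode("iso-8859-1")  # ä is the single byte 0xE4
utf8_path = "/tjänster/".encode("utf-8")         # ä is the bytes 0xC3 0xA4

print(lossy_utf8_decode(latin1_path))  # /tjster/
print(lossy_utf8_decode(utf8_path))    # /tjänster/
```

With the ISO 8859-1 input, the byte 0xE4 and the following "n" are both discarded, which matches the /tjster/ request seen in the reporter's server log.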
Reporter
Comment 3•23 years ago
There is no standard encoding of non-ASCII yet, but there is a proposal.
If you look in:
http://www.w3.org/International/2002/draft-w3c-i18n-iri-00.txt
in section 4.6 you will find info about handling non-ASCII.
True, the proposed standard is UTF-8, but you should handle
older software sending other encodings.
I have changed my web server to send redirects using UTF-8 to see what
happens. In the location field I get:
In Mozilla 0.9.8 I get: /Tj%C3%A4nster/
In Netscape 4: /Tjänster/ (before change to UTF-8 I got /Tjänster/)
In Mozilla 1.0rc1: /Tj%E4nster/
As my web server can handle both raw and %-encoded UTF-8 as well as
local ISO 8859-1, all things do work.
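For reference, the two %-encoded forms listed above are the same path escaped from two different byte encodings: %C3%A4 is "ä" in UTF-8 and %E4 is "ä" in ISO 8859-1. A short illustration using Python's standard `urllib.parse.quote` (nothing here is Mozilla code):

```python
from urllib.parse import quote

path = "/Tjänster/"

# The same characters, %-encoded from two different byte encodings.
as_utf8 = quote(path, encoding="utf-8")
as_latin1 = quote(path, encoding="iso-8859-1")

print(as_utf8)    # /Tj%C3%A4nster/
print(as_latin1)  # /Tj%E4nster/
```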
Yes, Mozilla 1.0 does handle UTF-8 in redirects. The display is wrong.
In Netscape 4 the display was correct before switching to UTF-8.
At least MS IE 5 on Unix will do the same display as Netscape 4 does.
A redirect using ISO 8859-1 will work and be displayed correctly
in MS IE.
So while it is the right way to go by handling URLs internally as UTF-8
and expecting them to be sent/received as UTF-8, Mozilla should still
be able to handle URLs with non-ASCII not encoded using UTF-8 as
it will take time before old software is fixed.
Assignee
Updated•23 years ago
Whiteboard: [possible dupe of bug 138877]
Assignee
Comment 4•23 years ago
Dan: there is absolutely no way for mozilla to know for certain what charset the
non-ASCII characters belong to. this is a problem with redirects because unlike
links contained in a document, there is no charset context. and, fwiw, the HTTP
spec does not allow the transmission of non-ASCII characters in the raw. they
have to be properly escaped.
that said, mozilla could work around this problem by simply escaping what the
server failed to escape.
i believe my patch for bug 138877 might actually fix this problem. see bug
138877 comment #25 for details.
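A minimal sketch of the workaround suggested in this comment, escaping whatever the server left unescaped. This is an illustration, not the attached patch; `escape_unescaped_bytes` is a hypothetical helper. It works on raw bytes, so it needs no knowledge of the server's charset: every byte with the high bit set becomes %XX and everything else is left alone.

```python
def escape_unescaped_bytes(location: bytes) -> str:
    """Hypothetical helper: %-escape every non-ASCII byte of a Location value.

    ASCII bytes (including existing %XX escapes) are passed through as-is,
    so the byte sequence the server sent is preserved exactly.
    """
    return "".join(chr(b) if b < 0x80 else "%{:02X}".format(b) for b in location)


# A raw ISO 8859-1 redirect target, as in the original report.
print(escape_unescaped_bytes(b"/tj\xe4nster/"))      # /tj%E4nster/
# A raw UTF-8 redirect target is escaped byte by byte in the same way.
print(escape_unescaped_bytes(b"/tj\xc3\xa4nster/"))  # /tj%C3%A4nster/
```

Because the bytes are never decoded, the request that follows the redirect carries the same bytes the server sent, whatever charset it had in mind.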
Assignee
Comment 5•23 years ago
actually, after some more thought... i think this is what is needed.
Assignee
Comment 6•23 years ago
*** Bug 112305 has been marked as a duplicate of this bug. ***
Assignee
Updated•23 years ago
Status: NEW → ASSIGNED
Priority: -- → P2
Whiteboard: [possible dupe of bug 138877] → [i18n]
Target Milestone: --- → mozilla1.0
Comment 7•23 years ago
Comment on attachment 80614
v1 patch
looks good to me, r=andreas.otte@debitel.net
Attachment #80614 - Flags: review+
Comment 8•23 years ago
Attachment #80614 - Flags: superreview+
Assignee
Comment 9•23 years ago
fixed-on-trunk
gagan: i think we should consider this one for the branch.
Keywords: adt1.0.0
Whiteboard: [i18n] → [i18n] [fixed-trunk]
Comment 10•23 years ago
This sounds like a pretty bad regression, nsbeta1+/adt1.
Comment 11•23 years ago
Comment on attachment 80614
v1 patch
a=asa (on behalf of drivers) for checkin to the 1.0 branch
Attachment #80614 - Flags: approval+
Comment 12•23 years ago
raising impact and adding adt1.0.0+. Please check this into the branch as soon
as possible and add the fixed1.0.0 keyword.
Assignee
Comment 14•23 years ago
marking FIXED
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 15•23 years ago
cc'ing benc
Comment 16•23 years ago
Tom and I were talking about this... one question: would the fix here fix the
problem originally stated? (thinks: no)...
Assignee
Comment 17•23 years ago
benc: yes, it should fix the problem as originally stated.
Reporter
Comment 18•23 years ago
Yes, I have checked and it does work.
Though I think it will have to be reworked soon.
When I tried it I first had my patch in my web server for
doing redirects using UTF-8. This resulted in
URLs containing mixed UTF-8 and ISO 8859-1 (my local character set)
due to the new URI handling code.
This works in my web server but I suspect other servers are
not that advanced.
Going back to doing redirects using ISO 8859-1 works
as before.
There are several difficult areas now that we are switching
to UTF-8 on the protocol level. And there are several problems
with the current URI handling in Mozilla.
I think I will create a new bug report and try to
describe what the problems are and what needs to be
done. But it is not easy to find all places in
the code where URIs are handled.
Assignee
Comment 19•23 years ago
dan: can you simply provide a live testcase?
Reporter
Comment 20•23 years ago
Sorry, I have no web server on the Internet so I cannot give a live
testcase. If it is the case with mixed UTF-8 and ISO 8859-1, I can
explain why it can happen:
From what I can see, Mozilla now stores URIs internally using
the local character set instead of always using UTF-8.
When I enter a URL in the location field, it is internally
stored as ISO 8859-1 (which is my local character set).
If you get a redirect giving a UTF-8 encoded URI, it is %-encoded
and stored as %-encoded UTF-8 in the URI and displayed %-encoded
in the location field. If I then add another path segment to the displayed
URI with non-ASCII, that segment ends up as %-encoded ISO 8859-1.
This way I get a URI with both UTF-8 and ISO 8859-1 in it.
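The mixed-encoding situation described here can be illustrated as follows. This is only a sketch: the appended segment "över" is a made-up example, and the code models the outcome rather than Mozilla's internal URI handling.

```python
from urllib.parse import quote, unquote_to_bytes

# Path received from a UTF-8 redirect, kept %-encoded.
redirected = quote("/Tjänster/", encoding="utf-8")
# Segment typed by the user afterwards, escaped from ISO 8859-1
# ("över" is a hypothetical segment name).
appended = quote("över", encoding="iso-8859-1")

mixed = redirected + appended
raw = unquote_to_bytes(mixed)


def try_decode(data: bytes, charset: str):
    """Return the decoded text, or None if the bytes are invalid in charset."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


print(mixed)                          # /Tj%C3%A4nster/%F6ver
print(try_decode(raw, "utf-8"))       # None: 0xF6 is not valid UTF-8 here
print(try_decode(raw, "iso-8859-1"))  # /TjÃ¤nster/över
```

No single charset decodes the whole URI correctly: UTF-8 rejects the second segment, and ISO 8859-1 garbles the first.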
Most of these problems would go away if URIs were always (or at least
as often as possible) stored internally as UTF-8 strings and
were only converted to other character sets, with or without %-encoding,
where needed. For example when doing the HTTP call, the URI
could be converted into the local character set or UTF-8 depending
on user preferences (or identified web server preferences).
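One way to picture this proposal is a single internal representation with conversion only at the protocol boundary. The sketch below is an assumption about how that could look, not an existing Mozilla interface; `to_wire` and its parameters are hypothetical names.

```python
from urllib.parse import quote


def to_wire(path: str, charset: str = "utf-8", escape: bool = True) -> bytes:
    """Hypothetical boundary conversion for an internally stored Unicode path.

    charset: encoding chosen per user or server preference.
    escape:  send %-encoded ASCII if True, raw 8-bit bytes otherwise.
    """
    if escape:
        # quote() encodes with the given charset, then %-escapes the bytes.
        return quote(path, encoding=charset).encode("ascii")
    return path.encode(charset)


internal = "/Tjänster/"  # one internal form, whatever the source was

print(to_wire(internal))                                   # b'/Tj%C3%A4nster/'
print(to_wire(internal, charset="iso-8859-1"))             # b'/Tj%E4nster/'
print(to_wire(internal, "iso-8859-1", escape=False))       # b'/Tj\xe4nster/'
```

Because every segment passes through the same conversion at send time, a URI built from a redirect plus typed input cannot end up with two encodings in it.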
Assignee
Comment 21•23 years ago
Dan: thanks for the additional information ... i think the bug you are now
describing is a bit different than the original bug. can you file a separate
bug on the mixed encodings issue... please assign it to internationalization. thx!
nhotta: see Dan's previous comment.
Comment 22•23 years ago
verified trunk and branch, 05/28/02 builds, winNT4, linux rh6, mac osX
Status: RESOLVED → VERIFIED
Keywords: verified1.0.0