Closed Bug 138951 Opened 22 years ago Closed 21 years ago

URLs are displayed using %-encoding when they should not

Categories

(SeaMonkey :: Location Bar, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 105909

People

(Reporter: Dan.Oscarsson, Assigned: hewitt)

Details

URL Bar is the nearest component to the URL input field that I could find; if
they are not the same, please move this bug to a different component.

In Mozilla 1.0rc1 you are getting closer to handling non-ASCII in URLs.

In Mozilla 0.9.8, when I entered the URL /Tjänster in the location field,
it was changed to: /Tj%C3%A4nster/.
In 1.0rc1 it is changed to: /Tj%E4nster/
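
The two forms above are the same path %-encoded from two different byte encodings. A minimal Python sketch (not Mozilla code, just the standard library's urllib.parse.quote) that reproduces both:

```python
from urllib.parse import quote

path = "/Tjänster/"

# %-encode the UTF-8 bytes of the path (what 0.9.8 displayed).
utf8_form = quote(path, encoding="utf-8")

# %-encode the ISO 8859-1 bytes of the path (what 1.0rc1 displays).
latin1_form = quote(path, encoding="iso-8859-1")

print(utf8_form)    # /Tj%C3%A4nster/
print(latin1_form)  # /Tj%E4nster/
```

The letter "ä" is two bytes (C3 A4) in UTF-8 and one byte (E4) in ISO 8859-1, which is why the escaped forms differ.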

I am guessing this is because my local character set is ISO 8859-1.
As the server is doing a redirect using a UTF-8 URL (1.0rc1 no longer
handles redirects containing ISO 8859-1), it looks like Mozilla now understands
that my local character set is ISO 8859-1.
This is fine!

But the URL should not be displayed using %-encoding when the characters
can be displayed as themselves.
The above URL should be displayed as /Tjänster/
not as a %-encoding of the URL in local character set.
My understanding of RFC 1738 is that this bug is INVALID.

pi
That RFC doesn't really have much to say about _display_ of URLs....
OS: SunOS → All
Hardware: Sun → All
"2.2. URL Character Encoding Issues" says:

"URLs are written only with the graphic printable characters of the US-ASCII
coded character set. The octets 80-FF hexadecimal are not used in US-ASCII, and
the octets 00-1F and 7F hexadecimal represent control characters; these must be
encoded."

So I think this is what Mozilla should do.

pi
Boris: darin has been working on implementing iDNS, which permits UTF-8 URLs,
AIUI.  Allowing UTF-8 characters in the URLbar should be part of this...
Have a look in:
http://www.w3.org/International/2002/draft-w3c-i18n-iri-00.txt

While a "URL" is said to be ASCII only, the "IRI" described above is the
name for a URL in an international context.
The most important thing for users is that a URL is displayed using
the available characters instead of %-encoding everything not ASCII.
People want to see things using their own letters.

All software should hide protocol details (like %-encoding) from
the user, if possible.

You can also see bug id: 105909 which was entered before
you did your redesign on URI-handling.
fixing this bug is definitely a noble goal IMO.  it is extremely tricky,
however, because there is no guarantee that non-ASCII bytes in an URL string
correspond to any charset at all.  moreover, some of the bytes may correspond to
a charset and some may not... it is impossible to know for sure.

nsIURI::originCharset hints to the charset of the unescaped URL string.  it may
not be correct though.  we basically need some sort of decoder that will
preserve the % escape sequences for characters that do not decode correctly.
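
A rough sketch of such a decoder, in Python rather than Mozilla's C++. The function name display_unescape and its charset parameter (standing in for the originCharset hint) are hypothetical. It decodes only escaped non-ASCII bytes that form a printable character in the hinted charset, and leaves every other escape exactly as written:

```python
import re

# One or more consecutive %XX escapes.
_ESCAPED_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

def display_unescape(url, charset="utf-8"):
    """Unescape a URL for display, keeping escapes that do not decode."""
    def decode_run(match):
        src = match.group(0)
        raw = bytes.fromhex(src.replace("%", ""))
        out = []
        i = 0
        while i < len(raw):
            decoded = None
            # Escaped ASCII (e.g. %2F, %20) is left alone: unescaping it
            # could change the meaning of the URL.
            if raw[i] >= 0x80:
                for width in range(1, 5):
                    chunk = raw[i:i + width]
                    try:
                        text = chunk.decode(charset)
                    except UnicodeDecodeError:
                        continue
                    if len(text) == 1 and text.isprintable():
                        decoded = (text, len(chunk))
                    break
            if decoded:
                out.append(decoded[0])
                i += decoded[1]
            else:
                # Keep the original escape for this byte.
                out.append(src[3 * i:3 * i + 3])
                i += 1
        return "".join(out)

    return _ESCAPED_RUN.sub(decode_run, url)
```

With the UTF-8 hint, "/Tj%C3%A4nster/" becomes "/Tjänster/" while "/Tj%E4nster/" is left untouched; with an ISO 8859-1 hint the second form decodes as well. This does not solve the problem that the hint itself may be wrong.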
Yes, you need a special decoder/encoder for displaying URLs.
There is both the possibly ACE-encoded host name part and then the
%-encoded path part.

When displaying, all characters that the local locale (isprint()) says are
printable should be displayed as characters; the others should be displayed
%-encoded in the non-hostname part. If the host name contains characters
that are not displayable, the host name needs to be displayed as the
IDNA ACE-encoded name (if IDNA gets selected to be used).
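
The host name half of that rule could look roughly like the Python sketch below. It assumes Python's built-in "idna" codec (IDNA 2003) as the ACE decoder; display_host and the can_display predicate are hypothetical names, with can_display standing in for the isprint() check:

```python
def display_host(ace_host, can_display=str.isprintable):
    """Return the Unicode form of an ACE host name if every character is
    displayable, otherwise fall back to the ACE form."""
    try:
        text = ace_host.encode("ascii").decode("idna")
    except UnicodeError:
        # Not valid ACE (or not ASCII at all): show it unchanged.
        return ace_host
    if all(can_display(ch) for ch in text):
        return text
    return ace_host


# Build an ACE name with the same codec instead of hard-coding one.
ace = "tjänster.example".encode("idna").decode("ascii")
print(display_host(ace))
print(display_host(ace, can_display=lambda ch: ord(ch) < 128))
```

The second call simulates an environment that can only display ASCII, so the ACE form is kept.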
You're assuming the local locale has something to do with the encoding of the
URL...  it need not at all.
cc'ing nhotta
loading file ÆæØøÅåÄäÖö.html will display as
%C3%86%C3%A6%C3%98%C3%B8%C3%85%C3%A5%C3%84%C3%A4%C3%96%C3%B6.html
That's completely unreadable. The URL bar should be considered to be a tiny
editor. I need to read things there.

There are several instances of this bug however. bug 137597 is another.

*** This bug has been marked as a duplicate of 105909 ***
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → DUPLICATE
Product: Core → SeaMonkey