URLs are displayed using %-encoding when they should not



Location Bar
16 years ago
10 years ago


(Reporter: Dan.Oscarsson, Assigned: Joe Hewitt (gone))






16 years ago
URL Bar is the nearest component to the URL input field I could find. If they
are not the same, you will have to move this bug to a different component.

In Mozilla 1.0rc1 you are getting closer to handling non-ASCII in URLs.

In Mozilla 0.9.8, when I entered the URL /Tjänster in the location field,
it was changed to: /Tj%C3%A4nster/.
In 1.0rc1 it is changed to: /Tj%E4nster/

I am guessing this is because my local character set is ISO 8859-1.
As the server is doing a redirect using a UTF-8 URL (1.0rc1 no longer
handles redirects containing ISO 8859-1), it looks like Mozilla now understands
that my local character set is ISO 8859-1.
This is fine!

But the URL should not be displayed using %-encoding when the characters
can be displayed as themselves.
The above URL should be displayed as /Tjänster/
not as a %-encoding of the URL in local character set.
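
To illustrate the two escaped forms quoted above, here is a minimal Python sketch
(not Mozilla code; the variable names are only for illustration). It shows that
the same path yields different %-encodings depending on the character set, and
that unescaping with the wrong character set does not give back the original
text.

```python
from urllib.parse import quote, unquote

path = "/Tjänster/"

# The same character escapes differently under each character set.
utf8_form = quote(path, encoding="utf-8")
latin1_form = quote(path, encoding="iso-8859-1")
print(utf8_form)    # /Tj%C3%A4nster/
print(latin1_form)  # /Tj%E4nster/

# Unescaping only restores the text when the character set matches.
restored = unquote(latin1_form, encoding="iso-8859-1")
mismatched = unquote(latin1_form, encoding="utf-8")  # errors="replace" by default
print(restored)     # /Tjänster/
print(mismatched)   # /Tj\ufffdnster/ (U+FFFD replacement character)
```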

Comment 1

16 years ago
My understanding of RFC 1738 is that this bug is INVALID.

That RFC doesn't really have much to say about _display_ of URLs....
OS: SunOS → All
Hardware: Sun → All

Comment 3

16 years ago
"2.2. URL Character Encoding Issues" says:

"URLs are written only with the graphic printable characters of the US-ASCII
coded character set. The octets 80-FF hexadecimal are not used in US-ASCII, and
the octets 00-1F and 7F hexadecimal represent control characters; these must be
encoded."
So I think this is what Mozilla should do.

Boris: darin has been working on implementing iDNS, which permits UTF-8 URLs,
AIUI.  Allowing UTF-8 characters in the URLbar should be part of this...

Comment 5

16 years ago
Have a look in:

A "URL" is said to be ASCII only, while the "IRI" above is the
name for a URL in an international context.
The most important thing for users is that a URL is displayed using
the available characters instead of %-encoding everything not ASCII.
People want to see things using their own letters.

All software should hide protocol details (like %-encoding) from
the user, if possible.

See also bug 105909, which was entered before
you did your redesign of URI handling.

Comment 6

16 years ago
fixing this bug is definitely a noble goal IMO.  it is extremely tricky,
however, because there is no guarantee that non-ASCII bytes in a URL string
correspond to any charset at all.  moreover, some of the bytes may correspond to
a charset and some may not... it is impossible to know for sure.

nsIURI::originCharset hints to the charset of the unescaped URL string.  it may
not be correct though.  we basically need some sort of decoder that will
preserve the % escape sequences for characters that do not decode correctly.
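
a rough sketch of such a decoder, in Python rather than Mozilla's C++ and purely
as an illustration: display_form and its charset parameter are hypothetical
names, with charset standing in for a hint like originCharset. it decodes runs
of % escapes under the hinted charset and keeps any byte that does not decode
to a printable non-ASCII character in escaped form. escaped ASCII (such as %2F
or %20) is deliberately left alone so the meaning of the URL does not change.

```python
import re

# One or more consecutive %XX escapes.
_ESCAPED_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def display_form(url_path: str, charset: str = "utf-8") -> str:
    """Unescape bytes that decode cleanly under charset; keep the rest escaped."""

    def decode_run(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        out = []
        i = 0
        while i < len(raw):
            # Try the widest candidate first (assumes at most 4 bytes per character).
            for width in (4, 3, 2, 1):
                chunk = raw[i:i + width]
                if len(chunk) < width:
                    continue
                try:
                    text = chunk.decode(charset)
                except UnicodeDecodeError:
                    continue
                # Accept only a single printable non-ASCII character.
                if len(text) == 1 and ord(text) > 0x7F and text.isprintable():
                    out.append(text)
                    i += width
                    break
            else:
                # Nothing decoded at this offset: preserve the escape.
                out.append("%%%02X" % raw[i])
                i += 1
        return "".join(out)

    return _ESCAPED_RUN.sub(decode_run, url_path)


print(display_form("/Tj%C3%A4nster/"))                # /Tjänster/
print(display_form("/Tj%E4nster/"))                   # /Tj%E4nster/
print(display_form("/Tj%E4nster/", "iso-8859-1"))     # /Tjänster/
```

note that the sketch cannot tell whether the hint is right: with a wrong charset
it will either keep the escapes (as in the second call) or show the wrong
characters, which is exactly the uncertainty described above.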

Comment 7

16 years ago
Yes, you need a special decoder/encoder for displaying URLs.
There is both the possibly ACE-encoded host name part and then the
%-encoded path part.

When displaying, all characters that the local locale (isprint()) says are
printable should be displayed as characters; others should be displayed
%-encoded in the non-hostname part. If the host name contains characters
that are not displayable, the host name needs to be displayed as the
IDNA ACE-encoded name (if IDNA gets selected for use).
You're assuming the local locale has something to do with the encoding of the
URL...  it need not at all.

Comment 9

16 years ago
cc'ing nhotta

Comment 10

15 years ago
loading file ÆæØøÅåÄäÖö.html will display as
That's completely unreadable. The URL bar should be considered to be a tiny
editor. I need to read things there.

There are several instances of this bug, however; bug 137597 is another.

Comment 11

15 years ago

*** This bug has been marked as a duplicate of 105909 ***
Last Resolved: 15 years ago
Resolution: --- → DUPLICATE
Product: Core → SeaMonkey