Add "uriCharsetEncodingHint" field to nsIURI

RESOLVED FIXED in Future

Status


P4
normal
RESOLVED FIXED
18 years ago
16 years ago

People

(Reporter: nhottanscp, Assigned: neeti)

Tracking

Trunk
Future
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

18 years ago
This was proposed in mozilla netlib newsgroup.
news://news.mozilla.org:119/3AFC62DE.F5CDF5DA%40netscape.com

By adding "hint_charset", charset information will not be lost, so clients can use
that information to apply the appropriate charset conversion. Note that
"hint_charset" may not match the nsIURI internal charset (which is UTF-8).
How about calling it |uriCharsetEncodingHint|?  This would help make it clear
that the encoding in question is intended to apply to the URI itself, and not
the thing pointed to by the URI.
(Reporter)

Comment 2

18 years ago
That sounds good, change the summary.
Summary: Add "hint_charset" field to nsIURI → Add "uriCharsetEncodingHint" field to nsIURI
(Assignee)

Updated

18 years ago
Priority: -- → P4
Target Milestone: --- → mozilla1.0

Comment 3

17 years ago
Let me try to understand, this would most likely be acquired from the charset in
the HTML / HTTP response / overridden from View->Encoding menu?
What uses (other than IDN) do you reckon this would be good for?
(Reporter)

Comment 4

17 years ago
Other cases would be path names, file names.
Possibly relevant to this is bug 84186.

Comment 6

17 years ago
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1 
(you can query for this string to delete spam or retrieve the list of bugs I've 
moved)
Target Milestone: mozilla1.0 → mozilla1.0.1

Comment 7

17 years ago
this proposal:

news://news.mozilla.org:119/3AFC62DE.F5CDF5DA%40netscape.com

fails to address several problems:

1) HTTP nsIURI's can be instantiated through redirects in which no charset
information is available.  the server may generate a URL in response to a
redirect that contains URL-escaped non-ASCII characters.  we have no way of
converting these URLs to UTF8.

2) also, servers may URL-escape characters that would interfere with parsing a
URL such as a '/' that is part of a path element and not a path element
delimiter... or a '@' in someone's password.  there is a set of reserved
characters that must be URL-escaped, otherwise the URL would fail to parse properly.

in summary, adding a charset attribute to nsIURI is insufficient.
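
The two problems above can be illustrated with a small sketch. This is not Mozilla code; it uses Python's standard library, and the path and charsets are made-up examples chosen only to show the ambiguity.

```python
from urllib.parse import unquote, unquote_to_bytes

# Hypothetical redirect target: the server supplies no charset for %E9.
escaped_path = "/caf%E9/a%2Fb"

# Problem 1: the same escaped byte maps to different characters
# depending on which charset is assumed, and is not valid UTF-8 at all.
raw = unquote_to_bytes("%E9")
as_latin1 = raw.decode("iso-8859-1")
as_koi8r = raw.decode("koi8-r")
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Problem 2: unescaping the reserved '/' (%2F) changes how the path parses.
segments_escaped = escaped_path.split("/")
segments_unescaped = unquote(escaped_path, encoding="iso-8859-1").split("/")
```

In the escaped form the path has two elements after the leading slash; once %2F is unescaped it appears to have three, which is why reserved characters have to stay escaped regardless of any charset attribute.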
(Reporter)

Comment 8

17 years ago
>adding a charset attribute to nsIURI is insufficient.

I agree, but we need the hint charset in order to support existing documents if
we switch to UTF-8 URI.

Comment 9

17 years ago
nhotta:

the problem is that we cannot switch to a UTF-8 URL in all cases.  in some cases
we have no way of converting the unescaped URL to UTF-8.  now, that doesn't stop
us from converting the escaped URL to UTF-8, which is of course a no-op.  so it
is possible for nsIURI to support UTF-8 w/o requiring that all unescaped URI's
be encoded using UTF-8.  URIs for some protocols should simply never be
unescaped.  HTTP is an example of one such protocol.

HTTP for example will most likely not use the charset attribute since there is
no way to know in general which charset the sequences %80-%FF correspond to.  HTTP
URLs shouldn't be unescaped.

there are of course exceptions, and we want to make sure that, in the cases
where charset information does exist, we try to show the user the unescaped URI.
this really means that we should show the user the URI with escape sequences %80
and above unescaped.  other escape sequences should probably stay intact since
they could correspond to control characters and other reserved characters that
would either make the URL not display properly or make the URI mean something
entirely different.
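
A minimal sketch of the display rule described above, assuming a charset hint is available: only runs of %80-%FF escapes are decoded, and everything else stays escaped. The function name `unescape_for_display` and its parameters are hypothetical and do not correspond to any nsIURI method.

```python
import re
from typing import Optional

# Matches one or more consecutive escapes in the range %80-%FF.
_HIGH_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def unescape_for_display(escaped_uri: str, hint_charset: Optional[str]) -> str:
    """Decode only %80-%FF escapes using the hint; leave the rest escaped."""
    if not hint_charset:
        # No charset information: show the URI exactly as stored.
        return escaped_uri

    def decode_run(match: "re.Match[str]") -> str:
        run = match.group(0)
        raw = bytes.fromhex(run.replace("%", ""))
        try:
            return raw.decode(hint_charset)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown hint: keep the escaped form.
            return run

    return _HIGH_RUN.sub(decode_run, escaped_uri)
```

For example, `unescape_for_display("http://example.com/caf%C3%A9%20menu%2Fa", "utf-8")` decodes `%C3%A9` but leaves `%20` and `%2F` intact. Multibyte charsets whose trail bytes fall below 0x80 are not handled by this sketch; such runs fail to decode and simply stay escaped.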

Comment 10

17 years ago
As far as I know we store the URL escaped in nsStandardURL. This is a must! We
need to change the escaping to no longer escape chars > 127 by default. On
protocols that need to be in ASCII (could be stored on the protocol information)
we need a second special escaping run for all chars > 127. That can happen just
before sending the request to the server.

URLs as a whole should be unescaped for displaying purpose only.
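
The second escaping run proposed above might look roughly like the following sketch. It assumes the stored URL keeps characters > 127 as-is and that a per-protocol flag says whether the wire form must be ASCII; `ASCII_ONLY_SCHEMES`, `escape_non_ascii` and `spec_for_request` are hypothetical names, not part of nsStandardURL.

```python
# Hypothetical per-protocol information: schemes whose wire form must be ASCII.
ASCII_ONLY_SCHEMES = {"http", "https", "ftp"}


def escape_non_ascii(stored_spec: str, charset: str = "utf-8") -> str:
    """Percent-escape every character > 127; leave existing escapes alone."""
    out = []
    for ch in stored_spec:
        if ord(ch) > 127:
            out.append("".join(f"%{b:02X}" for b in ch.encode(charset)))
        else:
            out.append(ch)
    return "".join(out)


def spec_for_request(stored_spec: str, scheme: str) -> str:
    """Apply the extra escaping pass just before the request is sent."""
    if scheme in ASCII_ONLY_SCHEMES:
        return escape_non_ascii(stored_spec)
    return stored_spec
```

Because the pass only touches characters > 127, sequences that were already escaped in the stored URL (such as `%20`) go out unchanged.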
(Reporter)

Comment 11

17 years ago
>URLs as a whole should be unescaped for displaying purpose only.
I agree. The hint charset may be used to display if available.

I talked about the unescaped case. An unescaped non-ASCII URI in a document (e.g.
HREF) is most likely in the charset of the document. I think we don't currently
convert those URIs to UTF-8 but I am not sure if those are escaped in nsIURI or
left unescaped. In either case, the hint charset would help to display those URIs.
(Assignee)

Updated

16 years ago
Target Milestone: mozilla1.0.1 → Future
(Reporter)

Comment 12

16 years ago
This is already available as originCharset in nsIURI.
Status: NEW → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED