IDN: URL in status bar is displayed as garbage if the path part has non-ASCII characters in non-UTF-8 encoding

RESOLVED WORKSFORME

Status

RESOLVED WORKSFORME
15 years ago
4 years ago

People

(Reporter: kazhik, Assigned: smontagu)

Tracking

(Depends on: 1 bug, Blocks: 1 bug, {intl})

Trunk
mozilla1.9alpha1

Firefox Tracking Flags

(Not tracked)

Attachments

(4 obsolete attachments)

(Reporter)

Description

15 years ago
The URL in the status bar is displayed as garbage if non-ASCII characters are
used in both the domain name and the name attribute (fragment).

http://<non-ASCII domain name>/index.html#<non-ASCII name attribute>

The non-ASCII characters in the name attribute are displayed correctly, but
those in the domain name aren't.

Original report in Bugzilla-jp
http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=3513

Comment 1

15 years ago
What should be displayed as
http://賢明.jp/ent_exam/ent_exam.html#メニュー

is displayed as

http://莖∽??.jp/ent_exam/ent_exam.html#メニュー

With a debug build, I got a few assertions in xpcconvert.cpp and nsUTF8Utils.h,
so there is a conversion problem somewhere. My guess is that the UTF-8 bytes of
賢明 are being interpreted as EUC-JP: the first four bytes decode to 莖∽, and
the remaining two bytes don't form a valid character in EUC-JP, so they're
turned into question marks. I'll check this out when I'm on Linux. (I can do it
now, but it's a little cumbersome.)

This happens probably because somewhere we have a URI with the host address in
UTF-8 and the path part in EUC-JP. Given this URI, we try to convert it to
Unicode (UTF8 or UTF-16) as if the whole URI is in EUC-JP (originCharset).
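
The guess above can be checked outside the browser. The following is a minimal Python sketch, not Gecko code: it takes the UTF-8 bytes of the host label from this comment and decodes them once as UTF-8 and once as EUC-JP. The helper name `decode_as` is hypothetical, and Python's `euc_jp` codec is assumed to behave like the decoder the browser used at the time.

```python
# UTF-8 bytes of the host label quoted in this comment.
host_utf8 = "賢明".encode("utf-8")  # six bytes: E8 B3 A2 E6 98 8E


def decode_as(raw: bytes, charset: str) -> str:
    """Decode raw bytes, substituting U+FFFD for anything undecodable."""
    return raw.decode(charset, errors="replace")


# Correct interpretation: the host really is UTF-8.
as_utf8 = decode_as(host_utf8, "utf-8")

# Wrong interpretation: the same bytes treated as EUC-JP (the originCharset).
# The first four bytes pair up into two EUC-JP characters; the last two
# bytes are not a valid EUC-JP sequence and become replacement characters.
as_eucjp = decode_as(host_utf8, "euc_jp")

print(host_utf8.hex())
```

Comparing `as_utf8` with `as_eucjp` shows the same kind of damage as the garbled host reported above, with replacement characters where the status bar showed question marks.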




Keywords: intl
OS: Linux → All
Hardware: PC → All

Comment 2

15 years ago
> UTF-8 and the path part in EUC-JP. Given this URI, we try to convert it to
> Unicode (UTF8 or UTF-16) as if the whole URI is in EUC-JP (originCharset)
 
We only try this conversion when a given URI spec is NOT valid UTF-8.
With UTF-8 in the host part and EUC-JP in the path part, the spec as a whole
is not valid UTF-8, so we assume the whole URI spec is in EUC-JP.
Therefore, this problem doesn't occur if the host part in UTF-8 is
followed by an ASCII-only path part. To see that, try
http://bugzilla.mozilla.gr.jp/attachment.cgi?id=1954 (quoted in bug 229546)
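
As a rough illustration of the all-or-nothing behaviour described here, the Python sketch below validates the whole spec as UTF-8 and falls back to the origin charset only when that fails. It is a sketch under assumptions, not the Gecko implementation: the function names and the `/メニュー.html` path are hypothetical, chosen only to produce a path that is valid EUC-JP but invalid UTF-8.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_unicode_whole_spec(spec: bytes, origin_charset: str) -> str:
    """All-or-nothing choice: UTF-8 if the whole spec validates,
    otherwise treat the whole spec as origin_charset."""
    charset = "utf-8" if is_valid_utf8(spec) else origin_charset
    return spec.decode(charset, errors="replace")


host = "賢明.jp".encode("utf-8")                 # host part, UTF-8
ascii_path = b"/ent_exam/ent_exam.html"          # ASCII-only path
eucjp_path = "/メニュー.html".encode("euc_jp")    # hypothetical EUC-JP path

ok_spec = b"http://" + host + ascii_path
mixed_spec = b"http://" + host + eucjp_path

ok_text = to_unicode_whole_spec(ok_spec, "euc_jp")        # host survives
mixed_text = to_unicode_whole_spec(mixed_spec, "euc_jp")  # host is garbled
```

With the ASCII-only path the spec validates as UTF-8 and the host is preserved. With the EUC-JP path the validation fails, the whole spec is decoded as EUC-JP, and only the path comes out right.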

Darin, can I assume that the host part of _any_ URI is _always_ in UTF-8? Then,
I can fix this in nsISubTextURI (?). However, that wouldn't be pretty. 

Assignee: smontagu → jshin

Updated

15 years ago
Blocks: 237820

Updated

15 years ago
Summary: IDN: URL in status bar is displayed as garbage → IDN: URL in status bar is displayed as garbage if the path part has non-ASCII characters in non-UTF-8 encoding

Comment 3

15 years ago
I cannot reproduce this with 2004050304-trunk/WinXP.
WORKSFORME?

Comment 4

15 years ago
Sorry...

Reproduced with 2004050304-trunk/Win98, 20040503-trunk(Firefox)/Win98,
20040503-trunk(Firefox)/WinXP.

Comment 5

15 years ago
> Darin, can I assume that the host part of _any_ URI is _always_ in UTF-8? Then,
> I can fix this in nsISubTextURI (?). However, that wouldn't be pretty. 

nsIURI::host is always encoded using UTF-8.

Comment 6

14 years ago
Created attachment 171172
patch

This fixes bug 229546 as well and can also be used for fixing bug 200150.

Comment 7

14 years ago
I've got a slightly more robust patch. This should be fixed before 1.8beta.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.8beta

Comment 8

14 years ago
Created attachment 171508
patch

asking for review
Attachment #171172 - Attachment is obsolete: true
Attachment #171508 - Flags: superreview?(darin)
Attachment #171508 - Flags: review?(smontagu)
(Assignee)

Comment 9

14 years ago
Comment on attachment 171508
patch

This would have been easier to review with more context, by the way.
Attachment #171508 - Flags: review?(smontagu) → review+

Comment 10

14 years ago
Created attachment 171515
patch with more context

Thanks for the review, and sorry for providing too little context. I was just
too lazy to get rid of another patch nearby (for bug 244754) and took a
shortcut by omitting the '-u' option.
Attachment #171508 - Attachment is obsolete: true
Attachment #171515 - Flags: superreview?(darin)
Attachment #171515 - Flags: review+

Updated

14 years ago
Attachment #171508 - Flags: superreview?(darin)

Comment 11

14 years ago
Comment on attachment 171515
patch with more context

>Index: intl/uconv/src/nsTextToSubURI.cpp

>+  nsCOMPtr<nsIURLParser> urlParser;
>+  // should we just use net_GetStdURLParser()? 
>+  urlParser = do_GetService(NS_STDURLPARSER_CONTRACTID, &rv);
>+  NS_ENSURE_SUCCESS(rv, rv);

net_GetStdURLParser is an internal necko method. Since this code
is not part of the necko DLL, it cannot use it.

How do you know that this is the correct nsIURLParser instance for
the given URI? I don't think you can know that it is. What if the
given URI scheme does not support an authority section, but would
erroneously be parsed as having one by the STDURLPARSER?

I think you should use nsIIOService::newURI instead, to construct
an nsIURI. Then, call GetHost, and check that instead.
Attachment #171515 - Flags: superreview?(darin) → superreview-
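
Taken together with comment 5 (the host is always UTF-8), the suggestion above amounts to handling the host separately from the rest of the spec. The Python sketch below illustrates that idea only; it does not use or model the Gecko API. `split_authority` is a hypothetical, deliberately simplified stand-in for real URI parsing (it ignores userinfo and ports), and the example path is invented for the demonstration.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def split_authority(spec: bytes):
    """Split a spec into (prefix including '://', host bytes, remainder).

    Simplified stand-in for real URI parsing. Specs without '://' are
    treated as having no host at all.
    """
    scheme, sep, rest = spec.partition(b"://")
    if not sep:
        return b"", b"", spec
    ends = [i for i in (rest.find(d) for d in (b"/", b"?", b"#")) if i != -1]
    cut = min(ends) if ends else len(rest)
    return scheme + sep, rest[:cut], rest[cut:]


def to_unicode_per_component(spec: bytes, origin_charset: str) -> str:
    """Decode the host as UTF-8 and apply the charset guess only to the rest."""
    prefix, host, rest = split_authority(spec)
    rest_charset = "utf-8" if is_valid_utf8(rest) else origin_charset
    return (
        prefix.decode("ascii", errors="replace")
        + host.decode("utf-8", errors="replace")
        + rest.decode(rest_charset, errors="replace")
    )


# UTF-8 host followed by a hypothetical EUC-JP path.
spec = b"http://" + "賢明.jp".encode("utf-8") + "/メニュー.html".encode("euc_jp")
text = to_unicode_per_component(spec, "euc_jp")
```

Because the charset fallback is applied only to the part after the host, a non-UTF-8 path no longer drags the host into the wrong decoder.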

Comment 12

14 years ago
Created attachment 173054
patch that generates nsIURI

With standard-url.encode-utf8 set to true, this patch is not necessary for most
URIs (except for file URLs). However, setting the pref to true (which is now
the default) doesn't fix bug 229546, whereas this patch does.
Attachment #171515 - Attachment is obsolete: true
Attachment #173054 - Flags: superreview?(darin)
Attachment #173054 - Flags: review?(smontagu)
(Assignee)

Updated

14 years ago
Attachment #173054 - Flags: review?(smontagu) → review+

Comment 13

14 years ago
Comment on attachment 173054
patch that generates nsIURI

Phew... This patch creates an infinite loop for 'javascript:.....' URLs (I
hadn't tested any page with such a URL) because nsJSProtocolHandler relies on
nsITextToSubURI to ensure the UTF-8-ness of a spec (the EnsureUTF8Spec method
of nsJSProtocolHandler). There may be other protocol handlers with the same issue.
Attachment #173054 - Attachment is obsolete: true
Attachment #173054 - Flags: superreview?(darin)
Attachment #173054 - Flags: review+

Comment 14

14 years ago
(In reply to comment #13)
> Phew... This patch creates an infinite loop for 'javascript:.....' url because 
> nsJSProtocolHandler relies on nsITextToSubURI 

One way to break the infinite loop is to check if the scheme is 'javascript',
but that may not scale. Alternatively, we may add a parameter to the APIs of
nsITextToSubURI to indicate whether 'host' is present /needs to be checked for IDN.

Comment 15

14 years ago
(In reply to comment #14)

> One way to break the infinite loop is to check if the scheme is 'javascript',
> but that may not scale. Alternatively, we may add a parameter to the APIs of
> nsITextToSubURI to indicate whether 'host' is present /needs to be checked for
> IDN.

Well, the second approach just shifts the 'responsibility' to callers, so
it has the same problem. If so, just checking whether the scheme is 'javascript'
(for now) in nsITextToSubURI is better.
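
The stopgap discussed here can be pictured as a scheme guard that runs before any host handling. This Python sketch is illustrative only: the set `SCHEMES_WITHOUT_HOST` and both function names are hypothetical, and the real check would live in the C++ implementation of nsITextToSubURI.

```python
# Schemes for which the host/IDN check is skipped. Only 'javascript' is
# named in this bug; any other entries would be an assumption.
SCHEMES_WITHOUT_HOST = {"javascript"}


def scheme_of(spec: str) -> str:
    """Return the lower-cased scheme of a spec, or '' if there is no colon."""
    head, sep, _ = spec.partition(":")
    return head.lower() if sep else ""


def should_check_host(spec: str) -> bool:
    """Skip the host check for schemes known not to carry a host."""
    return scheme_of(spec) not in SCHEMES_WITHOUT_HOST
```

A hard-coded list like this has the scaling problem raised in comment 14: any other protocol handler that calls back into the same service would need its own entry.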

Comment 16

14 years ago
It seems to me that these functions should take a nsIURI as their parameter
instead of a raw character string.

Updated

13 years ago
Blocks: 316730

Comment 17

Jshin:

I want to fix this issue myself.
Are you working on this?
Can I take this?
Target Milestone: mozilla1.8beta1 → ---
Assignee: jshin1987 → masayuki
Status: ASSIGNED → NEW
Target Milestone: --- → mozilla1.9alpha
Status: NEW → ASSIGNED

Comment 18

Masayuki Nakano: 2½ years after you "accepted" this bug, no one has objected. Are you still willing to fix it? And are you still experiencing it? (I'm not sure what to check against what).

Comment 19

(In reply to comment #18)
> Masayuki Nakano: 2½ years after you "accepted" this bug, no one has objected.
> Are you still willing to fix it? And are you still experiencing it? (I'm not
> sure what to check against what).

No, I'm not sure. I'll clean up my bug list after all the Gecko 1.9 work is finished.

Comment 20

It works for me now with the latest trunk. Can you confirm?

Comment 21

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073100 SeaMonkey/2.0a1pre

After studying the bug again somewhat, I'd say it works for me but not decisively:

- I see the same garbage (plus www) in the Location Bar of the xul error page as in the input box of the bug's URL: http://www.è3¢æ¤ô¤3¤ò¤μ.jp/ent_exam/ent_exam.html

- In the top URL in comment #1, the blue underlined part stops just before the # sign. Clicking that link gives a xul error page for http://www.賢明.jp/ent_exam/ent_exam.html

(Apparently the DNS query gives a null result in both cases. Don't know if relevant.)

However, these Bugzilla pages are in UTF-8. A link to a non-Bugzilla non-Unicode page, with a link on it with non-ASCII in it, might be necessary for a "really" valid testcase nowadays.

Comment 22

I'm resetting bugs that are assigned to me but that I'm not working on and have no plan to fix in the near future.
Assignee: masayuki → smontagu
QA Contact: amyy → i18n

Comment 23

4 years ago
Status bar is not supported any longer.
So this bug should be closed.

Comment 24

(In reply to Hideo Oshima from comment #23)
> Status bar is not supported any longer.
> So this bug should be closed.

URL preview is still available, even without the status bar.

That being said, I can't reproduce this, so I'm inclined to WFM. Anne, what do you think?
Flags: needinfo?(annevk)

Comment 25

4 years ago
Agreed.
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Flags: needinfo?(annevk)
Resolution: --- → WORKSFORME