Closed Bug 229548 Opened 21 years ago Closed 10 years ago

IDN: URL in status bar is displayed as garbage if the path part has non-ASCII characters in non-UTF-8 encoding

Status:

RESOLVED WORKSFORME

Milestone:

mozilla1.9alpha1

People

(Reporter: kazhik, Assigned: smontagu)

References

(Depends on 1 open bug, Blocks 1 open bug, URL)

Details

(Keywords: intl)

Attachments

(4 obsolete files)

patch, 20 years ago, Jungshik Shin, 3.35 KB, patch
patch, 20 years ago, Jungshik Shin, 3.40 KB, patch, smontagu : review+
patch with more context, 20 years ago, Jungshik Shin, 5.35 KB, patch, jshin1987 : review+ darin.moz : superreview-
patch that generates nsIURI, 20 years ago, Jungshik Shin, 9.56 KB, patch

Koike Kazuhiko

Reporter

Description

•

21 years ago

The URL in the status bar is displayed as garbage if non-ASCII characters are
used in both the domain name and the name attribute (the fragment).

http://<non-ASCII domain name>/index.html#<non-ASCII name attribute>

The non-ASCII characters in the name attribute are displayed correctly, but
those in the domain name aren't.

Original report in Bugzilla-jp
http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=3513

Jungshik Shin

Comment 1

•

21 years ago

What has to be 
http://賢明.jp/ent_exam/ent_exam.html#メニュー

is displayed as

http://莖∽??.jp/ent_exam/ent_exam.html#メニュー

With a debug build, I got a few assertions in xpcconvert.cpp and nsUTF8Utils.h,
so there is a conversion problem somewhere. My guess is that the first four
bytes of 賢明 (in UTF-8) are read as 莖∽ in EUC-JP and the remaining two bytes
don't form a valid character in EUC-JP, so they're turned into question
marks. I'll check this out when I'm on Linux. (I can do it now, but it's a
little cumbersome.)

This happens probably because somewhere we have a URI with the host address in
UTF-8 and the path part in EUC-JP. Given this URI, we try to convert it to
Unicode (UTF8 or UTF-16) as if the whole URI is in EUC-JP (originCharset).
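
The effect guessed at above can be reproduced outside the browser. The following is a minimal sketch, assuming Python's standard codecs behave closely enough to Gecko's converters for illustration; the variable names are made up for this example and nothing here is Mozilla code. It encodes the host as UTF-8 and the fragment as EUC-JP, then decodes both as EUC-JP, the way a single originCharset would be applied to the whole spec.

```python
# Illustration only: Python codecs stand in for Gecko's charset converters.
host = "賢明"
fragment = "メニュー"

host_utf8 = host.encode("utf-8")            # six bytes, three per character
fragment_eucjp = fragment.encode("euc_jp")  # eight bytes, two per character

# Decode both parts with the page charset, as the display code is assumed to do.
garbled_host = host_utf8.decode("euc_jp", errors="replace")
restored_fragment = fragment_eucjp.decode("euc_jp")

print(host_utf8.hex(" "))   # raw host bytes
print(garbled_host)         # no longer reads as 賢明
print(restored_fragment)    # round-trips to メニュー
```

The fragment survives because its bytes really are EUC-JP, while the host bytes are reinterpreted in pairs, which matches the symptom that only the domain name is garbled.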

URL: http://賢明.jp/ent_exam/ent_exam.html

Keywords: intl

OS: Linux → All

Hardware: PC → All

Jungshik Shin

Comment 2

•

21 years ago

> UTF-8 and the path part in EUC-JP. Given this URI, we try to convert it to
> Unicode (UTF8 or UTF-16) as if the whole URI is in EUC-JP (originCharset)
 
We only try this conversion when a given URI spec is NOT valid UTF-8.
With UTF-8 in the host part and EUC-JP in the path part, the spec is not
valid UTF-8 as a whole, so we assume the whole URI spec is in EUC-JP.
Therefore, this problem doesn't occur if the host part is in UTF-8 and is
followed by an ASCII-only path. To see that, try
http://bugzilla.mozilla.gr.jp/attachment.cgi?id=1954 (quoted in bug 229546).
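
The decision described here can be sketched as follows. This is a rough model, not the actual nsITextToSubURI code: `is_valid_utf8` and `spec_for_display` are hypothetical helper names, and Python's codecs are assumed to approximate the real converters.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True when the byte string decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def spec_for_display(raw_spec: bytes, origin_charset: str) -> str:
    """Prefer UTF-8; otherwise treat the WHOLE spec as origin_charset."""
    if is_valid_utf8(raw_spec):
        return raw_spec.decode("utf-8")
    # The whole spec, host included, goes through the fallback charset.
    return raw_spec.decode(origin_charset, errors="replace")


host = "賢明.jp".encode("utf-8")
ascii_path_spec = b"http://" + host + b"/ent_exam/ent_exam.html"
mixed_spec = ascii_path_spec + b"#" + "メニュー".encode("euc_jp")

print(spec_for_display(ascii_path_spec, "euc_jp"))  # host stays readable
print(spec_for_display(mixed_spec, "euc_jp"))       # host is garbled
```

With an ASCII-only path the spec is valid UTF-8 and the fallback never runs, which is why the problem only appears once the path or fragment carries EUC-JP bytes.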

Darin, can I assume that the host part of _any_ URI is _always_ in UTF-8? Then,
I can fix this in nsISubTextURI (?). However, that wouldn't be pretty.

Assignee: smontagu → jshin

Darin Fisher

Updated

•

20 years ago

Blocks: IDN

Jungshik Shin

Updated

•

20 years ago

Summary: IDN: URL in status bar is displayed as garbage → IDN: URL in status bar is displayed as garbage if the path part has non-ASCII characters in non-UTF-8 encoding

baffclan

Comment 3

•

20 years ago

I cannot reproduce this with 2004050304-trunk/WinXP.
WORKSFORME?

baffclan

Comment 4

•

20 years ago

Sorry...

Reproduced with 2004050304-trunk/Win98, 20040503-trunk(Firefox)/Win98,
20040503-trunk(Firefox)/WinXP.

Darin Fisher

Comment 5

•

20 years ago

> Darin, can I assume that the host part of _any_ URI is _always_ in UTF-8? Then,
> I can fix this in nsISubTextURI (?). However, that wouldn't be pretty. 

nsIURI::host is always encoded using UTF-8.

Jungshik Shin

Comment 6

•

20 years ago

Attached patch patch (obsolete) — Details — Splinter Review

This fixes bug 229546 as well and can also be used for fixing bug 200150.

Jungshik Shin

Comment 7

•

20 years ago

I've got a slightly more robust patch. This should be fixed before 1.8beta.

URL: http://賢明.jp/ent_exam/ent_exam.html → http://è³¢æ¤Ô¤³¤Ò¤µ.jp/ent_exam/ent_e...

Status: NEW → ASSIGNED

Target Milestone: --- → mozilla1.8beta

Jungshik Shin

Comment 8

•

20 years ago

Attached patch patch (obsolete) — Details — Splinter Review

asking for review

Attachment #171172 - Attachment is obsolete: true

Attachment #171508 - Flags: superreview?(darin)

Attachment #171508 - Flags: review?(smontagu)

Simon Montagu :smontagu

Assignee

Comment 9

•

20 years ago

Comment on attachment 171508
patch

This would have been easier to review with more context, by the way.

Attachment #171508 - Flags: review?(smontagu) → review+

Jungshik Shin

Comment 10

•

20 years ago

Attached patch patch with more context (obsolete) — Details — Splinter Review

Thanks for the review, and sorry for the lack of context. I was just too lazy
to get rid of another patch nearby (for bug 244754) and took a shortcut by
omitting the '-u' option.

Attachment #171508 - Attachment is obsolete: true

Attachment #171515 - Flags: superreview?(darin)

Attachment #171515 - Flags: review+

Jungshik Shin

Updated

•

20 years ago

Attachment #171508 - Flags: superreview?(darin)

Darin Fisher

Comment 11

•

20 years ago

Comment on attachment 171515
patch with more context

>Index: intl/uconv/src/nsTextToSubURI.cpp

>+  nsCOMPtr<nsIURLParser> urlParser;
>+  // should we just use net_GetStdURLParser()? 
>+  urlParser = do_GetService(NS_STDURLPARSER_CONTRACTID, &rv);
>+  NS_ENSURE_SUCCESS(rv, rv);

net_GetStdURLParser is an internal necko method. Since this code
is not part of the necko DLL, it cannot use it.

How do you know that this is the correct nsIURLParser instance for
the given URI? I don't think you can know that it is. What if the
given URI scheme does not support an authority section, but would
erroneously be parsed as having one by the STDURLPARSER?

I think you should use nsIIOService::newURI instead, to construct
an nsIURI. Then, call GetHost, and check that instead.
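
As a rough illustration of the direction suggested here, combined with the statement in comment 5 that the host is always UTF-8, the sketch below decodes the authority and the remainder of the spec separately, and only for schemes known to carry an authority. It is a stand-alone Python model, not Gecko code: `AUTHORITY_SCHEMES`, `split_authority`, and `spec_for_display_split` are hypothetical names, and the scheme list is an assumption made for this example.

```python
import re

# Assumption for this sketch: only these schemes are treated as having an authority.
AUTHORITY_SCHEMES = {b"http", b"https", b"ftp"}

_SPEC_RE = re.compile(rb"([A-Za-z][A-Za-z0-9+.\-]*)://([^/?#]*)(.*)", re.DOTALL)


def split_authority(raw_spec: bytes):
    """Return (prefix, authority, rest), or None if no authority is assumed."""
    match = _SPEC_RE.match(raw_spec)
    if match is None or match.group(1).lower() not in AUTHORITY_SCHEMES:
        return None
    scheme, authority, rest = match.groups()
    return scheme + b"://", authority, rest


def spec_for_display_split(raw_spec: bytes, origin_charset: str) -> str:
    """Decode the authority as UTF-8 and everything after it as origin_charset."""
    parts = split_authority(raw_spec)
    if parts is None:
        # Unknown or authority-less scheme: do not guess at a host.
        return raw_spec.decode(origin_charset, errors="replace")
    prefix, authority, rest = parts
    return (prefix.decode("ascii")
            + authority.decode("utf-8", errors="replace")
            + rest.decode(origin_charset, errors="replace"))


mixed_spec = (b"http://" + "賢明.jp".encode("utf-8")
              + b"/ent_exam/ent_exam.html#" + "メニュー".encode("euc_jp"))
print(spec_for_display_split(mixed_spec, "euc_jp"))
```

Returning None for schemes outside the list mirrors the concern raised above: a generic parser should not invent an authority section for a scheme that does not have one.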

Attachment #171515 - Flags: superreview?(darin) → superreview-

Jungshik Shin

Comment 12

•

20 years ago

Attached patch patch that generates nsIURI (obsolete) — Details — Splinter Review

With standard-url.encode-utf8 set to true, this patch is not necessary for most
URIs (except for file URLs). However, setting the pref to true (which is now the
default) doesn't fix bug 229546, and this patch does.

Attachment #171515 - Attachment is obsolete: true

Attachment #173054 - Flags: superreview?(darin)

Attachment #173054 - Flags: review?(smontagu)

Simon Montagu :smontagu

Assignee

Updated

•

20 years ago

Attachment #173054 - Flags: review?(smontagu) → review+

Jungshik Shin

Comment 13

•

20 years ago

Comment on attachment 173054
patch that generates nsIURI

Phew... This patch creates an infinite loop for 'javascript:.....' URLs (I
hadn't tested any page with such a URL) because nsJSProtocolHandler relies on
nsITextToSubURI to ensure the UTF8ness of a spec (EnsureUTF8Spec method of
nsJSProtocolHandler). There may be other protocol handlers with the same issue.

Attachment #173054 - Attachment is obsolete: true

Attachment #173054 - Flags: superreview?(darin)

Attachment #173054 - Flags: review+

Jungshik Shin

Comment 14

•

19 years ago

(In reply to comment #13)
> Phew... This patch creates an infinite loop for 'javascript:.....' url because
> nsJSProtocolHandler relies on nsITextToSubURI

One way to break the infinite loop is to check if the scheme is 'javascript',
but that may not scale. Alternatively, we may add a parameter to the APIs of
nsITextToSubURI to indicate whether 'host' is present /needs to be checked for IDN.

Jungshik Shin

Comment 15

•

19 years ago

(In reply to comment #14)

> One way to break the infinite loop is to check if the scheme is 'javascript',
> but that may not scale. Alternatively, we may add a parameter to the APIs of
> nsITextToSubURI to indicate whether 'host' is present /needs to be checked for
> IDN.

Well, the second approach just shifts the 'responsibility' to callers, so
it has the same problem. If so, just checking whether the scheme is 'javascript'
(for now) in nsITextToSubURI is better.
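
A toy model of the cycle and of the scheme check may make the trade-off easier to see. Everything below is hypothetical: `make_uri` stands in for URI construction by a protocol handler, `convert_for_display` stands in for the converter, and `SCHEMES_WITHOUT_HOST` is the "for now" list. None of these are Gecko APIs.

```python
# Toy model of the re-entry described in comment 13; all names are hypothetical.
SCHEMES_WITHOUT_HOST = {"javascript"}  # the "for now" list from this comment


def scheme_of(spec: str) -> str:
    """Return the lowercased scheme, or "" when the spec has no colon."""
    head, sep, _ = spec.partition(":")
    return head.lower() if sep else ""


def make_uri(spec: str) -> dict:
    """Stand-in for URI construction by a protocol handler."""
    scheme = scheme_of(spec)
    if scheme == "javascript":
        # Models the handler calling back into the converter to ensure UTF-8.
        spec = convert_for_display(spec)
    return {"scheme": scheme, "spec": spec}


def convert_for_display(spec: str) -> str:
    """Stand-in for the converter that wants to inspect the host via a URI."""
    if scheme_of(spec) in SCHEMES_WITHOUT_HOST:
        return spec  # guard: never build a URI here, so there is no re-entry
    uri = make_uri(spec)
    return uri["spec"]


print(convert_for_display("javascript:void(0)"))
print(make_uri("javascript:void(0)")["scheme"])
```

Without the guard, `convert_for_display` and `make_uri` would call each other indefinitely for javascript: specs. The guard is cheap, but as noted above it does not scale: any other handler that calls back into the converter would need its scheme added to the set.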

Darin Fisher

Comment 16

•

19 years ago

It seems to me that these functions should take a nsIURI as their parameter
instead of a raw character string.

Neil Harris

Updated

•

19 years ago

Blocks: 316730

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 17

•

19 years ago

Jshin:

I want to fix this issue myself.
Are you working on this?
Can I take this?

Target Milestone: mozilla1.8beta1 → ---

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Updated

•

19 years ago

Depends on: 320807

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Updated

•

19 years ago

Assignee: jshin1987 → masayuki

Status: ASSIGNED → NEW

Target Milestone: --- → mozilla1.9alpha

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Updated

•

19 years ago

Status: NEW → ASSIGNED

Tony Mechelynck [:tonymec]

Comment 18

•

16 years ago

Masayuki Nakano: 2½ years after you "accepted" this bug, no one has objected. Are you still willing to fix it? And are you still experiencing it? (I'm not sure what to check against what).

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 19

•

16 years ago

(In reply to comment #18)
> Masayuki Nakano: 2½ years after you "accepted" this bug, no one has objected.
> Are you still willing to fix it? And are you still experiencing it? (I'm not
> sure what to check against what).

No, I'm not sure. I'll clean up my bug list once all the Gecko 1.9 work is finished.

Lucas Malor (mail: c6kfnkn2uc AT snkmail DOT c0m)

Comment 20

•

16 years ago

It works for me now with the latest trunk. Can you confirm?

Tony Mechelynck [:tonymec]

Comment 21

•

16 years ago

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073100 SeaMonkey/2.0a1pre

After studying the bug again somewhat, I'd say it works for me but not decisively:

- I see the same garbage (plus www) in the Location Bar of the xul error page as in the input box of the bug's URL: http://www.è3¢æ¤ô¤3¤ò¤μ.jp/ent_exam/ent_exam.html

- In the top URL in comment #1, the blue underlined part stops just before the # sign. Clicking that link gives a xul error page for http://www.賢明.jp/ent_exam/ent_exam.html

(Apparently the DNS query gives a null result in both cases. Don't know if relevant.)

However, these Bugzilla pages are in UTF-8. A link to a non-Bugzilla non-Unicode page, with a link on it with non-ASCII in it, might be necessary for a "really" valid testcase nowadays.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 22

•

15 years ago

I'm resetting bugs that are assigned to me but that I'm not working on and have no plans to fix in the near future.

Assignee: masayuki → smontagu

QA Contact: amyy → i18n

Hideo Oshima

Comment 23

•

10 years ago

Status bar is not supported any longer.
So this bug should be closed.

Gordon P. Hemsley [:GPHemsley]

Comment 24

•

10 years ago

(In reply to Hideo Oshima from comment #23)
> Status bar is not supported any longer.
> So this bug should be closed.

URL preview is still available, even without the status bar.

That being said, I can't reproduce this, so I'm inclined to WFM. Anne, what do you think?

Flags: needinfo?(annevk)

Anne (:annevk)

Comment 25

•

10 years ago

Agreed.

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Flags: needinfo?(annevk)

Resolution: --- → WORKSFORME
