Closed Bug 81024 Opened 24 years ago Closed 22 years ago

www.%E4%F6.com in statusbar

Tracking

Status:

VERIFIED FIXED

Milestone:

mozilla1.2beta

People

(Reporter: Junk_HbJ, Assigned: nhottanscp)

Details

(Keywords: intl, regression)

Attachments

(4 files)

screenshot while checking testcase (24 years ago, Nikolai Prokoschenko, 118.50 KB, image/jpeg)
The bug is also back in trunk (23 years ago, Rui Xu, 91.81 KB, image/jpeg)
Fixes IDN hostname display in statusbar when mouseover. (23 years ago, Wil Tan, 2.00 KB, patch)
Changed to call a new function to unescape URI for UI. (23 years ago, nhottanscp, 2.54 KB, patch; ftang: review+, darin.moz: superreview+)

Junk_HbJ

Reporter

Description

•

24 years ago

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9) Gecko/20010505
BuildID: 2001051403

If mouse over http://www.öä.com, statusbar shows http://www.%E4%F6.com.

Reproducible: Always

Steps to Reproduce:
1. Mouse over http://www.öä.com.

Actual Results: Message "http://www.%E4%F6.com" in statusbar

Expected Results: Message "http://www.öä.com" in statusbar

This is a spinoff of bug 80942, which contains further comments about this matter.

nhottanscp

Assignee

Comment 1

•

24 years ago

I can reproduce this with NS6.01.

Status: UNCONFIRMED → NEW

Ever confirmed: true

nhottanscp

Assignee

Comment 2

•

24 years ago

Reassign to gagan.

Assignee: nhotta → gagan

Keywords: intl

Andreas Becker

Updated

•

24 years ago

QA Contact: andreasb → jonrubin

Alexey Chernyak

Comment 3

•

24 years ago

I think bug 81019 is a direct result of this.

Alexey Chernyak

Updated

•

24 years ago

Blocks: 81019

Gagan

Comment 4

•

24 years ago

*** This bug has been marked as a duplicate of 81022 ***

Status: NEW → RESOLVED

Closed: 24 years ago

Resolution: --- → DUPLICATE

Alexey Chernyak

Comment 5

•

24 years ago

This bug is not a duplicate. Bug 81022 is about trying to go to that URL, while this bug is about moving the mouse pointer over a link. The info the status bar shows is different. For this bug it's: http://www.%E4%F6.com For bug 81022 it's: www.öä.com And finally, bug 81019 is a direct result of this bug, while with bug 81022 a hostname is shown in the dialog. These bugs were spun off from bug 80942, which has some more discussion on this. Reopening.

Status: RESOLVED → REOPENED

Resolution: DUPLICATE → ---

Gagan

Comment 6

•

24 years ago

got it. apologies for the wrong dup marking. I should have read it carefully. ->dougt

Assignee: gagan → dougt

Status: REOPENED → NEW

Target Milestone: --- → mozilla1.0

Doug Turner (:dougt)

Comment 7

•

24 years ago

what is milestone "mozilla1.0" anyway? Moving to future.

Target Milestone: mozilla1.0 → Future

Nikolai Prokoschenko

Comment 8

•

24 years ago

Hello? Has this bug been fixed? I couldn't reproduce it - screenshot is attached. In it, the mouse cursor is above the hyperlink.

Nikolai Prokoschenko

Comment 9

•

24 years ago

Attached image: screenshot while checking testcase

Doug Turner (:dougt)

Comment 10

•

23 years ago

QA, can you please verify?

Status: NEW → RESOLVED

Closed: 24 years ago → 23 years ago

Resolution: --- → FIXED

Andreas Becker

Comment 11

•

23 years ago

mass change, switching qa contact from jonrubin to ruixu.

QA Contact: jonrubin → ruixu

Rui Xu

Comment 12

•

23 years ago

Verified on EN Win98SE, EN WinME, JP Win98SE and KO Win2K with build 2001083003, this bug has been fixed.

Status: RESOLVED → VERIFIED

Junk_HbJ

Reporter

Comment 13

•

23 years ago

The little bugger is back in Mozilla 0.9.7. http://www.öä.com now brings up some garbled characters in the status bar. Ironically, http://www.%E4%F6.com is shown as the correct URL. Weird...

Status: VERIFIED → REOPENED

Resolution: FIXED → ---

Rui Xu

Comment 14

•

23 years ago

Attached image: The bug is also back in trunk

There is the same problem in trunk.

Andreas Becker

Updated

•

23 years ago

Keywords: regression

Doug Turner (:dougt)

Comment 15

•

23 years ago

This has been fixed for a while. Please verify

Status: REOPENED → RESOLVED

Closed: 23 years ago → 23 years ago

Resolution: --- → FIXED

Alexey Chernyak

Comment 16

•

23 years ago

Reopening. Mouseover of http://www.%E4%F6.com shows http://www.öä.com, but mouseover of http://www.öä.com shows some gibberish. And if you right-click and Copy Link Address on it, this is what will be copied: http://www.%D0%96%D0%94.com/
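One possible reading of the copied address, offered as an assumption rather than something the thread confirms: if the hostname label in the test page was the raw byte pair F6 E4 (the Latin-1 bytes of the "öä" hostname quoted in comment 5) and the page was decoded as KOI8-R, those bytes become two Cyrillic letters whose UTF-8 escapes match the copied link. A minimal Python sketch of that arithmetic:

```python
from urllib.parse import quote

# Assumption: the hostname label in the test page was the two raw bytes F6 E4.
raw_label = bytes([0xF6, 0xE4])

# The same bytes read under two different charsets.
as_latin1 = raw_label.decode("iso-8859-1")  # Western European reading
as_koi8r = raw_label.decode("koi8_r")       # Cyrillic reading

# Escaping the Cyrillic reading as UTF-8 gives the escapes seen in the copied link.
copied = quote(as_koi8r, encoding="utf-8")
print(as_latin1, as_koi8r, copied)
```

The Cyrillic reading escapes to %D0%96%D0%94, which suggests the link text went through a charset guess before being re-encoded as UTF-8.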

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Doug Turner (:dougt)

Comment 17

•

23 years ago

weird. what build are you using, alexey?

Alexey Chernyak

Comment 18

•

23 years ago

Win32 2002012903. Bug 42898 describes similar behaviour.

nhottanscp

Assignee

Comment 19

•

23 years ago

This is for the host name only, the remaining problem of bug 102656. Reassign to nhotta. http://www.öä.com/öä

Assignee: dougt → nhotta

Status: REOPENED → NEW

nhottanscp

Assignee

Comment 20

•

23 years ago

http://www.öä.com/öä/

Status: NEW → ASSIGNED

nhottanscp

Assignee

Updated

•

23 years ago

Target Milestone: Future → ---

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 21

•

23 years ago

It seems this new problem is caused by the fact that ConvertHostnameToUTF8 within NS_MakeAbsoluteURIWithCharset causes different parts of the returned URI to be encoded differently. Why do we want to encode different parts of the URI differently in the result of that function rather than fixing the callers to understand a consistent encoding?

nhottanscp

Assignee

Comment 22

•

23 years ago

For the host name part, it is agreed that Unicode is used (with ACE encoding). But other parts, like file names, are not usually supported by servers if we use Unicode. We may internally convert to Unicode and then convert back to the charset, but then we need to remember that charset (bug 84032).
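As a rough illustration of "Unicode with ACE encoding" for the host part, the sketch below uses Python's built-in `idna` codec. That codec implements IDNA rules that were standardized after this discussion, so the exact ASCII form it produces is an assumption about today's encoding, not what Mozilla generated at the time. The hostname is the one quoted in comment 5.

```python
# Hostname from comment 5, held as Unicode text.
host = "www.öä.com"

# ASCII-compatible encoding (ACE) of each non-ASCII label.
ace = host.encode("idna")

# The ACE form is pure ASCII and converts back to the Unicode form.
print(ace.isascii(), ace.decode("idna") == host)
```

The point is only that the host can travel as 7-bit ASCII while the Unicode form remains recoverable; no document charset is involved for this part of the URL.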

Darin Fisher

Comment 23

•

23 years ago

are there any specs that talk about non ascii characters in portions of the URL other than the hostname? that is, can we always use UTF8 as the encoding for the entire URL string?

nhottanscp

Assignee

Comment 24

•

23 years ago

There used to be an Internet draft for non-ASCII URLs which was Unicode based. It has been expired for a while, and I don't have the last one. William, do you know anything about that? Has a new draft been posted? I think it is possible to keep the URL in UTF-8 internally but, as I mentioned, we also need to keep a charset (e.g. a document charset) so we can convert the URL back if necessary.

Darin Fisher

Comment 25

•

23 years ago

nhotta: in what cases would it be necessary to convert the URL back to a document specific charset?

nhottanscp

Assignee

Comment 26

•

23 years ago

E.g., path names: UTF-8 is not usually understood by the server.

Darin Fisher

Comment 27

•

23 years ago

right, but is there any guarantee that servers will understand any extended ascii encoding?

nhottanscp

Assignee

Comment 28

•

23 years ago

No, the web author has to know the server's charset and then apply URL escaping to guarantee that the link works. But usually people just put non-ASCII path names in the documents instead, and that works most of the time. So there are many existing pages like that.

Darin Fisher

Comment 29

•

23 years ago

ic, that does complicate things. so, we need to ensure that we send out URLs using the document charset. technically we should be escaping the URLs when we send them, cuz i think we have to limit ourselves to 7-bit ascii when we hit the net. hostnames need to be encoded using UTF-8 for IDN purposes. the result is what we have today, which is a URL string composed of different encodings... yuck! i need to think about this some more... i'm not sure what the right solution is. if we move to a world in which nsIURI/nsIURL expect UTF-8 parameters, then we'll need to do charset conversions in necko to generate the right URL string for sending to servers. and what about proxy servers?? double yuck!

Wil Tan

Comment 30

•

23 years ago

The IRI draft is here: http://www.ietf.org/internet-drafts/draft-ietf-idn-uri-01.txt The problem seems to be that the CString or |char *| version of the URL spec is being passed along to various other modules like content or even docshell. I agree that having 2 encodings within the same string is yucky.

Darin Fisher

Comment 31

•

23 years ago

ok, so if nsIURI has a charset associated with it, then it seems like it would be best to encode the nsIURI members (including the hostname) in that charset and then escape them when passing the values across the nsIURI interface. before using the hostname at the networking level it would have to be converted to UTF-8 and then ACE encoded. this unfortunately puts a lot of burden on nsIURI consumers because they must handle various charsets if they want to convert the URI elements into something readable that is not URL escaped. alternatively, we could also provide a nsIUnicodeURI and nsIUnicodeURL that provides UCS2 equivalents of the attributes. then nsIURI and nsIURL would remain US-ASCII. this seems like it might be the best solution moving forward, but i still need to think this out some more.

nhottanscp

Assignee

Comment 32

•

23 years ago

How about keeping the string in UTF-8 and then converting to a charset, since only the server needs that charset?

Darin Fisher

Comment 33

•

23 years ago

nhotta: yeah, i was thinking about that too.

Rui Xu

Updated

•

23 years ago

Keywords: nsbeta1

Frank Tang

Comment 34

•

23 years ago

nsbeta1- per triage meeting

Keywords: nsbeta1 → nsbeta1-

Wil Tan

Comment 35

•

23 years ago

*** Bug 120503 has been marked as a duplicate of this bug. ***

nhottanscp

Assignee

Updated

•

23 years ago

Target Milestone: --- → mozilla1.2

Wil Tan

Comment 36

•

23 years ago

The key to this is in nsWebShell::OnOverLink() where the call to nsITextToSubURI::UnEscapeAndConvert() unescapes the entire URL string. This is not desirable, since the hostname part is in UTF-8. If we could change the method signature to take an nsIURI instead then we could at least use the URL segments to build a string for display. The problem is that nsIURI is not passed down from the function call chain.
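The effect described here can be sketched in Python, assuming a spec whose host is escaped UTF-8 and whose path is escaped ISO-8859-1. The sample spec and both helper names are hypothetical and chosen only for illustration; this is not the Mozilla code path, just the same idea of decoding per URL segment instead of unescaping the whole string with one charset.

```python
from urllib.parse import quote, unquote_to_bytes, urlsplit

def display_whole(spec: str, charset: str) -> str:
    """Unescape the entire spec and decode it with a single charset."""
    return unquote_to_bytes(spec).decode(charset, errors="replace")

def display_by_segment(spec: str, doc_charset: str) -> str:
    """Decode the host as UTF-8 and the path as the document charset."""
    parts = urlsplit(spec)
    host = unquote_to_bytes(parts.netloc).decode("utf-8", errors="replace")
    path = unquote_to_bytes(parts.path).decode(doc_charset, errors="replace")
    return f"{parts.scheme}://{host}{path}"

# Hypothetical mixed spec: UTF-8 host, ISO-8859-1 path, everything escaped.
spec = ("http://www." + quote("öä", encoding="utf-8") + ".com/"
        + quote("öä", encoding="iso-8859-1") + "/")

print(display_whole(spec, "iso-8859-1"))       # host is garbled
print(display_whole(spec, "utf-8"))            # path bytes become U+FFFD
print(display_by_segment(spec, "iso-8859-1"))  # both parts readable
```

Neither single-charset decode yields a readable string for such a spec, which is why access to the parsed segments (or a smarter unescape function) matters for the status bar text.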

Wil Tan

Comment 37

•

23 years ago

Attached patch: Fixes IDN hostname display in statusbar when mouseover.

This patch first converts the URL to the document charset before calling textToSubURI->UnEscapeAndConvert() (instead of doing NS_ConvertUCS2toUTF8). Is there a function to use for this kind of conversion instead of going to the trouble of getting the ccm, then getting the encoder, etc.?

Wil Tan

Updated

•

23 years ago

No longer blocks: 81019

Wil Tan

Comment 38

•

23 years ago

Naoki: Could you take a look at this patch please? Thanks!

Darin Fisher

Comment 39

•

23 years ago

i don't think you want to unescape characters in the range U+00..U+7F. if any such chars are escaped, they are probably just control characters or other characters that should be escaped. now, if the URL is a file: URL, i suppose you could argue that unescaping all chars is likely valid. but, doing so for HTTP URLs could lead to all sorts of problems (e.g., embedded nulls). when my patch for bug 124042 lands, there'll be an option to NS_UnescapeURL that allows you to only unescape bytes with the 8-th bit set.
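A minimal Python sketch of "only unescape bytes with the 8th bit set" follows. It is not the NS_UnescapeURL implementation; the helper name `unescape_high_bytes` and the sample input are hypothetical.

```python
import re

# Matches a percent escape whose first hex digit is 8-F, i.e. a byte >= 0x80.
_HIGH_ESCAPE = re.compile(rb"%([89A-Fa-f][0-9A-Fa-f])")

def unescape_high_bytes(escaped: bytes) -> bytes:
    """Unescape only %80-%FF; leave %00-%7F escapes untouched."""
    return _HIGH_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), escaped)

# Escaped NUL and space stay escaped; the UTF-8 bytes of a non-ASCII
# character are restored.
sample = b"/a%00b%20%C3%A4"
result = unescape_high_bytes(sample)
print(result)
```

Because %00 is left alone, the result cannot pick up an embedded null, and escaped ASCII delimiters keep their escaped meaning.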

nhottanscp

Assignee

Comment 40

•

23 years ago

So the host part is converted to a document charset and then later converted back to UTF-8. Is it possible to process the host name and the other parts separately? If the host name is non-ASCII then you can call UnEscapeAndConvert with the charset as "UTF-8", and then you don't have to put the conversion code there.

Wil Tan

Comment 41

•

23 years ago

Since nsWebShell::OnOverLink() only has the spec, we would have to instantiate an nsIURI to parse it, wouldn't we? Doing UnEscapeAndConvert using "UTF-8" would only work on the hostname part alone. Darin: UnEscapeAndConvert uses nsUnescape(), does it unescape 00-7F?

Darin Fisher

Comment 42

•

23 years ago

nsUnescape unescapes everything... nsUnescape should never be used. there are much better alternatives. nsUnescapeCount returns the length of the unescaped string, so you can be sure not to be fooled by embedded nulls. once my patch for bug 124042 lands, there'll be a better option. NS_UnescapeURL which has an argument to specify that only non-ASCII characters should be unescaped. there'll also be a version of NS_UnescapeURL that returns the result in a nsACString, which internally handles embedded nulls.

nhottanscp

Assignee

Comment 43

•

23 years ago

The current plan is to try to unescape URI for the status bar, by trying UTF-8 and originCharset of nsIURI.

Blocks: 157673

Keywords: nsbeta1- → nsbeta1

nhottanscp

Assignee

Updated

•

23 years ago

Depends on: 110943

nhottanscp

Assignee

Comment 44

•

23 years ago

Attached patch: Changed to call a new function to unescape URI for UI.

The new function tries UTF-8 before the document charset, so no need to special case mailto.
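The "UTF-8 first, then document charset" order can be sketched as follows. This is an illustration under stated assumptions, not the checked-in patch: the function name `unescape_for_ui` is hypothetical, and for brevity it unescapes every escape rather than only non-ASCII bytes.

```python
from urllib.parse import unquote_to_bytes

def unescape_for_ui(escaped: str, origin_charset: str) -> str:
    """Unescape for display: try UTF-8, then the origin charset."""
    raw = unquote_to_bytes(escaped)
    for charset in ("utf-8", origin_charset):
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown charset name; try the next
    return escaped  # nothing decoded cleanly, so show the escaped form

print(unescape_for_ui("http://www.%C3%B6%C3%A4.com/", "iso-8859-1"))
print(unescape_for_ui("http://www.%F6%E4.com/", "iso-8859-1"))
```

Trying UTF-8 first is reasonable because strict UTF-8 decoding rejects most byte sequences produced by legacy single-byte charsets, so a successful decode is a fairly strong signal; when it fails, the origin charset is the next best guess.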

Frank Tang

Comment 45

•

23 years ago

Comment on attachment 95007: Changed to call a new function to unescape URI for UI. r=ftang

Attachment #95007 - Flags: review+

nhottanscp

Assignee

Updated

•

22 years ago

Target Milestone: mozilla1.2alpha → mozilla1.2beta

Darin Fisher

Comment 46

•

22 years ago

Comment on attachment 95007: Changed to call a new function to unescape URI for UI. sr=darin (sorry for taking so long to review this patch... it looks great!)

Attachment #95007 - Flags: superreview+

nhottanscp

Assignee

Comment 47

•

22 years ago

checked in to the trunk

Status: ASSIGNED → RESOLVED

Closed: 23 years ago → 22 years ago

Resolution: --- → FIXED

Rui Xu

Comment 48

•

22 years ago

Verified fixed with 2002-09-17 trunk.

Status: RESOLVED → VERIFIED

nhottanscp

Assignee

Comment 49

•

22 years ago

*** Bug 81022 has been marked as a duplicate of this bug. ***

Frank Tang

Updated

•

22 years ago

Depends on: 180372

Frank Tang

Updated

•

22 years ago

No longer blocks: 157673
