Closed Bug 81024 Opened 23 years ago Closed 22 years ago

Mouseover for http://www.äö.com shows http://www.%E4%F6.com in statusbar

Categories

(Core :: Internationalization, defect)

x86
Windows 98
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.2beta

People

(Reporter: Junk_HbJ, Assigned: nhottanscp)

Details

(Keywords: intl, regression)

Attachments

(4 files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9) Gecko/20010505
BuildID:    2001051403

Mousing over http://www.äö.com makes the statusbar show http://www.%E4%F6.com.



Reproducible: Always
Steps to Reproduce:
1. Mouse over http://www.äö.com.

Actual Results:  Message "http://www.%E4%F6.com" in statusbar

Expected Results:  Message "http://www.äö.com" in statusbar

This is a spinoff of bug 80942, which contains further comments about this matter.
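For reference, the escaped form shown in the status bar is just the percent-encoding of two raw bytes. The sketch below assumes the link text is ä followed by ö in ISO-8859-1 (an assumption drawn from the %E4%F6 in the summary, not something the report states) and uses Python's standard urllib, not any Mozilla code.

```python
from urllib.parse import quote, unquote

host_label = "äö"  # assumed link text; the report only gives the escaped form

# Percent-encode the same two characters under two candidate charsets.
latin1_escaped = quote(host_label, encoding="iso-8859-1")
utf8_escaped = quote(host_label, encoding="utf-8")

print(latin1_escaped)  # %E4%F6
print(utf8_escaped)    # %C3%A4%C3%B6

# Reversing the escape only works with the charset that produced it.
round_trip = unquote(latin1_escaped, encoding="iso-8859-1")
print(round_trip == host_label)  # True
```

Under that assumption, the status bar was displaying the ISO-8859-1 escape instead of the readable characters.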
I can reproduce this with NS6.01.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reassign to gagan.
Assignee: nhotta → gagan
Keywords: intl
QA Contact: andreasb → jonrubin
I think bug 81019 is a direct result of this.
Blocks: 81019

*** This bug has been marked as a duplicate of 81022 ***
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
This bug is not a duplicate.
bug 81022 is about trying to go to that URL,
while this bug is about moving the mouse pointer over a link.
The info the status bar shows is different.
For this bug it's: http://www.%E4%F6.com
For bug 81022 it's: www.öä.com

And finally bug 81019 is a direct result of this bug.
While with bug 81022 a hostname is shown in the dialog.

These bugs were spawned from bug 80942, which has some more discussion on this.

reopening.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
got it. apologies for the wrong dup marking. I should have read it carefully.
->dougt
Assignee: gagan → dougt
Status: REOPENED → NEW
Target Milestone: --- → mozilla1.0
what is milestone "mozilla1.0" anyway?  Moving to future.
Target Milestone: mozilla1.0 → Future
Hello?

Has this bug been fixed? I couldn't reproduce it - screenshot is attached. In
it, the mouse cursor is above the hyperlink.
QA, can you please verify?
Status: NEW → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
mass change, switching qa contact from jonrubin to ruixu.
QA Contact: jonrubin → ruixu
Verified on EN Win98SE, EN WinME, JP Win98SE and KO Win2K with build 2001083003:
this bug has been fixed.
Status: RESOLVED → VERIFIED
The little bugger is back in Mozilla 0.9.7. 
http://www.äö.com now brings up some garbled characters in the status bar.

Ironically, http://www.%E4%F6.com is shown as the correct URL. Weird...
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
There is the same problem in trunk.
Keywords: regression
This has been fixed for a while.  Please verify
Status: REOPENED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
reopening
mouseover http://www.%E4%F6.com shows http://www.äö.com

but mouseover http://www.äö.com shows some gibberish.
And if you right click and Copy Link Address on it, this is what will be copied:
http://www.%D0%96%D0%94.com/
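To see what was actually copied, the escaped host label can be decoded as UTF-8. This is a small sketch with Python's urllib; the comparison label "äö" is an assumption based on the %E4%F6 form in the summary.

```python
from urllib.parse import quote, unquote

copied = "%D0%96%D0%94"  # host label from the copied link address above

# Decode strictly as UTF-8; this raises if the bytes are not valid UTF-8.
decoded = unquote(copied, encoding="utf-8", errors="strict")
print(decoded)

# A correct UTF-8 escape of the assumed original label, for comparison.
expected = quote("äö", encoding="utf-8")
print(expected, expected == copied)
```

The copied bytes are valid UTF-8 but decode to two Cyrillic letters, not to the assumed Latin label. That would be consistent with the raw link bytes having been read in a Cyrillic charset before conversion to UTF-8, though the thread does not confirm this.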
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
weird.  what build are you using, alexey?
Win32 2002012903

bug 42898 describes similar behaviour
This is for the host name only, the remaining problem from bug 102656.
Reassign to nhotta.
http://www.äö.com/äö
Assignee: dougt → nhotta
Status: REOPENED → NEW
http://www.äö.com/äö/

Status: NEW → ASSIGNED
Target Milestone: Future → ---
It seems this new problem is caused by the fact that ConvertHostnameToUTF8
within NS_MakeAbsoluteURIWithCharset causes different parts of the returned URI
to be encoded differently.  Why do we want to encode different parts of the URI
differently in the result of that function rather than fixing the callers to
understand a consistent encoding?
For the host name part, it is agreed that Unicode is used (with ACE encoding).
But for other parts, like file names, servers usually do not support Unicode.
We may convert to Unicode internally and then convert back to the original
charset, so we need to remember that charset (bug 84032).
are there any specs that talk about non ascii characters in portions of the URL
other than the hostname?  that is, can we always use UTF8 as the encoding for
the entire URL string?
There used to be an Internet draft for non-ASCII URLs which was Unicode based.
It expired a while ago, and I don't have the last version.
William, do you know anything about that? Has a new draft been posted?
I think it is possible to keep the URL in UTF-8 internally but, as I mentioned,
we also need to keep a charset (e.g. the document charset) so we can convert the
URL back if necessary.
nhotta: in what cases would it be necessary to convert the URL back to a
document specific charset?
E.g., path names; UTF-8 is not usually understood by the server.
right, but is there any guarantee that servers will understand any extended
ascii encoding?
No, the web author has to know the server's charset and then apply URL escaping
in order to guarantee that the link works.
But usually people just put non-ASCII path names in the documents instead, and
that works most of the time. So there are many existing pages like that.
ic, that does complicate things.  so, we need to ensure that we send out URLs
using the document charset.  technically we should be escaping the URLs when we
send them, cuz i think we have to limit ourselves to 7-bit ascii when we hit the
net.  hostnames need to be encoded using UTF-8 for IDN purposes.  the result is
what we have today which is an URL string composed of different encodings... yuck!

i need to think about this some more... i'm not sure what the right solution is.
 if we move to a world in which nsIURI/nsIURL expect UTF-8 parameters, then
we'll need to do charset conversions in necko to generate the right URL string
for sending to servers.  and what about proxy servers??  double yuck!
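The mixed-encoding result described in this comment can be sketched as follows. This is only an illustration in Python, not necko code: wire_url is a hypothetical helper, the sample host and path are assumptions, and Python's built-in "idna" codec stands in for the ACE encoding mentioned in the thread (the ACE scheme in use at the time may have differed).

```python
from urllib.parse import quote

def wire_url(host: str, path: str, doc_charset: str) -> str:
    """Build a 7-bit ASCII URL: ACE host, path escaped in the document charset."""
    # Host: Unicode -> ACE. Python's "idna" codec is a stand-in here.
    ace_host = host.encode("idna").decode("ascii")
    # Path: encode in the document charset, then percent-escape high bytes.
    escaped_path = quote(path, safe="/", encoding=doc_charset)
    return f"http://{ace_host}{escaped_path}"

latin1_url = wire_url("www.äö.com", "/äö/", "iso-8859-1")
utf8_url = wire_url("www.äö.com", "/äö/", "utf-8")
print(latin1_url)
print(utf8_url)
```

The two results share the same host but differ in the path bytes, which is why the originating charset has to be remembered somewhere if the path is to reach the server in the form the page author intended.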
The IRI draft is here:
http://www.ietf.org/internet-drafts/draft-ietf-idn-uri-01.txt

The problem seems to be that the CString or |char *| version of the URL spec
is being passed along to various other modules like content or even docshell.
I agree that having 2 encodings within the same string is yucky.
ok, so if nsIURI has a charset associated with it, then it seems like it would
be best to encode the nsIURI members (including the hostname) in that charset
and then escape them when passing the values across the nsIURI interface. 
before using the hostname at the networking level it would have to be converted
to UTF-8 and then ACE encoded.  this unfortunately puts a lot of burden on
nsIURI consumers because they must handle various charsets if they want to
convert the URI elements into something readable that is not URL escaped. 
alternatively, we could also provide a nsIUnicodeURI and nsIUnicodeURL that
provides UCS2 equivalents of the attributes.  then nsIURI and nsIURL would
remain US-ASCII.  this seems like it might be the best solution moving forward,
but i still need to think this out some more.
How about keeping the string in UTF-8 and then converting to a charset, since
only the server needs that charset?
nhotta: yeah, i was thinking about that too.
Keywords: nsbeta1
nsbeta1- per triage meeting 
Keywords: nsbeta1 → nsbeta1-
*** Bug 120503 has been marked as a duplicate of this bug. ***
Target Milestone: --- → mozilla1.2
The key to this is in nsWebShell::OnOverLink() where the call to
nsITextToSubURI::UnEscapeAndConvert() unescapes the entire URL string. This is 
not desirable, since the hostname part is in UTF-8. If we could change the
method signature to take an nsIURI instead then we could at least use the URL
segments to build a string for display.
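A rough illustration of the difference between unescaping the whole spec with one charset and unescaping it per segment. This is a Python sketch, not the nsWebShell code: both functions are hypothetical, the sample spec is an assumption, and the sketch ignores the query string and unescapes ASCII escapes too, which real code would not want to do.

```python
from urllib.parse import unquote, urlsplit

def display_whole(spec: str, doc_charset: str) -> str:
    """Unescape the entire spec with one charset (the problematic approach)."""
    return unquote(spec, encoding=doc_charset)

def display_by_segment(spec: str, doc_charset: str) -> str:
    """Unescape the host as UTF-8 and the path with the document charset."""
    parts = urlsplit(spec)
    host = unquote(parts.netloc, encoding="utf-8")
    path = unquote(parts.path, encoding=doc_charset)
    return f"{parts.scheme}://{host}{path}"

# Assumed spec: host escaped as UTF-8, path escaped as ISO-8859-1.
spec = "http://www.%C3%A4%C3%B6.com/%E4%F6/"
whole = display_whole(spec, "iso-8859-1")
segmented = display_by_segment(spec, "iso-8859-1")
print(whole)
print(segmented)
```

With the whole-string approach the host comes out garbled because its UTF-8 bytes are read as ISO-8859-1; the per-segment version recovers both parts.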
The problem is that nsIURI is not passed down from the function call chain.
This patch first converts the URL to document charset before calling
textToSubURI->UnEscapeAndConvert() (instead of doing NS_ConvertUCS2toUTF8).
Is there a function to use for this kind of conversion instead of going
to the trouble of getting the ccm, then getting the encoder, etc.?
No longer blocks: 81019
Naoki: Could you take a look at this patch please?
Thanks!
i don't think you want to unescape characters in the range U+00..U+7F

if any such chars are escaped, they are probably just control characters or
other characters that should be escaped.  now, if the URL is a file: URL, i
suppose you could argue that unescaping all chars is likely valid.  but, doing
so for HTTP URLs could lead to all sorts of problems (e.g., embedded nulls).

when my patch for bug 124042 lands, there'll be an option to NS_UnescapeURL that
allows you to only unescape bytes with the 8-th bit set.
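The idea of unescaping only bytes with the 8th bit set can be sketched like this. It is a Python illustration of the behaviour described, not the NS_UnescapeURL implementation; unescape_non_ascii is a hypothetical name.

```python
import re

_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")

def unescape_non_ascii(escaped: bytes) -> bytes:
    """Replace %XX with its byte only when the value is 0x80 or above."""
    def replace(match: re.Match) -> bytes:
        value = int(match.group(1), 16)
        # Leave escapes in the range 0x00..0x7F untouched (nulls, spaces, ...).
        return bytes([value]) if value >= 0x80 else match.group(0)
    return _ESCAPE.sub(replace, escaped)

result = unescape_non_ascii(b"/a%00b%20%E4%F6")
print(result)
```

Escaped control characters such as %00 stay escaped, so an embedded null cannot truncate the displayed string.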
So the host part is converted to the document charset and then later converted
back to UTF-8. Is it possible to process the host name and the other parts
separately? If the host name is non-ASCII, you can call UnEscapeAndConvert with
the charset set to "UTF-8"; then you don't have to put the conversion code there.

Since nsWebShell::OnOverLink() only has the spec, we would have to instantiate
an nsIURI to parse it, wouldn't we?
Doing UnEscapeAndConvert using "UTF-8" would only work on the hostname part
alone.

Darin: UnEscapeAndConvert uses nsUnescape(), does it unescape 00-7F?
nsUnescape unescapes everything... nsUnescape should never be used.  there are
much better alternatives.  nsUnescapeCount returns the length of the unescaped
string, so you can be sure not to be fooled by embedded nulls.

once my patch for bug 124042 lands, there'll be a better option.  NS_UnescapeURL
which has an argument to specify that only non-ASCII characters should be
unescaped.  there'll also be a version of NS_UnescapeURL that returns the result
in a nsACString, which internally handles embedded nulls.
The current plan is to unescape the URI for the status bar by trying UTF-8
first and then the originCharset of the nsIURI.
Blocks: 157673
Keywords: nsbeta1- → nsbeta1
Depends on: 110943
The new function tries UTF-8 before the document charset, so no need to special
case mailto.
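The "try UTF-8, then the document charset" order can be sketched as below. This is a Python approximation of the approach described in the comments, not the checked-in patch; unescape_for_ui is a hypothetical name, and the sketch assumes the spec is already pure ASCII with escapes.

```python
import re

# Matches only escapes whose value has the 8th bit set (%80..%FF).
_HIGH_ESCAPE = re.compile(rb"%([89A-Fa-f][0-9A-Fa-f])")

def unescape_for_ui(spec: str, origin_charset: str) -> str:
    """Return a readable form of spec, or spec unchanged if decoding fails."""
    raw = _HIGH_ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 16)]), spec.encode("ascii")
    )
    # Strict UTF-8 first: bytes from legacy charsets rarely form valid UTF-8.
    for charset in ("utf-8", origin_charset):
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue
    return spec

from_latin1 = unescape_for_ui("http://www.%E4%F6.com/", "iso-8859-1")
from_utf8 = unescape_for_ui("http://www.%C3%A4%C3%B6.com/", "iso-8859-1")
print(from_latin1, from_utf8)
```

Trying UTF-8 first is reasonable because a strict UTF-8 decode fails quickly on most legacy-charset byte sequences, so the fallback to the document charset only happens when it is needed.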
Comment on attachment 95007 [details] [diff] [review]
Changed to call a new function to unescape URI for UI.

r=ftang
Attachment #95007 - Flags: review+
Target Milestone: mozilla1.2alpha → mozilla1.2beta
Comment on attachment 95007 [details] [diff] [review]
Changed to call a new function to unescape URI for UI.

sr=darin (sorry for taking so long to review this patch... it looks great!)
Attachment #95007 - Flags: superreview+
checked in to the trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 22 years ago
Resolution: --- → FIXED
Verified fixed with 2002-09-17 trunk.
Status: RESOLVED → VERIFIED
*** Bug 81022 has been marked as a duplicate of this bug. ***
Depends on: 180372
No longer blocks: 157673