Closed Bug 163998 Opened 22 years ago Closed 22 years ago

Image titles should not be URL encoded

Categories

(Core :: Graphics: ImageLib, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: neil, Assigned: pavlov)

References

()

Details

(Keywords: intl)

Attachments

(2 files, 3 obsolete files)

Using Build ID: 2002082104
Steps to reproduce problem:
1. Rename an image to use non-ASCII characters
2. Open the image in Mozilla

Expected results: non-ASCII characters appear in window title

Actual results: window title is URL encooded

For instance, the URL should display as Håkan.ico (ICON image, 32x32 pixels)
well, the problem is: which charset to use...

IIRC, some standard says to assume UTF-8... though that would not work for the
testcase in this bug.

We can not just assume latin1.
bug 9949 wants to display Unicode titles in the titlebar (assuming they're UTF-8
Christian said - that won't help you in this case).
Actually, if I go to <http://www.nrr.co.uk/Håkan.ico> with Mozilla, I get
<http://www.nrr.co.uk/H%C3%A5kan.ico>, which looks like UTF-8... :)
OS: Windows 95 → All
A few weeks ago I filed another bug about the fact that I cannot go to
www.håkan.nu with Mozilla (it works in IE6). Maybe this is related?
Intersting... when I hover over the URL link that I entered into Bugzilla the
status bar displays http://www.nrr.co.uk/Håkan.ico and also when I hover over
Tuukka's bugmail URL link the status bar displays http://www.nrr.co.uk/Håkan.ico
but when I hover over the URL link in the bugzilla page the status bar displays
http://www.nrr.co.uk/HÃ¥kan.ico although all of the links work...
Jo, that bug# seems to be wrong...
As for comment #3, Mozilla converted U+00E5 in the URL
in comment #3 encoded in ISO 8859-1/Windows-1252(0xE5) 
to UTF-8 because the encoding of the page with the URL you cited 
is *known* (to Mozilla) to be ISO 8859-1/Windows-1252.
Anyway, I guess it's a bug NOT to 'unescape' '%-escape' sequence
back to UTF-8 before handing it over to 'window titlebar display
method' when the encoding of the URL is known and
is already converted to UTF-8. I'll try to take a look.  

As for comment #4, iDNS(using characters outside a subset
of US-ASCII allowed in the present DNS system) is a totally
different issue.  You cannot just use a legacy encoding
of your choice(which happened to be ISO-8859-1 in your example)
and expect it to be interpreted as you intended by DNS. 
(there are many hacked/non-standard way of supporting
non-ASCII characters in domain name especially
in CJK countries, but I believe iDNS in general is still premature) 
I haven't followed the development of iDNS in general
and the status of iDNS support by Mozilla in particular, though.

As for comment #6, it's bug 9449 (one of the most often duped
and misunderstood bug). The bug was fixed on all platforms
(under *nix/X11, a long long time ago. under Windows, recently).
So, Mozilla does have the ability to put any Unicode character
in the title bar in principle. 

I'm also adding intl keyword. BTW, this bug may as well be transferred
over to I18N. 

Keywords: intl
This bug does not belong in Intl.
It is entirely within content/html/document/src/nsImageDocument.cpp.

The question from comment 1 is not answered. Which charset should Mozilla use
for the encoded character?

Also, iDNS is entirely unrelated to this bug. iDNS is only for DNS (duh),
however this bug is about the filename part of the image, not the hostname.
> This bug does not belong in Intl.
> It is entirely within content/html/document/src/nsImageDocument.cpp.

  Well, I wouldn't say it should belong to intl no matter what, 
but neither would I say it's nothing to do with intl. 

> The question from comment 1 is not answered. Which charset should 
> Mozilla use for the encoded character?

   The answer is UTF-8, but to cope with 
non-standard-compliant web pages/user behaviors/web server
behaviros/etc, sometimes it may need to try various heuristics to
'read' the mind of users/web page authors and so forth.
However, in this particular case, that doesn't seem to be of much
relevance judging from comment #3. Mozilla already got the
encoding of the filename right and converted it to UTF-8
before url-encoding it. (as corerctly pointed out in comment
#3, U+00E5 - Latin small letter a with ring above - is 0xC3 0xe5)
Thereofre, the issue is where to url-decode it before handing 
it over to the method responsible for the window title bar 
rendering.  

>    Also, iDNS is entirely unrelated to this bug. iDNS is only for 
> DNS (duh),
> however this bug is about the filename part of the image, not the 
> hostname.

  That's exactly what I meant to convey in my comment. 
What I wrote in response to comment #4 was meant to
indicate that this bug is NOT related with iDNS because comment #4
was about using non-ASCII chars. in the domain name part of the URL
while this bug is about the rest of the URL.  
 
> It is entirely within content/html/document/src/nsImageDocument.cpp

  A cursory look at the file led me to doubt that this is the case
although I wouldn't rule out that possibility. It seems to me that
it has to be solved 'upstream?' (e.g. possibly implementation of nsIURL).
|UpdateTitle| and |CreateSyntheticDocument| appear to do what they're
supposed to do. 

BTW, it's not clear how to reproduce this bug because the following
cases are all different cases and I'm not sure which of them this bug
is about:

  1. Type in the URL with non-ASCII characters in the url bar
     or the url-open dialog box. Non-ASCII characters are supposed
     to be interpreted as the encoding/charset of the current locale
     under which Mozilla is running. 

  2. Make an html document in MIME charset A, put a link
     to an image with a URL
     with non-ASCII characters in the filename part 
     as 'raw octet/byte string' in MIME charset/encoding A.
     Make it clear that the document is in charset A
     in either the http header or the document itself(with
     meta tag). 
     a. view it under Mozilla with MIME charset set to A
     b. view it under Mozilla with MIME charset set other than A

      
  3. Same as 2, but url-encode the filename in charset/encoding A.
      
  4. Same as 2, but url-encode the filename in UTF-8

Case 2 and 3 are not compliant to the standard, but in practice
there are many cases like them. 

 A simple way to check whether this bug is confined to |nsImageDocument| or is
more generic is to test all four
cases above with  two links/urls, one to an image and
the other to a non-image (e.g. text/html, text/plain). 
>  A cursory look at the file led me to doubt that this is the case

That file is responsible for setting the title when an image document is loaded
(besides lots of other things, of course)

>   The answer is UTF-8

ok, I'll attach a patch
Attached patch better patchSplinter Review
this one is better, if the string is no valid utf-8, the unescaped filename is
displayed.
Attachment #113293 - Attachment is obsolete: true
Distracted by a 'grand' issue (which turned out to have been
taken care of mostly by nsIUR(I|L) implementions),
I contradicted what I wrote in comment #9
> the issue is where to url-decode it before handing 
> it over to the method responsible for the window title bar 
> rendering.  
by saying in comment #10
> A cursory look at the file led me to doubt that this is the case

This realization bothered me and I got out of bed to make a patch, but
you did it already.  However, I have a slightly different patch
copied from somewhere in necko 
(which may or may not work) so that I'm gonna upload it.
Attached patch an alternative (obsolete) — Splinter Review
I haven't tested this yet. If it works as intended, 
it falls back to 'charset' of the refering document (originCharset)
if filename is not  UTF-8.
Comment on attachment 113307 [details] [diff] [review]
an alternative

hmmm
interesting

I did not know that URIs had a charset stored on them. This patch does look
like a better solution.

However, imho you should check the return value of GetOriginCharset too.

I'd try to get r=jst, sr=darin on this patch.
added a check for GetOriginCharset(). I have yet to check 
if this (GetOriginCharset) really works for this case, though.
Attachment #113307 - Attachment is obsolete: true
Comment on attachment 113323 [details] [diff] [review]
update to my alternative per Christopher's comment

well, the documentation in nsIURI.idl of originCharset definitely sounds like
it should work.

but you could ask darin, who is the networking owner, which is why I suggested
him for super-review.
Comment on attachment 113323 [details] [diff] [review]
update to my alternative per Christopher's comment

Judging from the behavior of Mozilla, it works as intended with my patch.
I tried to get some
diagnostic output, but failed. 

Nonetheless, I'm asking 
r=jst and sr=darin as
suggested by Christopher.
Attachment #113323 - Flags: superreview?(darin)
Attachment #113323 - Flags: review?(jst)
I'm now sure that it works as intended with attachment 113323 [details] [diff] [review].
I ran 'xprop' over mozilla and found that_NET_WM_NAME(UTF8_STRING)
had values they're supposed to have when view|char. coding was set to 
Western and Korean(EUC).  
Comment on attachment 113323 [details] [diff] [review]
update to my alternative per Christopher's comment

nsIURI::originCharset is very often empty indicating an origin charset of
UTF-8.
in which case, it looks like this code would enter UnEscapeURIForUI and error
out
as a result of there being no converter registered under the empty string.  so, 
this case seems to be covered, but please be sure to verify it.  thx!
Attachment #113323 - Flags: superreview?(darin) → superreview+
Comment on attachment 113323 [details] [diff] [review]
update to my alternative per Christopher's comment

What darin said, r=jst
Attachment #113323 - Flags: review?(jst) → review+
Thank you jst and darin for r/sr and pointing out the problem.
I thought that empty originCharset would lead to 'error out' further down the
road
than  nsTextToSubURI.cpp  and didn't look further.  It turned out that it
didn't
get checked. So, I added a check to nsTextToSubURI.cpp
Attachment #113323 - Attachment is obsolete: true
Comment on attachment 113463 [details] [diff] [review]
a new patch with darin's concern addressed

Assuming jst's review
on nsImageDocument.cpp
still standing, now I'm asking nhotta for r on
nsTextToSubURI.cpp part.
darin, after that, can you transfer your sr?
Thank you.
Attachment #113463 - Flags: superreview?(darin)
Attachment #113463 - Flags: review?(nhotta)
Comment on attachment 113463 [details] [diff] [review]
a new patch with darin's concern addressed

nsCOMPtr<nsITextToSubURI> textToSubURI

I think the line can move after getting the origin charset.
Attachment #113463 - Flags: review?(nhotta) → review+
Comment on attachment 113463 [details] [diff] [review]
a new patch with darin's concern addressed

sr=darin
Attachment #113463 - Flags: superreview?(darin) → superreview+
Comment on attachment 113463 [details] [diff] [review]
a new patch with darin's concern addressed

change : url-decode image 
urls before putting in the titlebar

risk : low on all platforms

test : on Linux, it's confirmed that it worked as intended. 
There's no platform dependence.

On nhotta's comment:
I considered checking
|originCharset| before
|nsTextToSubURI| in my first patch, but didn't
because all the implemenations of |GetOriginCharset| always return NS_OK
as they're now.
Attachment #113463 - Flags: approval1.3b?
Comment on attachment 113463 [details] [diff] [review]
a new patch with darin's concern addressed

We're trying to get 1.3beta out the door. This is going to have to wait.
Attachment #113463 - Flags: approval1.3b? → approval1.3b-
Comment on attachment 113463 [details] [diff] [review]
a new patch with darin's concern addressed

I'm not sure if I can ask for a for 1.3.
Anyway, it'd be nice to put this tiny (very low risk) patch into 1.3.

change : url-decode image 
urls before putting in the titlebar

risk : very low on all platforms

test : on Linux, it's confirmed that it worked as intended. 
There's no platform dependence.
Attachment #113463 - Flags: approval1.3?
Comment on attachment 113463 [details] [diff] [review]
a new patch with darin's concern addressed

a=asa (on behalf of drivers) for checkin to 1.3.
Attachment #113463 - Flags: approval1.3? → approval1.3+
Looks like this got checked in.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Thank you everyone.
 I forgot to change the status after check-in. I've just
tested under Win2k(with my build) and found that it worked fine. 
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: