Closed Bug 163998 Opened 23 years ago Closed 22 years ago

Image titles should not be URL encoded

Categories

(Core :: Graphics: ImageLib, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: neil, Assigned: pavlov)

References

()

Details

(Keywords: intl)

Attachments

(2 files, 3 obsolete files)

Using Build ID: 2002082104 Steps to reproduce problem: 1. Rename an image to use non-ASCII characters 2. Open the image in Mozilla Expected results: non-ASCII characters appear in window title Actual results: window title is URL encooded For instance, the URL should display as Håkan.ico (ICON image, 32x32 pixels)
well, the problem is: which charset to use... IIRC, some standard says to assume UTF-8... though that would not work for the testcase in this bug. We can not just assume latin1.
bug 9949 wants to display Unicode titles in the titlebar (assuming they're UTF-8 Christian said - that won't help you in this case).
Actually, if I go to <http://www.nrr.co.uk/Håkan.ico> with Mozilla, I get <http://www.nrr.co.uk/H%C3%A5kan.ico>, which looks like UTF-8... :)
OS: Windows 95 → All
A few weeks ago I filed another bug about the fact that I cannot go to www.håkan.nu with Mozilla (it works in IE6). Maybe this is related?
Intersting... when I hover over the URL link that I entered into Bugzilla the status bar displays http://www.nrr.co.uk/Håkan.ico and also when I hover over Tuukka's bugmail URL link the status bar displays http://www.nrr.co.uk/Håkan.ico but when I hover over the URL link in the bugzilla page the status bar displays http://www.nrr.co.uk/HÃ¥kan.ico although all of the links work...
Jo, that bug# seems to be wrong...
As for comment #3, Mozilla converted U+00E5 in the URL in comment #3 encoded in ISO 8859-1/Windows-1252(0xE5) to UTF-8 because the encoding of the page with the URL you cited is *known* (to Mozilla) to be ISO 8859-1/Windows-1252. Anyway, I guess it's a bug NOT to 'unescape' '%-escape' sequence back to UTF-8 before handing it over to 'window titlebar display method' when the encoding of the URL is known and is already converted to UTF-8. I'll try to take a look. As for comment #4, iDNS(using characters outside a subset of US-ASCII allowed in the present DNS system) is a totally different issue. You cannot just use a legacy encoding of your choice(which happened to be ISO-8859-1 in your example) and expect it to be interpreted as you intended by DNS. (there are many hacked/non-standard way of supporting non-ASCII characters in domain name especially in CJK countries, but I believe iDNS in general is still premature) I haven't followed the development of iDNS in general and the status of iDNS support by Mozilla in particular, though. As for comment #6, it's bug 9449 (one of the most often duped and misunderstood bug). The bug was fixed on all platforms (under *nix/X11, a long long time ago. under Windows, recently). So, Mozilla does have the ability to put any Unicode character in the title bar in principle. I'm also adding intl keyword. BTW, this bug may as well be transferred over to I18N.
Keywords: intl
This bug does not belong in Intl. It is entirely within content/html/document/src/nsImageDocument.cpp. The question from comment 1 is not answered. Which charset should Mozilla use for the encoded character? Also, iDNS is entirely unrelated to this bug. iDNS is only for DNS (duh), however this bug is about the filename part of the image, not the hostname.
> This bug does not belong in Intl. > It is entirely within content/html/document/src/nsImageDocument.cpp. Well, I wouldn't say it should belong to intl no matter what, but neither would I say it's nothing to do with intl. > The question from comment 1 is not answered. Which charset should > Mozilla use for the encoded character? The answer is UTF-8, but to cope with non-standard-compliant web pages/user behaviors/web server behaviros/etc, sometimes it may need to try various heuristics to 'read' the mind of users/web page authors and so forth. However, in this particular case, that doesn't seem to be of much relevance judging from comment #3. Mozilla already got the encoding of the filename right and converted it to UTF-8 before url-encoding it. (as corerctly pointed out in comment #3, U+00E5 - Latin small letter a with ring above - is 0xC3 0xe5) Thereofre, the issue is where to url-decode it before handing it over to the method responsible for the window title bar rendering. > Also, iDNS is entirely unrelated to this bug. iDNS is only for > DNS (duh), > however this bug is about the filename part of the image, not the > hostname. That's exactly what I meant to convey in my comment. What I wrote in response to comment #4 was meant to indicate that this bug is NOT related with iDNS because comment #4 was about using non-ASCII chars. in the domain name part of the URL while this bug is about the rest of the URL.
> It is entirely within content/html/document/src/nsImageDocument.cpp A cursory look at the file led me to doubt that this is the case although I wouldn't rule out that possibility. It seems to me that it has to be solved 'upstream?' (e.g. possibly implementation of nsIURL). |UpdateTitle| and |CreateSyntheticDocument| appear to do what they're supposed to do. BTW, it's not clear how to reproduce this bug because the following cases are all different cases and I'm not sure which of them this bug is about: 1. Type in the URL with non-ASCII characters in the url bar or the url-open dialog box. Non-ASCII characters are supposed to be interpreted as the encoding/charset of the current locale under which Mozilla is running. 2. Make an html document in MIME charset A, put a link to an image with a URL with non-ASCII characters in the filename part as 'raw octet/byte string' in MIME charset/encoding A. Make it clear that the document is in charset A in either the http header or the document itself(with meta tag). a. view it under Mozilla with MIME charset set to A b. view it under Mozilla with MIME charset set other than A 3. Same as 2, but url-encode the filename in charset/encoding A. 4. Same as 2, but url-encode the filename in UTF-8 Case 2 and 3 are not compliant to the standard, but in practice there are many cases like them. A simple way to check whether this bug is confined to |nsImageDocument| or is more generic is to test all four cases above with two links/urls, one to an image and the other to a non-image (e.g. text/html, text/plain).
> A cursory look at the file led me to doubt that this is the case That file is responsible for setting the title when an image document is loaded (besides lots of other things, of course) > The answer is UTF-8 ok, I'll attach a patch
Attached patch better patchSplinter Review
this one is better, if the string is no valid utf-8, the unescaped filename is displayed.
Attachment #113293 - Attachment is obsolete: true
Distracted by a 'grand' issue (which turned out to have been taken care of mostly by nsIUR(I|L) implementions), I contradicted what I wrote in comment #9 > the issue is where to url-decode it before handing > it over to the method responsible for the window title bar > rendering. by saying in comment #10 > A cursory look at the file led me to doubt that this is the case This realization bothered me and I got out of bed to make a patch, but you did it already. However, I have a slightly different patch copied from somewhere in necko (which may or may not work) so that I'm gonna upload it.
Attached patch an alternative (obsolete) — Splinter Review
I haven't tested this yet. If it works as intended, it falls back to 'charset' of the refering document (originCharset) if filename is not UTF-8.
Comment on attachment 113307 [details] [diff] [review] an alternative hmmm interesting I did not know that URIs had a charset stored on them. This patch does look like a better solution. However, imho you should check the return value of GetOriginCharset too. I'd try to get r=jst, sr=darin on this patch.
added a check for GetOriginCharset(). I have yet to check if this (GetOriginCharset) really works for this case, though.
Attachment #113307 - Attachment is obsolete: true
Comment on attachment 113323 [details] [diff] [review] update to my alternative per Christopher's comment well, the documentation in nsIURI.idl of originCharset definitely sounds like it should work. but you could ask darin, who is the networking owner, which is why I suggested him for super-review.
Comment on attachment 113323 [details] [diff] [review] update to my alternative per Christopher's comment Judging from the behavior of Mozilla, it works as intended with my patch. I tried to get some diagnostic output, but failed. Nonetheless, I'm asking r=jst and sr=darin as suggested by Christopher.
Attachment #113323 - Flags: superreview?(darin)
Attachment #113323 - Flags: review?(jst)
I'm now sure that it works as intended with attachment 113323 [details] [diff] [review]. I ran 'xprop' over mozilla and found that_NET_WM_NAME(UTF8_STRING) had values they're supposed to have when view|char. coding was set to Western and Korean(EUC).
Comment on attachment 113323 [details] [diff] [review] update to my alternative per Christopher's comment nsIURI::originCharset is very often empty indicating an origin charset of UTF-8. in which case, it looks like this code would enter UnEscapeURIForUI and error out as a result of there being no converter registered under the empty string. so, this case seems to be covered, but please be sure to verify it. thx!
Attachment #113323 - Flags: superreview?(darin) → superreview+
Comment on attachment 113323 [details] [diff] [review] update to my alternative per Christopher's comment What darin said, r=jst
Attachment #113323 - Flags: review?(jst) → review+
Thank you jst and darin for r/sr and pointing out the problem. I thought that empty originCharset would lead to 'error out' further down the road than nsTextToSubURI.cpp and didn't look further. It turned out that it didn't get checked. So, I added a check to nsTextToSubURI.cpp
Attachment #113323 - Attachment is obsolete: true
Comment on attachment 113463 [details] [diff] [review] a new patch with darin's concern addressed Assuming jst's review on nsImageDocument.cpp still standing, now I'm asking nhotta for r on nsTextToSubURI.cpp part. darin, after that, can you transfer your sr? Thank you.
Attachment #113463 - Flags: superreview?(darin)
Attachment #113463 - Flags: review?(nhotta)
Comment on attachment 113463 [details] [diff] [review] a new patch with darin's concern addressed nsCOMPtr<nsITextToSubURI> textToSubURI I think the line can move after getting the origin charset.
Attachment #113463 - Flags: review?(nhotta) → review+
Comment on attachment 113463 [details] [diff] [review] a new patch with darin's concern addressed sr=darin
Attachment #113463 - Flags: superreview?(darin) → superreview+
Comment on attachment 113463 [details] [diff] [review] a new patch with darin's concern addressed change : url-decode image urls before putting in the titlebar risk : low on all platforms test : on Linux, it's confirmed that it worked as intended. There's no platform dependence. On nhotta's comment: I considered checking |originCharset| before |nsTextToSubURI| in my first patch, but didn't because all the implemenations of |GetOriginCharset| always return NS_OK as they're now.
Attachment #113463 - Flags: approval1.3b?
Comment on attachment 113463 [details] [diff] [review] a new patch with darin's concern addressed We're trying to get 1.3beta out the door. This is going to have to wait.
Attachment #113463 - Flags: approval1.3b? → approval1.3b-
Comment on attachment 113463 [details] [diff] [review] a new patch with darin's concern addressed I'm not sure if I can ask for a for 1.3. Anyway, it'd be nice to put this tiny (very low risk) patch into 1.3. change : url-decode image urls before putting in the titlebar risk : very low on all platforms test : on Linux, it's confirmed that it worked as intended. There's no platform dependence.
Attachment #113463 - Flags: approval1.3?
Comment on attachment 113463 [details] [diff] [review] a new patch with darin's concern addressed a=asa (on behalf of drivers) for checkin to 1.3.
Attachment #113463 - Flags: approval1.3? → approval1.3+
Looks like this got checked in.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Thank you everyone. I forgot to change the status after check-in. I've just tested under Win2k(with my build) and found that it worked fine.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: