Closed Bug 163998 Opened 23 years ago Closed 22 years ago

Image titles should not be URL encoded

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: neil, Assigned: pavlov)

References

(
URL
)

Details

(Keywords: intl)

Attachments

(2 files, 3 obsolete files)

patch 22 years ago Christian :Biesinger (don't email me, ping me on IRC) 1.03 KB, patch		Details \| Diff \| Splinter Review
better patch 22 years ago Christian :Biesinger (don't email me, ping me on IRC) 1.60 KB, patch		Details \| Diff \| Splinter Review
an alternative 22 years ago Jungshik Shin 1.38 KB, patch		Details \| Diff \| Splinter Review
update to my alternative per Christopher's comment 22 years ago Jungshik Shin 1.42 KB, patch	jst : review+ darin.moz : superreview+	Details \| Diff \| Splinter Review
a new patch with darin's concern addressed 22 years ago Jungshik Shin 2.10 KB, patch	nhottanscp : review+ darin.moz : superreview+ asa : approval1.3b- asa : approval1.3+	Details \| Diff \| Splinter Review

neil@parkwaycc.co.uk

Reporter

Description

•

23 years ago

Using Build ID: 2002082104 Steps to reproduce problem: 1. Rename an image to use non-ASCII characters 2. Open the image in Mozilla Expected results: non-ASCII characters appear in window title Actual results: window title is URL encooded For instance, the URL should display as Håkan.ico (ICON image, 32x32 pixels)

Christian :Biesinger (don't email me, ping me on IRC)

Comment 1

•

23 years ago

well, the problem is: which charset to use... IIRC, some standard says to assume UTF-8... though that would not work for the testcase in this bug. We can not just assume latin1.

Jo Hermans

Comment 2

•

23 years ago

bug 9949 wants to display Unicode titles in the titlebar (assuming they're UTF-8 Christian said - that won't help you in this case).

Tuukka Tolvanen (sp3000)

Comment 3

•

23 years ago

Actually, if I go to <http://www.nrr.co.uk/Håkan.ico> with Mozilla, I get <http://www.nrr.co.uk/H%C3%A5kan.ico>, which looks like UTF-8... :)

OS: Windows 95 → All

Håkan Waara

Comment 4

•

23 years ago

A few weeks ago I filed another bug about the fact that I cannot go to www.håkan.nu with Mozilla (it works in IE6). Maybe this is related?

neil@parkwaycc.co.uk

Reporter

Comment 5

•

23 years ago

Intersting... when I hover over the URL link that I entered into Bugzilla the status bar displays http://www.nrr.co.uk/Håkan.ico and also when I hover over Tuukka's bugmail URL link the status bar displays http://www.nrr.co.uk/Håkan.ico but when I hover over the URL link in the bugzilla page the status bar displays http://www.nrr.co.uk/HÃ¥kan.ico although all of the links work...

Christian :Biesinger (don't email me, ping me on IRC)

Comment 6

•

23 years ago

Jo, that bug# seems to be wrong...

Jungshik Shin

Comment 7

•

22 years ago

As for comment #3, Mozilla converted U+00E5 in the URL in comment #3 encoded in ISO 8859-1/Windows-1252(0xE5) to UTF-8 because the encoding of the page with the URL you cited is *known* (to Mozilla) to be ISO 8859-1/Windows-1252. Anyway, I guess it's a bug NOT to 'unescape' '%-escape' sequence back to UTF-8 before handing it over to 'window titlebar display method' when the encoding of the URL is known and is already converted to UTF-8. I'll try to take a look. As for comment #4, iDNS(using characters outside a subset of US-ASCII allowed in the present DNS system) is a totally different issue. You cannot just use a legacy encoding of your choice(which happened to be ISO-8859-1 in your example) and expect it to be interpreted as you intended by DNS. (there are many hacked/non-standard way of supporting non-ASCII characters in domain name especially in CJK countries, but I believe iDNS in general is still premature) I haven't followed the development of iDNS in general and the status of iDNS support by Mozilla in particular, though. As for comment #6, it's bug 9449 (one of the most often duped and misunderstood bug). The bug was fixed on all platforms (under *nix/X11, a long long time ago. under Windows, recently). So, Mozilla does have the ability to put any Unicode character in the title bar in principle. I'm also adding intl keyword. BTW, this bug may as well be transferred over to I18N.

Keywords: intl

Christian :Biesinger (don't email me, ping me on IRC)

Comment 8

•

22 years ago

This bug does not belong in Intl. It is entirely within content/html/document/src/nsImageDocument.cpp. The question from comment 1 is not answered. Which charset should Mozilla use for the encoded character? Also, iDNS is entirely unrelated to this bug. iDNS is only for DNS (duh), however this bug is about the filename part of the image, not the hostname.

Jungshik Shin

Comment 9

•

22 years ago

> This bug does not belong in Intl. > It is entirely within content/html/document/src/nsImageDocument.cpp. Well, I wouldn't say it should belong to intl no matter what, but neither would I say it's nothing to do with intl. > The question from comment 1 is not answered. Which charset should > Mozilla use for the encoded character? The answer is UTF-8, but to cope with non-standard-compliant web pages/user behaviors/web server behaviros/etc, sometimes it may need to try various heuristics to 'read' the mind of users/web page authors and so forth. However, in this particular case, that doesn't seem to be of much relevance judging from comment #3. Mozilla already got the encoding of the filename right and converted it to UTF-8 before url-encoding it. (as corerctly pointed out in comment #3, U+00E5 - Latin small letter a with ring above - is 0xC3 0xe5) Thereofre, the issue is where to url-decode it before handing it over to the method responsible for the window title bar rendering. > Also, iDNS is entirely unrelated to this bug. iDNS is only for > DNS (duh), > however this bug is about the filename part of the image, not the > hostname. That's exactly what I meant to convey in my comment. What I wrote in response to comment #4 was meant to indicate that this bug is NOT related with iDNS because comment #4 was about using non-ASCII chars. in the domain name part of the URL while this bug is about the rest of the URL.

Jungshik Shin

Comment 10

•

22 years ago

> It is entirely within content/html/document/src/nsImageDocument.cpp A cursory look at the file led me to doubt that this is the case although I wouldn't rule out that possibility. It seems to me that it has to be solved 'upstream?' (e.g. possibly implementation of nsIURL). |UpdateTitle| and |CreateSyntheticDocument| appear to do what they're supposed to do. BTW, it's not clear how to reproduce this bug because the following cases are all different cases and I'm not sure which of them this bug is about: 1. Type in the URL with non-ASCII characters in the url bar or the url-open dialog box. Non-ASCII characters are supposed to be interpreted as the encoding/charset of the current locale under which Mozilla is running. 2. Make an html document in MIME charset A, put a link to an image with a URL with non-ASCII characters in the filename part as 'raw octet/byte string' in MIME charset/encoding A. Make it clear that the document is in charset A in either the http header or the document itself(with meta tag). a. view it under Mozilla with MIME charset set to A b. view it under Mozilla with MIME charset set other than A 3. Same as 2, but url-encode the filename in charset/encoding A. 4. Same as 2, but url-encode the filename in UTF-8 Case 2 and 3 are not compliant to the standard, but in practice there are many cases like them. A simple way to check whether this bug is confined to |nsImageDocument| or is more generic is to test all four cases above with two links/urls, one to an image and the other to a non-image (e.g. text/html, text/plain).

Christian :Biesinger (don't email me, ping me on IRC)

Comment 11

•

22 years ago

> A cursory look at the file led me to doubt that this is the case That file is responsible for setting the title when an image document is loaded (besides lots of other things, of course) > The answer is UTF-8 ok, I'll attach a patch

Christian :Biesinger (don't email me, ping me on IRC)

Comment 12

•

22 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Christian :Biesinger (don't email me, ping me on IRC)

Comment 13

•

22 years ago

Attached patch better patch — Details — Splinter Review

this one is better, if the string is no valid utf-8, the unescaped filename is displayed.

Attachment #113293 - Attachment is obsolete: true

Jungshik Shin

Comment 14

•

22 years ago

Distracted by a 'grand' issue (which turned out to have been taken care of mostly by nsIUR(I|L) implementions), I contradicted what I wrote in comment #9 > the issue is where to url-decode it before handing > it over to the method responsible for the window title bar > rendering. by saying in comment #10 > A cursory look at the file led me to doubt that this is the case This realization bothered me and I got out of bed to make a patch, but you did it already. However, I have a slightly different patch copied from somewhere in necko (which may or may not work) so that I'm gonna upload it.

Jungshik Shin

Comment 15

•

22 years ago

Attached patch an alternative (obsolete) — Details — Splinter Review

I haven't tested this yet. If it works as intended, it falls back to 'charset' of the refering document (originCharset) if filename is not UTF-8.

Christian :Biesinger (don't email me, ping me on IRC)

Comment 16

•

22 years ago

Comment on attachment 113307 [details] [diff] [review] an alternative hmmm interesting I did not know that URIs had a charset stored on them. This patch does look like a better solution. However, imho you should check the return value of GetOriginCharset too. I'd try to get r=jst, sr=darin on this patch.

Jungshik Shin

Comment 17

•

22 years ago

Attached patch update to my alternative per Christopher's comment (obsolete) — Details — Splinter Review

added a check for GetOriginCharset(). I have yet to check if this (GetOriginCharset) really works for this case, though.

Attachment #113307 - Attachment is obsolete: true

Christian :Biesinger (don't email me, ping me on IRC)

Comment 18

•

22 years ago

Comment on attachment 113323 [details] [diff] [review] update to my alternative per Christopher's comment well, the documentation in nsIURI.idl of originCharset definitely sounds like it should work. but you could ask darin, who is the networking owner, which is why I suggested him for super-review.

Jungshik Shin

Comment 19

•

22 years ago

Comment on attachment 113323 [details] [diff] [review] update to my alternative per Christopher's comment Judging from the behavior of Mozilla, it works as intended with my patch. I tried to get some diagnostic output, but failed. Nonetheless, I'm asking r=jst and sr=darin as suggested by Christopher.

Attachment #113323 - Flags: superreview?(darin)

Attachment #113323 - Flags: review?(jst)

Jungshik Shin

Comment 20

•

22 years ago

I'm now sure that it works as intended with attachment 113323 [details] [diff] [review]. I ran 'xprop' over mozilla and found that_NET_WM_NAME(UTF8_STRING) had values they're supposed to have when view|char. coding was set to Western and Korean(EUC).

Darin Fisher

Comment 21

•

22 years ago

Comment on attachment 113323 [details] [diff] [review] update to my alternative per Christopher's comment nsIURI::originCharset is very often empty indicating an origin charset of UTF-8. in which case, it looks like this code would enter UnEscapeURIForUI and error out as a result of there being no converter registered under the empty string. so, this case seems to be covered, but please be sure to verify it. thx!

Attachment #113323 - Flags: superreview?(darin) → superreview+

Johnny Stenback (:jst)

Comment 22

•

22 years ago

Comment on attachment 113323 [details] [diff] [review] update to my alternative per Christopher's comment What darin said, r=jst

Attachment #113323 - Flags: review?(jst) → review+

Jungshik Shin

Comment 23

•

22 years ago

Attached patch a new patch with darin's concern addressed — Details — Splinter Review

Thank you jst and darin for r/sr and pointing out the problem. I thought that empty originCharset would lead to 'error out' further down the road than nsTextToSubURI.cpp and didn't look further. It turned out that it didn't get checked. So, I added a check to nsTextToSubURI.cpp

Attachment #113323 - Attachment is obsolete: true

Jungshik Shin

Comment 24

•

22 years ago

Comment on attachment 113463 [details] [diff] [review] a new patch with darin's concern addressed Assuming jst's review on nsImageDocument.cpp still standing, now I'm asking nhotta for r on nsTextToSubURI.cpp part. darin, after that, can you transfer your sr? Thank you.

Attachment #113463 - Flags: superreview?(darin)

Attachment #113463 - Flags: review?(nhotta)

nhottanscp

Comment 25

•

22 years ago

Comment on attachment 113463 [details] [diff] [review] a new patch with darin's concern addressed nsCOMPtr<nsITextToSubURI> textToSubURI I think the line can move after getting the origin charset.

Attachment #113463 - Flags: review?(nhotta) → review+

Darin Fisher

Comment 26

•

22 years ago

Comment on attachment 113463 [details] [diff] [review] a new patch with darin's concern addressed sr=darin

Attachment #113463 - Flags: superreview?(darin) → superreview+

Jungshik Shin

Comment 27

•

22 years ago

Comment on attachment 113463 [details] [diff] [review] a new patch with darin's concern addressed change : url-decode image urls before putting in the titlebar risk : low on all platforms test : on Linux, it's confirmed that it worked as intended. There's no platform dependence. On nhotta's comment: I considered checking |originCharset| before |nsTextToSubURI| in my first patch, but didn't because all the implemenations of |GetOriginCharset| always return NS_OK as they're now.

Attachment #113463 - Flags: approval1.3b?

Asa Dotzler [:asa]

Comment 28

•

22 years ago

Comment on attachment 113463 [details] [diff] [review] a new patch with darin's concern addressed We're trying to get 1.3beta out the door. This is going to have to wait.

Attachment #113463 - Flags: approval1.3b? → approval1.3b-

Jungshik Shin

Comment 29

•

22 years ago

Comment on attachment 113463 [details] [diff] [review] a new patch with darin's concern addressed I'm not sure if I can ask for a for 1.3. Anyway, it'd be nice to put this tiny (very low risk) patch into 1.3. change : url-decode image urls before putting in the titlebar risk : very low on all platforms test : on Linux, it's confirmed that it worked as intended. There's no platform dependence.

Attachment #113463 - Flags: approval1.3?

Asa Dotzler [:asa]

Comment 30

•

22 years ago

Comment on attachment 113463 [details] [diff] [review] a new patch with darin's concern addressed a=asa (on behalf of drivers) for checkin to 1.3.

Attachment #113463 - Flags: approval1.3? → approval1.3+

neil@parkwaycc.co.uk

Reporter

Comment 31

•

22 years ago

Looks like this got checked in.

Status: NEW → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

Jungshik Shin

Comment 32

•

22 years ago

Thank you everyone. I forgot to change the status after check-in. I've just tested under Win2k(with my build) and found that it worked fine.

You need to log in before you can comment on or make changes to this bug.