Closed Bug 158171 Opened 22 years ago Closed 14 years ago

DBCS tab title can't be displayed properly except for html file type.

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: kasumi, Assigned: nhottanscp)

References

Details

(Keywords: intl)

Attachments

(3 files, 3 obsolete files)

detail 22 years ago Kasumi 48.73 KB, image/jpeg		Details
patch from bug 167771, added unescape the title, with a charset if non ASCII. 22 years ago nhottanscp 1.26 KB, patch	nhottanscp : superreview+	Details \| Diff \| Splinter Review
Unescape non ASCII. 22 years ago nhottanscp 1.17 KB, patch	caillon : review+	Details \| Diff \| Splinter Review
updated patch using "browser.contentDocument.characterSet" 22 years ago nhottanscp 1.03 KB, patch		Details \| Diff \| Splinter Review
Changed to use Components.lookupMethod to get characterSet. 22 years ago nhottanscp 1.19 KB, patch		Details \| Diff \| Splinter Review
Same as Naoki's patch, but with the check moved and a comment addition 22 years ago Christopher Aillon (sabbatical, not receiving bugmail) 1.60 KB, patch	nhottanscp : review+ nhottanscp : superreview+	Details \| Diff \| Splinter Review

Kasumi

Reporter

Description

•

22 years ago

tested on 2007-07-17-08-1.0
Win XP pro. JA SP1 beta1
1. create a .txt file having DBCS as body and file name
2. create a .gif/.jpeg file having DBCS as file name
3. Select File/Open Web location 
4. Specify those files as New Navigator tab
   For .txt file, displayed as "(Untitled)"
   For .gif/.jpeg file, displayed in return code
   More detail, please refer attachment

Kasumi

Reporter

Updated

•

22 years ago

Keywords: intl

Summary: DBCS tab title can't be displayed properly except for html file type. → DBCS tab title can't be displayed properly except for html file type.

Kasumi

Reporter

Comment 1

•

22 years ago

Attached image detail — Details

Yuying Long

Comment 2

•

22 years ago

We display tab title as html source <title> for html file.  And for non-ascii
file name, we currently escape them in many case, e.g. file | open and save
file...etc.

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 3

•

22 years ago

->me

Assignee: jaggernaut → caillon

Christopher Aillon (sabbatical, not receiving bugmail)

Updated

•

22 years ago

Blocks: 152127

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 4

•

22 years ago

This bug has been fixed by my check-in to bug 152127.  Resolving fixed.

Status: NEW → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

sairuh (rarely reading bugmail)

Comment 5

•

22 years ago

hi Kasumi,

could you pls verify whether this is fixed for you using recent TRUNK builds?
thanks!

Kasumi

Reporter

Comment 6

•

22 years ago

2002-07-29-08-trunk on Win XP Pro. JA SP1 beta1
I can still reproduce, didn't fix yet.

sairuh (rarely reading bugmail)

Comment 7

•

22 years ago

reopening per comment 6. kasumi, i'm handing over qa contact to you since you
have the appropriate configuration for testing this (i don't)...

Status: RESOLVED → REOPENED

QA Contact: sairuh → kasumi

Resolution: FIXED → ---

neil@parkwaycc.co.uk

Comment 8

•

22 years ago

Comment on attachment 91846 [details]
detail

This attachment looks correct to me, as far as the tabbed browser is concerned;
if you look at the browser title then this should match the tab title which it
does.

neil@parkwaycc.co.uk

Comment 9

•

22 years ago

If the browser title for an image is incorrect then that's an imagelib problem,
not a tabbed browser problem.

Kasumi

Reporter

Comment 10

•

22 years ago

*** Bug 167771 has been marked as a duplicate of this bug. ***

nhottanscp

Assignee

Comment 11

•

22 years ago

Attached patch patch from bug 167771, added unescape the title, with a charset if non ASCII. (obsolete) — Details — Splinter Review

nhottanscp

Assignee

Comment 12

•

22 years ago

Comment on attachment 98741 [details] [diff] [review]
patch from bug 167771, added unescape the title, with a charset if non ASCII.

copy sr=bzbarsky
from bug 167771

Attachment #98741 - Flags: superreview+

nhottanscp

Assignee

Comment 13

•

22 years ago

caillon, could you review the patch? The URI needs unescaping (with a charset
for non ASCII).

Assignee: caillon → nhotta

Status: REOPENED → NEW

nhottanscp

Assignee

Updated

•

22 years ago

Status: NEW → ASSIGNED

Target Milestone: --- → mozilla1.2beta

neil@parkwaycc.co.uk

Comment 14

•

22 years ago

Comment on attachment 98741 [details] [diff] [review]
patch from bug 167771, added unescape the title, with a charset if non ASCII.

caillon, doesn't this patch conflict with your patch to bug 164006?

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 15

•

22 years ago

Comment on attachment 98741 [details] [diff] [review]
patch from bug 167771, added unescape the title, with a charset if non ASCII.

>Index: bindings/tabbrowser.xml
>===================================================================

Yes, as Neil points out you will have merge conflicts with the patch I checked
in for bug 164006.

>               title = titleViaGetter
>             else if (browser.currentURI.spec && browser.currentURI.spec != "about:blank") {
>               title = browser.currentURI.spec;
>+              try {
>+                title = unescape(title);
>+              } catch (e) {

Erm, how will this code within this catch ever be executed?  Unless I am
reading ECMA-262 Level 3 incorrectly, unescape should never throw any
exceptions or errors... Do we implement something different?

Deferring review of this patch pending an answer for the above question.

nhottanscp

Assignee

Comment 16

•

22 years ago

>Unless I am
>reading ECMA-262 Level 3 incorrectly, unescape should never throw any
>exceptions or errors...

If that is the case then I can change it to always unescape with the document
charset instead.

Boris Zbarsky [:bzbarsky]

Comment 17

•

22 years ago

unescape will certainly throw an exception in our implementation if the result,
after unescaping is not valid "something" (dunno whether "something" is ASCII or
UTF8 or ISO-8859-1 or what; I did not experiment).

I suspect that the ECMA definition is just plain not compatible with existing
real-world practice, based on a brief skim.

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 18

•

22 years ago

Phil, any insight on this? (see comment 15, comment 17).  If we do go against
the spec, will we keep it that way and try and change the spec?  Or is that a
bug in our impl (that won't get wontfixed due to compat)?  I suppose the former,
and that's fine, I just need to know so I can feel confident about reviewing
this patch :-)

Phil Schwartau

Comment 19

•

22 years ago

From what I know, the escape() function of JS Engine is 
superseded by the escape() function of the DOM, and I don't
know if the DOM function ever generates exceptions -

This issue came up, for example, in bug 58813,
"escaped string throws JavaScript error"

That makes it sound as if Boris is right -

Phil Schwartau

Comment 20

•

22 years ago

Oops - my comment treated escape(), when you were asking about 
unescape(). But I think the same principle applies: the function
that exists in the JavaScript Engine (subject to ECMA-262 Ed. 3)
is superseded in the browser by the DOM version of the function.

I'm not as familiar with the DOM escape() and unescape() functions
as a developer would be, but I suspect both can generate exceptions -

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 21

•

22 years ago

Comment on attachment 98741 [details] [diff] [review]
patch from bug 167771, added unescape the title, with a charset if non ASCII.

>Index: bindings/tabbrowser.xml
>===================================================================
>RCS file: /cvsroot/mozilla/xpfe/global/resources/content/bindings/tabbrowser.xml,v
>retrieving revision 1.60
>diff -u -r1.60 tabbrowser.xml
>--- bindings/tabbrowser.xml	10 Sep 2002 08:11:53 -0000	1.60
>+++ bindings/tabbrowser.xml	10 Sep 2002 22:35:49 -0000
>@@ -553,6 +553,19 @@
>               title = titleViaGetter
>             else if (browser.currentURI.spec && browser.currentURI.spec != "about:blank") {
>               title = browser.currentURI.spec;
>+              try {
>+                title = unescape(title);
>+              } catch (e) {

The code right above this uses ex as the exception variable (which you can't
see because you are diffing against an older revision.	Please update and use
the same name.

>+                try {
>+                  // the URI might be non ASCII

"non-ASCII"

>+                  // try unescape again with a characterSet

"Try to unescape again, with a character set this time"

>+                  var textToSubURI = Components.classes["@mozilla.org/intl/texttosuburi;1"]

make textToSubURI a const?  It will never change...

>+                                               .getService(Components.interfaces.nsITextToSubURI);
>+                  title = textToSubURI.unEscapeURIForUI(browser.contentDocument.characterSet, 

I would rather you use the .originCharset of one of the nsIURIs we have. 
browser.currentURI or better, its fixed up equivalent will work nicely here. 
The reason for this is that browser.contentDocument.characterSet is _not_
guaranteed to be the actual characterSet of the document.  A script can easily
mess with that and make it be something that you aren't expecting.

>+                                                        title);


Additionally, I have an issue with the whole thing about unescaping.  The point
of having URLs in the tab title is so you can see the URL...  If a URL of a
site is 'http://foo/bar%20baz/' then I want to see 'http://foo/bar%20baz' and
not 'http://foo/bar baz/'  I don't mind if you want to make it so that DBCS
displays, but I'm not sure I like the whole idea of unconditionally
unescaping... I'd like to hear what jag says about this.  Nonetheless, this
patch needs some work as it stands, and I'd like to see the next patch diffed
against the current revision.  (Sorry for the conflicts)

Attachment #98741 - Flags: needs-work+

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 22

•

22 years ago

By the way, thanks Phil for clarifying that up.  I didn't realize there was a
DOM version of the function as well.  It all makes sense now and I bow to your
knowledge of the specs.  :-)

jag (Peter Annema)

Comment 23

•

22 years ago

Let's see... We want a URI to display in its URI form (escaped, with all its
warts), and we also want filenames to be "human readable", which is achieved by
unescaping the filename part. We do this unconditionally for "File -> Save Page
As..." because we're always dealing with a file there. In the case of tab
titles, sometimes it's a file, sometimes it's not. Is the happy golden road here
that we look at the protocol, and if it's file, we unescape, if not, we present
it as-is?

This has the drawback that a file can have two display forms, the raw URI text
when loaded from a server, the unescaped one when loading from disk. Is this
acceptable?

Perhaps we should display 'http://foo/bar%20baz/' as 'http://foo/bar baz/'?
After all, it's only a title, a reminder of what's to be found in the tab, not
the real URI. It would certainly make 'http://foopy.com/Hello World.html' more
readable.

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 24

•

22 years ago

Jag, whichever you decide, is fine by me.  I wanted mainly for you to be aware
of this and to have a say in it.  My argument is that by displaying
http://foo/bar in the first place (escaped or not) it appears to the user as a
URL.  Would a user get confused and attempt to type the URL as it appears on the
tab title?  Unlikely, but it's possible.  Is it a big deal?  I don't know.

The current behavior has the potential of confusing users who have several tabs
open.  Consider 'http://foo/page%202.html' (page 2.html) and
'http://foo/page202.html' (page202.html).  With multiple tabs and cropping the
current behavior may have the user see 'http://foo...202.html' for both pages.

I _do_ think that if we decide to unescape, we should try and make the title not
look like a URL if possible.

Something like 'http://foo/ - page 2.html' vs. 'http://foo/ - page202.html'

The inner stuff will get cropped off anyway on the tab title, and it may make
the tooltip more useful.  Get the important information at the quick glance
you're afforded with tooltips (they dissapear after a certain timeout), and if
you want the real URI, click the tab and get it from the URL bar....

nhottanscp

Assignee

Comment 25

•

22 years ago

>I would rather you use the .originCharset of one of the nsIURIs we have. 

I am rewriting the patch, but using "browser.currentURI.originCharset" does not
work because it is UTF-8 instead of a charset of the document. Can I use
"browser.contentDocument.characterSet"?

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 26

•

22 years ago

> Can I use "browser.contentDocument.characterSet"?

No.  If you have to, for now use:

  Components.lookupMethod(browser.contentDocument, 'charset')
            .call(browser.contentDocument);

A site can easily change the characterSet with a single JS statement.  I'm
writing wrappers for the above statement, but in the meantime, it will do.

nhottanscp

Assignee

Comment 27

•

22 years ago

>A site can easily change the characterSet with a single JS statement.  
I am not sure if that happens frequently. When do you think that could happen?

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 28

•

22 years ago

>I am not sure if that happens frequently.

I can tell you for a fact that it does not happen frequently.

>When do you think that could happen?

Whenever an author wants to do it.  But that's the point - that it CAN happen. 
IF the user changes the characterSet to something that you are not expecting
(e.g. an array, a function, or other weird things), the behavior of your code is
then pretty much undefined.  Let us not even go there.

jag (Peter Annema)

Comment 29

•

22 years ago

Right, frequency doesn't matter, it's the impact of when it happens that counts,
and if there's a simple way to work around is, as shown by caillon, then that's
the way to go. See bug 159667 for details.

nhottanscp

Assignee

Comment 30

•

22 years ago

I was curious why/how that is used. Changing the document charset often makes
the page not readable.

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 31

•

22 years ago

One way is to create an HTML page with <img name="characterSet">.

nhottanscp

Assignee

Comment 32

•

22 years ago

Attached patch Unescape non ASCII. (obsolete) — Details — Splinter Review

nhottanscp

Assignee

Updated

•

22 years ago

Attachment #98741 - Attachment is obsolete: true

nhottanscp

Assignee

Comment 33

•

22 years ago

caillon@returnzero.com, please review the patch.

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 34

•

22 years ago

Comment on attachment 102033 [details] [diff] [review]
Unescape non ASCII.

r=caillon

Attachment #102033 - Flags: review+

nhottanscp

Assignee

Comment 35

•

22 years ago

bzbarsky, could you review the last patch?

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 36

•

22 years ago

Comment on attachment 102033 [details] [diff] [review]
Unescape non ASCII.

Actually, thinking back on this, we have several places IIRC that want to get
at the characterSet of the document.  Jag, what do you think about adding a
<property name="characterSet"> or something like we did for contentTitle
(document.title)?

Boris Zbarsky [:bzbarsky]

Comment 37

•

22 years ago

Shouldn't that code go inside the "if (browser.currentURI.spec) {" block?

I can't remember the lookupMethod syntax offhand, but it seems odd to me that
one would use lookupMethod to get a property.... Is that correct?

nhottanscp

Assignee

Comment 38

•

22 years ago

There are two issues raised during the review process.
* Getting the charset - the obvious "browser.contentDocument.characterSet" may not
  be correct (comment #26).
  This can be filed as a separate bug with a real example if possible.
* Generic improvment idea of getting characterSet of the document (comment #36).
  This also does not have to be a part of this bug.

I will attach a patch which uses "browser.contentDocument.characterSet".

nhottanscp

Assignee

Comment 39

•

22 years ago

Attached patch updated patch using "browser.contentDocument.characterSet" (obsolete) — Details — Splinter Review

Attachment #102033 - Attachment is obsolete: true

nhottanscp

Assignee

Comment 40

•

22 years ago

caillon/bzbarsky, please r/sr, thanks.

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 41

•

22 years ago

Comment on attachment 102179 [details] [diff] [review]
updated patch using "browser.contentDocument.characterSet"

Absolutely positively NO.  I already explained why this should NOT happen.  The
syntax you used is correct, bzbarsky was a little confused (with good reason)
because of the way the function was named.  Please do not post patches just
based on questions.  Questions are made to be answered, not patched. Anyway,
Components.lookupMethod is correct because it looks up the getter, which is a
method.

Attachment #102179 - Flags: needs-work+

nhottanscp

Assignee

Comment 42

•

22 years ago

>I already explained why this should NOT happen.
I am not sure which comment. Could you give me the comment # of your explanation?

Boris Zbarsky [:bzbarsky]

Comment 43

•

22 years ago

Comment 26.  Basically, document.characterSet could be set by the site, and it
need not even be set to a string.  It could be an object that would do icky evil
things.

sr=me on attachment 102033 [details] [diff] [review] if the code is moved to the place where it is in
attachment 102179 [details] [diff] [review]

nhottanscp

Assignee

Comment 44

•

22 years ago

Attached patch Changed to use Components.lookupMethod to get characterSet. — Details — Splinter Review

Attachment #102179 - Attachment is obsolete: true

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 45

•

22 years ago

Attached patch Same as Naoki's patch, but with the check moved and a comment addition — Details — Splinter Review

This is the patch I think we should use.  If we have the check anywhere, we
should do this for URIs which we actually care about the URI (eg any except
about:blank) since those go a different route (string bundles).

r=caillon for this patch, since it's really nhottas.

nhottanscp

Assignee

Comment 46

•

22 years ago

Comment on attachment 102235 [details] [diff] [review]
Same as Naoki's patch, but with the check moved and a comment addition

r=caillon, sr=bzbarsky
according to comment #43 and #45

Attachment #102235 - Flags: superreview+

Attachment #102235 - Flags: review+

Boris Zbarsky [:bzbarsky]

Comment 47

•

22 years ago

yeah, sr=me....

nhottanscp

Assignee

Comment 48

•

22 years ago

checked in to the trunk

Status: ASSIGNED → RESOLVED

Closed: 22 years ago → 22 years ago

Resolution: --- → FIXED

Kasumi

Reporter

Comment 49

•

22 years ago

tested 0n 2002-10-09-11-trunk
Win XP Pro. JA
Still occurred.
Did I use wrong build ?

Kasumi

Reporter

Comment 50

•

22 years ago

tested on 2002-10-09-11-trunk
Win XP Pro. JA
Confirmed fix for .txt file but still occur for graphical files such as .gif,
.jpg and .bmp.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

nhottanscp

Assignee

Comment 51

•

22 years ago

There is no good way to get charsets for those types of files.

Status: REOPENED → ASSIGNED

Target Milestone: mozilla1.2beta → ---

Boris Zbarsky [:bzbarsky]

Comment 52

•

22 years ago

Yeah; such files will never be served with a useful charset. Do we have a way of
getting the document URI?  We could try falling back on its origin charset if
the documentCharacterSet is empty....

nhottanscp

Assignee

Comment 53

•

22 years ago

Yes, or we can put the fallback mechnism to the unescape funtion instead of the
caller to figure out how to fallback.
There are two cases, if local file we can fallback to the OS charset, if ftp
then fallback to the browser's default.
But for the local file case, this may be resovled once the file system switch to
use UTF-8 (bug 166735, bug 162361) since the current code already handles UTF-8 URL.

neil@parkwaycc.co.uk

Comment 54

•

22 years ago

> Confirmed fix for .txt file but still occur for graphical files such as .gif,
> .jpg and .bmp.

I thought that for graphical files the title was generated by imglib?

Kasumi

Reporter

Comment 55

•

22 years ago

Bug 173616 said Technically mozilla renders the image by constructing a small
html in #1.
If it's true, graphical file name in tab can be handled same as html.

Christian :Biesinger (don't email me, ping me on IRC)

Comment 56

•

22 years ago

>I thought that for graphical files the title was generated by imglib?

well, if you consider content/html/document/src/nsImageDocument.cpp as a part of
imagelib then yes. anyway, that file creates the synthetic html document for
images and therefore also the title.

Samir Gehani

Updated

•

21 years ago

Blocks: 173151

Nobody; OK to take it and work on it

Updated

•

16 years ago

Product: Core → SeaMonkey

Serge Gautherie (:sgautherie)

Updated

•

16 years ago

QA Contact: kasumi → tabbed-browser

Robert Kaiser

Comment 57

•

14 years ago

Something has been fixed here a long time ago. Please file followup bugs for any remaining issues.

Status: ASSIGNED → RESOLVED

Closed: 22 years ago → 14 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.