Closed Bug 158171 Opened 22 years ago Closed 14 years ago

DBCS tab title can't be displayed properly except for html file type.

Categories

(SeaMonkey :: Tabbed Browser, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kasumi, Assigned: nhottanscp)

References

Details

(Keywords: intl)

Attachments

(3 files, 3 obsolete files)

tested on 2007-07-17-08-1.0
Win XP pro. JA SP1 beta1
1. create a .txt file having DBCS as body and file name
2. create a .gif/.jpeg file having DBCS as file name
3. Select File/Open Web location 
4. Specify those files as New Navigator tab
   For .txt file, displayed as "(Untitled)"
   For .gif/.jpeg file, displayed in return code
   More detail, please refer attachment
Keywords: intl
Summary: DBCS tab title can't be displayed properly except for html file type. → DBCS tab title can't be displayed properly except for html file type.
Attached image detail
We display tab title as html source <title> for html file.  And for non-ascii
file name, we currently escape them in many case, e.g. file | open and save
file...etc.
->me
Assignee: jaggernaut → caillon
This bug has been fixed by my check-in to bug 152127.  Resolving fixed.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
hi Kasumi,

could you pls verify whether this is fixed for you using recent TRUNK builds?
thanks!
2002-07-29-08-trunk on Win XP Pro. JA SP1 beta1
I can still reproduce, didn't fix yet.
reopening per comment 6. kasumi, i'm handing over qa contact to you since you
have the appropriate configuration for testing this (i don't)...
Status: RESOLVED → REOPENED
QA Contact: sairuh → kasumi
Resolution: FIXED → ---
Comment on attachment 91846 [details]
detail

This attachment looks correct to me, as far as the tabbed browser is concerned;
if you look at the browser title then this should match the tab title which it
does.
If the browser title for an image is incorrect then that's an imagelib problem,
not a tabbed browser problem.
*** Bug 167771 has been marked as a duplicate of this bug. ***
Comment on attachment 98741 [details] [diff] [review]
patch from bug 167771, added unescape the title, with a charset if non ASCII.

copy sr=bzbarsky
from bug 167771
Attachment #98741 - Flags: superreview+
caillon, could you review the patch? The URI needs unescaping (with a charset
for non ASCII).
Assignee: caillon → nhotta
Status: REOPENED → NEW
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.2beta
Comment on attachment 98741 [details] [diff] [review]
patch from bug 167771, added unescape the title, with a charset if non ASCII.

caillon, doesn't this patch conflict with your patch to bug 164006?
Comment on attachment 98741 [details] [diff] [review]
patch from bug 167771, added unescape the title, with a charset if non ASCII.

>Index: bindings/tabbrowser.xml
>===================================================================

Yes, as Neil points out you will have merge conflicts with the patch I checked
in for bug 164006.

>               title = titleViaGetter
>             else if (browser.currentURI.spec && browser.currentURI.spec != "about:blank") {
>               title = browser.currentURI.spec;
>+              try {
>+                title = unescape(title);
>+              } catch (e) {

Erm, how will this code within this catch ever be executed?  Unless I am
reading ECMA-262 Level 3 incorrectly, unescape should never throw any
exceptions or errors... Do we implement something different?

Deferring review of this patch pending an answer for the above question.
>Unless I am
>reading ECMA-262 Level 3 incorrectly, unescape should never throw any
>exceptions or errors...

If that is the case then I can change it to always unescape with the document
charset instead.

unescape will certainly throw an exception in our implementation if the result,
after unescaping is not valid "something" (dunno whether "something" is ASCII or
UTF8 or ISO-8859-1 or what; I did not experiment).

I suspect that the ECMA definition is just plain not compatible with existing
real-world practice, based on a brief skim.
Phil, any insight on this? (see comment 15, comment 17).  If we do go against
the spec, will we keep it that way and try and change the spec?  Or is that a
bug in our impl (that won't get wontfixed due to compat)?  I suppose the former,
and that's fine, I just need to know so I can feel confident about reviewing
this patch :-)
From what I know, the escape() function of JS Engine is 
superseded by the escape() function of the DOM, and I don't
know if the DOM function ever generates exceptions -

This issue came up, for example, in bug 58813,
"escaped string throws JavaScript error"

That makes it sound as if Boris is right -
Oops - my comment treated escape(), when you were asking about 
unescape(). But I think the same principle applies: the function
that exists in the JavaScript Engine (subject to ECMA-262 Ed. 3)
is superseded in the browser by the DOM version of the function.

I'm not as familiar with the DOM escape() and unescape() functions
as a developer would be, but I suspect both can generate exceptions -
Comment on attachment 98741 [details] [diff] [review]
patch from bug 167771, added unescape the title, with a charset if non ASCII.

>Index: bindings/tabbrowser.xml
>===================================================================
>RCS file: /cvsroot/mozilla/xpfe/global/resources/content/bindings/tabbrowser.xml,v
>retrieving revision 1.60
>diff -u -r1.60 tabbrowser.xml
>--- bindings/tabbrowser.xml	10 Sep 2002 08:11:53 -0000	1.60
>+++ bindings/tabbrowser.xml	10 Sep 2002 22:35:49 -0000
>@@ -553,6 +553,19 @@
>               title = titleViaGetter
>             else if (browser.currentURI.spec && browser.currentURI.spec != "about:blank") {
>               title = browser.currentURI.spec;
>+              try {
>+                title = unescape(title);
>+              } catch (e) {

The code right above this uses ex as the exception variable (which you can't
see because you are diffing against an older revision.	Please update and use
the same name.

>+                try {
>+                  // the URI might be non ASCII

"non-ASCII"

>+                  // try unescape again with a characterSet

"Try to unescape again, with a character set this time"

>+                  var textToSubURI = Components.classes["@mozilla.org/intl/texttosuburi;1"]

make textToSubURI a const?  It will never change...

>+                                               .getService(Components.interfaces.nsITextToSubURI);
>+                  title = textToSubURI.unEscapeURIForUI(browser.contentDocument.characterSet, 

I would rather you use the .originCharset of one of the nsIURIs we have. 
browser.currentURI or better, its fixed up equivalent will work nicely here. 
The reason for this is that browser.contentDocument.characterSet is _not_
guaranteed to be the actual characterSet of the document.  A script can easily
mess with that and make it be something that you aren't expecting.

>+                                                        title);


Additionally, I have an issue with the whole thing about unescaping.  The point
of having URLs in the tab title is so you can see the URL...  If a URL of a
site is 'http://foo/bar%20baz/' then I want to see 'http://foo/bar%20baz' and
not 'http://foo/bar baz/'  I don't mind if you want to make it so that DBCS
displays, but I'm not sure I like the whole idea of unconditionally
unescaping... I'd like to hear what jag says about this.  Nonetheless, this
patch needs some work as it stands, and I'd like to see the next patch diffed
against the current revision.  (Sorry for the conflicts)
Attachment #98741 - Flags: needs-work+
By the way, thanks Phil for clarifying that up.  I didn't realize there was a
DOM version of the function as well.  It all makes sense now and I bow to your
knowledge of the specs.  :-)
Let's see... We want a URI to display in its URI form (escaped, with all its
warts), and we also want filenames to be "human readable", which is achieved by
unescaping the filename part. We do this unconditionally for "File -> Save Page
As..." because we're always dealing with a file there. In the case of tab
titles, sometimes it's a file, sometimes it's not. Is the happy golden road here
that we look at the protocol, and if it's file, we unescape, if not, we present
it as-is?

This has the drawback that a file can have two display forms, the raw URI text
when loaded from a server, the unescaped one when loading from disk. Is this
acceptable?

Perhaps we should display 'http://foo/bar%20baz/' as 'http://foo/bar baz/'?
After all, it's only a title, a reminder of what's to be found in the tab, not
the real URI. It would certainly make 'http://foopy.com/Hello World.html' more
readable.
Jag, whichever you decide, is fine by me.  I wanted mainly for you to be aware
of this and to have a say in it.  My argument is that by displaying
http://foo/bar in the first place (escaped or not) it appears to the user as a
URL.  Would a user get confused and attempt to type the URL as it appears on the
tab title?  Unlikely, but it's possible.  Is it a big deal?  I don't know.

The current behavior has the potential of confusing users who have several tabs
open.  Consider 'http://foo/page%202.html' (page 2.html) and
'http://foo/page202.html' (page202.html).  With multiple tabs and cropping the
current behavior may have the user see 'http://foo...202.html' for both pages.

I _do_ think that if we decide to unescape, we should try and make the title not
look like a URL if possible.

Something like 'http://foo/ - page 2.html' vs. 'http://foo/ - page202.html'

The inner stuff will get cropped off anyway on the tab title, and it may make
the tooltip more useful.  Get the important information at the quick glance
you're afforded with tooltips (they dissapear after a certain timeout), and if
you want the real URI, click the tab and get it from the URL bar....
>I would rather you use the .originCharset of one of the nsIURIs we have. 

I am rewriting the patch, but using "browser.currentURI.originCharset" does not
work because it is UTF-8 instead of a charset of the document. Can I use
"browser.contentDocument.characterSet"?
> Can I use "browser.contentDocument.characterSet"?

No.  If you have to, for now use:

  Components.lookupMethod(browser.contentDocument, 'charset')
            .call(browser.contentDocument);

A site can easily change the characterSet with a single JS statement.  I'm
writing wrappers for the above statement, but in the meantime, it will do.
>A site can easily change the characterSet with a single JS statement.  
I am not sure if that happens frequently. When do you think that could happen?
>I am not sure if that happens frequently.

I can tell you for a fact that it does not happen frequently.

>When do you think that could happen?

Whenever an author wants to do it.  But that's the point - that it CAN happen. 
IF the user changes the characterSet to something that you are not expecting
(e.g. an array, a function, or other weird things), the behavior of your code is
then pretty much undefined.  Let us not even go there.
Right, frequency doesn't matter, it's the impact of when it happens that counts,
and if there's a simple way to work around is, as shown by caillon, then that's
the way to go. See bug 159667 for details.
I was curious why/how that is used. Changing the document charset often makes
the page not readable.
One way is to create an HTML page with <img name="characterSet">.
Attached patch Unescape non ASCII. (obsolete) — Splinter Review
Attachment #98741 - Attachment is obsolete: true
caillon@returnzero.com, please review the patch.
bzbarsky, could you review the last patch?
Comment on attachment 102033 [details] [diff] [review]
Unescape non ASCII.

Actually, thinking back on this, we have several places IIRC that want to get
at the characterSet of the document.  Jag, what do you think about adding a
<property name="characterSet"> or something like we did for contentTitle
(document.title)?
Shouldn't that code go inside the "if (browser.currentURI.spec) {" block?

I can't remember the lookupMethod syntax offhand, but it seems odd to me that
one would use lookupMethod to get a property.... Is that correct?
There are two issues raised during the review process.
* Getting the charset - the obvious "browser.contentDocument.characterSet" may not
  be correct (comment #26).
  This can be filed as a separate bug with a real example if possible.
* Generic improvment idea of getting characterSet of the document (comment #36).
  This also does not have to be a part of this bug.

I will attach a patch which uses "browser.contentDocument.characterSet".
Attachment #102033 - Attachment is obsolete: true
caillon/bzbarsky, please r/sr, thanks.
Comment on attachment 102179 [details] [diff] [review]
updated patch using "browser.contentDocument.characterSet"

Absolutely positively NO.  I already explained why this should NOT happen.  The
syntax you used is correct, bzbarsky was a little confused (with good reason)
because of the way the function was named.  Please do not post patches just
based on questions.  Questions are made to be answered, not patched. Anyway,
Components.lookupMethod is correct because it looks up the getter, which is a
method.
Attachment #102179 - Flags: needs-work+
>I already explained why this should NOT happen.
I am not sure which comment. Could you give me the comment # of your explanation?
Comment 26.  Basically, document.characterSet could be set by the site, and it
need not even be set to a string.  It could be an object that would do icky evil
things.

sr=me on attachment 102033 [details] [diff] [review] if the code is moved to the place where it is in
attachment 102179 [details] [diff] [review]
Attachment #102179 - Attachment is obsolete: true
This is the patch I think we should use.  If we have the check anywhere, we
should do this for URIs which we actually care about the URI (eg any except
about:blank) since those go a different route (string bundles).

r=caillon for this patch, since it's really nhottas.
Comment on attachment 102235 [details] [diff] [review]
Same as Naoki's patch, but with the check moved and a comment addition

r=caillon, sr=bzbarsky
according to comment #43 and #45
Attachment #102235 - Flags: superreview+
Attachment #102235 - Flags: review+
yeah, sr=me....
checked in to the trunk
Status: ASSIGNED → RESOLVED
Closed: 22 years ago22 years ago
Resolution: --- → FIXED
tested 0n 2002-10-09-11-trunk
Win XP Pro. JA
Still occurred.
Did I use wrong build ?
tested on 2002-10-09-11-trunk
Win XP Pro. JA
Confirmed fix for .txt file but still occur for graphical files such as .gif,
.jpg and .bmp.  
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
There is no good way to get charsets for those types of files.
Status: REOPENED → ASSIGNED
Target Milestone: mozilla1.2beta → ---
Yeah; such files will never be served with a useful charset. Do we have a way of
getting the document URI?  We could try falling back on its origin charset if
the documentCharacterSet is empty....
Yes, or we can put the fallback mechnism to the unescape funtion instead of the
caller to figure out how to fallback.
There are two cases, if local file we can fallback to the OS charset, if ftp
then fallback to the browser's default.
But for the local file case, this may be resovled once the file system switch to
use UTF-8 (bug 166735, bug 162361) since the current code already handles UTF-8 URL.
> Confirmed fix for .txt file but still occur for graphical files such as .gif,
> .jpg and .bmp.

I thought that for graphical files the title was generated by imglib?
Bug 173616 said Technically mozilla renders the image by constructing a small
html in #1.
If it's true, graphical file name in tab can be handled same as html.
>I thought that for graphical files the title was generated by imglib?

well, if you consider content/html/document/src/nsImageDocument.cpp as a part of
imagelib then yes. anyway, that file creates the synthetic html document for
images and therefore also the title.
Blocks: 173151
Product: Core → SeaMonkey
QA Contact: kasumi → tabbed-browser
Something has been fixed here a long time ago. Please file followup bugs for any remaining issues.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: