Closed Bug 199237 Opened 17 years ago Closed 17 years ago

non-textual docs(image/media) opened in a new win/tab have url-escaped names in title/tab

Categories

(Core Graveyard :: File Handling, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jshin1987, Assigned: jshin1987)

References

()

Details

(Keywords: intl)

Attachments

(3 files, 4 obsolete files)

This is the latest in a series of related bugs (bug 162765,
bug 198598, bug 158006, and bug 163998). In bug 163998, we took care
of  the window title bar and tab label when an image is
opened in the current window or tab. bug 198598 does the
same for media documents (plugin documents, non-image,
non-textual documents) while bug 162765 is about suggested
filename  when C-D header is present in http header. Bug
158006 is also abuot suggested filename. Bug 158006 takes
a different codepath from bug 162765 so that it's filed
as a separate bug but depends on bug 162765. 

Now this bug is about the window title and the tab label
of a non-textual (image/media) document when it's opened in
a new window or a new tab. 

A simple test case:

Go to http://jshin.net/moztest/download.html

Right-lick on the first image/jpeg link (followed by 
the comment 'EUC-KR filename') and select  either 'open in a new window' 
or 'open in a new tab'. The window title bar and the tab label
comes up with the url-escaped file name. When the link is just clicked,
it comes up with the non-escaped filename thanks to the patch
for bug 163998 (althought without bug 198598 being fixed, 
the suggested filename for saving is not what's expected
when it's in its own browser pane). 

At first, I thought this is an easy fix and looked into
xpfe/global/resources/content/bindings/tabbrowser.xml
thinking that I would add a couple of lines invoking
|textToSubURI|. It turned out that it already did the right
thing. Then, why isn't it working? That's because stand-alone
non-textual documents (imaeg/plugin) are always given
ISO-8859-1 as their 'charset'(nsDocument ctro
sets it to ISO-8859-1 by default). This means,
the title bar and the tab label would come up
correctly when image/plugin title/linknames are in
ISO-8859-1. In all other cases, they're interpreted
as ISO-8859-1 and come up mangled.

In my patch to bug 198598, I reset it to the charset of the referring
document(originCharset) and it works well for the case
when they're opened in the current window because originCharset is that 
of the referring document. However, this does NOT work when they're opened
in a new window/tab because in that case originCharset is set to UTF-8. 

Therefore, the fix is to *set* originCharset to that of the referring 
document for non-textual documents (when available) instead of leaving 
them unset or UTF-8. This would have  a performance implication (small?)
because Mozilla has to do more work than before. It'd be
nice to know if there's a better way to handle this.
 
BTW, fixing this would also solve the third case I mentioned
in bug 198598 comment #4 (right-clicking on an image in an html doc
selecting 'view image' and trying to save it in a stand-alone
image doc pane). 

Another btw, I'm filing this under 'file handling', but I'll
move it to i18n if people think that's better.
I tried to solve this bug by passing 'originCharset' from JS (
http://lxr.mozilla.org/seamonkey/source/xpfe/communicator/resouces/content/contentAreaUtils.js)
to C++ code, but it turned out that |openNewWindowWith| already does by
charsetArg.(|openNewTabWith| does not) However, charset passed in charsetArg
gets 'lost' (it's acted upon in |OpenWindowJS| but with no effect) in the
process of creating a new window (chrome: uri) and putting a media(image,plugin)
doc in the window. nsDocument ctor init'd mChararacterSet to iso-8859-1 and
further down the road, there seems to be no way to retrieve the 'charset' of a
document from which the request to open a new window/tab and put up
an image/media doc in it originated.  

When a new window/tab is opened up to put up an image/media doc,
two nsIDocument's are created, one for 'chrome://....xul' (for
a new window/tab) and the other for a synthetic html doc
with the image/media doc in it. In the process, baseURI for 
the URL to the image/media doc becomes 'chrome://....xul'
instead of the URL of the originating document and originCharset
is set to ISO-8859-1. Why not UTF-8 but ISO-8859-1? Because
a synthetic html document (container for image/media doc) has
the default charset of ISO-8859-1 and it takes the precedence
over originCharset of baseURI (chrome://....) in NS_NewURI(imageURL) 

It'd be possible if nsIURI had an  attribute
|baseURI| in addition to |originCharset| so that we could
move up the URI lineage tree (past 'chrome://...xul')
to get hold of the URI of the originating document and its associated charset.
I found that |nsIURI| had been frozen. Does it mean that a new attribute
cannot be added even though it doesn't affect the way it's used in anyway?
 
Attached patch a patch (obsolete) — Splinter Review
I toyed with two alternatives. One of them works for opening in a new tab and
opening in the same tab/window) and the other works for a new window and the
same tab/window. Because neither of them works for both cases, I came up with
another solution. Just like the current document charset is inherited by a
document opened in a new window (via JS object property. bug 27646 and bug
45187), I made a document to be opened in a new tab (somehow, textual documents
appear to do that without my patch) inherit the current doc. charset. Then by
examining the prev. doc charset (in case of opening in the same tab/window) and
the default doc. charset(when opening in a new tab/window), the media/image
doc. charset is set to that of the referrer.  

Just in case, I haven't yet removed a method using 'originCharset' in
nsMediaDocument::UpdateTitleAndCharset, which I think is redundant.
I added this to nsIWebNavigation instead of replacing loadURI() just to see how
it works without touching every file loadURI is invoked (there are not that
many). Eventually, loadURIwithHintCharset has to replace loadURI because they're
almost identical except for a new param. hintCharset.

+  void loadURIwithHintCharset(in wstring uri,
+                              in unsigned long loadFlags,
+                              in nsIURI referrer,
+                              in string hintCharset,
+                              in nsIInputStream postData,
+                              in nsIInputStream headers);

BTW, I also have a patch (not uploaded because it's not necessary to fix this
bug although it was used in one of two alternatives that only worked for opening
in a new tab) that modifies nsIURIFixup::createFixupURI so that it accepts
originCharset as a param to set originCharset of nsIURI. 
ccing adamlock, since this touches nsIWebNavigation.
Comment on attachment 126696 [details] [diff] [review]
a patch

+    if (aHintCharset && *aHintCharset) {
+	 if (NS_SUCCEEDED(SetCharset(aHintCharset))) 
+#ifdef DEBUG_jungshik
+	     fprintf(stderr, "charset set to %s\n", aHintCharset);
+#endif
+    }

why not put the #ifdef around the outer if? That will also make this code
compile if DEBUG_jungshik is not defined.

+//	     document.characterSet = aCharset;
please don't add commented out lines
Attached patch a cleaned-up patch (obsolete) — Splinter Review
I tried to do without introducing a new method loadURIwithCharset to
nsIWebNavigator, but setting charset in browser.xml didn't work (most
properties of 'browser' in browser.xml
(http://lxr.mozilla.org/seamonkey/source/xpfe/global/resources/content/bindings/browser.xml)

turn out to be read-only. 

100+ references to loadURI are found and I thought it'd be better just to go
with a new interface instead of fixing all those referenecs to pass 'null' for
charset parameter. We may do tree-wide sweep later if an extra function-call
overhead is too expensive in so many places.
Attachment #126696 - Attachment is obsolete: true
Comment on attachment 126982 [details] [diff] [review]
a cleaned-up patch

As this is a sequel to bug 198598, I'm asking Christian and Boris for r/sr. 

As for nsWebNav. change, I also like Adam to review.

Thanks
Attachment #126982 - Flags: superreview?(bzbarsky)
Attachment #126982 - Flags: review?(cbiesinger)
Comment on attachment 126982 [details] [diff] [review]
a cleaned-up patch

in docshell/base/nsIWebNavigation.idl:
maybe you should document the |headers| parameter as well

(for both loadURI and loadURIwithCharset)

+  docShell->GetContentViewer(getter_AddRefs(cv));
are you sure that docshell is non-null here?

+  if ((charset.IsEmpty() || charset.Equals("UTF-8")) && muCV) {

charset will always be empty here. you never assigned anything to it.

marking review- therefore. (and clearing the sr request)
Attachment #126982 - Flags: superreview?(bzbarsky)
Attachment #126982 - Flags: review?(cbiesinger)
Attachment #126982 - Flags: review-
Frankly, until Adam OKs whatever changes you are making to nsIWebNavigation,
don't bother asking me for sr....

Also, you can probably do what you want without any interface changes by going
through nsIMarkupDocumentViewer (which I think you should be able to
getInterface() after QIing the nsIWebNavigation to nsIInterfaceRequestor...) 
Not sure that's a better way of doing things, though.
Comment on attachment 126982 [details] [diff] [review]
a cleaned-up patch

Is it necessary to add a new LoadURIWithCharset method or could it be replaced
by asking the nsIDocShell for its nsIDocumentCharsetInfo and setting the forced
charset property before calling the normal loadURI?

I would much rather that if possible than to add this new method.
I'll try what bz suggested in browser.xml. It's also my desire to avoid adding a
new method if there's only a single consumer [1] (that's why I tried a few
different ways in browser.xml - not uploaded - before resorting back to adding a
method).  As for Adam's suggestion, I guess it's along the same line as bz's
except that it has to be done in bz's way (see comment #6) in browser.xml


re comment #8
> +  docShell->GetContentViewer(getter_AddRefs(cv));
> are you sure that docshell is non-null here?

  Yeah, I'm unless we change the caller later. That part of the code is copied
from nsHTMLDocument.cpp. Anyway, I'll make it bullet-proof by checking docShell.

> +  if ((charset.IsEmpty() || charset.Equals("UTF-8")) && muCV) {
> charset will always be empty here. you never assigned anything to it.

  Thanks for catching it. I used to assign something to charset in my various
experiments. I forgot to clean it up after removing those lines. 

[1] There may be some more potential consumers, but I haven't searched for them.
Assignee: law → jshin
Adam's suggestion saved me a day. No matter what, I couldn't touch muDV in
browser.xml (getting rid of 'readonly'	attrib. might work, but I have little
idea of its potential side-effect)  but changing documentCharsetInfo worked.
With that, I don't have to change nsIWebNav. at all. 

I also addressed Christian's concerns.
Attachment #126982 - Attachment is obsolete: true
Comment on attachment 127064 [details] [diff] [review]
a new patch with no change in nsIWebNav.

asking r/sr with thanks for Adam. 

I've just removed four commented-out lines in contentAreaUtils.js as well one
empty line in nsMediaDocument.cpp.
Attachment #127064 - Flags: superreview?(bzbarsky)
Attachment #127064 - Flags: review?(cbiesinger)
I'm not familiar with xul enough to be sure which is better. Perhaps being a
property with 'onget', atomService would be init'd just once on demand and stay
there (like global variable?) so this one is better. Please, r/sr whichever you
like.
Comment on attachment 127064 [details] [diff] [review]
a new patch with no change in nsIWebNav.

Here and in attachment 127065 [details] [diff] [review], we have to use parentCharset instead of
forcedCharset.	Otherwise, setting the latter in dcInfo would _force_  a text/*
documents opened in a new tab to be in that charset. 


>Index: content/html/document/src/nsMediaDocument.cpp

>+    docShell->GetDocumentCharsetInfo(getter_AddRefs(dcInfo));
>+    if (dcInfo) {
>+      nsCOMPtr<nsIAtom> csAtom;
>+      dcInfo->GetForcedCharset(getter_AddRefs(csAtom));

	dcInfo->GetParentCharset(getter_AddRefs(csAtom));

>Index: xpfe/global/resources/content/bindings/browser.xml

>+              getService(Components.interfaces.nsIAtomService);
>+              if (atomService) 
>+                this.documentCharsetInfo.forcedCharset = 
>+                  atomService.getAtom(aCharset);

		  this.documentCharsetInfo.parentCharset = 
		    atomService.getAtom(aCharset);

When an EUC-JP document referred to in an EUC-KR document
(http://i18nl10n.com/dl.html)
 is opened in a new tab (contextual 'opne in a new tab' menu), the result is
that charset obtained from meta tag, charset detection, http channel and other
sources (EUC-JP) is ignored and EUC-KR is used, instead.
Comment on attachment 127065 [details] [diff] [review]
alternative with atomService made a property

Please use a bigger context setting than the default 3 lines, and use the -p
switch to cvs diff to make it annotate the changes with the name of the
function they are in -- this diff is very difficult to read.

>Index: content/html/document/src/nsMediaDocument.cpp
>+#include "nsIParser.h" // kCharsetFrom* macro definition

I can't say that I like this.  If those are used independently of the parser,
they should not be in nsIParser.

>+  NS_PRECONDITION(nsnull != aContainer, "No content viewer container");

Take out the "nsnull !=" part.

Are you sure this is assertion-worthy?	Your code seems to deal with a null
aContainer just fine...

>+  // We try to get charset from the 'genuine' (as opposed to an intervening
>+  // 'chrome') parent document  that

Weird spacing.

> because there's a fallback code in 

Why is the "a" there?

>+  // When this doc. is opened in the same window/tab  as the referring 
>+  // document is rendered, prevDocCharacterSet contains the charset of 
>+  // the referring  document while it's defaultCharacterSet of muCV that has 
>+  // the charset when it's opened in  a new window.

I can't figure out what this sentence is actually trying to say, especially the
last part.  In any case, lose the period after "doc", please.

>+  // In case of openining
>+  // in a new tab, we get charset from documentCharsetInfo. 

"the charset"

>+  // Note that we exclude UTF-8 as 'invalid' because it's likely to be 
>+  // the charset of a chrome that has nothing to do with the actual

"chrome document"?

>+  // content of which charset we're interested in.

That's not English, sorry...  "content whose charset we want to know", perhaps?


>+  // Even if it's indeed UTF-8, we

What is the antecedent of "it" here?  That's not clear....

>+  nsCOMPtr<nsIDocShell> docShell(do_QueryInterface(aContainer));
>+  if (docShell) {

Why not reverse the test and do an early return to avoid the deep nesting?

>+      if (cv) {
>+        nsCOMPtr<nsIMarkupDocumentViewer> muCV = do_QueryInterface(cv, &rv);
>+        if (NS_SUCCEEDED(rv)) {

doQueryInterface is null-safe and always returns null on error.  Unless you're
planning to return that rv, don't even write to it, and just null-test the
things that really need to be non-null (in this case, muCV).

>+          rv = muCV->GetPrevDocCharacterSet(charset);

Why bother storing rv if you don't plan to check it (except in debug code, and
even that check is semi-bogus, imo...)?

>+    // Now that the charset is set in StartDocumentDownload

There is no such function.


>+    // to charset of 

"the charset"?

>+    // the document  viewer

Weird spacing.

>+    // instead of bogus value (the value of

"a bogus value"

>+    // intl.charset.default read off in 
>+    // nsHTMLDocument::UseWeakDocTypeDefault)

That is only called in nsHTMLDocument::StartDocumentLoad; nsMediaDocument does
not call nsHTMLDocument::StartDocumentLoad.  Ergo, UseWeakDocTypeDefault is
never called by nsMediaDocument.

>+    // to the current charset. This  is essential in dealing with

Weird spacing.

>+    // a media document being opened in a new window or a new tab
>+    // in which case originCharset of URI is not reliable.

"the originCharser of the URI"

>+    if (mCharacterSetSource != kCharsetFromWeakDocTypeDefault) {  

As I said, in many cases mCharacterSetSource will be simply uninitialized
(since it's only initialized in nsHTMLDocument::StartDocumentLoad; I do not
understand why nsDocument does not initialize the member, but it does not).

>-            if (!aURI)
>+            if (!aURI) 

Don't add whitespace to ends of lines.

>               aURI = "about:blank";
>-

Don't remove that blank line either.

>+            if (aCharset && this.atomService)
>+	      this.documentCharsetInfo.forcedCharset = this.atomService.getAtom(aCharset);

This is mis-indented.  Tabs?

>+      <property name="atomService"
>+                onget="return Components.classes['@mozilla.org/atom-service;1'].getService(Components.interfaces.nsIAtomService);"
>                 readonly="true"/>

This will be evaluated on every access to the property, if I read the XBL code
correctly.  In particular, it will be evaluated _twice_ in your expression, and
you could get a non-null return inside the if condition, but a null one inside
the if body.

Did you mean to use a <field>, perhaps?  Though someone who is actually
familiar with XBL should comment on this part...
Attachment #127065 - Flags: superreview-
Attachment #127064 - Flags: superreview?(bzbarsky)
Attachment #127064 - Flags: review?(cbiesinger)
Thanks for your review. I got rid of some redundant lines, replaced 'onget -
properties' with 'field' in browser.xml (it seems to be the 'right' thing to do
as you suggested, but I'll ask) and clarified comments.

>>+  // content of which charset we're interested in.



> I do not understand why nsDocument does not initialize the member, > but it
does no
t).

 It's claimed that it does as long as operator new() is used during nsDocument
creation and it seems that nsDocument is always made via NS_New....() (that I
presume invokes operator new(). I haven't checked). Assuming that's the case, I
now check whether |mDocumentCharsetSource| is initialized or not
(kCharsetSourceUninitialized / '0') 

> That's not English, sorry...	"content whose charset we want to > know",
perhaps?

 Well, if your comment is about 'content of which [the] charset',  grammarians
wouldn't agree with you. Some (few) of them might even frown upon  'whose' used
as a possesive relative pronoun for a thing. Nonetheless, I agree that I should
have used 'whose charset' or 'the charset of which' instead of that obscure
construct.

As for 'kCharset..Source..' definitions in nsIParser.idl, it's been discussed
somewhere else and some alternatives may have been proposed (nsIParser was not
such a good choice in the first place, IMHO).
Attachment #127064 - Attachment is obsolete: true
Attachment #127065 - Attachment is obsolete: true
Comment on attachment 127349 [details] [diff] [review]
a new patch addressing bz's concerns

>Index: xpfe/global/resources/content/bindings/browser.xml
>+            if (aCharset && this.mAtomService)
>+              this.documentCharsetInfo.parentCharset = this.mAtomService.getAtom(aCharset);

I'll replace the above lines with the following: 

	      if (aCharset) {
		try {
		  this.documentCharsetInfo.parentCharset =
this.mAtomService.getAtom(aCharset);
		}
		catch(ex) {
		}
	      }

tabbrowser.xml (
http://lxr.mozilla.org/seamonkey/source/xpfe/global/resources/content/bindings/
tabbrowser.xml#115)
has an example of using <field> the way I used it for |mAtomService|. It was
added by caillon's patch (sr'd by bz :-)) for bug 164006.
Attachment #127349 - Flags: superreview?(bzbarsky)
Attachment #127349 - Flags: review?(cbiesinger)
Comment on attachment 127349 [details] [diff] [review]
a new patch addressing bz's concerns

sorry... I don't really think I can review this, I don't know this code...
Attachment #127349 - Flags: review?(cbiesinger) → review?
Comment on attachment 127349 [details] [diff] [review]
a new patch addressing bz's concerns

This looks good except for one thing (which I thought I had mentioned last
time, but it looks like I did not).  You need to leave the
RetrieveRelevantHeaders call in, since it gets information other than the
charset info...  Probably just doing it after your charset-setting code is
safe.
Attachment #127349 - Flags: superreview?(bzbarsky) → superreview+
Comment on attachment 127349 [details] [diff] [review]
a new patch addressing bz's concerns

asking jag for r with two changes mentioned in comment #18 and comment #20
applied (in my tree) as he frequenty worked  on and r/sr'd patches for
browser.xml.
Attachment #127349 - Flags: review? → review?(jaggernaut)
Attachment #127349 - Flags: review?(jaggernaut) → review?(bryner)
Comment on attachment 127349 [details] [diff] [review]
a new patch addressing bz's concerns

>Index: content/html/document/src/nsMediaDocument.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/content/html/document/src/nsMediaDocument.cpp,v
>retrieving revision 1.6
>diff -U8 -p -r1.6 nsMediaDocument.cpp
>--- content/html/document/src/nsMediaDocument.cpp	17 Jun 2003 16:39:54 -0000	1.6
>+++ content/html/document/src/nsMediaDocument.cpp	9 Jul 2003 12:13:03 -0000
>+  // When this document is opened in the window/tab of the referring 
>+  // document (by a simple link-clicking), |prevDocCharacterSet| contains 
>+  // the charset of the referring document. On the other hand, if the
>+  // document is opened in a new window, it's |defaultCharacterSet| of |muCV| 

Grammatical nit:  "its"  not "it's"
>+    
>+  nsCOMPtr<nsIDocShell> docShell(do_QueryInterface(aContainer));
>+  if (!docShell) 
>+    return NS_OK;

Should |docShell| ever be null?  If not, please at least make this an
NS_ENSURE_TRUE so that there is some warning output in debug builds.

>+  if (charset.IsEmpty() || charset.Equals("UTF-8")) {
>+    nsCOMPtr<nsIContentViewer> cv;
>+    docShell->GetContentViewer(getter_AddRefs(cv));
>+    if (cv) {

Also here, it seems like we should complain if |cv| is null.

Please also include the corresponding firebird (mozilla/toolkit) changes for
your xpfe/ changes.  It should apply with minimal changes.

r=bryner with those addressed.
Attachment #127349 - Flags: review?(bryner) → review+
attachment 127349 [details] [diff] [review] was checked in to the trunk with bryner's concern addressed.
(btw, I used "it's" for "it is" (not the possessive form of it) :-)) I moved up
'Retrie..Header' because we still return NS_OK when docShell or cv is null. 

I tried to check in this attachment but was not allowed. bryner, can you check
it on my behalf? In the meantime, I'll ask for the permission because there are
a few other fixes I landed recently in xpfe/ that should have been committed to
toolkit/ and browser/ for firebird as well.
Comment on attachment 128562 [details] [diff] [review]
patch for firebird

I checked this in.
Bryan, thanks for landing my patch for firebird. Marking as fixed
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
I just found I had missed this in attachment 128562 [details] [diff] [review]. The corresponding change
to xpfe/ was included in attachment 127349 [details] [diff] [review] (for xpfe/), but somehow I missed it
in the patch for firebird.

bryner, if you happen to read this,  can you give me a quick review (including
a for fb 0.8 branch) so that I can land it.

Thanks.

BTW, I don't know why loadURIWithFlags is called with two 'null's in the
current code. There's no loadURIWithFlags that accepts 5 parameters. I added
one parameter to make it 4 in this bug, but that call site (invoking it with 5
parameters) has been there since day 1 of tab browser. Anyway,
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.