Closed Bug 162377 Opened 22 years ago Closed 22 years ago

Non-ASCII filename is not shown when saving and downloading from a ftp listing

Categories

(Core :: Internationalization, defect, P1)

x86
Windows XP
defect

Tracking

()

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: ji, Assigned: nhottanscp)

References

Details

(Keywords: intl, topembed, Whiteboard: [adt2 RTM] [ETA 08/21] [Land on 1.0 branch post MachV])

Attachments

(1 file, 5 obsolete files)

This bug is seperated from bug 155569. Fix for bug 155569 is reading the page
encoding. This works for most cases, but on a ftp listing page, to display
non-ascci filename correctly on the page, charset preference setting has to
match the sever encoding. But ftp listing page is encoded in UTF-8, this caused
the fix for bug 155569 not working for ftp listing page.
On the latest branch build, the filename is not shown on File Save dialog except
the file extension part, the user can still correct the filename on the file
save dialog.
But on some embedding applications, because there is no file save dialog as in
Netscape client, the file is downloaded directly from the browser to local
drive, so the user has no chance to correct the filename, non-ascii filename is
plainly shown as .exe or .doc in local drive after it's downloaded.

Steps to reproduce on Netscape client:
1. go to ftp://10.169.114.243
2. Go to Edit | Preference | Language, select Simplified Chinese (GB2312) as
Character Encoding, so the Chinese filenames can be viewed correctly on the
listing page.
2. Click on a doc file, Helper dialog comes up. Select "Save the File", then
File Save dialog comes up. You can see the filename is not displayed on the dialog.
Nominating.
Keywords: intl, nsbeta1, topembed
QA Contact: ruixu → kasumi
QA contact to myself for this bug.
QA Contact: kasumi → ji
The problem is that we have no way of knowing what the server's charset actually is.

See bug 145101 and bug 26767, but mainly bug 26767. Since almost no server
implements the RFC in that bug, that wouldn't help you, though. The basic
problem is that the ftp spec is ASCII only...
I think this bug is for a specific case when the server's charset and the
browser's default charset matches (so ftp listing works for non ASCII). 
Status: NEW → ASSIGNED
I talked to ftang about this. Here is the current plan.

In webshell, we get a document charset and set it as originCharset. For ftp
listing, howerver, the charset is UTF-8 and the filename is encoded in browser's
default (this assumes the list is already shown correctly). So, what we do is to
detect ftp listing case in webshell and set browser's default to originCharset
as a special case.
Attachment #95331 - Attachment is obsolete: true
Comment on attachment 95336 [details] [diff] [review]
Moved mPref check down to ftp case.

r=ftang
Attachment #95336 - Flags: review+
Comment on attachment 95336 [details] [diff] [review]
Moved mPref check down to ftp case.

This looks basically identical to the code at
http://lxr.mozilla.org/mozilla/source/netwerk/streamconv/converters/nsDirIndexP
arser.cpp#66

Why isn't that code triggering?

Oh. Oops

In
http://lxr.mozilla.org/mozilla/source/netwerk/streamconv/converters/nsIndexedTo
HTML.cpp#206, that code should be getting the encoding from
mParser::GetEncoding, instead of using utf8.

Does that change fix this bug? If so, then that would be the preferred way to
do this. I think I even had that fixed locally as part of my patches to
i18n-ify the html file directory listings which I never got to finish.

That fix is too early to pick up the hints from the protocol for the encoding,
but ftp can't set those anyway ATM.

I'll try to generate a patch this evening which you could test
Attached patch Try this (obsolete) — Splinter Review
This compiles, but I'm not set up to test it - can someone please do so?

If it does work, then this patch should be applied instead of the other one.

If it doesn't, then (to help me debug when I have to fix this later), can you
please tell me if the filename looks correct on the page, with and without this
patch? Thanks!

(As a side note, the reason that this code uses UTF-8 always is because it uses
nsITextToSubURI internally to do conversions. I think that I'm mishandling that
somewhere, but I'm not sure where...)
> please tell me if the filename looks correct on the page, with and without this
> patch? Thanks!
With the patch, the ftp list does not show the file names correctly
(ftp://10.169.114.243). The document charset is now changed to GB2312 (from the
browser's default).
Without the patch, the filename were shown correctly when the browser's default
matches with the server (e.g.  "GB2312" for ftp://10.169.114.243).

> (As a side note, the reason that this code uses UTF-8 always is because it uses
> nsITextToSubURI internally to do conversions. I think that I'm mishandling that
> somewhere, but I'm not sure where...)
> 
If you want to go with your current approach (i.e. the last patch), then how
about just apply unescape without converting to Unicode? Then the charset and
the actual content could match and the filename would be shown correctly.


>This compiles, but I'm not set up to test it - can someone please do so?
What does that mean? I think I can help you about this if that means you don't
have fonts (are you using MS Widnows?).

318 nsIndexedToHTML::FormatInputStream(nsIRequest* aRequest, nsISupports
*aContext, const nsAString &aBuffer) 
319 {
320     nsresult rv = NS_OK;
321     NS_ConvertUCS2toUTF8 buffer(aBuffer);
322     nsCOMPtr<nsIInputStream> inputData;
323     rv = NS_NewCStringInputStream(getter_AddRefs(inputData), buffer);

This part (line 321) generates HTML in UTF-8. This has to change also if we
label the page as parser's charset instead of UTF-8. We can hook up charset
converter here.
Attachment #95336 - Attachment is obsolete: true
Attachment #95391 - Attachment is obsolete: true
Bradley Baetz  , not sure your approach will work
the reason now the generated xml is in utf8 is because we are mixing data from
both the string bundle (which contains the localized message from the cleint)
and the data from the ftp protocol. The only way we can make both work is to
generate encoding in utf-8, we get the string bundle (which is utf8) and
generated it AS IS to the output, and get the data from the ftp, convert it from
the default charset to the utf-8 and generate it. IF you change the xml encoding
to something else, then the data from string bundle may not convert into it
correctly. For example, if the build is localized to japanese build and the
default charset is set to korean, then the string bundle is japanese in utf-8,
we then convert the ftp data from korean euc-kr to utf-8 and generate it. IF you
change the the xml encoding to euc-kr, then the utf-8 string bundle wiill show
up as garbage if you generate without convert, if you generate with convert, the
japanese text will be convert to ? mark sicne some of the text cannot be encoded
in euc-kr. Only the utf-8 or other unicode can convert everything. 
ok, I take my comment back. after talk to nhott, he said we can generate ncr
instead for the string bundle data. I agree with it. so he will work out a patch
which include that part. 
No, I'm running linux, and I don't know what ite meant to look like anyway...

In that case, use your patch, and I'll tidy it up when I have more time to make
the file stuff work (in a few months...)

My approach doesn't work, but I thought I was converting it to utf8. I've just
noticed that the texttosuburi thing uses ucs2, so maybe that was the problem.
Attachment #95452 - Attachment is obsolete: true
Comment on attachment 95622 [details] [diff] [review]
Added code to convert string bundle to NCR.

r=ftang
Attachment #95622 - Flags: review+
Blocks: 154896
cc to rick
Priority: -- → P1
Whiteboard: [adt2 RTM] [ETA 08/21] [Land on 1.0 branch post MachV]
Target Milestone: --- → mozilla1.0.1
Comment on attachment 95622 [details] [diff] [review]
Added code to convert string bundle to NCR.

- In nsIndexedToHTML.cpp:

+static void ConvertNonAsciiToNCR(const nsAString& in, nsAFlatString& out)
+{
+  nsAString::const_iterator start, end;

Any reason for |in| to not be a nsAFlatString as well? If you can make that
change, then change start and end to be nsAFlatString::const_iterator's too.

...
+  while (start != end) {
+    if (*start < 128) {
+      out.Append(*start++);
+      continue;
+    }
+    out.Append(NS_LITERAL_STRING("&#x"));
+    nsAutoString hex;
+    hex.AppendInt(*start++, 16);
+    out.Append(hex);
+    out.Append((PRUnichar)';');
+  }

Please loose the |continue| and move the code below the if statement into an
else statement.

- In nsIndexedToHTML::FormatInputStream():

+	   if (NS_SUCCEEDED(rv))
+	   rv =
mUnicodeEncoder->SetOutputErrorBehavior(nsIUnicodeEncoder::kOnError_Replace, 
+							nsnull,
(PRUnichar)'?');

Indent the rv = ... and please put braces around the if, even if it's just a
one-line statement.

sr=jst
Attachment #95622 - Flags: superreview+
Attached patch new patchSplinter Review
Attachment #95622 - Attachment is obsolete: true
> Any reason for |in| to not be a nsAFlatString as well? If you can make that
> change, then change start and end to be nsAFlatString::const_iterator's too.
I thought nsAString is more universal than nsAFlatString, and it should not make
any difference here, right?
> Please loose the |continue| and move the code below the if statement into an
> else statement.
Done.

> Indent the rv = ... and please put braces around the if, even if it's just a
> one-line statement.
Done.
Comment on attachment 95954 [details] [diff] [review]
new patch

carry over r/sr
Attachment #95954 - Flags: superreview+
Attachment #95954 - Flags: review+
fix checked into trunk.
Resolving as fixed per Comment #24 From Shanjian Li.

ji: pls verify this as fixed on the branch. thanks!
mark this bug as fixed since shanjian land into trunk
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
ji: can you pls verify this as fixed on the trunk? thanks!
I have some questions, with 08/20 trunk, I go to ftp://10.169.114.243
then set global default to GB2312 via Edit | Preference | Language.
On the ftp listing, clicking on .exe and .doc file brings up helper dialog,
Chinese filename can be shown correctly if i choose to save the file. But for
.txt and .html file, if I Shift click the file to save the file instead of
opening the file, the Chinese html/txt filename name is always shown as "File"
on Save as dialog. 
Shanjian/Frank, is it by design or a limitation?
If that's the way for html/txt file, I'll close this bug as fixed. Thanks.
Right click on the file and select Save Link Target As menu has the same
problem. It looks like we still have problems with html/text files which don't
need to go through helper.
Filed bug 163682 for the problem with html/text file download.
Verified this one as fixed.
Status: RESOLVED → VERIFIED
Did more testing, actually download via Save Link Target As menu has problems
with all different file types: .exe/.doc/.html/.txt/.jpg/.bmp/.gif
QA contact to Rui for the verification on the branch build. Thanks.
QA Contact: ji → ruixu
jaimejr, when should we land this into the m1.0 branch? how many approval we
need to got post machv?
adt1.0.1+ (on ADT's behalf) approval for checkin to the 1.0 branch, pending
Drivers' approval. pls check this in asap, the replace "Mozilla1.0.1+" with
"fixed1.0.1". [Using adt1.0.1 keywords as a proxy, since no edt1.0.1 keywords
were created for the 1.0.1 branch. This is needed on the 1.0 branch for a major
embedor]. thanks! 
Keywords: adt1.0.1+, approval
Naoki, please make sure to include patch for bug 164197 when you check in this
patch to branch. 
shanjian, do you mean that the patch of this bug has to merge with the patch of
bug 164197?
Kind of. In fact, bug 164197 fixed 3 problems, 2 caused by this patch, the one
caused by another one of the 4 ftp patch. I think you will checkin all  the 4
ftp patches, so be sure to apply patch from 164197 before you do so.
You need to use Substring instead of nsDependentCString to wrap a chat* which
doesn't end in NULL. Just grab the netwerek/ftp/srv part of that patch to fix
that, although taking the other part won't hurt; it fixes what could be a
significant leak if you do lots of ftp browsing.
Comment on attachment 95954 [details] [diff] [review]
new patch

a=chofmann for 1.0.2
Attachment #95954 - Flags: approval+
patch checked into 1.02 branch.
mark it as fixed1.0.2
Keywords: fixed1.0.2
verified as fixed with 2002-08-28-08-1.0 build.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: