If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Non-ASCII filename is not shown when saving and downloading from a ftp listing

VERIFIED FIXED in mozilla1.0.1

Status

()

Core
Internationalization
P1
major
VERIFIED FIXED
15 years ago
15 years ago

People

(Reporter: ji, Assigned: nhottanscp)

Tracking

({intl, topembed})

Trunk
mozilla1.0.1
x86
Windows XP
intl, topembed
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [adt2 RTM] [ETA 08/21] [Land on 1.0 branch post MachV])

Attachments

(1 attachment, 5 obsolete attachments)

(Reporter)

Description

15 years ago
This bug is seperated from bug 155569. Fix for bug 155569 is reading the page
encoding. This works for most cases, but on a ftp listing page, to display
non-ascci filename correctly on the page, charset preference setting has to
match the sever encoding. But ftp listing page is encoded in UTF-8, this caused
the fix for bug 155569 not working for ftp listing page.
On the latest branch build, the filename is not shown on File Save dialog except
the file extension part, the user can still correct the filename on the file
save dialog.
But on some embedding applications, because there is no file save dialog as in
Netscape client, the file is downloaded directly from the browser to local
drive, so the user has no chance to correct the filename, non-ascii filename is
plainly shown as .exe or .doc in local drive after it's downloaded.

Steps to reproduce on Netscape client:
1. go to ftp://10.169.114.243
2. Go to Edit | Preference | Language, select Simplified Chinese (GB2312) as
Character Encoding, so the Chinese filenames can be viewed correctly on the
listing page.
2. Click on a doc file, Helper dialog comes up. Select "Save the File", then
File Save dialog comes up. You can see the filename is not displayed on the dialog.
(Reporter)

Comment 1

15 years ago
Nominating.
Keywords: intl, nsbeta1, topembed

Updated

15 years ago
QA Contact: ruixu → kasumi
(Reporter)

Comment 2

15 years ago
QA contact to myself for this bug.
QA Contact: kasumi → ji
The problem is that we have no way of knowing what the server's charset actually is.

See bug 145101 and bug 26767, but mainly bug 26767. Since almost no server
implements the RFC in that bug, that wouldn't help you, though. The basic
problem is that the ftp spec is ASCII only...
(Assignee)

Comment 4

15 years ago
I think this bug is for a specific case when the server's charset and the
browser's default charset matches (so ftp listing works for non ASCII). 
Status: NEW → ASSIGNED
(Assignee)

Comment 5

15 years ago
I talked to ftang about this. Here is the current plan.

In webshell, we get a document charset and set it as originCharset. For ftp
listing, howerver, the charset is UTF-8 and the filename is encoded in browser's
default (this assumes the list is already shown correctly). So, what we do is to
detect ftp listing case in webshell and set browser's default to originCharset
as a special case.
(Assignee)

Comment 6

15 years ago
Created attachment 95331 [details] [diff] [review]
Adding a special case for ftp listing (use browser's default as originCharset)
(Assignee)

Comment 7

15 years ago
Created attachment 95336 [details] [diff] [review]
Moved mPref check down to ftp case.
Attachment #95331 - Attachment is obsolete: true

Comment 8

15 years ago
Comment on attachment 95336 [details] [diff] [review]
Moved mPref check down to ftp case.

r=ftang
Attachment #95336 - Flags: review+
Comment on attachment 95336 [details] [diff] [review]
Moved mPref check down to ftp case.

This looks basically identical to the code at
http://lxr.mozilla.org/mozilla/source/netwerk/streamconv/converters/nsDirIndexP
arser.cpp#66

Why isn't that code triggering?

Oh. Oops

In
http://lxr.mozilla.org/mozilla/source/netwerk/streamconv/converters/nsIndexedTo
HTML.cpp#206, that code should be getting the encoding from
mParser::GetEncoding, instead of using utf8.

Does that change fix this bug? If so, then that would be the preferred way to
do this. I think I even had that fixed locally as part of my patches to
i18n-ify the html file directory listings which I never got to finish.

That fix is too early to pick up the hints from the protocol for the encoding,
but ftp can't set those anyway ATM.

I'll try to generate a patch this evening which you could test
Created attachment 95391 [details] [diff] [review]
Try this

This compiles, but I'm not set up to test it - can someone please do so?

If it does work, then this patch should be applied instead of the other one.

If it doesn't, then (to help me debug when I have to fix this later), can you
please tell me if the filename looks correct on the page, with and without this
patch? Thanks!

(As a side note, the reason that this code uses UTF-8 always is because it uses
nsITextToSubURI internally to do conversions. I think that I'm mishandling that
somewhere, but I'm not sure where...)
(Assignee)

Comment 11

15 years ago
> please tell me if the filename looks correct on the page, with and without this
> patch? Thanks!
With the patch, the ftp list does not show the file names correctly
(ftp://10.169.114.243). The document charset is now changed to GB2312 (from the
browser's default).
Without the patch, the filename were shown correctly when the browser's default
matches with the server (e.g.  "GB2312" for ftp://10.169.114.243).

> (As a side note, the reason that this code uses UTF-8 always is because it uses
> nsITextToSubURI internally to do conversions. I think that I'm mishandling that
> somewhere, but I'm not sure where...)
> 
If you want to go with your current approach (i.e. the last patch), then how
about just apply unescape without converting to Unicode? Then the charset and
the actual content could match and the filename would be shown correctly.


>This compiles, but I'm not set up to test it - can someone please do so?
What does that mean? I think I can help you about this if that means you don't
have fonts (are you using MS Widnows?).

(Assignee)

Comment 12

15 years ago
318 nsIndexedToHTML::FormatInputStream(nsIRequest* aRequest, nsISupports
*aContext, const nsAString &aBuffer) 
319 {
320     nsresult rv = NS_OK;
321     NS_ConvertUCS2toUTF8 buffer(aBuffer);
322     nsCOMPtr<nsIInputStream> inputData;
323     rv = NS_NewCStringInputStream(getter_AddRefs(inputData), buffer);

This part (line 321) generates HTML in UTF-8. This has to change also if we
label the page as parser's charset instead of UTF-8. We can hook up charset
converter here.
(Assignee)

Comment 13

15 years ago
Created attachment 95452 [details] [diff] [review]
Hooked up uconv to convert generated HTML to a charset, combined the patch of bbaetz.
Attachment #95336 - Attachment is obsolete: true
Attachment #95391 - Attachment is obsolete: true

Comment 14

15 years ago
Bradley Baetz  , not sure your approach will work
the reason now the generated xml is in utf8 is because we are mixing data from
both the string bundle (which contains the localized message from the cleint)
and the data from the ftp protocol. The only way we can make both work is to
generate encoding in utf-8, we get the string bundle (which is utf8) and
generated it AS IS to the output, and get the data from the ftp, convert it from
the default charset to the utf-8 and generate it. IF you change the xml encoding
to something else, then the data from string bundle may not convert into it
correctly. For example, if the build is localized to japanese build and the
default charset is set to korean, then the string bundle is japanese in utf-8,
we then convert the ftp data from korean euc-kr to utf-8 and generate it. IF you
change the the xml encoding to euc-kr, then the utf-8 string bundle wiill show
up as garbage if you generate without convert, if you generate with convert, the
japanese text will be convert to ? mark sicne some of the text cannot be encoded
in euc-kr. Only the utf-8 or other unicode can convert everything. 

Comment 15

15 years ago
ok, I take my comment back. after talk to nhott, he said we can generate ncr
instead for the string bundle data. I agree with it. so he will work out a patch
which include that part. 
No, I'm running linux, and I don't know what ite meant to look like anyway...

In that case, use your patch, and I'll tidy it up when I have more time to make
the file stuff work (in a few months...)

My approach doesn't work, but I thought I was converting it to utf8. I've just
noticed that the texttosuburi thing uses ucs2, so maybe that was the problem.
(Assignee)

Comment 17

15 years ago
Created attachment 95622 [details] [diff] [review]
Added code to convert string bundle to NCR.
Attachment #95452 - Attachment is obsolete: true

Comment 18

15 years ago
Comment on attachment 95622 [details] [diff] [review]
Added code to convert string bundle to NCR.

r=ftang
Attachment #95622 - Flags: review+

Updated

15 years ago
Blocks: 154896

Comment 19

15 years ago
cc to rick

Updated

15 years ago
Keywords: nsbeta1 → mozilla1.0.1, nsbeta1+
Priority: -- → P1
Whiteboard: [adt2 RTM] [ETA 08/21] [Land on 1.0 branch post MachV]
Target Milestone: --- → mozilla1.0.1
Comment on attachment 95622 [details] [diff] [review]
Added code to convert string bundle to NCR.

- In nsIndexedToHTML.cpp:

+static void ConvertNonAsciiToNCR(const nsAString& in, nsAFlatString& out)
+{
+  nsAString::const_iterator start, end;

Any reason for |in| to not be a nsAFlatString as well? If you can make that
change, then change start and end to be nsAFlatString::const_iterator's too.

...
+  while (start != end) {
+    if (*start < 128) {
+      out.Append(*start++);
+      continue;
+    }
+    out.Append(NS_LITERAL_STRING("&#x"));
+    nsAutoString hex;
+    hex.AppendInt(*start++, 16);
+    out.Append(hex);
+    out.Append((PRUnichar)';');
+  }

Please loose the |continue| and move the code below the if statement into an
else statement.

- In nsIndexedToHTML::FormatInputStream():

+	   if (NS_SUCCEEDED(rv))
+	   rv =
mUnicodeEncoder->SetOutputErrorBehavior(nsIUnicodeEncoder::kOnError_Replace, 
+							nsnull,
(PRUnichar)'?');

Indent the rv = ... and please put braces around the if, even if it's just a
one-line statement.

sr=jst
Attachment #95622 - Flags: superreview+

Comment 21

15 years ago
Created attachment 95954 [details] [diff] [review]
new patch
Attachment #95622 - Attachment is obsolete: true

Comment 22

15 years ago
> Any reason for |in| to not be a nsAFlatString as well? If you can make that
> change, then change start and end to be nsAFlatString::const_iterator's too.
I thought nsAString is more universal than nsAFlatString, and it should not make
any difference here, right?
> Please loose the |continue| and move the code below the if statement into an
> else statement.
Done.

> Indent the rv = ... and please put braces around the if, even if it's just a
> one-line statement.
Done.

Comment 23

15 years ago
Comment on attachment 95954 [details] [diff] [review]
new patch

carry over r/sr
Attachment #95954 - Flags: superreview+
Attachment #95954 - Flags: review+

Comment 24

15 years ago
fix checked into trunk.

Comment 25

15 years ago
Resolving as fixed per Comment #24 From Shanjian Li.

ji: pls verify this as fixed on the branch. thanks!

Comment 26

15 years ago
mark this bug as fixed since shanjian land into trunk
Status: ASSIGNED → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → FIXED

Comment 27

15 years ago
ji: can you pls verify this as fixed on the trunk? thanks!
(Reporter)

Comment 28

15 years ago
I have some questions, with 08/20 trunk, I go to ftp://10.169.114.243
then set global default to GB2312 via Edit | Preference | Language.
On the ftp listing, clicking on .exe and .doc file brings up helper dialog,
Chinese filename can be shown correctly if i choose to save the file. But for
.txt and .html file, if I Shift click the file to save the file instead of
opening the file, the Chinese html/txt filename name is always shown as "File"
on Save as dialog. 
Shanjian/Frank, is it by design or a limitation?
If that's the way for html/txt file, I'll close this bug as fixed. Thanks.
(Reporter)

Comment 29

15 years ago
Right click on the file and select Save Link Target As menu has the same
problem. It looks like we still have problems with html/text files which don't
need to go through helper.
(Reporter)

Comment 30

15 years ago
Filed bug 163682 for the problem with html/text file download.
Verified this one as fixed.
Status: RESOLVED → VERIFIED
(Reporter)

Comment 31

15 years ago
Did more testing, actually download via Save Link Target As menu has problems
with all different file types: .exe/.doc/.html/.txt/.jpg/.bmp/.gif
(Reporter)

Comment 32

15 years ago
QA contact to Rui for the verification on the branch build. Thanks.
QA Contact: ji → ruixu

Comment 33

15 years ago
jaimejr, when should we land this into the m1.0 branch? how many approval we
need to got post machv?

Comment 34

15 years ago
adt1.0.1+ (on ADT's behalf) approval for checkin to the 1.0 branch, pending
Drivers' approval. pls check this in asap, the replace "Mozilla1.0.1+" with
"fixed1.0.1". [Using adt1.0.1 keywords as a proxy, since no edt1.0.1 keywords
were created for the 1.0.1 branch. This is needed on the 1.0 branch for a major
embedor]. thanks! 
Keywords: adt1.0.1+, approval

Comment 35

15 years ago
Naoki, please make sure to include patch for bug 164197 when you check in this
patch to branch. 
(Assignee)

Comment 36

15 years ago
shanjian, do you mean that the patch of this bug has to merge with the patch of
bug 164197?

Comment 37

15 years ago
Kind of. In fact, bug 164197 fixed 3 problems, 2 caused by this patch, the one
caused by another one of the 4 ftp patch. I think you will checkin all  the 4
ftp patches, so be sure to apply patch from 164197 before you do so.
You need to use Substring instead of nsDependentCString to wrap a chat* which
doesn't end in NULL. Just grab the netwerek/ftp/srv part of that patch to fix
that, although taking the other part won't hurt; it fixes what could be a
significant leak if you do lots of ftp browsing.

Comment 39

15 years ago
Comment on attachment 95954 [details] [diff] [review]
new patch

a=chofmann for 1.0.2
Attachment #95954 - Flags: approval+

Updated

15 years ago
Keywords: mozilla1.0.1 → mozilla1.0.1+

Comment 40

15 years ago
patch checked into 1.02 branch.
Keywords: mozilla1.0.1+ → mozilla1.0.1

Comment 41

15 years ago
mark it as fixed1.0.2
Keywords: fixed1.0.2
(Reporter)

Comment 42

15 years ago
verified as fixed with 2002-08-28-08-1.0 build.
Keywords: fixed1.0.2 → verified1.0.2
You need to log in before you can comment on or make changes to this bug.