Closed Bug 28787 Opened 25 years ago Closed 24 years ago

directory listing display Non ASCII filename as garbage

Categories

(Core Graveyard :: RDF, defect, P1)

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: ftang, Assigned: nhottanscp)

References

Details

(Whiteboard: [nsbeta2+] ETA 7/24)

Attachments

(3 files)

This was caught when I used the directory viewer to view a directory name on my
Japanese NT. The same thing happens for any non-ASCII name.

I used the technique in bug 28424 to catch this problem. It asserts at the line
    values[i] = value;
of nsHTTPIndexParser::ParseData.
cvsblame shows me this is waterson's code.

I believe this is one of the reasons users see garbage for non-ASCII
file/folder names in the directory viewer.

Here is the stack trace:
nsAutoString::operator=(const nsStr & {...}) line 822 + 21 bytes
nsHTTPIndexParser::ParseData(const char * 0x033faa1d) line 665
nsHTTPIndexParser::ProcessData() line 508 + 15 bytes
nsHTTPIndexParser::OnDataAvailable(nsHTTPIndexParser * const 0x033e1550,
nsIChannel * 0x033dad00, nsISupports * 0x00000000, nsIInputStream * 0x033e5f58,
unsigned int 0x00000000, unsigned int 0x00000238) line 454
nsDocumentOpenInfo::OnDataAvailable(nsDocumentOpenInfo * const 0x033dfbd0,
nsIChannel * 0x033dad00, nsISupports * 0x00000000, nsIInputStream * 0x033e5f58,
unsigned int 0x00000000, unsigned int 0x00000238) line 262 + 46 bytes
nsFileChannel::OnDataAvailable(nsFileChannel * const 0x033dad04, nsIChannel *
0x033e1950, nsISupports * 0x00000000, nsIInputStream * 0x033e5f58, unsigned int
0x00000000, unsigned int 0x00000238) line 468 + 49 bytes
nsOnDataAvailableEvent::HandleEvent(nsOnDataAvailableEvent * const 0x033e2c80)
line 370
nsStreamListenerEvent::HandlePLEvent(PLEvent * 0x033e2c30) line 93 + 12 bytes
PL_HandleEvent(PLEvent * 0x033e2c30) line 526 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x01074b50) line 487 + 9 bytes

Here values is an array of nsString and value is an nsCString.
Note that value may contain non-ASCII text. On US Windows it contains cp1252
(which is 88% the same as ISO-8859-1, with the 12% of differences in
[0x80-0x9F]); on a Japanese system the charset is totally different outside the
ASCII part. Using the = operator to assign this kind of nsCString to an nsString
causes bad data conversion.
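
To illustrate the cp1252 case, here is a minimal standalone sketch, not Mozilla
code. It assumes the bad assignment simply widens each byte to one 16-bit unit,
and compares that with a charset-aware decode. The names widenBytes and
decodeCp1252 are hypothetical helpers made up for this example.

```javascript
// Code points for cp1252 bytes 0x80..0x9F; 0 marks bytes cp1252 leaves undefined.
const CP1252_HIGH = [
  0x20ac, 0, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0, 0x017d, 0,
  0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0, 0x017e, 0x0178,
];

// Assumed behaviour of the naive assignment: every byte becomes one
// UTF-16 code unit with the same numeric value (an ISO-8859-1 reading).
function widenBytes(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// Charset-aware decode for cp1252: only 0x80..0x9F differ from widening.
function decodeCp1252(bytes) {
  return Array.from(bytes, (b) => {
    if (b >= 0x80 && b <= 0x9f) {
      const cp = CP1252_HIGH[b - 0x80];
      return String.fromCharCode(cp === 0 ? 0xfffd : cp); // U+FFFD if undefined
    }
    return String.fromCharCode(b);
  }).join("");
}

// Helper to show code units in hex, since C1 controls do not print visibly.
const hex = (s) => Array.from(s, (c) => c.charCodeAt(0).toString(16)).join(" ");

const name = Uint8Array.from([0x93, 0x61, 0x94]); // curly-quoted "a" in cp1252
console.log(hex(widenBytes(name)));   // 93 61 94
console.log(hex(decodeCp1252(name))); // 201c 61 201d
```

For a Japanese charset the gap is much larger, since one character spans
several bytes and widening cannot produce the right code points at all.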

I am not sure how you should fix it. If nsHTTPIndexParser is only used for file
system directories (and not for anything else, for example FTP), then you can
assume the nsCString is in the charset given by nsIPlatformCharset
(filenameselector) and use an nsIUnicodeDecoder to convert it into PRUnichar*.
If nsHTTPIndexParser is used by anything other than the file: protocol, you
probably need to pass the charset in by some way from the file protocol.
Changing QA contact to teruko and adding the 'beta1' keyword.
Blocks: 28424
Keywords: beta1
QA Contact: tever → teruko
Need info on what exactly the bad problem is here.
Whiteboard: [NEED INFO]
Sorry that I forgot to mention the user problem. Users on non-English systems,
including Japanese, Chinese, and Korean on all platforms and European languages
on Macintosh, will see their folder and file names as garbage in the
file:/// directory view. This is also true for Macintosh users, who commonly use
non-ASCII characters such as the bullet, the "folder f", or the "Mu" characters.
Whiteboard: [NEED INFO]
Target Milestone: M15
Putting on PDT- radar for beta1.  Assuming this is only for ftp directory 
listings.  bobj was in PDT and approved.
Whiteboard: [PDT-]
Yeah, this'll probably involve making sure that the back-end code generates
UTF-8 as well. Jud, you'll need to do this for the FTP stream converter (if it's
even possible for FTP directories to be sent in non-ASCII). I'll do the
filesystem stuff and kick it over to you once that's working.
Status: NEW → ASSIGNED
I tried to debug 20292 and found that it asserts in the same place as this one,
so this bug is also the cause of 20292. Please be careful here: FTP and the file
system do NOT operate in UTF-8 these days.
Blocks: 20292
waterson, what is the ETA for fixing this?
Target Milestone: M15 → M16
Changed the summary from "illegal use nsCString to nsString -
nsHTTPIndexParser::ParseData" to "directory listing display Non ASCII filename
as garbage". This bug also shows garbage on US MacOS 9 as well as in non-Western
Windows/Linux locales.
Putting in nsbeta2.
Keywords: beta1 → nsbeta2
Whiteboard: [PDT-]
Summary: illegal use nsCString to nsString - nsHTTPIndexParser::ParseData → directory listing display Non ASCII filename as garbage
Putting on [nsbeta2+] radar. 
Whiteboard: [nsbeta2+]
Priority: P3 → P1
This is going to involve a fairly significant re-write of the nsHTTPIndexParser 
code, which is probably best done after NEW_STRING_APIS are turned on (so I 
don't have to do it twice).
Depends on: 39376
Depends on: 40461
Target Milestone: M16 → M17
*** Bug 40661 has been marked as a duplicate of this bug. ***
saved by rjc.
Assignee: waterson → rjc
Status: ASSIGNED → NEW
Frank (Tang), please provide an FTP URL (preferably outside of our firewall)
which demonstrates this bug.  That would be very helpful.  :^)
Status: NEW → ASSIGNED
Whiteboard: [nsbeta2+] → [nsbeta2+] ETA: 7/14
nhotta, I haven't heard from Frank (Tang) in a while, perhaps you can help with
this bug a bit.  First, can you reproduce it?  (Frank didn't give a URL in this
bug to test, so I've been going to "ftp://kaze/pub/" which was a URL from one of
the other dependency bugs.)  From what I'm seeing, what we display in Mozilla is
the same as what we display in 4.x.

I was looking at this bug a week or so ago, and made some small changes for
testing which might actually be more appropriate [basically, instead of using the
IO service, use textToSubURI->UnEscapeAndConvert()].  If you are still seeing the
original problem that Frank reported, please try applying the diff (I'll attach
it as well as the entire file to this bug) and see if that helps.
nhotta, can you test this for me?  (Please see comment above and try out the diff 
if need be.)
Assignee: rjc → nhotta
Status: ASSIGNED → NEW
After I used the attached file and rebuilt, I still see the problem on my
Japanese WinNT 4.
There is a slight difference after the change, so I will attach a screenshot.
It is a separate problem, but I also see something wrong in the date string (the
year is bogus).
Assignee: nhotta → rjc
Does rjc's fix work for Japanese directory viewing in file:///?
I think we should separate the file:/// issue from the ftp:// issue. We have a
way to know what the charset is for file:/// (from nsPlatformCharset), but we
don't know what the charset is for ftp://.
file:/// handling is separate from ftp:// handling... and this bug will deal 
only with ftp   :)
Well, this bug was a huge pain.  Here's the current state of the world:

o FTP (the spec/protocol) doesn't appear to be strong enough to provide locale 
info to the client

o because of that, FTP (in Necko) just sends up bytes (escaped) for filenames

o it's rather hard to reliably convert random bytes (which could be encoded in ANY
fashion) into Unicode

so...

o I've whacked the directory/ftp datasource to use:
        nsITextToSubURI::UnEscapeAndConvert(oldCharset, byteRun, &ucs2Result)
instead of the bare-bones nsEscape() for FTP filename conversion

o I've added an "encoding" attribute to the directory/ftp datasource's IDL, so 
that JavaScript can pass in the "current" encoding as chosen by the user

o In mozilla/xpfe/components/directory/directory.js around line #69, it's
currently hard-coded to this:
        httpDS.encoding = "ISO-8859-1";
If I then ftp to "ftp://kaze/pub/" I see crap.  If I change line # 69 to:
        httpDS.encoding = "EUC-JP";
I then see Japanese.  <whee!>

I've just checked all this in.
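
The unescape-then-convert step above can be sketched in standalone JavaScript.
This is only an illustration under assumptions: unescapeToBytes and
unescapeAndConvert are hypothetical helpers, not the nsITextToSubURI
implementation, and the decode relies on the standard TextDecoder being
available with the requested charset.

```javascript
// Turn an escaped name such as "%C6%FC" back into the raw bytes the server sent.
// Assumes the escaped form is ASCII, as an escaped FTP filename would be.
function unescapeToBytes(escaped) {
  const out = [];
  for (let i = 0; i < escaped.length; i++) {
    const hexPair = escaped.slice(i + 1, i + 3);
    if (escaped[i] === "%" && /^[0-9a-fA-F]{2}$/.test(hexPair)) {
      out.push(parseInt(hexPair, 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      out.push(escaped.charCodeAt(i) & 0xff); // literal character, kept as-is
    }
  }
  return Uint8Array.from(out);
}

// Decode the raw bytes using the charset the caller picked; the bytes
// themselves carry no hint about which charset is the right one.
function unescapeAndConvert(charset, escaped) {
  return new TextDecoder(charset).decode(unescapeToBytes(escaped));
}

const escapedName = "%C6%FC%CB%DC%B8%EC"; // "日本語" encoded as EUC-JP
console.log(unescapeAndConvert("iso-8859-1", escapedName)); // wrong guess: garbage
console.log(unescapeAndConvert("euc-jp", escapedName));     // 日本語
```

The same byte run yields garbage or readable Japanese depending only on the
charset argument, which is why the encoding has to come from the user's choice.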

All that's left:  someone who is familiar with document charset encoding needs to 
remove that hard-coded line and instead get the current document's charset and 
pass that in instead.

Should be really trivial JavaScript... as fate would have it, I haven't had any 
luck with it.

nhotta, can you (or cata, or someone who's been involved with the charset menus) 
hook this up?
Assignee: rjc → nhotta
rjc-
Thanks for getting the fix to this stage. I think your change for ftp is great.
However, if you read carefully, the original bug is for file directories, not
ftp. Maybe we should file a separate bug for that.

Frank, as has been indicated at various places in this bug by various people 
(including myself), this bug will only deal with ftp problems.  If you have a 
problem with file encoding as well, that will have to be a different bug.
If it helps to explain why this is the case, realize that nsHTTPIndexParser 
(which is where the problem was reported as being) deals with ftp, not file.
Clearing ETA.
Somehow, the character set seen from JS is always UTF-8. No override or
autodetection affects the charset (so the menu check mark is always at UTF-8).
The value I am looking at for this is
window._content.document.characterSet
Whiteboard: [nsbeta2+] ETA: 7/14 → [nsbeta2+]
Since the charset is always UTF-8, I cannot use it to set "httpDS.encoding".
I will talk to ftang tomorrow about this.
Whiteboard: [nsbeta2+] → [nsbeta2+] No ETA
So the document charset is always "UTF-8".
If I can get the appCore then I could use the default charset as below.
window._content.appCore.GetDocumentCharset()
But that returns null; I cannot get the appCore in directory.js.
Reassigning to rjc. Do you know how to get the appCore in directory.js?
Assignee: nhotta → rjc
nhotta, it might be the case that the appCore can't be obtained in directory.js 
because the entire page is sand-boxed inside of the browser's content area.

Can you guys come up with another way of getting the charset?
Assignee: rjc → nhotta
No, we only have the document charset or the default charset.
The menu check mark uses the document charset. That's why any ftp page gets its
charset check mark at "UTF-8". So getting a charset from the menu does not work.
For every ftp load, directory.xul is loaded from C++ code. That may be related
to why I cannot get the appCore in directory.js. I tried to use
window.parent.appCore but that's also null.
BTW, local file listing is working fine for Japanese (I think this bug was
originally filed for local file listing).

Asked Cata for help for the appCore problem.
nhotta- ETA ? Next Tuesday ???
I got help from Bill Law on the appCore issue; ETA is 7/24.
Status: NEW → ASSIGNED
Whiteboard: [nsbeta2+] No ETA → [nsbeta2+] No 7/24
Whiteboard: [nsbeta2+] No 7/24 → [nsbeta2+] ETA 7/24
Fix checked in; Japanese names are shown once the correct charset is set from
the menu. The menu check mark still sticks to UTF-8, but that should be a
separate bug.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
I verified this in the 2000-07-24-08 Win32, Mac, and Linux builds.
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard