Closed Bug 28787 Opened 25 years ago Closed 24 years ago

directory listing display Non ASCII filename as garbage

Categories

(Core Graveyard :: RDF, defect, P1)

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: ftang, Assigned: nhottanscp)

References

Details

(Whiteboard: [nsbeta2+] ETA 7/24)

Attachments

(3 files)

I caught this when I used the directory viewer to view a directory name on my Japanese NT system. The same thing happens for any non-ASCII name. I used the technique from bug 28424 to catch the problem. It asserts at the line values[i] = value; in nsHTTPIndexParser::ParseData. cvsblame shows me this is waterson's code. I believe this is one of the reasons users see garbage for non-ASCII file/folder names in the directory viewer.

Here is the stack trace:

nsAutoString::operator=(const nsStr & {...}) line 822 + 21 bytes
nsHTTPIndexParser::ParseData(const char * 0x033faa1d) line 665
nsHTTPIndexParser::ProcessData() line 508 + 15 bytes
nsHTTPIndexParser::OnDataAvailable(nsHTTPIndexParser * const 0x033e1550, nsIChannel * 0x033dad00, nsISupports * 0x00000000, nsIInputStream * 0x033e5f58, unsigned int 0x00000000, unsigned int 0x00000238) line 454
nsDocumentOpenInfo::OnDataAvailable(nsDocumentOpenInfo * const 0x033dfbd0, nsIChannel * 0x033dad00, nsISupports * 0x00000000, nsIInputStream * 0x033e5f58, unsigned int 0x00000000, unsigned int 0x00000238) line 262 + 46 bytes
nsFileChannel::OnDataAvailable(nsFileChannel * const 0x033dad04, nsIChannel * 0x033e1950, nsISupports * 0x00000000, nsIInputStream * 0x033e5f58, unsigned int 0x00000000, unsigned int 0x00000238) line 468 + 49 bytes
nsOnDataAvailableEvent::HandleEvent(nsOnDataAvailableEvent * const 0x033e2c80) line 370
nsStreamListenerEvent::HandlePLEvent(PLEvent * 0x033e2c30) line 93 + 12 bytes
PL_HandleEvent(PLEvent * 0x033e2c30) line 526 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x01074b50) line 487 + 9 bytes

Here values is an array of nsString and value is an nsCString. Note that value may contain non-ASCII text. On US Windows it contains cp1252 (which is 88% the same as ISO-8859-1, with the 12% difference in [0x80-0x9F]); on a Japanese system the charset is totally different outside the ASCII range. Using the = operator to assign this kind of nsCString to an nsString causes bad data conversion. I am not sure how you should fix it.

If nsHTTPIndexParser is only used for file system directories (and not for anything else, such as FTP), then you can assume the nsCString is in the charset given by nsIPlatformCharset (filenameselector) and use an nsIUnicodeDecoder to convert it into PRUnichar*. If nsHTTPIndexParser is used by protocols other than file:, you probably need to pass the charset in from the file protocol in some way.
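The conversion problem described in this comment can be reproduced outside Mozilla. The sketch below is plain JavaScript, not the nsString or nsIUnicodeDecoder code itself. It assumes the standard TextDecoder API with Shift_JIS support (browsers, or a Node.js build with full ICU), and the helper names inflateBytes and decodeWithCharset are made up for illustration. It contrasts widening each byte to a 16-bit code unit, which is roughly what a charset-unaware narrow-to-wide assignment does, with decoding the bytes in the charset they were actually written in.

```javascript
// Bytes for the directory name "日本" as a Japanese Windows system would
// store them in Shift_JIS.
const fileNameBytes = Uint8Array.from([0x93, 0xfa, 0x96, 0x7b]);

// Hypothetical helper: widen every byte to one UTF-16 code unit, with no
// charset conversion at all.
function inflateBytes(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// Hypothetical helper: decode the bytes using a named charset.
// fatal: true makes the decoder throw instead of silently substituting U+FFFD.
function decodeWithCharset(bytes, charset) {
  return new TextDecoder(charset, { fatal: true }).decode(bytes);
}

const garbled = inflateBytes(fileNameBytes);
const decoded = decodeWithCharset(fileNameBytes, "shift_jis");

console.log(JSON.stringify(garbled)); // four unrelated code units
console.log(decoded);                 // the two-character Japanese name
```

The widened string has one code unit per input byte and none of them is the intended character, which matches the garbage seen in the directory viewer; the decoded string is only correct because the charset passed in matches the platform charset.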
Changing QA contact to teruko and adding the 'beta1' keyword.
Blocks: 28424
Keywords: beta1
QA Contact: tever → teruko
Need info on what exactly the bad problem is here.
Whiteboard: [NEED INFO]
Sorry, I forgot to mention the user-visible problem. Users on non-English systems, including Japanese, Chinese, and Korean on all platforms and European languages on Macintosh, will see garbage for their folder and file names in the file:/// directory view. This is also true for Macintosh users, who commonly use non-ASCII characters such as the bullet, the "folder f", or the "Mu" characters.
Whiteboard: [NEED INFO]
Target Milestone: M15
Putting on PDT- radar for beta1. Assuming this is only for ftp directory listings. bobj was in PDT and approved.
Whiteboard: [PDT-]
Yeah, this'll probably involve making sure that the back-end code generates UTF-8 as well. Jud, you'll need to do this for the FTP stream converter (if it's even possible for FTP directories to be sent in non-ASCII). I'll do the filesystem stuff and kick it over to you once that's working.
Status: NEW → ASSIGNED
I tried to debug 20292 and found that it asserts in the same place as this one, so this bug is also the cause of 20292. Please be careful here: FTP and the file system do NOT operate in UTF-8 these days.
Blocks: 20292
waterson- what is the ETA for fixing this ?
Target Milestone: M15 → M16
Changed the summary from "illegal use nsCString to nsString - nsHTTPIndexParser::ParseData" to "directory listing display Non ASCII filename as garbage". This bug also shows garbage on US MacOS 9, as well as in non-Western Windows/Linux locales. Putting in nsbeta2.
Keywords: beta1 → nsbeta2
Whiteboard: [PDT-]
Summary: illegal use nsCString to nsString - nsHTTPIndexParser::ParseData → directory listing display Non ASCII filename as garbage
Putting on [nsbeta2+] radar.
Whiteboard: [nsbeta2+]
Priority: P3 → P1
This is going to involve a fairly significant re-write of the nsHTTPIndexParser code, which is probably best done after NEW_STRING_APIS are turned on (so I don't have to do it twice).
Depends on: 39376
Depends on: 40461
Target Milestone: M16 → M17
*** Bug 40661 has been marked as a duplicate of this bug. ***
saved by rjc.
Assignee: waterson → rjc
Status: ASSIGNED → NEW
Frank (Tang), please provide an FTP URL (preferably outside of our firewall) which demonstrates this bug. That would be very helpful. :^)
Status: NEW → ASSIGNED
Whiteboard: [nsbeta2+] → [nsbeta2+] ETA: 7/14
nhotta, I haven't heard from Frank (Tang) in a while; perhaps you can help with this bug a bit. First, can you reproduce it? (Frank didn't give a URL in this bug to test, so I've been going to "ftp://kaze/pub/", which was a URL from one of the other dependency bugs.) From what I'm seeing, what we display in Mozilla is the same as what we display in 4.x. I was looking at this bug a week or so ago, and made some small changes for testing which might actually be more appropriate [basically, instead of using the IO service, use textToSubURI->UnEscapeAndConvert()]. If you are still seeing the original problem that Frank reported, please try applying the diff (I'll attach it as well as the entire file to this bug) and see if that helps.
nhotta, can you test this for me? (Please see comment above and try out the diff if need be.)
Assignee: rjc → nhotta
Status: ASSIGNED → NEW
After I used the attached file and rebuilt, I still see the problem on my Japanese WinNT 4. There is a slight difference after the change, so I will attach a screenshot. Separately, I see something wrong in the date string (the year is bogus).
Assignee: nhotta → rjc
Does rjc's fix work for Japanese directory viewing in file:///? I think we should separate the file:/// issue from the ftp:// issue. We have a way to know the charset for file:/// (from nsPlatformCharset), but we don't know the charset for ftp://.
file:/// handling is separate from ftp:// handling... and this bug will deal only with ftp :)
Well, this bug was a huge pain. Here's the current state of the world:

o FTP (the spec/protocol) doesn't appear to be strong enough to provide locale info to the client
o because of that, FTP (in Necko) just sends up bytes (escaped) for filenames
o it's rather hard to reliably convert random bytes (which could be encoded in ANY fashion) into Unicode

so...

o I've whacked the directory/ftp datasource to use:
    nsITextToSubURI::UnEscapeAndConvert(oldCharset, byteRun, &ucs2Result)
  instead of the bare-bones nsEscape() for FTP filename conversion
o I've added an "encoding" attribute to the directory/ftp datasource's IDL, so that JavaScript can pass in the "current" encoding as chosen by the user
o In mozilla/xpfe/components/directory/directory.js around line #69, it's currently hard-coded to this:
    httpDS.encoding = "ISO-8859-1";

If I then ftp to "ftp://kaze/pub/" I see crap. If I change line #69 to:
    httpDS.encoding = "EUC-JP";
I then see Japanese. <whee!>

I've just checked all this in. All that's left: someone who is familiar with document charset encoding needs to remove that hard-coded line and instead get the current document's charset and pass that in. Should be really trivial JavaScript... as fate would have it, I haven't had any luck with it. nhotta, can you (or cata, or someone who's been involved with the charset menus) hook this up?
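The effect of the "encoding" setting described in this comment can be illustrated with a small stand-alone sketch. This is not the Mozilla implementation: unescapeAndConvert below is a hypothetical JavaScript stand-in for nsITextToSubURI::UnEscapeAndConvert, the sample name is an assumption, and it relies on the standard TextDecoder API with EUC-JP support (browsers, or a Node.js build with full ICU). It unescapes a %XX byte run and then decodes the bytes with whichever charset the caller passes in.

```javascript
// Hypothetical stand-in for nsITextToSubURI::UnEscapeAndConvert: turn a
// %XX-escaped byte run into a string using the given charset.
function unescapeAndConvert(charset, escaped) {
  const bytes = [];
  for (let i = 0; i < escaped.length; i++) {
    const hex = escaped.slice(i + 1, i + 3);
    if (escaped[i] === "%" && /^[0-9a-fA-F]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // escaped byte
      i += 2;                        // skip the two hex digits
    } else {
      bytes.push(escaped.charCodeAt(i) & 0xff); // literal (ASCII) character
    }
  }
  return new TextDecoder(charset).decode(Uint8Array.from(bytes));
}

// Assumed sample: "日本語" as EUC-JP bytes, escaped the way a listing might
// deliver a filename.
const escapedName = "%C6%FC%CB%DC%B8%EC";

// Note: TextDecoder treats the label "ISO-8859-1" as windows-1252; for these
// particular bytes (all in 0xA0-0xFF) the two charsets agree.
const asLatin1 = unescapeAndConvert("ISO-8859-1", escapedName);
const asEucJp = unescapeAndConvert("EUC-JP", escapedName);

console.log(asLatin1); // accented Latin letters, i.e. garbage
console.log(asEucJp);  // the Japanese name
```

The same escaped bytes yield garbage or a readable name depending only on the charset argument, which is why the hard-coded value has to be replaced by whatever encoding the user has selected.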
Assignee: rjc → nhotta
rjc- Thanks for fixing it to this stage. I think your change for ftp is great. However, the original bug is about file directories, not ftp, if you read carefully. Maybe we should file a separate bug for that.
Frank, as has been indicated at various places in this bug by various people (including myself), this bug will only deal with ftp problems. If you have a problem with file encoding as well, that will have to be a different bug.
If it helps to explain why this is the case, realize that nsHTTPIndexParser (which is where the problem was reported as being) deals with ftp, not file.
Clearing ETA. Somehow, the character set seen from JS is always UTF-8. No override or autodetection affects the charset (so the menu check mark is always at UTF-8). The value I am looking at for this is window._content.document.characterSet.
Whiteboard: [nsbeta2+] ETA: 7/14 → [nsbeta2+]
Since the charset is always UTF-8, I cannot use it to set "httpDS.encoding". I will talk to ftang tomorrow about this.
Whiteboard: [nsbeta2+] → [nsbeta2+] No ETA
So the document charset is always "UTF-8". If I can get the appCore, then I may be able to use the default charset, as below:
    window._content.appCore.GetDocumentCharset()
But that returns null; I cannot get the appCore in directory.js. Reassigning to rjc. Do you know how to get the appCore in directory.js?
Assignee: nhotta → rjc
nhotta, it might be the case that the appCore can't be obtained in directory.js because the entire page is sand-boxed inside of the browser's content area. Can you guys come up with another way of getting the charset?
Assignee: rjc → nhotta
No, we only have the document charset or the default charset. The menu check mark uses the document charset. That's why any ftp page gets a charset check mark at "UTF-8". So getting a charset from the menu does not work.
For every ftp load, directory.xul is loaded from C++ code. That may be related to why I cannot get the appCore in directory.js. I tried to use window.parent.appCore, but that's also null. BTW, local file listing is working fine for Japanese (I think this bug was originally filed for local file listing).
Asked Cata for help for the appCore problem.
nhotta- ETA ? Next Tuesday ???
I got help from Bill Law with the appCore issue; ETA is 7/24.
Status: NEW → ASSIGNED
Whiteboard: [nsbeta2+] No ETA → [nsbeta2+] No 7/24
Whiteboard: [nsbeta2+] No 7/24 → [nsbeta2+] ETA 7/24
Fix checked in. Japanese names now show when the correct charset is set from the menu. The menu check mark sticks to UTF-8, but that should be a separate bug.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
I verified this in the 2000-07-24-08 Win32, Mac, and Linux builds.
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard