Closed Bug 26767 Opened 25 years ago Closed 10 years ago

need support RFC2640 (Internationalization of the File Transfer Protocol)

Categories

(Core Graveyard :: Networking: FTP, enhancement, P3)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED
mozilla31

People

(Reporter: m_kato, Assigned: emk)

References

()

Details

(Keywords: helpwanted, intl, testcase)

Attachments

(1 file, 16 obsolete files)

13.45 KB, patch
michal
: review+
Details | Diff | Splinter Review
Current implementation of filename encoding via FTP etc depends on OS.
But by RFC2640, filename should use UTF8 via FTP.  So we should support RFC2640.
-> Jud
Assignee: gagan → valeski
Target Milestone: M16
Summary: [REF] need support RFC2640 (Internationalization of the File Transfer Protocol) → [RFE] need support RFC2640 (Internationalization of the File Transfer Protocol)
Target Milestone: M16 → M18
-> dougt
Assignee: valeski → dougt
Blocks: 62355
*** Bug 64700 has been marked as a duplicate of this bug. ***
unsetting milestone
Target Milestone: M18 → ---
Keywords: helpwanted
Target Milestone: --- → Future
Blocks: 20292
*** Bug 20292 has been marked as a duplicate of this bug. ***
mass move, v2.
qa to me.
QA Contact: tever → benc
->ftp
Component: Networking → Networking: FTP
QA Contact: benc → tever
reassigning to bbaetz@cs.mcgill.ca.
Assignee: dougt → bbaetz
Are you aware of any servers which implement this?
QA Contact: tever → benc
will test against:
ftp://kaze/pub/
Keywords: testcase
What are we exactly testing? Maybe I can help better. kaze is my server
and I set up the original test case for some other bug. I am not
sure if it is a proper test case for this bug.
This definately won't work. (jbetak tested it as part of my directory viewer
changes, and it was broken to start with). This really can't be done until file
urls display with the proper character set, though. I cna't fix this unless I
get access to a server which implements this extention. Can someone please
telnet into kaze on the ftp port (21), and give me the results of typing, in order:

FEAT
USER anonymous
PASS foo@bar.com
FEAT
SYST

This will at least let me confirm what server it is, and whether it is, in fact,
following this RFC.
kaze is running AIX 4.2.1 and its FTP cannot be compliant 
with RFC 2640.
But here is the output you requested:

500 'FEAT': command not understood.
SYST
215 UNIX Type: L8 Version: BSD-44
Does kaze use UTF8? Can you create a new directory with a file called fooé
(thats f-o-o-(e-actute)), and attach a packet trace of us trying to download
this file?

I tried just assuming all ftp data was UTF8 by default, but wuftpd uses
iso-8859-1, and so that broke. Can you try using the html ftp view, and changing
the charset using the view menu? does that help? Can you load files with non
ascii names - ie is this just a display problem? Do you know of clients which
support this server? If so, can you get a packet trace of one of them connecting
and downloading a file with non-iso-8859-1 chars, and another one of us doing
the same? On today's trunk build theres a bug which I would like you to exploit
- can you load the html directory listing, and then use file->send-page to mail
me the raw output of what we think we get? I won't fix that bug til the weekend..

benc: Can I get you to help momoi to do this? I assume that momoi doesn't have a
packet trace tool installed.
momoi: lets find some time to look at this problem....
Yes, benc, let's do that next week. It should not be that 
hard to come up with what we need.
Adding intl keyword.
Keywords: intl
*** Bug 144717 has been marked as a duplicate of this bug. ***
I'm the author of bug 144717 which is a dup of this one. I have put up a test
FTP site where you can all test this out.

Machine: annexia.org
Username: mozilla
Password: mozilla

Or use the following URL: ftp://mozilla:mozilla@annexia.org/

(Sorry my machine is quite slow).

There are three files in this directory. All file names are in UTF-8 format. The
first is Greek, the second is Chinese and the third is Japanese. Go to the
UTF-8 sampler page (http://www.columbia.edu/kermit/utf8.html) to see what
the original strings *should* look like.

NCFTP gets this right. Mozilla and IE get it wrong. NCFTP correctly sends the
FEAT command which this FTP server understands. The response from FEAT is:

211-Extensions supported:
 HOST
 LANG EN*
 MDTM
 MLST TYPE*;SIZE*;MODIFY*;PERM*;UNIX.MODE*;
 REST STREAM
 SIZE
 TVFS
 UTF8
211 END

(Note the "UTF8").

See also:

http://www.zvon.org/tmRFC/RFC2640/Output/chapter3.html
What server is this running? SYST just says: "UNIX Type: L8", which doesn't
help. I'm curious, mainly because none of the 'big' ftp servers, (ie wuftpd,
iis, proftpd, and a few others I tested) support this.

Its probably not too difficult to do, I guess, once the patch bug 118179 lands.
We don't send FEAT at all, so we'd have to send that, and then parse the
response, though.
This particular server is Net::FTPServer/1.108 
(http://www.cpan.org/modules/by-authors/id/R/RW/RWMJ/Net-FTPServer-1.108.readme).
NcFTPD also supports FEAT and UTF-8.

Our website and FTP service uses UTF-8 extensively because we need to support
a world-wide market with a mix of languages on the same pages.

Yes, you would need to issue the FEAT command and parse the output to see
if it contains the string "UTF8". If so, display the page in UTF-8 encoding.
If not, you're on your own (it's pretty much undefined: probably best to just
assume ISO-8859-1 as now).
> If not, you're on your own (it's pretty much undefined: 
> probably best to just assume ISO-8859-1 as now).

I don't think we can make such an assumption for this
type of situation. It would have to be the same encoding as
the system default or the browser window endcoding to be
safe. 

If we pay attention to both LANG and encoding (=UTF-8),
I think we need to use the proper font for the LANG
value to display the result. Hope this is automatically
done the the layout and rendering components of Mozilla. 
If FEAT is not supported by the server, or does not return the UTF8 feature,
then you'd need to guess the encoding on the server. Ouch :-) But assuming the
browser's default encoding might be better than just guessing ISO-8859-1.

I'm only really interested in the case where FEAT *is* supported and it
returns the UTF8 feature.

Note that RFC 2640 *requires* that clients now support and send the FEAT command
and recognise UTF8 encoding, so Mozilla is in non-compliance (from my reading
of the RFC anyway).
My point is that UTF-8 encoding itself does not give
sufficient info on what font to use for Unifiied Ideograph
range for CJK. LANG info is important for proper display of
characters in such a case.
>If FEAT is not supported by the server, or does not return the UTF8 feature,
>then you'd need to guess the encoding on the server. Ouch :-) But assuming the
>browser's default encoding might be better than just guessing ISO-8859-1.
Doesn't this bug get fixed by 118179?  
I have a patch to use the Default char coding in the Pref.
I have no time to work on mozilla at the moment, so dougt is taking over FTP

open ftp bugs -> him
Assignee: bbaetz → dougt
Summary: [RFE] need support RFC2640 (Internationalization of the File Transfer Protocol) → need support RFC2640 (Internationalization of the File Transfer Protocol)
Blocks: 297395
*** Bug 335099 has been marked as a duplicate of this bug. ***
mass reassigning to nobody.
Assignee: dougt → nobody
(In reply to comment #29)
> Created an attachment (id=311568) [details]
> a patch for RFC2640 support.  test w/ pure-ftped
> 

Why don't you ask for review? 

People on Mozilla Russia's forum keep asking about proper Cyrillic folder names support. At the moment they are stuck with the need to change 'network.standard-url.encode-utf8' to 'false' (see https://bugzilla.mozilla.org/show_bug.cgi?id=297395#c13).

UTF-8 in FTP would be a good addition to Fx 3.1
(In reply to comment #31)
> (In reply to comment #29)
> > Created an attachment (id=311568) [details] [details]
> > a patch for RFC2640 support.  test w/ pure-ftped
> > 
> 
> Why don't you ask for review? 

Because mozilla-central tree (for 3.x and 4.0) isn't opened yet.  After opened, I will send r/sr.

Assignee: nobody → m_kato
Status: NEW → ASSIGNED
Attached patch a patch for RFC2640 support (obsolete) — Splinter Review
RFC2640 (I18N FTP support).  I tested using pure-ftpd (it is supported FEAT/OPTS)
Attachment #311568 - Attachment is obsolete: true
Attachment #323826 - Flags: review?(bzbarsky)
I don't think I'm qualified to review this patch.  Dougt, would you be willing to take a look?
Attachment #323826 - Flags: review?(bzbarsky) → review?(doug.turner)
Comment on attachment 323826 [details] [diff] [review]
a patch for RFC2640 support 

cool!  m_kato, thanks for looking into this (a 5 digit bug, no less!)

got any test cases -- or a public server that I can test this against?


Does the ftp feature UTF8 imply that the paths that we send to the ftp server?  (/me needs to refresh himself with rfc 2640!)
Although pure-ftpd (http://www.pureftpd.org/project/pure-ftpd) supports RFC2640, I don't know where is public test server.  I will be looking for it.

- Spec
FTP serer must support FEAT command, client sends FEAT command, server must send UTF8 as result if supporting RFC2640.  Next, client sends OPTS UTF8 ON commands to enable UTF8.  

It supports Windows shell (aka FTP folder).  And Filezilla supports it too.
Not quite. RFC 2640 does not specify OPTS UTF8 ON. As long as UTF8 is in the FEAT response, everything already is using UTF-8.

OPTS UTF-8 ON is from a long-expired IETF draft (http://tools.ietf.org/html/draft-ietf-ftpext-utf-8-option-00).

An RFC2640 compliant server uses UTF-8 regardless of whether OPTS UTF-8 ON gets sent or not, and a compliant client assumes the server uses UTF-8 even if a OPTS UTF-8 ON command fails.

Note that using any encoding other than ASCII (RFC 959) or UTF-8 (RFC 2640) without prior explicit negotiation is not covered by the specs.

I've summarized this in my wiki: http://wiki.filezilla-project.org/Character_Set
A quick glance at the proposed patch reveals that it is not RFC2640 compliant as it depends on the unspecified OPTS UTF-8 ON command.
Makoto, can you fix this before I review?
I know that "OPTS UTF8 ON" is Microsoft's FTP folder spec.  Since Microsoft implements foolish spec (not RFC2640), many servers do it to support FTP folder of Explorer.

Doug,
OK. After I change a code, then I will send review to you.
Attached patch a patch v2 for RFC2640 support (obsolete) — Splinter Review
Attachment #323826 - Attachment is obsolete: true
Attachment #326913 - Flags: review?
Attachment #323826 - Flags: review?(doug.turner)
Attachment #326913 - Flags: review? → review?(doug.turner)
Comment on attachment 326913 [details] [diff] [review]
a patch v2 for RFC2640 support

instead of ftpcharset, lets change it to useUTF8.  Lets not imply that there are multiple charsets available to ftp.

I am concerned that just setting the control channel's content charset isn't enough.  There is code in ftp that deals with filepaths, conversion from filepaths to VMS paths, and other suspect code.  Did you review this to see if anything needs to change?

I think I would be happier if there were public servers we could test against.  Do any of these servers work:

http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/ftp/test/frametest/menu.html?force=1
Here's a public server to test against:

by HTTP: http://www.annexia.org/tmp/mozilla-test/
by FTP:  ftp://mozilla:mozilla@annexia.org/

Or:
  Hostname: annexia.org
  Username: mozilla
  Password: mozilla

You should see three files in the directory.  The two interesting
ones are the letter "ka" in two different languages:

http://en.wikipedia.org/wiki/Ka_%28kana%29
http://en.wikipedia.org/wiki/%E0%A4%95

Mozilla currently renders this wrong.  NCFTP gets it right:

$ ncftp -u mozilla -p mozilla annexia.org
NcFTP 3.2.1 (Jul 29, 2007) by Mike Gleason (http://www.NcFTP.com/contact/).
Connecting to 80.68.91.176...                                                   
furbychan.cocan.org FTP server (Net::FTPServer/1.120-1) ready.
Logging in...                                                                   
Welcome mozilla.
Logged in to annexia.org.                                                       
ncftp / > ls
か          .htaccess    क
Attached patch a patch v3 (obsolete) — Splinter Review
previous patch is a lack of streamconv code
Attachment #326913 - Attachment is obsolete: true
Attachment #326924 - Flags: review?(doug.turner)
Attachment #326913 - Flags: review?(doug.turner)
(In reply to comment #42)
> I am concerned that just setting the control channel's content charset isn't
> enough.  There is code in ftp that deals with filepaths, conversion from
> filepaths to VMS paths, and other suspect code.  Did you review this to see if
> anything needs to change?

Current code uses ISO-8859-1 for all FTP channel.  So even if current is VMS style, it is no problem to convert from ISO-8859-1 to UTF-8.  (7bit character map is same.  So "[", "]", "/" and "." is same mapping.  And, if problem occurs with FEAT - UTF8 enable, it is broken characters as UTF8.)

And, I don't think that VMS style server supports new feature such as RFC2640.

Also, I will create new patch for changing metadata.
Attached patch a patch v4 for RFC2640 support (obsolete) — Splinter Review
change metadata name of utf8 support into cache
Attachment #326924 - Attachment is obsolete: true
Attachment #327207 - Flags: review?(doug.turner)
Attachment #326924 - Flags: review?(doug.turner)
Blocks: 296528
I applied the patch 327207 and was testing my patch for bug 296528 if it helps handling files with diacritics in name when dropped to firefox from windows explorer. I encountered assertion failure in:

 	msvcr80d.dll!strtoxl(localeinfo_struct * plocinfo=0x00000000, const char * nptr=0x00000000, const char * * endptr=0x00000000, int ibase=10, int flags=0)  Line 94 + 0x2f bytes	C++
 	msvcr80d.dll!strtol(const char * nptr=0x00000000, char * * endptr=0x00000000, int ibase=10)  Line 236 + 0x15 bytes	C++
 	msvcr80d.dll!atol(const char * nptr=0x00000000)  Line 56 + 0xd bytes	C
 	msvcr80d.dll!atoi(const char * nptr=0x00000000)  Line 100 + 0x9 bytes	C
>	necko.dll!nsFtpState::ReadCacheEntry()  Line 2145 + 0xf bytes	C++
 	necko.dll!nsFtpState::OnCallbackPending()  Line 2119 + 0x26 bytes	C++
 	necko.dll!nsBaseContentStream::AsyncWait(nsIInputStreamCallback * callback=0x05bb28f4, unsigned int flags=0, unsigned int requestedCount=0, nsIEventTarget * target=0x00bc1ec8)  Line 171	C++
 	necko.dll!nsInputStreamPump::EnsureWaiting()  Line 152 + 0x48 bytes	C++
 	necko.dll!nsInputStreamPump::AsyncRead(nsIStreamListener * listener=0x036b0064, nsISupports * ctxt=0x00000000)  Line 362 + 0x8 bytes	C++
 	necko.dll!nsBaseChannel::BeginPumpingData()  Line 232 + 0x3a bytes	C++
 	necko.dll!nsBaseChannel::AsyncOpen(nsIStreamListener * listener=0x05bb26b8, nsISupports * ctxt=0x00000000)  Line 488 + 0xb bytes	C++
 	docshell.dll!nsURILoader::OpenURI(nsIChannel * channel=0x036b0058, int aIsContentPreferred=0, nsIInterfaceRequestor * aWindowContext=0x045e3920)  Line 840 + 0x19 bytes	C++
 	docshell.dll!nsDocShell::DoChannelLoad(nsIChannel * aChannel=0x036b0058, nsIURILoader * aURILoader=0x03608380, int aBypassClassifier=0)  Line 7638 + 0x41 bytes	C++
 	docshell.dll!nsDocShell::DoURILoad(nsIURI * aURI=0x05bb15a0, nsIURI * aReferrerURI=0x00000000, int aSendReferrer=1, nsISupports * aOwner=0x00000000, const char * aTypeHint=0x00000000, nsIInputStream * aPostData=0x00000000, nsIInputStream * aHeadersData=0x00000000, int aFirstParty=1, nsIDocShell * * aDocShell=0x00000000, nsIRequest * * aRequest=0x0012d5e8, int aIsNewWindowTarget=0, int aBypassClassifier=0)  Line 7484 + 0x29 bytes	C++
 	docshell.dll!nsDocShell::InternalLoad(nsIURI * aURI=0x05bb15a0, nsIURI * aReferrer=0x00000000, nsISupports * aOwner=0x00000000, unsigned int aFlags=1, const wchar_t * aWindowTarget=0x04a63f88, const char * aTypeHint=0x00000000, nsIInputStream * aPostData=0x00000000, nsIInputStream * aHeadersData=0x00000000, unsigned int aLoadType=1, nsISHEntry * aSHEntry=0x00000000, int aFirstParty=1, nsIDocShell * * aDocShell=0x00000000, nsIRequest * * aRequest=0x00000000)  Line 7209 + 0x7f bytes	C++
 	docshell.dll!nsDocShell::LoadURI(nsIURI * aURI=0x05bb15a0, nsIDocShellLoadInfo * aLoadInfo=0x05bb1780, unsigned int aLoadFlags=0, int aFirstParty=1)  Line 904 + 0x56 bytes	C++
 	docshell.dll!nsDocShell::LoadURI(const wchar_t * aURI=0x05bab670, unsigned int aLoadFlags=0, nsIURI * aReferringURI=0x00000000, nsIInputStream * aPostStream=0x00000000, nsIInputStream * aHeaderStream=0x00000000)  Line 2934 + 0x2a bytes	C++
 	xpcom_core.dll!NS_InvokeByIndex_P(nsISupports * that=0x0012db98, unsigned int methodIndex=21151274, unsigned int paramCount=1235340, nsXPTCVariant * params=0x0012db98)  Line 102	C++
 	xpcom_core.dll!xptiInterfaceInfo::GetIIDForParamNoAlloc(unsigned short methodIndex=8, const nsXPTParamInfo * param=0x00000005, nsID * iid=0x0012da08)  Line 720 + 0x2e bytes	C++
 	xpc3250.dll!XPCWrappedNative::CallMethod(XPCCallContext & ccx={...}, XPCWrappedNative::CallMode mode=CALL_METHOD)  Line 2393 + 0x21 bytes	C++
 	xpc3250.dll!XPC_WN_CallMethod(JSContext * cx=0x03783810, JSObject * obj=0x02db10e0, unsigned int argc=5, long * argv=0x03828338, long * vp=0x0012dce0)  Line 1473 + 0xe bytes	C++
 	js3250.dll!js_Invoke(JSContext * cx=0x03783810, unsigned int argc=5, long * vp=0x03828330, unsigned int flags=2)  Line 1307 + 0x20 bytes	C++
 	js3250.dll!js_Interpret(JSContext * cx=0x03783810)  Line 4939 + 0x16 bytes	C++
 	js3250.dll!js_Invoke(JSContext * cx=0x03783810, unsigned int argc=1, long * vp=0x0382804c, unsigned int flags=0)  Line 1323 + 0x9 bytes	C++
 	js3250.dll!js_InternalInvoke(JSContext * cx=0x03783810, JSObject * obj=0x05201920, long fval=94112704, unsigned int flags=0, unsigned int argc=1, long * argv=0x03828048, long * rval=0x0012e4f4)  Line 1379 + 0x15 bytes	C++
 	js3250.dll!JS_CallFunctionValue(JSContext * cx=0x03783810, JSObject * obj=0x05201920, long fval=94112704, unsigned int argc=1, long * argv=0x03828048, long * rval=0x0012e4f4)  Line 5054 + 0x1f bytes	C++
 	gklayout.dll!nsJSContext::CallEventHandler(nsISupports * aTarget=0x0431d250, void * aScope=0x03ee36a0, void * aHandler=0x059c0bc0, nsIArray * aargv=0x04a14d30, nsIVariant * * arv=0x0012e6c0)  Line 1962 + 0x24 bytes	C++
 	gklayout.dll!nsJSEventListener::HandleEvent(nsIDOMEvent * aEvent=0x05ba8d00)  Line 248 + 0x67 bytes	C++
 	gklayout.dll!nsEventListenerManager::HandleEventSubType(nsListenerStruct * aListenerStruct=0x0494e148, nsIDOMEventListener * aListener=0x0431d358, nsIDOMEvent * aDOMEvent=0x05ba8d00, nsISupports * aCurrentTarget=0x0431d250, unsigned int aPhaseFlags=2)  Line 1080 + 0x12 bytes	C++
 	gklayout.dll!nsEventListenerManager::HandleEvent(nsPresContext * aPresContext=0x04959040, nsEvent * aEvent=0x0012ed6c, nsIDOMEvent * * aDOMEvent=0x0012e99c, nsISupports * aCurrentTarget=0x0431d250, unsigned int aFlags=2, nsEventStatus * aEventStatus=0x0012e9a0)  Line 1190	C++
 	gklayout.dll!nsEventTargetChainItem::HandleEvent(nsEventChainPostVisitor & aVisitor={...}, unsigned int aFlags=2)  Line 211	C++
 	gklayout.dll!nsEventTargetChainItem::HandleEventTargetChain(nsEventChainPostVisitor & aVisitor={...}, unsigned int aFlags=6, nsDispatchingCallback * aCallback=0x0012ea50)  Line 293	C++
 	gklayout.dll!nsEventDispatcher::Dispatch(nsISupports * aTarget=0x0495e910, nsPresContext * aPresContext=0x04959040, nsEvent * aEvent=0x0012ed6c, nsIDOMEvent * aDOMEvent=0x00000000, nsEventStatus * aEventStatus=0x0012eb80, nsDispatchingCallback * aCallback=0x0012ea50)  Line 478 + 0x12 bytes	C++
 	gklayout.dll!PresShell::HandleEventInternal(nsEvent * aEvent=0x0012ed6c, nsIView * aView=0x04960168, nsEventStatus * aStatus=0x0012eb80)  Line 5928 + 0x29 bytes	C++
 	gklayout.dll!PresShell::HandlePositionedEvent(nsIView * aView=0x04960168, nsIFrame * aTargetFrame=0x04968ca4, nsGUIEvent * aEvent=0x0012ed6c, nsEventStatus * aEventStatus=0x0012eb80)  Line 5816 + 0x14 bytes	C++
 	gklayout.dll!PresShell::HandleEvent(nsIView * aView=0x04960168, nsGUIEvent * aEvent=0x0012ed6c, nsEventStatus * aEventStatus=0x0012eb80)  Line 5676 + 0x1e bytes	C++
 	gklayout.dll!nsViewManager::HandleEvent(nsView * aView=0x04960168, nsPoint aPoint={...}, nsGUIEvent * aEvent=0x0012ed6c, int aCaptured=0)  Line 1385	C++
 	gklayout.dll!nsViewManager::DispatchEvent(nsGUIEvent * aEvent=0x0012ed6c, nsEventStatus * aStatus=0x0012ecc8)  Line 1337 + 0x22 bytes	C++
 	gklayout.dll!HandleEvent(nsGUIEvent * aEvent=0x0012ed6c)  Line 171	C++
 	gkwidget.dll!nsWindow::DispatchEvent(nsGUIEvent * event=0x0012ed6c, nsEventStatus & aStatus=nsEventStatus_eIgnore)  Line 985 + 0xc bytes	C++
 	gkwidget.dll!nsNativeDragTarget::DispatchDragDropEvent(unsigned int aEventType=1403, _POINTL aPT={...})  Line 223	C++
 	gkwidget.dll!nsNativeDragTarget::ProcessDrag(IDataObject * pData=0x0018df98, unsigned int aEventType=1403, unsigned long grfKeyState=0, _POINTL ptl={...}, unsigned long * pdwEffect=0x0018d964)  Line 253	C++
 	gkwidget.dll!nsNativeDragTarget::Drop(IDataObject * pData=0x0018df98, unsigned long grfKeyState=0, _POINTL aPT={...}, unsigned long * pdwEffect=0x0018d964)  Line 425	C++


    nsXPIDLCString charset;
    mCacheEntry->GetMetaDataElement("useUTF8", getter_Copies(charset));
    if (atoi(charset.get()) == PR_TRUE)

charset is empty and .get() returns NULL pointer. Metadata of the cache entry is NULL.

It happened probably just because I installed MS SDK debug sources and could be completely ignored, so just for record.
This happens when I want to access a file with diacritics with name a second time. (The first time I get "file could not be found" and the name of that file seems to be pure UTF-8 or native encoding of its name, like there would be instead of AUTF8String used ACString in some interface method declaration or decoding would be applied two times)
(In reply to comment #48)
> This happens when I want to access a file with diacritics with name a second
> time. (The first time I get "file could not be found" and the name of that file
> seems to be pure UTF-8 or native encoding of its name, like there would be
> instead of AUTF8String used ACString in some interface method declaration or
> decoding would be applied two times)
> 

I will add simply null check.  Then, I will submit new patch this week.
Attached patch patch v5 (obsolete) — Splinter Review
Attachment #327207 - Attachment is obsolete: true
Attachment #331315 - Flags: review?(doug.turner)
Attachment #327207 - Flags: review?(doug.turner)
IMHO better and cleaner would be to test result of mCacheEntry->GetMetaDataElement. And also could be interesting to figure out why this happens and why exactly on the second try...

Further, maybe I misunderstood what this bug had to fix but I was testing international file names while fixing bug 296528 with patch (v4) of this bug applied. It didn't help to open files with diacritics in names when written directly to the address bar nor dropped from the windows explorer (attachment 330841 [details] [diff] [review] applied) and nor when click on the file name in FF FTP directory browser. I was testing on winxpsp2-en and with files with czech diacritics in name.

And one more nit, when building on windows I got: d:/mozilla/HG/src/netwerk/streamconv/converters/nsIndexedToHTML.cpp(169) : error C2373: 'nsIndexedToHTML::OnHeaderAvailable' : redefinition; different type modifiers

nsIndexedToHTML::OnHeaderAvailable must be defined with 'nsresult' return type and not with NS_IMETHODIMP. Sorry, I forgot to mention this sooner...
Attached patch patch v6 (obsolete) — Splinter Review
Attachment #331315 - Attachment is obsolete: true
Attachment #334663 - Flags: superreview?(doug.turner)
Attachment #334663 - Flags: review?(doug.turner)
Attachment #331315 - Flags: review?(doug.turner)
michal, could you take a first look at this patch?
Comment on attachment 334663 [details] [diff] [review]
patch v6

This patch reintroduces partially bug 427089. For example try to follow link "Ñûñîâ" on page ftp://ftp.asu.ru/incoming/. It works without this patch.

I think that we should implement LANG command too. At the end of 4.1 section in RFC 2640 is:
 "If user-PI issues a HOST command, and the server's default language is acceptable, it need not issue a LANG command. However, if the implementation does not support the HOST command, a LANG command MUST be issued. Until server-PI is presented with either a HOST or LANG command it SHOULD assume that the user-PI does not comply with this specification."
Michal, 

"ftp.asu.ru" doesn't seem to support LANG command.  So your idea isn't good.

This Russian server doesn't support FEAT and UTF-8, so URI by necko uses original charset (it will become old behavior.  If URI is non-UTF8 and platform is non-UTF8, it uses platform charset with standard-uri setting).

There is same issue with Japanese IIS FTP Server.  Until IIS 5.0 FTP server, although this can use Shift-JIS charset as directory and filename, it doesn't support FEAT and UTF-8.

Also, this patch cannot apply current tree because of bug 427089.  I will make 
new patch with considering bug 427089.
(In reply to comment #55)
> "ftp.asu.ru" doesn't seem to support LANG command.  So your idea isn't good.

There are 2 independent comments in my reply #54. I didn't say that LANG command would fix the problem with ftp.asu.ru. I'm just saying that this patch breaks what was fixed in bug 427089.

And it is good idea to implement LANG command just because RFC2640 demands it.
Following check doesn't detect UTF8 in reply from proftpd:

+        if (mResponseMsg.Find(NS_LITERAL_CSTRING("UTF8" CRLF), PR_TRUE) > -1) {

The most correct check would be (CRLF " UTF8" CRLF), but proftpd is buggy and sends LF only instead of CRLF. So we should use some more relaxed check like:

(mResponseMsg.Find(NS_LITERAL_CSTRING(CRLF " UTF8" CRLF), PR_TRUE) > -1 ||
 mResponseMsg.Find(NS_LITERAL_CSTRING(LFSTR " UTF8" LFSTR), PR_TRUE) > -1)


The reason why there are problems with server ftp.asu.ru after applying your patch is that mPath is converted to originCharset when response to FEAT is received. But this happens only for the first request on the connection. Other requests that reuses the connection don't send FEAT command...


And I've found another problem. When server says that it supports UTF-8 but some filenames aren't valid UTF-8 strings, then conversion fails at http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsDirIndexParser.cpp#297 and such files aren't in the listing at all.
Michal,

> And I've found another problem. When server says that it supports UTF-8 but
> some filenames aren't valid UTF-8 strings, then conversion fails at
> http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsDirIndexParser.cpp#297
> and such files aren't in the listing at all.

Why?  If server return UTF8 as FEAT result, server should return "UTF8 filename".
If server doesn't return it, I should not consider error.  Why does client consider it?

If server doesn't return UTF8 as FEAT, I can make sense it.
Server that supports FEAT UTF8 can return non-UTF8 filename and according to "RFC2640 Annex A.1 paragraph 4" I think that it happens quite often:
   - A server that copies pathnames transparently from a local
     filesystem may continue to do so. It is then up to the local file
     creators to use UTF-8 pathnames.

And according to "RFC2640 3.1 last paragraph" client SHOULD handle these cases:
   - There may be cases when the code set / encoding presented to the
     server or client cannot be determined. In such cases the raw bytes
     SHOULD be used.
Comment on attachment 334663 [details] [diff] [review]
patch v6

makoto, where we on this?  Can you resolve michal's issues with your patch?  Specifically, what are we going to do about invalid utf-8 strings being return by the server?
Attachment #334663 - Flags: superreview?(doug.turner)
Attachment #334663 - Flags: superreview-
Attachment #334663 - Flags: review?(doug.turner)
Attachment #334663 - Flags: review-
Attached patch patch v7 (obsolete) — Splinter Review
work in progress...
Attachment #334663 - Attachment is obsolete: true
Attachment #391260 - Attachment is obsolete: true
Attachment #568615 - Flags: review?(honzab.moz)
Attachment #568616 - Flags: review?(jduell.mcbugs)
Attachment #568617 - Flags: review?(jduell.mcbugs)
Comment on attachment 568615 [details] [diff] [review]
Part 1. client detects UTF-8 if FTP server returns UTF8 on FEAT

Redirecting to Michal, he knows this area much better then me.
Attachment #568615 - Flags: review?(honzab.moz) → review?(michal.novotny)
Attachment #568617 - Flags: review?(jduell.mcbugs) → review?(michal.novotny)
Attachment #568616 - Flags: review?(jduell.mcbugs) → review?(michal.novotny)
Makoto, I've tested your patch and before going any further with the review I'd like to get back to comment #54. With this patch applied I can't browse directories in ftp://ftp.asu.ru/incoming/

The situation is now different in that ftp.asu.ru already supports FEAT command and returns UTF8. Those problematic files are in cp1251 encoding which is IMO allowed (see comment #59) and probably quite common.
Attachment #568617 - Attachment is obsolete: true
Attachment #570960 - Flags: review?(michal.novotny)
Attachment #568617 - Flags: review?(michal.novotny)
Also, I will add part 4 patch to work CWD or MDTM by non-UTF8 path on UTF8 server.
work on non UTF-8 path on UTF-8 FTP server.
Attachment #568616 - Attachment is obsolete: true
Attachment #568616 - Flags: review?(michal.novotny)
Attachment #571265 - Flags: review?(michal.novotny)
(In reply to Michal Novotny (:michal) from comment #68)
> Makoto, I've tested your patch and before going any further with the review
> I'd like to get back to comment #54. With this patch applied I can't browse
> directories in ftp://ftp.asu.ru/incoming/
> 
> The situation is now different in that ftp.asu.ru already supports FEAT
> command and returns UTF8. Those problematic files are in cp1251 encoding
> which is IMO allowed (see comment #59) and probably quite common.

New patch series will work on this ftp server.
Attachment #568615 - Flags: review?(michal.novotny) → review+
Comment on attachment 571265 [details] [diff] [review]
Part 2. "301: <encoding>" works on nsIndexedToHTML correctly v2

> +nsresult
> +nsIndexedToHTML::OnHeaderAvailable(nsIRequest* request, nsISupports *aContext,
> +                                   nsString& aBuffer)

I think the name of the method is confusing. A name like WriteHeader would be IMO more appropriate. You could also move the check of mWroteHeader inside the method.


> +    bool mWroteHeader;

This member isn't initialized in the constructor.


> +    bool failbackCharset = encoding.Equals("ISO-8859-1");

s/failback/fallback/ ?


>      if (mExpectAbsLoc &&
> -        NS_SUCCEEDED(net_ExtractURLScheme(utf8UnEscapeSpec, nsnull, nsnull, nsnull))) {
> +        NS_SUCCEEDED(net_ExtractURLScheme(convertedUnEscapeSpec, nsnull, nsnull, nsnull))) {
>          // escape as absolute.
> -        escFlags = esc_Forced | esc_OnlyASCII | esc_AlwaysCopy | esc_Minimal;
> +        escFlags |= esc_Forced | esc_AlwaysCopy | esc_Minimal;
>      }

As far as I can see, this can be removed completely. mExpectAbsLoc was set to true only in case of gopher and support for this protocol was removed.
Attachment #571265 - Flags: review?(michal.novotny)
Comment on attachment 570960 [details] [diff] [review]
Part 3. FTP directory list sets the encoding when current FTP server supports UTF-8 v2

> +   channel->GetContentCharset(mCharset);
> +   if (mCharset.EqualsLiteral("UTF-8")) {
> +       // For RFC 2640 support, charset is into HTML, not channel.
> +       // So we need clear charset on channel.
> +       channel->SetContentCharset(NS_LITERAL_CSTRING(""));
> +   }

Could you please make the comment more verbose? Why it is needed to clear the charset and why only in case of UTF8?


> +   if (mCharset.EqualsLiteral("UTF-8") && !IsUTF8(mPendingBuffer)) {
> +       // RFC 2640 section 3.3 says the following.
> +       //
> +       // If a client detects that a server is non UTF-8,
> +       // it SHOULD change its display appropriately.
> +       //
> +       // So we need failback encoding.
> +       mCharset.AssignLiteral("ISO-8859-1");
> +   }

There is still a problem when the directory is empty and its name isn't a valid UTF8 string. In this case call to mTextToSubURI->UnEscapeAndConvert() in nsIndexedToHTML::OnHeaderAvailable() fails.
Attachment #570960 - Flags: review?(michal.novotny)
Blocks: 653342
No longer blocks: 653342
Blocks: 943284
- Rebased to tip.
- Although RFC 2640 does not define "OPTS UTF8 ON", Some ftp servers violates the RFC. ftp.asu.ru is one of such servers. I restored the OPTS support from the older patch. Sending "OPTS UTF8 ON" to compliant servers should be harmless.
Assignee: m_kato → VYV03354
Attachment #568615 - Attachment is obsolete: true
Attachment #8398921 - Flags: review?(michal.novotny)
- Bug 948901 made the ftp directry view byte-transparent, so this former part 2 & 3 could be greatly simplified.
Attachment #570960 - Attachment is obsolete: true
Attachment #571265 - Attachment is obsolete: true
Attachment #8398922 - Flags: review?(michal.novotny)
Escape (percent-encode) URLs in case some totally broken servers return non-UTF-8 URLs.
Attachment #8398922 - Attachment is obsolete: true
Attachment #8398922 - Flags: review?(michal.novotny)
Attachment #8398925 - Flags: review?(michal.novotny)
Fixed a build failure on gcc.

https://tbpl.mozilla.org/?tree=Try&rev=7a83b378b920
Attachment #8398921 - Attachment is obsolete: true
Attachment #8398921 - Flags: review?(michal.novotny)
Attachment #8398993 - Flags: review?(michal.novotny)
Comment on attachment 8398925 [details] [diff] [review]
Part 2: Read charset from the channel

This part is no longer needed with the follow-up patch of bug 989576.
Attachment #8398925 - Attachment is obsolete: true
Attachment #8398925 - Flags: review?(michal.novotny)
Depends on: 989576
michal, ping?
Comment on attachment 8398993 [details] [diff] [review]
Part1. FTP client detects UTF-8 if server returns UTF8 on FEAT

No response from Michal.
Honza, could you take a look at?
Attachment #8398993 - Flags: review?(honzab.moz)
Comment on attachment 8398993 [details] [diff] [review]
Part1. FTP client detects UTF-8 if server returns UTF8 on FEAT

Review of attachment 8398993 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay, r+ with fixed indentation

::: netwerk/protocol/ftp/nsFtpConnectionThread.cpp
@@ +657,5 @@
> +          case FTP_S_FEAT:
> +            rv = S_feat();
> +
> +            if (NS_FAILED(rv))
> +              mInternalError = rv;

wrong indentation

@@ +675,5 @@
> +          case FTP_S_OPTS:
> +            rv = S_opts();
> +
> +            if (NS_FAILED(rv))
> +              mInternalError = rv;

wrong indentation
Attachment #8398993 - Flags: review?(michal.novotny) → review+
Thank you. Landed with the indent fixed.
https://hg.mozilla.org/integration/mozilla-inbound/rev/70fa92170fad
Attachment #8398993 - Flags: review?(honzab.moz)
(In reply to Masatoshi Kimura from comment #82)
> Comment on attachment 8398925 [details] [diff] [review]
> Part 2: Read charset from the channel
> 
> This part is no longer needed with the follow-up patch of bug 989576.

Although, would it be nice to emit a <meta charset="UTF-8"> tag if we know that the charset is UTF-8 so that we don't have to detect it later on (and report that annoying warning to the console)?
The charset information will be taken from the channel which is more preferable than the meta element.
Neither auto-detect nor console warnings are involved.
I tested with the tinderbox build and verified that no console warnings are displayed.
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1398183266/
https://hg.mozilla.org/mozilla-central/rev/70fa92170fad
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: Future → mozilla31
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.