Closed Bug 95590 Opened 23 years ago Closed 22 years ago

FTP: SYST limitations (problems w/ ftp.microsoft.com)

Categories

(Core Graveyard :: Networking: FTP, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.2alpha

People

(Reporter: tenthumbs, Assigned: dougt)

References

Details

Attachments

(2 files, 1 obsolete file)

Mozilla should rely very little, if at all, on the SYST command. SYST 
might be useful but mozilla appears, right now, to use it only to decide 
how to parse listings and SYST is no good for that.

Consider ftp.microsoft.com. It identifies itself as a Windows NT server
so mozilla would assume a dos-style listing. However, the server
actually returns a unix-style listing. Now you can change this server's
list style by issuing a SITE DIRSTYLE command but you have to know that
exists and you have to deal with that command's possible failure. SYST 
isn't very useful here.

One approach is consider a whole bunch of special cases after a SYST. 
This is what lynx does but it's quite complicated. Another approach is 
to ignore SYST entirely. The squid proxy does this.

I rather like the squid way. Analyzing the list results directly seems 
far more robust.

Just an idea. This obviously won't happen any time soon.
Severity: enhancement → normal
Target Milestone: --- → mozilla1.0
reassigning to bbaetz@cs.mcgill.ca.
Assignee: dougt → bbaetz
help/code wanted.
Keywords: helpwanted
Target Milestone: mozilla1.0 → Future
it's a public source proxy, so there is some chance 
we could copy code or reverse-engineer the 
behavior...

http://www.squid-cache.org/
Summary: Reduce reliance on SYST command → FTP: SYST limitations (problems w/ ftp.microsoft.com)
Here are some random thoughts on the problem. They're not complete by any means.
Attached file ideas??
I agree that SYST should not be used for determining the directory listing 
style. If the _default_ style is anything but unix, that server can be 
considered broken. Jon Postel certainly didn't have directory-parsers or 
GUIs in mind when he wrote the RFC in 1980, and FWIW, IE can't parse 
anything but a unix style listing either.

rfc959 says:
> SYSTEM (SYST)
>
> This command is used to find out the type of operating
>           system at the server.  The reply shall have as its first
>            word one of the system names listed in the current version
>            of the Assigned Numbers document [4].

Thats *IT*. SYST was certainly not conceived to be a guide to parsing the 
listing format. 

Incidentally, the 'Official System Names' in the "Assigned Numbers document" 
(RFC 943) lists ASP, AUGUST, BKY, CCP, DOS/360, ELF, EPOS, EXEC-8, GCOS, GPOS,
ITS, INTERCOM, INTERLISP, KRONOS, MCP, MOS, MPX-RT, MULTICS, MVT, NOS, NOS/BE,
OS/MVS, OS/MVT, RIG, RSX-11M, RT11, SCOPE, SIGNAL, SINTRAN, TAC, TENEX, TOPS-10,
TOPS-20, TSS, UNIX, VM/370, VM/CMS, VMS, WAITS, XDE.
All the OSs other than UNIX and VMS are not running on anything more than a
handful of hobby machines (if at all). Even VMS is very rare anymore, and in 
any case, the FTP server lists in unix style.



tenthumbs: With respect to your attachment:

> MSDOS style listings are simpler. Again, ftp.microsoft.com
>
> 02-13-01  08:02PM       <DIR> bussys
>
> Again, splitting into tokens, the key is the first token. The pattern is
> obvious. If the first token is valid then look at the next token. If it's 
> a valid time format then you've got a MSDOS listing.

Determining if the first token is a valid _date_format_ is virtually impossible
since the format is dependant on the server's locale setting. For Japanese NT,
the date will appear as "01-02-13", for a UK version it will appear as
"13/02/01", for German as "13.02.01" and so on. As for _time_format_: Only the
US and the UK (and ex-colonies) use AM/PM styles. The rest of the world uses 24
hour  time.

Whether the first token is a _date_ (as opposed to having a valid _date_format_) 
can be determined by testing if the length of the token is 8 or 10, and all
characters but the third and fifth are digits. One does not test if the third 
and fifth characters at all - they may still be digits if the date format is ISO.
As of Mozilla 1.0, how dependent are we on SYST? 
Very, although its more the dir parsing than SYST
possible solution for FTP_NT_TYPE servers returning LISTs in the "wrong" style:

in nsFtpState::R_syst():

 :
 :
 if ( ( mResponseMsg.Find("WIN32", PR_TRUE) > -1) ||
      ( mResponseMsg.Find("windows", PR_TRUE) > -1) )
 {
   mServerType = FTP_NT_TYPE;
   return FTP_S_DIRSTYLE; /* make sure we don't get DOS-style listings */
   //see http://bugzilla.mozilla.org/show_bug.cgi?id=95590
 }   
 else if ( mResponseMsg.Find("NetWare system type", PR_TRUE) > -1))
 {
   //see http://bugzilla.mozilla.org/show_bug.cgi?id=147961
   mServerType = FTP_UNIX_TYPE; /* not really, but faithfully emulated */
 }   
 :
 :

in nsFtpState::Process():

 :
 case FTP_S_DIRSTYLE:
 case FTP_R_DIRSTYLE_1:
 case FTP_R_DIRSTYLE_2:
     mState = SR_dirstyle(); /* dirstyle mini-state-machine */
     if (FTP_ERROR == mState)
       mInternalError = NS_ERROR_FTP_LOGIN;
     break;
 :

 FTP_STATE nsFtpState::SR_dirstyle() /* note: returns FTP_STATE not nsresult */
 {                                   /* even for S_dirstyle because S_dirstyle*/
                                     /* is an extension of R_syst */
   if (mServerType == FTP_NT_TYPE)
   {
     if (mState == FTP_S_DIRSTYLE)
     {
       nsCString cmd("SITE DIRSTYLE" CRLF);
       if (NS_FAILED(SendFTPCommand(cmd))
         return FTP_ERROR;
       return FTP_R_DIRSTYLE_1;
     } 
     else if (mResponseCode/100 != 2)
     {
       //DOS style listings not supports. Ergo, must be /bin/ls style.
       mServerType = FTP_UNIX_TYPE; /* no need for FTP_NT_TYPE anymore */
     }
     else if ((mResponseMsg.Find("MSDOS", PR_TRUE) > -1) &&
              (mResponseMsg.Find(" off", PR_TRUE) > -1))
     {
       //DOS style listings are OFF. Ergo, now in /bin/ls style.
       mServerType = FTP_UNIX_TYPE; /* no need for FTP_NT_TYPE anymore */
     }
     else if ((mResponseMsg.Find("MSDOS", PR_TRUE) > -1) &&
              (mResponseMsg.Find(" on", PR_TRUE) > -1))
     {
       //DOS style listings are ON. Got toggled the wrong way
       if (mState == FTP_R_DIRSTYLE_1)
       {
         nsCString cmd("SITE DIRSTYLE" CRLF);
         if (NS_FAILED(SendFTPCommand(cmd))
           return FTP_ERROR;
         return FTP_R_DIRSTYLE_2;
       }
       else
       {
         //failed to toggle. Moz can't parse DOS-style listings so bail.
         // (Of course, the parsing code should be more robust...)
         NS_ASSERTION(0, "FTP_NT_TYPE has stuck DOS-style.");

         //this next bit is from R_syst().
         nsresult rv;
         nsCOMPtr<nsIStringBundleService> bundleService =
              do_GetService(NS_STRINGBUNDLE_CONTRACTID, &rv);
         if (NS_FAILED(rv)) return FTP_ERROR;

         nsCOMPtr<nsIStringBundle> bundle;
         rv = bundleService->CreateBundle(NECKO_MSGS_URL,
                                         getter_AddRefs(bundle));
         if (NS_FAILED(rv)) return FTP_ERROR;
            
         nsXPIDLString formatedString;
         PRUnichar* ucs2Response = ToNewUnicode(mResponseMsg);
         const PRUnichar *formatStrings[1] = { ucs2Response };
         rv = bundle->FormatStringFromName(  
                     NS_LITERAL_STRING("UnsupportedFTPServer").get(),
                                           formatStrings, 1,
                                     getter_Copies(formatedString));
         nsMemory::Free(ucs2Response);
         if (NS_FAILED(rv)) return FTP_ERROR;

         NS_ASSERTION(mPrompter, "no prompter!");
         if (mPrompter)
           (void) mPrompter->Alert(nsnull, formatedString.get());
          
         // since we just alerted the user, clear mResponseMsg,
         // which is displayed to the user.
         mResponseMsg = "";
         return FTP_ERROR;
       }
     }
   }
   return FTP_S_PWD; /* default retval from R_syst() */
 }
     
I'm still of the opinion that Mozilla should not use SYST (for LIST format
determination) at all. Netscape didn't. IE doesn't. Squid doesn't, see
ftpListParseParts() in squid-2.4.STABLE-1/src/ftp.c . D. J. Bernstein (author of
EPLF) also has something to go on at http://cr.yp.to/ftpparse/ftpparse.c 
I've written a routine that parses every known FTP LIST style 
without having to know the SYST. 

The "known" LIST styles is based on what is known to 
squid, lynx, wget, and ftpmirror, and includes:

- VMS (MultiNet, UCX, and CMU) LIST format (including multi-line format)
- IBM VM/CMS, VM/ESA LIST format (two known variants)
- Windows NT's default "DOS-dirstyle"
- OS/2 basic server format LIST format
- SuperTCP FTP Server
- NetManage Chameleon (NEWT)
- EPLF (Easily Parsable List Format)
- '/bin/dls' (two known variants, plus multi-line) LIST format
- '/bin/ls -l' and all variants (even if they are not SYST UNIX)
  including 
   - Hellsoft FTP for NetWare (non-unix perm-bits)
   - Hethmon Brothers FTP for OS/2 (all '-' perm bits)
   - NetPresenz (SYST is "MACOS")
   - "NETWARE" (Hellsoft-style perms, no linkcount, no UID/GID)
   - OpenBSD FTPD (numeric UID/GID)
   - Open Group's FTP servers (no GID)
   - Novonyx [Netscape/Novell] (fields not in columns)
   - wuFTPd and other BSD-based ftpd that exec "ls -l"
   - Windows NT server (internal "ls -l" compatible)
   - Netmanage ProFTPD for Win32 (internal "ls -l" compatible)
   - SurgeFTPd for Win32 (internal "ls -l" compatible)
   - WarFTPd for Win32 (internal "ls -l" compatible)
   - WebStarFTP for MacOS (internal "ls -l" compatible)
   - MurkWorks FTP for NetWare (internal "ls -l" compatible)
   - NcFTPd for Unix (internal "ls -l" compatible).
If there are others, then I'd like to hear about them (send me a sample).

The parser is a complete rewrite (doesn't use anyone else's code),
is thread-safe, does not allocate/free memory, uses only const 
pointers, uses only trivial string.h/ctype.h lib functions, does 
not assume that time_t is 32bit or that a filesize is <= 4GB (or 
even that it is <= 2^64), can deal with servers with Y2K bugs, 
and never mis-parses because it makes NO assumptions about input 
and is (excrutiatingly :) pedantic in its pattern matching. 

It can be downloaded at:
http://fb14.uni-mainz.de/~cyp/moz/ftpbugs/parseLIST.zip

ParseLIST() has been tested against several dozen test-cases, 
twenty-four of which are included in the .zip. 

The .zip file also includes a Win32 'test.exe' for those who 
don't wish to compile but would like beat the parser with their 
own listings.
Bradley - can you take a look at this and see how hard it would be to intergrate
into mozilla.  If not, please reassign it to me.
You might also look at <http://cr.yp.to/ftpparse.html>.
I have actually used it in fetchzilla (now unmaintained).  
Cyrus - thanks for this! I'm going on holiday next week, so I ron't have time to
look at this until I get back. A quick glace doesn't show any problems, but it
will have to be mozilla-ized to use the PR* types for time and so on. dougt -
can you handle that?

I looked at DJB's stuff, but the licnese isn't compatable with mozilla.

Cyrus - I presume that you're happy for this to be used under mozilla's trilicense?
Keywords: helpwanted
Target Milestone: Future → mozilla1.2alpha
I have no time to work on mozilla at the moment, so dougt is taking over FTP

open ftp bugs -> him
Assignee: bbaetz → dougt
I have a patch which intergrates Patel's parser.  I still need to clean up the
parser a bit.  patch coming up soon.
I failed to say that I have *not* completely review the parsing code and need to
sit down and go through it.
> Cyrus - I presume that you're happy for this to be used 
> under mozilla's trilicense?

Sure. I wrote it with Mozilla in mind.
Blocks: 61235
Blocks: 146166
Blocks: 147961
Intergrates Cyrus Patel parser.  
Cleans up the stream converters a bunch.

I have run the parser against all provided test cases under purify without any
problems.
Comment on attachment 96913 [details] [diff] [review]
Intergrates Cyrus Patel parser v.1

>Index: protocol/ftp/src/nsFtpConnectionThread.cpp

>+#ifdef DEBUG_dougt            
>+            mServerType = FTP_UNIX_TYPE;
>+#else
>             return FTP_ERROR;
>+#endif

what's this?

>+            nsAutoString fromStr(NS_LITERAL_STRING("text/ftp-dir"));
>             contentType = NS_LITERAL_CSTRING("text/ftp-dir-");
>         }

looks like fromStr is unused.  should be eliminated.

>+    nsAutoString fromStr(NS_LITERAL_STRING("text/ftp-dir"));

can this be replaced with

      NS_NAMED_LITERAL_STRING(fromStr, "text/ftp-dir");

>Index: streamconv/converters/nsFTPDirListingConv.cpp

if this is the only file accessing the new FTP dir listing parser,
then shouldn't the new code be added to this directory?

i also think that ParseLIST should be given a less generic name
since it is an exported symbol.  how about ParseFTPList or FTP_ParseLIST??

>         aString.Append("201: ");
>+        aString.Append(nsDependentCString(result.fe_fname, result.fe_fnlen));
>         aString.Append(' ');

the above 3 appends should be combined into one call to Append, e.g.:

  aString.Append(NS_LITERAL_CSTRING("201: ")
	       + nsDependentCString(result.fe_fname, result.fe_fnlen)
	       + NS_LITERAL_CSTRING(" "));

>+            for (int i = 0; i < sizeof(result.fe_size); i++)
>+            {
>+                if (result.fe_size[i] != '\0')
>+                    aString.Append((const char*)&result.fe_size[i], 1);

it looks to me as if fe_size is supposed to always be null terminated.
it is initialized to all zeros, and whenever set it is immediately null
terminated. so, you should be able to replace this loop with:

   aString.Append(result.fe_size);

you should also combine this Append with the next one:

>+            aString.Append(' ');
>         }

>         aString.Append(escapedDate);
>         nsMemory::Free(escapedDate);
>         aString.Append(' ');

again, i'd combine the Append(' ') with the preceding Append.

>         aString.Append(' ');
>         aString.Append(char(nsCRT::LF)); // complete this line

and again here.
Attachment #96913 - Flags: needs-work+
I have a bug to clean up the string usages in ftp (131361 and others).

>it looks to me as if fe_size is supposed to always be null terminated. it is
initialized to all zeros, and whenever set it is immediately null terminated.
so, you should be able to replace this loop with:

that didn't work for me.  I think that it didn't like the type of fe_size... I
will try again...

Renamed the parser api to ParseFTPLIST() to match the filename.  I am rebuilding
the world right now.  New patch comining up shortly.



Attachment #96913 - Attachment is obsolete: true
Comment on attachment 97090 [details] [diff] [review]
includes darin's suggesting v.2

sr=darin
Attachment #97090 - Flags: superreview+
Comment on attachment 97090 [details] [diff] [review]
includes darin's suggesting v.2

r=dougt
Attachment #97090 - Flags: review+
I marked the above patch review by me since the bulk of the code was written by
patel.
Comment on attachment 97090 [details] [diff] [review]
includes darin's suggesting v.2

I took a _very_ quick glance through this, and didn't see anything obviously
wrong with it. I didn't get to apply it/test it, though.
Fixed on trunk.  Thank you for the great contribution Cyrus.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
This caused a regression (bug 165585). Non ASCII file names are not shown in the
list, and cannot download files.
have the patch tested with different locales? different locales setting in the
ftp or different locale setting in the client. 
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: