84242 - FTP URL parsing broken

Reporter

Description

•

24 years ago

Here are two URLs that Mozilla 0.9 fails to parse properly:

  ftp://server/%2fF%2finet%2fapache%2fwebpages%2frepository/beach_bit_foggy.JPG
  ftp://thanny:[password]@server/%2fh%2fdownload/hubble5_hst.jpg

Here's what Mozilla should have ended up with, according to RFC1738 and RFC2396:

  RETR /F/inet/apache/webpages/repository/beach_bit_foggy.JPG
  RETR /h/download/hubble5_hst.jpg

Here's what Mozilla actually did:

  RETR //F/inet/apache/webpages/repository/beach_bit_foggy.JPG
  RETR //h/download/hubble5_hst.jpg

The key here is that the slash separating the host information from the path information is *not* supposed to be part of the path in FTP URLs. Mozilla ignores this rule, as do all previous versions of Netscape that I have at my disposal.

I suspect this bug has gone unnoticed because most Unix servers don't blink at an extra directory slash at the front. It does, however, break other FTP servers, which quite properly report an error at the invalid path information provided. Nixing the initial encoded slash makes Mozilla work, but at the expense of providing an incorrect URL that an RFC-compliant client will not be able to parse properly.

Imagine, if you will, a client which follows the FTP scheme defined in RFC1738 (or is there a superseding document? - I can't find one). Such a client would do a series of CWDs, one for each slash-delimited path component. That is, the path is split *before* %-encoded slashes are decoded. Here's what a strictly RFC-compliant FTP client would do given the following URLs:

  ftp://server/%2fF%2finet%2fapache%2fwebpages%2frepository/beach_bit_foggy.JPG
  ...
  CWD /F/inet/apache/webpages/repository
  ...
  RETR beach_bit_foggy.JPG

  ftp://server/%2fF%2finet/apache/webpages/repository/beach_bit_foggy.JPG
  ...
  CWD /F/inet
  CWD apache
  CWD webpages
  CWD repository
  ...
  RETR beach_bit_foggy.JPG

If one "fixes" the URL to be compatible with Mozilla, by removing the (encoded) leading directory slash, the RFC-compliant client would (for the second URL immediately above) do the following:

  ...
  CWD F/inet
  CWD apache
  CWD webpages
  CWD repository
  ...
  RETR beach_bit_foggy.JPG

Given a login directory other than the immediate parent of the tree "F/inet", the result will be failure.

This isn't an encoding-triggered problem, either. Consider this:

  ftp://server/repository/beach_bit_foggy.JPG

Assume that the login directory is the immediate parent of "repository". The RFC-compliant client will do this:

  ...
  CWD repository
  ...
  RETR beach_bit_foggy.JPG

Mozilla does this:

  ...
  RETR /repository/beach_bit_foggy.JPG
  ...

This happens to work with most FTP servers (ones on Unix, and those emulating Unix FTP servers). But any server would be behaving perfectly properly by refusing the above request on the grounds that there is no such directory as "/repository".

Strictly speaking, since path delimiters are not part of the FTP spec, Mozilla should not be attempting retrievals in this fashion at all. It should instead be performing a CWD on each slash-delimited path component (which may or may not contain path delimiters, encoded if necessary).

An example of where Mozilla fails is with a non-Unix-emulating OS/2 FTP server. This is a valid URL, which allows an RFC-compliant FTP client to retrieve the file:

  ftp://thanny:[password]@localhost:2222/h%3a%5cdownload/hubble5_hst.jpg

Mozilla ends up sending this:

  RETR /h:\download/hubble5_hst.jpg

The OS/2 FTP server quite properly chokes on this completely invalid pathname. The RFC-compliant client, however, got to the file in this manner:

  ...
  CWD h:\download
  ...
  RETR hubble5_hst.jpg

This is what a client is supposed to do, according to the only remotely standardized information available on FTP URL schemes. It would be silly for Mozilla to remain broken.

Obviously, if correct parsing fails to obtain a favorable result, the program should still attempt a transfer based on the incorrect URL parsing, since countless HTML authors put bodged FTP URLs into their documents (there's definitely room for the problem of getting the incorrect file here, but it's unlikely).
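
The per-component behaviour described in this report can be sketched in a few lines of Python. This is only an illustration of the RFC1738 reading argued for above, not Mozilla's code: the function name `rfc1738_ftp_commands` is hypothetical, `;type=` suffixes and empty path components are ignored, and the standard-library `urllib.parse` is used purely to isolate the path.

```python
from urllib.parse import urlsplit, unquote


def rfc1738_ftp_commands(url):
    """Return the CWD/RETR sequence for an FTP URL, per the RFC1738 reading.

    Hypothetical helper for illustration only.
    """
    raw_path = urlsplit(url).path  # still %-encoded at this point
    # The first "/" only separates host from path; it is not part of the path.
    if raw_path.startswith("/"):
        raw_path = raw_path[1:]
    # Split on literal slashes BEFORE decoding, so %2f stays inside a component.
    *dirs, name = raw_path.split("/")
    commands = [f"CWD {unquote(d)}" for d in dirs]
    if name:
        commands.append(f"RETR {unquote(name)}")
    return commands


for line in rfc1738_ftp_commands(
    "ftp://server/%2fF%2finet/apache/webpages/repository/beach_bit_foggy.JPG"
):
    print(line)
```

Run against the second URL from the report, this prints `CWD /F/inet`, then one CWD each for `apache`, `webpages` and `repository`, then `RETR beach_bit_foggy.JPG`, which matches the sequence listed for the strictly RFC-compliant client.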

Mike Kaply [:mkaply]

Comment 1

•

24 years ago

Can you please give real URLs for these rather than made up URLs?

Mike Ruskai

Reporter

Comment 2

•

24 years ago

Those *are* real URLs. They point to files on my internal LAN. I used an IP trace to figure out exactly what Mozilla was doing (which revealed the odd fact that it's chopping network packets up into absurdly small pieces, incidentally).

If you're explicitly looking for a URL that points to a server on the Internet which won't be able to handle Mozilla's improper request, then I can't help you. I don't have a list of FTP servers which would let me find troublesome ones. But if you've got an OS/2 machine, just run the built-in FTP server (properly configured, of course), and create a URL for any file on that machine. The complete inability of Mozilla to retrieve a file from the built-in OS/2 FTP server that does not reside on the login drive will become apparent.

If you're simply looking for a URL which will show Mozilla improperly parsing, virtually any will do - just put "%2F" after the host-path delimiter for a Unix server. That won't prevent a successful transfer, but it will show how Mozilla sends an improper request.

But I've already explained in copious detail what Mozilla is doing, and in equally copious detail what it should be doing.

benc

Comment 3

•

24 years ago

-> New. I'm buying this b/c we need to get on the fasttrack if it's going to be fixed for rtm.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Doug Turner (:dougt)

Updated

•

24 years ago

Target Milestone: --- → mozilla0.9.3

Doug Turner (:dougt)

Updated

•

24 years ago

Target Milestone: mozilla0.9.3 → mozilla1.0

Doug Turner (:dougt)

Comment 4

•

24 years ago

what is milestone "mozilla1.0" anyway? Moving to future.

Target Milestone: mozilla1.0 → Future

Mike Kaply [:mkaply]

Comment 5

•

24 years ago

I'm taking this out of the OS/2 queue since nothing about it is OS/2 related

OS: OS/2 → All

Hardware: PC → All

Bradley Baetz (:bbaetz)

Comment 6

•

24 years ago

This is invalid, I think RFC1738 is updated by RFC2396, which states in its changes section (G.2):

" RFC 1738 specified that the path was separated from the authority portion of a URI by a slash. RFC 1808 followed suit, but with a fudge of carrying around the separator as a "prefix" in order to describe the parsing algorithm. RFC 1630 never had this problem, since it considered the slash to be part of the path. In writing this specification, it was found to be impossible to accurately describe and retain the difference between the two URI <foo:/bar> and <foo:bar> without either considering the slash to be part of the path (as corresponds to actual practice) or creating a separate component just to hold that slash. We chose the former. "

Thus the / is part of the path.

(Also, the ; syntax has been obsoleted)

Note the difference between ftp://server/, which goes to the root dir of the server, and ftp://server, which goes to the login directory. (There is an unconfirmed bug that this is currently broken in mozilla, though)

Regarding sending a CWD separately, the problem with that, as RFC1738 section 3.2.5 says, is that the control connection would have to be recreated each time the user changes directories. That would suck. Does any client do that?

dougt?
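
For comparison, the "slash is part of the path" reading can be sketched with Python's standard `urllib.parse`, whose `urlsplit` keeps the leading slash in the path component. The helper names `split_path` and `whole_path_retr` are hypothetical, and the second one only mimics the requests seen in the reporter's trace; it is not taken from Mozilla's source.

```python
from urllib.parse import urlsplit, unquote


def split_path(url):
    """Return the raw (still %-encoded) path component of a URL."""
    return urlsplit(url).path


def whole_path_retr(url):
    """Decode the entire path, leading slash included, and RETR it in one go."""
    return "RETR " + unquote(split_path(url))


print(repr(split_path("ftp://server")))   # empty path: no slash at all
print(repr(split_path("ftp://server/")))  # path is just the slash
print(whole_path_retr("ftp://server/%2fh%2fdownload/hubble5_hst.jpg"))
```

Under this reading the two short URLs have different paths (empty versus "/"), and the last call yields `RETR //h/download/hubble5_hst.jpg`, the doubled slash reported in the description.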

Mike Ruskai

Reporter

Comment 7

•

24 years ago

>This is invalid, I think RFC1738 is updated by RFC2396, which states in its
>changes section (G.2):
>
>" RFC 1738 specified that the path was separated from the authority
> portion of a URI by a slash. RFC 1808 followed suit, but with a
> fudge of carrying around the separator as a "prefix" in order to
> describe the parsing algorithm. RFC 1630 never had this problem,
> since it considered the slash to be part of the path. In writing
> this specification, it was found to be impossible to accurately
> describe and retain the difference between the two URI
> <foo:/bar> and <foo:bar>
> without either considering the slash to be part of the path (as
> corresponds to actual practice) or creating a separate component just
> to hold that slash. We chose the former.
>"
>
>Thus the / is part of the path.
>
>(Also, the ; syntax has been obsoleted)

I just read through RFC2396 briefly, and it's quite a mess, at least with regard to the path component. There's no way anyone could possibly write a parser based on that document. It doesn't once say how the host component is to be separated from the path component, so one has to return to RFC1738. Outside of the above, it does not address how the host-path delimiter is to be treated.

Finally, the above is, quite frankly, gibberish. The tokens <foo:/bar> and <foo:bar> are meaningless. Unless...

>Note the difference between ftp://server/, which goes to the root dir of the
>server, and ftp://server, which goes to the login directory. (There is an
>unconfirmed bug that this is currently broken in mozilla, though)

If those tokens are supposed to represent your example here, then their comments become the part that's gibberish. It is entirely possible to determine the difference between the two URLs above, if one simply follows the description laid out by RFC1738.

The simple fact is that, by any sensible standard, "ftp://server" and "ftp://server/" do in fact point to the exact same resource. The alternative (with the latter indicating the root directory) makes URLs completely functionless for retrieving files relative to the login directory.

For any server which regularly changes the underlying file system structure of the login directories, a sensibly parsed (i.e. according to RFC1738) URL would make the changes transparent (provided, of course, they were relative to the login directory). The approach that includes the delimiter in the path would make URLs invalid whenever a change was made. It would make it impossible to retrieve resources located on machines which do not use the forward slash character as a path delimiter.

In other words, what RFC2396 suggests is simply daft. The fact is that doing it the RFC1738 way will work on just about every server already in operation. At the very least, after attempting the full-path-including-delimiter approach, and failing, Mozilla should parse the URL sensibly. To do otherwise is to go the road of "it works on most systems", when a relatively simple change would make it work on all.

>Regarding sending a CWD separately, the problem with that, as RFC1738
>section 3.2.5 says, is that the control connection would have to be recreated
>each time the user changes directories. That would suck. Does any client do
>that?

No. That section addresses a change of URL where the server and login information remain the same, not a change of directory. Because all path information (when treated sensibly) is relative to the login directory, a completely new URL will not necessarily work from the current directory.

The somewhat simple (and perfectly reliable) solution to that problem is to perform a PWD upon login, and after each directory change. Store and tie the results to the displayed page, and change back to the login directory before processing any directory changes from a new URL.

Note that this wouldn't apply at all to downloading files in the current working directory. Mozilla knows that these files are in the current directory, and treating their retrieval as the processing of a new URL is nonsensical.
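
A rough sketch of the PWD idea proposed in this comment, under loud assumptions: `RecordingControl` is a hypothetical stand-in that only logs commands instead of talking to a server, `/home/thanny` is an invented login directory, and only the login-time PWD is modelled (not a PWD after every directory change).

```python
from urllib.parse import urlsplit, unquote


class RecordingControl:
    """Hypothetical control connection: records commands, talks to no server."""

    def __init__(self, login_dir):
        self.current_dir = login_dir
        self.log = []

    def send(self, command):
        self.log.append(command)
        if command == "PWD":
            return self.current_dir
        if command.startswith("CWD "):
            target = command[4:]
            # Simplification: absolute targets replace, relative ones append.
            if target.startswith("/"):
                self.current_dir = target
            else:
                self.current_dir = self.current_dir.rstrip("/") + "/" + target
        return None


class Session:
    """Reuses one control connection across several URLs on the same server."""

    def __init__(self, control):
        self.control = control
        self.login_dir = control.send("PWD")  # remember where login put us

    def fetch(self, url):
        # Go back to the login directory so the URL path is relative to it.
        self.control.send(f"CWD {self.login_dir}")
        *dirs, name = urlsplit(url).path[1:].split("/")
        for component in dirs:
            self.control.send(f"CWD {unquote(component)}")
        if name:
            self.control.send(f"RETR {unquote(name)}")


control = RecordingControl("/home/thanny")
session = Session(control)
session.fetch("ftp://server/repository/beach_bit_foggy.JPG")
session.fetch("ftp://server/download/hubble5_hst.jpg")
print("\n".join(control.log))
```

The second fetch works without a new control connection because the session first returns to the stored login directory before walking the new path.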

Doug Turner (:dougt)

Comment 8

•

24 years ago

reassigning to bbaetz@cs.mcgill.ca.

Assignee: dougt → bbaetz

Bradley Baetz (:bbaetz)

Comment 9

•

24 years ago

Yes, RFC2396 has lots of problems. You cannot use RFC1738 to work out what to do, since RFC2396 explicitly says that it's changing the behaviour.

"Store and tie the results to the displayed page". The ftp viewer doesn't have state associated with it, and really can't.

Does any web browser do what you expect?

cc andreas for comment

Mike Ruskai

Reporter

Comment 10

•

24 years ago

>Yes, RFC2396 has lots of problems.
>You cannot use RFC1738 to work out what to do, since RFC2396 explicitly
>says that it's changing the behaviour.

It says that it's updating RFC1738, but it does not in fact define any changed behavior for FTP URLs. A few indecipherable sentences hardly qualify as a standing change.

>"Store and tie the results to the displayed page". The ftp viewer doesn't
>have state associated with it, and really can't.

It's really not necessary, either. Any clickable links are going to be created by Mozilla, so the possibility of clicking on a new URL for that session which has the same host and user information, but not an absolute path, won't really be an issue. Care just needs to be taken that any non-absolute paths are handled by a new control connection, and that all Mozilla-generated URLs in the FTP viewer are absolute. I'm sure there will be snags here and there, but overall, it's workable.

>Does any web browser do what you expect?

The only ones I've tested are Mozilla (in NS 2.x, 4.x, and the betas) and IBM WebExplorer. I try to avoid IE whenever possible. WebEx doesn't do a CWD for each path component, but it does properly exclude the host/path separator slash. This makes it possible to use relative Unix paths, but since it doesn't do a CWD for each path component, it's still not ideal.

Bradley Baetz (:bbaetz)

Comment 11

•

24 years ago

> It says that it's updating RFC1738, but it does not in fact define any
> changed behavior for FTP URLs.

It defines changed behaviour for all urls, and ftp is included in that set.

> A few indecipherable sentences hardly
> qualify as a standing change.

Unfortunately it does in this case. Yes, the URL RFCs are ambiguous in lots of places - see the bugs + discussions on what <a href="?foo"> should do. (summary - RFC2396 has an error in some bits)

dougt? andreas? comments?

Ignoring the leading / will allow us to support the home directory stuff a little easier. However, it will break almost every ftp link in existence. I'm really against this unless you can point me to an ftp server + link to it where we fail but other web browsers succeed.

patch changes using path in ftp urls - 24 years ago - Andreas Otte - 1.82 KB, patch
current version of patch with problems CWD ing on files - 24 years ago - Andreas Otte - 1.91 KB, patch
patch doing the thing described above - 24 years ago - Andreas Otte - 1.74 KB, patch
another patch this time using pwd - 24 years ago - Andreas Otte - 7.30 KB, patch
path that only removes the first slash - 24 years ago - Andreas Otte - 7.30 KB, patch - bbaetz : review+
patch incorporating Bradleys comments - 24 years ago - Andreas Otte - 7.30 KB, patch - bbaetz : review+ darin.moz : superreview+