Closed Bug 84242 Opened 19 years ago Closed 18 years ago

FTP URL parsing broken

Categories

(Core :: Networking: FTP, defect, major)


Tracking


RESOLVED FIXED
Future

People

(Reporter: thanny, Assigned: andreas.otte)

References

Details

(Keywords: testcase)

Attachments

(1 file, 5 obsolete files)

Here are two URL's that Mozilla 0.9 fails to properly parse:

ftp://server/%2fF%2finet%2fapache%2fwebpages%2frepository/beach_bit_foggy.JPG
ftp://thanny:[password]@server/%2fh%2fdownload/hubble5_hst.jpg

Here's what Mozilla should have ended up with, according to RFC1738 and RFC2396:

RETR /F/inet/apache/webpages/repository/beach_bit_foggy.JPG
RETR /h/download/hubble5_hst.jpg

Here's what Mozilla actually did:

RETR //F/inet/apache/webpages/repository/beach_bit_foggy.JPG
RETR //h/download/hubble5_hst.jpg

The key here is that the slash separating the host information from the path
information is *not* supposed to be part of the path in FTP URLs.  Mozilla
ignores this rule, as do all previous versions of Netscape that I have at my
disposal.

I suspect this bug has gone unnoticed because most Unix servers don't blink at
an extra directory slash at the front.

It does, however, break other FTP servers, which quite properly report an error
at the invalid path information provided.  Nixing the initial encoded slash
makes Mozilla work, but at the expense of providing an incorrect URL that an
RFC-compliant client will not be able to parse properly.

Imagine, if you will, a client which follows the FTP scheme defined in RFC1738
(or is there a superseding document? - I can't find one).  Such a client would
do a series of CWD's for each slash-delimited path component.  That is, *before*
%-encoded slashes are decoded.  Here's what a strictly RFC-compliant FTP client
would do given the following URLs:

-----
ftp://server/%2fF%2finet%2fapache%2fwebpages%2frepository/beach_bit_foggy.JPG


...
CWD /F/inet/apache/webpages/repository
...
RETR beach_bit_foggy.JPG
-----

-----
ftp://server/%2fF%2finet/apache/webpages/repository/beach_bit_foggy.JPG


...
CWD /F/inet
CWD apache
CWD webpages
CWD repository
...
RETR beach_bit_foggy.JPG
-----
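
A strictly RFC 1738 reading - split the path on "/" *before* percent-decoding, and drop the slash separating host from path - can be sketched like this (an illustrative sketch, not Mozilla code; `ftp_commands` is a hypothetical helper):

```python
from urllib.parse import unquote, urlsplit

def ftp_commands(url):
    # RFC 1738: split the url-path on "/" *before* decoding, so an
    # encoded %2f stays inside its segment; the slash separating the
    # host from the path is a delimiter, not part of the path.
    path = urlsplit(url).path
    if path.startswith('/'):
        path = path[1:]          # drop the host/path separator only
    segments = [unquote(s) for s in path.split('/')]
    return ['CWD ' + s for s in segments[:-1]] + ['RETR ' + segments[-1]]
```

For the two example URLs above, this yields exactly the CWD/RETR sequences shown.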

If one "fixes" the URL to be compatible with Mozilla, by removing the (encoded)
leading directory slash, the RFC-compliant client would (for the second URL
immediately above) do the following:

...
CWD F/inet
CWD apache
CWD webpages
CWD repository
...
RETR beach_bit_foggy.JPG


Given a login directory other than the immediate parent of the tree "F/inet",
the result will be failure.

This isn't an encoding-triggered problem, either.  Consider this:

ftp://server/repository/beach_bit_foggy.JPG 

Assume that the login-directory is the immediate parent of "repository".

The RFC-compliant client will do this:

...
CWD repository
...
RETR beach_bit_foggy.JPG


Mozilla does this:

...
RETR /repository/beach_bit_foggy.JPG
...

This happens to work with most FTP servers (ones on Unix, and those emulating
Unix FTP servers).  But any server would be behaving perfectly properly by
refusing the above request on the grounds that there is no such directory as
"/repository".

Strictly speaking, since path delimiters are not part of the FTP spec, Mozilla
should not be attempting retrievals in this fashion at all.  It should instead
be performing CWD's on each slash-delimited path component (which may or may not
contain path delimiters, encoded if necessary).

An example of where Mozilla fails is with a non-Unix-emulating OS/2 FTP server. 

This is a valid URL, which allows an RFC-compliant FTP client to retrieve the file:

ftp://thanny:[password]@localhost:2222/h%3a%5cdownload/hubble5_hst.jpg

Mozilla ends up sending this:

RETR /h:\download/hubble5_hst.jpg

The OS/2 FTP server quite properly chokes on this completely invalid pathname.  

The RFC-compliant client, however, got to the file in this manner:

...
CWD h:\download
...
RETR hubble5_hst.jpg


This is what a client is supposed to do, according to the only remotely
standardized information available on FTP URL schemes.
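
The OS/2 point can be seen directly: decoding happens per path segment, so the encoded ':' and '\' simply become part of one opaque CWD argument (a minimal sketch using Python's standard percent-decoding):

```python
from urllib.parse import unquote

# The first path segment of
# ftp://thanny:[password]@localhost:2222/h%3a%5cdownload/hubble5_hst.jpg
segment = 'h%3a%5cdownload'

# Decoded *after* splitting, the segment is exactly the string the
# server expects as a CWD argument; the client never interprets the "\".
cwd_argument = unquote(segment)
```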

It would be silly for Mozilla to remain broken.  Obviously, if correct parsing
fails to obtain a favorable result, the program should fall back to attempting a
transfer based on the incorrect URL parsing, since countless HTML authors put
bodged FTP URLs into their documents (there's definitely room for the problem of
getting the incorrect file here, but it's unlikely).
Can you please give real URLs for these rather than made-up URLs?
Those *are* real URLs.  They point to files on my internal LAN.  I used an IP 
trace to figure out exactly what Mozilla was doing (which revealed the odd fact 
that it's chopping network packets up into absurdly small pieces, 
incidentally).

If you're explicitly looking for a URL that points to a server on the Internet 
which won't be able to handle Mozilla's improper request, then I can't help 
you.  I don't have a list of FTP servers which would let me find troublesome 
ones.  But if you've got an OS/2 machine, just run the built-in FTP server 
(properly configured, of course), and create a URL for any file on that 
machine.  The complete inability of Mozilla to retrieve a file from the 
built-in OS/2 FTP server that does not reside on the login drive will become 
apparent.

If you're simply looking for a URL which will show Mozilla improperly parsing, 
virtually any will do - just put "%2F" after the host-path delimiter for a Unix 
server.  That won't prevent a successful transfer, but it will show how Mozilla 
sends an improper request.

But I've already explained in copious detail what Mozilla is doing, and in 
equally copious detail what it should be doing.

-> New.
I'm buying this b/c we need to get on the fasttrack if it's going to be fixed
for rtm.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Target Milestone: --- → mozilla0.9.3
Target Milestone: mozilla0.9.3 → mozilla1.0
what is milestone "mozilla1.0" anyway?  Moving to future.
Target Milestone: mozilla1.0 → Future
I'm taking this out of the OS/2 queue since nothing about it is OS/2 
related
OS: OS/2 → All
Hardware: PC → All
This is invalid, I think. RFC1738 is updated by RFC2396, which states in its 
changes section (G.2):

" RFC 1738 specified that the path was separated from the authority
   portion of a URI by a slash.  RFC 1808 followed suit, but with a
   fudge of carrying around the separator as a "prefix" in order to
   describe the parsing algorithm.  RFC 1630 never had this problem,
   since it considered the slash to be part of the path.  In writing
   this specification, it was found to be impossible to accurately
   describe and retain the difference between the two URI
      <foo:/bar>   and   <foo:bar>
   without either considering the slash to be part of the path (as
   corresponds to actual practice) or creating a separate component just
   to hold that slash.  We chose the former.
"

Thus the / is part of the path.

(Also, the ; syntax has been obsoleted)

Note the difference between ftp://server/, which goes to the root dir of the 
server, and ftp://server, which goes to the login directory. (There is an 
unconfirmed bug that this is currently broken in mozilla, though)

Regarding sending a CWD separately, the problem with that is that, as RFC1738 
section 3.2.5 says, the control connection would have to be recreated 
each time the user changes directories. That would suck. Does any client do 
that?

dougt?
>This is invalid, I think RFC1738 is updated by RFC2396, which states in its 
>changes section (G.2):
>
>" RFC 1738 specified that the path was separated from the authority
>   portion of a URI by a slash.  RFC 1808 followed suit, but with a
>   fudge of carrying around the separator as a "prefix" in order to
>   describe the parsing algorithm.  RFC 1630 never had this problem,
>   since it considered the slash to be part of the path.  In writing
>   this specification, it was found to be impossible to accurately
>   describe and retain the difference between the two URI
>      <foo:/bar>   and   <foo:bar>
>   without either considering the slash to be part of the path (as
>   corresponds to actual practice) or creating a separate component just
>   to hold that slash.  We chose the former.
>"
>
>Thus the / is part of the path.
>
>(Also, the ; syntax has been obsoleted)

I just read through RFC2396 briefly, and it's quite a mess, at least with 
regards to the path component.  There's no way anyone could possibly write 
a parser based on that document.  It doesn't once say how the host 
component is to be separated from the path component, so one has to return 
to RFC1738.  Outside of the above, it does not address how the host-path 
delimiter is to be treated.

Finally, the above is, quite frankly, gibberish.  The tokens <foo:/bar> and 
<foo:bar> are meaningless.  Unless...

>Note the difference between ftp://server/, which goes to the root dir of the 
>server, and ftp://server, which goes to the login directory. (There is an 
>unconfirmed bug that this is currently broken in mozilla, though)

If those tokens are supposed to represent your example here, then their 
comments become the part that's gibberish.  It is entirely possible to 
determine the difference between the two URLs above, if one simply follows 
the description laid out by RFC1738.  The simple fact is, that by any 
sensible standard, "ftp://server" and "ftp://server/" do in fact point to 
the exact same resource.  The alternative (with the latter indicating the 
root directory) makes URLs completely functionless for retrieving files 
relative to the login directory.  For any server which regularly changes 
the underlying file system structure of the login directories, a sensibly 
parsed (i.e. according to RFC1738) URL would make the changes transparent 
(provided, of course, they were relative to the login directory).  The 
approach that includes the delimiter in the path would make URLs invalid 
whenever a change was made.  It would make it impossible to retrieve 
resources located on machines which do not use the forward slash character 
as a path delimiter.

In other words, what RFC2396 suggests is simply daft.  The fact is that 
doing it the RFC1738 way will work on just about every server already in 
operation.  

At the very least, after attempting the full-path-including-delimiter 
approach, and failing, Mozilla should parse the URL sensibly.  To do 
otherwise is to go the road of "it works on most systems", when a 
relatively simple change would make it work on all.

>Regarding sending a CWD separately, the problem with that is that, as RFC1738 
>section 3.2.5 says, the control connection would have to be recreated 
>each time the user changes directories. That would suck. Does any client do 
>that?

No.  That section addresses a change of URL where the server and login 
information remain the same, not a change of directory.  Because all path 
information (when treated sensibly) is relative to the login directory, a 
completely new URL will not necessarily work from the current directory.

The somewhat simple (and perfectly reliable) solution to that problem is to 
perform a PWD upon login, and after each directory change.  Store and tie 
the results to the displayed page, and change back to the login directory 
before processing any directory changes from a new URL.

Note that this wouldn't apply at all to downloading files in the current 
working directory.  Mozilla knows that these files are in the current 
directory, and treating their retrieval as the processing of a new URL is 
nonsensical.  
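
The PWD bookkeeping described above can be sketched as a small wrapper (hypothetical `send` callback and class name, purely illustrative - not Mozilla's API):

```python
class FtpSession:
    """Remember the login directory via PWD, and rewind to it before
    processing each new URL on a reused control connection."""

    def __init__(self, send):
        self.send = send                   # issues one command, returns reply text
        self.login_dir = self.send('PWD')  # e.g. '/home/thanny'

    def start_url(self):
        # Paths in FTP URLs are relative to the login directory, so
        # reset before any CWD/RETR for a newly accessed URL.
        self.send('CWD ' + self.login_dir)
```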
reassigning to bbaetz@cs.mcgill.ca.
Assignee: dougt → bbaetz
Yes, RFC2396 has lots of problems.

You cannot use RFC1738 to work out what to do, since RFC2396 explicitly says
that it's changing the behaviour.

"Store and tie the results to the displayed page". The ftp viewer doesn't have
state associated with it, and really can't.

Does any web browser do what you expect?

cc andreas for comment
>Yes, RFC2396 has lots of problems.

>You cannot use RFC1738 to work out what to do, since RFC2396 explicitly 
>says that its changing the behaviour.

It says that it's updating RFC1738, but it does not in fact define any 
changed behavior for FTP URL's.  A few indecipherable sentences hardly 
qualify as a standing change.

>"Store and tie the results to the displayed page". The ftp viewer doesn't 
>have state associated with it, and really can't.

It's really not necessary, either.  Any clickable links are going to be 
created by Mozilla, so the possibility of clicking on a new URL for that 
session which has the same host and user information, but not an absolute 
path, won't really be an issue.  Care just needs to be taken that any 
non-absolute paths are handled by a new control connection, and that all 
Mozilla-generated URL's in the FTP viewer are absolute.  I'm sure there 
will be snags here and there, but overall, it's workable.

>Does any web browser do what you expect?

The only ones I've tested are Mozilla (in NS 2.x, 4.x, and the betas) and 
IBM WebExplorer.  I try to avoid IE whenever possible.  WebEx doesn't do 
CWD's with each path component, but it does properly exclude the host/path 
separator slash.  This makes it possible to use relative Unix paths, but 
since it doesn't do a CWD for each path component, it's still not ideal.
> It says that it's updating RFC1738, but it does not in fact define any 
> changed behavior for FTP URL's.

It defines changed behaviour for all urls, and ftp is included in that set.

> A few indecipherable sentences hardly 
> qualify as a standing change.

Unfortunately it does in this case. Yes, the URL RFCs are ambiguous in lots of
places - see the bugs + discussions on what <a href="?foo"> should do. (summary
- RFC2396 has an error in some bits)

dougt? andreas? comments?

Ignoring the leading / will allow us to support the home directory stuff a
little easier. However, it will break almost every ftp link in existence.
I'm really against this unless you can point me to an ftp server + link to it
where we fail but other web browsers succeed.
This has nothing to do with the core url parsing, this is ftp specific.

If I remember correctly there is a whole set of *hardcoded* identifiers
somewhere inside necko that deals with different ftp servers and their different
behaviour when it comes to handling directories. Judson might know something about it.
andreas: this is to do with a link in an <a> tag, not the directory parsing stuff.
Hmmm ... I can't find it at the moment but I have the distinct memory about some
code that dealt with creating the right commands from ftp urls based on the ftp
server that was used. Maybe this code needs to be adapted.

One other thing I noticed: The url
ftp://thanny:[password]@localhost:2222/h%3a%5cdownload/hubble5_hst.jpg may be
valid (because of its encoding) but don't expect it to work because
interpreting %5c as a path identifier (seems to be intended) is clearly wrong.
The url should look like: 
ftp://thanny:[password]@localhost:2222/h|/download/hubble5_hst.jpg or
ftp://thanny:[password]@localhost:2222/h%3a/download/hubble5_hst.jpg.
QA Contact: tever → benc
I'll confess that I haven't read RFC 2396 carefully, but I don't see any ftp: 
URL specific BNF that postdates RFC 1738 and RFC 1630.

In that light, section "3.2.2. FTP url-path" of RFC 1738 becomes very relevant, 
especially the comments here:

   Within a name or CWD component, the characters "/" and ";" are
   reserved and must be encoded. The components are decoded prior to
   their use in the FTP protocol.  In particular, if the appropriate FTP
   sequence to access a particular file requires supplying a string
   containing a "/" as an argument to a CWD or RETR command, it is
   necessary to encode each "/".

   For example, the URL <URL:ftp://myname@host.dom/%2Fetc/motd> is
   interpreted by FTP-ing to "host.dom", logging in as "myname"
   (prompting for a password if it is asked for), and then executing
   "CWD /etc" and then "RETR motd". This has a different meaning from
   <URL:ftp://myname@host.dom/etc/motd> which would "CWD etc" and then
   "RETR motd"; the initial "CWD" might be executed relative to the
   default directory for "myname". On the other hand,
   <URL:ftp://myname@host.dom//etc/motd>, would "CWD " with a null
   argument, then "CWD etc", and then "RETR motd".
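
The three interpretations in the quoted example can be reproduced mechanically with the same split-before-decode rule (an illustrative sketch; `cwd_sequence` is a hypothetical helper, not Mozilla code):

```python
from urllib.parse import unquote, urlsplit

def cwd_sequence(url):
    # Split the url-path on "/" before decoding, then decode each
    # CWD/RETR argument separately, as RFC 1738 section 3.2.2 describes.
    segs = [unquote(s) for s in urlsplit(url).path[1:].split('/')]
    return ['CWD ' + s for s in segs[:-1]] + ['RETR ' + segs[-1]]
```

The three URLs from the quote then give "CWD /etc"; "CWD etc"; and a null-argument CWD followed by "CWD etc" - each ending in "RETR motd".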

Additionally:

"3.2.4 Hierarchy

   For some file systems, the "/" used to denote the hierarchical
   structure of the URL corresponds to the delimiter used to construct a
   file name hierarchy, and thus, the filename will look similar to the
   URL path. This does NOT mean that the URL is a Unix filename.
"

thanny: thanks for writing this bug. I was just about to try to create a similar 
test case, but I don't think my examples would have been as illustrative as 
yours.

Two general points about URL's I've learned recently:
1- For accessing hierarchies, they messed up by not describing the "/" path as 
"/<VOID>" (which is what it is in a lot of cases), and then explaining where it 
goes in each URL scheme.

For example, in HTTP, this "3rd slash is not part of the path" principle is not 
true (you ask for "/" when you mean "/").  So right out the door, general 
theory did not match specific practice.

In FTP, it is implicitly easy to figure out where "/" is, b/c the FTP server 
presents a filespace from a mount point on local disk (not always a real "/" 
either...)

In file, you can kind of deduce that it should be the top of the filesystem, so 
it's "/" in UNIX (otherwise file:/// != "/"), but that wasn't so bright because 
I don't think every filesystem allows every file-accessing user transparent 
visibility to the top of the file system. (In fact, whoever made "file:///" -> 
"C:\" in Windows has argued otherwise via the current Mozilla Win32 file 
implementation).

2- URL's have two fundamental structures:

Stateless systems get connection data and then an object reference (HTTP is 
basically a socket + the target of "GET").

ftpurl         = "ftp://" login [ "/" fpath [ ";type=" ftptype ]]
fpath          = fsegment *[ "/" fsegment ]

Connection-based systems get a single-line script. (FTP is a socket, login and 
password data, then a sequence of CWD commands, ending w/ RETR, with some 
optional commands along the way). You can't just map the fpath to a single CWD 
unless you KNOW "/" is the only filesystem delimiter.

I need to look at the FTP RFC more, but I'm reasonably sure that RFC 1738 
provides a VMS example that would probably break.

If you take RFC 1738, you can write URL's that have the Mozilla behavior, by 
encoding the filesystem delimiters "/", and having only one fpath. You also have 
a syntax that is robustly cross-file system compatible because it "walks" down 
the file system, rather than taking a presumptive short-cut.

I'll continue to look for newer RFC's that clarify this issue.
Keywords: testcase
benc: please read my comments.
The above patch removes the first / from the path and if the result is empty
replaces it with a . 

Seems to work fine for me so far. Anyone want to give it a testdrive?
Sometimes navigating through a directory tree with ftp results in "file not
found" the first time; clicking again on the same item works. Don't know if it
is related to the patch. Anyone else sees this?
Okay it's related. We just can't navigate through the tree with the whole path.
When the navigation fails we do a reconnect and then it works again. RFC1630
suggests a reconnect should be done with every CWD command. This would also make
sense in this case since it would reset to the home directory again.
Yes, and there's an XXX comment on us not disconnecting the control connection
for a file not found. I know that the RFC says to do this, and I consider it
unacceptable from a perf point of view.

What I suggest (and what we used to do, although I still can't find code which
did so) was that when we log on with no directory, then we get the current
directory, and update the url bar (and thus the relative stuff) to match. This
would almost match the 4.x behaviour. 4.x differentiated between ftp://host and
ftp://host/, and only did this for the first of these. We used to do the same,
but I suspect that one of the urlparser rewrites lost this difference, and then
one of the ftp rewrites lost the ftp code to handle the difference.

See bug 95277 - I was planning on digging through the ftp code from March this
weekend to see how we handled this then. I strongly suspect that this bug would
be fixed by fixing that bug.
There is no longer a difference between ftp://host and ftp://host/. The
urlparser will move ftp://host to ftp://host/ before the protocol has a chance,
and I see no reason to change this general behaviour for this bug.

Is there a way to figure out if a list item is a file or a directory? If so
there is a way to deal with this. I have a version that operates always from the
login point but currently does sometimes CWDs on files. 
I know that there is no longer a difference. However, that's how ns4 does it -
again, see the other bug.

We figure this out by trying to CWD into the url, and if that fails, then we try
to RETR it (assuming that it's a file). That's the only way, so attempting to CWD
into files is correct, as long as we don't get confused afterwards.

I'm interested to see a patch which doesn't mean that we have to get a new
control connection each time, and which updates the urlbar to the 'real' path....
Is it wise to disclose the directory structure from the root? Usually I start
from a user's directory and work relative to it. No need to show where it is located
on the server.  
That patch breaks ns4 compatibility. I know that that is what the rfc says, but
see my comments (based on the later rfc) for why we shouldn't do this. We cannot
break every ftp link in existance - consider an ftp mirror with a home directory
for the anonymous user of /pub. With your patch,
ftp://foo.com/pub/mirrors/file.txt would load ~/pub/mirrors/file.txt ->
/pub/pub/mirrors/file.txt, which is wrong.

You also need to update the urlbar, since from the client side there is nothing
stopping them from going up a directory level, and this would probably stuff up
the caching we do. Again, ns4 does that (although it doesn't do caching, IIRC)

Besides, the ~ notation is not standard, AFAIK. You cannot rely on it.
A few notes:

RFC2396 is useless as a guide to handling FTP URLs.  It provides no coherent 
information on them at all.  So, RFC1738 is where we have to look for behavior.  
It just so happens that RFC1738 is both coherent and very explicit.

Contrary to what has been posted, properly parsing URLs (issuing iterative 
CWDs) would not break every public FTP link in existence.  In fact, I've yet to 
see a single one that was broken.  The URL fetcher program I wrote does URL 
parsing according to RFC1738, and it has thus far never failed to retrieve a 
file in a public FTP path.  An example, which reflects all public FTP sites 
I've seen:

ftp://hobbes.nmsu.edu/pub/os2/apps/internet/ftp/client/00global.txt

After logging in as "anonymous", the client should issue the following commands 
(at a minimum):

CWD pub
CWD os2
CWD apps
CWD internet
CWD ftp
CWD client
PORT [add,port]
RETR 00global.txt

This correctly retrieves the file pointed to.  Every /pub/* URL I've seen works 
in this fashion.

If the server weren't Unix, it'd still work, because no path delimiter is 
assumed by properly parsing the URL.

To address some earlier comments:

>One other thing I noticed: The url
>ftp://thanny:[password]@localhost:2222/h%3a%5cdownload/hubble5_hst.jpg maybe
>valid  (because of it's encoding) but don't expect it to work because
>interpreting %5c as a path identifier (seems to be intended) is clearly wrong.

The URL above is not clearly wrong.  In fact, it is clearly correct.  There's a 
transport, host, and path segment - all properly delimited.  The path segment 
has two parts - a CWD argument, and a file to retrieve.  

It does not in fact matter what purpose the %5c character has, since the FTP 
client is entirely ignorant of it.  It's task is to send the path component 
string, decoded, as an argument to CWD, period.  It just so happens that in the 
above example, %5c ("\") is, in fact, a path delimiter.  Passing the entire 
string as the argument to a CWD works, because it's a completely proper path 
for the FTP server in question.
I agree mostly with Mike. RFC 2396 is useless regarding ftp urls. I would
propose the following:

1) get the path
2) remove leading slashes
3) unescape it

This would work on all given examples from RFC 1738. Then we can work as usual,
just do one CWD/RETR with the given path. The result should be the same as doing
incremental cwds.
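
The three steps proposed above can be sketched as follows (illustrative, hypothetical helper name; the result matches incremental CWDs only when '/' is the server's delimiter, as the comment assumes):

```python
from urllib.parse import unquote, urlsplit

def single_path(url):
    # 1) get the path  2) remove leading slashes  3) unescape it,
    # then issue a single CWD or RETR with the result.
    return unquote(urlsplit(url).path.lstrip('/'))
```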

After showing a listing or showing/downloading a file we work with the cached
connection for the next access. That is dangerous for the reasons given in RFC
1738. When there is no reliable way to get back to the login point then we have
to login again to force it. Instead just the password and servertype should be
cached and the control connection not reused; logging in again puts us at the
right place (user home) for the next access. I don't think that is such a
performance hit.

I also have to agree with Mike on his comments about my comments about 
ftp://thanny:[password]@localhost:2222/h%3a%5cdownload/hubble5_hst.jpg

He is right, there is no problem.
Running with the patch above I was able to access and navigate every site so far.
No. We cache control connections for a reason. It was really annoying on ns4 to
try to ftp to a busy site, and have every click give you another "too many
users" error.

Also see bug 94354 for another parsing issue.
I looked through the mozclassic code. What it does is: If the path is empty, (ie
ftp://ftp.netscape.com) _or_ the path starts with /./ (bug 94354), then we send
PWD, and tack that onto the front of the request. So to retrieve the file "foo"
from your home directory, you use ftp://server/./foo 

I will also note that (while it's broken now due to bug 103514), mkaply had os2
ftp servers working.

The problem with your argument about disconnecting the control connection each
time is this: Imagine that the initial url is a directory listing. We get a list
of files/dirs within that directory. Now we need to print out this list in html.
Our code (all over the place) will translate that into an absolute url before it
gets to ftp. So now we have a brand new ftp url. There is no difference between
doing CWD /this_new_url, and recreating the control connection and then doing
that. So I agree that in some extreme edge cases, an ftp server may not work
with mozilla, ns4, ns6, and lynx (at least). But recreating the control
connection won't help that.

Getting back to the original position can be done by issuing a PWD first. Then
we can CWD back to that initial directory if we had to, but since the base url
would be changed by prepending that, we won't need it. CWD `PWD at login` must
take you back to the original login position.

Again, your way does not allow me to ftp to /etc on my machine, when my home
directory is /home/bbaetz.

In the patch itself, you should test *fwdPtr=='\0', rather than PL_strlen. But
I'm still convinced that it is the wrong approach.

I think I'll have to put aside some time this weekend to work on a patch for my
approach.
You can go to /etc exactly as specified by RFC 1738, just do
ftp://user@host/%2Fetc. I will take a look at the PWD stuff.
This patch uses pwd to find and store the login path after login. Then it uses
it to create the correct path for CWD/RETR. The connection is again cached. Also
still compliant with RFC 1738 regarding the url syntax although it does no
subsequent CWDs to get to the right point.
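
The idea the patch description outlines - prefix relative paths with the login directory found via PWD - might look roughly like this (an assumption-laden sketch, not the actual patch; `retr_path` is a hypothetical helper):

```python
from urllib.parse import unquote

def retr_path(login_dir, url_path):
    # Drop the host/path separator slash(es), then decode.  An encoded
    # leading %2f decodes to an absolute path and bypasses the login
    # prefix; anything else is taken relative to the PWD result.
    p = unquote(url_path.lstrip('/'))
    if p.startswith('/'):
        return p
    return login_dir.rstrip('/') + '/' + p
```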
OK, I like this patch a lot more.

What do you think about only removing the first / ? I know that this isn't what
the rfc says, but it would seem to better match current browser behaviour. I
still think that we will get complaints about this being the incorrect thing to
do, but I think that that is ok. What does ie do?

I'd still like some way to update the url bar, though.
darin: There's an nsIHttpEventSink which takes an nsIHttpChannel as the first
parameter to OnRedirect. Could we make this more generic, somehow?

The other concern is that we don't update the cache entry key for the correct
url. This worries me, although I can't think of a particular case where we'd get
the wrong data. Changing which user you log in as could do it, though.
Personally I think, we can go with only removing the first slash. It might be a
good compromise between what 4.x does and what the rfc says, although it is very
clear that we violate rfc 1738 with it. Mike, what's your opinion on this?

Also I like it very much that we currently hide the login path from the user in
the ftp url. I see no gain in exposing it to the user. It would irritate me very
much when the ftp url changes that much after the login.
>------- Additional Comments From Andreas Otte 2001-10-12 03:43 -------

>Personally I think, we can go with only removing the first slash. It might be 
>a good compromise between what 4.x does and what the rfc says, although it is 
>very clear that we violate rfc 1738 with it. Mike, what's your opinion on 
>this?

Well, it would be an improvement.  However, it still leaves open the 
possibility of problems for any FTP server running on an operating system which 
doesn't use the '/' character to deliminate path elements, and doesn't 
explicitly emulate Unix functionality.  The VMS example in RFC1738 serves well 
to illustrate the potential problem.  Unless the FTP server in use acted like 
it was a Unix server, translating all '/'-based paths to VMS paths, simply 
removing the first slash would solve nothing for such a system.  

Of course, I can only speculate that such FTP servers exist.  I don't have a 
VAX laying around to test with <g>.

Even the basic OS/2 FTP server will recognize a '/' as being analogous to '\', 
and work with such a system.

So, it will almost certainly work with the vast majority of systems.  

It's still easy, of course, to imagine a setup where that would fail, for any 
number of reasons, where doing iterative CWD's would work.  It would be my 
nature to do things completely properly, but I would not be opposed to adopting 
the simplest possible solution, and dealing with the problems if and when they 
arise in the future.

>Also I like it very much that we currently hide the login path from the user 
>in the ftp url. I see no gain in exposing it to the user. It would irritate me 
>very much when the ftp url changes that much after the login. 

I don't see why it'd be necessary to update the URL.  Just do what was already 
stated previously - store the result of a PWD upon login, and use it as the 
parameter to a CWD before issuing any CWD's or RETR's for links created by the 
FTP browsing code.  

On a related note, re-establishing the control connection should only be done 
when necessary.  I haven't given it a great deal of thought, but comparing the 
accessed URL with the established control connection should allow a correct 
decision about whether or not to reconnect (or make a new connection - how many 
open control connections to maintain is another issue entirely).  If the 
user/host and first path component are the same, clearly the existing control 
connection is sufficient - simply CWD to the login PWD before continuing.

If the first path component is different, however, a new control connection is 
the only way to guarantee that the correct resource is accessed.  For a Unix 
system, if the first character of the first path component (after decoding, of 
course) is a slash, then the path is absolute, and a new control connection is 
not required.  Any FTP server with a SYST response that has "UNIX" 
(case-insensitive, I'd think) as the first word would allow such an assumption 
to be safe.  Otherwise, trying to determine whether or not a path is absolute 
is inappropriate (unless customized for other reported operating system types).  
With a relative first path component that doesn't match the first path 
component of the new URL, a new control connection is necessary.  This 
situation should never occur with any links generated by the FTP browser, of 
course, so it will be fairly rare.
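The reuse rule described in the last three paragraphs can be condensed into a small decision function. This is a hypothetical sketch, not Mozilla code: it assumes the caller passes the user/host pair, the (decoded) first path components of the old and new URLs, and the operating-system token from the SYST reply; the case-insensitivity check is simplified to two literal comparisons.

```cpp
#include <cassert>
#include <string>

// Reuse the existing control connection when user/host match and either
// the first path component matches, or the server is Unix-like and the
// new path is absolute (starts with '/') after decoding.
bool canReuse(const std::string& userHost, const std::string& firstComp,
              const std::string& newUserHost, const std::string& newFirstComp,
              const std::string& systOs) {
    if (userHost != newUserHost)
        return false;                 // different server/login: reconnect
    if (firstComp == newFirstComp)
        return true;                  // same start point: just CWD back
    // "UNIX" as the first word of the SYST reply lets us trust an
    // absolute path without a fresh connection.
    bool unixServer = systOs.compare(0, 4, "UNIX") == 0 ||
                      systOs.compare(0, 4, "unix") == 0;
    return unixServer && !newFirstComp.empty() && newFirstComp[0] == '/';
}
```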

Finally, on a tangentially related note, I notice in passing that the program 
WGET already discards the first slash, and I find it unlikely that such 
behavior would remain if it caused problems in retrieving URLs.

The Mozilla Classic code has every line dealing with paths wrapped in an:

if (foo->server_Type == VMS) {
...
} else {
...
}

Since we don't parse VMS directory listings, I don't think spending time
working out the correct behaviour is worth it.

We should update the url because:
a) 4.x did it
b) It's clearer exactly what path you are using

Note that the <title> of the page could be made correct reasonably easily - just
look at where the 300: line is printed out for directory listings, and change
that. It's probably not worth the confusion, though.
Attachment #52429 - Attachment is obsolete: true
Attachment #52465 - Attachment is obsolete: true
Attachment #52997 - Attachment is obsolete: true
Maybe this can go in first, and then we can see about updating the urlbar with
the whole path in a second step. The last patch only removes the first leading
slash, not all of them; there is no need to mask a slash any more.
Bradley: Any decision yet on this? Shall we go with what we have for now (the
latest patch, which fixes FTP url parsing) and defer the urlbar update, or wait
for that?
Comment on attachment 53486 [details] [diff] [review]
patch that only removes the first slash

>+FTP_STATE
>+nsFtpState::R_pwd() {
>+    if (mResponseCode/100 != 2) 
>+        return FTP_ERROR;
>+    nsCAutoString respStr(mResponseMsg);
>+    PRInt32 pos = respStr.Find("\"");
>+    if (pos > -1) {
>+        respStr.Cut(0,pos+1);
>+        pos = respStr.Find("\"");

You should change all those Find calls to FindChar.
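For anyone reading along, here is a standalone sketch of what R_pwd's quoted-path extraction is doing (hypothetical, using std::string in place of nsCAutoString; std::string::find(char) plays the role the review asks FindChar to play): a PWD reply like `257 "/home/user" is current directory` carries the directory between the first pair of double quotes.

```cpp
#include <cassert>
#include <string>

// Extract the directory from a 257 PWD response by locating the first
// pair of double-quote characters; returns "" if the reply is malformed.
std::string pwdPath(const std::string& response) {
    size_t open = response.find('"');
    if (open == std::string::npos)
        return "";
    size_t close = response.find('"', open + 1);
    if (close == std::string::npos)
        return "";
    return response.substr(open + 1, close - open - 1);
}
```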


>@@ -1800,7 +1851,20 @@
>         rv = mURL->GetPath(&path);
>     
>     if (NS_FAILED(rv)) return rv;
>-    mPath.Adopt(nsUnescape(path));
>+    // Skip leading slash
>+    char* fwdPtr= (char*) path;
>+    if (fwdPtr && (*fwdPtr != '\0') && (*fwdPtr == '/'))
>+        fwdPtr++;

You don't need the != '\0' if you're checking for == '/'. Also, change the cast to char* to an
NS_CONST_CAST of path.get().

r=bbaetz if you do that. This is going to mean a change in behaviour, though, and I'm nervous :)
If someone comes up with a case which the old code could handle (or which is in common use), but
which this code breaks and isn't easy to adapt to, then we'll have to reconsider.
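Distilled to its essence, the reviewed change is just this (a minimal standalone illustration, not the patched Mozilla source): skip at most one leading slash from the FTP URL path, and note that the `*p != '\0'` test is redundant once you compare against `'/'`, since a NUL byte can never equal `'/'`.

```cpp
#include <cassert>
#include <cstring>

// Skip a single leading slash: the slash separating host from path in
// an FTP URL is a delimiter, not part of the path itself.
const char* skipLeadingSlash(const char* path) {
    if (path && *path == '/')
        ++path;
    return path;
}
```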
Attachment #53486 - Flags: review+
Attachment #53165 - Attachment is obsolete: true
Attachment #53486 - Attachment is obsolete: true
Comment on attachment 54417 [details] [diff] [review]
patch incorporating Bradley's comments

sr=darin
Attachment #54417 - Flags: superreview+
Comment on attachment 54417 [details] [diff] [review]
patch incorporating Bradley's comments

remarking r=bbaetz on new patch
Attachment #54417 - Flags: review+
->patch author. Andreas, do you want me to check this in?
Assignee: bbaetz → andreas.otte
Blocks: 95277
I will check it in, but that will happen on the weekend, since I'm away for
three days. If you want it in prior to that, please step in, but I can't be on
the hook during those three days. 
fix checked in.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
At least once I have seen a crash when navigating through the ftp tree that
might be related to this. I was unsuccessful in recreating it; maybe an
uninitialized variable is now exposed, I don't know. It would be nice if
someone else could check this out and capture a backtrace if the crash happens too.
Currently I see this when trying gdb (after updating to SuSE 7.3) with mozilla:

[New Thread 1024 (LWP 9414)]
Assertion failure: 0.5 <= maxAlpha && maxAlpha < 1 && 0 <= minAlpha, at
pldhash.c:227

Program received signal SIGABRT, Aborted.
[Switching to Thread 1024 (LWP 9414)]
0x40601861 in kill () from /lib/libc.so.6
Blocks: 55238
Still problematic in version 1.2.1 ( mozilla-1.2.1-26 on Red Hat Linux 9 )

There is a boatload of failing (legal) links listed at http://rcum.uni-
mb.si/~uel003r2a/ftp_url_test.html

(there are HTML comments for almost every failure, explaining what went wrong and how)

As I cannot reopen this bug, somebody else should, or I will open a new one.
Hmm, the link I posted was broken in two. Here it is again :

http://rcum.uni-mb.si/~uel003r2a/ftp_url_test.html
David: please file a new bug.  thx!