Closed Bug 78148 Opened 23 years ago Closed 23 years ago

clean up directory listing stream converters

Categories

(Core :: Networking, defect, P2)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla0.9.5

People

(Reporter: bbaetz, Assigned: bbaetz)

References

(Blocks 2 open bugs)

Details

Attachments

(3 files, 7 obsolete files)

Currently the stream converters take one ascii representation, parse it, convert
it into another, and pass it off to something else which parses that, and then
spits out the result.

I got sick of hacking arround this while getting gopher to output html directory
listings. So I'm fixing it.

Doug, do we need to support nsFTPDirListingConverter::Convert? Nothing actually
uses it AFAICS, and the indexedToHTML stuff doesn't support it either (ditto for
gopher - I don't think that the gopher code has even been tested....).

Currently, I have an nsIDirIndex interface which contains attributes for the
various things we want to support, and an nsIDirIndexListener, which inherits
off nsIRequestObserver, and provides the addition method:

void onIndexAvailable(in nsIRequest aRequest,
                          in nsISupports aCtxt,
                          in nsIDirIndex aIndex);

I haven't finished hooking this all up yet (I still have to connect the new
FTPDirListingConverter to nsIndexedToHTML), but:

[bbaetz@banana netwerk]$ cvs diff streamconv/converters/nsFTPDirListingConv.cpp
| diffstat
 nsFTPDirListingConv.cpp                       |  307 +++++++-------------------
 1 file changed, 93 insertions, 214 deletions

[bbaetz@banana netwerk]$ cvs diff streamconv/converters/nsIndexedToHTML.cpp |
diffstat
 nsIndexedToHTML.cpp                       |  242 ++++++++---------------------- 
 1 files change, 72 insertions, 170 deletions

(I also NS_LITERAL_STRING'd nsIndexedToHTML while I was at it)

Of course, I've added the new interfaces and implementation files, (but those
just consists of getters and setters, + the license blurb). So I come out adding
23 more lines than I remove. I haven't pulled all the parsing stuff out of
nsDirectoryViewer yet, though, or touched gopher.

Or done anything more than verify that it all compiles - I know it won't work as
it currently stands.

Even if we do decide to scrap the XUL directory viewer, this still:

a) Cleans stuff up
b) Provides a common interface between ftp and gopher.
c) Avoids uneeded parsing/escaping/unescaping (although I'm still escaping - I
decided to try and check that it works, and change as little as possible, first)
d) May give us a way to easily produce html listings for file:///, although the
nsString vs nsCString stuff would have to be cleaned up - currently I've just
done what the directory viewer did.

The disadvantage of doing this is that it is theoretically now possible (since
my directoryViewer changes from a few weeks ago) for a web server to send us
application/http-index-format data, and have mozilla display the data as the XUL
tree. I haven't tested that though yet. These changes would remove the parser,
and so remove that feature - I don't know if anyone cares though.

I still have a couple of streamconverter/nsIInputStream questions though. I'll
code a bit more to try and get something actually displaying, and see what I
discover, or I'll find some necko people on IRC and nag them :)
supporting Convert() allows for SYNC stream converting.  Is anyone doing this: 
no.  Will anyone want to?  Don't know.  

Attach some diffs so we can see some of this great sounding work.  :-)
Is there some sort of wrapper class which will make an async converter
synchronous? I can imagine one, but I don't know if it already exists.

I'll attach diffs once I get something displaying - probably later today, after
I finish some other stuff.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.1
I believe one of the test stream converters I provide wraps a sync converter in
async callbacks. In real-world situations though this defeats the whole purpose
of asynchronous function calls because they just wind up blocking. It's best to
provide async parsing.

bradley writes:
"Currently the stream converters take one ascii representation, parse it,
convert it into another, and pass it off to something else which parses that,
and then spits out the result.

I got sick of hacking arround this while getting gopher to output html directory
listings. So I'm fixing it."

what's wrong w/ the stream converter model? and what/how were you hacking around
whatever you found to be wrong w/ it?

> I believe one of the test stream converters I provide wraps a sync converter
> in async callbacks. 

I meant the other way arround - given an async stream converter, wrap it so that
it becomes a sync converter. I know how to do it, I was just wondering if there
was something in the tree already.

Nothings wrong with the stream converter model. However, it only passes text
streams arround, and so we end up parsing and unparsing the same data multiple
times. To expand on my summary:

What currently happens (for ftp) is that nsFtpDirListingConv takes a
text/ftp-dir-<servertype> stream, and parses it, putting it into an indexEntry
(a local class defined in nsFtpDirListingConv.cpp). At the end of each line, it
takes that structure, and puts out a application/http-index-format entry.
Depending on the settings in the prefs, the ftp protocol may have arranged to
get a chain of converters, so this data will be passed to nsIndexedToHTML, and
html produced, or it will be left as application/http-index-format, where
nsDirectoryViewer.cpp will parse it. That parser is seveal hundred lines long,
and makes several assumptions about gopher and ftp.  It also puts everything
into a structure which looks remarkably like the indexEntry structure we started
out with.

The nsIndexedToHTML parser isn't really a parser for the entire format - it
makes assumptions about the layout of the data lines. I could make it understand
gopher as well, using a few ifs, and hardcode more stuff in.

Thats what I meant by "hacking arround" - not the stream converter model, but
the duplication of code, and the needless parsing/unparsing/etc.

The onIndexAvailable is there because, AFAIK, there isn't an nsIInputStream
which I can attach nsISupport objects to, rather than char*'s. I'm not sure that
something like that would really fit the model of the input stream stuff.

I'll attach some code later tonight. 
Attached patch work-in-progress patch (obsolete) — Splinter Review
I've attached a work in progress patch. With that patch gopher and ftp will use
the html directory index. I haven't updated the XUL viewer yet - I'll do that
tomorrow.

The API changes are straight forward, and mainly involve removing lots of code.

Any design/API comments?
hmmm, I'm a bit concerned here because http dir listing format is sort of
standard internally (HTTP can use it), and I see some benefit to cononicalizing
various listing formats into it as it gives us some common ground. have we
isolated this extra parsing step for ftp as a performance hit?
Well, mozilla can only use the application/http-index stuff in the XUL viewer,
and, until my last lot of cleanups for nsDirectoryViewer.cpp went in a few weeks
ago, only ftp and gopher entries could be identified as directories - all other
protocols would have just been files. I'm still not sure how generic it is,
though, and it probably still needs some changes to handle that. Do you know of
any sites which use this format? Despite the name, the current code doesn't
actually handle http directory listings, and ns6 only handles ftp.

I doubt that its a performance hit, although I haven't measured. The problem is
that the parsing code would have had to be reproduced twice - once for the html
listing, and once for the XUL one.

If you feel that its an issue, then I suppose it could be possible to just move
the current parser over, and get it to produce nsIDirIndex's. Another
alternative would be to have nsDirectoryViewer.cpp use a different xul file for
html listings (and then we only need one output formatter).

(BTW, do directory listings give the time in GMT for a reason?)
i don't know about current gmt reasoning. Netscape enterprise http servers can 
emit http index format if you tell them to.

I just want us to be aware that we're lopping off one of our stream converters. 
part of the stream converter model was to have the pool actually grow so you 
could convert from type A to Z w/ out knowing whether or not you had layered 
conversion going on. if you pull it out of mozilla by default, please leave the 
converter lying around in cvs (and in the build) so people can use it if the 
need arises.
OK, so I've just discovered nsDirectoryIndexStream, which is Yet Another
producer of directory listings.

Can I ask what this is used for? nsDirectoryViewer gets the data for files from
rdf:file, and mentions (line 1290) that while it may be a good idea to combine
them at one point, "rdf:file is less expensive in terms of memory and speed", so
it isn't used for that (although it does get the data for the initial url, which
is then probably discarded, although that may not be the case anymore. It also
assumes that all filenames are ascii, calling GetLeafName, and I thought we
supported non-ascii files)

Its only used when someone tries to get an input stream for a directory - does
anyone apart from the directory viewer do this?

If my changes allowed file:/// to be able to be displayed in any format, would
the objections to this be less?
Does that feature work with ns6?

If thats the case, then I'll leave it in - I'll have the parser convert to
nsIDirIndex, instead (and move it into netwerk). This will still have the
advantage that the 4 directory input methods (application/http-index-format,
file, ftp, and gopher) will be able to use either output format without
problems.
after thinking this over some more, and talking this over w/ rpotts, I'm a bit
concerned over this goal.

again, the whole idea of http_index is that it is a cononical dir listing format
that *any* protocol can generate and subsequently we'll be able to convert this
connonicial format into another. this adds a nice layer between protocol writers
and content delivery. it also allows for new converters, say text/xml, to come
in and take *all* dir listings and turn them into some new format. 

I'd really prefer we maintain this std format.
I agree, but the problem is that as soon as you have more than one output
format, you end up with either needing to duplicate the parser, or have a parser
which will take an http-index-format input stream, and output into a data
structure which the front ends can then use. Or you can skip the intermediate
data format, which is what my (unfinished) patches do, which then avoids the
need to escape, convert times to a standard representation, etc.

The other option may be to always convert to rdf (using the parser now in
nsDirectoryViewer.cpp), and then use a xul template to output xhtml (thus
getting rid of nsIndexedToHTML.cpp) Thinking about it a bit more, that solution
would probably be the cleanest. Would that be acceptable from the embedding POV?
"or have a parser
which will take an http-index-format input stream, and output into a data
structure which the front ends can then use. 
"

that's what index->html already does. skipping the intermediary format kills the
output format flexibility.

I'm not speaking from an embedding POV, just a mozilla contributor POV.

IMO, we already have the intermediary format, httpindex.
"that's what index->html already does."

But the problem is that it doesn't do it properly. It handles a subset of the
format which works for ftp, and only ftp - it assumes the ordering of the data
lines, so it won't handle gopher, file, data from the web, etc. I could fix this
by copying the parser from nsDirectoryViewer.cpp to nsIndexedToHTML, but that
strikes me as a waste of effort.

Would embedding (which I believe was part of the reason for creating the html
interface in the first place) be happy with a XUL template being used to output
html? Or is there another way to format rdf?
OK, so I've though about this a bit more, and now agree with valeski.

Does anyone have any objections to just killing the XUL/rdf directory viewer,
and moving everything to the HTML output? This would lose the ability to do the
rdf ftp bookmarks, but that doesn't really bother me. dougt? Any objections?

The bookmarks URI is really sucky, and the directory viewer is slow. According
to waterson:

<waterson> it was a research project gone bad

I can then hook the application/http-index-format datatype directly up to that
convertor, and remove the pref-testing code from the ftp directory listing
convertor.

If there's only one output format, then theres no need for all this abstraction.

valeski and/or dougt - would a patch to do that meet your approval?
lemmee ask some embedding customer's what they think about losing the XUL
listing. the last time I checked, there were some that actually liked it :-).
----------- PLEASE IGNORE ALL COMMENTS ABOVE THIS LINE ------------

OK, so I've redone this, and the only change is the abstracting of the parser.

If you have the network.dir.generate_html pref set to true (as it is by default
now), then ftp, gopher, and file will all appear as html.

The dates are also reported using the current locale, as well, and with the
current time (as opposed to GMT). file:// has the wrong dates - 
nsDirectoryIndexStream is generating dates in 1970. I'm also not sure how well
it behaves with non-ASCII file names.

Because the intermediate application/http-index-format is still generated, the
code isn't any smaller - if you ignore the copyright lines in the new files, its
about the same size.

All protocols generate application/http-index-format, and then the viewer
factory decides whether to generate html or XUL based on the pref.

I want to fix the file:/// stuff first, but are there any obbjections to the new
patch (which I'll attach now)?
Attached patch new patch (obsolete) — Splinter Review
patch seems like a good direction to head in to me.
Blocks: 72724
Blocks: 78474
Blocks: 68651
I'm about to attach a patch which works for me. I don't know what it does on
non-ascii file listings. As well, file listings aren't sorted. (well, none of
them are, but it only matters for file listings). The dates are now correct as
well (why does nsIFile return the time in milliseconds instead of microseconds)?

The HTML listing hardcodes english strings - dougt said that he has a bug on
that.

So I'd like to check this in, possibly disabled for file uris (by just changing
the check in the factory). Can I get someone who has access to a machine with
non ascii filenames please check this? And possibly fix it?

The functionality is the same, except that gopher will now use html instead of
the XUL interface. I did replace the "directory"/"file" bits with pictures (the
ones from ns4 that are still in the tree). Or at least, I did, if they actually
worked. I know pavlov had them working at one stage - I'll go ask him about
them.

I've also just noticed that I took out the .. listings as well. I'll put them
back in when my tree finishes rebuilding.

So can I get an r= on this?
Attached patch new patch (obsolete) — Splinter Review
the usage of the http-index only converter in FTP seems fine to me, as do the
makefile changes (who's going to do the mac build changes?), and factory
registration mods. however, I'm going to have to defer on the index/parser
changes; dougt maybe?
Attached patch patch against current tree (obsolete) — Splinter Review
pushing off - this is too late for 0.9.1
Target Milestone: mozilla0.9.1 → mozilla0.9.2
mass move, v2.
qa to me.
QA Contact: tever → benc
Target Milestone: mozilla0.9.2 → mozilla1.0
Blocks: 83881
Blocks: 97358
I'm going to try to get this in for 0.9.5, but it depends on when I get net
access at home - it will almost certainly end up as 0.9.6, I think.

I've discovered that all the i18n file listing worries I had don't really
matter, since the current viewer displays accented chars in ftp listings as ?,
while as of last night my tree displays the real char (after hacking
nsParser.cpp to look for a meta charset tag in application/http-dir-index as
well. Yes, this is ugly)
Assignee: bbaetz → bbaetz
Status: ASSIGNED → NEW
Priority: -- → P2
Target Milestone: mozilla1.0 → mozilla0.9.5
OK, here goes another patch. Its not perfect, but it works.

Try it out (in both xul and html mode) and let me know what you think. Ideally
there shouldn't be any noticable differences - this is a backend patch, although
the html view did change slightly. I also implemented view-source for the xul
viewer, by just displaying the html source :)

Theres some stuff in nsDirectoryIndexStream.cpp to do collation-based sorting
which is #ifdef'd out. See the comments for why. Also see bug 99478, and bug
85836 for why I'm reasonably sure that everything is OK.

I'm going to post to npm.i18n to check that I haven't regressed anything from
that side of things, (except for what I note above)

I'd like to get this in soonish, so can I get some review on the assumption that
there are not i18n regressions? I think I have everything; let me know if it
doesn't build :)

dougt & valeski: I haven't gotten a response to my mail yet, either.
Status: NEW → ASSIGNED
Attached patch patch v0.9 (obsolete) — Splinter Review
CC'ing some i18n people.
nsServiceManager::ReleaseService is deprecated.  Just use NS_RELEASE

+    if (gTextToSubURI) {
+      nsServiceManager::ReleaseService(NS_ITEXTTOSUBURI_CONTRACTID, gTextToSubURI);
+      gTextToSubURI = nsnull;
+    }



You can optimize this:

+  *aListener = mListener.get();
+  NS_IF_ADDREF(*aListener);

by just doing something like NS_IF_ADDREF(*aListener = mListener.get())

You should probably check against null here:
+  delete[] mFormat;


What does this comment mean:
+
+    // So? - bbaetz


For whatever reason, I can not apply this diff.  Patch/Diff just doesn't like me
today.  I would like to try it out before giving you a r/sr.

Darin, can you superreview?
Blocks: 77969
Blocks: 38014
Blocks: 61235
Blocks: 100691
Comment on attachment 50024 [details] [diff] [review]
patch v0.9

some nits:

nsDirectoryIndexStream.h:

1- how about using for mArray?

nsDirectoryIndexStream.cpp:

2- to bad we don't have a nsVoidArray::InsertSorted method... glib does.
3- bug 56354 has been fixed.. so nsIFile::GetURL should be ok, right?

4- fortunately RemoveElementAt doesn't do much when the element 
is removed from the end of the array, but if it did, your algorithm in 
~nsDirectoryIndexStream might want to change.

5- can the #if 0'd code be removed?

nsNetModule.cpp:

6- why not "Directory Index" instead of "nsDirIndex"?

nsDirIndex.h:

7- wouldn't nsXPIDL{C}String's be a better choice for storing those strings?
aren't ns{C}String's large in size by comparison?
nsIndexedToHTML.cpp:
8- please keep a consistent opening brace style for function definitions.

streamconv/public/makefile.win
9- check your indentation

nsIDirIndex.idl:

10- how about moving the CID definition into nsNetCID.h?

nsIDirIndexListener.idl:

11- why include nsIRequestObserver.h?

12- please use a consistent opening brace style.

13- how about including nsIStreamListener.idl at the top of the file.

14- how about moving the CID and ContractID into nsNetCID.h?
nsDirectoryViewer.h

15- looks like the member variables could be lined up better.

how about a new patch with these things + dougt's comments resolved?
by the way, thank you very much for cleaning up this code... it really has
needed some good lovin ;-)
A lot of this code was just moved arround. That said:

> 1- how about using for mArray?

you're missing a word there, I think. I can't use an ISupportsArray because I
can't sort one of those.

> 3- bug 56354 has been fixed.. so nsIFile::GetURL should be ok, right?

Looks that way. Fixed.

> 4- fortunately RemoveElementAt doesn't do much when the element 
> is removed from the end of the array, but if it did, your algorithm in 
> ~nsDirectoryIndexStream might want to change.

I'm not sure what you mean by this. I remove from the end on purpose, so that we
don't constantly have to move every element in the array.

> 5 #if 0 code

I forgot about that. I was thinking only doing so on unix, or something, to
match the default mac + windows behaviour. This was checked in as part of bug
22244. I have trouble understanding how this could have fixed that bug, though.
Changed to #ifndef XP_UNIX in my build, which is probably the correct thing to
do anyway.

6: Fixed - I was just copying the ones above
7: fixed
8: It was inconsistent to start with, but fixed
9: fixed10: None of the streamconverters do
11: Bogus left over code. Removed
12. fixed
13. fixed
14. See 10.
15. Fixed.

New patch coming, this time diffed against cvs-mirror
1- sorry... i meant: should mArray be of type nsAutoVoidArray?

4- i should have been more precise: imagine if RemoveElementAt were to shrink
the allocated array size every so often.

10,14- no problem... maybe someday we'll go and make the shift to nsNetCID.h
1 - no, there will usually be > 4 (or 8, can't remember) entries.

4 - Its making it smaller, so I hope that the overhead is minimal. I could
iterate through without deleting, and call NS_RELEASE, then just free the array.
In fact, I will.

New patch will arrive after my tree rebuilds all the license foo from yesterday.
Attached patch new patch (obsolete) — Splinter Review
jbetak, please help to test it. 
I'm going to attach another patch. Only differences are added REQUIRES lines for
uconv.

For testing instructions, see
news://news.mozilla.org/slrn9qim6d.p41.bbaetz@banana.home
Attachment #32728 - Attachment is obsolete: true
Attachment #32728 - Flags: needs-work+
Attachment #33251 - Attachment is obsolete: true
Attachment #33941 - Attachment is obsolete: true
Attachment #34585 - Attachment is obsolete: true
Attachment #50024 - Attachment is obsolete: true
Attachment #50202 - Attachment is obsolete: true
Attachment #51277 - Attachment description: updated for cvs conflicts/buils system changes → updated for cvs conflicts/build system changes
Comment on attachment 51277 [details] [diff] [review]
updated for cvs conflicts/build system changes

r=darin
Attachment #51277 - Flags: review+
Attachment #51277 - Attachment is obsolete: true
Comment on attachment 51334 [details] [diff] [review]
...and update for license changes, too!

this just resolves cvs conflicts with the license landing - still has r=darin
Attachment #51334 - Flags: review+
Attachment #51334 - Flags: superreview+
I really want to land this before the tree closes on Tuesday, so I'd appreciate
the i18n feedback before then.

If there are problems with file:/// urls and the html viewer, then I will check
this in with xul remaining the default for file until those can be resolved,
assuming that the xul viewer still ends up working correctly.
Bradley,

I'm using a 094 commercial Netscape build. While looking at an internal ftp site
(ftp://kaze/pub), the directory and file names appear either escaped or garbled
(treated as Latin-1 and rendered using incorrect fonts) in both HTML and XUL view. 

I'll doublecheck with i18n QA as this seems to be broken w/o the patch already.
I'll apply the patch next, repeat the test and look for any adverse effects.
jbetak: ftp will definately be broken, and theres a bug on that - I need file
tested, mainly.
thanks for the clarification - I'll test a local directory next.
I went through the test scenarion w/o the patch and local directory listings 
work great. However, I was not able to switch from XUL to hmtl via the pref 
setting - am I missing something?

Also, you wouldn't happen to have an non-debug version of necko.dll, would you? 
It might take me a while to compile an optimized build plus the patch seems to 
require some manual interaction (half a dozen of hunks fail on recent 
branch/trunk trees).
You can't change it on the trunk - thats (part of) what this patch does.

The latest patch is against a current trunk tree. Note that you have to apply it
_without_ the usual posix environment variable, since it creates files. Theres a
small possiblity of a makefile conflict with requires stuff, but cvs merged
those into my tree. They should just be one liners, and easy to resolve.

I only have a linux build environment, unfortunately, otherwise I'd make a test
build available.
alecf checked in a change this afternoon which broke this. To fix, just add
necko2 to the list of REQUIRES in xpfe/components/build/makefile.win after
applying. (and Makefile.in for unix builds)
Bradley,

my apologies for all the delays - I finally got an optimized build with your 
patch running on Japanese W2K. Seems like your patch breaks the HTML view for 
non-Latin-1 file and directory names. The functionality is still preserved, 
however file and folder names appear in the wrong encoding, which renders them 
unreadable. File and directory hierarchies can still be navigated, but the 
usability is severely impacted - the extent of the problem resembles what I 
have seen for ftp the other day.

Hope this helps a bit - perhaps you could still get the XUL view checked in 
before the 095 freeze? I and I'm sure the rest of the i18n team would be happy 
to help to address the remaining issues with the patch, although it might not 
happen in the M095 time frame.
OK, great. What I'll do is check this in, but disable the html stuff for file
urls, and file a new bug. If it ends up being a simple fix, then I'll still try
to get it in for 0.9.5.

jbetak: one more test, and if it doesn't work you don't need to attach a
screenshot. In netwerk/base/src/nsDirectoryIndexStream.cpp arround line 76 with
my patch applied is a #define for THREADSAFE_I18N. What happens if you uncomment
that line and then test in the html mode? If that works, then I know what the
problem is, and can probably work arround the other problem.  Also, what happens
if you change the charset encoding via the View menu? I am surprised that it
worked if it didn't display the names correctly, though, but I won't complain :)
Thank you very much for doing this testing for me.
Mass change: Bug 78148 has been checked in, and so it and these dependancies
should have been fixed. If you disagree, please reopen.

Note that the xul view is still enabled for file because of i18n problems. The
bug to fix this is bug 102812.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: