Closed Bug 309296 Opened 19 years ago Closed 18 years ago

RFE: directory listing for jar:

Categories

(Core :: Networking, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Future

People

(Reporter: sync2d, Assigned: Waldo)

References

Details

Attachments

(2 files, 7 obsolete files)

When I browse file:///c:/some-dir/, mozilla shows contents of the directory.
When I browse jar:some-archive.jar!/some-dir/, mozilla does nothing.
It's a bit convenient if I can browse contents of the
zip files in the same way as the local file system.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Agreed.  This would be really cool to have.  Maybe for Gecko 1.9.
Keywords: helpwanted
Target Milestone: --- → Future
Related to/duplicate of bug 132008 (and bug 64286)?
I've been poking around in the code that does directory listings, so I'll take this and poke at it for a while.  Without this, working with jar: URIs for zip files is basically impossible for anything but the simplest zip files, especially in combination with the lack of error pages when you attempt to open a file that doesn't exist in zip.  (I don't think the last has been filed as a bug yet.)

And regarding comment 2, no, it's not particularly related to either of those bugs.
Assignee: darin → jwalden+bmo
Jeff, have you had a chance to poke at this?  This has annoyed me many a time too; I'd love to see it fixed.
(In reply to comment #4)

I started poking at it about a week ago, and I've been doing so off and on in my spare time since then.  I've mostly been looking at the zip-reading code so far, because it doesn't yet support directories.  I'm currently trying to decide how to create implicitly-defined directories (e.g., foo/ in a zip containing a single entry for foo/bar.txt) in a way that isn't too slow so that dealing with a typical zip like browser.jar that contains a lot of files in a few directories is reasonably fast.  I'm guessing I might have a patch to support reading and non-recursively extracting directories in around a week and a half.  I haven't looked as much at the networking code, but I expect a patch for that could come a week or two afterward, depending on how my free time pans out over that period.
Status: NEW → ASSIGNED
Attached patch Complete patch (obsolete) — Splinter Review
Things went quicker than I expected, and I finished both directory support in zip files and directory listings in this patch.  I'm not as familiar with C++ XPCOM patterns as with JavaScript patterns, tho, so there's probably something in there that's not quite right.  Here are some things I think are worth pointing out before review, including further explanation of some of the changes when documenting in the patch didn't seem to make sense.

Index: mozilla/modules/libjar/nsJAR.cpp
-NS_IMPL_ISUPPORTS1(nsJARItem, nsIZipEntry)
+NS_IMPL_THREADSAFE_ISUPPORTS1(nsJARItem, nsIZipEntry)

Input streams are read on a separate thread from the one that created the nsIZipEntry, so without this I get assertion spew.  Given that the interface contains only readonly attributes and doesn't change any of the underlying data, I think this is safe; let me know what needs to be done to fix it if I'm wrong.

Index: mozilla/modules/libjar/nsJAR.cpp
 nsJARItem::GetCompression(PRUint16 *aCompression)
 {
     if (!aCompression)
         return NS_ERROR_NULL_POINTER;
-    if (!mZipItem->compression)
-        return NS_ERROR_FAILURE;
 
     *aCompression = mZipItem->compression;
     return NS_OK;
 }

The compression types for zip entries include the numbers 0 through 10 in modules/libjar/appnote.txt, with 0 defined as STORED (no compression), so throwing here made no sense.  This same pattern also happens in Get(Real)Size, and I removed it there to accommodate zero-sized files and directories.  The crc32 for zero-sized files and directories created by the zip utility is 0 (and for synthetic directories, too), so GetCrc32 was similarly changed.

Index: mozilla/modules/libjar/nsJARChannel.cpp
+        nsJAR* jar = (nsJAR*) mJarReader.get();
+        rv = jar->GetDirectoryInputStream(mJarEntry.get(),
+                                          getter_AddRefs(mJarStream));

Is this the correct way to get a pointer for the underlying class to call a non-XPCOM method?

Index: mozilla/modules/libjar/nsJARDirectoryInputStream.cpp
+ * Sorting function which sorts nsIZipEntry items into alphabetical order.

I happen to think directories should be displayed before files, but the equivalent to this function in netwerk/base/src/nsDirectoryIndexStream.cpp doesn't, so this function doesn't either.

Index: mozilla/modules/libjar/nsJARDirectoryInputStream.cpp

This file was adapted from the latest version of nsJARInputStream.cpp, so it contains all the fixes to that file that occurred as part of bug 318193.

Index: mozilla/netwerk/streamconv/converters/nsIndexedToHTML.cpp
     else {
         // escape as relative
-        escFlags = esc_Forced | esc_OnlyASCII | esc_AlwaysCopy | esc_FileBaseName | esc_Colon;
+        escFlags = esc_Forced | esc_OnlyASCII | esc_AlwaysCopy | esc_FileBaseName | esc_Colon | esc_Directory;
     }

esc_Directory is needed here because otherwise the links to directories in zip directory listings will include an escaped trailing slash.

I'm not in a rush to see this fixed by yesterday, so take your time reviewing the patch if you need it (but don't take /too/ much, please :-) ).
Attachment #207957 - Flags: review?(darin)
A couple followup comments:

http://lxr.mozilla.org/mozilla/source/xpinstall/src/nsSoftwareUpdateRun.cpp#123
The previously posted patch contains changes toChanged to skip entries for directories which aren't explicitly in the zip; my testing consisted solely of verifying that the Google toolbar extension showed up as signed when I tried to install it.  (To be honest, I'm not entirely certain that test is even meaningful, so it's possible the fix isn't adequate.)  This change is the single reason I added the isSynthetic attribute to nsIZipItem, because I don't think there's any other way to do this completely correctly.

http://lxr.mozilla.org/mozilla/source/xpinstall/src/nsInstall.cpp#2814
This should be changed in a future bug to only get directories with a simpler pattern (dir+"/*/"); the ugliness they had to go through there doesn't need to happen any more now if directories only present in item paths are recognized.

That's all the callers of findEntries in source that look like they need fixing.  (The toolkit extension manager already seems to think we recognize directories, so it didn't need fixing.)  I haven't yet looked at whether any extract callers need fixing.

As a final treat, those who feel like applying the patch can enjoy browsing through the following zip-in-a-zip of a freely-available book on C++:

jar:jar:http://www.pythoncriticalmass.com/downloads/TICPP-2nd-ed-Vol-one.zip!/TICPP-2nd-ed-Vol-one-html.zip!/
Keywords: helpwanted
Changes in response to some comments by timeless:
-doxygen comments are in /** \n * foo \n */ format (because doxygen can't
 read the single-line version, which I'm pretty sure is valid in the Java
 world)
-updated docs on the findEntries pattern to clarify what needs escaping and
 when; also explicitly noted that some patterns (mostly edge cases) have
 undefined behaviors, because the parser's very far from bulletproof --
 docs also updated in nsWildCard.h
-removed a bunch of ifdef'd NS_ERROR_* defines, which I only added to the
 original patch at the last minute because they looked like something that
 should be added;  lxr for "nsIFile_h" suggests that include'ing nsIFile.h
 may also do the trick when it's needed (is it?  I have no idea)
-checked for error on localFile->Create() -- an accidental omission on my
 part
-moved nsZipArchive's FindInit retval into an out param (and fixed callers)
 to get useful return values from it for the regexps that can be detected
 as invalid without trying to actually match them -- getting
 NS_ERROR_OUT_OF_MEMORY for jar.findEntries("*[*") is too wrong to bear
-change test of returned pointer from |new nsJAR(Directory)?InputStream()|
 returning NS_ERROR_FAILURE on null to NS_ERROR_OUT_OF_MEMORY (neither
 ctor does anything beyond initializing a couple simple member variables)
-lowercase a "Then" that shouldn't have been uppercased in comments
-change regexp parser to work correctly with "[\\]ab]" (handled by removing
 |++x| from the init of a |for| statement) and the case of an escaped ]
 inside a character class in general
Attachment #207957 - Attachment is obsolete: true
Attachment #207982 - Flags: review?(darin)
Attachment #207957 - Flags: review?(darin)
eh? doxygen does support +    /** The name (including path) of the entry. */
Comment on attachment 207982 [details] [diff] [review]
Updated for a few comments by timeless

>Index: mozilla/modules/libjar/nsJARDirectoryInputStream.cpp

>+    // We can get aDir's contents as nsIZipEntry items via FindEntries
>+    // with the following pattern (see nsIZipReader.findEntries docs):
>+    //
>+    //   dir + "?*~" + dir + "?*/?*"
>+    nsDependentCString dirName(aDir);
>+    mDirNameLen = dirName.Length();
>+    nsCAutoString pattern = dirName + NS_LITERAL_CSTRING("?*~") +
>+                            dirName + NS_LITERAL_CSTRING("?*/?*");
>+
>+    nsCOMPtr<nsISimpleEnumerator> dirEnum;
>+    rv = aJAR->FindEntries(pattern.get(), getter_AddRefs(dirEnum));
>+    if (NS_FAILED(rv)) return rv;

This completely ignores the possibility that aDir contains any of the special characters mentioned in the docs for nsIZipReader.findEntries; I intend to update the patch to fix that by Wednesday.  The rest of the patch should still be fine, tho.
Changes:
-slight tweak to findEntries docs on pattern syntax
-iterate over dirname and copy chars into new string, escaping when needed, so
 directories containing special chars work
-this time built on Windows instead of Linux
Attachment #207982 - Attachment is obsolete: true
Attachment #208094 - Flags: review?(darin)
Attachment #207982 - Flags: review?(darin)
On my first pass through the patch I didn't see any major problems.  Looks very solid.  I have a few nits, but nothing major.  I'll try to post complete review comments as soon as possible.  Thanks for the awesome work on this patch!
Attached patch More comments by timeless (obsolete) — Splinter Review
Changes:
-precompute mBuffer.Length() before loops where mBuffer's length doesn't change
-don't leak old mJAR in nsJAR(Directory)?InputStream (or maybe require that it
 only be init'd once?  I'm unsure whether that's what was originally intended
 or not, but I think it's worth considering)

By the way, "deled" is a synonym for "deleted", and it's used in the patch purely by personal preference because it's a much cooler word.  ;-)
Attachment #208094 - Attachment is obsolete: true
Attachment #208166 - Flags: review?(darin)
Attachment #208094 - Flags: review?(darin)
Your patch removes the too-loose check on sane values for the realSize of zip entries (yay!). So I'm having that bug depend on this one for now (as it's not technically a dupe).

You may want to look at bug 321730 comment 2 though. Is there no way the realSize can be null or negative or something equally stupid? :-)
Blocks: 321730
(In reply to comment #14)
> Is there no way the realSize can be null or negative or something equally
> stupid? :-)

The realsize property of nsZipItem is a PRUint32 and thus can't be negative.  As for its value, it's set either directly from the item's entry within the zip or within the code to process entry-less directories within zips; the first case will use what the zip says is the size, and the second case will set it to 0.  The only case of stupidness I see would be if the zip contained a wrong value in either size column, but that won't matter except during extraction (in which case an appropriate error is thrown from nsZipArchive::InflateItem).  If users absolutely required totally accurate sizes we could add size verification during zip processing, but it'd be prohibitively slow, and no user is likely to care that much.
Changes:
-make nsIZipReader.test() work on synthetic entries -- before, I think it would
 only have worked on a zip file which starts with a zip entry, which isn't a
 strict requirement of the zip format -- e.g., self-extracting zips are valid,
 but I tried testing a synthetic entry in one without this change and it failed
-correct docs for crc32 to not say that accessing the crc32 of a synthetic is
 bad -- it would have been with a GetCrc32 nullcheck, but it's harmless now

I looked through the nsIZipReader.extract() callers, and I don't think any of them will need any fixup, as they already seem to work around the current mostly-non-support for directories and their extraction.

I also happened to notice that according to the http-index-format specs at <http://www.mozilla.org/projects/netlib/dirindexformat.html>, lines must be terminated by a carriage return followed by a linefeed.  (I sort of skimmed over that requirement when reading the spec the first time.)  I don't think there's a single dirindex creator in the tree that does this :-\ , but if it's desired that we start doing so it's easy to fix.
Attachment #208166 - Attachment is obsolete: true
Attachment #208415 - Flags: review?(darin)
Attachment #208166 - Flags: review?(darin)
Comment on attachment 208415 [details] [diff] [review]
Fix nsIZipReader.test() on synthetic entries

>+     *       special characters unless otherwise specified.
>+     *   o All other characters match case-sensitively.
>+     *
>+     *                  An aPattern not conforming to this syntax has undefined
>+     *                  behavior.
>+     * @throws NS_ERROR_ILLEGAL_VALUE on many but not all invalid aPattern

nit: The indentation of the "An aPattern..." bit is odd.



>Index: mozilla/modules/libjar/nsJAR.cpp

> #include "nsILocalFile.h"
...
>+#include "nsIFile.h"

If nsILocalFile.h is included, then you don't need to include nsIFile.h b/c
nsILocalFile extends nsIFile.


>+  if (item->isDirectory)
>+  {
>+    rv = localFile->Create(nsIFile::DIRECTORY_TYPE, item->mode);
>+    if (NS_FAILED(rv)) return rv;
>+    //XXX make this work recursively?
>+    //XXX do this in nsZipArchive?  but then we need to work with
>+    //XXX a PRDir, and there's no way to get a PRDir* from localFile

I don't understand these comments.  What needs to work recursively?
nsIFile::create does create all parent nodes before creating the
specified node (in case that's what you were referring to).


>+    rv = localFile->OpenNSPRFileDesc(PR_WRONLY | PR_CREATE_FILE, item->mode, &fd);
>+    if (NS_FAILED(rv)) return NS_ERROR_FILE_ACCESS_DENIED;

How do you know to translate |rv| to NS_ERROR_FILE_ACCESS_DENIED?
Why not just return |rv|?


>+  nsZipFind *find;
>+  PRInt32 rv = mZip.FindInit(aPattern, &find);
>   if (!find)
>+    return ziperr2nsresult(rv);

Perhaps you should test |rv| instead of |find| since you do not
initialize |find| to null.  For example, this could cause problems
if FindInit returned early with an error and never set |find| to
null.


>+  if (NS_FAILED(rv)) {
>+    NS_RELEASE(*result);
>+    return NS_ERROR_FAILURE;
>+  }

Why don't you just return |rv| here?  Why mask the real error?


>Index: mozilla/modules/libjar/nsJARDirectoryInputStream.cpp

>+/**
>+ * Sorting function which sorts nsIZipEntry items into alphabetical order.
>+ */
>+static int PR_CALLBACK compare(const void* aElement1,
>+                               const void* aElement2,
>+                               void* aData)
>+{
>+    nsIZipEntry* a = (nsIZipEntry*)aElement1;
>+    nsIZipEntry* b = (nsIZipEntry*)aElement2;
>+
>+    nsXPIDLCString name1, name2;
>+    a->GetName(getter_Copies(name1));
>+    b->GetName(getter_Copies(name2));
>+
>+    return Compare(name1, name2);
>+}

You should probably put a "XXX i18n" warning in here since this comparison
is only good for ASCII names.


>+            entryName = Substring(entryName, mDirNameLen);

Can you use .Cut instead here?


>+            nsDependentCString itemLastModTime(lmtBuf);

What is the advantage to creating this temporary object?  Why
not just use mBuffer.Append(lmtBuf) directly?


>+            if (itemIsDir)
>+                mBuffer.AppendLiteral(" DIRECTORY\n");
>+            else
>+                mBuffer.AppendLiteral(" FILE\n");

nit: I'm a fan of using brackets "{..}" when coding an "if..else" clause.
For single line "if.." clauses, it's fine to leave out brackets, but I 
think they help readability when an "else" is involved ;-)  Up to you though.


>+    NS_ADDREF(aJAR);
>+    NS_IF_RELEASE(mJAR);
>+    mJAR = aJAR;

Why don't you just make mJAR be a nsRefPtr<nsJAR>?  Then you would not need
to code such things.  You could just do "mJAR = aJAR" and it would amount to
the exact same code.


>+    // Don't assert if !fd, because it's possible that aJAR comes
>+    // from a cache and aJAR's file has been deled
>+    PRFileDesc* fd = aJAR->OpenFile();
>+    if (!fd)
>+        return NS_ERROR_UNEXPECTED;
...
>+        rv = aJAR->GetEntry(aDir, getter_AddRefs(ze));
>+        if (NS_FAILED(rv)) return rv;

Doesn't this leak |fd|?  You need to be very careful not to leave file
descriptors open forever.  Hmm... in fact, I don't see this |fd| variable
being used anywhere else in this function.  Am I missing something?  Or,
can you do without this call to OpenFile (or something like that)?


>+        rv = ze->GetIsDirectory(&isDir);
>+        if (NS_FAILED(rv)) return rv;
>+
>+        if (!isDir)
>+            return NS_ERROR_ILLEGAL_VALUE;

Same here.


>+    nsDependentCString::const_iterator curr, end;
>+    dirName.BeginReading(curr);
>+    dirName.EndReading(end);

The same could be written:

      const char *curr = dirName.BeginReading();
      const char *end = dirName.EndReading();

Personally, I find that a lot more readable.  The string iterator stuff
is left-over from the days when strings could be multi-fragment.  Those
days are long gone now, so I prefer the much simpler character pointers
for "iterators."


>+                NS_ADDREF(entryPtr);
>+                mArray.AppendElement(entryPtr);

You might want to use nsCOMArray<nsIZipEntry> instead for the type of
mArray.


>+    // clear out the array of nsIZipEntry items in the directory
>+    for (PRInt32 i = 0; i < mArray.Count(); i++) {
>+        nsIZipEntry* item = (nsIZipEntry*) mArray.ElementAt(i);
>+        NS_RELEASE(item);
>+    }

Stuff like this happens automagically when you use nsCOMArray.


>Index: mozilla/modules/libjar/nsJARInputStream.cpp

>+  NS_ADDREF(aJAR);
>+  NS_IF_RELEASE(mJAR);
>   mJAR = aJAR;

Same thing here.  Let's use nsRefPtr<nsJAR> :)


>Index: mozilla/modules/libjar/nsZipArchive.cpp

>+  nsZipFind* find;
>+  zip->FindInit(pattern, &find);
>+  return find;
> }

Shouldn't this test for errors returned from FindInit?


>+        else if (0 == PL_strcmp(item->name, zi->name))

Can this use a raw strcmp, or do you need to worry about either
of those inputs being null?  If you do not need to worry about
either being null, then just call strcmp directly.


>Index: mozilla/netwerk/streamconv/converters/nsIndexedToHTML.cpp

>+        if ('/' != path.Last() ||
>+            '!' != path.CharAt(path.Length() - 2)) {

You can also do |StringsEndsWith(path, NS_LITERAL_CSTRING("!/"))|,
which might a bit more readable.


>     else {
>         // escape as relative
>-        escFlags = esc_Forced | esc_OnlyASCII | esc_AlwaysCopy | esc_FileBaseName | esc_Colon;
>+        escFlags = esc_Forced | esc_OnlyASCII | esc_AlwaysCopy | esc_FileBaseName | esc_Colon | esc_Directory;

Why this change?
Attachment #208415 - Flags: review?(darin) → review-
> I don't understand these comments.  What needs to work recursively?

The idea was that if you tried extracting a directory from within a zip, it would extract the directory and then extract everything within the directory, doing so until every descendant of the directory was extracted.  I'm no longer convinced this would be a good idea, because it could break if you attempt to extract entries via the enumerator returned by findEntries("*"); in the unlikely case that foo/bar.txt occurred before foo/ in the enumeration, extracting foo/ would attempt to non-recursively dele a directory with contents and would fail.

> How do you know to translate |rv| to NS_ERROR_FILE_ACCESS_DENIED?
> Why not just return |rv|?

> Why don't you just return |rv| here?  Why mask the real error?

I hadn't looked closely enough during editing to notice these in the original file; both are now fixed.

> >+  nsZipFind *find;
> >+  PRInt32 rv = mZip.FindInit(aPattern, &find);
> >   if (!find)
> >+    return ziperr2nsresult(rv);

> >+  nsZipFind* find;
> >+  zip->FindInit(pattern, &find);
> >+  return find;
> > }

> Shouldn't [these] test for errors returned from FindInit?

The documentation in nsZipArchive.h (correctly) says in case of an error, the out pointer will be set to null.  Nevertheless, I changed these to both test |rv| and in the latter do |find = NULL;| if an error happened.

> You should probably put a "XXX i18n" warning in here since this comparison
> is only good for ASCII names.

Names of entries in zips don't have any sort of defined charset and thus can't be reliably alphabetized no matter how you split it, so I added, "XXX not i18n, but names in zips have no defined charset, so we can't win".

> Can you use .Cut instead here?

Sure; the one line of documentation of its existence in the XPCOM String Guide wasn't enough for me to notice it or determine that it was what I needed.

> What is the advantage to creating this temporary object?  Why
> not just use mBuffer.Append(lmtBuf) directly?

None; removed.

> >+            if (itemIsDir)
> >+                mBuffer.AppendLiteral(" DIRECTORY\n");
> >+            else
> >+                mBuffer.AppendLiteral(" FILE\n");
> 
> nit: I'm a fan of using brackets "{..}" when coding an "if..else" clause.
> For single line "if.." clauses, it's fine to leave out brackets, but I 
> think they help readability when an "else" is involved ;-)  Up to you though.

I moved this |if| further up into the Type section, so that there's total separation between metadata retrieval and 201: line output to the buffer.

> Why don't you just make mJAR be a nsRefPtr<nsJAR>?  Then you would not need
> to code such things.  You could just do "mJAR = aJAR" and it would amount to
> the exact same code.

I wasn't aware of or familiar with nsRefPtr; I took a look at its implementation and some of its uses, and it's pretty nifty.  Both of these cases are now nsRefPtr<nsJAR>s.

> >+    PRFileDesc* fd = aJAR->OpenFile();
> >+    if (!fd)
> >+        return NS_ERROR_UNEXPECTED;

This is indeed useless (and now removed).  I also checked through nsJARInputStream for hanging |fd|, and its |fd| is immediately passed to a mReadInfo of type nsZipReadState which takes ownership of the descriptor, so it'll correctly go away when the stream's deleted.  Close()ing the stream won't close the descriptor, but I don't think this is a problem.

> The same could be written:
> 
>       const char *curr = dirName.BeginReading();
>       const char *end = dirName.EndReading();
> 
> Personally, I find that a lot more readable.  The string iterator stuff
> is left-over from the days when strings could be multi-fragment.  Those
> days are long gone now, so I prefer the much simpler character pointers
> for "iterators."

This wasn't in the string guide, so I didn't know about it.  Sometime in the next week or so I'll update the string guide based on what you've mentioned in your comments; would you mind giving the changes a once-over after I make them?

> >+                NS_ADDREF(entryPtr);
> >+                mArray.AppendElement(entryPtr);
> 
> You might want to use nsCOMArray<nsIZipEntry> instead for the type of
> mArray.

This code is modeled off of nsDirectoryIndexStream, which used nsVoidArray to do this, and I wasn't familiar with nsCOMArray.  I also filed bug 324815 to change nsDirectoryIndexStream over to nsCOMArray.

> >+        else if (0 == PL_strcmp(item->name, zi->name))

They're null-checked on allocation, so neither can be null; it's a strcmp now.

> >+        if ('/' != path.Last() ||
> >+            '!' != path.CharAt(path.Length() - 2)) {
> 
> You can also do |StringsEndsWith(path, NS_LITERAL_CSTRING("!/"))|,
> which might a bit more readable.

This wasn't in the string guide either (and I didn't stumble upon it in the string header files I scanned through); switch made.  I also added a comment that this won't work correctly when the directory name ends with "!", but that's also a problem with the protocol itself and thus reasonably acceptable.

> >     else {
> >         // escape as relative
> >-        escFlags = esc_Forced | esc_OnlyASCII | esc_AlwaysCopy | esc_FileBaseName | esc_Colon;
> >+        escFlags = esc_Forced | esc_OnlyASCII | esc_AlwaysCopy | esc_FileBaseName | esc_Colon | esc_Directory;
> 
> Why this change?

I explained this in comment 6 :-) :

"esc_Directory is needed here because otherwise the links to directories in zip
directory listings will include an escaped trailing slash."

I didn't think documenting the change there made sense initially, but this version of the patch includes a comment on why |esc_Directory| is needed.

Finally, there's one last additional bugfix made to nsZipArchive::FindNext to account for how a synthetic entry for a directory is sometimes created and then "overridden" by a real entry; here's one zip which demonstrates the bug:

http://whereswalden.com/files/mozilla/testcases/zip-doubled-directory.zip

The explanatory comments are fairly extensive, so I'm not going to bother explaining it here.  Incidentally, I think it's fairly rare to find a zip which will trigger this bug, because most utilities either use no explicit directories (e.g., WinZip and the Windows built-in zip utilities) or create directories before their contents when zipping recursively (e.g., Info-ZIP).  Only in the case where a file would be zipped before its parent directory would this problem appear, and I think it would almost have to be done intentionally for it to happen.
Attachment #208415 - Attachment is obsolete: true
Attachment #209731 - Flags: review?(darin)
> would you mind giving the changes a once-over after I make them?

Yes, gladly.  Thanks for offering to help revise the string guide!
Comment on attachment 209731 [details] [diff] [review]
Updated to comments, plus a bugfix

r=darin
Attachment #209731 - Flags: review?(darin) → review+
Attachment #209731 - Flags: superreview?(bzbarsky)
It'll likely take me at least a week and a half to get to this; I have other large reviews I need to do first.
Comment on attachment 209731 [details] [diff] [review]
Updated to comments, plus a bugfix

>Index: mozilla/modules/libjar/nsIZipReader.idl
>@@ -77,37 +118,75 @@ interface nsIZipReader : 
>+     *   o ~ followed by another shell expression will remove any pattern
>+     *       matching the shell expression from the match list.

I don't follow.  What's a "shell expression"?

>+     *   o (foo|bar) will match either the pattern foo or the pattern bar.  A
>+     *               pattern containing a pattern of this type has undefined
>+     *               behavior.

Not sure what this last sentence means....

>Index: mozilla/modules/libjar/nsJAR.cpp
>+  //XXX If we guarantee that the error is always FILE_DIR_NOT_EMPTY,
>+  //XXX we can remove |rv == NS_ERROR_FAILURE| - bug 322183 needs to be
>+  //XXX completely fixed before that can happen

File a followup bug (depending on bug 322183) to fix this, and put that bug# in this comment?

> nsJAR::FindEntries(const char *aPattern, nsISimpleEnumerator **result)

>+  PRInt32 rv = mZip.FindInit(aPattern, &find);

So this creates a new object...

>   nsISimpleEnumerator *zipEnum = new nsJAREnumerator(find);
>   if (!zipEnum)
>     return NS_ERROR_OUT_OF_MEMORY;

And we leak it there.  Want to fix that while you're here?  Followup bug if desired.

>+nsJAR::GetDirectoryInputStream(const char* aDir, nsIInputStream** result)
>+  // addref now so we can delete if the Init() fails
>+  *result = NS_STATIC_CAST(nsIInputStream*,jdis);

That should be "so that we can call Init()", no?

Also, you don't need the cast here.  Just assigning should work.

>+nsJARItem::GetIsDirectory(PRBool *aIsDirectory)
>+{
>+    if (!aIsDirectory)

That could be an assert instead, imo.  But either way.

>Index: mozilla/modules/libjar/nsJARChannel.cpp
>+    PRBool isDir = ( mJarEntry.IsEmpty() || '/' == mJarEntry.Last() );

It's too bad that mJarEntry is no a richer structure.  :(  Oh, well.

>+    if (isDir) {
>+        nsJAR* jar = (nsJAR*) mJarReader.get();

Why is this a safe cast?

Wouldn't it make more sense to have this logic in nsIZipReader::GetInputStream (that is, have it return a directory stream for a directory entry, and what it does now otherwise)?

>+        // check if we're displaying a directory
>+        // mJarEntry will be empty if we're trying to display
>+        // the topmost directory in a zip, e.g. jar:foo.zip!/
>+        if (len == 0 || '/' == fileName[len-1]) {

So would it make sense to use nsIZipEntry in this code?  Or would that not work?

If we _are_ going to have this check in several places in this code, then I suggest making it a macro or something... (checking IsEmpty() and Last()).

>Index: mozilla/modules/libjar/nsJARDirectoryInputStream.cpp
>+ * The Original Code is Netscape Communicator source code. 

It is?  I doubt that.  ;)

>+ * The Initial Developer of the Original Code is
>+ * Netscape Communications Corporation.

Nope.

>+ * Portions created by the Initial Developer are Copyright (C) 1999

2006?  Or 2005?

>+ * Contributor(s):
>+ *   Mitch Stoltz <mstoltz@netscape.com>

I doubt Mitch actually contributed to this file...  ;)

>+nsJARDirectoryInputStream::Read(char* buf, PRUint32 count, PRUint32 *bytesRead)

>+    // If the buffer contains anything, write that out first
>+    PRUint32 buflen = mBuffer.Length();
>+    while (mBufPos < buflen && count != 0) {
>+        *(buf++) = char(mBuffer.CharAt(mBufPos++));
>+        numRead++;
>+        count--;
>+    }

So I was thinking.. how about:

  PRUint32 writeLength = PR_MIN(count, mBuffer.Length() - mBufPos);
  if (writeLength > 0) {
    memcpy(buf, mBuffer.get() + mBufPos, writeLength);
    count -= writeLength;
    mBufPos += writeLength;
    buf += writeLength;
  }

Could even factor this out into a subroutine or macro, since this is used in two places?

Seems like this would be more efficient than copying a char at a time, no?

>+        while (count > (PRUint32)mBuffer.Length()) {
>+            // have we consumed all the directory contents?
>+            if (arrayLen <= mArrPos)

  while (count > mBuffer.Length() && mArrPos < arrayLen) {

(note no need for the PRUint32 cast there, right?).

>+        // Now, write up to the desired amount of content to the buffer

Same here as the first place you do this.

>+    // We can get aDir's contents as nsIZipEntry items via FindEntries
>+    // with the following pattern (see nsIZipReader.findEntries docs)
>+    // assuming dirName is properly escaped:
>+    //
>+    //   dirName + "?*~" + dirName + "?*/?*"

I guess I need to understand what "~" does to grok this....

>+                nsIZipEntry* entryPtr = ze;

Why?  Why not just AppendObject(ze)?

>+    // Get the URL of the zip directory being displayed
>+    nsCOMPtr<nsIFile> file;
>+    rv = aJAR->GetFile(getter_AddRefs(file));
>+    if (NS_FAILED(rv)) return rv;

Won't this point to a file on the local machine?  And expose this information to web content?  That seems suboptimal.

>Index: mozilla/modules/libjar/nsJARDirectoryInputStream.h

Same comments on the license here as for the .cpp

>+#define NS_JARDIRECTORYINPUTSTREAM_CID \
>+   {0x06665122, 0x2b57, 0x475a, \
>+     {0xaf, 0xb7, 0x5f, 0x20, 0x9f, 0x2c, 0x85, 0xc3}}

Why do you need this?  I don't see this (or the NS_DECLARE_STATIC_CID_ACCESSOR thing) being used for anything...  Did I miss something?

>Index: mozilla/modules/libjar/nsJARInputStream.cpp
> nsJARInputStream::~nsJARInputStream()
> {
>-  Close();

Why?

>Index: mozilla/modules/libjar/nsZipArchive.cpp
>+nsZipArchive::FindInit(const char * aPattern, nsZipFind **aFind)
>+  *aFind = new nsZipFind(this, pattern, regExp);
>+  if (!aFind)

s/aFind/*aFind/ , right?

>@@ -1065,23 +1116,140 @@ PRInt32 nsZipArchive::BuildFileList(PRFi
>+      for (nsZipItem* zi = mFiles[hash]; ; zi = zi->next)
>+      {
>+        if (!zi)

Why not just put |zi| in the test in the for loop?

>+#ifndef STANDALONE
>+      // Arena allocate the nsZipItem

Isn't this a copy/paste of code above?  Worth a subroutine?

>Index: mozilla/modules/libjar/nsZipArchive.h
>+  PRBool      isDirectory;
>+  PRBool      isSynthetic;      /* whether item is 

PRPackedBool for both?

>Index: mozilla/netwerk/streamconv/converters/nsIndexedToHTML.cpp
>+    } else if (NS_SUCCEEDED(uri->SchemeIs("jar", &isScheme)) && isScheme) {

Ugh.  But I can't think of a better solution.  :(

sr=bzbarsky with the nits picked (though I'd like to see a patch with the ~ thing explained better.
Attachment #209731 - Flags: superreview?(bzbarsky) → superreview+
Attached patch Patch, v5 (probably final) (obsolete) — Splinter Review
I made enough non-trivial changes in this version that I'd like a sanity-check, please.  :-)  Most of the changes are minor, but in particular the use of a Create method on nsJARDirectoryInputStream and the use of references in CopyDataToBuffer are where I'd most like to be certain I'm doing things the preferred way.

(In reply to comment #22)
> I don't follow.  What's a "shell expression"?

It's an artifact of the original non-IDL pattern syntax docs.  I rewrote this to be clearer with a description of the behavior of "*~*z*" as a simple example.  Is the new comment better?

> >+     *   o (foo|bar) will match either the pattern foo or the pattern bar.  A
> >+     *               pattern containing a pattern of this type has undefined
> >+     *               behavior.
> 
> Not sure what this last sentence means....

They can't be nested.  Maybe it's not a point worth making; I removed it in this patch.

> >+  //XXX If we guarantee that the error is always FILE_DIR_NOT_EMPTY,
> >+  //XXX we can remove |rv == NS_ERROR_FAILURE| - bug 322183 needs to be
> >+  //XXX completely fixed before that can happen
> 
> File a followup bug (depending on bug 322183) to fix this, and put that bug# in
> this comment?

I'm not sure what the point of opening a separate bug is, because that one was to suffice for all instances of this problem.

> So this creates a new object...
> And we leak it there.  Want to fix that while you're here?  Followup bug if
> desired.

Bug 329071 filed.

> >+    if (isDir) {
> >+        nsJAR* jar = (nsJAR*) mJarReader.get();
> 
> Why is this a safe cast?
> 
> Wouldn't it make more sense to have this logic in nsIZipReader::GetInputStream
> (that is, have it return a directory stream for a directory entry, and what it
> does now otherwise)?

The problem is that this method means that using nsIZipReader.getInputStream and specifying a string that's a directory for the file name will return an application/http-index-format stream, which isn't how local file streams of directories work (an exception is thrown).  The cast was used to get a directory stream without also exposing such streams to clients who might try to get an input stream for a directory, when for consistency clients should not be able to get such input streams (and ideally would get the same error as file throws).

In any case, tho, there's a better way to do this -- add a static Create method to nsJARDirectoryInputStream and whenever an input stream for a directory is requested do all access to the zip file through that.  Also, because the zip isn't needed after it's enumerated, I removed the member reference to it from the directory stream and just let the enumeration happen during initialization.

> >+        // check if we're displaying a directory
> >+        // mJarEntry will be empty if we're trying to display
> >+        // the topmost directory in a zip, e.g. jar:foo.zip!/
> >+        if (len == 0 || '/' == fileName[len-1]) {
> 
> So would it make sense to use nsIZipEntry in this code?  Or would that not
> work?

It could be done, but it would require retrieving a new nsIZipEntry every time, and for the contenttype check we'd also have to reget an nsIZipReader.  Given the amount of work required, I think it makes more sense just to use the entry name to determine whether the URI is a directory.

> >Index: mozilla/modules/libjar/nsJARDirectoryInputStream.cpp
> >+ * The Original Code is Netscape Communicator source code. 

> >+ * The Initial Developer of the Original Code is
> >+ * Netscape Communications Corporation.

> >+ * Portions created by the Initial Developer are Copyright (C) 1999

This was because I thought I hadn't changed enough of the file to make it truly a new file; I did a diff between the original and the new and decided to reconsider. :-)

> So I was thinking.. how about:
> 
>   PRUint32 writeLength = PR_MIN(count, mBuffer.Length() - mBufPos);
>   if (writeLength > 0) {
>     memcpy(buf, mBuffer.get() + mBufPos, writeLength);
>     count -= writeLength;
>     mBufPos += writeLength;
>     buf += writeLength;
>   }
> 
> Could even factor this out into a subroutine or macro, since this is used in
> two places?

Sure, makes sense.  Any comments on the way this patch does it?

> >+    // Get the URL of the zip directory being displayed
> >+    nsCOMPtr<nsIFile> file;
> >+    rv = aJAR->GetFile(getter_AddRefs(file));
> >+    if (NS_FAILED(rv)) return rv;
> 
> Won't this point to a file on the local machine?  And expose this information
> to web content?  That seems suboptimal.

No, because the stream converter currently ignores the actual 300: line in the directory index file and uses the URI provided by the channel.  However, the problem can be avoided easily enough by passing in the zip's base URI and using that, so this patch does so.

> >+#ifndef STANDALONE
> >+      // Arena allocate the nsZipItem
> 
> Isn't this a copy/paste of code above?  Worth a subroutine?

Sure, makes sense.


On a final note, a preliminary version of this patch for some reason occasionally demonstrated problems displaying the top-level directory of <http://whereswalden.com/files/mozilla/testcases/zipdir/1.zip>.  The generated stream would be empty, nothing in nsJARDirectoryInputStream would run, and a forced reload of the zip file would "fix" the problem.  (You'd see an empty directory because accessing the content type of the channel would still return application/http-index-format.)  I think I might have solved this problem, but as I'm not sure what actually triggered it I could be wrong.  I tested this by loading copies of the file mentioned above a few hundred times via a script without finding any had loaded incorrectly, but that's not really sufficient to say that the problem's fixed.  If you see anything in the changes that might cause this to happen, let me know and I'll make some more effort to fix it (in a followup bug -- my free time is pretty limited).
Attachment #209731 - Attachment is obsolete: true
Attachment #215188 - Flags: superreview?(bzbarsky)
Attachment #215188 - Flags: review+
Blocks: 329071
Comment on attachment 215188 [details] [diff] [review]
Patch, v5 (probably final)

>Index: mozilla/modules/libjar/nsIZipReader.idl
>+     *   o pat~pat2 returns matches to the pattern 'pat' which do not also match
>+     *              the pattern 'pat2'.  This may be used to perform 

So patterns of the form (foo|bar) are allowed as pat and pat2, right?  But something with a ~ in it is not allowed as one of the alternatives in  a (|) ?

It would be good to spell that out.  That is, explicitly say what's allowed in pat and pat2 here and what's allowed in foo/bar in the (foo|bar) construct.

But yes, the comments here are much clearer now.  Thanks!

>Index: mozilla/modules/libjar/nsJAR.cpp

>I'm not sure what the point of opening a separate bug is

The point is that a fix for bug 322183 isn't likely to actually fix the nsJAR code (note that your patch there doesn't fix it).  I want a separate bug to track fixing the nsJAR code once bug 322183 is fixed, so we don't get stuck with yet more stale XXX comments in our code.

>Index: mozilla/modules/libjar/nsJARDirectoryInputStream.cpp
>+nsJARDirectoryInputStream::Read(char* buf, PRUint32 count, PRUint32 

>+    PRUint32 numRead = 0;
>+
>+    // If the buffer contains data, copy what's there up to the desired amount
>+    numRead += CopyDataToBuffer(buf, count);

Why not just:

  PRUint32 numRead = CopyDataToBuffer(buf, count);

?

>+nsJARDirectoryInputStream::Init(nsIZipReader* aZip,
>+    jarDirSpec = NS_LITERAL_CSTRING("jar:") + jarURLSpec +
>+                 NS_LITERAL_CSTRING("!/") + dirName;

This duplicates code we have in nsJARChannel for building such specs, no?  I'd prefer if we just passed the right thing here to start with....

>+nsJARDirectoryInputStream::CopyDataToBuffer(char* &aBuffer, PRUint32 &aCount)

>+    PRUint32 numRead = 0;

So we always have numRead == writeLength when this method returns, right?  Why not just drop this numRead variable and returns writeLength?

sr=bzbarsky.  Looks great!
Attachment #215188 - Flags: superreview?(bzbarsky) → superreview+
(In reply to comment #24)
> The point is that a fix for bug 322183 isn't likely to actually fix the nsJAR
> code (note that your patch there doesn't fix it).  I want a separate bug to
> track fixing the nsJAR code once bug 322183 is fixed, so we don't get stuck
> with yet more stale XXX comments in our code.

Oh, I see -- I filed it as bug 332139 and mentioned in in the comment.

> >+nsJARDirectoryInputStream::Init(nsIZipReader* aZip,
> >+    jarDirSpec = NS_LITERAL_CSTRING("jar:") + jarURLSpec +
> >+                 NS_LITERAL_CSTRING("!/") + dirName;
> 
> This duplicates code we have in nsJARChannel for building such specs, no?  I'd
> prefer if we just passed the right thing here to start with....

You're right -- I hadn't noticed that mJarURI is exactly what I want here, so with that change.

Anyway, the patch is now checked in; I'll mark this FIXED once tinderboxen cycle.
Attachment #215188 - Attachment is obsolete: true
Tinderboxen looking good, marking FIXED.  Thanks to darin/bz for the reviews!
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Is there a particular reason why jar requires directories to end in /
(file and ftp don't; I don't know which other protocols generate listings)
(In reply to comment #27)
> Is there a particular reason why jar requires directories to end in /
> (file and ftp don't; I don't know which other protocols generate listings)

There's nothing in the zip specification preventing a file and a directory from having the same name if the final slash is ignored:

http://lxr.mozilla.org/mozilla/source/modules/libjar/appnote.txt
jar:http://whereswalden.com/files/mozilla/testcases/zipdir/same-name-file-dir.zip!/

Support for same-name files and directories varies (file-roller doesn't display both bar/ and bar, WinZip does, Info-ZIP at least reads such zips, Python will create them), but there's nothing explicitly preventing it in the zip specs.  It shouldn't be difficult to add code to direct jar:foo.zip!/bar to jar:foo.zip!/bar/ if bar/ exists and bar doesn't, but it hasn't been done yet.  Perhaps as part of bug 26496...
This solves two bugs caused by assuming directory names don't end in a /.
The first allows the xul tree to expand. The second one is just cosmetic.
Attachment #216775 - Flags: superreview?(jag)
Attachment #216775 - Flags: review?(jag)
Attachment #216775 - Flags: superreview?(jag)
Attachment #216775 - Flags: superreview+
Attachment #216775 - Flags: review?(jag)
Attachment #216775 - Flags: review+
Depends on: 448141
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: