Closed Bug 214672 Opened 20 years ago Closed 17 years ago

Further optimization and correctness improvements of libjar: streamlining nsJarInputStream

Categories

(Core :: Networking: File, defect, P3)

defect

Tracking

()

VERIFIED FIXED
mozilla1.9alpha1

People

(Reporter: alfredkayser, Assigned: darin.moz)

References

Details

(Keywords: perf)

Attachments

(1 file, 23 obsolete files)

170.25 KB, patch
darin.moz
: superreview+
Details | Diff | Splinter Review
Patch coming up.
I've been busy with further optimizing libjar,
specifically nsJarInputStream. 
The main gist of it, is that all nsJarInputStream specific 
code (and members) are moved from nsZipArchive and nsJar to
nsJarInputStream itself.
This reduces the size of the standalone version (the installer),
but also of libjar generally, because of the more logical 
decomposition.

Estimated code reduction about 8%, about 3K codesize (optimized).

Note, this is a followup on bug 203627
Blocks: 210564
Blocks: 198487
No longer blocks: 198487, 210564
Summary: Further optimization of libjar: streamlining nsJarInputStream → Further optimization and correctness improvements of libjar: streamlining nsJarInputStream
Blocks: 210564
Some code size change indications:

C:\mozilla\mozilla\modules\libjar>grep '^+' patch.txt |wc
    905    4101   31579

C:\mozilla\mozilla\modules\libjar>grep '^-' patch.txt |wc
   2118    8528   66506

This is a 'WIP' patch..., but just posted to keep it safe here
This version does:
* Move ReadState to JARInputStream
    (less code in ZipArchive, less function calls, less structures)
* Made JARInputStream more MT-safe, and more independent from nsJAR.
    - after initialisation, it doesn't refer to JAR or zipitem anymore.
    - all data it needs is in its own class structure
    - 4K buffer and Zs structure only allocated for 'inflate'.
    - several small optimisations
* Removed 'lock' from nsJAR::GetInputStream, as it is not needed anymore.
    (and it was not valid anyway, as it only protected the GetInputStream call 
    itself, but not the stream).
* Cleared up use/passing round of filedescriptors
    - nsZipArchive maintains a FD for directory and for Test/Extract
    - nsJARInputStreams maintains its own FD for concurrent reading
    - nsJAR doesn't have a FD anymore
* Changed nsJar::Extract to first test if item exists in JAR before creating
outfile.
* Replace PRInt32 ZIP errorcodes with nsresult
* Merged GetItem and GetFileItem, to just GetItem
* Removed FindFree, as a normal 'delete find' is enough.
* Merged OpenArchive and ExtractFile into ZIP_..., both then calling the
    withFileDesc versions (code-reduction)
* Changed ExtractFileTo... to ExtractItem:
    - takes nsZipItem pointer, and filepointer as arguments.    
    - when filedescriptor is zero, don't write, only extract and calculate CRC.
* Changed TestItem to use ExtractItem with zero filedescriptor for testing
(saves a lot of code)
* Bug 173428: changed chmod into fchmod (other warnings are non critical)
* Includes: ZipEntry patch (203627)
* Includes: GetModTime patch (201226)
* changed nsZipItem into a struct, saving the class overhead
* moved compression info to a 'DEFLATED' bit in the flags of nsZipItem, saved 1
byte per item (avg 200 items per jar, 20 jars: 4000 bytes saved)
* Merged name allocation into nsZipItem (just like nsCacheMetaDataElement), saving
  an malloc/delete per item
* CopyItemToDisk: pass only size and crc instead of item
* move SeekToItem out from CopyItemToDisk and InflateToDisk to InflateItem
* nsJar: re-use OpenFile to open file for both jar itself, and for JIS.
* nsJARInputStream: moved OpenFile call to nsJar, making it 'protected' there.
* Made kMagic only a member when standalone, saving 4 bytes per jar and per find.
* change kArenaBlockSize member into a define, only used a parameter to allocate
arena once
* nsZipFind: PRBool -> PRPackedBool and ordering reduced sized to 16 bytes (from
24?)
* changed all error checking to NS_ENSURE macro's for better debugging support
    (allthough this increases debugsize, but optimized should be the same)

Timings:
  Old/New exactly the same: average of 10 start/stops
  old dll: 98KB, new dll: 82KB (saved about 16KB of 98KB, about 16%)
  After changing to NS_ENSURE macro's on lot of places: dll: 96KB 
    (but standalone is still smaller!, and optimized will still be smaller)

Todo:
  - nsCAutoString versus nsXPIDLAutoString??? 
  - nsZipFind: merged 'pattern' into structure (like nsCacheMetaDataElement),
saving malloc/delete
  - JArInputStream: Move CRC checking on 'output' instead of 'input'.
    
Not included: 
    * Bug 173615: ZIP_GetItemCount added.
      instead: Make a function to just get ItemCount from a 'closed' zipfile.

Size measurements (DEBUG build, standard config)
                 Old      New2
jar50.dll:      98364    81980  about 16% reduction
standalonelib:  57624    43384  about 25% reduction
jar50.xpt         941      770  about 18% reduction
Total KB        33804    33720/33764/33712
InMem KB        74416    74400
PrivateKB       28969    28840 (about 130K difference)

Estimation for optimized build would be about 5K of codesize savings.
Blocks: 198487
Attachment #131240 - Attachment is obsolete: true
Attachment #132047 - Flags: review?(darin)
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.6alpha
getting merge conflicts on the trunk:

6 out of 40 hunks FAILED -- saving rejects to file nsJAR.cpp.rej
1 out of 6 hunks FAILED -- saving rejects to file nsJAR.h.rej
2 out of 37 hunks FAILED -- saving rejects to file nsZipArchive.cpp.rej
1 out of 13 hunks FAILED -- saving rejects to file nsZipArchive.h.rej

maybe it would be better to just do one patch for this bug and bug 203627.  i
don't mind reviewing it all in one pass.  if you want to go this route, then
just put the patch in one bug and mark the other bug as being blocked by the bug
with the patch.
This bug superseedes now bug 203627.
An all-including patch will be attached soon.
Note, also bug 220060 is verified to work with this patch.
Blocks: 203627
No longer depends on: 203627
Generating using
cvs diff -D 2003/09/20 -uw  rdf/chrome modules/libjar xpinstall
netwerk/protocol/jar > jar_patch.txt
Attachment #132047 - Attachment is obsolete: true
Attachment #132129 - Flags: review?(darin)
Attachment #132047 - Flags: review?(darin)
It is difficult to create clean patch, as the tree is moving a lot...
Attachment #132129 - Attachment is obsolete: true
Comment on attachment 132131 [details] [diff] [review]
Cleaner version, hopefully with less rejects.

>Index: rdf/chrome/build/nsChromeFactory.cpp

>-NS_IMPL_NSGETMODULE(nsChromeModule, components)
>+NS_IMPL_NSGETMODULE(nsChromeModule, components);

actually, the semicolons are not supposed to be there.


>Index: rdf/chrome/src/nsChromeProtocolHandler.cpp

> NS_IMPL_ISUPPORTS2(nsCachedChromeChannel,
>                    nsIChannel,
>-                   nsIRequest)
>+                   nsIRequest);

same here.


>-NS_IMPL_THREADSAFE_ISUPPORTS2(nsChromeProtocolHandler, nsIProtocolHandler, nsISupportsWeakReference)
>+NS_IMPL_THREADSAFE_ISUPPORTS2(nsChromeProtocolHandler, nsIProtocolHandler, nsISupportsWeakReference);

and here.  etc...



>Index: rdf/chrome/src/nsChromeRegistry.cpp

> nsresult nsChromeRegistry::GetFormSheetURL(nsACString& aURL)
> {
>-  aURL = mUseXBLForms ? "chrome://forms/skin/forms.css" : "resource://gre/res/platform-forms.css";
>+  aURL = mUseXBLForms ? "chrome://forms/skin/forms.css" : "resource:/res/platform-forms.css";

why this change?  this change seems unrelated to the subject of this bug...
merge conflicts perhaps?


>Index: modules/libjar/nsJAR.h

>-    nsZipArchive *mArchive; // pointer extracted from mFind for efficiency
>     nsZipFind    *mFind;
>-    nsZipItem    *mCurr;    // raw pointer to an nsZipItem owned by mArchive -- DON'T delete
>-    PRBool        mIsCurrStale;
>+    const char*     mCurr;    // pointer to an name owned by mArchive -- DON'T delete

nit: line up mCurr with other variable names.


>Index: modules/libjar/nsJARInputStream.cpp

>+    PR_ASSERT(mInflate);
>+    PR_ASSERT(aBuffer);
>+    PR_ASSERT(aBytesRead);

use NS_ASSERTION instead of PR_ASSERT.	the behavior of NS_ASSERTION 
can be controlled via the environment variable XPCOM_DEBUG_BREAK, which
makes NS_ASSERTION much nicer than PR_ASSERT.

>+            // asserting because while this rarely happens, you definitely
>+            // want to catch it in debug builds!
>+            PR_ASSERT(0);

use NS_NOTREACHED here.


>Index: modules/libjar/nsZipArchive.cpp

>+    PRUint16 namelen = (PRUint16)PL_strlen(name);
>     if (bufsize > namelen)
>-    {
>-        PL_strcpy(outbuf, item->name);
>-    }
>+        PL_strcpy(outbuf, name);

try to use |strlen| and |strcpy| here since they are better
optimized (single cpu instruction on x86).  however, beware
when making this change that PL_ versions of these functions
null-check their input arguments whereas the stdc versions
do not.  this rule applies throughout the mozilla codebase.


>+    if (0 == PL_strcmp(aEntryName, item->name))
>+      break; //-- found it

here too... if you know that neither aEntryName nor item->name can be
NULL, then use |strcmp| instead.

>Index: modules/libjar/zipstub.h

>-#elif defined(XP_WIN)
>+#elif defined(XP_WIN) || defined(XP_OS2_VACPP)
> #define PR_EXTERN(__type)       extern _declspec(dllexport) __type
> #define PR_PUBLIC_API(__type)   _declspec(dllexport) __type
> #else /* XP_UNIX */
>@@ -68,13 +68,20 @@
> #define PR_PUBLIC_API(__type)   __type 
> #endif
> 
>+#ifdef XP_OS2_VACPP
>+#define PR_CALLBACK _Optlink
>+#define PR_STATIC_CALLBACK(__x) static __x PR_CALLBACK
>+#else
> #define PR_CALLBACK
> #define PR_STATIC_CALLBACK(__x) static __x
>+#endif

are you sure about these changes for OS/2?  do you have the
means to test OS/2?


>Index: xpinstall/packager/packages-os2

> [spellcheck]
>-bin/components/myspell.dll
>-bin/components/spellchk.dll
>-bin/components/spellchecker.xpt
>-bin/components/myspell/*
>+bin\components\myspell.dll
>+bin\components\spellchk.dll
>+bin\components\spellchecker.xpt
>+bin\components\myspell\*

are you sure about this?  the rest of the slashes in this file are
forward slashes.  maybe it has something to do with how this script
will be interpreted.


>Index: xpinstall/packager/packages-win

do you really mean the changes in this file to be part of this patch?


>Index: xpinstall/res/content/institems.js

this change seems unrelated to this bug... should it be in another bug?


>Index: xpinstall/res/content/institems.xul

same question about these changes.


>Index: xpinstall/src/nsInstall.cpp

>+    nsString pattern(aJarSource + NS_LITERAL_STRING("/*"));

use nsAutoString here since the right hand side arguments might
fit into a 64 character (128 byte) buffer.


>+    PRInt32 prefix_length = aJarSource.Length()+1; // account for slash
> 
>-    if (paths == nsnull)
>-    {
>-        *aReturn = SaveError(nsInstall::OUT_OF_MEMORY);
>-        return NS_OK;
>-    }
>+    nsresult rv = mJarFileData->FindEntries(NS_LossyConvertUCS2toASCII(pattern).get(), 
>+                                            &jarEnum );

nit: why not use nsCOMPtr with getter_AddRefs for jarEnum?


>+        nsString fileName = NS_ConvertASCIItoUCS2(Substring(name, prefix_length, name.Length()-prefix_length));

NS_ConvertASCIItoUCS2 is actually a class which inherits from nsString.
you can use this fact to avoid a string copy here.

	  NS_ConvertASCIItoUCS2 fileName(Substring(name, prefix_length,
name.Length()-prefix_length));

>         nsString newJarSource = aJarSource;

	  nsAutoString ... just in case everything fits onto the stack.


>         nsString newSubDir;

same here.

>Index: xpinstall/src/nsSoftwareUpdate.cpp

>+                                         nsSoftwareUpdate::GetInstance);
>+NS_GENERIC_FACTORY_CONSTRUCTOR(nsInstallTrigger);
>+NS_GENERIC_FACTORY_CONSTRUCTOR(nsInstallVersion);
>+NS_GENERIC_FACTORY_CONSTRUCTOR(nsSoftwareUpdateNameSet);

no trailing semicolons.


>Index: xpinstall/wizard/libxpnet/src/nsSocket.cpp

we should probably get mkaply to review the OS/2 changes...


>Index: xpinstall/wizard/windows/setuprsc/setuprsc.rc

>-    CONTROL         108,IDC_STATIC,"Static",SS_BITMAP,11,11,83,162,
>+    CONTROL
>+#ifndef __MINGW32__
>+                    108,
>+#endif
>+                    IDC_STATIC,"Static",SS_BITMAP,11,11,83,162,

ok, i'm curious... what does the 108 do, and why does MingW not like it?


>Index: netwerk/protocol/jar/src/nsJARProtocolHandler.cpp

>+#include "nsCExternalHandlerService.h"

>-        mMimeService = do_GetService("@mozilla.org/mime;1");
>+        mMimeService = do_GetService(NS_MIMESERVICE_CONTRACTID);

not sure if this is merge conflicts or what... but this change
should not be reverted.  we don't want necko to have a compile
time dependency on exthandler.	the Makefile.in change should
also not happen.


a lot of great stuff in this patch... thanks!!

i still need to spend more time reviewing the guts of the libjar
changes.  it might be good for you to go ahead and fixup these nits,
and then generate a new full patch (don't strip out whitespace
changes).  i'd like to apply the patch and do some testing.

also, have you tested the standalone libjar build?
Attachment #132131 - Flags: review-
Thanks for the review, I'll start working on it.
Question, how to best test the standalone version?
that is a good question.  i don't have any experience testing it, but i believe
it is used by the stub installer (the small installer that you can use to
download the rest of mozilla).  cc'ing dveditz and alecf... they probably know
more about how to test the standalone libjar.
The previous patch was a mess, it included changes from other
recent changes...

With regard to darin's comments:
>Index: rdf/chrome/build/nsChromeFactory.cpp

>-NS_IMPL_NSGETMODULE(nsChromeModule, components)
>+NS_IMPL_NSGETMODULE(nsChromeModule, components);

actually, the semicolons are not supposed to be there.
** Oops, something went wrong in the cvs diff...

nit: line up mCurr with other variable names.
** DONE

>Index: modules/libjar/nsJARInputStream.cpp

>+    PR_ASSERT(mInflate);
>+    PR_ASSERT(aBuffer);
>+    PR_ASSERT(aBytesRead);

use NS_ASSERTION instead of PR_ASSERT.	the behavior of NS_ASSERTION 
can be controlled via the environment variable XPCOM_DEBUG_BREAK, which
makes NS_ASSERTION much nicer than PR_ASSERT.
** DONE

use NS_NOTREACHED here.
** DONE

>Index: modules/libjar/nsZipArchive.cpp

try to use |strlen| and |strcpy| here since they are better
optimized (single cpu instruction on x86).  however, beware
when making this change that PL_ versions of these functions
null-check their input arguments whereas the stdc versions
do not.  this rule applies throughout the mozilla codebase.
** DONE

>+    if (0 == PL_strcmp(aEntryName, item->name))
>+	break; //-- found it
** DONE

>Index: modules/libjar/zipstub.h

>Index: xpinstall/src/nsInstall.cpp

>+    nsString pattern(aJarSource + NS_LITERAL_STRING("/*"));

use nsAutoString here since the right hand side arguments might
fit into a 64 character (128 byte) buffer.
** DONE

nit: why not use nsCOMPtr with getter_AddRefs for jarEnum?
** DONE

>+	  nsString fileName = NS_ConvertASCIItoUCS2(Substring(name,
prefix_length, name.Length()-prefix_length));

NS_ConvertASCIItoUCS2 is actually a class which inherits from nsString.
you can use this fact to avoid a string copy here.
** How? I couldn't find out how to do this.

>	  nsString newJarSource = aJarSource;
>	  nsString newSubDir;
	  nsAutoString ... just in case everything fits onto the stack.
** DONE

not sure if this is merge conflicts or what... but this change
should not be reverted.  we don't want necko to have a compile
time dependency on exthandler.	the Makefile.in change should
also not happen.
** Indeed, all comments not addressed are results from a merge/cvs diff
conflict

This patch is clean, and only reflects the intended changes.
Attachment #132131 - Attachment is obsolete: true
Attachment #132263 - Flags: review?(darin)
alfred: thanks for the updated patch!

so, if you look in xpcom/string/obsolete/nsString2.h, you'll see that
NS_ConvertASCIItoUCS2 is a typedef for NS_ConvertASCIItoUTF16, which is a class
inheriting from nsString.  you can write code like this:

  NS_ConvertASCIItoUTF16 str(some_cstr);
Attachment #132129 - Flags: review?(darin)
Complete new patch version with:
* Use of NS_ConvertASCIItoUCS2, and improved string usage in xpinstall
* A very small cleanup in nsJAR::GetNext
* diffed against uptodate tree (as of 2003/10/23)
Attachment #132263 - Attachment is obsolete: true
Attachment #132263 - Flags: review?(darin)
Attachment #133918 - Flags: review?(darin)
this patch was generated with whitespace changes removed, right?
i'm in the process of reviewing this patch.. i hope to have it finished by
sometime next week.
The following command was used:
cvs diff -D2003/10/23 -u rdf/chrome/src/nsChromeRegistry.cpp modules/libjar
xpinstall/src netwerk/protocol/jar/src/nsJARChannel.cpp > jar_patch2.txt

I've noticed meanwhile that some indents in nsJARInputStream.cpp are messed up,
reviewing my own code... So these need be cleaned up together anything
resulting from your review. 
alfred: there is a problem with this patch.  when i try to build under linux, i
run into problems resulting from the error code definitions in zipfile.h.  they
conflict with nsError.h while building nsXIEngine.cpp.  the problem is that
xpinstall/stub/xpistub.h is the one including nsError.h.  perhaps you should
keep ZIP_ error codes.  what do you think?  did you have to deal with this
problem under windows?
alfred: when trying to land big patches like this it is much easier to do so if
you minimize API changes.  that's just my 2 cents ;-)
alfred: also, i needed to make several fixes in modules/libjar and in caps to
get this to compile under linux (with an up-to-date trunk).  looks like your
patch is just missing the caps changes since they should have been required
under windows too.
Target Milestone: mozilla1.6alpha → mozilla1.6beta
Reverted to ZIP_ERR codes for STANDALONE build, so should be better
for Linux builds.
Also included the CAPS change.

This is still a fairly big patch, but it does a lot of cleanup and
optimisation.
See comment #2 for the list of changes.
Sumamry: about 16% (or 5K) codesize saving, and possibly some startup time
savings
Attachment #133918 - Attachment is obsolete: true
Attachment #133918 - Flags: review?(darin)
Comment on attachment 138026 [details] [diff] [review]
Reverted to ZIP_ERR codes for STANDALONE, and included the change in CAPS

Another try at completing this bug
Attachment #138026 - Flags: review?(darin)
Target Milestone: mozilla1.6beta → mozilla1.7alpha
Extensively tested on Linux, including some performance testing.
Startup time is about 0.5% reduced on Linux system 
(Fedora on PIII, 500Mhz, 192MBRam).
Attachment #138026 - Attachment is obsolete: true
Codesize reduction on Linux is about 6K (excluding the changes outside of libjar
itself). 
Next steps:
* test standalone version
* analyse memory footprint
Severity: normal → enhancement
Keywords: perf
Priority: -- → P3
Target Milestone: mozilla1.7alpha → mozilla1.7beta
Regretfully, I'm going to have to punt on finishing my review of this until the
1.8a cycle.
Target Milestone: mozilla1.7beta → mozilla1.8alpha
Target Milestone: mozilla1.8alpha1 → mozilla1.8beta
Attachment #140013 - Flags: superreview?(dveditz)
Attachment #140013 - Flags: review?(darin)
Increasing level to 'normal' (from enhancement), because it is not
about 'correctness' and 'optimisation', but also about fixing some
real bugs (leaking file descriptors, CRC validation, startup performance,
de-comification of ZipEntry), as this is one patch fixing all issues with libjar.
Severity: enhancement → normal
The latest patch has bit-rotted badly as a result of the patches for bug 231083
and 235781.  nsJAR.cpp and nsZipArchive.cpp are the files that I some
uncertainty about while trying to merge this patch onto the trunk.  Please
update the patch, and pester me relentlessly until I review it.  Thanks!
New patch coming up, as a result of the following chat:
Michael Kaply (14:29) - Got a sec?
Alfred Kayser (14:29) - Yep.
Michael Kaply (14:29) - Darin wanted me to profusely apologize to you because
your jar patch needs updating again
Michael Kaply (14:29) - Some security fixes went into the JAR code, and we need
to make sure they are mirrored in your code.
Alfred Kayser (14:30) - Ok. Can you point me to the specific fixes? I can try to
update the patch this weekend (hopefully).
Michael Kaply (14:31) - https://bugzilla.mozilla.org/show_bug.cgi?id=231083
Michael Kaply (14:31) - https://bugzilla.mozilla.org/show_bug.cgi?id=235781
Michael Kaply (14:32) - Also you might want to glance at
https://bugzilla.mozilla.org/show_bug.cgi?id=246687
Michael Kaply (14:32) - Secondly, I wanted to let you know of some code that was
going into Mozilla that you might be interested in
Alfred Kayser (14:32) - Ok, thanks, I will check them out.

Note, that the first two bugs I allready addressed in my patch, but I have
verified to copy all of the changes as documented in those bugs. 
Also the 246687 is incorporated, so everything should be up to date now.
Please review carefully.
The patch is tested on Linux, but I'm sure that
are more tricks with things like new nsString and
factories stuff that could make it better.
Attachment #140013 - Attachment is obsolete: true
Attachment #161589 - Flags: superreview?(dveditz)
Attachment #161589 - Flags: review?(darin)
P.s., patch file created using: 
cvs diff -u -w caps/src rdf/chrome modules/libjar xpinstall/src > jar_patch4.txt
Comment on attachment 161589 [details] [diff] [review]
Latest version, incorporating the bugs mentioned in the chat.

>Index: modules/libjar/nsIZipReader.idl

>+     * Returns a string enumerator containing the matching entry names.
>      */
>+    nsIUTF8StringEnumerator findEntries(in string aPattern);

perhaps we should be using AUTF8String whenever we reference an entry name
in this interface?  (not required for this patch of course)


>Index: modules/libjar/nsJAR.cpp

>+#define JAR_INVALID  1
>+#define JAR_INTERNAL 2
>+#define JAR_EXTERNAL 3

maybe add a comment indicating that these correspond to
nsJARManifestItem::mType?


> class nsJARManifestItem
...
>+  // Type of manifest entry: invalid, internal or external
>+  PRInt8        mType;

    // Type of manifest entry: JAR_INVALID, JAR_INTERNAL, JAR_EXTERNAL


>   virtual ~nsJARManifestItem();
> };

there's no need for this destructor to be virtual right?


> nsJAR::Test(const char *aEntryName)
> {
>+   return mZip.Test(aEntryName);

nit: indentation should be two whitespace characters in this module


>+  rv = localFile->OpenNSPRFileDesc(PR_WRONLY | PR_CREATE_FILE | PR_TRUNCATE, item->mode, &fd);

nit: wrap long lines to 80'ish chars?


>+  *result = mZip.GetItem(zipEntry) ? PR_TRUE : PR_FALSE;

    *result = mZip.GetItem(zipEntry) != nsnull;


> nsJAR::GetInputStream(const char* aFilename, nsIInputStream** result)
...
>   nsJARInputStream* jis = new nsJARInputStream();
>+  NS_ENSURE_TRUE(jis, NS_ERROR_FAILURE);

use NS_ERROR_OUT_OF_MEMORY


>+  NS_ENSURE_SUCCESS(rv, NS_OK); // No signature verifier available[

nit: typo with trailing '['


> nsJAR::ReadLine(const char** src)
...
>+    if (c == '\r') {
>+      if (*eol == '\n') eol++;  // Skip the LF after the CR
>+      break;
>+    } else if (c == '\n') {
>+      break;

nit: 'else' after 'break' is unnecessary


>+  mManifestData = new nsObjectHashtable(nsnull, nsnull, DeleteManifestEntry, nsnull, 10);
>+  NS_ENSURE_TRUE(mManifestData, NS_ERROR_FAILURE);

NS_ERROR_OUT_OF_MEMORY


> nsJAR::~nsJAR()
...
>+  PR_FREEIF(mManifestData);

shouldn't this be 'delete mManifestData;' ?

please ensure that allocators are all matched up correctly.  in this case,
i think the result would be that mManifestData's destructor would not run,
and we'd thereby end up leaking a bunch of memory.


>Index: modules/libjar/nsJAR.h

>+class nsJAREnumerator : public nsIUTF8StringEnumerator
...
>     virtual ~nsJAREnumerator();
> 
> protected:

is this class ever subclassed?	(it doesn't look like it.)
so, how about making its destructor non-virtual and private?
and make its member vars all private.



>Index: modules/libjar/nsJARInputStream.h

>+    virtual ~nsJARInputStream() { Close(); }

same here.. probably this destructor can be made non-virtual and private.


ok, so overall the patch looks really good.  r=me with these nits fixed.
please post a revised patch.
Attachment #161589 - Flags: review?(darin) → review-
Attachment #140013 - Flags: superreview?(dveditz)
Attachment #161589 - Flags: superreview?(dveditz)
Attached patch Comments from Darin incorporated (obsolete) — Splinter Review
Incorporated your comments, and 
a small leak in the Manifest parsing, when 'invalid' entries
are found (and moved mType out of the class into the function).

P.s. There may be another leak: DeleteManifestItems does a 
PR_FREEIF(data) instead of delete, while the items are 'newed'.
Attachment #161589 - Attachment is obsolete: true
Attachment #163150 - Flags: superreview?(dveditz)
Attachment #163150 - Flags: review?(darin)
Thanks for the revised patch!


> P.s. There may be another leak: DeleteManifestItems does a 
> PR_FREEIF(data) instead of delete, while the items are 'newed'.

yeah, sounds like that should be fixed.  the trunk isn't open yet, so if you
want to fix that and attach another patch, it'd be appreciated :-)
Fixed the leak by deleting the ManifestItem instead of PR_FREEIF.
Also, only allocate mManifestData when there is really a Manifest found.
Attachment #163150 - Attachment is obsolete: true
Attachment #163150 - Flags: superreview?(dveditz)
Attachment #163150 - Flags: review?(darin)
Attachment #138026 - Flags: review?(darin)
Attachment #140013 - Flags: review?(darin)
Attachment #163653 - Flags: superreview?(dveditz)
Attachment #163653 - Flags: review?(darin)
I tried building this on windows, and it did not work for me.  I had to remove
the #ifdef XP_UNIX around nsZipItem's mode parameter, and then enable
ExtractMode, etc.  NOTE: OS/2 appears to honor the mode parameter passed to
PR_Open, so I'm not sure if it is correct to limit the mode to UNIX only.  The
old code seemed to play with the mode parameter on all platforms.

Once I fixed this, I then started up the browser, loaded mailnews, composer, and
then addressbook.  Everything looked good until I loaded chatzilla, which
triggered a crash:

nsHashtable::Get(nsHashKey * 0x0012f8f0) line 241 + 3 bytes
nsJAR::GetCertificatePrincipal(nsJAR * const 0x00be60bc, const char *
0x048f2878, nsIPrincipal * * 0x0012f9c8) line 342 + 19 bytes
nsJARChannel::GetOwner(nsJARChannel * const 0x04767e58, nsISupports * *
0x0012fa54) line 433 + 71 bytes
nsScriptLoader::OnStreamComplete(nsScriptLoader * const 0x04a687c4,
nsIStreamLoader * 0x0495fb98, nsISupports * 0x049162a0, unsigned int 0x00000000,
unsigned int 0xffffffff, const unsigned char * 0x048c1db4) line 890
nsStreamLoader::OnStopRequest(nsStreamLoader * const 0x0495fb9c, nsIRequest *
0x04767e58, nsISupports * 0x049162a0, unsigned int 0x00000000) line 137
nsJARChannel::OnStopRequest(nsJARChannel * const 0x04767e60, nsIRequest *
0x044a3628, nsISupports * 0x00000000, unsigned int 0x00000000) line 682
nsInputStreamPump::OnStateStop() line 505
nsInputStreamPump::OnInputStreamReady(nsInputStreamPump * const 0x044a362c,
nsIAsyncInputStream * 0x045ab810) line 341 + 11 bytes
nsInputStreamReadyEvent::EventHandler(PLEvent * 0x04a202bc) line 119
PL_HandleEvent(PLEvent * 0x04a202bc) line 692 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x00b51578) line 627 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x00150654, unsigned int 0x0000c112, unsigned int
0x00000000, long 0x00b51578) line 1433 + 9 bytes
USER32! 77d43a50()
USER32! 77d43b1f()
USER32! 77d43d79()
USER32! 77d43ddf()
nsAppStartup::Run(nsAppStartup * const 0x00ba6a00) line 221
main1(int 0x00000001, char * * 0x002a4600, nsISupports * 0x00b4e320) line 1321 +
32 bytes
main(int 0x00000001, char * * 0x002a4600) line 1799 + 37 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 77e8141a()

Also, we need to change the UUID of nsIZipReader since methods have been removed.
I tried building this on windows, and it did not work for me.  I had to remove
the #ifdef XP_UNIX around nsZipItem's mode parameter, and then enable
ExtractMode, etc.  NOTE: OS/2 appears to honor the mode parameter passed to
PR_Open, so I'm not sure if it is correct to limit the mode to UNIX only.  The
old code seemed to play with the mode parameter on all platforms.

Once I fixed this, I then started up the browser, loaded mailnews, composer, and
then addressbook.  Everything looked good until I loaded chatzilla, which
triggered a crash:

nsHashtable::Get(nsHashKey * 0x0012f8f0) line 241 + 3 bytes
nsJAR::GetCertificatePrincipal(nsJAR * const 0x00be60bc, const char *
0x048f2878, nsIPrincipal * * 0x0012f9c8) line 342 + 19 bytes
nsJARChannel::GetOwner(nsJARChannel * const 0x04767e58, nsISupports * *
0x0012fa54) line 433 + 71 bytes
nsScriptLoader::OnStreamComplete(nsScriptLoader * const 0x04a687c4,
nsIStreamLoader * 0x0495fb98, nsISupports * 0x049162a0, unsigned int 0x00000000,
unsigned int 0xffffffff, const unsigned char * 0x048c1db4) line 890
nsStreamLoader::OnStopRequest(nsStreamLoader * const 0x0495fb9c, nsIRequest *
0x04767e58, nsISupports * 0x049162a0, unsigned int 0x00000000) line 137
nsJARChannel::OnStopRequest(nsJARChannel * const 0x04767e60, nsIRequest *
0x044a3628, nsISupports * 0x00000000, unsigned int 0x00000000) line 682
nsInputStreamPump::OnStateStop() line 505
nsInputStreamPump::OnInputStreamReady(nsInputStreamPump * const 0x044a362c,
nsIAsyncInputStream * 0x045ab810) line 341 + 11 bytes
nsInputStreamReadyEvent::EventHandler(PLEvent * 0x04a202bc) line 119
PL_HandleEvent(PLEvent * 0x04a202bc) line 692 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x00b51578) line 627 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x00150654, unsigned int 0x0000c112, unsigned int
0x00000000, long 0x00b51578) line 1433 + 9 bytes
USER32! 77d43a50()
USER32! 77d43b1f()
USER32! 77d43d79()
USER32! 77d43ddf()
nsAppStartup::Run(nsAppStartup * const 0x00ba6a00) line 221
main1(int 0x00000001, char * * 0x002a4600, nsISupports * 0x00b4e320) line 1321 +
32 bytes
main(int 0x00000001, char * * 0x002a4600) line 1799 + 37 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 77e8141a()

Also, we need to change the UUID of nsIZipReader since methods have been removed.
Alfred: mManifestData was null.
Attachment #163653 - Flags: superreview?(dveditz)
Attachment #163653 - Flags: review?(darin)
(FF 1.0 keeps crashing on me)
Indeed, there is something going wrong in nsJAR::GetCertificatePrincipal
with missing manifests or so (in which case also no mManifestData will be
allocated, which is good), but it then should try to use it anyway.

The mode stuff was before dbaren patch really only used on XP_UNIX, but that
patch has changed it (rightfully so).

So, I will need to check this (and the XP_UNIX/mode stuff on Windows & OS/2),
and will come back with hopefully something better...
Attached patch Fixed the crash on mManifestData (obsolete) — Splinter Review
This fixes the mManifestData crash, as it checks on the pointer before
using it. Also clarified the logic in ParseOneFile and ParseManifest a bit.
This includes the removal of a not used enumeration from nsIJAR.idl

Also, fresh ID's for the changed IDL's.

Also, fixed the XP_UNIX and mode stuff, so that mode also works on other
platforms, and that only the SymLink stuff if XP_UNIX'ed.

Removed some debug code (which did'nt work anyway and was ifdef 0'd).
Attachment #163653 - Attachment is obsolete: true
Note, I've tested the above scenario (and a lot of other things), 
and it doesn't crash anymore.
Comment on attachment 166733 [details] [diff] [review]
Fixed the crash on mManifestData

Darin, can you check this out?
Attachment #166733 - Flags: review?(darin)
Comment on attachment 166733 [details] [diff] [review]
Fixed the crash on mManifestData

I tested the patch, and everything looks good.	Here are some
final review nits:


>Index: rdf/chrome/src/nsChromeRegistry.cpp

>+  rv = mOverrideJAR->ContainsEntry(PromiseFlatCString(aResult).get(), &ok);

So, here's an example of a case where it would be nice if
ContainsEntry took a |const nsACString&| as its first arg.


>Index: modules/libjar/nsIJAR.idl

>-[uuid(04501DB2-0409-11d3-BCF8-00805F0E1353)]
>+[uuid(62A3BC70-67A6-4da2-9247-8064033B5936)]
> interface nsIJAR : nsISupports
> {
>-
>-    const short NOT_SIGNED         = 0;
>-    const short VALID              = 1;
>-    const short INVALID_SIG        = 2;
>-    const short INVALID_UNKNOWN_CA = 3;
>-    const short INVALID_MANIFEST   = 4;
>-    const short INVALID_ENTRY      = 5;
>-    const short NO_MANIFEST        = 6;

Hmm... no real need to change the UUID on this one since the const
declarations do not affect the layout of the vtable.  I should have
clarified the UUID change policy when I previously commented about
rev'ing the UUID of nsIZipReader.


>Index: modules/libjar/nsIZipReader.idl

>-    /**
>-     * Closes a zip reader. Subsequent attempts to extract files or read from
>-     * its input stream will result in an error.
>-     */
>-    void close();

Hmm... you should keep in mind that GC'd languages such as JavaScript
and Java may require explicit "close" methods to shutdown an object.
You cannot rely on the interface being released at any predictable
time.  So, are you sure that we want to remove this close method?
As with nsIInputStream::close, you could make it an optional method
that exists purely in case someone wishes to forcibly "close" the
object from JS without waiting for the GC to run.  Though I doubt we
play with nsIZipReader from JS, it may make sense to allow it.

Remember that under Windows it is impossible to overwrite a file that
is in use.  So, explicit |close| methods may be the only solution for
synchronizing close and replace of a ZIP archive from JS.


>Index: modules/libjar/nsJARInputStream.cpp

>+nsJARInputStream::Read(char* aBuffer, PRUint32 aCount, PRUint32 *aBytesRead)
...
>+        if (mCurPos + aCount > mInSize) aCount = (mInSize - mCurPos);

please put a line break here so someone could set a breakpoint easily
when this condition is met.


>Index: xpinstall/src/nsInstall.cpp

>+    nsresult rv = mJarFileData->FindEntries(NS_LossyConvertUCS2toASCII(pattern).get(),

please use NS_LossyConvertUTF16toASCII instead of the UCS2 version.


please post a revised patch, and r=darin.  Nice work!!
Attachment #166733 - Flags: review?(darin) → review-
Fixed:
* Changed ContainsEntry to take const nsACString &
* Reved UUID of nsIJAR.idl
* Re-instated 'Close'
* linebreak in nsJARInputStream::Read
* nsInstall.cp: changed NS_LossyConvertUCS2toASCII into
NS_LossyConvertUTF16toASCII
Attachment #166733 - Attachment is obsolete: true
Comment on attachment 166924 [details] [diff] [review]
Patch with last nits of Darin incorporated

r=darin (see above)
Attachment #166924 - Flags: superreview?(dveditz)
Attachment #166924 - Flags: review+
Attachment #166924 - Attachment is obsolete: true
Attachment #166924 - Flags: superreview?(dveditz)
Comment on attachment 168039 [details] [diff] [review]
Version 7: fixed some indenting issues, and updated to current builds

Updated patch for the final review
Attachment #168039 - Flags: superreview?(dveditz)
Attached patch Slighty improved version (obsolete) — Splinter Review
Changed 'Init' into 'Open' and made it more safe, so that
functions don't crash after 'Close', and even after re-'Open'ing the
same or other zip-file using the same ZipReader object.
Tested in the javascript debugger, the nsInstaller, and the complete Mozilla
suite.
Calculations learn that this saves about 4K code for the app, 4K for the
installer, and about 400K of allocated memory...
Blocks: 92580
Attachment #168039 - Flags: superreview?(dveditz)
Attachment #168343 - Flags: superreview?(dveditz)
From bug 273406, comment 13:
//
... the code calls Init() on the Jar twice ...

The first time it passes, but after the second init() the file list contains two
entries for each actual content file. ...

nsJar should never be initialized twice. In addition to this problem it leaks
minor stuff like the PR_Lock and prossibly leaves the file open for reading.
There should be code in nsJAR to prevent this, or at least assert.
//

So, something additional to fix in this patch (or follow patch):
In nsJAR::Open add:
  if (mLock) {
    // Clean up old stuff
    Close();
  }
to clean up the 'old' stuff (normally one should do Close before Open...)

In nsJAR::Close add:
  // Release file handle, in case the nsJAR object remains allocated
  mZipFile = nsnull;
to at least release the file when somebody Close's the jar, but doesn't remove
the jar object itself.
I have been reviewing this bug. It's a long, slow, slog through a huge patch. It
might have gone easier if you hadn't combined bugs, separate smaller patches are
less likely to cause reviewers to recoil in horror.

You changed the manifest code, have you tested this with signed jars to make
sure they still work?
http://www.mozilla.org/projects/security/components/signed-script-example.html
jar:http://www.mozilla.org/projects/security/components/signed-script-demo.jar!/signed-script-demo.html
Comment on attachment 168343 [details] [diff] [review]
Slighty improved version

Please use the -p option on diffs, especially huge ones like this. The
default three lines of context are insufficient for reviewing. Try

  cvs diff -pNu9   // adjust context size if necessary for clarity

> -interface nsIZipEntry : nsISupports
> -{
> -    readonly attribute string           name;
> -    readonly attribute unsigned short   compression;

By getting rid of this iface you preclude the possibility of anyone
writing a ZIP archive viewer using this code. I guess if no one has
shown any interest by now it's OK.

Index: modules/libjar/nsZipArchive.h
===================================================================
>  #define ZIP_TABSIZE   256
> -// Keep this odd. The -1 is significant.
> -#define ZIP_BUFLEN    (4 * 1024 - 1)
> +#define ZIP_BUFLEN    4096

I'm concerned about this arbitrary change -- is there a reason other than
aesthetics? An exact 4k buffer was a frustrating source of mysterious bugs in
the past (see bug 112312 comment 12, not to mention the comment you deleted).

Index: xpinstall/src/nsInstall.cpp
===================================================================
@@ -445,47 +446,46 @@
> +    nsAutoString pattern(prefix + NS_LITERAL_STRING("*"));
> +
> +    nsCOMPtr<nsIUTF8StringEnumerator> jarEnum;
> +    nsresult rv = mJarFileData->FindEntries(NS_LossyConvertUTF16toASCII(pattern).get(),
> +                                            getter_AddRefs(jarEnum) );

Since this patch is about footprint and performance, I think you can make
pattern an NS_Lossy... string directly rather than have one in addition to
the nsAutoString:

     NS_LossyConvertUTF16toASCII pattern(prefix + NS_LITERAL_STRING("*"));

Not totally positive this works, whether you do this is up to you.


> +    PRBool bMore;
> +    rv = jarEnum->HasMore(&bMore);
> +    while (bMore && (result == nsInstall::SUCCESS))
>      {

You don't check rv and can can potentially use bMore uninitialized if HasMore
returns an error. Probably initializing bMore to PR_FALSE is good enough. If
you're going to ignore rv here then don't set it, although if you get
a non-success rv you really ought to set a non-success "result". i.e.

      if (NS_FAILED(jarEnum->HasMore(&bMore)))
	  result = EXTRACTION_FAILED;
      while (...)

You're still going to get uninit errors out of some of our tools
so go ahead and initialize bMore while you're at it.

The end of this routine now wants (result == SUCCESS && count == 0) to mean we
didn't find anything, so maybe you don't want an initial HasMore() failure to
set result to anything. Plus I notice you've got another
"rv = jarEnum->HasMore()" at the end of the loop. Another tack would be to
make NS_SUCCEEDED(rv) be part of the loop conditional. Something

> +        PRInt32 namelen = name.Length();
> +        NS_ASSERTION( prefix_length <= namelen, "Match must be longer than pattern!" );
  ...
> +        NS_ConvertASCIItoUTF16 fileName(Substring(name, prefix_length, name.Length()-prefix_length));

Don't bother with namelen and use name.Length() in the assert, or use namelen
in the string conversion. That conversion line is overly long anyway: fold it
and use the shorter form.

+	 if ( name.Last() == '/' )
+	 {
+	     // Skip it
+	     continue;
+	 }

The original could have used more comments, too; please add some here
explaining we're skipping it because it's a directory entry rather than a
file to extract.  "// We only extract files, not directory entries" or
"// Directory entry: skip it" or something.

still 4000 lines of patch to go...
actually, we want to make jar urls browsable ... :(
Browsable, like the file directory viewer? That was the original idea but no
one's gone beyond just wanting it. Now Alfred's offering a perf/footprint win at
the cost of skipping a feature we don't have anyway. Do we really need a zip
viewer when all the platforms have perfectly good utilities--better than we
could do--already?

Back to the patch...
I'm unhappy that nsJARInputStream duplicates the zip-reading code. That's just
asking for bugs in two places, or fixing one and not the other. Prior to this
patch all the smarts for zip archive internals lived in one file,
nsZipArchive.cpp. What was the rationale for killing nsZipReadState?
First thanks for doing the review, I will try to keep up with comments and
incorporate these into my tree, and make sure to clean up the patch as much as
possible (removing the minor whitespace changes, and other unneccessary cosmetic
changes).

|Please use the -p option on diffs, especially huge ones like this. The
|default three lines of context are insufficient for reviewing. Try
|  cvs diff -pNu9   // adjust context size if necessary for clarity
AK: OK

> -interface nsIZipEntry : nsISupports
> -{
> -    readonly attribute string           name;
> -    readonly attribute unsigned short   compression;

|By getting rid of this iface you preclude the possibility of anyone
|writing a ZIP archive viewer using this code. I guess if no one has
|shown any interest by now it's OK.
AK: I have thought about this a lot, but you can still 'browse' as the names of
the entries are still retrievable (using FindEntries), But the ZipEntries are
not used at all, and do cause a lot of overhead. Adding a separate function to
get the details/properties of an item could easily be added later (on demand),
and still allows to only pass a name for all normal operations (extract, test,
verify, stream)

>  #define ZIP_TABSIZE   256
> -// Keep this odd. The -1 is significant.
> -#define ZIP_BUFLEN    (4 * 1024 - 1)
> +#define ZIP_BUFLEN    4096

|I'm concerned about this arbitrary change -- is there a reason other than
|aesthetics? An exact 4k buffer was a frustrating source of mysterious bugs in
|the past (see bug 112312 comment 12, not to mention the comment you deleted).
AK: I changed it back, but it seems to be related to an old buffer size issue,
and the current zlib/ziparchive implemented doesn't need 'odd-sized' buffers
anymore. Especially when all the streaming code using 4K buffers, I just want to
make ziparchive use the same size.

> +    nsAutoString pattern(prefix + NS_LITERAL_STRING("*"));
> +
> +    nsCOMPtr<nsIUTF8StringEnumerator> jarEnum;
> +    nsresult rv =
mJarFileData->FindEntries(NS_LossyConvertUTF16toASCII(pattern).get(),
> +                                            getter_AddRefs(jarEnum) );

|Since this patch is about footprint and performance, I think you can make
|pattern an NS_Lossy... string directly rather than have one in addition to
|the nsAutoString:
|     NS_LossyConvertUTF16toASCII pattern(prefix + NS_LITERAL_STRING("*"));
AK: Done

> +    PRBool bMore;
> +    rv = jarEnum->HasMore(&bMore);
> +    while (bMore && (result == nsInstall::SUCCESS))
>      {

AK: To prevent uninit errors, and not catching an error code, changed to
    PRBool bMore = PR_FALSE;
    while (NS_SUCCEEDED(jarEnum->HasMore(&bMore)) && hasMore)

> +        PRInt32 namelen = name.Length();
> +        NS_ASSERTION( prefix_length <= namelen, "Match must be longer than
pattern!" );
  ...
> +        NS_ConvertASCIItoUTF16 fileName(Substring(name, prefix_length,
name.Length()-prefix_length));
AK: Done

+	 if ( name.Last() == '/' )
+	 {
+	     // Skip it
+	     continue;
+	 }
AK: changed to:
            // Skip the directory entries

------- Additional Comment #50 From timeless@myrealbox.com (working) 2005-03-22
20:42 PST [reply] -------

|actually, we want to make jar urls browsable ... :(
AK: See first item in this comment.

------- Additional Comment #51 From Daniel Veditz 2005-03-22 21:36 PST [reply]
-------

|Browsable, like the file directory viewer? That was the original idea but no
|one's gone beyond just wanting it. Now Alfred's offering a perf/footprint win at
|the cost of skipping a feature we don't have anyway. Do we really need a zip
|viewer when all the platforms have perfectly good utilities--better than we
|could do--already?
AK: See above

|Back to the patch...
|I'm unhappy that nsJARInputStream duplicates the zip-reading code. That's just
asking for bugs in two places, or fixing |one and not the other. Prior to this
patch all the smarts for zip archive internals lived in one file,
nsZipArchive.cpp. |What was the rationale for killing nsZipReadState?

AK: The rational is that this will solve the locking and memory leaks problems
(bug 198487 and bug 210564) that were caused by the interleaving of the plain
file extraction code and the JARInputStream streaming code.
Furthermore, now the jar XPCOM object can be destroyed now safely, even before
JARInputStream ends completely. Currently if the jar object is released before
the end of the stream, this will cause an crash (luckely jar objects are cached
(mostly), and not released quickly).
Furthermore, the file extraction (InflateItem) is now optimized to directly
write to a file, and the JARInputStream inflation code to inflate from a
dedicated 4K read buffer to the reader of the stream.
Furthermore, using the new 'inflateBack' feature of zlib, the InflateItem
function can even directly write from the zlib own buffer to disk (which saves a
buffer copy, and changes diskwrites from 4K to 32K with even less memory usage)
(see bug 247458). Note that the 'inflateBack' does not work for JARInputStream,
as the loop control is outside of JARInputStream, and inflateBack takes over the
loop control.
To prevent duplication of code, some of the zlib related code has been moved to
utility functions (such as setting up the zlib structures)
Blocks: 247458
I have checked the manifest parsing, and found and solved an ugly bug
in the ReadLine code (it missed detecting the EOF). So now the testcase at
http://www.mozilla.org/projects/security/components/signed-script-example.html
will work with the next patch.


Attachment #168039 - Attachment is obsolete: true
Attachment #168343 - Flags: superreview?(dveditz)
Comment on attachment 178448 [details] [diff] [review]
Version 8: incorporated comments of dveditz, and using diff -u9Np

Dveditz, I have incorporated your previous comments.
Can you please review this patch? It would be nice to get this in 1.8
Attachment #178448 - Flags: superreview?(dveditz)
pushing out to 1.9 alpha...

dveditz: can you please review this patch or punt to someone else?  thanks!
Target Milestone: mozilla1.8beta1 → mozilla1.9alpha
Attachment #168343 - Attachment is obsolete: true
Attachment #178448 - Attachment is obsolete: true
Attachment #197903 - Attachment is obsolete: true
Comment on attachment 197972 [details] [diff] [review]
v10: Cleaner version, removed unneeded changes

Dzeditz, could you please review this patch?
Attachment #197972 - Flags: superreview?(dveditz)
Comment on attachment 178448 [details] [diff] [review]
Version 8: incorporated comments of dveditz, and using diff -u9Np

v8 is now obsolete, please review v10.
Attachment #178448 - Flags: superreview?(dveditz)
Just to remind everybody: this patch results in: 4K codesize savings, about 5% startup time saving (measured by averaging 11 warmstarts of SeaMonkey on Windows), and 400K of allocated memory.
Attachment #197972 - Flags: superreview?(dveditz)
Attached patch V11: Updated to current trunk (obsolete) — Splinter Review
Please review this version, as it is now up to date (Dec 2, 2005)!
Attachment #197972 - Attachment is obsolete: true
Attachment #204764 - Flags: superreview?(dveditz)
Alfred -- is there any hope that this work could land on the 1.8 branch?  I see lots of API changes, so I doubt it.
Doug, while I hoped to get it in for 1.8, I doubt it will happen.
Reviews take forever, and indeed it is a firm impact on the nsIZipReader.

Restating the benefits:
About 0.5% startup improvement...
About 400K less allocated memory...
About 6K codesize reduction...
Comment on attachment 204764 [details] [diff] [review]
V11: Updated to current trunk

has this patch been reviewed by darin?
Version 6 is r'rd by Darin, after that the versions are mostly updates to the current trunk, and comments from dveditz.
I have a question with regard to this bug (and also https://bugzilla.mozilla.org/show_bug.cgi?id=203627):

Would it be possible to keep the nsIZipEntry interface and the method:

  nsIZipEntry getEntry(in string zipEntry);

AND add the method:

  boolean containsEntry(in AUTF8String zipEntry);

and use the latter one for optimization?

I would have liked to have a smarter nsIZipEntry actually.  I would be interested in "mounting" a Zip file and access the entries as nsILocalFile.  In other words, nsIZipReader could be QI'ed to nsIFile and would be equivalent to a directory, and nsIZipEntry could be QI'ed to a nsILocalFile.  I need to access an entry in the archive as a file but I don't want to extract it to disk.  Would that make sense?
it makes no sense to use nsILocalFile here. its methods can't be reasonably implemented. but exposing as an nsIFile would make sense.

imagine all the fun you can have with mozilla code that assumes all nsIFiles can be qi'd to nsILocalFile :-)
(In reply to comment #68)
> it makes no sense to use nsILocalFile here. its methods can't be reasonably
> implemented. but exposing as an nsIFile would make sense.

True for nsILocalFile.  Actually it probably does not make sense to QI nsIZipReader to nsIFile either, thinking about it.  Maybe add a method:

  nsIFile getRoot();

to nsIZipReader?  This way it would be possible to list the Zip content as a regular directory, walk through the archive tree like a regular filesystem and pass Zip entries to methods requiring files.  That would be very convenient.
Due to bug 309296, another update of this patch is needed.
Even worse, the removal of nsIZipEntry is not no longer possible, as it is required for the directory reader.

Besides the previously described cleanups, this bug does the following:
* Make nsJarItem/nsIZipItem independent from nsJar by copying the data instead of refering. This makes this object safe to close of the nsJar itself.
* Remove GetName from nsIZipItem (so we don't need to copy it), as it is not required.
* Moved GetModTime to nsJar.cpp to make it a 'inline static'
* Removal of some useless includes.
* As described in #67, containsEntry is now added, keeping getEntry (for the details)
* Use of NS_ENSURE_ARG_POINTER to check for argument pointers

With this patch, I get a codesize saving of 18KB
(jar50.dll from 69.632 to 51.712 bytes), not counting the savings in other locations (nsInstall.cpp)
Attachment #204764 - Attachment is obsolete: true
Attachment #217326 - Flags: review?(darin)
Attachment #204764 - Flags: superreview?(dveditz)
Comment on attachment 217326 [details] [diff] [review]
V12: Updated after the additions by bug 309296

I'd like to have jwalden review this since he has libjar freshly on his mind.  Then, I'll do the SR.
Attachment #217326 - Flags: superreview?(darin)
Attachment #217326 - Flags: review?(jwalden+bmo)
Attachment #217326 - Flags: review?(darin)
Ok!
P.s., I have removed most of the changes to the Manifest parsing, to keep that for a separate bug/patch/review, except for the move of the 'enum' define from the IDL to the nsJAR.h itself.
Note, nsZipItem contained both 'flags' and some PRPackedBool's, so I made them all PRPackedBools...
Blocks: 272451
Alfred: I tried applying this patch on Linux, but it doesn't build.  It complains about ResolveSymlink not being defined on nsZipArchive.  There were also problems compiling nsZipArchive.cpp because IsSymLink != IsSymlink.
Made ExtractFile to also do the deletion on error and the symlink resolving so that it that all happens in one place only. Also fixed issue in ZIP_ExtractFile to use the right filename for delete/symlink. 
P.s. Also two AssignLiterals's instead of Assign(NS_LIT...
Attachment #217326 - Attachment is obsolete: true
Attachment #217326 - Flags: superreview?(darin)
Attachment #217326 - Flags: review?(jwalden+bmo)
Attached file Interdiff between version 12 and 13 (obsolete) —
Comment on attachment 217824 [details] [diff] [review]
V13: Fixed the ResolveSymlink things

Excuses for the spam, I wanted to check the attachment first before placing the review requests
Attachment #217824 - Flags: superreview?(darin)
Attachment #217824 - Flags: review?(jwalden+bmo)
I plan to get to this over the next weekend, because it'll require a rather large chunk of free time I'm very unlikely to have until then.
nsSoftwareUpdateRun.cpp needs to enumerate the nsIZipEntry objects so that it can test the "isSynthetic" attribute.  This patch makes that difficult to do.  It becomes necessary to use nsIZipReader::GetEntry to workaround the fact that the enumeration API returns zipEntry names instead of objects.

I also had a bunch of merge conflicts upon trying to apply this patch.  I'll post a revised patch that includes merge conflicts shortly.
Attached patch V14: Merged to trunk (obsolete) — Splinter Review
Attachment #217824 - Attachment is obsolete: true
Attachment #217825 - Attachment is obsolete: true
Attachment #218741 - Flags: superreview?(darin)
Attachment #218741 - Flags: review?(jwalden+bmo)
Attachment #217824 - Flags: superreview?(darin)
Attachment #217824 - Flags: review?(jwalden+bmo)
The 'isSynthetic' was introduced by bug 309296, and actually in nsSoftwareUpdateRun.cpp the check for 'isSynthetic' needs probably be replaced by 'isDirectory' as all directory entries should not be checked for signatures?
So, then the isSynthetic can be replaced by a check for a ends with '/'?
Like in nsJARChannel.cpp: 
#define ENTRY_IS_DIRECTORY(_entry) \
  ((_entry).IsEmpty() || '/' == (_entry).Last())
Alfred: OK, that makes sense.
Priority: P3 → P2
(In reply to comment #80)
> The 'isSynthetic' was introduced by bug 309296, and actually in
> nsSoftwareUpdateRun.cpp the check for 'isSynthetic' needs probably be replaced
> by 'isDirectory' as all directory entries should not be checked for signatures?

Given that code prior to bug 309296 would include explicit entries for directories in parsing, this would be a change from current behavior.  Furthermore, if there are any signed zips out there which contain explicit directory entries, this change would break the signature information associated with them.  Who'd be the right person to ask to verify what the correct behavior is supposed to be?
Priority: P2 → P3
Comment on attachment 218741 [details] [diff] [review]
V14: Merged to trunk

>Index: modules/libjar/nsIJAR.idl

Just to be absolutely sure, changing the constants in an interface doesn't require a uuid bump, correct?


>Index: modules/libjar/nsIZipReader.idl

>     /**
>-     * Initializes a zip reader after construction.
>+     * Opens a jar/zip file for reading.
>      */

Can a zip reader be recycled by calling open() on it with a different nsIFile?  The comment here should explicitly say whether or not that's allowed.

Also, a nit: the rest of the docs don't refer to jars and only refer to zip files, so don't refer to jar files here.


>+    boolean containsEntry(in AUTF8String zipEntry);

Is there a particular reason "contains" is used instead of "has"?  I can't think of a reason why one is particularly more or less clear than the other, and all other things being equal I'd prefer the one that requires less typing.


>Index: modules/libjar/nsJAR.cpp

> // The following initialization makes a guess of 10 entries per jarfile.
> nsJAR::nsJAR(): mManifestData(nsnull, nsnull, DeleteManifestEntry, nsnull, 10),
>-                mParsedManifest(PR_FALSE), mGlobalStatus(nsIJAR::NOT_SIGNED),
>+                mParsedManifest(PR_FALSE), mGlobalStatus(JAR_MANIFEST_NOT_PARSED),
>                 mReleaseTime(PR_INTERVAL_NO_TIMEOUT), 

Nit: put the two member variable inits on separate lines.


>+nsJAR::GetJarPath(nsACString& aResult)

Nit: move this down further so it's outside the interface methods on nsJAR.


>+NS_IMETHODIMP
>+nsJAR::ContainsEntry(const nsACString &aEntryName, PRBool *result)
>+{
>+  *result = mZip.GetItem(PromiseFlatCString(aEntryName).get()) != nsnull;
>+  return NS_OK;
>+}

I don't think |PromiseFlatCString| is necessary here, but I could be wrong.


>+nsJAR::FindEntries(const char *aPattern, nsIUTF8StringEnumerator **result)

While I still remember, this leaks |find| if creation of |zipEnum| fails, just like the current code does; eventually we'll fix that in bug 329071.


> nsJAR::GetInputStream(const char* aFilename, nsIInputStream** result)
>+  // Open jarfile, to get an own filedescriptor for the stream

"an own"?


>+nsJAREnumerator::HasMore(PRBool* aResult)
>+        NS_ENSURE_TRUE(mFind, NS_ERROR_NOT_INITIALIZED);

Move this into the constructor so it's only executed once instead of probably 2*n times?


>-NS_IMPL_THREADSAFE_ISUPPORTS1(nsJARItem, nsIZipEntry)
>+NS_IMPL_ISUPPORTS1(nsJARItem, nsIZipEntry)

Why this change?


>Index: modules/libjar/nsJAR.h

>+    PRPackedBool mIsDirectory : 1; 
>+    PRPackedBool mIsSynthetic : 1;

Is this necessary?  I can't remember ever seeing C bit fields in the Mozilla codebase, and PRPackedBool is already supposed to reduce the storage space for boolean values.  There are only two of these anyway, so you're not going to get any space savings from using bit fields over using plain PRPackedBool variables; using bit fields here may even be slower, too.


>Index: modules/libjar/nsJARDirectoryInputStream.cpp

>+            const char * entryName = mArray[mArrPos]->get();
>+            PRUint32 entryNameLen = mArray[mArrPos]->Length();
>+            nsCOMPtr<nsIZipEntry> ze;
>+            rv = mZip->GetEntry(entryName, getter_AddRefs(ze));
>+            if (NS_FAILED(rv)) return rv;

Put a '// Name' before this, move the code that gets the relative portion of the name up into this code (to consolidate all name-related code into one section), and add an empty line before the lines that get the actual zip entry.


>Index: modules/libjar/nsJARInputStream.cpp

>+nsJARInputStream::Init(nsJAR* aJAR, nsZipItem *item, PRFileDesc *fd)

If something fails here (mInflate/gZlibInit/SeekToItem), |mFd| is left around until the stream is "closed", and various other state bits will be in some vaguely disorganized state.  Depending on how things happen, the stream's external behavior when the various stream methods on it are called will be somewhat erratic.  (For example, if SeekToItem fails, Available will return some value it thinks is reasonable, not throw some error.)  I'd like to see this handled more atomically, such that either all initialization succeeds and everything works correctly or some initialization fails and everything is guaranteed to not look or even maybe appear to work correctly.


>+nsJARInputStream::Read(char* aBuffer, PRUint32 aCount, PRUint32 *aBytesRead)
> {
>+    *aBytesRead = 0;
>+
>+    NS_ENSURE_ARG_POINTER(aBuffer);
>+    NS_ENSURE_ARG_POINTER(aBytesRead);

I think you want the NS_ENSUREs before |*aBytesRead = 0;|.  :-)


>+    // be agressive about closing!

s/agre/aggre/


> nsJARInputStream::ReadSegments(nsWriteSegmentFun writer, void * closure, PRUint32 count, PRUint32 *_retval)
> {
>+    if (mClosed)
>+        return NS_BASE_STREAM_CLOSED;
>+
>     // don't have a buffer to read from, so this better not be called!
>     NS_NOTREACHED("Consumers should be using Read()!");
>     return NS_ERROR_NOT_IMPLEMENTED;
> }

This change is incorrect; see <http://lxr.mozilla.org/mozilla/source/xpcom/io/nsIInputStream.idl#165>.


>Index: modules/libjar/nsJARInputStream.h

>+

Nit: excess whitespace.


>     NS_DEFINE_STATIC_CID_ACCESSOR( NS_JARINPUTSTREAM_CID );

It was pointed out to me during reviews in bug 309296 that this isn't used anywhere; can it be removed?  I'm not sure what its purpose is and thus don't know whether or not it's safe to do so; given that there are no references to it in LXR I suspect it can be removed.


>Index: modules/libjar/nsXPTZipLoader.cpp

>+    int index=0;

Spaces around the '=', please.


>Index: modules/libjar/nsZipArchive.cpp

>-static PRUint16 xtoint(unsigned char *ii);
>+static inline PRUint16 xtoint(unsigned char *ii);

Is the |inline| necessary?  Why not just let the compiler decide for all of these?


>+  PRFileDesc * fd = PR_Open(zipname, PR_RDONLY, 0000);

It probably doesn't matter, but the last parameter should be |0400|.


> //---------------------------------------------
> // nsZipArchive::ExtractFile
>+// This extracts the item to the filehandle provided.
>+// If 'outFD' is null, it only tests the extraction.
>+// On extraction error(s) it removes the file.
>+// When needed, it also resolves the symlink.
> //---------------------------------------------
>+nsresult nsZipArchive::ExtractFile(nsZipItem *item, const char *outname,
>+                                   PRFileDesc* aFD)

Style everywhere else would suggest using |aFd| instead of |aFD|; also, change 'outFD' to the actual name of the variable.

I'm not really a fan of overloading ExtractFile to do something completely not indicated by its name, but I'm not sure what there is to do about it if the code would be fairly similar for both.


>+  for (pos=0; pos < itemSize; pos += chunk)

Nit: while you're changing this, spaces around the '=' in |pos=0|?


>+    chunk = (itemSize-pos < ZIP_BUFLEN) ? itemSize-pos : ZIP_BUFLEN;

Spaces around the '-' here, too.


>+static inline PRUint32 HashName(const char* aName)
>+static inline PRUint16 xtoint (unsigned char *ii)
>+static inline PRUint32 xtolong (unsigned char *ll)
>+static inline PRUint16 ExtractMode(unsigned char *ll)

It seems fairly unlikely to me that these would be inlined by the compiler, and I don't see a reason to give hints to the compiler if it's likely to ignore them.


>Index: modules/libjar/nsZipArchive.h

>+  PRPackedBool hasDataOffset : 1;
>+  PRPackedBool isDirectory : 1; 
>+  PRPackedBool isSynthetic : 1;  /* whether item is an actual zip entry or was
>+                                    generated as part of a real entry's path,
>+                                    e.g. foo/ in a zip containing only foo/a.txt
>+                                    and no foo/ entry is synthetic */
>+#if defined(XP_UNIX) || defined(XP_BEOS)
>+  PRPackedBool isSymlink : 1;
> #endif

So it looks like the C bitfields here actually are useful, but I'm still not entirely convinced it's really worth saving a word of storage per nsZipItem here, particularly if accessing each member variable might require more instructions than simply using normal PRPackedBool's would require.


>+   * ExtractFile
>+  nsresult ExtractFile(nsZipItem * zipEntry, const char *outname, PRFileDesc * outFD);

Documentation for this method (particularly if the NULL-means-just-test behavior is retained)?


>+  nsresult  SeekToItem(nsZipItem* aItem, PRFileDesc* aFd);

Ideally some documentation here would be nice, too, but it's somewhat less critical as the function won't be used as often as an extracting function would be used.


>Index: toolkit/mozapps/extensions/src/nsExtensionManager.js.in

>+        while (entries.hasMore()) {
>+          entryName = entries.getNext();

You probably need to prefix this with |var |, or otherwise this code will display JavaScript strict warnings.


>+          if (entryName.substr(entryName.length - 1, 1) == "/")

Nit: |entryName.charAt(entryName.length - 1) == "/"| feels more readable.


Anyway, things generally look okay here, so r=jwalden with comments addressed.
Attachment #218741 - Flags: review?(jwalden+bmo) → review+
dveditz would be the right person to ask.  I think you are probably correct that we should worry about explicit directory entries, and therefore I think the current patch which uses GetEntry + IsSynthetic is probably the right way to go.  It'd be nice to avoid the GetEntry lookup during enumeration, but I think we can live with it.  Afterall, it's just a hashtable lookup ;-)
Alfred: I think it would be best if you would cut a revised patch addressing the issues that Jeff raised.  Then, I will take care of getting it committed to the trunk!  <-- seriously :-)
Thanks, I will do that this weekend (hopefully).
Response to the comments from Jeff:
Just to be absolutely sure, changing the constants in an interface doesn't
require a uuid bump, correct?
AK: No, or at least that is what I understood. Darin?

Can a zip reader be recycled by calling open() on it with a different nsIFile? 
The comment here should explicitly say whether or not that's allowed.
AK: Yes, after closing it first, added some comments

I don't think |PromiseFlatCString| is necessary here, but I could be wrong.
AK: Probably not (as strings are now all flat, but the API says it should use PromiseFlatCString). Darin?

While I still remember, this leaks |find| if creation of |zipEnum| fails, just
like the current code does; eventually we'll fix that in bug 329071.
AK: Fixed, deleting 'find' when zipEnum fails.

Move this into the constructor so it's only executed once instead of probably
2*n times?
AK: Converted into an NS_ASSERTION

>+nsJARInputStream::Init(nsJAR* aJAR, nsZipItem *item, PRFileDesc *fd)
If something fails here (mInflate/gZlibInit/SeekToItem), |mFd| is left around
until the stream is "closed", ....
AK: Fixed by setting mClosed flag to true, until all is well.

>     NS_DEFINE_STATIC_CID_ACCESSOR( NS_JARINPUTSTREAM_CID );
AK: Removed

>-static PRUint16 xtoint(unsigned char *ii);
>+static inline PRUint16 xtoint(unsigned char *ii);
Is the |inline| necessary?  Why not just let the compiler decide for all of
these?
AK: removed all 'inline': VC7 and higher indeed does the inline automatically (VC6 not!)

I'm not really a fan of overloading ExtractFile to do something completely not
indicated by its name, but I'm not sure what there is to do about it if the
code would be fairly similar for both.
AK: Added comments, and the code is similar on purpose (testing using the same code as real extraction).

>+  PRPackedBool hasDataOffset : 1;
>+  PRPackedBool isDirectory : 1; 
>+  PRPackedBool isSynthetic : 1;  /* whether item is an actual zip entry or was
>+                                    generated as part of a real entry's path,
>+                                    e.g. foo/ in a zip containing only foo/a.txt
>+                                    and no foo/ entry is synthetic */
>+#if defined(XP_UNIX) || defined(XP_BEOS)
>+  PRPackedBool isSymlink : 1;
> #endif
So it looks like the C bitfields here actually are useful, but I'm still not
entirely convinced it's really worth saving a word of storage per nsZipItem
here, particularly if accessing each member variable might require more
instructions than simply using normal PRPackedBool's would require.
AK: Note, originally some bits where kept in 'flags' and some in PRPackedBools.
AK: Keeping everything as PRPackedBool : 1 combines all the bits into one bytes
AK: For zipfile with 500 entries this saves about 1500 bytes.

AK: All the other comments are incorporated.


P.s., about the 'GetEntry'/'IsSynthetic': I still believe after some research into jar certification validation, that the .SF entry checking only applies to 'files', and therefor not for directories. Directories have zero or no data at all, and cannot be checked (LoadEntry will probably fail for directories anyway). 

Darin, over to you (and thanks in advance!)
Attachment #218741 - Attachment is obsolete: true
Attachment #219582 - Flags: superreview?(darin)
Attachment #218741 - Flags: superreview?(darin)
> Response to the comments from Jeff:
> Just to be absolutely sure, changing the constants in an interface doesn't
> require a uuid bump, correct?
> AK: No, or at least that is what I understood. Darin?

Correct.  No need to bump UUID when adding a constant.


> I don't think |PromiseFlatCString| is necessary here, but I could be wrong.
> AK: Probably not (as strings are now all flat, but the API says it should use
> PromiseFlatCString). Darin?

Seems to me that you have used PromiseFlatCString correctly.
Comment on attachment 219582 [details] [diff] [review]
V15: Updated with the comments from Jeff Walden

sr=darin
Attachment #219582 - Flags: superreview?(darin) → superreview+
fixed-on-trunk !! :)
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Thanks, tinderboxes seems to compile and run fine!
All went well, and two items (allmost) disappeared from the bloatreport
Before:
nsJARInputStream                             523000       0.00%
nsZipReadState                               521000       0.00%
After:
nsJARInputStream                               3968     -99.24%
--CLASSES-THAT-WENT-AWAY----------------------leaks------bloat------------------------
nsZipReadState                                    0     521000

Verified codesize impact is between 3.5K and 5K reduction
Status: RESOLVED → VERIFIED
where there any perf improvements?
I think this broke the Nightly Tester Tools extension (1.0). On startup I get this in the Javascript Console: 

Error: reader.init is not a function
Source File: chrome://nightly/content/nightly.js
Line: 169

Following the link gives me this code: 

loadGfxToolkit: function()
{
	var directoryService = Components.classes["@mozilla.org/file/directory_service;1"]
										               .getService(Components.interfaces.nsIProperties);

	var datafile = directoryService.get("AChrom",Components.interfaces.nsIFile);
	datafile.append("toolkit.jar");
	var reader = Components.classes["@mozilla.org/libjar/zip-reader;1"]
	                       .createInstance(Components.interfaces.nsIZipReader);
	reader.init(datafile);
	reader.open();
	var stream = reader.getInputStream("content/global/buildconfig.html");
	var sstream = Components.classes["@mozilla.org/scriptableinputstream;1"]
	                        .createInstance(Components.interfaces.nsIScriptableInputStream);
	sstream.init(stream);
	var content = "";
	var text = sstream.read(1024);
	while (text.length>0)
	{
		content+=text;
		text = sstream.read(1024);
	}
	var result = content.match(/--enable-default-toolkit=(\S+)/);
	if (result)
	{
		nightly.variables.toolkit = result[1];
	}
	sstream.close();
	stream.close();
	reader.close();
},

"	reader.init(datafile);" is the part that causes the error. Is this something the author should change?
Fix is simple: 
change:
        reader.init(datafile);
        reader.open();
into:
        reader.open(datafile);

About the performance question:
In the same time this patch went in, also a large patch for NestedURI's was checked in, and some other main patches as well, so it is difficult to see any startup or page performance gains. At least there is measureable codesize impact as bloat reduction of allmost 7% (1MB).
Possibly, but I not so sure. The crash seems to be caused by FreeArenaPool in CloseArchive in nsJar::Close from ~nsJar, and looking at the old code / differences there is nothing changed there. But somehow the nsJar destructor can get called on not an initialized nsJar (may be caused by someone creating a nsJar object, than failing to Open(filename), but still do the Close.
Because of the Init(f)/Open->Open(f) change, this is now probably happening. 
So the issue was always there, but now it becomes very visible...
Depends on: 336940
Blocks: 329071
This bug seems to have caused bug 342488, I posted a patch at that bug to fix the issue.
Depends on: 342488
bug 342488 is dup of bug 337558!
Depends on: 337558
You need to log in before you can comment on or make changes to this bug.