Closed Bug 116187 Opened 23 years ago Closed 23 years ago

nsZipArchive:BuildFileList allocates 180k in 4500+ allocations

Categories

(Core :: General, defect, P2)

x86
Windows 2000
defect

Tracking


RESOLVED FIXED
mozilla1.0

People

(Reporter: dp, Assigned: dp)

References

Details

(Keywords: memory-footprint, perf)

Attachments

(1 file, 3 obsolete files)

#allocs    For         Total bytes
--------------------------------------
[ 2251 ] - nsZipItem - 72032
[ 2251 ] - namelen   - 95008

We should be able to arena allocate these.
Blocks: 7251
Status: NEW → ASSIGNED
Keywords: perf
Priority: -- → P2
Target Milestone: --- → mozilla0.9.8
nsZipItem and the name are arena allocated.
Brendan/dveditz sr=/r= ?
Comment on attachment 62534 [details] [diff] [review]
Arena allocate the 4500+ allocations of filelist and name

>+#if !defined(STANDALONE)
>+ PL_ARENA_ALLOCATE(mem, &mArena, sizeof(char) * (namelen + 1));

Standard C and C++ say sizeof(char) == 1, so you don't need to scale here.

> nsZipArchive::nsZipArchive()
>- : kMagic(ZIP_MAGIC), mFd(0)
>+ : kMagic(ZIP_MAGIC), kArenaBlockSize(8*1024), mFd(0)

Call me old-fashioned, but would a #define ZIP_ARENA_SIZE (8*1024) not be
better? Or do most C++ compilers "inline" the const and avoid a load on every
use?

>+#ifndef STANDALONE
>+ // Initialize our arena
>+ memset(&mArena, 0, sizeof mArena);
>+ PL_INIT_ARENA_POOL(&mArena, "ZipArena", kArenaBlockSize);

Don't memset mArena to 0, PL_INIT_ARENA_POOL => PL_InitArenaPool sets all
members.

>@@ -241,6 +247,9 @@
>
>   PRFileDesc *mFd;
>   nsZipItem* mFiles[ZIP_TABSIZE];
>+#ifndef STANDALONE
>+  PLArenaPool mArena;
>+#endif

Nit: members are not consistently indented.

/be
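For reference, a minimal sketch of the arena-allocation pattern under review,
with the nits above applied (no sizeof(char) scaling, no memset before
PL_INIT_ARENA_POOL). The struct layout and the AllocateItem() helper are
simplified and hypothetical, not the actual patch; PL_INIT_ARENA_POOL and
PL_ARENA_ALLOCATE are the real plarena.h macros.

#include "plarena.h"
#include <string.h>

struct nsZipItem {        // simplified stand-in for the real nsZipItem
  nsZipItem* next;
  char*      name;
};

static PLArenaPool sZipArena;

void InitZipArena()
{
  // PL_InitArenaPool sets all pool members, so no memset is needed first.
  PL_INIT_ARENA_POOL(&sZipArena, "ZipArena", 1024);
}

nsZipItem* AllocateItem(const char* aName, PRUint32 aNameLen)
{
  void* mem;
  PL_ARENA_ALLOCATE(mem, &sZipArena, sizeof(nsZipItem));
  if (!mem)
    return 0;
  nsZipItem* item = static_cast<nsZipItem*>(mem);
  item->next = 0;

  // sizeof(char) == 1 by definition, so aNameLen + 1 needs no scaling.
  PL_ARENA_ALLOCATE(mem, &sZipArena, aNameLen + 1);
  if (!mem)
    return 0;
  memcpy(mem, aName, aNameLen);
  static_cast<char*>(mem)[aNameLen] = '\0';
  item->name = static_cast<char*>(mem);
  return item;
}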
Comment on attachment 62534 [details] [diff] [review]
Arena allocate the 4500+ allocations of filelist and name

> nsZipArchive::nsZipArchive()
>- : kMagic(ZIP_MAGIC), mFd(0)
>+ : kMagic(ZIP_MAGIC), kArenaBlockSize(8*1024), mFd(0)

I'm a little worried about using this large a block. Each zip file gets its
own pool, netscape has 16 chrome archives, so we could be wasting 128K worst
case. Even if you say on average we'll only waste a quarter of a block per
archive (and I don't know how you could say that) you'll still be wasting 32K
which just happens to be the savings Garret got after all his work shrinking
the size of nsZipItem.

For the very smallest archives you should use 512 byte blocks, and a few of
them fit in just one block. For larger archives (transition should be at least
16K, but could reasonably go up to 100K) use 1k blocks. The largest archive,
comm.jar, has 30,875 bytes of data we store by my calculation. Will 31 1K
allocations kill us vs. 4 8K allocations? If so you can pick other points to
transition to larger block sizes, but I'm sure we don't want the footprint hit
on the smaller archives.

[To guess the amount of data we're storing I used
  unzip -Z1 <archive> | wc -lc
then multiplied the first number by 32 (sizeof nsZipItem) and added the second
number for the name data. wc counts newlines and we store null terminators so
it works out.]

>+#ifndef STANDALONE
>+ // Initialize our arena
>+ memset(&mArena, 0, sizeof mArena);
>+ PL_INIT_ARENA_POOL(&mArena, "ZipArena", kArenaBlockSize);
>+#endif

If you do what I suggest you'll have to wait on this until we know the
filesize. Maybe in the Open, or in BuildFileList. BuildFileList might be best
since we're already looking for the end of the file. You could also make
guesses based on the size of the central directory which might be slightly
more reliable if you worry about archives containing a few huge files or lots
of miniscule ones.
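A small sketch of the sizing heuristic suggested above: derive the arena block
size from how much item data the archive needs rather than a fixed 8K. The
thresholds (16K transition, 512-byte and 1K blocks) come from the comment;
PickArenaBlockSize() is a hypothetical helper, not part of any patch.

#include "prtypes.h"

PRUint32 PickArenaBlockSize(PRUint32 aEstimatedItemBytes)
{
  if (aEstimatedItemBytes < 16 * 1024)
    return 512;   // smallest archives fit in a handful of 512-byte blocks
  return 1024;    // e.g. comm.jar: ~31 1K blocks instead of 4 8K blocks
}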
See also Bug 118455: it is about using file mapping to avoid those allocations
when all items in the jar file are stored and not compressed. Comments are
welcome!
You're thinking of bug 113393, the zlib allocations (which have been largely eliminated). Storing files rather than compressing them wouldn't help the nsZipItem allocations at all because we still have to track where the files are inside the archive.
Fixed all review comments. Dan, I switched the ARENA_BLOCK_SIZE to 4k and made
the arena global to all zip archives. That should control the wastage, if any.
What do you think?
Attachment #62534 - Attachment is obsolete: true
4k still sounds too large, potentially 64K wasted with 16 .jar files (probably closer to 32K in practice). We're trading footprint for performance here, but we're only getting performance measurements. What's the footprint cost? Please measure the allocations in the original code (I guess that's your 180K figure), with your 4K buffer, and with a 1K buffer.
Dan, it isn't an arena per nsZipArchive; I switched it to one global arena
shared by all nsZipArchives. Thus wastage is at worst 4k, and probabilistically
2k.
Attachment #64266 - Flags: needs-work+
Comment on attachment 64266 [details] [diff] [review]
Comments from Brendan and Dveditz

Whoa, that's a huge change to hide under the description "Comments from
Brendan and Dveditz".

In STANDALONE mode (used by the native installers) you now no longer delete
nsZipItems so they all leak, and you removed its destructor so even if you do
delete them the names are all leaked.

By having a global pool we get screwed by transient archives, such as might be
opened by XPInstall or signed scripts. Or if the ZipReader cache fills up and
evicts an archive. And then again when it most likely reopens the chrome
archive it just evicted.

By having a shared memory pool you're also exposed to thread-safety issues.
nsZipArchive is advertised as not threadsafe so people will take care to use
each on a single thread (or wrap it in nsJAR which has locks for the critical
entry points), but there's no guarantee two different threads won't open two
different archives at the same time.

Even though it's potentially more wasteful of memory you were on surer footing
with a per-archive pool.
One way around the per-archive pool is to recycle items. The "screwed by
transient archives" problem dveditz mentions is one example of using non-LIFO
allocation lifetime request strings with an arena pool, which is always bad.
If, however, you could make a freelist of items, the freelist would buffer the
non-LIFO load and present a more LIFO load to the underlying pool.

That doesn't deal with thread safety. You'd want a PRLock, I think. But then
you're paying what might be too high an overhead per item allocation.

So Dan's point about per-archive pools probably still wins, if you can make
the arena size small enough that internal fragmentation doesn't waste too much
space. It's not too hard to compute the average utilization for a given arena
size, given real numbers on mean items per archive (max and variance would be
useful too) -- do you have those?

/be
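A rough sketch of the freelist idea above: recycled items go on a free list
instead of back to the pool, so the arena itself only ever sees LIFO-friendly
growth. The class and names are hypothetical and this is not what landed (the
per-archive pool won out); a PRLock would still be needed for thread safety.

#include "plarena.h"

struct nsZipItem { nsZipItem* next; char* name; };  // simplified, as before

class ZipItemRecycler {
public:
  ZipItemRecycler() : mFreeList(0) {
    PL_INIT_ARENA_POOL(&mArena, "ZipItemArena", 1024);
  }
  ~ZipItemRecycler() { PL_FinishArenaPool(&mArena); }

  nsZipItem* Get() {
    if (mFreeList) {                  // reuse a recycled item first
      nsZipItem* item = mFreeList;
      mFreeList = item->next;
      return item;
    }
    void* mem;
    PL_ARENA_ALLOCATE(mem, &mArena, sizeof(nsZipItem));
    return static_cast<nsZipItem*>(mem);
  }

  void Recycle(nsZipItem* aItem) {    // buffered here, never given back to the arena
    aItem->next = mFreeList;
    mFreeList = aItem;
  }

private:
  PLArenaPool mArena;
  nsZipItem*  mFreeList;
};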
Yeah, I am convinced now that a per-archive pool is better. Will revert. Will
publish numbers too.
> Whoa, that's a huge change to hide under the description
> "Comments from Brendan and Dveditz".

Hey, in my defense I did add comments about the switch to a global arena.
Attachment #64266 - Attachment is obsolete: true
Attachment #64266 - Flags: needs-work+
Got it. Your global arena mention in the comments didn't sink in when I read it, and later when I reviewed the patch I saw only the short comment in the attachment table.
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Target Milestone: mozilla0.9.9 → mozilla1.0
Keywords: footprint, nsbeta1
Attached patch Arena allocate zip archive list (obsolete) — Splinter Review
Using 1k blocks for the arena. This addresses all of Brendan's earlier
comments except the block size definition: C++ will substitute the const value
instead of loading it each time.
chetu> grep Arena out | perl arena.pl

Arena     Alloc     blks            waste
--------------------------------------
1a755c8   11790  -  12 1k Blocks :   498
1a755e0   32131  -  32 1k Blocks :   637
1a755f8   31863  -  32 1k Blocks :   905
1a75610    1260  -   2 1k Blocks :   788
1a75628    3275  -   4 1k Blocks :   821
1a75640   32057  -  32 1k Blocks :   711
1a75658    2067  -   3 1k Blocks :  1005
1a75670   13237  -  13 1k Blocks :    75
1a75688     344  -   1 1k Blocks :   680
1a756a0    7416  -   8 1k Blocks :   776
1a756b8    6443  -   7 1k Blocks :   725
--------------------------------------
Total    141883  - 146 1k Blocks :  7621

Malloc wastage: 30880 in 3860 allocations

So we save: 3,714 small malloc() calls totaling 22,091 bytes.

This is for a mozilla build for startup.
Seems like Dan is out until 3/11. Garrett r=, Brendan sr=?
Cc'ing Alec. Brendan's email says he is out and can't sr. Garrett r=, Alec
sr=?
Comment on attachment 72750 [details] [diff] [review] Arena allocate zip archive list r=blythe
Attachment #72750 - Flags: review+
Comment on attachment 72750 [details] [diff] [review]
Arena allocate zip archive list

What's PL_ARENA_CONST_ALIGN_MASK for?

Don't we need to manually call the ~nsZipItem destructor in the arena case?
If we don't, can you comment to explain why it isn't necessary?
plarena.h:

/*
 * If the including .c file uses only one power-of-2 alignment, it may define
 * PL_ARENA_CONST_ALIGN_MASK to the alignment mask and save a few instructions
 * per ALLOCATE and GROW.
 */

Setting it to 7 causes all arena allocations to be aligned on an 8-byte
boundary. Malloc does that too.

Yeah, we really don't need to call ~nsZipItem. I added this comment:

// CAUTION:
// We don't need to delete each of the nsZipItem as the memory for
// the zip item and the filename it holds are both allocated from the Arena.
// Hence, destroying the Arena is like destroying all the memory
// for all the nsZipItem in one shot. But if the ~nsZipItem is doing
// anything more than cleaning up memory, we should start calling it.

I also added better error checking when an allocation fails, to protect
against deleting an arena-allocated item in a low-memory situation. Nice
point. Will attach a new patch.
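For reference, a sketch of the two points above: define
PL_ARENA_CONST_ALIGN_MASK before including plarena.h to get 8-byte-aligned
allocations, and tear the whole pool down instead of deleting items one by
one. Illustrative only; the function name is hypothetical.

#define PL_ARENA_CONST_ALIGN_MASK 7   // mask of 7 gives 8-byte alignment, like malloc
#include "plarena.h"

void DestroyZipArena(PLArenaPool* aArena)
{
  // Every nsZipItem and every name string lives in this arena, so finishing
  // the pool frees them all in one shot; no per-item ~nsZipItem calls are
  // needed as long as the destructor only cleans up memory.
  PL_FinishArenaPool(aArena);
}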
- Added comments per alecf
- Consistent use of #ifndef STANDALONE instead of #if !defined(STANDALONE) per brendan
- Under low memory condition, prevent delete of nsZipItem
Attachment #72750 - Attachment is obsolete: true
Comment on attachment 72836 [details] [diff] [review] Added comments per alecf; more error checking r=blythe
Attachment #72836 - Flags: review+
Comment on attachment 72836 [details] [diff] [review]
Added comments per alecf; more error checking

Ah, so I forgot about the whole MOZ_COUNT_CTOR/DTOR stuff. I noticed that
MOZ_DECL_CTOR_COUNTER() is just defined as being empty, in both debug and
release builds. This means we've essentially lost bloat counting of
nsZipItems.

I guess I'm OK with this, but here's why: if we manually call the destructors,
we take an unnecessary perf hit just to get debug info. If we only call the
destructor on debug builds, I think that means there is too much difference
between debug and release, and could lead to worse problems along the way...
Attachment #72836 - Flags: superreview+
Comment on attachment 72836 [details] [diff] [review]
Added comments per alecf; more error checking

sr=alecf
Keywords: nsbeta1 → nsbeta1+
Can't we just inline the MOZ_COUNT_DTOR stuff where we free up the structure?
Why? Since we are allocating these objects on an arena it seems wrong to count
how many are created and destroyed. They will be destroyed all or nothing, and
"nothing" is a leak of the arena, which we should know about by other means.
Inlining, even if possible, would make DEBUG shutdown faster. I won't mind
putting it in non-inlined too. But calling the destructor is the one that
seems far off, and writing another loop to go over all these objects and count
them seems off too, 'cause of the arena reason.
I think the object should still appear in the bloat logs - the bloat logs are
measuring object creation. Since there (might) be some cost to the actual
creation of an object (i.e. through the constructor itself, independent of the
allocation), we still get to keep some decent accounting stats through the
bloat logs.
OK, I'll buy that. Thanks.
Alec: bloat logs are for tracking bloat, which is a space issue independent of the complexity of a constructor. Performance issues are better identified with profiles, which give call-context in addition to run-wide call counts, so I'm not worried about losing that data here.
fair enough. ok, I guess we just #ifdef out the MOZ_COUNT_CTOR in non-standalone builds.
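A sketch of the suggestion above, assuming the MOZ_COUNT_CTOR / MOZ_COUNT_DTOR
macros from nsTraceRefcnt.h: keep the bloat-log bookkeeping only in STANDALONE
builds, where items are still individually new'd and delete'd, so the ctor and
dtor counts stay balanced. The exact #ifdef placement is illustrative, not the
landed patch.

#include "nsTraceRefcnt.h"   // MOZ_COUNT_CTOR / MOZ_COUNT_DTOR (assumed available)

struct nsZipItem
{
  nsZipItem()
  {
#ifdef STANDALONE
    MOZ_COUNT_CTOR(nsZipItem);
#endif
  }
  ~nsZipItem()
  {
#ifdef STANDALONE
    MOZ_COUNT_DTOR(nsZipItem);   // never runs for arena-allocated items anyway
#endif
  }
};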
Comment on attachment 72836 [details] [diff] [review] Added comments per alecf; more error checking a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #72836 - Flags: approval+
Looks good to me, too. r=/moa=dveditz if anyone ever asks.
Fix checked in to trunk.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Component: XP Miscellany → General
QA Contact: brendan → general
