Closed Bug 118455 Opened 23 years ago Closed 22 years ago

libjar performance / RFE: Replace {compressed JAR, read(), uncompress()} with {uncompressed JARs, mmap()}

Categories

(Core :: Networking: JAR, enhancement, P2)

Tracking

()

RESOLVED FIXED
mozilla1.4beta

People

(Reporter: bernard.alleysson, Assigned: roland.mainz)

References

Details

(Keywords: memory-footprint, perf)

Attachments

(1 file, 2 obsolete files)

I was reading bug reports about libjar and memory allocations, buffer allocation, decompression, etc., and I came to think that it could be more efficient to *store* items in the chrome jars rather than *zip* them. This could be important because it opens the way to another optimization: mapping each chrome jar into memory (with mmap() on Linux, CreateFileMapping() on Win32; I don't know about Mac). Reading the sources, I can see that mapping the file into memory can avoid *many* buffer allocations (nsZipItem should simply point to the file offset where the structure actually is, the nsZipRead buffer should simply be a pointer into the in-memory view of the chrome file, ...). Even if you don't want to go with file mapping, storing items would save the CPU time otherwise used to decompress (possibly on every item access?), at the cost of a larger disk footprint (8MB vs 3MB; 5MB more on my HD is not important to me). I tried to measure launch and relaunch times. Launch time is slightly improved (but by < 1s, and this is very subjective). Relaunch time, however, is definitely faster (because the jar files' contents are already in the file cache?). I think that the installer size will stay the same, because I believe that jars are zipped into xpis like any other files, so it would not impact download times. Finally, I will attach a patch to the build process to demonstrate what I mean. I'd like any comments to be posted here, even if you think that this doesn't make any sense.
Attached patch: Add "-0" to store items instead of zipping (obsolete)
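A minimal sketch of what the "-0" flag changes, written with Python's `zipfile` module rather than as a build-system patch. The file names and contents are invented for illustration, and `build_jar` is a hypothetical helper, not part of the attached patch.

```python
import io
import zipfile

def build_jar(files, compression):
    """Return the bytes of a ZIP/JAR archive holding `files` (name -> bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as jar:
        for name, data in files.items():
            jar.writestr(name, data)
    return buf.getvalue()

# Hypothetical chrome content, invented for this illustration.
files = {
    "content/navigator/navigator.xul": b"<window><!-- toolbar --></window>\n" * 200,
    "content/navigator/navigator.js": b"function onLoad() { return true; }\n" * 200,
}

stored = build_jar(files, zipfile.ZIP_STORED)      # comparable to `zip -0`
deflated = build_jar(files, zipfile.ZIP_DEFLATED)  # comparable to the default `zip`

with zipfile.ZipFile(io.BytesIO(stored)) as jar:
    info = jar.getinfo("content/navigator/navigator.js")
    # For a stored item the archived size equals the original size,
    # so a reader can hand out the bytes without calling zlib at all.
    stored_item_is_raw = info.compress_size == info.file_size

print(len(stored), len(deflated), stored_item_is_raw)
```

The stored archive is larger on disk, which is the trade-off discussed in this bug: more bytes to read, but no decompression step per item.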
CC random gurus for comments on the patch and ideas. :-)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: perf
Measurements I took on Windows showed uncompressed chrome performed slightly better on a re-launch, and somewhat worse on a cold launch. See the thread started at news://news.mozilla.org:119/3C0C9C43.2090808@netscape.com and especially news://news.mozilla.org:119/3C0EF119.4010104@netscape.com for details. Both machines were laptops (that is, relatively slow disks), so maybe on a slower processor with a really fast desktop disk, storing the chrome instead of compressing it would be a win. My main focus was exploring whether we could get a big footprint win by not archiving the chrome at all, but that was a losing configuration on both warm and cold starts. I'd welcome additional data in the above thread, especially from non-Windows platforms and from 266MHz or slower CPUs. One of the articles in the thread (news://news.mozilla.org:119/3C0D2364.5010100@netscape.com) contains the Windows batch files I used to set up the tests, which could easily be converted to Unix shell scripts. This can be done by anyone who downloads a Mozilla nightly; it doesn't require a build environment. Your conjecture about the install is incorrect: "storing" chrome would save about 1/2MB on download (at the cost of an additional 4MB on disk after install). Compressed .zip archives don't compress much more inside .xpi archives, but if the chrome is merely stored, tons of text common across the chrome (e.g. the license block) can be compressed out efficiently. An alternate proposal is to preprocess the chrome to strip comments and whitespace, saving nearly as much on download (400K), but better because it saves the same 400K after install and also speeds chrome access, since less has to be read from disk. Because of this, download space saving is not a good rationale for uncompressed chrome, but measurable performance gains would be.
Current efforts to develop fastload for more chrome types in addition to the current fastload for JS may make the performance point moot if ultimately chrome is loaded from fastload files instead of the .jar archives.
As dveditz said: Disk footprint isn't a big issue. Download footprint isn't an issue either, because the xpi file is compressed. This is purely a question of reading more, uncompressed data from disk vs. reading less from disk and unzipping in memory. For chrome, most of the files are small, so more or less they fit in the same block. So disk reads won't be impacted and we should save the unzip time. Overall I too think this will be a win and we should get this in. Dan? (Will reassign the bug to myself. Dan, take over if you prefer.)
Assignee: waterson → dp
With regards to mmap: I doubt if it will be a win. First there is this requirement that data is uncompressed. Second (as dan said in another bug) we still will need to read the zip manifest to know what files are in there.
"I'd welcome additional data in the above thread, especially from non-Windows platforms and from 266Mhz or slower CPUs" I'll try to get some reliable numbers. I've got 2 machines (PIII 450, 512MB and PII266, 64MB). Soon I'll have an Athlon 1600+, 512MB. I'd like to know how you arrive at numbers like, say, 2.14, because I only have 1 sec precision and certainly not 0.01 sec! "With regards to mmap: I doubt if it will be a win. First there is this requirement that data is uncompressed. Second (as dan said in another bug) we still will need to read the zip manifest to know what files are in there." Frankly, I don't know if mmap will be a win. But I don't understand why those PR_Read() calls are so slow. I think it is because of all the buffer management and memory copies. Even if the data is compressed, the entire JAR file can be mapped (!). Iterating through the zip_central structures will be easier. Compressed data will be read from the in-memory file view. And the code should be smart enough to distinguish between stored and compressed items so that temporary buffer allocations can be avoided (buffer allocation logic should reside in a lower layer). In the ideal case, no temporary buffer would be needed all the way up to necko; the data should flow naturally... Maybe JAR files could be seen as a kind of cache, i.e. why keep things like chrome://..../image.gif in the memory cache if the data can be obtained at no buffer allocation and deallocation cost (and with the power of a modern VM subsystem)? There's also the fact that mmap has never been used in Mozilla at all yet (this is sad, because on NT it is so convenient and all the developers I know use it). With mmap you'll need a wrapper around all disk I/O because of platforms that don't support it (looking at NSPR's PR_CreateFileMap, I can see that Mac, BeOS, ... are all lacking this).
Basically I see it as a CFile class (Read, Seek, Open, Close) plus buffer logic, i.e. methods like AllocFileView(char **buf, offset, len = -1) and FreeFileView(char *buf), where AllocFileView will malloc a buffer when mmap isn't available (len = -1 to read the whole file content) and FreeFileView will be a no-op when mmap is available, because AllocFileView will simply return a pointer into the file view. This class could be useful by itself, because there are many cases where you simply open the file, alloc a buffer, read the file content, process the content, free the buffer, and close the file. With a class like that and an OS with mmap (Win32 and Unix), no more buffer! As for the manifest, I didn't look at the code (it's higher level, in nsJARArchive or something, and here I was talking about nsZipArchive), but I thought that to know the files in the archive you need the zipcentral et al. structures in nsZipArchive.
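The AllocFileView/FreeFileView idea above can be sketched roughly as follows. This is an illustration in Python, not the proposed C++ class: the name `FileView` and its methods are hypothetical, and the fallback path simply reads into a fresh buffer when mapping is disabled or fails.

```python
import mmap
import os
import tempfile

class FileView:
    """Read-only view of a file: mmap when possible, plain read() otherwise."""

    def __init__(self, path, use_mmap=True):
        self._file = open(path, "rb")
        self._map = None
        if use_mmap:
            try:
                self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
            except (ValueError, OSError):
                self._map = None  # e.g. empty file, or mapping not possible

    def alloc_view(self, offset=0, length=-1):
        """Return the bytes at [offset, offset+length); length=-1 means 'to EOF'."""
        if self._map is not None:
            end = len(self._map) if length < 0 else offset + length
            return memoryview(self._map)[offset:end]  # points into the mapping, no copy
        self._file.seek(offset)
        data = self._file.read() if length < 0 else self._file.read(length)
        return memoryview(data)  # fallback: a freshly allocated buffer

    def free_view(self, view):
        """Release a view; for the mmap case nothing is deallocated."""
        view.release()

    def close(self):
        if self._map is not None:
            self._map.close()
        self._file.close()

# Usage: both back ends must hand out the same bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789abcdef")
results = []
for use_mmap in (True, False):
    fv = FileView(tmp.name, use_mmap=use_mmap)
    view = fv.alloc_view(4, 6)
    results.append(bytes(view))
    fv.free_view(view)
    fv.close()
os.unlink(tmp.name)
```

Callers see the same interface in both cases, so platforms without file mapping only pay the extra allocation, which is the point of putting the buffer logic in a lower layer.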
The time utility that comes with cygwin (and similar utilities from other vendors) gives decent resolution. I'm using the quit.html discussed in the performance tools section of www.mozilla.org (so shutdown time is being included unfortunately, but at least we get reliable numbers). time mozilla.exe -P moz c:\\test\quit.html
about mmap, see also http://developer.apple.com/techpubs/macosx/Essentials/Performance/Languages/Java_Performance_Notes.html : "* Storing each class in an individual .class file. Classes should be grouped together in .jar files. Opening and closing many small .class files is more expensive than opening and closing one large .jar file. Grouping classes in .jar files allows the class loader to efficiently use file mapping. See also "Reading Large Files With File Mapping" File Mapping in OS 9.1: http://developer.apple.com/technotes/tn/tn2011.html File Mapping in OS X: http://developer.apple.com/techpubs/macosx/Essentials/Performance/FilesNetworksThreads/Reading_Lar_ile_Mapping.html about Windows NT: "* Memory mapping produces clearly superior results for "small" files, ..." http://world.std.com/~jmhart/atou.htm
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.9
Target Milestone: mozilla0.9.9 → mozilla1.0
Moving out of Mozilla 1.0
Target Milestone: mozilla1.0 → mozilla1.1
*** Bug 164476 has been marked as a duplicate of this bug. ***
I think doing both uncompressed JARs in _combination_ with mmap() will be a huge performance and footprint win for the following reasons:
1. mmap() does not mean it consumes process memory; instead, pages are allocated where the file contents are mapped in (AFAIK we currently cache the uncompressed data in memory allocated with |malloc()|). Moreover, assuming read-only mapping of these files, those pages can be shared between multiple mozilla/phoenix instances, which will be a win for multiuser machines (like X11/SunRay/Windows terminal servers etc.). And finally: mmap() only reads data on demand, i.e. we would not read stuff which is not required (and unused pages are flushed by the kernel).
2. We would save the whole bunch of fread(), fseek(), fread(), uncompress, malloc(), memcpy() junk and can replace it with one clean mmap() call.
3. Distribution tarball size can be reduced a lot. ZIP always compresses worse than tar.gz, since tar.gz compresses across file boundaries, which ZIP can't do (Windows hackers: the same applies to an uncompressed JAR which gets ZIPed in one piece!)
Blocks: 7251, 92580
Keywords: footprint
Summary: libjar performance → libjar performance / RFE: Replace {compressed JAR, read(), uncompress()} with {uncompressed JARs, mmap()}
To add my 2 cents worth: the contents can still be compressed:
* Images still are (.gif, .png, .jpg, etc.)
* Text files such as .css, .xul, etc. can be compressed per item, as they are parsed into something using a Reader, which could be handled by a 'zip-item-uncompress from memory' 'streamer' (just like the images are decoded and then handled from an internal structure).
As far as I understand, the parsers parse from a stream anyway. So we can have both: mmap'ing the jar file and using memory operations to seek and read, while still having a jar file that is compressed per item. Although a compressed jar file is smaller than a jar file with compressed items (littlemozilla: 253KB versus 304KB), the uncompressed version is about 500K, and even with mmap that is more costly in footprint than 304KB mmapped. The small reduction in the Mozilla download is not worth the footprint impact. In summary, mmapping the jar files as they currently are has the least impact, as only the jar-handling code needs to be changed. Actually, as far as I can see, only nsZipArchive.cpp needs to change.
Some more cents: Actually the current zip directory reading is quite bad on IO, lots of seeks (from the end, backwards), lots of reads of small buffers, even shifting buffers around, etc... With mmapping the whole directory can be parsed, directly from (virtual) memory, making the directory parsing also much simpler.
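To make the "parse the directory straight from (virtual) memory" idea concrete, here is a rough Python sketch that walks a ZIP central directory in a mapped file. It is not libjar code: `list_entries` is a hypothetical helper, and it deliberately ignores ZIP64, multi-disk archives, and archive comments that happen to contain the end-of-directory signature.

```python
import mmap
import os
import struct
import tempfile
import zipfile

EOCD = struct.Struct("<4s4H2LH")        # end-of-central-directory record (22 bytes)
CENTRAL = struct.Struct("<4s6H3L5H2L")  # fixed part of a central directory entry (46 bytes)

def list_entries(mapped):
    """Yield (name, method, compressed_size, local_header_offset) from a mapped ZIP."""
    eocd_pos = mapped.rfind(b"PK\x05\x06")
    if eocd_pos < 0:
        raise ValueError("not a ZIP archive")
    fields = EOCD.unpack_from(mapped, eocd_pos)
    total_entries, cd_offset = fields[4], fields[6]
    pos = cd_offset
    for _ in range(total_entries):
        rec = CENTRAL.unpack_from(mapped, pos)
        if rec[0] != b"PK\x01\x02":
            raise ValueError("corrupt central directory")
        method, csize = rec[4], rec[8]
        name_len, extra_len, comment_len = rec[10], rec[11], rec[12]
        name_start = pos + CENTRAL.size
        name = mapped[name_start:name_start + name_len].decode("utf-8", "replace")
        yield name, method, csize, rec[16]
        # Variable-length name, extra field and comment follow the fixed part.
        pos = name_start + name_len + extra_len + comment_len

# Usage with a small throwaway archive (contents invented for the example).
fd, path = tempfile.mkstemp(suffix=".jar")
os.close(fd)
with zipfile.ZipFile(path, "w", zipfile.ZIP_STORED) as jar:
    jar.writestr("content/a.xul", b"<window/>")
    jar.writestr("skin/a.css", b"window { margin: 0; }")
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    entries = list(list_entries(m))
os.unlink(path)
```

No explicit seeks or small reads are issued here; the kernel pages in whatever part of the directory the loop touches.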
Alfred Kayser wrote:
> Actually the current zip directory reading is quite bad on IO,
[snip]
> With mmapping the whole directory can be parsed, directly from
> (virtual) memory, making the directory parsing also much simpler.

Can you open a separate bug for that and post the bugid here, please?
I did a small test today:
1. Build Mozilla with compression: resulting distribution tarball size is 19677674 bytes
2. Build Mozilla without compression: resulting distribution tarball size is 18877228 bytes
Looking at the amount of space saved in the distribution:
% bc -l
(19677674 - 18877228) / 1024
781.68554687500000000000
(No, this is no paradox. Storing files _uncompressed_ in the JAR saves space in the resulting tarball because the *.tar.gz or *.zip compresses across file boundaries, while the *.jar can't do that.)
Result of the small test: we could save more than 781KB (!!!!) in the resulting tarball with this small one-line patch!!!!
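The "no paradox" effect can be reproduced on synthetic data. The sketch below is only an illustration: the file contents are randomly generated stand-ins (a shared "license block" plus a unique body per file), and `gzip.compress` over the whole jar stands in for the outer tar.gz or .xpi. Exact sizes will differ from the real build.

```python
import gzip
import io
import random
import zipfile

rng = random.Random(0)

def random_text(n_words):
    """Return n_words of made-up lowercase 'words' (stand-in for real source text)."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    return " ".join(
        "".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
        for _ in range(n_words)
    ).encode()

# The same block at the top of every file, like a license header.
license_block = random_text(150)
files = {
    f"content/file{i}.js": license_block + b"\n" + random_text(50)
    for i in range(60)
}

def jar_bytes(compression):
    """Build an in-memory jar with the given per-item compression method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as jar:
        for name, data in files.items():
            jar.writestr(name, data)
    return buf.getvalue()

stored_jar = jar_bytes(zipfile.ZIP_STORED)
deflated_jar = jar_bytes(zipfile.ZIP_DEFLATED)

# Simulate the outer archive by compressing each jar as one stream.
stored_then_gzip = len(gzip.compress(stored_jar))
deflated_then_gzip = len(gzip.compress(deflated_jar))

print(len(stored_jar), len(deflated_jar), stored_then_gzip, deflated_then_gzip)
```

The stored jar is bigger on its own, but the outer compressor can remove the repeated license block across items, which per-item deflate cannot do.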
Attachment #63727 - Attachment is obsolete: true
Comment on attachment 119824: Store files in jar uncompressed (patch for 2003-03-23-08-trunk). Requesting r=/sr= ...
Attachment #119824 - Flags: superreview?(bzbarsky)
Attachment #119824 - Flags: review?(seawood)
Um. As I recall, using the jars led to a noticeable startup improvement on Win32 and Mac (not sure about Linux). Was this due to just reading one large file instead of many small ones? Or due to not needing to read an extra 5MB off disk at startup? On Mac, the latter may have been a large contributing factor; not sure how much it matters with the new filesystem on OSX. In short, someone needs to test the effect of this patch on Ts before I can review in good conscience. If there is a Ts hit (which there may well not be), we should at least have a plan in place to get rid of it (eg using mmap() as stated) before this can be checked in. Could someone do the Ts testing?
Target Milestone: mozilla1.1alpha → mozilla1.4beta
Boris Zbarsky wrote:
> Could someone do the Ts testing?

Can we just make a carpool to test that, please (that way we get lots of info from many platforms at once :)? This bug has existed for lots of milestones now (like many other possibly good improvements that are rotting around, too). At some point we should do something instead of discussing this issue to death.
Also, what's the difference in the uncompressed distribution (not the .tar.gz or zip)?
I think this is definitely worth testing. zlib has a lot of allocation overhead, and the recycling allocator only hides that issue somewhat. We don't need to mmap() anymore though - with recent improvements to the jar code to eliminate excess buffers/etc, we may see a large improvement - Read()ing an uncompressed file out of a JAR is now a direct system call from read() to the client's buffer.. these recent improvements will probably show marked improvement for uncompressed files, and reduce CPU usage (since we're reading from disk directly into memory) which will likely help Ts. let's run this test late at night when nobody is checking in, I'll certainly sr= it.
Alec Flett wrote:
> I think this is definitely worth testing. zlib has a lot of allocation
> overhead, and the recycling allocator only hides that issue somewhat.
>
> We don't need to mmap() anymore though - with recent improvements to the jar
> code to eliminate excess buffers/etc, we may see a large improvement -
> Read()ing an uncompressed file out of a JAR is now a direct system call from
> read() to the client's buffer..

|mmap()| still has some advantages on some platforms over |read()| (for example, the Solaris version of /usr/bin/cat uses AFAIK mmap() on plain files to get a _huge_ performance win). And we don't have to SPAM the heap with any |malloc()|-for-buffers, we can save some kernel roundtrips and get (async) read-ahead on some platforms, too... :)
Comment on attachment 119824: Store files in jar uncompressed (patch for 2003-03-23-08-trunk). OK, if we plan to carpool this so we can see how it does Ts-wise, sr=me. ;)
Attachment #119824 - Flags: superreview?(bzbarsky) → superreview+
The mmap issue has been spun off into bug 201224
Please also see bug 68686 about stripping whitespace and comments from chrome files.
Comment on attachment 119824: Store files in jar uncompressed (patch for 2003-03-23-08-trunk). That only fixes the default case, which is to use jar files only. If you use --enable-chrome-format=both, then it won't use that option when creating the .jars.
Attachment #119824 - Flags: review?(seawood) → review-
Taking myself...
Assignee: dp → Roland.Mainz
Status: ASSIGNED → NEW
Severity: normal → enhancement
Status: NEW → ASSIGNED
OS: Windows 2000 → All
Priority: P3 → P2
Attached patch: New patch per cls's comments
Attachment #119824 - Attachment is obsolete: true
Comment on attachment 120254: New patch per cls's comments. Requesting r=/sr= ...
Attachment #120254 - Flags: superreview?(bzbarsky)
Attachment #120254 - Flags: review?(seawood)
Attachment #120254 - Flags: review?(seawood) → review+
Comment on attachment 120254: New patch per cls's comments. sr=me, with the same condition, of course. ;)
Attachment #120254 - Flags: superreview?(bzbarsky) → superreview+
bzbarsky: I do not have CVS write access - can you handle the checkin, please ?
Patch checked in a few hours ago (http://bonsai.mozilla.org/cvsquery.cgi?module=MozillaTinderboxAll&branch=HEAD&cvsroot=/cvsroot&date=explicit&mindate=1050565980&maxdate=1050566961&who=neil%25parkwaycc.co.uk) in the deep, deep night (no checkins after that point for the next five hours :) If I interpret the startup time data from tegu correctly, the Ts time has dropped significantly due to this bug (see http://tegu.mozilla.org/graph/query.cgi?tbox=comet&testname=startup&autoscale=1&days=1&units=&ltype=&points=&showpoint=2003%3A04%3A17%3A01%3A39%3A06%2C1496&avg=1&size=2.0) and the binary tarball size was lowered, too (no hard data yet; I am waiting until I can compare yesterday's tar.gz against the tar.gz from today's build at the same hour). bz: What do you think?
how much bigger did the .jar's get? I see a very MINOR performance improvement myself..
Alec Flett wrote:
> how much bigger did the .jar's get?

I am checking that right now... For the log:

file                                                 | size
-----------------------------------------------------+---------
2003-04-16-09-trunk/mozilla-i686-pc-linux-gnu.tar.gz | 12.8MB
2003-04-17-09-trunk/mozilla-i686-pc-linux-gnu.tar.gz | 11.9MB
-----------------------------------------------------+---------
saved between 2003-04-16-09 and 2003-04-17-09:       | 0.9MB(!!)

> I see a very MINOR performance improvement myself..

We only have half the work "in" right now. Without mmap()'ing the JARs we only get half the benefit from the change...
ok, that makes no sense. I want a per-jar list of how much they grew. Until the mmap stuff lands, let's back this out. I don't think the size increase is going to be worth the minor performance increase.
Alec Flett wrote:
> until the mmap stuff lands, lets back this back out. I don't think the size
> increase is going to be worth the minor performance increase.

Note that we need to have _both_ patches in the tree to get the correct values. Removing this one will likely render the mmap()-patch useless...
alecf wrote: > I don't think the size > increase is going to be worth the minor performance increase. BTW: Wasn't the "make the distribution tarball size far smaller"-idea one of the goals behind the Phoenix project (at least they are axe'ing features with that argumentation... ;-/) ?
Size changes between 2003-04-16-09 and 2003-04-17-09 tar.gz (sizes in bytes):

2003-04-16-09 | 2003-04-17-09 | JAR archive name
--------------+---------------+---------------------------------
       734337 |       2122357 | mozilla/chrome/en-US.jar
       962692 |       3160067 | mozilla/chrome/comm.jar
       233064 |        840359 | mozilla/chrome/toolkit.jar
        10934 |         23308 | mozilla/chrome/embed-sample.jar
        22424 |         76362 | mozilla/chrome/US.jar
       307906 |        548424 | mozilla/chrome/classic.jar
       583067 |        866164 | mozilla/chrome/modern.jar
         6781 |         10924 | mozilla/chrome/en-win.jar
         5454 |          8448 | mozilla/chrome/en-unix.jar
         4815 |          7555 | mozilla/chrome/en-mac.jar
         4226 |         13656 | mozilla/chrome/content-packs.jar
        12792 |         42898 | mozilla/chrome/help.jar
       261009 |        791920 | mozilla/chrome/venkman.jar
       190365 |        531303 | mozilla/chrome/inspector.jar
       151866 |        540030 | mozilla/chrome/chatzilla.jar
          482 |           754 | mozilla/chrome/pipnss.jar
       103791 |        273253 | mozilla/chrome/pippki.jar
       543321 |       1891084 | mozilla/chrome/messenger.jar
Yes, Roland, that was one of the goals. Which is why a SIZE INCREASE is undesirable in exchange for a v. minor performance gain. I think that mmap is probably going to be a win for us in a variety of ways, but that doesn't make reading for content any less important.
Let's not confuse matters: Tarball/archive size isn't the same as installed size. Using uncompressed jars is very likely to DECREASE tarball size for the same reasons that tar.gz tends to compress better than .zip despite using the same algorithm.
shaver: Did you realise that today's binary tarball is _SMALLER_ (by ~900KB) than yesterday's as a result of this patch?
I see what happened here. shaver, it's not as bad as it sounds: when we store files in the .jar's UNCOMPRESSED, the .tar.gz does a better job of compressing the whole .jar as a stream. So actually the .tar.gz DID shrink since yesterday; what has increased is our footprint on disk. But this actually bought us something: even without the mmap() work, we're allocating a WHOLE lot less than we were before because we're not using zlib. This only became possible when I revamped libjar, so we should actually see a smaller memory usage spike at startup. So I've changed my mind: let's leave it in for now while I examine memory usage.
Yeah, alecf, I thought you knew that and were complaining about the on-disk size. Last time I try to get _your_ back!
hah, no.. but thanks anyway :)
Marking bug as FIXED per alecf's permission on IRC...
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Disk image size on Mac OS X went up from 38.6 to 45.8Mb because of this change.
Hardware: PC → All
Simon Fraser wrote: > Disk image size on Mac OS X went up from 38.6 to 45.8Mb because of this > change. Ouch... ;-( Can the disk image be shipped in a compressed format ?
Simon Fraser: Can you file a new bug (please don't reopen this one) for the MacOSX disk image size issue, CC me + alecf, and post the bugid here, please?
It is shipped compressed (it's about a 15Mb download), but is expanded after download, so it's not a huge deal, but that does show the disk footprint hit from this change.
any data on what is the run-time VM win here on startup?
cathleen wrote: > any data on what is the run-time VM win here on startup? No... and I think we should _wait_ with such measurements until the "mmap()-the-JARs"-patch is "in". Otherwise we get results (and the following rants&&complaints) for a work which is currently only half-finished.
I think there was probably a reasonable win from this anyway. I mean, there wasn't any VM advantage to having the files be small, aside from the fact that you're now reading more data from disk, which might result in more pages being mapped into memory (i.e. because many modern I/O libraries use mmap under the hood anyway). But since the client of libjar wants stuff in an uncompressed format anyway, the files were being decompressed the moment they came in off the disk, so the size of pipes, buffers, etc. remains the same; we just have way less allocation from zlib and so forth.
I looked back and it looks like the nightly builds on Mac dropped by about 1 meg (from 15.7M to 14.7M) - nice!
Does anyone care about disk footprint? It's not a problem for today's HDDs.
yes. people care about on-disk footprint - don't assume the only environment gecko will run on is a PC with a big hard disk. Think smaller. For the mozilla-the-browser release though, this is fine.
People on small devices are likely to want their own custom UI, with different JARs, and may well have some compressed-storage system on which to install it. (This isn't a core gecko issue, as I understand it -- it's mainly a packaging change for our default desktop browser.)
re comment #51: > I think we should _wait_ with such measurements until the > "mmap()-the-JARs"-patch is "in" That one was just wontfixed because it does not seem to give any improvement. So maybe it's now time to measure this one and find out what it's worth alone.
see comment 52.. I think what we really need to measure is the memory usage difference... not using zlib means we're not doing any excess allocation, so even if there isn't much of a perf win, I think we'll see a trace-malloc win. (I mean, I can guarantee you there is a win there, so I'm not sure what to test.) We saw a minor (if any) Ts improvement and no performance degradation, plus the download size win we discussed above, so I would say we just go with what we have.
brade wrote: > Is this bug fix the cause of the 2nd Ts jump on tegu in April? > > http://tegu.mozilla.org/graph/query.cgi?tbox=monkey&testname=startup&autoscale=1&size=&days=0&units=&ltype=&points=&showpoint=2003:06:06:11:44:04,6072&avg=1 Is the answer in bug 208570 sufficient ? :)
Component: XP Miscellany → General
QA Contact: brendan → general
Component: General → Networking: JAR
QA Contact: general → networking.jar