189528 - libjar and consumers make too many copies

Assignee

Description

•

23 years ago

Ok, I was poking around looking under the covers for bug 121341, and I found some nastiness. First of all, when reading a file out of a .jar file, libjar only provides a Read() API, which copies stuff out of an internal buffer into one provided by the client, even though libjar has a completely decompressed buffer in memory. That's one extra copy. In the case of CSS and String Bundles, we use UTF8InputStream to convert the buffer to Unicode on the fly - it buffers the data in an nsByteInputStream (the first extra copy mentioned above), and converts it into an internal nsUnicodeBuffer, a second extra 'copy' (conversion is part of the process though, so remember that for later...) The conversion of the whole buffer is often unnecessary and generally only needs to apply to particular strings within the buffer. Finally, when CSS or the String Bundle parses the unicode file, it makes a final copy of the unicode data and uses it later - that's a 3rd extra copy of the original data, the fourth copy including the original. If we provide some sort of API on nsZipArchive like Consume(PRUint32 aBytes, char ** aResultBuffer, PRUint32* aResultBytes); nsZipArchive could hand back parts of the buffer to nsJARInputStream. Then nsJARInputStream's ReadSegments could be fixed to call this instead, and pass the original buffer back to the nsWriteSegmentFun. This would allow the CSS or nsPersistentProperties parser to parse the original data. they would be dealing with raw UTF8 data, and would have to do the conversion on the fly, of any necessary strings. In the case of CSS, keywords are 7-bit clean, and only random user data like attribute names/values and URLs would have to be converted from UTF8 (and even in the case of URLs, necko can already handle UTF8 strings just fine. Benefits I forsee: - simplified code should result in a reduction in static compiled code across the board - restructuring some of the consumers to use readSegments() will allow for feeding of data to the CSS parser and nsPersistentProperties file asychronously. - restructuring these consumers will allow us to benefit from future memory-map-based caches (i.e. speeding up cached CSS) - performance benefits because we're not making so many copies of the data. - reduced memory usage because of fewer buffers needed for the copies I have not looked at reading XUL out of JARs because most of the time that is covered by the fastload stuff. It may get a speedup on the side, but it is not part of the goal of this bug.

Simon Fraser [no longer active]

Updated

•

23 years ago

OS: Windows 2000 → All

Hardware: PC → All

improve ReadSegments on libjar 23 years ago Alec Flett 5.69 KB, patch		Details \| Diff \| Splinter Review
fix the syncloader 23 years ago Alec Flett 1.91 KB, patch		Details \| Diff \| Splinter Review
decompress on demand 23 years ago Alec Flett 69.31 KB, patch		Details \| Diff \| Splinter Review
decompress on demand v1.1 23 years ago Alec Flett 69.68 KB, patch		Details \| Diff \| Splinter Review
decompress on demand v1.2 23 years ago Alec Flett 69.76 KB, patch		Details \| Diff \| Splinter Review
decompress on demand v1.3 22 years ago Alec Flett 73.63 KB, patch	alecf : review+ alecf : superreview+	Details \| Diff \| Splinter Review