Closed
Bug 68686
Opened 23 years ago
Closed 10 years ago
Shrink .jar files by stripping out whitespace, comments
Categories
(Firefox Build System :: General, enhancement)
Firefox Build System
General
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 903149
People
(Reporter: sspitzer, Unassigned)
References
Details
(Keywords: memory-footprint, privacy)
Attachments
(1 file)
14.77 KB,
patch
would it make our .jar files any smaller if we removed whitespace, comments, etc. from the .xul, .css, .js, and .dtd files before we made the jars? (or afterwards, before we shipped to the user?) the .jar files are zip archives, which might mean removing whitespace isn't a big win, but there are plenty of comments in our shipping .jars that the user has to pay for: once on download, and then on every startup. I don't have numbers, but I'm sure smaller .jar files will make for a faster startup. I had heard that brendan was investigating shipping pre-compiled .js, which would mean there would be no .js to compress, but there is still .xul, .dtd, .css, etc. comments?
Comment 1•23 years ago
hrm.. I had once thought about some sort of xml-proprietary compression mechanism, but I'm not sure I see the value if we're already using compression in the .jar files.. I just have a feeling this is going to be a 5% gain in space, a 1% gain in time, for a whole lot of work
Reporter
Comment 2•23 years ago
alecf, it might not be worth it. we'd need numbers to justify it. here's something I (or someone with more cycles) could try: get some numbers (size and performance) for loading messenger.jar as is. then, hand-strip out the comments (let's forget whitespace for now), create a new messenger.jar, and get a new set of numbers. if it looks promising, we can think about what to do next. I just wanted to throw this out there for consideration. on a related note: legally, would we even be allowed to strip the mozilla license from the files in the jar? that's a lot of unused text in almost every file. by my crude measurement, I think there is about 100k of comments and localization notes in en-US (which becomes en-US.jar) that could be removed. adding leaf to the list; always good to have release in on this type of thing, since it would involve the build / package process.
Comment 3•23 years ago
The license issue is easily satisfied by putting a prominent notice at the root of each .jar file (say, a README or LICENSE file), at least by my interpretation of section 3.5: 3.5. Required Notices. You must duplicate the notice in Exhibit A in each file of the Source Code. If it is not possible to put such notice in a particular Source Code file due to its structure, then You must include such notice in a location (such as a relevant directory) where a user would be likely to look for such a notice. [...] It would be interesting to try an uncompressed .jar file. We'd spend more time reading from disk, but less uncompressing. Which matters more? Either way it would be nice if the parser didn't have to munge through extraneous comments. It would make our chrome a little more daunting to experimenters so I hope we don't do this unless it really turns out to help.
Comment 4•23 years ago
the uncompressed jar idea is a very good one. that is how 4.x stored its .class files for Java. disk bandwidth is pretty cheap these days relative to the cost of opening any file, etc. It would be interesting to do some timing on compressed vs. uncompressed jars. cc'ing jrgm for some possible measurements.. maybe we should take this up on .porkjockeys.
Reporter
Comment 5•23 years ago
sorry, I got lost. were you suggesting that it would be useful to compare reading uncompressed files vs. reading uncompressed files that were stripped of comments (and whitespace?), or did you want to see compressed jar vs. uncompressed jar (to see if we are paying a price for the decompression)? of course, jars are a big win on mac (since file i/o is slow).
Comment 6•23 years ago
No precompiled .js files would be shipped; rather, bug 68045 proposes that Mozilla precompile on first run after install. /be
Comment 7•23 years ago
I meant test compressed vs. uncompressed as a separate issue from the comment/whitespace removal. As a bonus I'd guess that an uncompressed .jar file would end up slightly smaller in the compressed download package. Let me test that...

std comm.jar            574951  (530982 in .xpi)  mostly text
uncompressed comm.jar  1885913  (366206 in .xpi)    "    "
std modern.jar          554959  (429592 in .xpi)  lots of images
uncompressed modern     812316  (381036 in .xpi)    "    "
std en-US.jar           235293  (204005 in .xpi)  text
uncompressed en-US      658055  (129626 in .xpi)    "

An extra 2Mb on disk (2 1/2 times bigger) probably isn't worth a 290K (25%) savings on the download. Unless it's a performance win, and it might in fact be a loss. I'll add this to my list of things to try
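The experiment above is easy to reproduce in miniature with Python's zipfile module. This is a sketch only: the member names and contents are synthetic stand-ins, not the real chrome files measured in this bug.

```python
# Compare the size of a jar packed with deflate vs. stored (uncompressed).
# The "chrome" members below are synthetic stand-ins, not Mozilla's files.
import io
import zipfile

def repack_size(members, compression):
    """Build an in-memory zip from {name: bytes} and return its total size."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return len(buf.getvalue())

members = {
    "content/navigator.xul": b"<window>" + b"<box flex='1'/>" * 400 + b"</window>",
    "content/navigator.js": b"// license header\nvar x = 3;\n" * 300,
}
deflated = repack_size(members, zipfile.ZIP_DEFLATED)
stored = repack_size(members, zipfile.ZIP_STORED)
```

On repetitive text like this, deflate wins by a wide margin inside the jar itself; the twist noted in this comment is that a stored jar inside a compressed .xpi lets the outer compressor see across file boundaries (e.g. shared license headers), which is why the .xpi ends up smaller.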
Comment 8•23 years ago
footprint, and a triage-able summary
Keywords: footprint
Summary: is it possible to make the .jar files a smaller by removing whitespace and comments from the files? → Shrink .jar files by stripping out whitespace, comments
Comment 9•23 years ago
I'm not a good owner for this bug. one of you all want to take it?
Reporter
Comment 10•23 years ago
giving to dveditz. this seems right up his alley.
Assignee: asa → dveditz
Comment 11•23 years ago
dveditz: personally, I think I disagree. 290k is 58 seconds on a 56k modem.. and this is just one package! Admittedly you probably can't knock off 25% from the other packages, but our current full download is 20-30 megs. if you could knock off 2 megs, that's 6 minutes off of a 1 hour download.. seems significant :)
Comment 12•23 years ago
Ok, so I did all of our chrome files instead of just the three main ones.

            compressed   uncompressed   diff
on disk:    2283579      5896895        3613316
download:   1949833      1418815        531018

Saving half a meg on the download is nothing to sneeze at, but before jumping into this let's attack the comment stripping first. It's possible the savings is entirely due to compressing out redundant license headers, which it isn't able to do when compressing each chrome file individually. And it may be the very fact that the chrome is compressed that helped Mac performance when we switched to .jars.
Comment 13•23 years ago
diffs should be signed:

            compressed   uncompressed   diff
on disk:    2283579      5896895        3613316
download:   1949833      1418815        -531018

personally i'm against stripping licenses because i like to work from the .jars that i receive when i retrieve builds. there's another bug about removing the expanded version of files from all dist packages; the result is that it would be impossible for someone to create a correct diff from packaged builds. Assuming that different OSes have different performance costs, would people object to mac having compress-xpi[compress-jar] while win,lin have compress-xpi[uncompress-jar]? or some other similar disparity?
Comment 14•23 years ago
ok, I am fiddling with stats about what exactly we have, and here are some interesting statistics. This is the total space used by each file type (in a Mozilla build):

.css:         695926  ( 12.0%)
.dtd:         311344  (  5.4%)
.gif:         847212  ( 14.7%)
.htm:            128  (  0.0%)
.html:        210439  (  3.6%)
.js:         2120408  ( 36.7%)
.png:           5777  (  0.1%)
.properties:  125398  (  2.2%)
.rdf:          49383  (  0.9%)
.txt:           1876  (  0.0%)
.wav:          24266  (  0.4%)
.xml:         223405  (  3.9%)
.xul:        1167043  ( 20.2%)

so it looks like even if we could get rid of 10-20% (a guess on my part) of JS through comments, it's still only 3-6% of the resource files, let alone the total xpi package sizes
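Accounting like the above amounts to walking an unpacked chrome tree and summing file sizes by extension. A minimal sketch, with the root directory being whatever build you point it at:

```python
# Sum file sizes by extension under a directory tree and report each
# extension's share of the total, similar to the stats in the comment above.
import os
from collections import defaultdict

def size_by_extension(root):
    totals = defaultdict(int)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            totals[ext] += os.path.getsize(os.path.join(dirpath, name))
    grand = sum(totals.values()) or 1  # avoid division by zero on empty trees
    return {ext: (size, 100.0 * size / grand) for ext, size in totals.items()}
```

Calling `size_by_extension("dist/bin/chrome")` on an unpacked build would give a `{".js": (bytes, percent), ...}` breakdown like the one above (the path is hypothetical).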
Comment 15•23 years ago
here's a thought: what if we could 'pre-compile' xml? I mean, the files are well formed, so maybe there is a more compact format in which we could store them.
Comment 16•23 years ago
we can certainly refactor some xul; my rewrite of mail is considerably smaller (i just need to look at it and get it reviewed...). however i'm not sure how much duplication there is left to squeeze out beyond mail. properties can be squeezed through reorg of identical values, but they're tiny and compress well...
Comment 17•23 years ago
ok, I don't know what this "mail rewrite" that you're referring to is, but I think that's a bit beyond the scope of THIS bug :) Anyhow, a little research and I came up with:

elements:
  commandset: 4
  popupset: 5
  menuitem: 5
  menubutton: 6
  button: 7
  rule: 9
  box: 12
  menupopup: 14
  script: 15

attributes:
  uri: 8
  tooltiptext: 8
  persist: 9
  flex: 10
  tooltip: 13
  oncommand: 14
  type: 15
  src: 17
  value: 19
  class: 28
  id: 59

these are the top 10 or so tags and attributes in navigator.xul. I'm sure much of our XUL has lots of similar tags and attributes. One could imagine a compression algorithm based on this knowledge - atomizing the strings and encoding them in a stream of some kind. Hey, sure enough someone has come up with something along these lines: http://www.research.att.com/sw/tools/xmill/
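A tag/attribute census like the one above takes only a few lines of Python. Note the real navigator.xul leans on DTD entities that a bare XML parser won't resolve, so this sketch counts a small inline stand-in instead:

```python
# Count element and attribute name frequencies in a XUL/XML document.
# The sample below is an invented stand-in, not real navigator.xul markup.
import xml.etree.ElementTree as ET
from collections import Counter

def census(xml_text):
    tags, attrs = Counter(), Counter()
    for elem in ET.fromstring(xml_text).iter():
        tags[elem.tag] += 1
        for name in elem.attrib:
            attrs[name] += 1
    return tags, attrs

sample = """<window id="main">
  <menupopup id="p1"><menuitem id="m1" class="menu-iconic"/></menupopup>
  <box flex="1"><button id="b1" oncommand="go();"/></box>
</window>"""
tags, attrs = census(sample)
```

Frequencies like these are exactly what a dictionary-based encoder (atomized strings, short codes for the most common names) would exploit.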
Comment 18•23 years ago
wait a second, xmill is screwy, ignore that.
Comment 19•23 years ago
reminder to alecf to comment on his byte-code compiler idea for XUL/XML.
Severity: normal → enhancement
Component: Browser-General → Build Config
QA Contact: doronr → granrose
Hardware: PC → All
Updated•22 years ago
Status: NEW → ASSIGNED
Comment 20•22 years ago
Dan mentioned this bug in an email exchange; I didn't realize it existed :-) When working on localized embed.jars for several embedding customers we (L10n) encountered the need to reduce bloat (i.e. distribution size) and this idea popped up again. I found a perl module by the name "HTML Clean" and since it looked cheap I ran some quick tests with it. Intrigued by the results, I started hacking the script a bit and let it work the magic on a 1.2a build. http://aspn.activestate.com/ASPN/CodeDoc/HTML-Clean/HTML/Clean.html The numbers below are preliminary; I didn't have time to fine tune the script and there might be more room left for improvement. * current results indicate that XUL bloat could be cut by at least 35% (embed.jar 600kB->400kB, Mozilla chrome 3.5MB->2.2MB) * proof of concept: reduced Mozilla's distribution size by 1.1 MB, improved startup time by 3-5% (measured with a fresh profile, fast load was turned on) http://rocknroll/users/jbetak/Mozilla_1.2a_Optimized_XUL.zip
Comment 21•22 years ago
wow! those results are fantastic! maybe mcafee or seawood could help you get some scripts going that would strip .jars, and then we could roll that into the release-only build?
Comment 22•22 years ago
Does this address the license notice visibility issues brought up in bug 73661? Are the links from 'about:' good enough? *** This bug has been marked as a duplicate of 73661 ***
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
Comment 23•22 years ago
Oops! Mid-afternoon naps are hazardous to bugs. That was backwards.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Comment 24•22 years ago
*** Bug 73661 has been marked as a duplicate of this bug. ***
Comment 25•22 years ago
Chris, ahhh, I'm so glad you reopened this. I'm trying to get people excited about the potential gains XUL runtime optimization could yield, and then the bug gets closed down ;-) What do you think of Alec's idea? Could we strip license headers, comments and white space in release builds? A little while back I thought Blackbird might be a potential candidate, but it might be too late for that. AFAIK Tao approached Cathleen and there was some talk about creating additional optimized builds and possibly looking into modifying some of the build scripts in the Buffy time frame.
Comment 26•22 years ago
Hmm. Since (post 1.0) we now serialize both JS and XUL (plus DTD) content into the fastload file, I'm not sure we would actually gain much runtime benefit from compressing whitespace in those files (aside from the initial serialization which should only happen once for a typical end user). After we have serialized a XUL or JS file, we will never read from the jar file again for that file (or at least, that's how it's supposed to work). Nonetheless, reducing download size (and time) is a worthwhile goal, and cutting a minute or more from modem download times is a win for the end user.
Comment 27•22 years ago
jrgm, I agree. Although the improvement seems improbable, I've posted it because that's what my test bed said. I don't consider my observation to be conclusive. I only had one optimized test build and a very limited time to play with it :-)
Comment 28•22 years ago
well, it's both download size and first-time speedup for new windows - i.e. beyond just the browser: if I open mail for the first time, it will be faster. I think download size is the biggest bonus here, myself.
Comment 29•22 years ago
I assume this will be an extra release-build step (like rebase or symbol stripping) that doesn't need to run each build? If so it'd be easy to save off a copy of the unstripped jars for people who are trying to use Patch Maker. The license isn't that hard to satisfy. We could do nothing and claim it's a binary/processed file, but since it will look like text to people who poke around and it's easy to add license boilerplate, we should do that. Either or both of a LICENSE file inside the archive (claiming the contents are not source, but the source is available under the MPL and where) and the comment field of the zip archive (though not that many people look at those).
Comment 30•22 years ago
just curious, do we have a build for evaluation?
Comment 31•22 years ago
Cathleen, the only currently available evaluation build might be this optimized 1.2a talkback-enabled Mozilla release: http://rocknroll/users/jbetak/Mozilla_1.2a_Optimized_XUL.zip It's based on this binary distribution: ftp://ftp.mozilla.org/pub/mozilla/releases/mozilla1.2a/mozilla-win32-1.2a-talkback.zip
Comment 32•21 years ago
I've been working on the first part of a patch for this. Though it still has some bugs, it works fairly well. I've written a standalone program 'Strip' that can remove JS-style comments (// and /* */) and XML-style comments (<!-- -->). Furthermore it can trim lines on the left and right side and remove empty lines. It is a standalone program that should be fairly easy to integrate into the build process, but since I know nothing about that, someone else could perhaps give that a try (hint! ;-). It recognizes several extensions by default. Just compile and run strip -? for more help, or look at the source. There are a few regressions right now (Chatzilla stopped working, e.g.) but most of Mozilla works. I'll investigate the bugs further. These are the results so far. All the numbers are in kilobytes.

stripped: *.js, *.css, *.xul, *.rdf, *.xml, *.xsl

jar            uncompressed  compressed  compressed+stripped  gain
chatzilla.jar           544         148                  102    46
classic.jar             549         296                  237    59
comm.jar               3192         944                  610   334
inspector.jar           543         188                  116    72
messenger.jar          1909         537                  341   196
modern.jar              859         568                  478    90
pippki.jar              275         100                   60    40
toolkit.jar             822         221                  166    55
venkman.jar             827         258                  198    60
Total                  9520        3260                 2308   952

Compression is based on maximum zip compression because I don't know how to make .xpi files. Nor do I know how well .xpi's are compressed, but I assume they're comparable with zip. Assuming that, we can save approx. 952kb on the download size. I've only tested the larger jars with the extensions above. The program can already strip *.dtd with manual settings but doesn't (yet) recognize the extension automatically. We could also trim *.html, and with a small modification *.properties can also be stripped. This process can of course be taken one step further: we can remove the spaces from "var x = 3;". It shouldn't be very hard; I'll look at it later on. So far I think the results are encouraging.
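The heart of such a stripping tool can be sketched in a few regexes (Python here, rather than the standalone Strip program described above). As comment 39 notes, this naive approach misreads JS regexp literals like /\// and CDATA sections containing <!--, and it would also eat // inside string literals such as URLs, so treat it strictly as a proof of concept:

```python
# Naive comment stripper: removes /* ... */, // ..., and <!-- ... -->
# comments, then collapses blank lines. Known to break on JS regexp
# literals, strings containing //, and CDATA sections - proof of concept only.
import re

JS_BLOCK = re.compile(r"/\*.*?\*/", re.S)
JS_LINE = re.compile(r"//[^\n]*")
XML_COMMENT = re.compile(r"<!--.*?-->", re.S)
BLANK = re.compile(r"\n\s*\n+")

def strip_js(text):
    text = JS_BLOCK.sub("", text)
    text = JS_LINE.sub("", text)
    return BLANK.sub("\n", text).strip() + "\n"

def strip_xml(text):
    return BLANK.sub("\n", XML_COMMENT.sub("", text)).strip() + "\n"
```

A safer version would have to tokenize like the real JS and XML parsers do, which is exactly the point Gerv makes in comment 34: the stripper's rules must match the parsers' rules or the regressions will be very hard to track down.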
Comment 33•21 years ago
Comment 34•21 years ago
Rene: this would only ever be applied to release builds, as all the information you are removing is very useful for debugging. Also, changing the files would break tools like Patch Maker (http://www.gerv.net/software/patch-maker/). So, you have to be completely certain that running your tool over the files will not cause regressions - because the bugs will be very hard to track down. I'd say the best way to do this would be to be certain that your comment-finding rules are exactly the same as those used by the relevant parsers in the Mozilla code. Gerv
Comment 35•21 years ago
this is great stuff. Do you have a measure of the savings of uncompressed+stripped? We found recently that using uncompressed .jars saves us lots of allocations on startup (because we don't have to go through zlib). I don't think we should be burdening all users because this might break Patch Maker. The best thing we can do is integrate this into the build early in 1.5alpha, such as right now (thus adding it to nightly builds, not just release builds), and then see if any regressions crop up.
Comment 36•21 years ago
> The best thing we can do is integrate this into the build early in
> 1.5alpha such as right now (thus adding it to nightly builds, not just release
> builds) and then seeing if any regressions crop up.
Alec: do you mean that we should integrate it on a temporary basis, just in
order to find regressions? As I said, if you put it into nightly builds
permanently, that breaks Patch Maker...
Gerv
Comment 37•21 years ago
Sure, it could be temporary. We could tell people that patchmaker is going to break for a few weeks.
Comment 38•21 years ago
Rene, I just wanted to provide you with a few pointers to build scripts. Ian's preprocessor for Mozilla Firebird might be of particular interest to you. I adapted a Perl script named "HTML Clean" for my test run last year. If past experience is worth anything, optimizing .properties and .dtd is most definitely worth the effort. http://lxr.mozilla.org/seamonkey/source/config/preprocessor.pl http://lxr.mozilla.org/seamonkey/search?string=preprocessor.pl http://lxr.mozilla.org/seamonkey/source/config/make-jars.pl http://lxr.mozilla.org/seamonkey/search?string=make-jars.pl I would imagine that if implemented, XUL optimization will be a build option. When the dust has settled, it will only affect optimized builds, right?
Comment 39•21 years ago
Using it temporarily on the trunk sounds good to me. Right now is not a good idea since it still contains some known bugs, but I'll try to get them out asap (I'm rather busy with some exams coming up though), hopefully by the end of this week. Remember that this is just a first naive version that is supposed to be a proof of concept and food for thought rather than a final functional version. Right now it still has some known problems, such as regexps in JS (e.g. /\// is seen as a comment) and CDATA blocks in XML files becoming crippled if they contain a <!--. I have no numbers on uncompressed+stripped, but as a rule of thumb you can use the compression ratio of uncompressed vs. compressed. In most cases this is +-30%, except for classic.jar and modern.jar where it is higher due to the big number of images. So uncompressed+stripped should be +-3MB smaller than uncompressed.
Comment 40•21 years ago
Rene, any news on this? I would be interested in testing something newer than your 06/2003-version of Strip. Have you done any work on that in the last four months?
Comment 41•21 years ago
Hmm, no. After the first version it ended up under a big pile of dust. I was working on a rewrite to handle some issues better such as reg. expr. and removing spaces from "var x = 3;" but I never finished it. I can't promise anything w.r.t. finishing Strip any time soon since I'm still pretty occupied, as usual.
Updated•20 years ago
Product: Browser → Seamonkey
Comment 42•19 years ago
This is particularly annoying because some viruses pick up e-mail addresses from these files, and my e-mail address is in at least one of them. As a result, I have received several million virus e-mails to that account over the past couple of years due to people having both Mozilla products and viruses on their machines. So, if not doing this for a space reduction/performance gain, how about doing it for developer sanity? By bug 279698, I'm under the impression Firefox already tries to do this.
Keywords: privacy
Updated•17 years ago
Assignee: dveditz → nobody
Status: REOPENED → NEW
Updated•16 years ago
QA Contact: granrosebugs → build-config
Updated•15 years ago
Product: SeaMonkey → Core
Comment 43•11 years ago
(In reply to Rene Pronk from comment #41)
> Hmm, no. After the first version it ended up under a big pile of dust. I was
> working on a rewrite to handle some issues better such as reg. expr. and
> removing spaces from "var x = 3;" but I never finished it. I can't promise
> anything w.r.t. finishing Strip any time soon since I'm still pretty
> occupied, as usual.

There was a patch for this at one time. Is this even worth pursuing any more, or should it just be marked "WONT FIX"?
Updated•10 years ago
Status: NEW → RESOLVED
Closed: 22 years ago → 10 years ago
Resolution: --- → DUPLICATE
Updated•6 years ago
Product: Core → Firefox Build System