Closed Bug 162096 Opened 21 years ago Closed 13 years ago
the GIF decoder should build with -O2
according to pageload profile data at http://www.people.fas.harvard.edu/~dbaron/mozilla/jrgm-profile-index the gif decoder takes 3% CPU on a pageload test -O2 should give it a perf boost
any reason for doing this only on windows?
or, rather, any reason for limiting that to only the gif decoder rather than applying it to all of mozilla?
"any reason for doing this only on windows?" no, this should be true for other platforms you have to find -O2 equivalent "or, rather, any reason for limiting that to only the gif decoder rather than applying it to all of mozilla?" my personal tests show that the perf increase is between +5% and +10% at the cost of code increase of +10, +15% ... I don't think that +15% code size is wanted for all of mozilla but for this particular case (the GIF decoder) code size is +1 Ko (this is what I can see) and if the perf increase is +10% (gif decoding is a CPU intensive task as opposed to I/O intensive task) and if it is about 3% pageload time, then you can expect +0.3% pageload performance. I think that it is a good tradeoff.
pav, what do you think about this?
Any chance of getting this into 1.4 ? Can we get any numbers about perf/footprint here?
resp. if numbers in comment #4 are okay than this would be really a good point to improve overal page-load performance.
Windows is MSVC which supports -O2 Mac, Linux, BeOS are GCC which support -O2 too OS/2 seems to also use GCC by now (according to http://tinderbox.mozilla.org/SeaMonkey-Ports/) so this patch should work on all platforms.
Attachment #94784 - Attachment is obsolete: true
20 years ago
Attachment #120697 - Flags: review?(pavlov)
Comment on attachment 120697 [details] [diff] [review] let's do this on all platforms this is dumb. we shouldn't be adding optimize flags to individual modules. If you want a build with O2, build one.
Attachment #120697 - Flags: review?(pavlov) → review-
yes, As I've said in bug 172474, in not so specific terms: there should be a generic way to say "build this module with opt" such as setting BUILD_OPT_MODE=size or BUILD_OPT_MODE=speed and then let stuff in rules.mk or config.mk or whatever set the right flags.
> this is dumb. we shouldn't be adding optimize flags to individual modules. If > you want a build with O2, build one. Why are we treating the tree as a monolithic beast that must be compiled with the same optimization flags across the entire mozilla tree? NSPR, LDAP & NSS default to building with -O2 as their maintainers verified that -O2 works for their code and has a useful performance benefit. Assuming that someone has done the same verifications here, I don't see why defaulting to -O2 for optimized builds is a problem. Building the entire tree using -O2 is not always an option due to the problems with egcs which we still support. > yes, As I've said in bug 172474, in not so specific terms: there should be a > generic way to say "build this module with opt" such as setting > BUILD_OPT_MODE=size > or > BUILD_OPT_MODE=speed > and then let stuff in rules.mk or config.mk or whatever set the right flags. Ugh. Again, you're assuming a monolithic set of flags for the entire project (ie, "speed" always means "-Ox" and "size" always means "-Oy"). There's no way to get more granular than that as the central build rules shouldn't have any explicit knowledge of the individual modules. MODULE_OPTIMIZE_FLAGS already provides the generic mechanism for the module to decide which flags it wants to use by default. These flags can still be overridden by --enable-optimize.
seawood, no I meant that the individual modules would indicate "I want to be compiled for speed" and so forth so that each individual module doens't have to say "Use -O2 for gcc and use -Os for win32 and use /ZSPEED for RandomOS" etc...they can specify what they want in an XP way
alecf: I'm looking further into the tree-wide implications. You abstract away the specific compiler flag settings, in exchage for ? 1) assuming that the optimization flags will be universal across modules? Optimization flags will definitely not be universal across modules. Some "optimization" flags like -ffast-math would have no adverse affect on some modules but severely horks js (bug 198692). Other flags would not be recommended for certain modules but ok for others (-funroll-loops comes to mind). Besides which, the default platform specific optimize flags already covers the universal case and you can pass flags to --enable-optimize to override that. Or 2) assuming that somewhere we're going to have module/size/speed mapping? jpeg: size = -Os jpeg: speed = -09 -funroll-loops zlib: size = -O zlib: speed = -O3 -ffast-math ... That is not something I ever would want to setup or maintain. Nor do I think the core build makefiles should have that module specific knowledge. The modules should still be able to specify their own optimization flags to override the universal default. The default universal optimization flags are really the LCD. For certain modules, we can do much better than that. See bug 87585 for an example of that.
Does pavlov's review- make this bug invalid? I, for one, see no _theoretical_ problem with a module being allowed to specify its own specific flags, provided it doesn't clutter up some logical space outside the module itself. If there's a pageload performance boost, and a miniscule footprint hit, then it seems like a good idea. Of course, the people who want to compile with O2 _could_, but I'll bet that the majority of downloads (particularly for Win32 platforms) are of binaries. So adding the -O2 switch to the standard binary compile would improve the user experience of the majority of users, something that doesn't seem so bad to me. -M
-O2 won't work on some compilers like Sun Workshop. How do you plan on dealing with this?
Is the group of incompatible compilers small, or large? And is the group of compilers fixed for the future, or highly changeable? If the group is small and fixed, perhaps the incompatible compilers could be excluded by some sort of ifdef. If the group is large, then perhaps the flags could be defined only for certain compilers. Is there a way to do this in the current build structure? I don't know the details of the build process -- I'm only proposing theoretical solutions. -M
hm, what is the problem with sun workshop and -O2?
From Sun WorkShop 6 update 2 - C User guide: -xO1 Does basic local optimization (peephole). -xO2 Does basic local and global optimization. This is induction variable elimination, local and global common subexpression elimination, algebraic simplification, copy propagation, constant propagation, loop-invariant optimization, register allocation, basic block merging, tail recursion elimination, dead code elimination, tail call elimination, and complex expression expansion. The -xO2 level does not assign global, external, or indirect references or definitions to registers. It treats these references and definitions as if they were declared volatile. In general, the -xO2 level results in minimum code size. -xO3 Performs like -xO2, but also optimizes references or definitions for external variables. Loop unrolling and software pipelining are also performed. This level does not trace the effects of pointer assignments. When compiling either device drivers, or programs that modify external variables from within signal handlers, you may need to use the volatile type qualifier to protect the object from optimization. In general, the -xO3 level results in increased code size. -xO4 Performs like -xO3, but also automatically inlines functions contained in the same file; this usually improves execution speed. If you want to control which functions are inlined, see “-xinline=list”. This level traces the effects of pointer assignments, and usually results in increased code size. -xO5 Attempts to generate the highest level of optimization. Uses optimization algorithms that take more compilation time or that do not have as high a certainty of improving execution time. Optimization at this level is more likely to improve performance if it is done with profile feedback. See “-xprofile=p”. --- Identical optimization options exist for different versions of this compiler, though the specific optimizations done could be slightly different.
Is there a chance of having this anytime soon on the trunk to test?
Let's first do some manual optimization: * Bug 285872: GIF Decoder: replace gathering buffer with dynamic malloc to fixed 256 bytes hold * Bug 196295: Move GIF2.cpp into nsGIFDecoder2 The first removes some dynamical allocation during parsing, and optimized the state engine, making the GIF decoding state engine about twice as fast. The second merges GIF decoding code into one file, allowing for better optimization, and less function call overhead. Quote: 'the difference is interesting: between 10% and 30% faster image decoding.' Too bad that these bugs are stuck in the review process...
Maybe we should consider this again? GIF decoding functions are still showing up pretty high in a profile of tp2.
Is this still an issue, or is this obsolete in the meantime? Is there anything I - as a greenhorn - can do here?
(In reply to comment #23) > Is this still an issue, or is this obsolete in the meantime? Is there anything > I - as a greenhorn - can do here? Our compiler flags vary by platform. In theory it's possible that we might get a very small speedup on GIF decoding by forcing -O2, but I very much doubt it's worth spending time on. If you're interested in helping out on imagelib though, I'd be happy to give you some much more useful things to help out on. Shoot me an email or ping me (bholley) on IRC.
I'm resolving this WORKSFORME because in general we don't want to set per file optimization flags, we compile all of our release builds with substantial optimization, and compilers have improved a lot since 2002. If anyone disagrees, feel free to reopen.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.