Closed
Bug 162096
Opened 22 years ago
Closed 14 years ago
the GIF decoder should build with -O2
Categories
(Core :: Graphics: ImageLib, defect)
Core
Graphics: ImageLib
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: bernard.alleysson, Unassigned)
References
Details
(Keywords: perf)
Attachments
(1 file, 1 obsolete file)
545 bytes,
patch
|
pavlov
:
review-
|
Details | Diff | Splinter Review |
according to pageload profile data at
http://www.people.fas.harvard.edu/~dbaron/mozilla/jrgm-profile-index
the gif decoder takes 3% CPU on a pageload test
-O2 should give it a perf boost
Reporter | ||
Comment 1•22 years ago
|
||
Updated•22 years ago
|
Comment 2•22 years ago
|
||
any reason for doing this only on windows?
Comment 3•22 years ago
|
||
or, rather, any reason for limiting that to only the gif decoder rather than
applying it to all of mozilla?
Reporter | ||
Comment 4•22 years ago
|
||
"any reason for doing this only on windows?"
no, this should be true for other platforms
you have to find -O2 equivalent
"or, rather, any reason for limiting that to only the gif decoder rather than
applying it to all of mozilla?"
my personal tests show that the perf increase is between +5% and +10% at the
cost of code increase of +10, +15% ... I don't think that +15% code size is
wanted for all of mozilla
but for this particular case (the GIF decoder) code size is +1 Ko (this is what
I can see) and if the perf increase is +10% (gif decoding is a CPU intensive
task as opposed to I/O intensive task) and if it is about 3% pageload time, then
you can expect +0.3% pageload performance. I think that it is a good tradeoff.
Comment 5•22 years ago
|
||
pav, what do you think about this?
Comment 6•22 years ago
|
||
Any chance of getting this into 1.4 ?
Can we get any numbers about perf/footprint here?
Comment 7•22 years ago
|
||
resp. if numbers in comment #4 are okay than this would be really a good point
to improve overal page-load performance.
Blocks: 71668
Comment 8•22 years ago
|
||
Windows is MSVC which supports -O2
Mac, Linux, BeOS are GCC which support -O2 too
OS/2 seems to also use GCC by now (according to
http://tinderbox.mozilla.org/SeaMonkey-Ports/)
so this patch should work on all platforms.
Attachment #94784 -
Attachment is obsolete: true
Updated•22 years ago
|
Attachment #120697 -
Flags: review?(pavlov)
Updated•22 years ago
|
OS: Windows XP → All
Hardware: PC → All
Comment 9•22 years ago
|
||
Comment on attachment 120697 [details] [diff] [review]
let's do this on all platforms
this is dumb. we shouldn't be adding optimize flags to individual modules. If
you want a build with O2, build one.
Attachment #120697 -
Flags: review?(pavlov) → review-
Comment 10•22 years ago
|
||
yes, As I've said in bug 172474, in not so specific terms: there should be a
generic way to say "build this module with opt" such as setting
BUILD_OPT_MODE=size
or
BUILD_OPT_MODE=speed
and then let stuff in rules.mk or config.mk or whatever set the right flags.
Comment 11•22 years ago
|
||
> this is dumb. we shouldn't be adding optimize flags to individual modules. If
> you want a build with O2, build one.
Why are we treating the tree as a monolithic beast that must be compiled with
the same optimization flags across the entire mozilla tree? NSPR, LDAP & NSS
default to building with -O2 as their maintainers verified that -O2 works for
their code and has a useful performance benefit. Assuming that someone has done
the same verifications here, I don't see why defaulting to -O2 for optimized
builds is a problem. Building the entire tree using -O2 is not always an option
due to the problems with egcs which we still support.
> yes, As I've said in bug 172474, in not so specific terms: there should be a
> generic way to say "build this module with opt" such as setting
> BUILD_OPT_MODE=size
> or
> BUILD_OPT_MODE=speed
> and then let stuff in rules.mk or config.mk or whatever set the right flags.
Ugh. Again, you're assuming a monolithic set of flags for the entire project
(ie, "speed" always means "-Ox" and "size" always means "-Oy"). There's no way
to get more granular than that as the central build rules shouldn't have any
explicit knowledge of the individual modules. MODULE_OPTIMIZE_FLAGS already
provides the generic mechanism for the module to decide which flags it wants to
use by default. These flags can still be overridden by --enable-optimize.
Comment 12•22 years ago
|
||
seawood, no I meant that the individual modules would indicate "I want to be
compiled for speed" and so forth
so that each individual module doens't have to say "Use -O2 for gcc and use -Os
for win32 and use /ZSPEED for RandomOS" etc...they can specify what they want in
an XP way
Comment 13•22 years ago
|
||
alecf: I'm looking further into the tree-wide implications. You abstract away
the specific compiler flag settings, in exchage for ?
1) assuming that the optimization flags will be universal across modules?
Optimization flags will definitely not be universal across modules. Some
"optimization" flags like -ffast-math would have no adverse affect on some
modules but severely horks js (bug 198692). Other flags would not be
recommended for certain modules but ok for others (-funroll-loops comes to mind).
Besides which, the default platform specific optimize flags already covers the
universal case and you can pass flags to --enable-optimize to override that.
Or
2) assuming that somewhere we're going to have module/size/speed mapping?
jpeg: size = -Os
jpeg: speed = -09 -funroll-loops
zlib: size = -O
zlib: speed = -O3 -ffast-math
...
That is not something I ever would want to setup or maintain. Nor do I think
the core build makefiles should have that module specific knowledge.
The modules should still be able to specify their own optimization flags to
override the universal default. The default universal optimization flags are
really the LCD. For certain modules, we can do much better than that. See bug
87585 for an example of that.
Comment 14•21 years ago
|
||
Does pavlov's review- make this bug invalid?
I, for one, see no _theoretical_ problem with a module being allowed to specify
its own specific flags, provided it doesn't clutter up some logical space
outside the module itself.
If there's a pageload performance boost, and a miniscule footprint hit, then it
seems like a good idea. Of course, the people who want to compile with O2
_could_, but I'll bet that the majority of downloads (particularly for Win32
platforms) are of binaries. So adding the -O2 switch to the standard binary
compile would improve the user experience of the majority of users, something
that doesn't seem so bad to me.
-M
Comment 15•21 years ago
|
||
-O2 won't work on some compilers like Sun Workshop. How do you plan on dealing
with this?
Comment 16•21 years ago
|
||
Is the group of incompatible compilers small, or large? And is the group of
compilers fixed for the future, or highly changeable?
If the group is small and fixed, perhaps the incompatible compilers could be
excluded by some sort of ifdef.
If the group is large, then perhaps the flags could be defined only for certain
compilers. Is there a way to do this in the current build structure? I don't
know the details of the build process -- I'm only proposing theoretical solutions.
-M
Comment 17•21 years ago
|
||
hm, what is the problem with sun workshop and -O2?
Comment 18•21 years ago
|
||
From Sun WorkShop 6 update 2 - C User guide:
-xO1
Does basic local optimization (peephole).
-xO2
Does basic local and global optimization. This is induction variable
elimination, local and global common subexpression elimination, algebraic
simplification, copy propagation, constant propagation, loop-invariant
optimization, register allocation, basic block merging, tail recursion
elimination, dead code elimination, tail call elimination, and complex
expression expansion.
The -xO2 level does not assign global, external, or indirect references or
definitions to registers. It treats these references and definitions as if they
were declared volatile. In general, the -xO2 level results in minimum code size.
-xO3
Performs like -xO2, but also optimizes references or definitions for external
variables. Loop unrolling and software pipelining are also performed. This level
does not trace the effects of pointer assignments. When compiling either device
drivers, or programs that modify external variables from within signal handlers,
you may need to use the volatile type qualifier to protect the object from
optimization. In general, the -xO3 level results in increased code size.
-xO4
Performs like -xO3, but also automatically inlines functions contained in the
same file; this usually improves execution speed. If you want to control which
functions are inlined, see “-xinline=list”.
This level traces the effects of pointer assignments, and usually results in
increased code size.
-xO5
Attempts to generate the highest level of optimization. Uses optimization
algorithms that take more compilation time or that do not have as high a
certainty of improving execution time. Optimization at this level is more likely
to improve performance if it is done with profile feedback. See “-xprofile=p”.
---
Identical optimization options exist for different versions of this compiler,
though the specific optimizations done could be slightly different.
Comment 19•20 years ago
|
||
Is there a chance of having this anytime soon on the trunk to test?
Comment 20•20 years ago
|
||
Let's first do some manual optimization:
* Bug 285872: GIF Decoder: replace gathering buffer with dynamic malloc to fixed
256 bytes hold
* Bug 196295: Move GIF2.cpp into nsGIFDecoder2
The first removes some dynamical allocation during parsing, and optimized the
state engine, making the GIF decoding state engine about twice as fast.
The second merges GIF decoding code into one file, allowing for better
optimization, and less function call overhead.
Quote: 'the difference is interesting: between 10% and 30% faster
image decoding.'
Too bad that these bugs are stuck in the review process...
Updated•18 years ago
|
Assignee: pavlov → nobody
QA Contact: tpreston → imagelib
Comment 21•17 years ago
|
||
Maybe we should consider this again? GIF decoding functions are still showing up pretty high in a profile of tp2.
Comment 22•17 years ago
|
||
Comment 23•14 years ago
|
||
Is this still an issue, or is this obsolete in the meantime? Is there anything I - as a greenhorn - can do here?
Comment 24•14 years ago
|
||
(In reply to comment #23)
> Is this still an issue, or is this obsolete in the meantime? Is there anything
> I - as a greenhorn - can do here?
Our compiler flags vary by platform. In theory it's possible that we might get a very small speedup on GIF decoding by forcing -O2, but I very much doubt it's worth spending time on.
If you're interested in helping out on imagelib though, I'd be happy to give you some much more useful things to help out on. Shoot me an email or ping me (bholley) on IRC.
I'm resolving this WORKSFORME because in general we don't want to set per file optimization flags, we compile all of our release builds with substantial optimization, and compilers have improved a lot since 2002. If anyone disagrees, feel free to reopen.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•