Closed Bug 162096 Opened 22 years ago Closed 14 years ago

the GIF decoder should build with -O2

Categories

(Core :: Graphics: ImageLib, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: bernard.alleysson, Unassigned)

References

Details

(Keywords: perf)

Attachments

(1 file, 1 obsolete file)

according to pageload profile data at
http://www.people.fas.harvard.edu/~dbaron/mozilla/jrgm-profile-index
the gif decoder takes 3% CPU on a pageload test

-O2 should give it a perf boost
Blocks: 136999
Keywords: patch, perf
any reason for doing this only on windows?
or, rather, any reason for limiting that to only the gif decoder rather than
applying it to all of mozilla?
"any reason for doing this only on windows?"
no, this should be true for other platforms
you have to find -O2 equivalent

"or, rather, any reason for limiting that to only the gif decoder rather than
applying it to all of mozilla?"

my personal tests show that the perf increase is between +5% and +10% at the
cost of code increase of +10, +15% ... I don't think that +15% code size is
wanted for all of mozilla

but for this particular case (the GIF decoder) code size is +1 Ko (this is what
I can see) and if the perf increase is +10% (gif decoding is a CPU intensive
task as opposed to I/O intensive task) and if it is about 3% pageload time, then
you can expect +0.3% pageload performance. I think that it is a good tradeoff.
pav, what do you think about this?
Any chance of getting this into 1.4 ?
Can we get any numbers about perf/footprint here?
resp. if numbers in comment #4 are okay than this would be really a good point 
to improve overal page-load performance.
Blocks: 71668
Windows is MSVC which supports -O2
Mac, Linux, BeOS are GCC which support -O2 too
OS/2 seems to also use GCC by now (according to
http://tinderbox.mozilla.org/SeaMonkey-Ports/)

so this patch should work on all platforms.
Attachment #94784 - Attachment is obsolete: true
OS: Windows XP → All
Hardware: PC → All
Comment on attachment 120697 [details] [diff] [review]
let's do this on all platforms

this is dumb. we shouldn't be adding optimize flags to individual modules.  If
you want a build with O2, build one.
Attachment #120697 - Flags: review?(pavlov) → review-
yes, As I've said in bug 172474, in not so specific terms: there should be a
generic way to say "build this module with opt" such as setting
BUILD_OPT_MODE=size
or
BUILD_OPT_MODE=speed

and then let stuff in rules.mk or config.mk or whatever set the right flags.
> this is dumb. we shouldn't be adding optimize flags to individual modules.  If
> you want a build with O2, build one.

Why are we treating the tree as a monolithic beast that must be compiled with
the same optimization flags across the entire mozilla tree?  NSPR, LDAP & NSS
default to building with -O2 as their maintainers verified that -O2 works for
their code and has a useful performance benefit.  Assuming that someone has done
the same verifications here, I don't see why defaulting to -O2 for optimized
builds is a problem.  Building the entire tree using -O2 is not always an option
due to the problems with egcs which we still support.

> yes, As I've said in bug 172474, in not so specific terms: there should be a
> generic way to say "build this module with opt" such as setting
> BUILD_OPT_MODE=size
> or
> BUILD_OPT_MODE=speed
> and then let stuff in rules.mk or config.mk or whatever set the right flags.

Ugh. Again, you're assuming a monolithic set of flags for the entire project
(ie, "speed" always means "-Ox" and "size" always means "-Oy").  There's no way
to get more granular than that as the central build rules shouldn't have any
explicit knowledge of the individual modules.  MODULE_OPTIMIZE_FLAGS already
provides the generic mechanism for the module to decide which flags it wants to
use by default.  These flags can still be overridden by --enable-optimize.

seawood, no I meant that the individual modules would indicate "I want to be
compiled for speed" and so forth

so that each individual module doens't have to say "Use -O2 for gcc and use -Os
for win32 and use /ZSPEED for RandomOS" etc...they can specify what they want in
an XP way
alecf:  I'm looking further into the tree-wide implications.  You abstract away
the specific compiler flag settings, in exchage for ?

1) assuming that the optimization flags will be universal across modules?

Optimization flags will definitely not be universal across modules.  Some
"optimization" flags like -ffast-math would have no adverse affect on some
modules but severely horks js (bug 198692).  Other flags would not be
recommended for certain modules but ok for others (-funroll-loops comes to mind).  

Besides which, the default platform specific optimize flags already covers the
universal case and you can pass flags to --enable-optimize to override that.

Or
2) assuming that somewhere we're going to have module/size/speed mapping?
jpeg: size = -Os
jpeg: speed = -09 -funroll-loops 
zlib: size = -O
zlib: speed = -O3 -ffast-math
...

That is not something I ever would want to setup or maintain.  Nor do I think
the core build makefiles should have that module specific knowledge.

The modules should still be able to specify their own optimization flags to
override the universal default.  The default universal optimization flags are
really the LCD.  For certain modules, we can do much better than that.  See bug
87585 for an example of that.
Blocks: 203448
Does pavlov's review- make this bug invalid?

I, for one, see no _theoretical_ problem with a module being allowed to specify
its own specific flags, provided it doesn't clutter up some logical space
outside the module itself.

If there's a pageload performance boost, and a miniscule footprint hit, then it
seems like a good idea. Of course, the people who want to compile with O2
_could_, but I'll bet that the majority of downloads (particularly for Win32
platforms) are of binaries. So adding the -O2 switch to the standard binary
compile would improve the user experience of the majority of users, something
that doesn't seem so bad to me.

-M
-O2 won't work on some compilers like Sun Workshop.  How do you plan on dealing 
with this?
Is the group of incompatible compilers small, or large? And is the group of
compilers fixed for the future, or highly changeable?

If the group is small and fixed, perhaps the incompatible compilers could be
excluded by some sort of ifdef.

If the group is large, then perhaps the flags could be defined only for certain
compilers. Is there a way to do this in the current build structure? I don't
know the details of the build process -- I'm only proposing theoretical solutions.

-M
hm, what is the problem with sun workshop and -O2?
From Sun WorkShop 6 update 2 - C User guide:

-xO1
Does basic local optimization (peephole).

-xO2
Does basic local and global optimization. This is induction variable 
elimination, local and global common subexpression elimination, algebraic 
simplification, copy propagation, constant propagation, loop-invariant 
optimization, register allocation, basic block merging, tail recursion 
elimination, dead code elimination, tail call elimination, and complex 
expression expansion.
The -xO2 level does not assign global, external, or indirect references or 
definitions to registers. It treats these references and definitions as if they 
were declared volatile. In general, the -xO2 level results in minimum code size.

-xO3
Performs like -xO2, but also optimizes references or definitions for external
variables. Loop unrolling and software pipelining are also performed. This level
does not trace the effects of pointer assignments. When compiling either device
drivers, or programs that modify external variables from within signal handlers,
you may need to use the volatile type qualifier to protect the object from
optimization. In general, the -xO3 level results in increased code size.

-xO4
Performs like -xO3, but also automatically inlines functions contained in the 
same file; this usually improves execution speed. If you want to control which 
functions are inlined, see “-xinline=list”.
This level traces the effects of pointer assignments, and usually results in 
increased code size.

-xO5
Attempts to generate the highest level of optimization. Uses optimization 
algorithms that take more compilation time or that do not have as high a 
certainty of improving execution time. Optimization at this level is more likely
to improve performance if it is done with profile feedback. See “-xprofile=p”.

---

Identical optimization options exist for different versions of this compiler,
though the specific optimizations done could be slightly different.
Is there a chance of having this anytime soon on the trunk to test?
Let's first do some manual optimization: 
* Bug 285872: GIF Decoder: replace gathering buffer with dynamic malloc to fixed
256 bytes hold
* Bug 196295: Move GIF2.cpp into nsGIFDecoder2

The first removes some dynamical allocation during parsing, and optimized the
state engine, making the GIF decoding state engine about twice as fast.
The second merges GIF decoding code into one file, allowing for better
optimization, and less function call overhead.
Quote: 'the difference is interesting: between 10% and 30% faster
image decoding.'

Too bad that these bugs are stuck in the review process...
Assignee: pavlov → nobody
QA Contact: tpreston → imagelib
Maybe we should consider this again?  GIF decoding functions are still showing up pretty high in a profile of tp2.
See also bug 366465, bug 105370 and bug 143046 for more ways to improve GIF decoding.
Is this still an issue, or is this obsolete in the meantime? Is there anything I - as a greenhorn - can do here?
(In reply to comment #23)
> Is this still an issue, or is this obsolete in the meantime? Is there anything
> I - as a greenhorn - can do here?

Our compiler flags vary by platform. In theory it's possible that we might get a very small speedup on GIF decoding by forcing -O2, but I very much doubt it's worth spending time on.

If you're interested in helping out on imagelib though, I'd be happy to give you some much more useful things to help out on. Shoot me an email or ping me (bholley) on IRC.
I'm resolving this WORKSFORME because in general we don't want to set per file optimization flags, we compile all of our release builds with substantial optimization, and compilers have improved a lot since 2002.  If anyone disagrees, feel free to reopen.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.