162096 - the GIF decoder should build with -O2

Reporter

Description

•

22 years ago

according to pageload profile data at
http://www.people.fas.harvard.edu/~dbaron/mozilla/jrgm-profile-index
the gif decoder takes 3% CPU on a pageload test

-O2 should give it a perf boost

Bernard Alleysson

Reporter

Comment 1

•

22 years ago

Attached patch build the GIF decoder with -O2 on windows (obsolete) — Details — Splinter Review

Bernard Alleysson

Reporter

Updated

•

22 years ago

Blocks: 136999

Olivier Cahagne

Updated

•

22 years ago

Keywords: patch, perf

Christian :Biesinger (don't email me, ping me on IRC)

Comment 2

•

22 years ago

any reason for doing this only on windows?

Christian :Biesinger (don't email me, ping me on IRC)

Comment 3

•

22 years ago

or, rather, any reason for limiting that to only the gif decoder rather than
applying it to all of mozilla?

Bernard Alleysson

Reporter

Comment 4

•

22 years ago

"any reason for doing this only on windows?"
no, this should be true for other platforms
you have to find -O2 equivalent

"or, rather, any reason for limiting that to only the gif decoder rather than
applying it to all of mozilla?"

my personal tests show that the perf increase is between +5% and +10% at the
cost of code increase of +10, +15% ... I don't think that +15% code size is
wanted for all of mozilla

but for this particular case (the GIF decoder) code size is +1 Ko (this is what
I can see) and if the perf increase is +10% (gif decoding is a CPU intensive
task as opposed to I/O intensive task) and if it is about 3% pageload time, then
you can expect +0.3% pageload performance. I think that it is a good tradeoff.

Stephen Walker

Comment 5

•

22 years ago

pav, what do you think about this?

Markus Hübner

Comment 6

•

22 years ago

Any chance of getting this into 1.4 ?
Can we get any numbers about perf/footprint here?

Markus Hübner

Comment 7

•

22 years ago

resp. if numbers in comment #4 are okay than this would be really a good point 
to improve overal page-load performance.

Blocks: 71668

Christian :Biesinger (don't email me, ping me on IRC)

Comment 8

•

22 years ago

Attached patch let's do this on all platforms — Details — Splinter Review

Windows is MSVC which supports -O2
Mac, Linux, BeOS are GCC which support -O2 too
OS/2 seems to also use GCC by now (according to
http://tinderbox.mozilla.org/SeaMonkey-Ports/)

so this patch should work on all platforms.

Attachment #94784 - Attachment is obsolete: true

Christian :Biesinger (don't email me, ping me on IRC)

Updated

•

22 years ago

Attachment #120697 - Flags: review?(pavlov)

Markus Hübner

Updated

•

22 years ago

OS: Windows XP → All

Hardware: PC → All

Stuart Parmenter

Comment 9

•

22 years ago

Comment on attachment 120697 [details] [diff] [review]
let's do this on all platforms

this is dumb. we shouldn't be adding optimize flags to individual modules.  If
you want a build with O2, build one.

Attachment #120697 - Flags: review?(pavlov) → review-

Alec Flett

Comment 10

•

22 years ago

yes, As I've said in bug 172474, in not so specific terms: there should be a
generic way to say "build this module with opt" such as setting
BUILD_OPT_MODE=size
or
BUILD_OPT_MODE=speed

and then let stuff in rules.mk or config.mk or whatever set the right flags.

hacker formerly known as seawood@netscape.com

Comment 11

•

22 years ago

> this is dumb. we shouldn't be adding optimize flags to individual modules.  If
> you want a build with O2, build one.

Why are we treating the tree as a monolithic beast that must be compiled with
the same optimization flags across the entire mozilla tree?  NSPR, LDAP & NSS
default to building with -O2 as their maintainers verified that -O2 works for
their code and has a useful performance benefit.  Assuming that someone has done
the same verifications here, I don't see why defaulting to -O2 for optimized
builds is a problem.  Building the entire tree using -O2 is not always an option
due to the problems with egcs which we still support.

> yes, As I've said in bug 172474, in not so specific terms: there should be a
> generic way to say "build this module with opt" such as setting
> BUILD_OPT_MODE=size
> or
> BUILD_OPT_MODE=speed
> and then let stuff in rules.mk or config.mk or whatever set the right flags.

Ugh. Again, you're assuming a monolithic set of flags for the entire project
(ie, "speed" always means "-Ox" and "size" always means "-Oy").  There's no way
to get more granular than that as the central build rules shouldn't have any
explicit knowledge of the individual modules.  MODULE_OPTIMIZE_FLAGS already
provides the generic mechanism for the module to decide which flags it wants to
use by default.  These flags can still be overridden by --enable-optimize.

Alec Flett

Comment 12

•

22 years ago

seawood, no I meant that the individual modules would indicate "I want to be
compiled for speed" and so forth

so that each individual module doens't have to say "Use -O2 for gcc and use -Os
for win32 and use /ZSPEED for RandomOS" etc...they can specify what they want in
an XP way

hacker formerly known as seawood@netscape.com

Comment 13

•

22 years ago

alecf:  I'm looking further into the tree-wide implications.  You abstract away
the specific compiler flag settings, in exchage for ?

1) assuming that the optimization flags will be universal across modules?

Optimization flags will definitely not be universal across modules.  Some
"optimization" flags like -ffast-math would have no adverse affect on some
modules but severely horks js (bug 198692).  Other flags would not be
recommended for certain modules but ok for others (-funroll-loops comes to mind).  

Besides which, the default platform specific optimize flags already covers the
universal case and you can pass flags to --enable-optimize to override that.

Or
2) assuming that somewhere we're going to have module/size/speed mapping?
jpeg: size = -Os
jpeg: speed = -09 -funroll-loops 
zlib: size = -O
zlib: speed = -O3 -ffast-math
...

That is not something I ever would want to setup or maintain.  Nor do I think
the core build makefiles should have that module specific knowledge.

The modules should still be able to specify their own optimization flags to
override the universal default.  The default universal optimization flags are
really the LCD.  For certain modules, we can do much better than that.  See bug
87585 for an example of that.

Markus Hübner

Updated

•

22 years ago

Blocks: 203448

Max Kanat-Alexander

Comment 14

•

21 years ago

Does pavlov's review- make this bug invalid?

I, for one, see no _theoretical_ problem with a module being allowed to specify
its own specific flags, provided it doesn't clutter up some logical space
outside the module itself.

If there's a pageload performance boost, and a miniscule footprint hit, then it
seems like a good idea. Of course, the people who want to compile with O2
_could_, but I'll bet that the majority of downloads (particularly for Win32
platforms) are of binaries. So adding the -O2 switch to the standard binary
compile would improve the user experience of the majority of users, something
that doesn't seem so bad to me.

-M

Stuart Parmenter

Comment 15

•

21 years ago

-O2 won't work on some compilers like Sun Workshop.  How do you plan on dealing 
with this?

Max Kanat-Alexander

Comment 16

•

21 years ago

Is the group of incompatible compilers small, or large? And is the group of
compilers fixed for the future, or highly changeable?

If the group is small and fixed, perhaps the incompatible compilers could be
excluded by some sort of ifdef.

If the group is large, then perhaps the flags could be defined only for certain
compilers. Is there a way to do this in the current build structure? I don't
know the details of the build process -- I'm only proposing theoretical solutions.

-M

Christian :Biesinger (don't email me, ping me on IRC)

Comment 17

•

21 years ago

hm, what is the problem with sun workshop and -O2?

mmc

Comment 18

•

21 years ago

From Sun WorkShop 6 update 2 - C User guide:

-xO1
Does basic local optimization (peephole).

-xO2
Does basic local and global optimization. This is induction variable 
elimination, local and global common subexpression elimination, algebraic 
simplification, copy propagation, constant propagation, loop-invariant 
optimization, register allocation, basic block merging, tail recursion 
elimination, dead code elimination, tail call elimination, and complex 
expression expansion.
The -xO2 level does not assign global, external, or indirect references or 
definitions to registers. It treats these references and definitions as if they 
were declared volatile. In general, the -xO2 level results in minimum code size.

-xO3
Performs like -xO2, but also optimizes references or definitions for external
variables. Loop unrolling and software pipelining are also performed. This level
does not trace the effects of pointer assignments. When compiling either device
drivers, or programs that modify external variables from within signal handlers,
you may need to use the volatile type qualifier to protect the object from
optimization. In general, the -xO3 level results in increased code size.

-xO4
Performs like -xO3, but also automatically inlines functions contained in the 
same file; this usually improves execution speed. If you want to control which 
functions are inlined, see “-xinline=list”.
This level traces the effects of pointer assignments, and usually results in 
increased code size.

-xO5
Attempts to generate the highest level of optimization. Uses optimization 
algorithms that take more compilation time or that do not have as high a 
certainty of improving execution time. Optimization at this level is more likely
to improve performance if it is done with profile feedback. See “-xprofile=p”.

---

Identical optimization options exist for different versions of this compiler,
though the specific optimizations done could be slightly different.

Markus Hübner

Comment 19

•

20 years ago

Is there a chance of having this anytime soon on the trunk to test?

Alfred Kayser

Comment 20

•

20 years ago

Let's first do some manual optimization: 
* Bug 285872: GIF Decoder: replace gathering buffer with dynamic malloc to fixed
256 bytes hold
* Bug 196295: Move GIF2.cpp into nsGIFDecoder2

The first removes some dynamical allocation during parsing, and optimized the
state engine, making the GIF decoding state engine about twice as fast.
The second merges GIF decoding code into one file, allowing for better
optimization, and less function call overhead.
Quote: 'the difference is interesting: between 10% and 30% faster
image decoding.'

Too bad that these bugs are stuck in the review process...

Stuart Parmenter

Updated

•

17 years ago

Assignee: pavlov → nobody

QA Contact: tpreston → imagelib

Eli Friedman

Comment 21

•

17 years ago

Maybe we should consider this again?  GIF decoding functions are still showing up pretty high in a profile of tp2.

Alfred Kayser

Comment 22

•

17 years ago

See also bug 366465, bug 105370 and bug 143046 for more ways to improve GIF decoding.

Markus Multrus

Comment 23

•

14 years ago

Is this still an issue, or is this obsolete in the meantime? Is there anything I - as a greenhorn - can do here?

Bobby Holley (:bholley)

Comment 24

•

14 years ago

(In reply to comment #23)
> Is this still an issue, or is this obsolete in the meantime? Is there anything
> I - as a greenhorn - can do here?

Our compiler flags vary by platform. In theory it's possible that we might get a very small speedup on GIF decoding by forcing -O2, but I very much doubt it's worth spending time on.

If you're interested in helping out on imagelib though, I'd be happy to give you some much more useful things to help out on. Shoot me an email or ping me (bholley) on IRC.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 25

•

14 years ago

I'm resolving this WORKSFORME because in general we don't want to set per file optimization flags, we compile all of our release builds with substantial optimization, and compilers have improved a lot since 2002.  If anyone disagrees, feel free to reopen.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → WORKSFORME

build the GIF decoder with -O2 on windows 22 years ago Bernard Alleysson 478 bytes, patch		Details \| Diff \| Splinter Review
let's do this on all platforms 22 years ago Christian :Biesinger (don't email me, ping me on IRC) 545 bytes, patch	pavlov : review-	Details \| Diff \| Splinter Review