225433 - investigate -Os for nightly/release builds

Reporter

Description

22 years ago

So, bryner ran an experiment on the redwood firebird tbox (gcc 3.3.2) a few days ago, switching the optimization flag from -O2 to -Os. codesize dropped by 1,545kb out of 14,387kb, a 10.7% reduction. the impact on perf metrics seems to be neutral overall: Ts remained about the same, Txul improved by ~1%, and Tp got larger by a barely measurable amount (maybe ~0.5%).

http://tinderbox.mozilla.org/showbuilds.cgi?tree=Phoenix&hours=24&maxdate=1068502523&legend=0

i've also compared -O2 and -Os in my local builds of seamonkey: there we save 9.3% in binary codesize. i don't have a -O comparison on hand at the moment, but i'm rolling one.

given that perf remains neutral and we save about 10% in binary size, is this something we want to do for gcc builds of seamonkey nightlies/releases?
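To double-check the size arithmetic above, here is a minimal C sketch. `percent_reduction` is a hypothetical helper, not part of any Mozilla tool, and the inputs are the firebird tbox figures quoted above.

```c
#include <assert.h>

/* Percentage of the original size removed by a change.
 * before_kb: size with the old flags (-O2); saved_kb: amount saved with -Os. */
double percent_reduction(double before_kb, double saved_kb)
{
    assert(before_kb > 0.0); /* avoid dividing by zero */
    return saved_kb / before_kb * 100.0;
}
```

Called as `percent_reduction(14387.0, 1545.0)`, it returns about 10.74, consistent with the quoted 10.7%.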

dwitte@gmail.com

Reporter

Comment 1

22 years ago

afaict we use -O2 for gcc release builds. it seems the contributed gtk2/xft linux builds use -O3... ;)

dwitte@gmail.com

Reporter

Comment 2

22 years ago

my -O build finished: binary size is 2.7% smaller than with -O2. i'm assuming its performance metrics fall between those of -Os and -O2, and hence are also neutral. so it appears -Os is the sweet spot here...

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 3

•

22 years ago

I'd prefer to see a bit more detailed performance analysis before we do this. However, if things look good then I'm all for it. Performance may not degrade simply because there is less code, so we'd swap code less frequently and hit the instruction cache more often.

dwitte@gmail.com

Reporter

Comment 4

•

22 years ago

what kind of performance analysis would you suggest? imo, for this kind of change we'd be interested in fairly broad metrics like the ones we have, Ts/Tp/Txul. so perhaps switching one of the tinderboxen to -Os (luna?) would be a good start (ignoring for the moment that it runs a slightly older gcc, 3.2). that said, i think the data we already have for firebird is perfectly applicable to seamonkey.

i've run some Ts/Txul tests locally on a p3-550, linux/gtk2, gcc 3.3.2. the Ts tests are not useful, because the standard deviation is far too high (~10%) for any changes to be visible:

                 -Os       -O2
    Ts avg       3518.6    3505.15
    Ts stdev      280.6     299.2

however, my Txul tests show a larger improvement than the firebird tests did (most likely due to the different perf characteristics of the p3-550). these results have a pretty low standard deviation (< 0.5%), and so are statistically significant:

                 -Os       -O2      improvement (-Os relative to -O2)
    Txul avg     970.4     998.2    2.8%
    Txul stdev    27.3      26.0

i'm unable to test Tp since i'm outside the firewall.

dwitte@gmail.com

Reporter

Comment 5

•

22 years ago

er, those standard deviations should read:

                 -Os       -O2
    Txul stdev     4.5       3.8
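As a rough sanity check on comments 4 and 5, the C sketch below computes the Txul improvement and expresses each stdev as a percentage of its mean, using the corrected Txul stdevs. The helper names are hypothetical. The number of runs isn't given, so this is only a back-of-the-envelope comparison, not a formal significance test.

```c
#include <assert.h>

/* How much lower (lower is better) the -Os time is than the -O2 time,
 * as a percentage of the -O2 time. */
double relative_improvement(double os_time, double o2_time)
{
    assert(o2_time > 0.0);
    return (o2_time - os_time) / o2_time * 100.0;
}

/* Standard deviation as a percentage of the mean
 * (the coefficient of variation). */
double relative_stdev(double stdev, double mean)
{
    assert(mean > 0.0);
    return stdev / mean * 100.0;
}
```

`relative_improvement(970.4, 998.2)` comes out at about 2.8%, while `relative_stdev(4.5, 970.4)` and `relative_stdev(3.8, 998.2)` are about 0.46% and 0.38%, both under the 0.5% mentioned above. For Ts, `relative_stdev(280.6, 3518.6)` is roughly 8%, far larger than the 13.45 gap between the two averages (about 0.4%).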

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 6

22 years ago

I'd like to see at least Tp measured as well, so making the switch on one of the tinderboxen sounds like a good idea. Also, if you have any dhtml-tests or js-tests handy, that would be good, but it's not a requirement on my part (I know they exist but I don't know where, sorry).

Boris Zbarsky [:bzbarsky]

Comment 7

22 years ago

There are some scattered in various bugs... (search for "dhtml perf").

tor

Comment 8

22 years ago

Another thing that someone might want to investigate is tweaking gcc's inliner. Halving the inline limit (-finline-limit=300) on gcc-3.3.2 reduced the code size by another 440K. More is probably achievable by changing this value or the underlying parameters (max-inline-*).

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 9

•

22 years ago

There are some large functions that we really do want to inline, since they're only used once or twice. I'd rather tweak inlining by finding the things that really shouldn't be inlined (probably in the string code) and making them not inline.
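One way to act on this, sketched in C under the assumption of a GCC-compatible compiler: keep trivial accessors inline and mark a larger, widely called routine `noinline`, so the binary holds a single out-of-line copy of it. `str_buf`, `str_len`, and `str_equals_ignore_ascii_case` are hypothetical names, not Mozilla's string classes.

```c
#include <assert.h>
#include <stddef.h>

/* Only GCC-compatible compilers understand the attribute. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical string type, for illustration only. */
typedef struct {
    const char *data;
    size_t len;
} str_buf;

/* Trivial accessor: cheap to duplicate at each call site, so leave it inline. */
static inline size_t str_len(const str_buf *s) { return s->len; }

/* Larger routine called from many places: keep one out-of-line copy
 * instead of letting the inliner paste the loop into every caller. */
NOINLINE int str_equals_ignore_ascii_case(const str_buf *a, const str_buf *b)
{
    if (str_len(a) != str_len(b))
        return 0;
    for (size_t i = 0; i < a->len; i++) {
        unsigned char x = (unsigned char)a->data[i];
        unsigned char y = (unsigned char)b->data[i];
        if (x >= 'A' && x <= 'Z') x = (unsigned char)(x - 'A' + 'a');
        if (y >= 'A' && y <= 'Z') y = (unsigned char)(y - 'A' + 'a');
        if (x != y)
            return 0;
    }
    return 1;
}
```

The trade-off is one extra call per use of the large routine in exchange for not duplicating its body, which is the kind of choice that matters more for codesize than for speed.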

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 10

•

22 years ago

David: Are you sure these functions are really being inlined? MSVC has a pretty low limit for what it is willing to inline (for example, some of the nsVoidArray functions aren't always inlined), and gcc too has a limit for what it will inline. So in general you shouldn't rely on having your functions inlined unless they are really small.

dwitte@gmail.com

Reporter

Comment 11

22 years ago

the only way to positively force inlining is by using the gcc __attribute__((always_inline)). having said that, i agree with dbaron's view, especially as applied to strings... the inlining model there is wacky. i'm sure we could do great things for both codesize and perf by fixing that.
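A minimal C sketch of forcing inlining with the attribute mentioned above; `FORCE_INLINE`, `clamp_int`, and `clamp_sum` are hypothetical names. Per the gcc documentation, `always_inline` inlines a function declared `inline` independent of the usual inlining restrictions, which is what makes it useful at -Os, where the inliner is generally more conservative.

```c
#include <assert.h>

/* Fall back to a plain inline hint on compilers without the attribute. */
#if defined(__GNUC__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

/* Tiny, hot helper: always paste it into the caller. */
static FORCE_INLINE int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Sum of the values after clamping each into [lo, hi]. */
int clamp_sum(const int *vals, int n, int lo, int hi)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += clamp_int(vals[i], lo, hi);
    return total;
}
```

Forcing inlining only pays off for helpers this small; applied to large functions it has the opposite effect on codesize from the -Os switch this bug is about.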

Roland Mainz

Comment 12

22 years ago

Boris Zbarsky [:bzbarsky]

Comment 13

21 years ago

Of note is that while overall (compressed) tarball size does in fact drop by about 10%, the size of some libraries drops by more than that. gklayout and necko (both stripped) drop by about 20% here (-O2 compared to -Os, gcc 3.2). xpcom, docshell, and a few others drop by 10%. uconv drops by 2%. So on some libraries we're actually seeing a huge win from -Os (20% of gklayout is about 900KB). Frankly, I would be in favor of flipping the switch sometime in an alpha milestone (like now, say) for tinderbox and the nightlies and seeing what happens. Once we have nightlies with the change, we can put out a call to people who do DHTML stuff (most of whom don't build) to compare the new and old builds....

Boris Zbarsky [:bzbarsky]

Comment 14

21 years ago

In other words, we have all these nightlies that are _supposed_ to be for testing purposes and we have people testing them. We should make use of that.

Ben Bucksch (:BenB)

Comment 15

21 years ago

Compare bug 53486.

> if you have any dhtml-tests

<http://www.world-direct.com/mozilla/dhtml/funo/domtestcases/index.htm>

Jon Granrose

Comment 16

21 years ago

firefox is using -Os, any reason not to switch comet (seamonkey release) or luna (seamonkey perf tests) over to doing -Os builds at this point, or do we want to wait for post 1.8?

Assignee: leaf → cmp

Priority: -- → P3

Jon Granrose

Comment 17

21 years ago

*** Bug 53486 has been marked as a duplicate of this bug. ***

Dan Mosedale (:dmosedale, :dmose)

Comment 18

21 years ago

granrose: switching now sounds entirely reasonable to me.

Benjamin Smedberg

Comment 19

21 years ago

I think we should get dbaron's approval to change the tinderboxen; we generally prefer the historical comparison in the numbers by using the same build flags (which is why btek still uses egcs), even if this doesn't produce the most optimized builds.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 20

21 years ago

FWIW, I'd expect -O2 builds to be faster than -Os, especially with newer gccs, thanks to basic block reordering. (We've tagged a few hotspots with NS_LIKELY / NS_UNLIKELY since comment 0 happened, so it could be worth re-measuring.) I'd rather not change tinderboxes that are generating performance data. I think we already have some with -O2 and some with -Os.
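NS_LIKELY / NS_UNLIKELY are branch-prediction hints for the compiler. A minimal C sketch, assuming they wrap gcc's `__builtin_expect` (check the Mozilla headers for the actual definitions); `LIKELY`, `UNLIKELY`, and `sum_values` are hypothetical stand-ins. Whether the hint actually changes code layout depends on the optimization level, which is the -O2 vs -Os question here.

```c
#include <assert.h>
#include <stddef.h>

/* __builtin_expect tells gcc which way a branch usually goes, so that with
 * block reordering enabled it can lay out the common path as fall-through
 * code and move the rare path out of the way. */
#if defined(__GNUC__)
#define LIKELY(x)   (__builtin_expect(!!(x), 1))
#define UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Sum an array, treating a NULL pointer as a rare error case. */
long sum_values(const int *vals, size_t n)
{
    if (UNLIKELY(vals == NULL))
        return -1; /* cold path */
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += vals[i];
    return total;
}
```

The hint never changes what the function computes, only how gcc may arrange the generated code, so it is safe to add to hotspots and re-measure as suggested above.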

dwitte@gmail.com

Reporter

Comment 21

21 years ago

dbaron: the results in comment 4 (alas, Txul only, no Tp measurements) were done with 3.3.2... did block reordering come in recently (3.4), or are my results still representative?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 22

21 years ago

IIRC, NS_LIKELY and NS_UNLIKELY are more recent than comment 4. From memory:
* gcc 3.3.x does basic block reordering (-freorder-blocks) at -O2 but not -Os
* gcc 3.4 also does ,pt / ,pf annotations on conditional jump instructions (which solves the branch prediction problem but not the cache miss problem that's solved by -freorder-blocks), but I'm not sure at what optimization levels.

Myk Melez [:myk] [@mykmelez]

Updated

21 years ago

Product: Browser → Seamonkey

Chase Phillips

Comment 23

20 years ago

Mass reassign of open bugs for chase@mozilla.org to build@mozilla-org.bugs.

Assignee: chase → build

J. Paul Reed [:preed]

Comment 24

19 years ago

Mass re-assign of bugs that aren't on the build team radar, so bugs assigned to build@mozilla-org.bugs reflects reality. If there is a bug you really think we need to be looking at, please *email* build@mozilla.org with a bug number and explanation.

Assignee: build → nobody

Stan Shebs

Updated

19 years ago

Assignee: nobody → stanshebs

Stan Shebs

Comment 25

19 years ago

Apparently Linux releases on the 1.8 branch have been built -Os for a while; Chris Cooper added this in November as part of migrating tinderbox bits to the public repository, as seen in http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/tools/tinderbox-configs/firefox/linux/mozconfig&rev=MOZILLA_1_8_BRANCH_release . Mac is being built -O2 on trunk and branches.

Worcester12345

Comment 26

18 years ago

Perf?

Robert Kaiser

Comment 27

18 years ago

Is this still something we're looking into, or should it be closed in some way?

Brian Crowder

Comment 28

18 years ago

I still think this deserves investigation. At least, we should revisit some performance testing with newer gccs.

Stan Shebs

Comment 29

18 years ago

At the very least we need to do a -Os/-O2 comparison on Macs.

Brendan Eich [:brendan]

Updated

18 years ago

Assignee: stanshebs → nobody

Product: Mozilla Application Suite → Core

QA Contact: build-config

Brendan Eich [:brendan]

Comment 30

18 years ago

What's the relation to bug 409803 and possibly other bugs (cc'ing sayrer)? I can guess, but it would be great to have our story for 1.9/fx3 sorted out soon, so nominating blocking. /be

Flags: blocking1.9?

Robert Sayre

Comment 31

18 years ago

(In reply to comment #30)
> What's the relation to bug 409803 and possibly other bugs (cc'ing sayrer)? I
> can guess, but it would be great to have our story for 1.9/fx3 sorted out soon,
> so nominating blocking.

To recap:
We build -Os for release builds on linux.
We build -O2 for release builds on mac.
We build -O1 on msvc (it's somewhere between GCC's -Os and -O2; it does inline, etc.)

I tried building mac at -Os, and saw a ~5% slowdown on Tdhtml and a 2-3% slowdown on Tp/Tp2. However, the code was quite a bit smaller. To me, that indicates certain parts of the tree are faster at -O2 and others at -Os. For example, we know spidermonkey is better at -Os.
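The observation that some code does better at -O2 and some at -Os suggests mixing the two. Per-module flags would live in the build system; as a finer-grained alternative (available from gcc 4.3, newer than the compilers discussed here), gcc's `cold` function attribute marks a function as rarely executed, and gcc then optimizes it for size while the rest of the file keeps its normal flags. A minimal C sketch; `COLD`, `report_bad_input`, and `checked_double` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* The cold attribute first appeared in gcc 4.3; elsewhere expand to nothing. */
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
#define COLD __attribute__((cold))
#else
#define COLD
#endif

/* Rarely executed: gcc optimizes cold functions for size and may group them
 * away from hot code, even when the file is built with -O2. */
COLD int report_bad_input(int value)
{
    fprintf(stderr, "unexpected value %d\n", value);
    return -1;
}

/* Hot path: compiled under the file's normal optimization level. */
int checked_double(int value)
{
    if (value < 0)
        return report_bad_input(value);
    return value * 2;
}
```

This only helps where the hot/cold split is known ahead of time; it is not a substitute for measuring modules like spidermonkey at each level.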

dwitte@gmail.com

Reporter

Comment 32

18 years ago

the 5% slowdown could be due (in full or part) to bug 409803 - any data we can get on mac gcc4.0 regarding that would be gold, and might make it easier to figure out module-specific settings. (speculation here, but the bug mostly affects code that makes heavy use of c++ wrappers, e.g. string libs, which might explain why spidermonkey isn't affected?)

Mike Schroepfer

Comment 33

18 years ago

+ing so we figure out one way or another

Flags: blocking1.9? → blocking1.9+

Mike Beltzner [:beltzner, not reading bugmail]

Updated

18 years ago

Status: NEW → RESOLVED

Closed: 18 years ago

Flags: tracking1.9+

Resolution: --- → WORKSFORME

BMO Automation

Updated

8 years ago

Product: Core → Firefox Build System