<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 2

•

18 years ago

i ran some tests doubling -finline-limit to 100, which starts to inline bigger functions (3-5 lines or so, so probably a large fraction of constructors/destructors and such). Tp2 results are mixed, some slower and some faster, but overall they're a few percent faster than -Os with -finline-limit=50 and about on par with -O2. codesize is about 500k larger than at 50. so this is a good knob to tune for size/speed tradeoff.

Jeff Walden [:Waldo]

Comment 3

•

18 years ago

__attribute__((always_inline)) in combination with declaring the function inline should do the trick, perhaps behind a macro: #ifdef __GNUC__ #define ALWAYS_INLINE __attribute__((always_inline)) #else #define ALWAYS_INLINE /* nothing */ #endif

Comment 4

•

18 years ago

Sounds a lot like http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/xpcom/string/public/nsUTF8Utils.h&rev=1.17&mark=55-59#55

Boris Zbarsky [:bzbarsky]

Comment 5

•

18 years ago

I would prefer to avoid sprinkling always-inline markers all over our code (MSVC has an equivalent attribute). If we can make the compiler do the right thing automatically via optimization options that would be much better... I'm appalled but not terribly surprised that -Os doesn't do the right thing automatically.

Comment 7

•

18 years ago

Have we filed a gcc bug to make -Os do the right thing?

Assignee

Comment 8

•

18 years ago

not yet (assuming there isn't one already, and/or they haven't fixed it in 4.3). ubuntu compiles at -O2 and thus sidesteps the problem; fedora 8 uses 4.1.2 and compiles at -Os, so if they're prime for a win if they respin their build. since 4.3 is where any fix will be (and caillon informs me redhat is about to stick that into rawhide), i'd like to get some data on that next.

Robert Sayre

Comment 9

•

18 years ago

(In reply to comment #8) > not yet (assuming there isn't one already, and/or they haven't fixed it in > 4.3). ubuntu compiles at -O2 and thus sidesteps the problem; We need to make sure they aren't building -O2 with --enable-optimize="-O2", since that will build spidermonkey with -O2 and make it slower. > fedora 8 uses > 4.1.2 and compiles at -Os, so if they're prime for a win if they respin their > build. > > since 4.3 is where any fix will be (and caillon informs me redhat is about to > stick that into rawhide), i'd like to get some data on that next.

Assignee

Comment 10

•

18 years ago

(In reply to comment #9) > We need to make sure they aren't building -O2 with --enable-optimize="-O2", > since that will build spidermonkey with -O2 and make it slower. they are. they also stick a whole bunch of other options into --enable-optimize, since it's a convenient dumping ground to pass flags into the compiler. what should we be telling them to do instead? patch their desired optimize setting into configure.in?

Assignee

Comment 11

•

18 years ago

i should note that fedora, and probably every other distro, does this too. for instance, my x86_64 fedora 7 install uses: '--enable-optimize=-Os -g -pipe -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables'

Robert Sayre

Comment 12

•

18 years ago

We should ask them what they want to happen. It's unlikely that the entire browser is optimally compiled with the same settings, so we should tell them to change to configure.in or we should change the way --enable-optimize works.

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 13

•

18 years ago

Ideally most of those options would just be passed in CFLAGS/CXXFLAGS: CFLAGS="-D_FORTIFY_SOURCE=2 ..." ../configure --options here although why they would pass -fexceptions and such is beyond me... it's just useless code generation.

Comment 14

•

18 years ago

In our case, those are simply the default optimization flags in a standard rpmbuild, though I did s,-O2,-Os, during the gcc3.x days when it actually did good.

Boris Zbarsky [:bzbarsky]

Comment 15

•

18 years ago

I should note that gcc 4.0.2 (what I have over here) certainly seems to be inlining things when compiling with -Os. So it sounds like this is a regression on the gcc end...

Comment 16

•

18 years ago

Have we tried similar knobs on the mac? Are we building with stack-protector on that (looks like it requires XCODE 3.0 so prob not).

Updated

•

18 years ago

Flags: blocking-firefox3+

Priority: -- → P2

Assignee

Comment 17

•

18 years ago

i've tested gcc 4.3 a bit. to summarize, it looks like this pathological -Os behavior is specific to 4.1 branch, and possibly just 4.1.2. also, there are some substantial perf and codesize wins to be had with gcc 4.3. gory details: tested with gcc 4.3 (20080104 pull). "stock configuration" is "-Os -freorder-blocks -fno-reorder-functions". some Tp2 numbers: baseline: gcc 4.3, stock: 142.78 ms stock, with -finline-limit=50: 146.89 ms (+2.9%) -O2: 131.56 ms (-7.9%) for comparison with previous results (comment 0): gcc 4.1.2, stock: 199 ms (+39%) stock, with -finline-limit=50: 149.33 ms (+4.6%) -O2: 142.67 ms (even) |size libxul.so| gcc 4.3, stock: 12,387kb stock, with -finline-limit=50: 12,325kb (-62kb) -O2: 15,061kb (+2,674kb) gcc 4.1.2, stock: 13,249kb (+862kb) stock, with -finline-limit=50: 13,025kb (+638kb) -O2: 15,440kb (+3,053kb) a few points from this data: 1) -Os is very sane on 4.3 by default. 2) on 4.3, relative to -Os, -O2 has improved a lot (8% Tp win, although at a 2.7Mb codesize cost). 3) 4.3 is 5 - 8% faster on Tp2 than 4.1.2, depending on -Os/-O2. 4) 4.3 gives an 400-800k codesize saving over 4.1.2. 3 & 4) are probably the same thing - a result of the hidden visibility propagation improvements introduced in gcc 4.2. these are a major win for us.

Jeff Walden [:Waldo]

Comment 18

•

18 years ago

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 19

•

18 years ago

It's worse because you have to make an optimization decision in the code. Methods that look like they would be good inlines aren't always... the compiler should make that choice. Besides that the closer we stick to standard C++, the more approachable our codebase is.

Jeff Walden [:Waldo]

Comment 20

•

18 years ago

So you test before doing so, and you only use it when tests show it's a win, which you should have been doing anyway. And while approachability is nice, I think it loses to actual perf gains for an obviously-named macro that's simple to understand.

Comment 21

•

18 years ago

(In reply to comment #20) Any given change could be best for one platform but not for another. Or even one _version_ of a platform. Also, would sprinkling more inline keywords around mean that redistributors/embeddors need to test to make sure that whatever code level optimization decisions made are actually right for their supported compiler/hardware combinations? No, I think the compiler's optimizer really should be doing this work.

Mike Shaver (:shaver emeritus)

Comment 22

•

18 years ago

The point of this bug is that the compiler's optimizer _isn't_ doing this work, no? "Should" doesn't improve the quality of code we ship to users, alas.

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 23

•

18 years ago

If our choice is "pick special optimizer options via -finline-limit" or "sprinkle our codebase with FORCE_INLINE" I will pick the former without a second thought... is that not the choice here?

Comment 24

•

18 years ago

Right, I meant that if the optimizer isn't doing the work with the default options we're giving it, we should find better options to let it do better work. I agree with bsmedberg.

Doug Turner (:dougt)

Comment 25

•

18 years ago

also see bug 237041. gcc 3.3. not sure if these options are interesting or not. They were provided by a embedder.

Stan Shebs

Comment 26

•

18 years ago

The inline keyword makes sense if you have a situation where you have knowledge that the compiler can't possibly deduce; for instance, if there is an nsCOMPtr method that expands to larger than the -finline-limit, but we know it's used in some critical inner loops, and so it's a win to inline anyway. But each such keyword use should be accompanied with a long comment explaining why, including perf statistics, and many of those will want to be conditionalized per platform. As an example, on the Mac, received wisdom at Apple is that there is so much memory pressure (from all the frameworks, etc) that the reduced footprint of -Os is a net win, even it means losing out on some inlining opportunities.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 27

•

18 years ago

dwitte - assigning to you...

Assignee: nobody → dwitte

Mike Connor [:mconnor]

Updated

•

17 years ago

Assignee: dwitte → nobody

Flags: blocking-firefox3+

Product: Firefox → Core

QA Contact: build.config → build-config

Target Milestone: --- → mozilla1.9

Version: unspecified → Trunk

Mike Connor [:mconnor]

Comment 28

•

17 years ago

This is really a core bug... This matters on Linux, since the refplatform uses gcc 4.1.1, is it better to bump the compiler version? I don't think this blocks at this point, but will leave to platform drivers to decide if we want to monkey with this.

Assignee: nobody → dwitte

Flags: blocking1.9?

Comment 29

•

17 years ago

If there's a 25% Tp win this definitely blocks.

Comment 30

•

17 years ago

I think since we can easily change this via a compiler setting, we should just change the setting and look at the effect on numbers. That's a lot easier than updating our refplatform. dwitte: can you submit a patch to just put a sensible -finline-limit in fx-linux-tbox's mozconfig? (CFLAGS/CXXFLAGS) That would let us assess the perf win, and we can move it into configure if it's good.

Flags: blocking1.9? → blocking1.9+

Assignee

Comment 31

•

17 years ago

Attached patch tweak tinderbox config (obsolete) — Details — Splinter Review

Attachment #308835 - Flags: review?(ted.mielczarek)

Comment 32

•

17 years ago

Comment on attachment 308835 [details] [diff] [review] tweak tinderbox config CFLAGS/CXXFLAGS are already defined at the top of the file! Just append to those. Let's get this in and see what happens.

Attachment #308835 - Flags: review?(ted.mielczarek) → review+

Assignee

Comment 33

•

17 years ago

test landed on fx-linux-tbox... tried two settings, -finline-limit=50 (conservative) and -finline-limit=100 (more aggressive). the nightly went out with 50. results from qm-plinux-fast01: baseline -finline-limit=50 -finline-limit=100 -------------------------------------------------------------------- Tp 231 190 (-17.7%) 183 (-20.8%) Txul 246 214 (-13.0%) 210 (-14.6%) Ts 1410 1319 (- 6.5%) 1310 (- 7.1%) Tdhtml 982 943 (- 4.0%) 934 (- 4.9%) Tsspider 388 355 (- 8.5%) 351 (- 9.5%) codesize -86k (- 0.6%) +494k (+ 3.5%) so at 50 we're good for a 17.7% Tp win and an 86k codesize drop! question is do we want to do 100 - gets us an extra 3% Tp and a tiny bit on Ts, costs 579k in codesize above the 50 setting. 3% Tp for 4.1% codesize? this one isn't so obvious ;) when we deploy this, changing the optimization settings in configure.in will be slightly different to what we did here (with CFLAGS/CXXFLAGS). we should see what other modules are using -Os and make sure to change them too. (what about NSPR/NSS?)

Comment 34

•

17 years ago

> so at 50 we're good for a 17.7% Tp win and an 86k codesize drop! > > question is do we want to do 100 - gets us an extra 3% Tp and a tiny bit on Ts, > costs 579k in codesize above the 50 setting. 3% Tp for 4.1% codesize? this one > isn't so obvious ;) > Take the easy sure win (50).

Assignee

Comment 35

•

17 years ago

Attached patch tweak defaults (obsolete) — Details — Splinter Review

so, probably want to do this by default, so users & distros have better chance of getting things right out-of-box. this doesn't really hurt anything on newer gcc's, since -finline-limit=50 is about what -Os should be doing anyway. we could maxversion this flag, but there's something to be said for keeping it simple - especially since we need this flag in two other places, xpcom/io and js/src.

Attachment #308835 - Attachment is obsolete: true

Attachment #309169 - Flags: review?(ted.mielczarek)

Comment 36

•

17 years ago

Comment on attachment 309169 [details] [diff] [review] tweak defaults Please back out your changes to the fx-linux-tbox mozconfig when you check this in. No need to leave extra CFLAGS scattered all over the place.

Attachment #309169 - Flags: review?(ted.mielczarek) → review+

Assignee

Comment 37

•

17 years ago

Attached patch updated patch for js/src (obsolete) — Details — Splinter Review

updated for bitrot.

Attachment #309175 - Flags: review+

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 38

•

17 years ago

checked mac on gcc4.0.1 briefly; -Os seems sane there, and we mostly use -O2 on mac, so nothing to do there.

Comment 39

•

17 years ago

Any chance we can get this in today? I think that the config flag change should ship with b5.

Robert Sayre

Comment 40

•

17 years ago

Comment on attachment 309175 [details] [diff] [review] updated patch for js/src I gave this r- over IRC with dwitte. We shouldn't set -finline-limit on anything other than GCC 4.1.x.

Attachment #309175 - Flags: review-

Assignee

Comment 41

•

17 years ago

doing so will require more changes to configure.in to sniff the GCC version. i don't know if i can get to this today, but i'll try.

Comment 42

•

17 years ago

(In reply to comment #39) > Any chance we can get this in today? I think that the config flag change > should ship with b5. Vlad: we already enabled this on the tinderbox, so our nightly builds should be fine. This follow-on patch is just about making it the default in configure.

Jesse Ruderman

Updated

•

17 years ago

Summary: gcc zealously avoids inlining at -Os → gcc (4.1 only) zealously avoids inlining at -Os

Assignee

Comment 43

•

17 years ago

this may also apply to gcc 4.0 and 4.2. can anyone with those compilers, on linux, test?

Assignee

Comment 44

•

17 years ago

Attached patch tweak defaults v2 — Details — Splinter Review

a stab at version checking. not sure how you'd prefer it done, so i kept it pretty centralized and simple. alternatively, could do a MOZ_OPTIMIZE_SIZE that rolls up "-Os -finline-limit=50"...

Attachment #309169 - Attachment is obsolete: true

Attachment #309175 - Attachment is obsolete: true

Attachment #309635 - Flags: review?(ted.mielczarek)

Comment 45

•

17 years ago

Comment on attachment 309635 [details] [diff] [review] tweak defaults v2 Looks reasonable. I don't think you need the bug number, isn't that what blame is for? (But I also don't care enough to make you change it if you really want it there.)

Attachment #309635 - Flags: review?(ted.mielczarek) → review+

Assignee

Comment 46

•

17 years ago

landed (minus bug#), and reverted tbox mozconfig flags. looks like the new patch dropped Ts another 1%, but that could be artifactual. -> fixed, nothing to do on mac, and we don't have evidence that gcc4.2 needs this.

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED

Assignee

Comment 47

•

17 years ago

i tested gcc 4.2.3 and found it suffered exactly the same problem, and that -finline-limit=50 solved it in an identical way to 4.1.x. i've added 4.2.x to the case statement in configure.in.

Assignee

Updated

•

17 years ago

Summary: gcc (4.1 only) zealously avoids inlining at -Os → gcc 4.1/4.2 zealously avoid inlining at -Os

Robert Kaiser

Comment 48

•

17 years ago

The gcc version check introduced here has caused bug 423261, btw, only checking for "version" in the line is not ideal.

Depends on: 423261

jag (Peter Annema)

Comment 49

•

17 years ago

So would any of -finline-limit=60, ...=70, ...=80, ...=90 (or maybe even ...=57) find us a place where the code size difference is 0 and the performance gains are up, or where performance is roughly the same (or better!) and we save even more on code size? Or does the limit only take multiples of 50?