Last Comment Bug 590181 - Use smarter optimization flags on Linux
: Use smarter optimization flags on Linux
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Build Config (show other bugs)
: unspecified
: x86 Linux
: -- normal with 4 votes (vote)
: mozilla6
Assigned To: Mike Hommey [:glandium]
:
:
Mentors:
: 594116 (view as bug list)
Depends on: 594234 600474 641842 654049 654858 655003
Blocks: gcc4.5
  Show dependency treegraph
 
Reported: 2010-08-24 09:07 PDT by Robert Sayre
Modified: 2011-05-24 17:56 PDT (History)
48 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-


Attachments
Patch v1 (2.41 KB, patch)
2010-09-07 15:41 PDT, Justin Lebar (not reading bugmail)
ted: review+
Details | Diff | Splinter Review
Part 2 - Enable -O3 (6.30 KB, patch)
2010-09-10 11:29 PDT, Justin Lebar (not reading bugmail)
ted: review+
Details | Diff | Splinter Review
Part 1a - Test fixes (3.20 KB, patch)
2010-09-16 17:24 PDT, Justin Lebar (not reading bugmail)
dbaron: review+
Details | Diff | Splinter Review
Part 1b - More test fixes (4.04 KB, patch)
2010-10-01 14:49 PDT, Justin Lebar (not reading bugmail)
dbaron: review-
Details | Diff | Splinter Review
Part 1b - More test fixes (v2) (4.04 KB, patch)
2010-10-01 15:35 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Part 1b - More test fixes (v3) (4.05 KB, patch)
2010-10-01 16:33 PDT, Justin Lebar (not reading bugmail)
dbaron: review+
Details | Diff | Splinter Review
Part 1 - Test fixes (v4) (7.28 KB, patch)
2010-11-09 14:43 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Part 2 - Enable -O3 (hg exported) (5.56 KB, patch)
2010-11-09 14:44 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Part 1 - Test fixes (v5) (9.00 KB, patch)
2010-11-17 00:51 PST, Justin Lebar (not reading bugmail)
dbaron: review+
Details | Diff | Splinter Review
Interdiff (part 1, v4 -> v5) (3.16 KB, text/plain)
2010-11-17 00:53 PST, Justin Lebar (not reading bugmail)
no flags Details
Part 1 - Test fixes (v6) (7.62 KB, patch)
2010-11-30 18:31 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
reftest failure log (163.83 KB, application/x-gzip)
2011-03-16 15:59 PDT, Mike Hommey [:glandium]
no flags Details
Additional test fix (1.10 KB, patch)
2011-03-17 05:01 PDT, Mike Hommey [:glandium]
dbaron: review+
Details | Diff | Splinter Review
part 1 - Fix tests to avoid rounding errors. Original patch from jlebar. (8.56 KB, patch)
2011-03-23 04:37 PDT, Mike Hommey [:glandium]
mh+mozilla: review+
Details | Diff | Splinter Review
part 2 - Switch default gcc optimize options to -O3 (2.73 KB, patch)
2011-03-23 04:38 PDT, Mike Hommey [:glandium]
mh+mozilla: review+
Details | Diff | Splinter Review

Description Robert Sayre 2010-08-24 09:07:42 PDT
As noted in bug 492688, Linux should build with -O3. We're switching gcc versions right now, so it's probably a bad time to experiment.
Comment 1 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2010-08-24 09:09:07 PDT
Do we have numbers?  The only numbers I see in that bug are for Mac.
Comment 2 (dormant account) 2010-08-25 10:59:19 PDT
(In reply to comment #1)
> Do we have numbers?  The only numbers I see in that bug are for Mac.

O2 has a huge win for me on Linux too. I'd be happy to switch to either O2/O3. The only hurt comes from startup(cos paging in huge binaries is inefficient as hell, but that just means I need to deploy icegrind sooner)

Bloat effects of high -O-levels are offset by PGO compiling cold code in -Os manner. Unfortunately 4.5 is still a bit too fail-prone for our purposes, switching to 4.4 might be safer(to get PGO).
Comment 3 Justin Lebar (not reading bugmail) 2010-09-07 13:07:54 PDT
*** Bug 594116 has been marked as a duplicate of this bug. ***
Comment 4 Justin Lebar (not reading bugmail) 2010-09-07 13:11:13 PDT
Taras, do you have numbers offhand for how much of a startup hit -O3 is on Linux?

Perhaps there's a middle ground in terms of flags if we can't use -O3.
Comment 5 Ted Mielczarek [:ted.mielczarek] 2010-09-07 13:16:38 PDT
Our previous -Os -freorder... flags were a compromise between -Os and -O2 for older GCC versions (I think 4.2 at the time).
Comment 6 (dormant account) 2010-09-07 13:27:30 PDT
(In reply to comment #4)
> Taras, do you have numbers offhand for how much of a startup hit -O3 is on
> Linux?
> 
> Perhaps there's a middle ground in terms of flags if we can't use -O3.

I think we should just take the startup hit on Linux. The hit is proportional to size difference in libxul. The Chrome binary is already a few times bigger than ours, we are in good shape in that regard.
The hit is due to lack optimizations in kernel + linker. We can address those later.
Comment 7 Justin Lebar (not reading bugmail) 2010-09-07 14:25:37 PDT
Wow, -O3 is  a lot faster than -Os -falign-the-world.  I'm seeing a 2x improvement in some parts of my benchmark.
Comment 8 Justin Lebar (not reading bugmail) 2010-09-07 15:41:52 PDT
Created attachment 472808 [details] [diff] [review]
Patch v1

Patch.  Will push to try.

This will only apply cleanly on top of my --enable-profiling patch (bug 592923), but you get the idea.
Comment 9 Boris Zbarsky [:bz] (still a bit busy) 2010-09-07 17:04:17 PDT
-O3 will pretty much win by definition on microbenchmarks, as long as they're micro enough.

The question is what whole-system perf looks like.
Comment 10 Justin Lebar (not reading bugmail) 2010-09-07 17:08:22 PDT
(In reply to comment #9)
> -O3 will pretty much win by definition on microbenchmarks, as long as they're
> micro enough.

Sure.  But my microbenchmark is the hot path for an awesomebar search.  :)

I'll post talos results once they're up.
Comment 11 Justin Lebar (not reading bugmail) 2010-09-08 14:11:11 PDT
Talos results from Linux-64 are up.  Linux-32 should be up in an hour or so; there was an ifra failure on my last push.

http://bit.ly/b1fqlT

This is good for a 9% improvement on Drameo DOM, but is an 8% TS regression.
Comment 12 (dormant account) 2010-09-08 14:21:28 PDT
It's weird to see a regression on Ts. Try with -O2. I was only expecting Ts_cold to regress.
Comment 13 Justin Lebar (not reading bugmail) 2010-09-08 14:24:43 PDT
Sorry, yes.  ts_cold regressed by 8%, but ts improved by 1.6%.
Comment 14 Justin Lebar (not reading bugmail) 2010-09-08 16:21:49 PDT
Linux 32 results are up.  Same link as above.

11% ts_cold regression, 4% ts improvement, 13.5% Dromaeo DOM improvement.
Comment 15 Boris Zbarsky [:bz] (still a bit busy) 2010-09-08 17:39:27 PDT
That sounds like the Ts_cold effects are as expected, as are the microbenchmark results.  The Ts win is nice.
Comment 16 Justin Lebar (not reading bugmail) 2010-09-09 22:50:53 PDT
So do we want to take this?
Comment 17 (dormant account) 2010-09-10 10:21:57 PDT
(In reply to comment #16)
> So do we want to take this?

yes
Comment 18 Ted Mielczarek [:ted.mielczarek] 2010-09-10 10:24:31 PDT
Comment on attachment 472808 [details] [diff] [review]
Patch v1

># HG changeset patch
># Parent 49001b53c9888896888c2b6357a610a441e257fe
>
>diff --git a/configure.in b/configure.in
>--- a/configure.in
>+++ b/configure.in
>@@ -2210,23 +2210,25 @@ ia64*-hpux*)
>         # while; Intel recommends against using it.
>         MOZ_OPTIMIZE_FLAGS="-O2"
>         MOZ_DEBUG_FLAGS="-g"
>     elif test "$GNU_CC" -o "$GNU_CXX"; then
>         GCC_VERSION=`$CC -v 2>&1 | awk '/^gcc version/ { print $3 }'`
>         case $GCC_VERSION in
>         4.1.*|4.2.*|4.5.*)
>             # -Os is broken on gcc 4.1.x 4.2.x, 4.5.x we need to tweak it to get good results.
>+            # XXX is this still needed?
>             MOZ_OPTIMIZE_SIZE_TWEAK="-finline-limit=50"

Without evidence to the contrary I guess this stays. Feel free to disprove! (But the comment is unnecessary.)
Comment 19 (dormant account) 2010-09-10 10:29:52 PDT
Comment on attachment 472808 [details] [diff] [review]
Patch v1

>         case $GCC_VERSION in
>         4.1.*|4.2.*|4.5.*)
>             # -Os is broken on gcc 4.1.x 4.2.x, 4.5.x we need to tweak it to get good results.
>+            # XXX is this still needed?

I don't think so. Old compilers can barely compile moz as is. 4.5 doesn't seem to be helped enough. Since we are switching away from -Os anyway, lets drop this.

>             MOZ_OPTIMIZE_SIZE_TWEAK="-finline-limit=50"
>         esac
>         # If we're building with --enable-profiling, we need a frame pointer.
>+        MOZ_OPTIMIZE_FLAGS="-O3 $MOZ_OPTIMIZE_SIZE_TWEAK"
>         if test -z "$MOZ_PROFILING"; then
>-            MOZ_OPTIMIZE_FLAGS="-Os -freorder-blocks -fomit-frame-pointer $MOZ_OPTIMIZE_SIZE_TWEAK"
>+            MOZ_OPTIMIZE_FLAGS+="-fomit-frame-pointer"
>         else
>-            MOZ_OPTIMIZE_FLAGS="-Os -freorder-blocks -fno-omit-frame-pointer $MOZ_OPTIMIZE_SIZE_TWEAK"
>+            MOZ_OPTIMIZE_FLAGS+="-fno-omit-frame-pointer"

Was -Os intentionally not replaced with -O3 here? -freorder-blocks is useful for pgo, don't drop that.
Comment 20 (dormant account) 2010-09-10 10:31:25 PDT
Also, I think -Os might still be a win on arm for us. We'll have to see if regresses mobile :)
Comment 21 Justin Lebar (not reading bugmail) 2010-09-10 10:37:13 PDT
I'm confused.  Are you sure you're reading this diff correctly?  The patched version doesn't have -Os:

>             MOZ_OPTIMIZE_SIZE_TWEAK="-finline-limit=50"
>         esac
>         # If we're building with --enable-profiling, we need a frame pointer.
>         MOZ_OPTIMIZE_FLAGS="-O3 $MOZ_OPTIMIZE_SIZE_TWEAK"
>         if test -z "$MOZ_PROFILING"; then
>             MOZ_OPTIMIZE_FLAGS+="-fomit-frame-pointer"
>         else
>             MOZ_OPTIMIZE_FLAGS+="-fno-omit-frame-pointer"

> Also, I think -Os might still be a win on arm for us. We'll have to see if
> regresses mobile :)

I left the Android flags as-is -- I think these are just for desktop Linux.  That said, I'd be Very Happy to switch all our -Os flags to -O3.
Comment 22 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2010-09-10 10:58:14 PDT
We should loop the android devs in before we do that, the cost of increasing code size might be higher there.
Comment 23 Ted Mielczarek [:ted.mielczarek] 2010-09-10 11:01:54 PDT
When you say "Desktop Linux" you probably mean "Desktop Linux and Maemo".
Comment 24 Justin Lebar (not reading bugmail) 2010-09-10 11:15:31 PDT
(In reply to comment #23)
> When you say "Desktop Linux" you probably mean "Desktop Linux and Maemo".

On IRC, stuart and stechz say they think this is worth a try on Maemo.  Should be easy enough to disable if it breaks anything.
Comment 25 Justin Lebar (not reading bugmail) 2010-09-10 11:29:16 PDT
Created attachment 474135 [details] [diff] [review]
Part 2 - Enable -O3

Rebasing onto top (as opposed to my --enable-profiling patch), and removing MOZ_OPTIMIZE_SIZE_TWEAK.

Also making the same change to js/src/configure.in.  It's not so important there since they override these flags in js/src/Makefile.in.

Ted, since I changed just about every line in this patch, would you mind having another look?
Comment 26 (dormant account) 2010-09-10 11:32:38 PDT
Comment on attachment 474135 [details] [diff] [review]
Part 2 - Enable -O3

>-        MOZ_OPTIMIZE_FLAGS="-Os -freorder-blocks -fomit-frame-pointer $MOZ_OPTIMIZE_SIZE_TWEAK"
>+        MOZ_OPTIMIZE_FLAGS="-O3 -fomit-frame-pointer"

Keep -freorder-blocks, it's helpful for pgo.
Comment 27 Justin Lebar (not reading bugmail) 2010-09-10 11:33:58 PDT
(In reply to comment #26)
> Comment on attachment 474135 [details] [diff] [review]
> Patch v2
> 
> >-        MOZ_OPTIMIZE_FLAGS="-Os -freorder-blocks -fomit-frame-pointer $MOZ_OPTIMIZE_SIZE_TWEAK"
> >+        MOZ_OPTIMIZE_FLAGS="-O3 -fomit-frame-pointer"
> 
> Keep -freorder-blocks, it's helpful for pgo.

-O2 and -O3 imply -freorder-blocks.

http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
Comment 28 Ted Mielczarek [:ted.mielczarek] 2010-09-10 11:34:18 PDT
Comment on attachment 474135 [details] [diff] [review]
Part 2 - Enable -O3

r+ if you make taras' requested change.
Comment 29 (dormant account) 2010-09-10 11:35:00 PDT
> -O2 and -O3 imply -freorder-blocks.

great.
Comment 30 Mike Hommey [:glandium] 2010-09-10 11:38:03 PDT
(In reply to comment #27)
> > Keep -freorder-blocks, it's helpful for pgo.
> 
> -O2 and -O3 imply -freorder-blocks.
> 
> http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Wouldn't it be worth adding a comment about -freorder-blocks for the case -Os gets better and reinstated in the future?
Comment 31 Justin Lebar (not reading bugmail) 2010-09-10 13:52:55 PDT
Landed, turned two Linux-32 boxes orange, bug 578880, backed out.

Mobile people: You'll get your Talos runs for free.  :)
Comment 32 Justin Lebar (not reading bugmail) 2010-09-10 14:34:37 PDT
This also caused some apparently-legitimate oranges (that either didn't happen on try or went unnoticed);

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1284151393.1284152081.21293.gz

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1284151393.1284152080.21226.gz

...guess there's more work to be done.
Comment 33 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-09-10 15:25:17 PDT
57622 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_transitions_per_property.html | interpolation of transitions: none to rotate(90deg) translate(20%, 20%) rotate(-90deg) - got "matrix(1, 0, 1.28002e-8, 1, 12.9px, 8.05px)", expected "matrix(1, 0, 1.05316e-8, 1, 12.9px, 8.05px)"
57623 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_transitions_per_property.html | interpolation of transitions: none to rotate(-90deg) translate(20%, 20%) rotate(90deg) - got "matrix(1, 0, -1.28002e-8, 1, 14.8167px, -3.43333px)", expected "matrix(1, 0, -1.05316e-8, 1, 14.8167px, -3.43333px)"

REFTEST TEST-UNEXPECTED-FAIL | file:///home/cltbld/talos-slave/mozilla-central_fedora_test-reftest/build/reftest/tests/layout/reftests/css-calc/border-radius-1.html |
Comment 34 Justin Lebar (not reading bugmail) 2010-09-10 16:48:57 PDT
dbaron, this looks like rounding error to me.  From the test's source, it looks like these tests are set up to require exact matches.  Do you think that's right?
Comment 35 Justin Lebar (not reading bugmail) 2010-09-10 16:53:49 PDT
It is interesting that -O3 affects how we do math, though.  Maybe something is getting auto-vectorized.
Comment 36 Justin Lebar (not reading bugmail) 2010-09-10 17:58:39 PDT
All the differences in the rounded corners test appear to be off-by-one.  Looks like rounding, at least in this case.
Comment 37 Justin Lebar (not reading bugmail) 2010-09-13 13:41:33 PDT
I just pushed to try with -O2.  We'll see how that goes.
Comment 38 Justin Lebar (not reading bugmail) 2010-09-13 15:55:16 PDT
Interesting.  I get the same interpolation error (comment 33) with -O2.  No word yet on the border-radius reftest.
Comment 39 Justin Lebar (not reading bugmail) 2010-09-13 17:00:23 PDT
Linux x64 builds still fail make-check (bug 578880) with -O2.  I thought this was intermittent, but it's apparently not.
Comment 40 dwitte@gmail.com 2010-09-14 10:33:35 PDT
Comment on attachment 474135 [details] [diff] [review]
Part 2 - Enable -O3

>diff --git a/js/src/Makefile.in b/js/src/Makefile.in

>-MODULE_OPTIMIZE_FLAGS = -O3 -fstrict-aliasing -fomit-frame-pointer $(MOZ_OPTIMIZE_SIZE_TWEAK)
>+MODULE_OPTIMIZE_FLAGS = -O3 -fstrict-aliasing -fomit-frame-pointer
> # Special optimization flags for jsinterp.c
> INTERP_OPTIMIZER = -O3 -fstrict-aliasing

I think we want INTERP_OPTIMIZER to be the same as MODULE_OPTIMIZE_FLAGS now. Looking at blame, it was originally added to override -Os with -O3, which is now unnecessary. And -fomit-frame-pointer was never added to it, which is bad. I'll have numbers in a bit.

Regardless, we definitely want to remove $(MOZ_OPTIMIZE_SIZE_TWEAK) from the js/src flags, because that only made sense with -Os! If this patch doesn't land again soon, we should split out that change and land it separately. (Getting numbers here too.)

>diff --git a/xpcom/io/Makefile.in b/xpcom/io/Makefile.in

> # work around bug 408258
> ifdef GNU_CC 
> ifneq ($(OS_ARCH), Darwin)
>-MODULE_OPTIMIZE_FLAGS = -Os -fno-strict-aliasing $(MOZ_OPTIMIZE_SIZE_TWEAK)
>+MODULE_OPTIMIZE_FLAGS = -Os -fno-strict-aliasing

I think you want -O3 and -fomit-frame-pointer here too... based on blame, the relevant bit here is -fno-strict-aliasing.
Comment 41 Justin Lebar (not reading bugmail) 2010-09-14 10:51:35 PDT
My patch in bug 593116 fixes the js/src/Makefile.in issues.

I was going to switch xpcom/io in a followup bug, but I'm happy to do it here.

We have -fno-strict-aliasing on globally (bug 414641) so we could just kill xpcom/io's special flags.
Comment 42 Justin Lebar (not reading bugmail) 2010-09-14 11:27:55 PDT
Since this bug is currently blocked (test failures aside, we're still waiting on bug 578880), I'll see if I can make some progress on bug 593116.
Comment 43 dwitte@gmail.com 2010-09-14 11:35:32 PDT
(In reply to comment #41)
> My patch in bug 593116 fixes the js/src/Makefile.in issues.

Cool.

> I was going to switch xpcom/io in a followup bug, but I'm happy to do it here.

Yeah, might as well do it here. Less work. :)

> We have -fno-strict-aliasing on globally (bug 414641) so we could just kill
> xpcom/io's special flags.

Hmm. Your patch there doesn't touch xpcom/io, so if we kill the override, we'll need a fix there. Anyway, since your patch there makes aliasing warnings errors, we'll be forced to fix it, so killing the special flag seems fine to me.

dvander kindly ran perf tests for removing $(MOZ_OPTIMIZE_SIZE_TWEAK) from the js/src flags (with the new -O3). It was neutral, so no harm removing it. Removing INTERP_OPTIMIZER (i.e. adding -fomit-frame-pointer) *might* have won 1-2% on V8. So we should definitely do that.
Comment 44 Justin Lebar (not reading bugmail) 2010-09-14 12:53:11 PDT
If http://bit.ly/aXat4C is to be believed, -O2 is significantly slower than -O3, pretty much across the board.  Looks like -O2 isn't even a clear ts_cold win.
Comment 45 Justin Lebar (not reading bugmail) 2010-09-14 13:13:00 PDT
Hmmm...I can't reproduce the border-radius failure using -O3 on Linux x64 with GCC 4.4.  I'll try with gcc 4.3 now.
Comment 46 dwitte@gmail.com 2010-09-14 13:21:21 PDT
If you confirm you see it on 4.3 but not 4.5, we can just block this on 4.5 deployment. :)
Comment 47 Justin Lebar (not reading bugmail) 2010-09-14 13:46:43 PDT
I can't reproduce with gcc 4.3 either.  Maybe this is a 32- vs 64-bit issue.  I'll try that now.
Comment 48 Justin Lebar (not reading bugmail) 2010-09-14 15:39:57 PDT
My own 32-bit builds with gcc 4.3 succeed, but I've verified that tryserver's build fails when I run it myself.
Comment 49 Justin Lebar (not reading bugmail) 2010-09-14 15:48:43 PDT
I'm using gcc 4.3.4 locally, and the build machines are using 4.3.3.  I'm going to see if I can dig up a 4.3.3 binary to see if that's the problem.
Comment 50 (dormant account) 2010-09-14 16:07:41 PDT
By the way. You can also try setting CXX to CXX="/tools/gcc-4.5/bin/g++ -static-libstdc++" and CC to /tools/gcc-4.5/bin/gcc to test 4.5.1
Comment 51 Justin Lebar (not reading bugmail) 2010-09-14 17:16:48 PDT
Anyone have gcc 4.3.3 on Linux x86-32 handy?  I'm trying to spin a VM, but Ubuntu is being difficult.
Comment 52 Ted Mielczarek [:ted.mielczarek] 2010-09-14 17:51:28 PDT
Here's how it got installed on the build slaves:
https://wiki.mozilla.org/ReferencePlatforms/Linux-CentOS-5.0#GCC_4.3.3
Comment 53 Justin Lebar (not reading bugmail) 2010-09-15 16:18:18 PDT
Just finished my build with 4.3.3; I don't get an error on the border-radius reftest.

jlebar@jaunty-vm:~/code/moz/ff-1$ c++ --version
c++ (Ubuntu 4.3.3-5ubuntu4) 4.3.3

I can try following Ted's steps and get the exact GCC they have on the build machines.  It might actually be easier (and more informative) to update the GCC on one of the builds machines, though.
Comment 54 Justin Lebar (not reading bugmail) 2010-09-15 23:12:30 PDT
Seeing the same issue with GCC 4.5.

Something very strange is going on here.  By far the simplest explanation is that I'm doing something wrong, and that things are not behaving as I think they are.  I'm going to double-check everything in the morning.
Comment 55 Justin Lebar (not reading bugmail) 2010-09-16 17:24:32 PDT
Created attachment 476113 [details] [diff] [review]
Part 1a - Test fixes

IRL dbaron and I decided to just accept the floating point differences we were seeing and modify the tests that broke.

This patch fixes the build I got from try when I run it locally.  I'll push to try.

We're still blocked on bug 578880.
Comment 56 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-09-16 18:55:33 PDT
Comment on attachment 476113 [details] [diff] [review]
Part 1a - Test fixes

r=dbaron
Comment 57 Justin Lebar (not reading bugmail) 2010-09-16 23:52:14 PDT
This looks good on try, modulo bug 578880.
Comment 58 Justin Lebar (not reading bugmail) 2010-09-28 17:44:05 PDT
I get more test failures with gcc 4.5 plus -O3.  On Linux-32 (not -64):

61343 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_transitions_per_property.html | radius-valued property border-bottom-left-radius: interpolation of radius with mixed units - got "21.9833px 22.9833px", expected "22px 23px"
61361 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_transitions_per_property.html | radius-valued property border-bottom-right-radius: interpolation of radius with mixed units - got "21.9833px 22.9833px", expected "22px 23px"
61379 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_transitions_per_property.html | radius-valued property border-top-left-radius: interpolation of radius with mixed units - got "21.9833px 22.9833px", expected "22px 23px"
61397 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_transitions_per_property.html | radius-valued property border-top-right-radius: interpolation of radius with mixed units - got "21.9833px 22.9833px", expected "22px 23px"
61456 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_transitions_per_property.html | radius-valued property -moz-outline-radius-bottomleft: interpolation of radius with mixed units - got "21.9833px 22.9833px", expected "22px 23px"
61474 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_transitions_per_property.html | radius-valued property -moz-outline-radius-bottomright: interpolation of radius with mixed units - got "21.9833px 22.9833px", expected "22px 23px"
61492 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_transitions_per_property.html | radius-valued property -moz-outline-radius-topleft: interpolation of radius with mixed units - got "21.9833px 22.9833px", expected "22px 23px"
61510 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/style/test/test_transitions_per_property.html | radius-valued property -moz-outline-radius-topright: interpolation of radius with mixed units - got "21.9833px 22.9833px", expected "22px 23px"

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1285714922.1285715549.1120.gz

I wonder why specifically transitions and border radius are sensitive to -O3.
Comment 59 (dormant account) 2010-09-28 17:48:17 PDT
this sucks, are fails still there in -O2?

Would be nice to figure out what's going on this week. How is talos data?
Comment 60 Mike Hommey [:glandium] 2010-09-29 00:09:53 PDT
> got "21.9833px 22.9833px", expected "22px 23px"

sounds like accumulated rounding errors.
Comment 61 Justin Lebar (not reading bugmail) 2010-09-29 00:23:09 PDT
Right.  The unanswered question is, why does -O2/-O3 cause this, while -Os doesn't?   It would be good to understand the code that's responsible for computing these numbers and to understand why whatever gcc is doing is a legal optimization.
Comment 62 Mike Hommey [:glandium] 2010-09-29 02:51:27 PDT
Interestingly, it doesn't fail on my machine (and yes, I tried on x86 (not -64) with the patch applied)
Comment 63 Mike Hommey [:glandium] 2010-09-29 02:55:22 PDT
(In reply to comment #62)
> Interestingly, it doesn't fail on my machine (and yes, I tried on x86 (not -64)
> with the patch applied)

I should mention I tried with debian's latest gcc-4.5 (svn r164618)
Comment 64 Mike Hommey [:glandium] 2010-09-29 03:44:12 PDT
(In reply to comment #62)
> Interestingly, it doesn't fail on my machine (and yes, I tried on x86 (not -64)
> with the patch applied)

/me digs a hole and hides. I was not in the right chroot... unfortunately i get "too much recursion" when starting the httpd.js server.
Comment 65 Justin Lebar (not reading bugmail) 2010-09-29 09:40:21 PDT
(In reply to comment #63)
> > Interestingly, it doesn't fail on my machine (and yes, I tried on x86 (not -64)
> > with the patch applied)

This is (or, I guess, would be, since you didn't actually run it) consistent with what we were seeing with the failures caused by gcc-4.3: Locally-compiled builds worked fine, but the builds from try (when run locally) would fail.  Weird, I know.
Comment 66 Justin Lebar (not reading bugmail) 2010-09-29 09:42:51 PDT
Hm...dbaron
Comment 67 Justin Lebar (not reading bugmail) 2010-09-29 09:45:46 PDT
... dbaron, is the test in comment 58 one where roundoff error is acceptable?  There's no fudge factor built into that code, but I could add one (or try switching to numbers that are powers of two).
Comment 68 Justin Lebar (not reading bugmail) 2010-09-29 11:13:17 PDT
Talos results: http://bit.ly/92akzb

Still waiting for full 32-bit results.  On 64-bit, see the expected Dromaeo improvements (7% Dromaeo DOM, 4.5% Dromaeo CSS.  It's also good for a small sunspider win (1.5% on 32-bit).

Interestingly, I *don't* see a ts_cold regression on either 32- or 64-bit.
Comment 69 Justin Lebar (not reading bugmail) 2010-09-29 11:17:39 PDT
Whoa.  32-bit Dromaeo results:

17% DOM improvement,
9% CSS improvement,
7% jslib improvement.
Comment 70 (dormant account) 2010-09-29 11:34:44 PDT
(In reply to comment #69)
> Whoa.  32-bit Dromaeo results:
> 
> 17% DOM improvement,
> 9% CSS improvement,
> 7% jslib improvement.

I think these numbers meant to go into bug 559964 :)

Should be even better with pgo.
Comment 71 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-09-29 14:44:06 PDT
(In reply to comment #67)
> ... dbaron, is the test in comment 58 one where roundoff error is acceptable? 
> There's no fudge factor built into that code, but I could add one (or try
> switching to numbers that are powers of two).

Maybe try switching the width and height set in test_radius_transition from 200px to 256px?
Comment 72 (dormant account) 2010-10-01 12:05:40 PDT
(In reply to comment #71)
> (In reply to comment #67)
> > ... dbaron, is the test in comment 58 one where roundoff error is acceptable? 
> > There's no fudge factor built into that code, but I could add one (or try
> > switching to numbers that are powers of two).
> 
> Maybe try switching the width and height set in test_radius_transition from
> 200px to 256px?

Justin, do you have time to make a patch for that?
Comment 73 Justin Lebar (not reading bugmail) 2010-10-01 14:47:33 PDT
I pushed a patch this morning to try, and it looks good on Linux x86-32.  I'll attach here in a moment.
Comment 74 Justin Lebar (not reading bugmail) 2010-10-01 14:49:20 PDT
Created attachment 480255 [details] [diff] [review]
Part 1b - More test fixes
Comment 75 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-10-01 15:20:31 PDT
Comment on attachment 480255 [details] [diff] [review]
Part 1b - More test fixes

The middle case (of the three that you modified), you changed such that there's no animation for the y component.  Could you fix that so there's still animation for both components?  Otherwise this looks fine.
Comment 76 Justin Lebar (not reading bugmail) 2010-10-01 15:35:16 PDT
Created attachment 480270 [details] [diff] [review]
Part 1b - More test fixes (v2)

Like this?  Interdiff (v1 -> v2):

diff --git a/layout/style/test/test_transitions_per_property.html b/layout/style/test/test_transitions_per_property.
--- a/layout/style/test/test_transitions_per_property.html
+++ b/layout/style/test/test_transitions_per_property.html
@@ -530,30 +530,30 @@ function test_radius_transition(prop) {
   div.style.setProperty(prop, "3px 8px", "");
   is(cs.getPropertyValue(prop), "3px 8px",
      "radius-valued property " + prop + ": computed value before transition");
   div.style.setProperty("-moz-transition-property", prop, "");
   div.style.setProperty(prop, "15px 12px", "");
   is(cs.getPropertyValue(prop), "6px 9px",
      "radius-valued property " + prop + ": interpolation of radius");
   div.style.setProperty("-moz-transition-property", "none", "");
-  div.style.setProperty(prop, "12.5% 25%", "");
-  is(cs.getPropertyValue(prop), "32px 64px",
+  div.style.setProperty(prop, "12.5% 6.25%", "");
+  is(cs.getPropertyValue(prop), "32px 16px",
      "radius-valued property " + prop + ": computed value before transition");
   div.style.setProperty("-moz-transition-property", prop, "");
   div.style.setProperty(prop, "25%", "");
-  is(cs.getPropertyValue(prop), "40px 64px",
+  is(cs.getPropertyValue(prop), "40px 28px",
      "radius-valued property " + prop + ": interpolation of radius");
   div.style.setProperty("-moz-transition-property", "none", "");
   div.style.setProperty(prop, "6.25% 12.5%", "");
   is(cs.getPropertyValue(prop), "16px 32px",
      "radius-valued property " + prop + ": computed value before transition");
   div.style.setProperty("-moz-transition-property", prop, "");
-  div.style.setProperty(prop, "64px 32px", "");
-  is(cs.getPropertyValue(prop), "28px 32px",
+  div.style.setProperty(prop, "64px 16px", "");
+  is(cs.getPropertyValue(prop), "28px 28px",
      "radius-valued property " + prop + ": interpolation of radius with mixed units");
 
   test_length_percent_calc_transition(prop);
 
   div.style.removeProperty("width");
   div.style.removeProperty("height");
   div.style.removeProperty("border");
   div.style.removeProperty("padding");
Comment 77 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-10-01 16:18:19 PDT
I think you reattached the old patch, but the interdiff looks like it fixes the problem, except you might have introduced a new one in that the very last line should now expect "28px" instead of "28px 28px".
Comment 78 Justin Lebar (not reading bugmail) 2010-10-01 16:30:42 PDT
Ah, oops -- I qref'ed the wrong patch.

Interestingly, "28px 28px" seems to work.
Comment 79 Justin Lebar (not reading bugmail) 2010-10-01 16:33:00 PDT
Created attachment 480297 [details] [diff] [review]
Part 1b - More test fixes (v3)
Comment 80 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-10-01 16:39:29 PDT
Comment on attachment 480297 [details] [diff] [review]
Part 1b - More test fixes (v3)

"28px 28px" shouldn't work in my tree (post bug 595650 and bug 595651); I guess we'll see who lands first.
Comment 81 Justin Lebar (not reading bugmail) 2010-10-01 20:54:37 PDT
The new test is green on try, so this should be good to land, unless dbaron gets there first.  :)
Comment 82 Dão Gottwald [:dao] 2010-10-24 02:26:39 PDT
The test fixes fail to apply in test_transitions_per_property.html.
Comment 83 Justin Lebar (not reading bugmail) 2010-10-25 14:16:28 PDT
(In reply to comment #82)
> The test fixes fail to apply in test_transitions_per_property.html.

I'll rebase once we're ready to check in.
Comment 84 Justin Lebar (not reading bugmail) 2010-11-09 14:43:08 PST
Created attachment 489301 [details] [diff] [review]
Part 1 - Test fixes (v4)

Test fixes, rolled up into one patch.

dbaron, can you please review the change I made to the one check_distance() call?  Just that one line is the only substantial difference between this patch and the ones it obsoletes.

I've pushed to try: http://hg.mozilla.org/try/rev/2384e33af4a0
Comment 85 Justin Lebar (not reading bugmail) 2010-11-09 14:44:24 PST
Created attachment 489302 [details] [diff] [review]
Part 2 - Enable -O3 (hg exported)

Same as previous part 2 patch, just now with a checkin comment.
Comment 86 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-11-09 14:54:52 PST
Comment on attachment 489301 [details] [diff] [review]
Part 1 - Test fixes (v4)

I think you're probably better off keeping the check_distance calls in sync with the values just above that they're testing.
Comment 87 Justin Lebar (not reading bugmail) 2010-11-09 15:02:42 PST
I admit to not fully understanding this test, but I don't see what the 10% in the original call corresponds to.  Is it (25-5)/4 + 5?  That's how I got 15.625%.
Comment 88 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-11-09 15:05:43 PST
check_distance expects three arguments:
 * the start of an interval
 * the value 1/4 of the way from the start to the end.  (1/2 is bad since then the test wouldn't catch bugs where start and end got swapped)
 * the end of the interval
Comment 89 Justin Lebar (not reading bugmail) 2010-11-09 15:09:54 PST
Oh, you mean I should change the other calls to check_interval?  That I can do.
Comment 90 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-11-09 15:15:01 PST
No, just change the numbers in both places where you're changing them in one.
Comment 91 Justin Lebar (not reading bugmail) 2010-11-09 21:40:53 PST
> No, just change the numbers in both places where you're changing them in one.

I think we may be talking past each other.  Do you mean that, because I made this change:

-  div.style.setProperty(prop, "5% 15%", "");
-  is(cs.getPropertyValue(prop), "10px 30px",
+  div.style.setProperty(prop, "12.5% 6.25%", "");
+  is(cs.getPropertyValue(prop), "32px 16px",
      "radius-valued property " + prop + ": computed value before transition");
   div.style.setProperty("-moz-transition-property", prop, "");
   div.style.setProperty(prop, "25%", "");
-  is(cs.getPropertyValue(prop), "20px 35px",
+  is(cs.getPropertyValue(prop), "40px 28px",
      "radius-valued property " + prop + ": interpolation of radius");
   check_distance(prop, "5% 15%", "10% 17.5%", "25%");

I should change the arguments to the check_distance call above to ("12.5% 6.25%", "...", "25%"), where "..." meets the criteria from comment 88?  That's what I meant in comment 89.
Comment 92 Justin Lebar (not reading bugmail) 2010-11-10 10:30:13 PST
In an effort to get this landed on schedule, Taras and I think we should take these test fixes as a followup.
Comment 93 Justin Lebar (not reading bugmail) 2010-11-10 10:38:16 PST
That wasn't particularly clear.  I meant that we're planning to land the test fixes in this bug as-is, and then clean them up per the above comments as a followup.
Comment 94 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-11-10 11:00:14 PST
(In reply to comment #91)
> > No, just change the numbers in both places where you're changing them in one.
> 
> I think we may be talking past each other.  Do you mean that, because I made
> this change:
> 
> -  div.style.setProperty(prop, "5% 15%", "");
> -  is(cs.getPropertyValue(prop), "10px 30px",
> +  div.style.setProperty(prop, "12.5% 6.25%", "");
> +  is(cs.getPropertyValue(prop), "32px 16px",
>       "radius-valued property " + prop + ": computed value before transition");
>    div.style.setProperty("-moz-transition-property", prop, "");
>    div.style.setProperty(prop, "25%", "");
> -  is(cs.getPropertyValue(prop), "20px 35px",
> +  is(cs.getPropertyValue(prop), "40px 28px",
>       "radius-valued property " + prop + ": interpolation of radius");
>    check_distance(prop, "5% 15%", "10% 17.5%", "25%");
> 
> I should change the arguments to the check_distance call above to ("12.5%
> 6.25%", "...", "25%"), where "..." meets the criteria from comment 88?  That's
> what I meant in comment 89.

yes.
Comment 95 Ted Mielczarek [:ted.mielczarek] 2010-11-11 06:23:02 PST
Pushed to m-c:
http://hg.mozilla.org/mozilla-central/rev/d2db7be6d3b4 (-O3 switch)
http://hg.mozilla.org/mozilla-central/rev/f634808dcae9 (test fixes)
Comment 96 Martin Schröder [:mschroeder] 2010-11-12 13:49:02 PST
Should this bug be reopened as the patch has been backed out in http://hg.mozilla.org/mozilla-central/rev/e9e55a9460bb ?
Comment 97 (dormant account) 2010-11-16 16:07:27 PST
So I just tried reproducing the sunspider hang on try, all of the benchmarks complete successfully.
Comment 98 Chris AtLee [:catlee] 2010-11-16 16:40:25 PST
(In reply to comment #97)
> So I just tried reproducing the sunspider hang on try, all of the benchmarks
> complete successfully.

Let's get some additional testing on these builds just to be sure.

Can you file a bug (http://bit.ly/test_request), and include the <username>-<revision> for the builds (that helps whoever is submitting the extra builds), and ask for a few extra test and talos runs?
Comment 99 Justin Lebar (not reading bugmail) 2010-11-17 00:51:10 PST
Created attachment 491126 [details] [diff] [review]
Part 1 - Test fixes (v5)

Interdiff in a moment
Comment 100 Justin Lebar (not reading bugmail) 2010-11-17 00:53:05 PST
Created attachment 491128 [details]
Interdiff (part 1, v4 -> v5)
Comment 101 (dormant account) 2010-11-17 12:07:18 PST
This sucks, running talos multiple times results in a hang at -O3. -O2 causes sunspider to hang too. I'm really hoping that bug 609543 is the same issue.
Comment 102 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-11-30 16:17:15 PST
Comment on attachment 491126 [details] [diff] [review]
Part 1 - Test fixes (v5)

>-  check_distance(prop, "8% 12%", "-moz-calc(6% + 10px) -moz-calc(5px + 9%)",
>-                       "40px 20px");
>+  check_distance(prop, "6.25% 12.5%", "16px 4px", "64px 16px");

I don't see where the middle numbers here came from or why they should even work.  You should use the correct -moz-calc() expression.

r=dbaron with that fixed
Comment 103 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-11-30 16:18:07 PST
And once you fix that, can you undo the accuracy reduction that you made as well?
Comment 104 Justin Lebar (not reading bugmail) 2010-11-30 18:29:43 PST
Aha.  I *finally* understand this test.  Fix in a moment.

Sorry for the runaround, dbaron.
Comment 105 Justin Lebar (not reading bugmail) 2010-11-30 18:31:54 PST
Created attachment 494276 [details] [diff] [review]
Part 1 - Test fixes (v6)
Comment 106 Justin Lebar (not reading bugmail) 2010-11-30 18:38:26 PST
Pushed to try: http://hg.mozilla.org/try/rev/f38141e6c74c
Comment 107 Justin Lebar (not reading bugmail) 2010-11-30 23:18:21 PST
Two new test failures on try which appear to be from gcc 4.5 plus -O3:

I probably won't have time to take a closer look until the weekend.

Linux64: ERROR TEST-UNEXPECTED-PASS | chrome://mochitests/content/a11y/accessible/text/test_{singleline,whitespace}.html

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1291181801.1291183038.24230.gz

Linux32: REFTEST TEST-UNEXPECTED-FAIL | file:///home/cltbld/talos-slave/tryserver_fedora_test-reftest/build/reftest/tests/layout/reftests/bugs/368020-5.html | image comparison (==)

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1291180522.1291181422.17207.gz
Comment 108 Justin Lebar (not reading bugmail) 2010-12-16 10:05:43 PST
Just ran through tryserver again.  Got the same two failures as in comment 107, plus a ts_cold crash on linux-32:

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1292478670.1292480244.6506.gz
Comment 109 (dormant account) 2010-12-16 10:24:05 PST
(In reply to comment #108)
> Just ran through tryserver again.  Got the same two failures as in comment 107,
> plus a ts_cold crash on linux-32:
> 
> http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1292478670.1292480244.6506.gz

Sounds like 4.5 is too buggy at O3. Is O2 ok?
Comment 110 Justin Lebar (not reading bugmail) 2010-12-16 11:39:24 PST
With -O2, same reftest failure on linux-32 plus a js-ctypes crash:

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1292527302.1292528084.32576.gz

Still awaiting linux-64 results.
Comment 111 Justin Lebar (not reading bugmail) 2010-12-16 11:43:44 PST
The reftest failure is kind of interesting.  It looks like the gradients are being cut off a pixel or two too soon at the bottom.
Comment 112 David 2010-12-16 22:26:36 PST
Hi um could the default compile flags also include: -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security  -fstack-protector

Also --> -pie -fPIE 

See http://wiki.debian.org/Hardening (among other sites) for as to why these should be added.
Comment 113 David 2010-12-16 22:29:49 PST
For the default firefox - from http://www.mozilla.com/en-US/firefox/ (32bit)

 Position Independent Executable: no, normal executable!
 Stack protected: no, not found!
 Fortify Source functions: no, not found!
 Read-only relocations: no, not found!
 Immediate binding: no, not found!

Here is firefox compiled on amd64 with hardening flags enabled:
 Position Independent Executable: yes
 Stack protected: yes
 Fortify Source functions: yes
 Read-only relocations: yes
 Immediate binding: yes
Comment 114 Justin Lebar (not reading bugmail) 2010-12-17 09:52:57 PST
(In reply to comment #112)
> Hi um could the default compile flags also include: -D_FORTIFY_SOURCE=2
> -Wformat -Wformat-security  -fstack-protector
> 
> Also --> -pie -fPIE 
> 
> See http://wiki.debian.org/Hardening (among other sites) for as to why these
> should be added.

Could you please file a separate bug on this?  This bug's summary was pretty vague, but I've updated it.  :)
Comment 115 David 2010-12-17 16:49:24 PST
I could... but this bug seemed to be about compile flags already and I didn't want to file a duplicate bug.
Comment 116 Justin Lebar (not reading bugmail) 2010-12-17 16:54:38 PST
This bug is specifically about optimization flags; that's why I changed the summary.  Your bug is about security flags, so it's a separate issue.
Comment 117 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2010-12-17 16:55:49 PST
(In reply to comment #115)
> I could... but this bug seemed to be about compile flags already and I didn't
> want to file a duplicate bug.

Yes, the summary was lying :-)
Comment 119 Mike Beltzner [:beltzner, not reading bugmail] 2011-02-23 10:49:42 PST
Comment on attachment 474135 [details] [diff] [review]
Part 2 - Enable -O3

(bookkeeping)
Comment 120 Mike Hommey [:glandium] 2011-03-16 15:40:46 PDT
With the two patches from here applied, and building with gcc 4.5 with -static-libstdc++ on try (off revision 4866be78732f), I only get one failure:

NEXT ERROR REFTEST TEST-UNEXPECTED-FAIL | file:///home/cltbld/talos-slave/test/build/reftest/tests/layout/reftests/bugs/368020-5.html | image comparison (==)
REFTEST   IMAGE 1 (TEST): 
REFTEST   IMAGE 2 (REFERENCE): 
REFTEST number of differing pixels: 126
Comment 121 :Ehsan Akhgari 2011-03-16 15:45:02 PDT
(In reply to comment #120)
> With the two patches from here applied, and building with gcc 4.5 with
> -static-libstdc++ on try (off revision 4866be78732f), I only get one failure:

Can you please attach the log as a text file which can be used with reftest-analyzer.xhtml?
Comment 122 Mike Hommey [:glandium] 2011-03-16 15:59:58 PDT
Created attachment 519800 [details]
reftest failure log

Note this only fails on x86, and doesn't fail on x86-64. (so that could very well be a double precision problem)
Comment 123 :Ehsan Akhgari 2011-03-16 20:57:23 PDT
(In reply to comment #122)
> Created attachment 519800 [details]
> reftest failure log
> 
> Note this only fails on x86, and doesn't fail on x86-64. (so that could very
> well be a double precision problem)

Looks like the height of the boxes are incorrectly calculated.  CCing layout folks.
Comment 124 Mike Hommey [:glandium] 2011-03-17 05:01:25 PDT
Created attachment 519873 [details] [diff] [review]
Additional test fix

Reading bug 368020, it doesn't look like the use of percentage in the test case is useful for what is being tested. Replacing them with absolute sizes fix the failure. I get an all green on try with this patch + the two others on this bug with gcc 4.5 -static-libstdc++.
Comment 125 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2011-03-17 08:24:26 PDT
Comment on attachment 519873 [details] [diff] [review]
Additional test fix

If we can't accurately compute 2% or 3% of 500px, that's a problem.  It should come out to exactly 10px and 15px; floating point error shouldn't enter the picture, since even if there's slight error we should round right back to the correct value.
Comment 126 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2011-03-17 08:25:51 PDT
(In reply to comment #124)
> Reading bug 368020, it doesn't look like the use of percentage in the test case
> is useful for what is being tested.

Tests aren't always strictly written to test only the bug being tested.  In this case I wrote tests for percentages to make sure we don't break percentages.
Comment 127 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2011-03-17 08:41:16 PDT
Comment on attachment 519873 [details] [diff] [review]
Additional test fix

Actually, I guess this is understandable if the normal line-height comes out to exactly a half-pixel value, and then the float multiplication comes out very slightly low, so r=dbaron.
Comment 128 Mike Hommey [:glandium] 2011-03-17 11:52:36 PDT
BTW, when applying part 1, I noticed this in js/src/configure.in (and changed it locally):
-          MOZ_OPTIMIZE_FLAGS="-Os -freorder-blocks -fno-reorder-functions -fno-omit-frame-pointer $MOZ_OPTIMIZE_SIZE_TWEAK"
+          MOZ_OPTIMIZE_FLAGS="-Os -fno-omit-frame-pointer"

I guess that's a typo, right, Justin ?
Comment 129 Mike Hommey [:glandium] 2011-03-17 11:54:05 PDT
(In reply to comment #128)
> BTW, when applying part 1

part 2, that is.
Comment 130 Justin Lebar (not reading bugmail) 2011-03-17 18:02:03 PDT
(In reply to comment #128)
> BTW, when applying part 1, I noticed this in js/src/configure.in (and changed
> it locally):
> -          MOZ_OPTIMIZE_FLAGS="-Os -freorder-blocks -fno-reorder-functions
> -fno-omit-frame-pointer $MOZ_OPTIMIZE_SIZE_TWEAK"
> +          MOZ_OPTIMIZE_FLAGS="-Os -fno-omit-frame-pointer"
> 
> I guess that's a typo, right, Justin ?

Yes, nice catch!  It should be -03 -fno-omit-frame-pointer.
Comment 131 Mike Hommey [:glandium] 2011-03-23 04:37:19 PDT
Created attachment 521150 [details] [diff] [review]
part 1 - Fix tests to avoid rounding errors. Original patch from jlebar.

Refreshed patch. I'll take this bug to completion (or not, it appears there might be problems with -O3 + PGO).
Comment 132 Mike Hommey [:glandium] 2011-03-23 04:38:10 PDT
Created attachment 521151 [details] [diff] [review]
part 2 - Switch default gcc optimize options to -O3
Comment 133 Mike Hommey [:glandium] 2011-03-29 05:22:01 PDT
This can safely be done before actually switching to gcc 4.5, though we'll probably do both at (roughly) the same time.
Comment 136 Justin Lebar (not reading bugmail) 2011-05-01 20:01:47 PDT
That certainly wasn't intentional on my part!
Comment 137 Daniel Holbert [:dholbert] 2011-05-03 16:09:56 PDT
This might have broken Linux crash reports.  See bug 654595.
Comment 138 Daniel Holbert [:dholbert] 2011-05-04 16:44:06 PDT
Turned out it was PGO that busted the crash reports.  However, when we disabled PGO, we got perma-orange in 4 reftests, due to this bug here: bug 654858

Backed out "part 2" of this bug to fix the perma-orange:
 http://hg.mozilla.org/mozilla-central/rev/e5c5ea4d03ed
Comment 139 Boris Zbarsky [:bz] (still a bit busy) 2011-05-04 19:13:37 PDT
OK, since that backout Mochitest-other is orange on Linux opt, both 32-bit and 64-bit.  I've closed the tree until we sort that out, since at this point I don't know how to get it to a known-good state.
Comment 140 Daniel Holbert [:dholbert] 2011-05-04 19:31:09 PDT
bz suggested that we might need a clobber, which seems reasonable, given that my backout constituted a build config change.  Requested clobbers, and hopefully that fixes the mochitest-oth issues..
Comment 141 Daniel Holbert [:dholbert] 2011-05-04 20:43:42 PDT
Nope, the clobber didn't fix it. (As I later noticed, the builds for my backout were actually free-space clobber builds themselves)

glandium, ball's in your court... I'm not sure what we need to do to get the tree back in a green state. :)
Comment 142 Boris Zbarsky [:bz] (still a bit busy) 2011-05-04 21:38:25 PDT
And for what it's worth, if I push current m-c to try I get the failures there too.
Comment 143 Boris Zbarsky [:bz] (still a bit busy) 2011-05-04 21:40:27 PDT
And note that the failures are unexpected passes on the a11y tests...
Comment 144 Daniel Holbert [:dholbert] 2011-05-04 22:30:16 PDT
Note also that the a11y tests haven't changed in 3 months (since Feb 3 2011).

The unexpected passes all seem to be of the form
> "" should equal ""
> 0 should equal 0
so I don't think they're *legitimate* passes - rather, it looks like we're messing up the expected value somehow.  (The test isn't super skimmable, so I'm not absolutely sure of that, but it seems likely.)

philor pointed out that it looks like we hit this exact issue last week in bug 652459 (the log in bug 652459 comment 0 exactly matches the unexpected passes we're seeing now).  That bug mysteriously stopped being a problem -- perhaps that was when this bug here's cset landed? (and now that this is backed out, it's a problem again)
Comment 145 Daniel Holbert [:dholbert] 2011-05-04 22:40:16 PDT
(In reply to comment #144)
> philor pointed out that it looks like we hit this exact issue last week in bug
> 652459

See in particular bug 652459 comment 3, which basically says that the GCC 4.5 upgrade caused the a11y orange.

So, AFAICT, here's the situation:
(a) gcc-4.5 upgrade causes a11y orange (bug 652459 & comment 139 here)
(b) This bug (-O3) fixes the a11y orange, but breaks reftests (bug 654858)
(c) PGO fixes the reftests, but breaks crash reports (bug 654595)

We've reverted (c) and (b) already.

IIUC, it looks like we should revert (a) in order to get back to a known-good state.
Comment 146 Daniel Holbert [:dholbert] 2011-05-04 23:15:40 PDT
Alternately, if reverting the GCC upgrade is a hassle & if we anticipate having bug 654595 resolved within a few days so that (a)+(b)+(c) can all be applied again, then we could potentially disable these a11y tests for a few days until we get to that point. (if possible, it'd be nice to have someone from the a11y team dig in deeper to figure out what's going on)
Comment 147 Mike Hommey [:glandium] 2011-05-04 23:51:32 PDT
I think we have several problems in one, and they might grant separate bugs.

- This is something I wanted to file before all this mess appears, we can't set -O3 unconditionally on all gcc versions. People may be building with older versions and may have problems as a consequence (others than those we have). (and O3 without PGO makes libxul.so 7MB bigger). So all in all, maybe we should only enable O3 if we are building with PGO.

- We really need to figure out what this a11y orange is because if it happens for us with gcc 4.5 and -Os, it means it likely happens for distros. On Debian, we've been using gcc 4.5 for a while without regressions in reftests, xpcshell-tests, crashtests, jstestbrowser, and make check, but I don't run mochitest (because they take too much time). I will try running that particular test.

- Likewise for the reftests breakage with -O3, because while currently PGO apparently optimizes the involved code at -Os, it may decide to optimize at -O3 some day.

Note that I may have a workaround for (c), which is going through try right now.
Comment 148 Mike Hommey [:glandium] 2011-05-04 23:53:09 PDT
I'll add that I didn't run in the reftest regression when I first tried gcc 4.5 + -O3 without PGO on try a few weeks ago.
Comment 149 Mike Hommey [:glandium] 2011-05-05 00:02:47 PDT
(In reply to comment #147)
> - We really need to figure out what this a11y orange is because if it happens
> for us with gcc 4.5 and -Os, it means it likely happens for distros. On
> Debian, we've been using gcc 4.5 for a while without regressions in reftests,
> xpcshell-tests, crashtests, jstestbrowser, and make check, but I don't run
> mochitest (because they take too much time). I will try running that
> particular test.

Note that it was also happening on tm with PGO.
Comment 150 Boris Zbarsky [:bz] (still a bit busy) 2011-05-05 05:01:54 PDT
We definitely need separate bugs on the three items in comment 147.
Comment 151 Mike Hommey [:glandium] 2011-05-05 05:07:28 PDT
(In reply to comment #150)
> We definitely need separate bugs on the three items in comment 147.

- I reopened bug 652459 for the second item.
- We could use bug 654858 for the third item.
- As for the first one, it could either be treated in this bug or a new separate one.
Comment 152 Mike Hommey [:glandium] 2011-05-05 05:08:54 PDT
And I filed bug 654975 as a possible workaround for bug 654595
Comment 153 Mike Hommey [:glandium] 2011-05-05 12:27:21 PDT
http://hg.mozilla.org/mozilla-central/rev/3abe2f8c592e

Note You need to log in before you can comment on or make changes to this bug.