Closed Bug 790624 Opened 12 years ago Closed 11 years ago

Android toolchain for armv6 generates faulty code when compiling with -Os flag

Categories

(Firefox Build System :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(firefox17+ wontfix)

RESOLVED FIXED
Tracking Status
firefox17 + wontfix

People

(Reporter: jmaher, Assigned: bugs)

References

()

Details

(Whiteboard: [armv6])

Attachments

(6 files)

yesterday we turned on reftests for armv6 builds.  On R2, we have 14 failures which are all in text-overflow:
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.48.208:30032/tests/layout/reftests/text-overflow/ellipsis-font-fallback.html | image comparison (==) 
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.48.208:30032/tests/layout/reftests/text-overflow/marker-basic.html | image comparison (==) 
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.48.208:30032/tests/layout/reftests/text-overflow/marker-string.html | image comparison (==) 
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.48.208:30032/tests/layout/reftests/text-overflow/anonymous-block.html | image comparison (==) 
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.48.208:30032/tests/layout/reftests/text-overflow/false-marker-overlap.html | image comparison (==) 
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.48.208:30032/tests/layout/reftests/text-overflow/visibility-hidden.html | image comparison (==) 
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.48.208:30032/tests/layout/reftests/text-overflow/quirks-decorations.html | image comparison (==) 
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.48.208:30032/tests/layout/reftests/text-overflow/quirks-line-height.html | image comparison (==) 
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.48.208:30032/tests/layout/reftests/text-overflow/standards-decorations.html | image comparison (==) 
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.48.208:30032/tests/layout/reftests/text-overflow/standards-line-height.html | image comparison (==) 
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.48.208:30032/tests/layout/reftests/text-overflow/selection.html | image comparison (==) 
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.48.208:30032/tests/layout/reftests/text-overflow/marker-shadow.html | image comparison (==) 
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.48.208:30032/tests/layout/reftests/text-overflow/two-value-syntax.html | image comparison (==) 
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.48.208:30032/tests/layout/reftests/text-overflow/atomic-under-marker.html | image comparison (==) 

A full log can be found here:
https://tbpl.mozilla.org/php/getParsedLog.php?id=15160230&tree=Firefox&full=1

I suspect there is something different in the armv6 builds which is causing us to fail the majority of these tests.
Marking as blocking bug 784681 not for wanting to turn it off, but so releng can keep track of what is hidden vs not (after my chat with joduinn yesterday).
Blocks: 784681
Jet, can you get someone to look into these failures?
Blocks: 791103
+cc: Mats:

I added the actual and expected screen shots to the first reftest failure ( ellipsis-font-fallback.html )

Please have a look and advise on why this occurs only on the ARMv6 builds. Note: the tests are running on ARMv7 Tegra machines.
Attached file reftest log
The easier way to do that is just to extract the lines from the log and stick them in a file that you can then load in https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml
Looking at the test screenshots, I think it's a more fundamental
problem outside of the code that does the text-overflow analysis.
Two problems that stands out:
1.  in layout/reftests/text-overflow/visibility-hidden.html
    the ellipsis on the last block is aligned at the block edge (wrong)
    (several tests have this error)
2.  in layout/reftests/text-overflow/two-value-syntax.html
    the 2nd row from the top have no ellipsis on the left side,
    but there's only four |'s visible so there should be overflow
    reported on that side

I would guess something strange is going on with scroll frames
and/or overflow areas on this platform.  Are there other tests
involving overflow that fails on this platform?  Does the error
occur if you load the tests manually from a http server? (not file: )
Does the error also occur if you turn off font inflation?
(set font.size.inflation.emPerLine and font.size.inflation.minTwips
to zero and restart, iirc)
Component: Layout: Text → Layout: Block and Inline
There's one other reftest failing, yeah: bug 790630 where an svg test has overflow="hidden" but still gets a scrollbar. (Dunno whether anyone has even looked at what the bug 792300 mochitest failure might be.)
these tests don't have an error on armv7, so I would say we are not limited by going remote over a http server.  All mobile tests are done via http, so we don't do file at all.  I will have to try out the prefs to see if that makes a difference.  

Regarding other overflow tests, we only have these and 1 other reftest failing.
I tried setting the prefs:
user_pref("font.size.inflation.emPerLine", 0);
user_pref("font.size.inflation.minTwips", 0);

and I still get the same 14 failures.
https://bugzilla.mozilla.org/show_bug.cgi?id=790630#c5
>Robert Longson 2012-09-13 02:08:31 PDT:
> I looked into this some more and you do have a actual bug on armv6. The
> scrollbars should be suppressed by overflow="hidden" but are not. 

seems to agree with my comment 7 above that there is a bug with scroll
frames on this platform; and we should fix that bug rather than tweak
the tests IMO.  Unfortunately, there's not much I can do without access
to a device or emulator that can reproduce the problem.
Depends on: 790630
Whiteboard: [armv6]
Assignee: nobody → matspal
Emailing Erin to see if we can get you a phone, Mats.
I can reproduce the problem using an armv6 build on my Samsung Nexus phone:
http://ftp.mozilla.org/pub/mozilla.org/mobile/nightly/latest-mozilla-aurora-android-armv6/ 
Now I only need a debug build...
Attached file Testscase #1
I did two local armv6 builds, the first with:
ac_add_options --enable-debug
ac_add_options --disable-optimize

with the intention to debug the problem... but this build works fine!
The second build, with the above options flipped (and no other changes):
ac_add_options --disable-debug
ac_add_options --enable-optimize

The bug occurs!  This is really weird...
With --disable-optimize I get "-fno-omit-frame-pointer -funwind-tables" for a
typical C++ compilation, whereas --enable-optimize results in
"-Os -freorder-blocks -fno-reorder-functions -fomit-frame-pointer"
(all else equal).

--enable-optimize="-freorder-blocks -fno-reorder-functions -fomit-frame-pointer"
works and --enable-optimize="-Os" is broken.

So it appears this is a bug in the tool chain when using -Os on armv6.
I suggest we remove that flag from --enable-optimize for armv6 builds.
Assignee: matspal → nobody
Component: Layout: Block and Inline → Build Config
--enable-optimize="-freorder-blocks -fno-reorder-functions -fomit-frame-pointer" means you don't optimize *at all*. This is going to be big and slow.
(In reply to Mike Hommey [:glandium] from comment #18)
> --enable-optimize="-freorder-blocks -fno-reorder-functions
> -fomit-frame-pointer" means you don't optimize *at all*. This is going to be
> big and slow.

Yea, I would think we just pragma around the code which exposes an optimizer issue, to disable optimization/change optimization in *that one* spot disabling optimization on Armv6 devices, which tend to be lower-end devices already is imho a non-starter.
Does the NDK we have installed on build machines have the GCC 4.6.3 toolchain available? It'd be interesting to try building with that and see if this optimizer bug still exists.
(In reply to Ted Mielczarek [:ted] from comment #20)
> Does the NDK we have installed on build machines have the GCC 4.6.3
> toolchain available? It'd be interesting to try building with that and see
> if this optimizer bug still exists.

According to bug 675572, we have at least the ndk r7b with gcc 4.6 available on builders, although maybe in the meanwhile we got a r8 installed, I'm not sure.

However, that ndk r7b with gcc 4.6 also defaults to using gold, and on armv6, it does some bad (bug 772974)
We probably don't have a r8 installed yet: bug 779568
(In reply to Ted Mielczarek [:ted] from comment #20)
> Does the NDK we have installed on build machines have the GCC 4.6.3
> toolchain available? It'd be interesting to try building with that and see
> if this optimizer bug still exists.

Looks like android-ndk5-r5c doesn't; that's what we're using on m-c atm.
Galaxy Gio, I get the same result as seen in :mats screenshot
As a data point: I built armv6 with ndk r5c and got a SIGILL when trying to load a page. Error was fixed by using r8b.
Back to Jet to help decide next steps here, we're still tracking this for 17 but if there's a case to be made that these tests are not key to our being able to ship a stable ARMv6 build when we release Firefox 17, please let us know.
Assignee: nobody → bugs
(In reply to Lukas Blakk [:lsblakk] from comment #26)
> Back to Jet to help decide next steps here

Compiler bugs are rather scary. I don't like the #pragma workaround, as I think we've likely got more bugs here with the same root cause. I'd like to see how this behaves with the r8b tools. Benoit, can you test on your local build?
Just because it's triggered by a certain set of compiler flags doesn't mean we can't figure out where the problem is -- it can be reduced by disabling optimization for just one directory at a time, or just one file at a time, to find where the problem is.  (And it's also not necessarily a compiler bug; we've had our own bugs (e.g., depending on undefined behavior in C++) that show up only with one optimizer.)
Depends on: 804641
The assembler was compiled with -g so it has ".loc" line numbers
to the corresponding source line.

This compiler bug has bitten us once before in bug 642205 but apparently
that workaround is insufficient on armv6.

I've filed bug 804641 to work around it more thoroughly.
Which doesn't imply *this* bug is fixed -- it should remain open until
the toolchain is finally fixed.
Comment on attachment 674277 [details]
Offending source + assembler that triggers compiler bug

Forgot to say: the source code that triggers the bug is the latter half
of GetScrollbarStylesFromFrame(), from line 4147 forward.
Depends on: 642205
Summary: armv6 builds of fennec fail 14 of the reftest/text-overflow tests → Android toolchain for armv6 generates faulty code when compiling with -Os flag
From email: will not be fixed for FF17 (too risky to change compilers now.) We landed bug 804641 instead.
No longer blocks: 816993
Depends on: 816993
No longer needs to block bug 784681, since Armv6 tests are not hidden, since we worked around it for now.
No longer blocks: 784681
Depends on: 825453
No longer depends on: 816993
Now that we have gcc-4.6.3 as the default toolchain on armv6, I pushed a try build with the hacky compiler workarounds backed out: https://tbpl.mozilla.org/?tree=Try&rev=f42d42f93e1c

We can test that build to see if it has the same problems, and if not, we can back out the workarounds and close this bug.
The above try build was based on a bad inbound cset; I have an updated one at https://tbpl.mozilla.org/?tree=Try&rev=486c8343b80a
The try builds look good. I'm going to close this bug as the toolchain is fixed as of bug 825453, and file a new bug for backing out the workarounds.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.