Allow building Fennec with clang

RESOLVED FIXED in Firefox 58

Status

defect
RESOLVED FIXED
4 years ago
a year ago

People

(Reporter: snorp, Assigned: froydnj)

Tracking

(Depends on 1 bug, Blocks 1 bug)

Firefox Tracking Flags

(firefox57 wontfix, firefox58 fixed)

Details

Attachments

(8 attachments, 7 obsolete attachments)

2.38 KB, patch
snorp
: review+
Details | Diff | Splinter Review
3.00 KB, patch
glandium
: review+
Details | Diff | Splinter Review
3.60 KB, patch
glandium
: review+
Details | Diff | Splinter Review
1.20 KB, patch
froydnj
: review+
jbeich
: feedback+
Details | Diff | Splinter Review
4.19 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
4.66 KB, patch
glandium
: review+
Details | Diff | Splinter Review
2.03 KB, patch
snorp
: review+
Details | Diff | Splinter Review
2.05 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
Recent NDK has support for clang. We should try to get that working and see if there are any benefits (smaller, faster code?).
ASAN builds might be cool.
Assignee

Comment 2

4 years ago
I started poking at this today.  One problem I've had thus far is that the NDK's clang, by default, includes things like /usr/include on its #include search list.  This wouldn't be so bad, except that it appears that clang treats -isystem and -idirafter somewhat differently than our setup expects them to work.  The upshot is that directories aren't searched in the correct order for things like #include_next to work, and various things fall apart.
Assignee

Updated

4 years ago
Depends on: 1194806
Assignee

Comment 3

4 years ago
I've gotten a lot farther with this, but it looks like we're going to have to wait for an updated NDK with a newer version of clang, as clang 3.5/3.6 in NDK r10e choke on some of the JNI template wizardry.
Assignee

Updated

4 years ago
Assignee: nobody → nfroyd
Assignee

Updated

4 years ago
Depends on: 1195477
Assignee

Updated

3 years ago
Depends on: 1238635
Assignee

Updated

3 years ago
Depends on: 1238661
Depends on: 1262052
Assignee

Updated

3 years ago
Depends on: 1277236
Assignee

Updated

3 years ago
Depends on: 1277619
Assignee

Updated

3 years ago
Depends on: 1277624
Assignee

Updated

3 years ago
Depends on: 1277646
Assignee

Updated

3 years ago
Depends on: 1277647
Assignee

Updated

3 years ago
Depends on: 1277649
Assignee

Updated

3 years ago
Depends on: 1277650
Assignee

Comment 4

3 years ago
(In reply to Nathan Froyd [:froydnj] from comment #3)
> I've gotten a lot farther with this, but it looks like we're going to have
> to wait for an updated NDK with a newer version of clang, as clang 3.5/3.6
> in NDK r10e choke on some of the JNI template wizardry.

clang in the r11b NDK still chokes on the JNI template wizardry, which is surprising to me, as I thought 3.7 was going to fix things, and the clang in the NDK claims to be 3.8.x.  But I get the same error with the clang from the r10e NDK, so...
Assignee

Updated

3 years ago
Depends on: 1255210
Assignee

Updated

3 years ago
Depends on: 1277853
Assignee

Updated

3 years ago
Depends on: 1278314
Assignee

Comment 5

3 years ago
Another porting effort, another internal compiler error:

https://github.com/android-ndk/ndk/issues/114
Assignee

Updated

3 years ago
Depends on: 1278861
Assignee

Updated

3 years ago
Depends on: 1278863
Assignee

Updated

3 years ago
Depends on: 1278868
Assignee

Updated

3 years ago
Depends on: 1278871
Assignee

Updated

3 years ago
Depends on: 1282096
Assignee

Updated

3 years ago
Depends on: 1288442
Assignee

Updated

3 years ago
No longer depends on: 1255210
froydnj: hey, I see you doing great work here!  I'm interested in this to avoid wankery like https://code.google.com/p/android/issues/detail?id=214639, which I'm hitting (I think), but not due to lambda.  The built-in Android Studio NDK debugger is pretty nice so it would be cool if it actually worked... can you post a few words about where this project is at?
Flags: needinfo?(nfroyd)
Assignee

Comment 7

3 years ago
Sure!  Bug 1288442 is the biggest blocker on the Firefox side of things that I know of: clang requires a different set of flags on Android, which was very doable in the old configure.in-based way of doing things.  But now we pick the "Android-specific" flags before we even know what compiler we're using on Android:

https://dxr.mozilla.org/mozilla-central/source/build/moz.configure/android-ndk.configure#91

so there's some chicken-and-egg bits to be unraveled there.  There's also bug 1283611 for updating our configury to cope with NDK r13 and r14.  It's been a little while since I compiled with clang because of bug 1288442 and moz.configure, so there's undoubtedly a few things to fix that have crept in over the last several months.

But even once all that work is complete, there's still a massive (10%+ the last time I checked) libxul size regression when compiling with clang:

https://github.com/android-ndk/ndk/issues/133

It's not just us, either; folks have been reporting size increases with Qt as well.  I know GCC is deprecated and clang has user-friendly warning messages and all that, but a 10% size increase would be a bitter pill to swallow if we made the switch.  My impression is that we don't want to do that...

https://github.com/android-ndk/ndk/issues/133
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #7)
> Sure!

Nathan -- thank you for this, incredibly helpful to understand where things are at.  I took a quick look at the linked issues and agree, 10% bloat is completely out of the question :(  However, I'd still like to plug away at this so that we can use it locally, since the NDK debugging story is, I expect, going to be much better in the long term.

Thanks again!
Depends on: 1369992
Depends on: 1370133
froydnj: m_kato: can you give some directions for how you even try using clang with the Android builds?  Like, how you set the toolchain versions/paths/etc?  Hints or a patch appreciated.  Thanks!
Flags: needinfo?(nfroyd)
Flags: needinfo?(m_kato)
Assignee

Comment 10

2 years ago
It looks like bug 1369992 comment 3 has an answer to this, which is apparently a lot simpler than the explanation I was going to post!  In addition to the patches there, you'll also need bug 1370133.
Flags: needinfo?(nfroyd)
Note you don't need HOST_CC to be clang, it can be gcc.
Flags: needinfo?(m_kato)
Assignee

Updated

2 years ago
Blocks: C++14

Comment 12

2 years ago
Have we tried building with -Oz?  Apparently for Chrome, that has improved the size and speed of clang builds: https://github.com/android-ndk/ndk/issues/133#issuecomment-308549264
Assignee

Comment 13

2 years ago
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #12)
> Have we tried building with -Oz?  Apparently for Chrome, that has improved
> the size and speed of clang builds:
> https://github.com/android-ndk/ndk/issues/133#issuecomment-308549264

I haven't tried -Oz, but seeing that comment made me realize that maybe we don't have to take a 10% size regression!
opt build by changeset: 8b3a146037a2

clang 5.0 (NDK 15b) with -Oz (but turn off elfhack since elfhack throws assertion)
32754501 Jul  6 09:52 fennec-56.0a1.en-US.android-arm.apk

gcc 4.9 (NDK 15b)
32918280 Jul  6 10:19 fennec-56.0a1.en-US.android-arm.apk


clang w/ -Oz is smaller than gcc w/ -Os.
Assignee

Comment 15

2 years ago
Thank you for those numbers!  I guess I should hurry up and fix this! :)

We will want to investigate the elfhack errors though.  And I was running into several problems making everything compile with clang that I was slowly fixing...What does your mozconfig and local patch stack look like?
Flags: needinfo?(m_kato)
Depends on: 1378600
Depends on: 1378605
Posted patch clang.patch (obsolete) — Splinter Review
attached patch includes bug 1350822 and this is for test.

Also, mozconfig is the following.

mk_add_options AUTOCLOBBER=1

ac_add_options --enable-application=mobile/android

ac_add_options --target=arm-linux-androideabi
ac_add_options --with-android-ndk="$HOME/bin/android-ndk-r15b"
ac_add_options --with-android-sdk="$HOME/Android/Sdk"
ac_add_options --with-android-version=15

ac_add_options --disable-debug
ac_add_options --enable-optimize

CC="$HOME/bin/android-ndk-r15b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang"
CXX="$HOME/bin/android-ndk-r15b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++"

# ac_add_options --disable-elf-hack
Flags: needinfo?(m_kato)
Assignee

Comment 18

2 years ago
Comment on attachment 8883792 [details] [diff] [review]
clang.patch

Review of attachment 8883792 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you for sharing this!  Comments below.

::: build/autoconf/android.m4
@@ +37,5 @@
> +    fi
> +    CPPFLAGS="$directory_include_args $CPPFLAGS"
> +    CFLAGS="-fno-short-enums $CFLAGS"
> +    CXXFLAGS="-fno-short-enums $CXXFLAGS"
> +    ASFLAGS="$directory_include_args -DANDROID $ASFLAGS"

This is probably worth its own patch, though given the comments below in android-ndk.configure, I don't think we can support building with both GCC and clang unless we go the standalone toolchain route.  And if we're not going to support building with both GCC and clang, then it doesn't make much sense to test CC_TYPE here, since we'd know that we only have one compiler.

::: build/moz.configure/android-ndk.configure
@@ +164,5 @@
> +        if isdir(toolchain):
> +            return toolchain
> +
> +        die('You have to specify --with-android-toolchain='
> +            '/path/to/ndk/toolchain.')

The changes to this function (and any other places that the result of --with-android-gnu-compiler-version are replaced with a fixed 4.9) are probably worth splitting into their own bug/patch.  I think we should additionally get rid of --with-android-toolchain and hardcode GCC 4.9 as the version--as you have done here--since Google has indicated that further GCC development isn't happening, and nothing in-tree uses --with-android-toolchain in any event.  We should provide commentary on why we're hardcoding 4.9 everywhere that we do.

Would you mind doing that?  I am happy to review the patch.

@@ +196,5 @@
> +    # the correct place.
> +    return ['-isystem',
> +            os.path.join(platform_dir, 'usr', 'include'),
> +            '-gcc-toolchain',
> +            toolchain]

This is the sticky part.  We had talked in bug 1336471 about switching to requiring toolchains built with `make-standalone-toolchain.sh` from the NDK.  If we did that, all of this extra_toolchain_flags stuff would simply go away; the compilers from the NDK would have extra options added automatically.  (A lot of the flag-setting from android.m4 would also go away.)

But we *could* just switch to requiring clang and hardcoding the clang-required options here and elsewhere.  Maybe that would be better for the purposes of moving to clang, and then we could evaluate switching to `make-standalone-toolchain.sh` on its own merits.

::: build/unix/elfhack/inject/Makefile.in
@@ +10,5 @@
>  
>  GARBAGE += $(CSRCS)
>  
> +CFLAGS := -O2 -fno-stack-protector $(filter-out -O% -f% -W% -g,$(CFLAGS))
> +# CFLAGS := -O2 -fno-stack-protector $(filter -m% -I%,$(CFLAGS))

As already mentioned, I don't think we can afford to disable elfhack just so we can ship with clang.

::: media/webrtc/trunk/webrtc/base/task_queue_libevent.cc
@@ +238,1 @@
>  		  static_cast<suseconds_t>((milliseconds % 1000) * 1000)};

Can you please split this out as a new bug and patch, and r?jesup on it?

::: old-configure.in
@@ +870,4 @@
>      if test -z "$CLANG_CC"; then
> +       MOZ_OPTIMIZE_FLAGS="-freorder-blocks -fno-reorder-functions -Os"
> +    else
> +       MOZ_OPTIMIZE_FLAGS="-Oz"

Can you please split this part out into a new bug/patch, and tag me for review?  This part is definitely worth it on its own--it cuts about 3.3MB off libxul with NDK r11b locally, and it's obviously better than gcc with your measurements.
(In reply to Nathan Froyd [:froydnj] from comment #15)
> Thank you for those numbers!  I guess I should hurry up and fix this! :)
> 
> We will want to investigate the elfhack errors though.  And I was running
> into several problems making everything compile with clang that I was slowly
> fixing...What does your mozconfig and local patch stack look like?

It may be worth revisiting elfhack effectiveness again. It might not matter much on modern hardware.
I pushed a build to autophone with elfhack disabled. We'll see if it has a significant impact on startup perf.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7e193f54c7dda0f87ac2a091976f90065bd2a0d7
Assignee

Updated

2 years ago
Depends on: 1378830
The reason elfhack doesn't seem to be a win is because... drumroll... it's been disabled in practice for a while, apparently, and this went unnoticed. I guess this was the consequence of a toolchain change. I'm going to bisect this (and figure out how long that has been the case).
Flags: needinfo?(mh+mozilla)
(In reply to Nathan Froyd [:froydnj] from comment #18)
 
> ::: media/webrtc/trunk/webrtc/base/task_queue_libevent.cc
> @@ +238,1 @@
> >  		  static_cast<suseconds_t>((milliseconds % 1000) * 1000)};
> 
> Can you please split this out as a new bug and patch, and r?jesup on it?

This is bug 1378412 that is general issue by clang 4.0+.

> ::: old-configure.in
> @@ +870,4 @@
> >      if test -z "$CLANG_CC"; then
> > +       MOZ_OPTIMIZE_FLAGS="-freorder-blocks -fno-reorder-functions -Os"
> > +    else
> > +       MOZ_OPTIMIZE_FLAGS="-Oz"
> 
> Can you please split this part out into a new bug/patch, and tag me for
> review?  This part is definitely worth it on its own--it cuts about 3.3MB
> off libxul with NDK r11b locally, and it's obviously better than gcc with
> your measurements.

OK.  I will file it.
Depends on: 1379032
Depends on: 1379835
Assignee

Updated

2 years ago
Depends on: 1390524
Assignee

Updated

2 years ago
Depends on: 1390640
Assignee

Comment 24

2 years ago
New wrinkle: the LLVM in the NDK depends on a newer version of glibc than GCC does, which is also a newer version of glibc than our Docker images provide.

nalexander has been looking at modifying the docker images we use for Android builds as part of bug 1370119; I am trying to poke at his patches while he is minimally involved during his parental leave.
Assignee

Comment 25

2 years ago
(In reply to Nathan Froyd [:froydnj] from comment #24)
> nalexander has been looking at modifying the docker images we use for
> Android builds as part of bug 1370119; I am trying to poke at his patches
> while he is minimally involved during his parental leave.

Update: this actually works, using Debian as the base OS, but now we have the problem that our sccache binary is linked on Fedora/CentOS-like machines, which set up sonames for OpenSSL libraries differently than Debian does.  So we can't run sccache; I have tried making the appropriate symbolic links to use the Fedora-ish names, which just leads into deeper, cryptic error messages that I'm not willing to unravel currently.

Options to move forward:

1) Disable sccache.  This is only mentioned for completeness; I don't think we actually want to do this.

The advantage is that this extremely simple compared to the following two approaches and is virtually guaranteed to work:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5d61ccc146521f1e3fb614fb3fb27c5e011dad72

(This run is not compiling with clang, but it *is* building on the new images which would support clang, AFAIK.)

2) Compile sccache and statically link in dependencies, so it will run virtually anywhere.  sccache has a script to do this:

https://github.com/mozilla/sccache/blob/master/scripts/build-release.sh#L24

and the official releases of sccache used this script successfully.  It's not what we use to build our sccache in automation: we don't have docker there, for instance.  We could either look into making Docker available or we could hand-roll a similar approach.  This similar approach would require compiling musl, its suite of tools, and any C/C++ libraries that sccache requires against musl.

If we have future Rust tools that the build depends on, this musl work will be useful for them as well.

Alternatively, we may be able to coax sccache into linking openssl statically.

3) Compile our own clang and use that in lieu of the NDK clang on the current images that we have.  I haven't looked at how many patches Google applies, but they make a point of mentioning the SVN revision numbers for the clang they distribute, so it's pretty plausible that they do very little patching.  Their clang and gcc binaries are also built *outside* of the normal NDK release process, it seems:

https://github.com/android-ndk/ndk/blob/master/docs/Toolchains.md

which suggests that NDK-specific patches being applied are unlikely.  This option is attractive because it makes static analyses very easy to stand up (we know that our clang works with our static analysis code because we already use it for such).  It's unattractive because it means that automation is using something different than the majority of developers are using, and so we have to use slightly different configuration mechanisms.  We would also be using a very different version of clang (depending on how we built it) than the NDK, which may cause problems, and mismatches may affect us in other ways (e.g. integration with NDK tools?).

I think we eventually need to wind up doing #2, because changing the base system we need to use must happen to upgrade our Android tools generally.  #3 may become necessary once we want to do static analysis on Android, too; it's not clear if the NDK clang will let us load custom plugins.  The LLVM headers *are* provided, though, so that's a positive sign.
Assignee

Comment 26

2 years ago
(In reply to Nathan Froyd [:froydnj] from comment #25)
> Alternatively, we may be able to coax sccache into linking openssl
> statically.

So!  The openssl crate:

https://github.com/sfackler/rust-openssl

does provide an option to do this: the environment variable OPENSSL_STATIC can be set to indicate your preference that everything be linked statically.  Problem solved, right?

No.  https://github.com/sfackler/rust-openssl/issues/604 indicates that if you have pkg-config on your system, then pkg-config will be executed to determine things like libraries to link and so forth, and your OPENSSL_STATIC preference will be completely ignored.

But, if you set the additional variables OPENSSL_LIB_DIR and OPENSSL_INCLUDE_DIR, pkg-config will not be consulted, and your settings will take precedence.  These are mainly useful in the cross-compilation case, but they are also useful here.  Problem solved, right?

No.  Apparently with:

env CC=clang OPENSSL_STATIC=1 OPENSSL_LIB_DIR=/usr/lib64 OPENSSL_INCLUDE_DIR=/usr/include cargo build --release --verbose

the OPENSSL_STATIC=1 and the associated settings it triggers makes rustc think that you want *all* libraries statically linked.  This is not an unreasonable position, but it's really not the behavior we want.  And linking all libraries statically doesn't work, because our system images don't have static libraries installed for at least libzlib and libkrb5.  (We don't really care about libkrb5, I think, because sccache doesn't care about Keberos, but we very much care about libzlib.)  So just install the static libraries in the system image!  Problem solved, right?

No.  Static libraries are provided for zlib, but there are *no* static libraries provided for libkrb5--at least none that `yum` is willing to tell me about.  Searching indicates that this is some sort of confusion between upstream (where you can't build static and shared libraries from a single build tree?) and packagers (who don't want to provide an extra package with static libraries that would have to be named differently).  So, we could...compile our own libkrb5, just so we can have static binaries?

The yaks are strong with this stuff. :(
Assignee

Comment 27

2 years ago
(In reply to Nathan Froyd [:froydnj] from comment #26)
> So, we could...compile our own libkrb5, just so we can have static binaries?

Or we could just compile our own openssl, which apparently doesn't come with a libkrb5 dependency by default.  This is the tack that bug 1395722 takes.
Depends on: 1395722
Assignee

Updated

2 years ago
Depends on: 1396098
Assignee

Updated

2 years ago
Depends on: 1396850
Assignee

Updated

2 years ago
Depends on: 1396892
Assignee

Updated

2 years ago
Depends on: 1399315
Assignee

Comment 28

2 years ago
The first thing to do to make Fennec compile with clang is to use NDK
tarballs that actually include LLVM.
Attachment #8908764 - Flags: review?(snorp)
Assignee

Updated

2 years ago
Attachment #8883792 - Attachment is obsolete: true
Assignee

Comment 29

2 years ago
Our normal method of locating the compilers in the NDK is to set
--with-toolchain-prefix when compiling for Android.  However, the clang
binaries are in a different location than would otherwise be implied by
--with-toolchain-prefix.  Additionally, we still need to know about
--with-toolchain-prefix because the NDK clang needs to be pointed there
so clang can find important binaries like the linker and because various
other bits of the build system depend on the --with-toolchain-prefix
value as well.

So we need a separate mechanism for communicating the whereabouts of the
NDK clang.  We can't set CC and CXX because doing so would conflict with
the CC and CXX values exported to the js/src/ subconfigure.  But Android
already has an extra_toolchain_flags function that we use for informing
moz.configure about additional flags needed solely for Android, and we
can use a similar trick to communicate the existence of the NDK's clang
to the main C/C++ compiler selection process.

(This patch should be folded in with the next one before commit, since
together they make use default to clang.)
Attachment #8908765 - Flags: review?(mh+mozilla)
Assignee

Comment 30

2 years ago
We pass -idirafter to GCC on Android, but the NDK clang includes host
paths (!) in its default search paths when cross-compiling, so inserting
include directories after the default search paths does work.  Using
-isystem does, however, so we should use that instead.

The NDK clang also needs to be informed about the existence of a GCC
toolchain, so important programs like the linker can be located.
Attachment #8908767 - Flags: review?(mh+mozilla)
Assignee

Comment 31

2 years ago
NDK r15+ clang changed the code generation strategy for the __atomic_*
intrinics such that using them with 64-bit types now requires linking
with libatomic.  Our current configure tests for libatomic doesn't catch
this, because the std::atomic implementation is such that it doesn't
require an external library, even for 64-bit types, whereas the
__atomic_* intrinsics do.  The safest thing to do is to force this
configure check to always return true when we are compiling for
x86/Android with clang.

I realize that it's probably more correct to take out the Android condition
here, but I'd like to leave it in; if it hasn't bitten anybody by this point,
it probably won't bite anybody in the future?
Attachment #8908768 - Flags: review?(mh+mozilla)
Assignee

Comment 32

2 years ago
clang does not have the specific problem that led to this hack in the
first place, so we can remove the hack.  (The hack also causes issues
with clang; it complains that you can't pass `volatile T*` into the
intrinsics that we're using for atomic operations.)
Attachment #8908769 - Flags: review?(lhansen)
Assignee

Comment 33

2 years ago
I haven't posted the change that would switch us to using the r15 NDK, because of a severe compile-time regression for x86:

https://github.com/android-ndk/ndk/issues/522

I don't think the same regression appears on ARM, but if we're not overly concerned with how long x86 compiles take, we could go ahead and switch before the r16 NDK comes out--Google has indicated that they're going to try and fix the regression for r16.

Alternatively, we could try, say, r14 to determine whether the clang there doesn't horribly regress size and use that in the interim.
Attachment #8908764 - Flags: review?(snorp) → review+
(In reply to Nathan Froyd [:froydnj] from comment #33)
> 
> Alternatively, we could try, say, r14 to determine whether the clang there
> doesn't horribly regress size and use that in the interim.

Do we have a big size regression with clang in r11c? I thought you figured that out and fixed it with different optimization flags?
Flags: needinfo?(nfroyd)
Assignee

Comment 35

2 years ago
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #34)
> (In reply to Nathan Froyd [:froydnj] from comment #33)
> > 
> > Alternatively, we could try, say, r14 to determine whether the clang there
> > doesn't horribly regress size and use that in the interim.
> 
> Do we have a big size regression with clang in r11c? I thought you figured
> that out and fixed it with different optimization flags?

IIRC, we still had a (smaller) big size regression with the different optimization flags; switching to NDK r15 finally improved clang with the new flags to produce smaller code than gcc.  I did not investigate any intermediate points along the way, so r12, r13, or r14 might also address the regression.
Flags: needinfo?(nfroyd)
Assignee

Comment 36

2 years ago
One possibility is to use r15b for ARM (which does not have the terrible compile-time regression) and something older for x86.  We would take a size hit on x86 depending on the version that we used, though.
Comment on attachment 8908769 [details] [diff] [review]
part 4 - remove arm-specific hacks in typed array implementation

Review of attachment 8908769 [details] [diff] [review]:
-----------------------------------------------------------------

I guess...

Are we only using clang to compile on all ARM platforms now?  Or has the gcc bug (reported against 4.7) been fixed to our satisfaction in every gcc version that we do support (> 4.8)?  Surely not all ARM OSes we care about support the unaligned accesses that gave rise to this hack in the first place.
Attachment #8908769 - Flags: review?(lhansen) → review+
Assignee

Comment 38

2 years ago
(In reply to Lars T Hansen [:lth] from comment #37)
> Are we only using clang to compile on all ARM platforms now?  Or has the gcc
> bug (reported against 4.7) been fixed to our satisfaction in every gcc
> version that we do support (> 4.8)?  Surely not all ARM OSes we care about
> support the unaligned accesses that gave rise to this hack in the first
> place.

We will be compiling with clang on Android/ARM eventually; since that's the only tier1 ARM platform we have, that's kind of the only platform we care about.  I don't know what happens on Linux/ARM, and I don't have access to a machine to test.

Did we actually report the GCC bug?  We require at least 4.9 everywhere.
Flags: needinfo?(lhansen)
I know that at least Ubuntu makes official ARM builds of Firefox (see eg bug 1391802 for a current problem).  I understand this is not tier 1 but even so it is nice of us to worry about them.

Did we report it?  I don't know, the bug trail is quiet on that point, though I actually doubt it or it'd be mentioned.  (It predates my interaction with the code though I'm possibly the one that moved the code last.)  Where's Waldo when you need him?

(I do have a ARM Linux board but the gcc is too old on that and I'd have to figure out cross-compilation probably; not this month.)

We could be extremely conservative for now and do __arm__ && __GNUC__ && !__clang__, I guess.  Or whatever's the best way to do that.  Your call, the r+ stands.
Flags: needinfo?(lhansen)
Assignee

Comment 40

2 years ago
(In reply to Lars T Hansen [:lth] from comment #39)
> We could be extremely conservative for now and do __arm__ && __GNUC__ &&
> !__clang__, I guess.  Or whatever's the best way to do that.  Your call, the
> r+ stands.

This is a good point, I will do this.
Comment on attachment 8908769 [details] [diff] [review]
part 4 - remove arm-specific hacks in typed array implementation

Review of attachment 8908769 [details] [diff] [review]:
-----------------------------------------------------------------

Mozilla building android fennec with clang doesn't mean this code is only built with clang for arm.
Attachment #8908769 - Flags: review-
Comment on attachment 8908765 [details] [diff] [review]
part 2a - expose the NDK clang binaries to toolchain configury

Review of attachment 8908765 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/moz.configure/android-ndk.configure
@@ +208,5 @@
> +        llvm_path = toolchain_format % (ndk, host.kernel.lower(), 'x86')
> +
> +    return namespace(
> +        cc='%s/clang' % llvm_path,
> +        cxx='%s/clang++' % llvm_path,

You're not using cxx are you? Not in this patch at least. If it's not used in a subsequent patch, don't use a namespace.
Attachment #8908765 - Flags: review?(mh+mozilla)
Comment on attachment 8908767 [details] [diff] [review]
part 2b - pass the correct flags to clang on android

Review of attachment 8908767 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/autoconf/android.m4
@@ +20,5 @@
> +    dnl inserts the android platform's /usr/include after the host's /usr/include,
> +    dnl which obviously doesn't work that well.  Using -isystem appears to put
> +    dnl things in the correct place.
> +    if test "$CC_TYPE" != "clang"; then
> +        directory_include_args"-idirafter $android_platform/usr/include"

I think we should just use -isystem instead of -idirafter in all cases.

@@ +29,5 @@
> +    # it to use the NDK tools.
> +    if test "$CC_TYPE" = "clang"; then
> +        extra_opts="-gcc-toolchain $(dirname $(dirname $TOOLCHAIN_PREFIX))"
> +        CPPFLAGS="$extra_opts $CPPFLAGS"
> +        CFLAGS="$extra_opts $CFLAGS"

CPPFLAGS is always used on command lines that use CFLAGS and CXXFLAGS, so if you set CPPFLAGS, you don't need to set CFLAGS and CXXFLAGS. (FYI, CFLAGS and CXXFLAGS are always used on command lines that use LDFLAGS, but since you'll be removing from CFLAGS and CXXFLAGS, that doesn't matter...)

@@ +35,5 @@
> +        ASFLAGS="$extra_opts $ASFLAGS"
> +        LDFLAGS="$extra_opts $LDFLAGS"
> +    fi
> +    CPPFLAGS="$directory_include_args $CPPFLAGS"
> +    CFLAGS="$directory_include_args -fno-short-enums -fno-exceptions $CFLAGS"

same comment as above

::: build/moz.configure/android-ndk.configure
@@ +179,5 @@
> +@depends(android_platform, android_toolchain)
> +def extra_toolchain_flags(platform_dir, toolchain_dir):
> +    if not platform_dir:
> +        return []
> +    # -idirafter does not work correctly with llvm; the default #include search

I actually think we shouldn't care about this comment at all. -idirafter was the wrong thing to use in the first place. We should just use -isystem.

::: build/unix/elfhack/inject/Makefile.in
@@ +9,5 @@
>  	cp $< $@
>  
>  GARBAGE += $(CSRCS)
>  
> +CFLAGS := -O2 -fno-stack-protector $(filter -m% -I% -isystem%,$(subst -isystem ,-isystem,$(CFLAGS)))

Note that since you're currently not replacing -idirafter when building with gcc, this effectively would break gcc builds.
Attachment #8908767 - Flags: review?(mh+mozilla)
Comment on attachment 8908768 [details] [diff] [review]
part 3 - fix jsapi-tests link errors with __atomic_* intrinsics on x86/Android with clang

Review of attachment 8908768 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/autoconf/toolchain.m4
@@ +105,5 @@
> +        dnl alignment of T.  The result is that the use of std::atomic
> +        dnl variables needs no special flags on x86/Android, but the use of
> +        dnl __atomic_* or __sync_* intrinsics on raw 64-bit pointers does
> +        dnl require linking to libatomic.  Since the JS engine, as of this
> +        dnl writing, does use the __atomic_* family of intrinsics, we require

There's more than the js engine. At least cairo, ffvpx, libstagefright, libvpx, webrtc, aom, rust/backtrace-sys and protobuf are using __atomic intrinsics.

Note, this paragraph was somehow confusing. Maybe just say that on x86 android std::atomic doesn't actually require libatomic, thus failing the test, but intrinsics do, so we force it.

::: js/src/jsapi-tests/moz.build
@@ +152,5 @@
>      'static:js',
>  ]
>  
> +if CONFIG['MOZ_NEEDS_LIBATOMIC']:
> +    OS_LIBS += ['atomic']

This shouldn't be needed. It should be pulled through the use of mfbt.
Attachment #8908768 - Flags: review?(mh+mozilla)
Assignee

Comment 45

2 years ago
(In reply to Mike Hommey [:glandium] from comment #43)
> ::: build/autoconf/android.m4
> @@ +20,5 @@
> > +    dnl inserts the android platform's /usr/include after the host's /usr/include,
> > +    dnl which obviously doesn't work that well.  Using -isystem appears to put
> > +    dnl things in the correct place.
> > +    if test "$CC_TYPE" != "clang"; then
> > +        directory_include_args"-idirafter $android_platform/usr/include"
> 
> I think we should just use -isystem instead of -idirafter in all cases.

I could have sworn that this didn't work, but try says otherwise:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f26efbad9d72813c7d65b99b54876ec34928ddf

> @@ +29,5 @@
> > +    # it to use the NDK tools.
> > +    if test "$CC_TYPE" = "clang"; then
> > +        extra_opts="-gcc-toolchain $(dirname $(dirname $TOOLCHAIN_PREFIX))"
> > +        CPPFLAGS="$extra_opts $CPPFLAGS"
> > +        CFLAGS="$extra_opts $CFLAGS"
> 
> CPPFLAGS is always used on command lines that use CFLAGS and CXXFLAGS, so if
> you set CPPFLAGS, you don't need to set CFLAGS and CXXFLAGS. (FYI, CFLAGS
> and CXXFLAGS are always used on command lines that use LDFLAGS, but since
> you'll be removing from CFLAGS and CXXFLAGS, that doesn't matter...)

I'm 90% positive I tested patches without these options in C{,XX}FLAGS and they didn't work, but locally they appear to.  I'll try making that change.

> ::: build/unix/elfhack/inject/Makefile.in
> @@ +9,5 @@
> >  	cp $< $@
> >  
> >  GARBAGE += $(CSRCS)
> >  
> > +CFLAGS := -O2 -fno-stack-protector $(filter -m% -I% -isystem%,$(subst -isystem ,-isystem,$(CFLAGS)))
> 
> Note that since you're currently not replacing -idirafter when building with
> gcc, this effectively would break gcc builds.

This will be addressed by the above use of -isystem.

(In reply to Mike Hommey [:glandium] from comment #44)
> ::: build/autoconf/toolchain.m4
> @@ +105,5 @@
> > +        dnl alignment of T.  The result is that the use of std::atomic
> > +        dnl variables needs no special flags on x86/Android, but the use of
> > +        dnl __atomic_* or __sync_* intrinsics on raw 64-bit pointers does
> > +        dnl require linking to libatomic.  Since the JS engine, as of this
> > +        dnl writing, does use the __atomic_* family of intrinsics, we require
> 
> There's more than the js engine. At least cairo, ffvpx, libstagefright,
> libvpx, webrtc, aom, rust/backtrace-sys and protobuf are using __atomic
> intrinsics.

OK.

> Note, this paragraph was somehow confusing. Maybe just say that on x86
> android std::atomic doesn't actually require libatomic, thus failing the
> test, but intrinsics do, so we force it.

I thought that's what the paragraph was saying.  I will try to tweak the writing.

> ::: js/src/jsapi-tests/moz.build
> @@ +152,5 @@
> >      'static:js',
> >  ]
> >  
> > +if CONFIG['MOZ_NEEDS_LIBATOMIC']:
> > +    OS_LIBS += ['atomic']
> 
> This shouldn't be needed. It should be pulled through the use of mfbt.

jsapi-tests doesn't link in MFBT, apparently, because this is needed:

Executing: /home/froydnj/.mozbuild/android-ndk-r15c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ -std=gnu++11 --target=i386-linux-android -o jsapi-tests -Qunused-arguments -isystem /home/froydnj/.mozbuild/android-ndk-r15c/platforms/android-9/arch-x86/usr/include -gcc-toolchain /home/froydnj/.mozbuild/android-ndk-r15c/toolchains/x86-4.9/prebuilt/linux-x86_64 -Qunused-arguments -Wall -Wc++11-compat -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wwrite-strings -Wno-invalid-offsetof -Wclass-varargs -Wloop-analysis -Wc++11-compat-pedantic -Wc++14-compat -Wc++14-compat-pedantic -Wc++1z-compat -Wcomma -Wimplicit-fallthrough -Wstring-conversion -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wformat -Wno-gnu-zero-variadic-macro-arguments -Wformat-security -Wno-unknown-warning-option -Wno-return-type-c-linkage -fno-short-enums -fno-exceptions -I/home/froydnj/.mozbuild/android-ndk-r15c/sources/cxx-stl/llvm-libc++/include -I/home/froydnj/.mozbuild/android-ndk-r15c/sources/android/support/include -I/home/froydnj/.mozbuild/android-ndk-r15c/sources/cxx-stl/llvm-libc++abi/include -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pipe -g -O3 -fno-omit-frame-pointer -funwind-tables /opt/build/froydnj/build-android-x86/js/src/jsapi-tests/tmpa0xC4E.list -L/home/froydnj/.mozbuild/android-ndk-r15c/platforms/android-9/arch-x86/usr/lib -Wl,-rpath-link=/home/froydnj/.mozbuild/android-ndk-r15c/platforms/android-9/arch-x86/usr/lib --sysroot=/home/froydnj/.mozbuild/android-ndk-r15c/platforms/android-9/arch-x86 -Wl,--allow-shlib-undefined -gcc-toolchain /home/froydnj/.mozbuild/android-ndk-r15c/toolchains/x86-4.9/prebuilt/linux-x86_64 -Wl,-z,noexecstack -Wl,-z,text -Wl,-z,relro -Wl,--build-id -Wl,--icf=safe -Wl,-rpath-link,/opt/build/froydnj/build-android-x86/dist/bin -Wl,-rpath-link,/opt/build/froydnj/build-android-x86/dist/lib ../build/libjs_static.a ../../../mozglue/build/libmozglue.so ../../../security/libnss3.so -lm -lm -ldl -L/home/froydnj/.mozbuild/android-ndk-r15c/sources/cxx-stl/llvm-libc++/libs/x86 -lc++_static -lc++abi -landroid_support -llog
/opt/build/froydnj/build-android-x86/js/src/jsapi-tests/tmpa0xC4E.list:
    INPUT("hidePointer.o")
    INPUT("testAssemblerBuffer.o")
    INPUT("Unified_cpp_js_src_jsapi-tests0.o")
    INPUT("Unified_cpp_js_src_jsapi-tests1.o")
    INPUT("Unified_cpp_js_src_jsapi-tests2.o")
    INPUT("Unified_cpp_js_src_jsapi-tests3.o")
    INPUT("Unified_cpp_js_src_jsapi-tests4.o")
    INPUT("Unified_cpp_js_src_jsapi-tests5.o")
    INPUT("Unified_cpp_js_src_jsapi-tests6.o")
    INPUT("../../../memory/fallible/fallible.o")
    INPUT("../../../config/external/icu/data/icudata.o")

/home/froydnj/src/gecko-dev.git/js/src/jit/x86-shared/AtomicOperations-x86-shared-gcc.h:92: error: undefined reference to '__atomic_load_8'

...and many more errors like it.
Assignee

Comment 46

2 years ago
This command-line flag is a little more evocative and works correctly
with both GCC and clang.
Attachment #8910420 - Flags: review?(mh+mozilla)
Assignee

Comment 47

2 years ago
Our normal method of locating the compilers in the NDK is to set
--with-toolchain-prefix when compiling for Android.  However, the clang
binaries are in a different location than would otherwise be implied by
--with-toolchain-prefix.  Additionally, we still need to know about
--with-toolchain-prefix because the NDK clang needs to be pointed there
so clang can find important binaries like the linker and because various
other bits of the build system depend on the --with-toolchain-prefix
value as well.

So we need a separate mechanism for communicating the whereabouts of the
NDK clang.  We can't set CC and CXX because doing so would conflict with
the CC and CXX values exported to the js/src/ subconfigure.  But Android
already has an extra_toolchain_flags function that we use for informing
moz.configure about additional flags needed solely for Android, and we
can use a similar trick to communicate the existence of the NDK's clang
to the main C/C++ compiler selection process.
Attachment #8910421 - Flags: review?(mh+mozilla)
Assignee

Updated

2 years ago
Attachment #8908765 - Attachment is obsolete: true
Assignee

Comment 48

2 years ago
The NDK clang needs to be informed about the existence of a GCC
toolchain, so important programs like the linker can be located.
Attachment #8910422 - Flags: review?(mh+mozilla)
Assignee

Updated

2 years ago
Attachment #8908767 - Attachment is obsolete: true
Assignee

Comment 49

2 years ago
Patch updated to continue using the hack for GCC, as discussed.  Carrying over r+.
Attachment #8910423 - Flags: review+
Assignee

Updated

2 years ago
Attachment #8908768 - Attachment is obsolete: true
Assignee

Comment 50

2 years ago
Whoops, actually meant this one for carrying over r+.
Attachment #8910424 - Flags: review+
Assignee

Updated

2 years ago
Attachment #8908769 - Attachment is obsolete: true
Assignee

Comment 51

2 years ago
Comment on attachment 8910423 [details] [diff] [review]
part 4 - fix jsapi-tests link errors with __atomic_* intrinsics on x86/Android with clang

Revised the comment, but the jsapi-tests change is still needed.
Attachment #8910423 - Flags: review+ → review?(mh+mozilla)
Assignee

Comment 52

2 years ago
Also, NDK r14b has the same issues as NDK r15c on x86 with nsContentUtils.cpp, so downgrading our NDK version for x86 only does not seem like a viable option.  We'll have to wait for NDK r16, or just accept that compiling for x86 Android will take a looong time (though sccache might help with that problem).
Attachment #8910420 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8910421 [details] [diff] [review]
part 3a - expose the NDK clang binaries to toolchain configury

Review of attachment 8910421 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/moz.configure/android-ndk.configure
@@ +196,5 @@
>               reason='--with-android-ndk')
> +
> +@depends(host, ndk)
> +@imports(_from='os.path', _import='isdir')
> +def android_c_compiler(host, ndk):

maybe name this android_clang_compiler?
Attachment #8910421 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8910422 [details] [diff] [review]
part 3b - inform clang about the NDK's gcc toolchain

Review of attachment 8910422 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/moz.configure/android-ndk.configure
@@ +171,3 @@
>          else:
>              if gnu_compiler_version:
>                  die('Your --with-android-gnu-compiler-version may be wrong')

note bug 1350822 is removing this.

@@ +181,5 @@
> +    if not platform_dir:
> +        return []
> +    return ['-isystem',
> +            os.path.join(platform_dir, 'usr', 'include'),
> +            '-gcc-toolchain',

Note that at this point, building with gcc is still supported, so you should not add this unconditionally, only when using clang.
Attachment #8910422 - Flags: review?(mh+mozilla)
Comment on attachment 8910423 [details] [diff] [review]
part 4 - fix jsapi-tests link errors with __atomic_* intrinsics on x86/Android with clang

Review of attachment 8910423 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/autoconf/toolchain.m4
@@ +94,5 @@
>  AC_LANG_CPLUSPLUS
>  if test "$GNU_CXX"; then
>      AC_CACHE_CHECK([whether 64-bits std::atomic requires -latomic],
>          ac_cv_needs_atomic,
> +        dnl x86 with clang is a little peculiar.  std::atomic does not require

thanks, this is clearer.

::: js/src/jsapi-tests/moz.build
@@ +152,5 @@
>      'static:js',
>  ]
>  
> +if CONFIG['MOZ_NEEDS_LIBATOMIC']:
> +    OS_LIBS += ['atomic']

It /feels/ like this should be in a template. Like Binary().
Attachment #8910423 - Flags: review?(mh+mozilla) → review+
Assignee

Comment 56

2 years ago
(In reply to Mike Hommey [:glandium] from comment #54)
> @@ +181,5 @@
> > +    if not platform_dir:
> > +        return []
> > +    return ['-isystem',
> > +            os.path.join(platform_dir, 'usr', 'include'),
> > +            '-gcc-toolchain',
> 
> Note that at this point, building with gcc is still supported, so you should
> not add this unconditionally, only when using clang.

I don't understand this--you're saying we should support building on Android with GCC?  I think we should consider that an unsupported configuration after this set of patches.  Why should we continue to support GCC here?

Besides, we don't have access to the compiler here: determining the compiler requires extra_toolchain_flags.  We could try to rearrange things so that wasn't true, but...
Flags: needinfo?(mh+mozilla)
I'm not opposed to unsupporting gcc, but you're not doing it in this patch queue. So either you do it, or you don't break it. Pick your poison :)
Flags: needinfo?(mh+mozilla)
https://hg.mozilla.org/integration/mozilla-inbound/rev/0d790b8d5027a83c7ff028fa2f153fc6ac86fc5a
Bug 1163171 - part 5 - remove arm-specific hacks in typed array implementation; p=froydnj, r=lth, push=lth
Patch 5 pushed to satisfy a different bug, see https://bugzilla.mozilla.org/show_bug.cgi?id=1409196#c11 et seq.
Keywords: leave-open
Comment on attachment 8910424 [details] [diff] [review]
part 5 - remove arm-specific hacks in typed array implementation

Approval Request Comment
[Feature/Bug causing the regression]:
This is old code hitting an issue with a newer Clang and impacts FreeBSD, which builds with Clang.

[User impact if declined]:
FreeBSD won't be able to compile FF57 without maintaining a local fork.

[Is this code covered by automated tests?]:
Likely, since the problem arises from an earlier bug fix (gcc optimization bug).

[Has the fix been verified in Nightly?]:
Landed yesterday.  Affects ARM + Clang only, a config we do not build ourselves yet.

[Needs manual test from QE? If yes, steps to reproduce]: 
No.

[List of other uplifts needed for the feature/fix]:
None.

[Is the change risky?]:
No.

[Why is the change risky/not risky?]:
We do not build with Clang on ARM ourselves, this should not impact us.

[String changes made/needed]:
None.
Attachment #8910424 - Flags: approval-mozilla-beta?

Comment 62

2 years ago
Comment on attachment 8910424 [details] [diff] [review]
part 5 - remove arm-specific hacks in typed array implementation

Marking f+ per bug 1409196 comment 5. I also confirm Nightly after comment 60 builds fine on FreeBSD armv6/armv7 with Clang 4.0/5.0.
Attachment #8910424 - Flags: feedback+
Comment on attachment 8910424 [details] [diff] [review]
part 5 - remove arm-specific hacks in typed array implementation

Helps FreeBSD builds, beta57+
Attachment #8910424 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Ritu Kothari (:ritu) from comment #63)
> Comment on attachment 8910424 [details] [diff] [review]
> part 5 - remove arm-specific hacks in typed array implementation
> 
> Helps FreeBSD builds, beta57+

https://hg.mozilla.org/releases/mozilla-beta/rev/8b84e5418b80

Calling 57 "fixed" would be a bit of a stretch here, though, so just updating the status flags to something closer to reality.
Assignee

Comment 65

2 years ago
OK, so Google has confirmed that the fix for comment 52 will not be arriving in NDK r16, but will be arriving in r17 instead.  I'm guessing r17 isn't coming out until 2018Q1 or so.  This is unfortunate news.

Our options are:

1) Switch builds (developer and automation) to use clang from NDK r15 and accept that compiling Android/x86 is going to be slow.  I don't know how many developers routinely compile for Android/x86, so I don't know how much of a productivity hit that would be.  That slowdown would affect infrastructure as well, though perhaps it's somewhat less of a concern there: we run so few tests on Android/x86 that having a slower build doesn't matter *too* much.

1a) Switch builds (developer and automation) to use an older clang that doesn't exhibit the slowdown in comment 52.  The problem with doing this is that older clang made libxul much bigger than GCC: the whole point of upgrading to the newest and shiniest clang was that libxul was finally on par or smaller size-wise and we didn't have to accept regressions in that area.  I don't think this is a reasonable option, but I mention it for completeness.

2) Don't switch to using clang right now, but stand up builds for all of our Android platforms that use clang instead of GCC, to try and ensure things don't break until we can make the switch for real.  Supporting both GCC and clang in the build system for Android is a little tricky, but perhaps this can be done with a standalone clang toolchain or something.

3) Something else that I have not thought of but somebody reading this bug has.

Snorp, I guess you're the closest thing we have to a decision maker for this...how much of a penalty is the slowdown from option 1 (Android/x86 only) for your team in day-to-day work?  Would you be OK with switching to clang sooner rather than later?
Flags: needinfo?(snorp)
(In reply to Nathan Froyd [:froydnj] from comment #65)
> 
> Snorp, I guess you're the closest thing we have to a decision maker for
> this...how much of a penalty is the slowdown from option 1 (Android/x86
> only) for your team in day-to-day work?  Would you be OK with switching to
> clang sooner rather than later?

I don't think anyone builds x86 locally, so I think the only concern should be how it impacts automation. As you said we don't run too many tests there, so maybe it's fine.
Flags: needinfo?(snorp)
(In reply to Nathan Froyd [:froydnj] from comment #65)
> OK, so Google has confirmed that the fix for comment 52 will not be arriving
> in NDK r16, but will be arriving in r17 instead.  I'm guessing r17 isn't
> coming out until 2018Q1 or so.  This is unfortunate news.
> 
> Our options are:
> 
> 1) Switch builds (developer and automation) to use clang from NDK r15 and
> accept that compiling Android/x86 is going to be slow.  I don't know how
> many developers routinely compile for Android/x86, so I don't know how much
> of a productivity hit that would be.  That slowdown would affect
> infrastructure as well, though perhaps it's somewhat less of a concern
> there: we run so few tests on Android/x86 that having a slower build doesn't
> matter *too* much.

We should do this.

I personally want to keep Android/x86 builds _alive_ because they're the fastest to get from source checkout to "I can test this", but the route I take for this _doesn't really feel slow build times_: I use --enable-artifact-builds and an x86 emulator.

If I understand correctly, the resulting x86 code has roughly the same run time performance as the old, so I don't understand your comment about testing.  Is that not correct?  Even if the run time is significantly slower, I still think we should do this, because clang is/can be:

- supported
- better to debug in Android Studio
- statically analyzable
Assignee

Comment 68

2 years ago
(In reply to Nick Alexander :nalexander from comment #67)
> (In reply to Nathan Froyd [:froydnj] from comment #65)
> > OK, so Google has confirmed that the fix for comment 52 will not be arriving
> > in NDK r16, but will be arriving in r17 instead.  I'm guessing r17 isn't
> > coming out until 2018Q1 or so.  This is unfortunate news.
> > 
> > Our options are:
> > 
> > 1) Switch builds (developer and automation) to use clang from NDK r15 and
> > accept that compiling Android/x86 is going to be slow.  I don't know how
> > many developers routinely compile for Android/x86, so I don't know how much
> > of a productivity hit that would be.  That slowdown would affect
> > infrastructure as well, though perhaps it's somewhat less of a concern
> > there: we run so few tests on Android/x86 that having a slower build doesn't
> > matter *too* much.
> 
> We should do this.
> 
> I personally want to keep Android/x86 builds _alive_ because they're the
> fastest to get from source checkout to "I can test this", but the route I
> take for this _doesn't really feel slow build times_: I use
> --enable-artifact-builds and an x86 emulator.

This is a good point; artifact builds are probably more common on Android than non-artifact builds, so C++ compilation times matter a little less?

> If I understand correctly, the resulting x86 code has roughly the same run
> time performance as the old, so I don't understand your comment about
> testing.  Is that not correct?

This was poor communication on my part: my thinking was that slower builds are bad because they prevent tests from running prior to completion of the build, not that the runtime performance necessarily materially affects the runtime of the tests.  So slow builds impact automation times--seeing whether builds are busted on Android, seeing whether something busted Android tests, etc.  But since we run so few tests on Android/x86, waiting a little longer to get to them is not a huge burden.  If we were facing the same sort of slowdown on Android/ARM, I'd be more reluctant to switch to clang prior to that slowdown being fixed.

> Even if the run time is significantly
> slower, I still think we should do this, because clang is/can be:
> 
> - supported
> - better to debug in Android Studio
> - statically analyzable

Indeed.
Assignee

Updated

2 years ago
Depends on: 1411692, 1411694
Assignee

Comment 69

2 years ago
Renamed as per comment 53; also took the liberty of adding existence checks for
files and directories so we can fail early in configure.
Attachment #8922065 - Flags: review+
Assignee

Updated

2 years ago
Attachment #8910421 - Attachment is obsolete: true
Assignee

Comment 70

2 years ago
The NDK clang needs to be informed about the existence of a GCC
toolchain, so important programs like the linker can be located.  With
this change, we're starting to use command-line options that are
incompatible with GCC, so we also add a check to inform the user about
the non-support for this configuration.

I'm not completely sure what you want for comment 54; I think this is the
easiest approach.
Attachment #8922066 - Flags: review?(mh+mozilla)
Assignee

Updated

2 years ago
Attachment #8910422 - Attachment is obsolete: true
Assignee

Comment 71

2 years ago
OK, so we're going to go with NDK r15c and accept the compilation hit on x86.
Attachment #8922072 - Flags: review?(snorp)
Attachment #8922072 - Flags: review?(snorp) → review+
Comment on attachment 8922066 [details] [diff] [review]
part 3b - inform clang about the NDK's gcc toolchain

Review of attachment 8922066 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/autoconf/android.m4
@@ +7,5 @@
>  
>  case "$target" in
>  *-android*|*-linuxandroid*)
>      dnl $android_platform will be set for us by Python configure.
>      directory_include_args="-isystem $android_platform/usr/include"

I'm mystified by this duplication of -isystem and -gcc-toolchain. I now wonder why that wasn't moved when extra_toolchain_flags was added to python configure.

::: build/moz.configure/toolchain.configure
@@ +859,5 @@
>                  'Only GCC 4.9 or newer is supported (found version %s).'
>                  % info.version)
>  
> +        if info.type == 'gcc' and host_or_target.os == 'Android':
> +            raise FatalCheckError('GCC is not supported on Android')

Please mention somehow that clang is what should be used instead.
Attachment #8922066 - Flags: review?(mh+mozilla) → review+
Assignee

Updated

2 years ago
Depends on: 1412405
Assignee

Comment 73

2 years ago
These tests were failing because of some misoptimizations that GCC did.
clang is generating correct code here, and the tests correctly pass, so
we can mark them as such.
Attachment #8922964 - Flags: review?(dholbert)
Comment on attachment 8922964 [details] [diff] [review]
part 7 - mark counter tests as passing now that we use clang

Review of attachment 8922964 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! Be sure to close out bug 989718 as FIXED by this bug, once you've landed this.
Attachment #8922964 - Flags: review?(dholbert) → review+
Comment on attachment 8922964 [details] [diff] [review]
part 7 - mark counter tests as passing now that we use clang

Review of attachment 8922964 [details] [diff] [review]:
-----------------------------------------------------------------

Actually -- it looks like there's one more manifest that you need to update, on the assumption that this is fixing bug 989718 -- this one:
> asserts-if(Android,0-1) load 964078.html # bug 989718
https://dxr.mozilla.org/mozilla-central/rev/aa958b29c149a67fce772f8473e9586e71fbdb46/layout/generic/crashtests/crashtests.list#566

This wouldn't have turned your try run orange, since it's "0-1", which means it's happy to assert or not-assert (on Android).  But if the sometimes-assertion is really bug 989718 (per the comment), and if you're really fixing bug 989718, then we should be able to remove that "asserts-if" annotation entirely.
Assignee

Comment 76

2 years ago
(In reply to Daniel Holbert [:dholbert] from comment #75)
> Actually -- it looks like there's one more manifest that you need to update,
> on the assumption that this is fixing bug 989718 -- this one:
> > asserts-if(Android,0-1) load 964078.html # bug 989718
> https://dxr.mozilla.org/mozilla-central/rev/
> aa958b29c149a67fce772f8473e9586e71fbdb46/layout/generic/crashtests/
> crashtests.list#566
> 
> This wouldn't have turned your try run orange, since it's "0-1", which means
> it's happy to assert or not-assert (on Android).  But if the
> sometimes-assertion is really bug 989718 (per the comment), and if you're
> really fixing bug 989718, then we should be able to remove that "asserts-if"
> annotation entirely.

Thanks for pointing this out!  Did a try run and everything looks good for removing that, so will add that to this patch.
Assignee

Updated

2 years ago
Depends on: 1412513, 1412546, 1412547

Comment 77

2 years ago
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1f074437df8b
part 1 - switch to r11c NDKs that include clang; r=snorp
https://hg.mozilla.org/integration/mozilla-inbound/rev/9f81a968a258
part 2 - switch to using -isystem rather than -idirafter for Android; r=glandium
https://hg.mozilla.org/integration/mozilla-inbound/rev/b3f95a78cca8
part 3a - expose the NDK clang binaries to toolchain configury; r=glandium
https://hg.mozilla.org/integration/mozilla-inbound/rev/5c33bd0a9c36
part 3b - inform clang about the NDK's gcc toolchain; r=glandium
https://hg.mozilla.org/integration/mozilla-inbound/rev/2a03305f96d7
part 4 - fix jsapi-tests link errors with __atomic_* intrinsics on x86/Android with clang; r=glandium
https://hg.mozilla.org/integration/mozilla-inbound/rev/28f3d894a0dc
part 6 - update to NDK r15c; r=snorp
https://hg.mozilla.org/integration/mozilla-inbound/rev/6afa42ff81ef
part 7 - mark counter tests as passing now that we use clang; r=dholbert
https://hg.mozilla.org/integration/mozilla-inbound/rev/babd400c828a
part 8 - touch CLOBBER because we're changing compilers; r=me

Comment 78

2 years ago
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4f85969fa24a
part 9 - fix flake8 complains in android-ndk.configure. r=me

Comment 79

2 years ago
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/56279ee04019
part 9.1 - attempt to pacify flake8 complaints for real; r=me
Blocks: 1412677
Assignee

Updated

2 years ago
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Depends on: 1412734
Depends on: 1413041
Assignee

Updated

2 years ago
No longer depends on: 1413041

Updated

a year ago
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.