Closed Bug 1423238 Opened 7 years ago Closed 6 years ago

Android x86 build on macOS uses invalid Linux quota lib (USE_LINUX_QUOTACTL)

Categories

(Firefox Build System :: Android Studio and Gradle Integration, defect)

58 Branch
x86_64
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: p.bertran, Unassigned)

Details

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:57.0) Gecko/20100101 Firefox/57.0
Build ID: 20171112125346

Steps to reproduce:

Build the full Android version (not an artifact build) from the beta repository.

hg log: 
changeset:   442726:81488b5935fa
tag:         tip
fxtree:      beta
user:        Andreas Farre <farre@mozilla.com>
date:        Tue Dec 05 04:54:00 2017 -0500
summary:     Bug 1421009 - Don't schedule idle callback if window is shutting down. r=bkelly, a=gchang
[...]

mozconfig: https://pastebin.mozilla.org/9074086



Actual results:

I get an error about a call that does not match the quotactl prototype, and also about missing members in the dqblk struct.

build output: https://pastebin.mozilla.org/9074083

Investigating, I found that Linux and macOS have different quota libraries:
- the quotactl functions have different prototypes: https://dxr.mozilla.org/mozilla-central/source/third_party/rust/libc-0.2.24/src/unix/bsd/apple/mod.rs#1646 and https://dxr.mozilla.org/mozilla-central/source/third_party/rust/libc-0.2.24/src/unix/notbsd/linux/mod.rs#878
- the structs dqblk have different members: https://dxr.mozilla.org/mozilla-central/source/third_party/rust/libc-0.2.24/src/unix/bsd/apple/mod.rs#256 and https://dxr.mozilla.org/mozilla-central/source/third_party/rust/libc-0.2.24/src/unix/notbsd/linux/mod.rs#159

But although I am building on macOS, USE_LINUX_QUOTACTL appears to be defined and the Linux version appears to be used, which causes the errors:
https://dxr.mozilla.org/mozilla-central/source/xpcom/io/nsLocalFileUnix.cpp#1371


Expected results:

The build was OK with version 57 (old beta and release). It should still be OK with 58.
Here is the correct build output: https://pastebin.mozilla.org/9074089

Sorry for the first link; I couldn't find how to edit it.
Pierre: my best guess is that the clang in your HOST_CC isn't correct, but I really can't say much.  Can you give me the output of |mach configure|?

froydnj: any thoughts on this?
Flags: needinfo?(nfroyd)
Can you provide the config.status file from your object directory?

HAVE_LINUX_QUOTA_H would be defined if we can find the linux/quota.h header, and indeed, NDK r15c includes linux/quota.h.  Maybe the r15c linux/quota.h doesn't provide the functions we need?

But that leads to the question of why this hasn't caused problems for anybody else...this doesn't seem like it should be uncommon.
Flags: needinfo?(nfroyd) → needinfo?(p.bertran)
(In reply to :froydnj (on leave until 2018, ni? or email if necessary) from comment #3)
> Can you provide the config.status file from your object directory?
> 
> HAVE_LINUX_QUOTA_H would be defined if we can find the linux/quota.h header,
> and indeed, NDK r15c includes linux/quota.h.  Maybe the r15c linux/quota.h
> doesn't provide the functions we need?
> 
> But that leads to the question of why this hasn't caused problems for
> anybody else...this doesn't seem like it should be uncommon.

Here it is, sorry for the delay: https://pastebin.mozilla.org/9074477
Flags: needinfo?(p.bertran)
(In reply to Nick Alexander :nalexander (less responsive until Jan 3, 2018) from comment #2)
> Pierre: my best guess is that the clang in your HOST_CC isn't correct, but I
> really can't say much.  Can you give me the output of |mach configure|?
> 
> froydnj: any thoughts on this?

I recently had to add this to my mozconfig file, after an issue I looked at with nalexander on IRC, if I remember correctly.

export HOST_CC=clang
export HOST_CXX=clang++

Here is the output of the configure: https://pastebin.mozilla.org/9074478
After speaking with rnewman, and some more testing, I can clarify that this issue occurs only when building for the x86 target (from a Mac). There are no issues with ARM builds.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Unspecified → Android
Hardware: Unspecified → x86_64
Summary: Android build on macOSX using invalid linux quota lib (USE_LINUX_QUOTACTL) → Android x86 build on macOS uses invalid Linux quota lib (USE_LINUX_QUOTACTL)
(In reply to Pierre Bertran from comment #6)
> After speaking with rnewman, and some more testing, I can precise that this
> issue is only when building for x86 target (from mac). No issues with arm
> builds

I think the issue is that your local clang and system libraries are doing things we don't anticipate.  Could you try installing clang from Mozilla's automation?  Something like `./mach artifact toolchain --from-build linux64-clang`, then move the clang directory into ~/.mozbuild, then update the HOST_* variables in your mozconfig to point to that clang.  If you still see problems, we have a moz.configure problem we need to solve more generally.
(In reply to Nick Alexander :nalexander (less responsive until Jan 3, 2018) from comment #7)
> I think the issue is that your local clang and system libraries are doing
> things we don't anticipate.  Could you try installing clang from Mozilla's
> automation?  Something like `./mach artifact toolchain --from-build
> linux64-clang`, then move the clang directory into ~/.mozbuild, then update
> the HOST_* variables in your mozconfig to point to that clang.  If you still
> see problems, we have a moz.configure problem we need to solve more
> generally.

Did that, and it looks like the distributed clang won't execute on my machine:
-bash: /Users/pierre/.mozbuild/clang/bin/clang-check: cannot execute binary file

Here is the output of `uname -a` on the Mac, of `file` on the clang executable, and of `ls` on the bin directory:

https://pastebin.mozilla.org/9075190
(In reply to Nick Alexander :nalexander (less responsive until Jan 3, 2018) from comment #7)
> I think the issue is that your local clang and system libraries are doing
> things we don't anticipate.  Could you try installing clang from Mozilla's
> automation?  Something like `./mach artifact toolchain --from-build
> linux64-clang`, then move the clang directory into ~/.mozbuild, then update
> the HOST_* variables in your mozconfig to point to that clang.  If you still
> see problems, we have a moz.configure problem we need to solve more
> generally.

Sorry, this should not have been linux64-clang, it should have been

./mach artifact toolchain -v --from-build macosx64-clang

So download that, move it (replacing the linux64 version, which is useless on Mac), and try again.
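The HOST_* update described above could look roughly like the following mozconfig fragment. This is only a sketch: it assumes the downloaded toolchain was moved to ~/.mozbuild/clang (the location suggested earlier in this thread) and that its bin directory contains `clang` and `clang++`.

```shell
# mozconfig fragment (sketch, paths are assumptions):
# point the host compilers at the clang fetched with
#   ./mach artifact toolchain -v --from-build macosx64-clang
# after moving the resulting clang directory to ~/.mozbuild/clang.
export HOST_CC="$HOME/.mozbuild/clang/bin/clang"
export HOST_CXX="$HOME/.mozbuild/clang/bin/clang++"
```

These two lines would replace the bare `export HOST_CC=clang` and `export HOST_CXX=clang++` settings, so that the host compiler no longer depends on whichever clang is first on PATH.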
(In reply to Nick Alexander :nalexander (less responsive until Jan 3, 2018) from comment #9)
> Sorry, this should not have been linux64-clang, it should have been
> 
> ./mach artifact toolchain -v --from-build macosx64-clang
> 
> So download that, move it (replacing the linux64 version, which is useless
> on Mac), and try again.

OK, so I installed macosx64-clang from the toolchain, moved it to the mozbuild directory and added it to my mozconfig.
Then building (with clobber and configure steps) gives me the same errors as in the original report: quota lib issues.

So it looks like the issue may not be local (I'm not 100% sure, but since I have only ever done Mozilla builds on this computer, I think my system clang was installed by Mozilla's bootstrap ...)
Flags: needinfo?(nalexander)
(In reply to Pierre Bertran from comment #10)
> Ok, so I did install the macosx64-clang from the toolchain, moved it to the
> mozbuild directory and added it to my mozconfig.
> Then, building it (with clobber and configure steps) give me the same errors
> as the original report: quota lib issues.
> 
> So it looks like the issue may not be local (also I'm not 100% sure, but as
> I only do, and did, mozilla builds on my computer, I think my system clang
> has been installed by mozilla's bootstraps ...)

Huh.  Can I get full logs of the clobber, configure, and build steps, please?  We must be detecting some unexpected header/package somehow.
Flags: needinfo?(nalexander)
(In reply to Nick Alexander :nalexander (less responsive until Jan 3, 2018) from comment #11)
> Huh.  Can I get full logs of the clobber, configure, and build steps,
> please?  We must be detecting some unexpected header/package somehow.

I don't have any output for clobber.
Here is the configure output: https://pastebin.mozilla.org/9075634
And here is the build: https://pastebin.com/JDPQAdwH
Ah, I think I see (part of) the problem.  nsLocalFileUnix.cpp says:

#if defined(HAVE_SYS_QUOTA_H) && defined(HAVE_LINUX_QUOTA_H)
#define USE_LINUX_QUOTACTL
#include <sys/mount.h>
#include <sys/quota.h>
#include <sys/sysmacros.h>
#ifndef BLOCK_SIZE
#define BLOCK_SIZE 1024 /* kernel block size */
#endif
#endif
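The guard quoted above turns on USE_LINUX_QUOTACTL only when both HAVE_SYS_QUOTA_H and HAVE_LINUX_QUOTA_H are defined. A minimal sketch for checking that condition against a header-style file of `#define` lines follows; the function name `uses_linux_quotactl` is hypothetical, and which generated file in the object directory holds these defines is an assumption to verify locally.

```shell
# Succeed (exit status 0) only if the given file defines BOTH macros that
# the nsLocalFileUnix.cpp guard requires before defining USE_LINUX_QUOTACTL.
# $1: a file containing "#define NAME ..." lines (file choice is an assumption).
uses_linux_quotactl() {
  grep -Eq '^[[:space:]]*#[[:space:]]*define[[:space:]]+HAVE_SYS_QUOTA_H([[:space:]]|$)' "$1" &&
  grep -Eq '^[[:space:]]*#[[:space:]]*define[[:space:]]+HAVE_LINUX_QUOTA_H([[:space:]]|$)' "$1"
}
```

Running it on the generated defines of an ARM and an x86 object directory would show whether the two configurations differ only in HAVE_SYS_QUOTA_H.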

On ARM and x86 builds (at least on my local Linux machine), HAVE_SYS_QUOTA_H is not defined.  I'd assume the same is true on Mac hosts for ARM builds.  But, somehow, x86 Android on Mac is picking up that <sys/quota.h> exists, and I'd bet that it's picking up the *host* <sys/quota.h>:

 6:06.56 /Users/pierre/workspace/tests/rejects-android-mozilla-beta/mozilla-beta/xpcom/io/nsLocalFileUnix.cpp:1386:8: error: no matching function for call to 'quotactl'
 6:06.56   if (!quotactl(QCMD(Q_GETQUOTA, USRQUOTA), deviceName.get(),
 6:06.56        ^~~~~~~~
 6:06.56 /usr/include/sys/quota.h:224:5: note: candidate function not viable: no known conversion from 'int' to 'const char *' for 1st argument

because the only <sys/quota.h> for Android exists in a place that I don't think we're including in our sysroot:

froydnj@hawkeye:~/src/gecko-dev.git$ find ~/.mozbuild/ -name quota.h
...bunch of android-$N/arch-$FOO/.../linux/quota.h...
/home/froydnj/.mozbuild/android-ndk-r15c/sysroot/usr/include/sys/quota.h
/home/froydnj/.mozbuild/android-ndk-r15c/sysroot/usr/include/linux/quota.h

which is stupid.
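The `find` check above can be narrowed to just the two headers that feed the guard. A small sketch, with a hypothetical helper name `find_quota_headers`, that lists only `sys/quota.h` and `linux/quota.h` below a given directory:

```shell
# List every sys/quota.h and linux/quota.h below a directory, one per line.
# $1: root directory to search, for example an NDK directory or /usr/include.
find_quota_headers() {
  find "$1" -type f \( -path '*/sys/quota.h' -o -path '*/linux/quota.h' \)
}
```

Comparing `find_quota_headers ~/.mozbuild/android-ndk-r15c` with `find_quota_headers /usr/include` shows which copies exist in the NDK and which exist only on the host.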

So why are we picking up that file for x86 builds, but not for ARM builds?  If I do:

echo '#include <sys/quota.h>' | ~/.mozbuild/clang/bin/clang --target=i386-linux-android -std=gnu99 -x c - -E -o - -v -Qunused-arguments -isystem /home/froydnj/.mozbuild/android-ndk-r15c/platforms/android-9/arch-x86/usr/include -gcc-toolchain /home/froydnj/.mozbuild/android-ndk-r15c/toolchains/x86-4.9/prebuilt/linux-x86_64  

(same thing happens when using the NDK's clang) I do indeed pick up the <sys/quota.h> from my /usr/include.  But I see that we are saved by errors (from config.log) when we find this during configure:

DEBUG: Creating `/tmp/conftest.tFuiLj.cpp` with content:
DEBUG: | #include <sys/quota.h>
DEBUG: | int
DEBUG: | main(void)
DEBUG: | {
DEBUG: | 
DEBUG: |   ;
DEBUG: |   return 0;
DEBUG: | }
DEBUG: Executing: `/home/froydnj/.mozbuild/android-ndk-r15c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ -std=gnu++11 --target=i386-linux-android -isystem /home/froydnj/.mozbuild/android-ndk-r15c/platforms/android-9/arch-x86/usr/include -gcc-toolchain /home/froydnj/.mozbuild/android-ndk-r15c/toolchains/x86-4.9/prebuilt/linux-x86_64 -c /tmp/conftest.tFuiLj.cpp`
DEBUG: The command returned non-zero exit status 1.
DEBUG: Its error output was:
DEBUG: | In file included from /tmp/conftest.tFuiLj.cpp:1:
DEBUG: | /usr/include/i386-linux-gnu/sys/quota.h:221:24: error: expected function body after function declarator
DEBUG: |                      caddr_t __addr) __THROW;
DEBUG: |                                      ^
DEBUG: | 1 error generated.

and presumably we're *not* saved from similar errors on OS X--though I'm still not sure how our ARM builds survive.

So we're seeing the effects of clang's stupid header search paths, e.g. for the above, clang says:

#include <...> search starts here:
 /home/froydnj/.mozbuild/android-ndk-r15c/platforms/android-9/arch-x86/usr/include
 /usr/local/include
 /home/froydnj/.mozbuild/android-ndk-r15c/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/5.0.300080/include
 /usr/include/i386-linux-gnu
 /usr/include

which is *completely bogus* for the cross-compile case to include /usr/* directories.
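To spot this kind of leak quickly, the search list that clang prints with `-v` can be filtered for host directories. A minimal sketch, assuming that anything under /usr/ is a host path rather than an NDK path; the function name `host_include_dirs` is hypothetical.

```shell
# Read clang's verbose (-v) output on stdin and print the entries of the
# "#include <...>" search list that live under /usr/, i.e. directories that
# most likely belong to the host rather than to the NDK.
host_include_dirs() {
  awk '
    /^#include <\.\.\.> search starts here:/ { in_list = 1; next }
    /^End of search list\./                  { in_list = 0 }
    in_list {
      sub(/^[ \t]+/, "")          # strip the leading indentation
      if ($0 ~ /^\/usr\//) print  # keep only paths under /usr/
    }
  '
}
```

Since clang writes the search list to stderr, a typical use would be to append `-v 2>&1 | host_include_dirs` to a preprocessing command such as the `echo '#include <sys/quota.h>' | clang --target=i386-linux-android ... -E` invocation quoted earlier. Any line it prints is a directory where a host header could shadow a missing NDK header.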

I guess we could try to fix this by putting $NDK/sysroot in our include paths somehow?  (Whether that should actually *be* the -isystem=... directory or just -I... is something I'm not quite sure of yet.)
Aha, ARM builds on Mac get:

INFO: checking for sys/quota.h... 
DEBUG: Creating `/var/folders/d0/fwhpqq711mb_t2dlxnvv08t00000gn/T/conftest.yb3yVx.cpp` with content:
DEBUG: | #include <sys/quota.h>
DEBUG: | int
DEBUG: | main(void)
DEBUG: | {
DEBUG: | 
DEBUG: |   ;
DEBUG: |   return 0;
DEBUG: | }
DEBUG: Executing: `/usr/local/bin/ccache /Users/rbarker/.mozbuild/android-ndk-r15c/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++ -std=gnu++14 --target=arm-linux-androideabi -isystem /Users/rbarker/.mozbuild/android-ndk-r15c/platforms/android-9/arch-arm/usr/include -gcc-toolchain /Users/rbarker/.mozbuild/android-ndk-r15c/toolchains/arm-linux-androideabi-4.9/prebuilt/darwin-x86_64 -I/Users/rbarker/.mozbuild/android-ndk-r15c/sources/cxx-stl/llvm-libc++/include -I/Users/rbarker/.mozbuild/android-ndk-r15c/sources/android/support/include -I/Users/rbarker/.mozbuild/android-ndk-r15c/sources/cxx-stl/llvm-libc++abi/include -c /var/folders/d0/fwhpqq711mb_t2dlxnvv08t00000gn/T/conftest.yb3yVx.cpp`
DEBUG: The command returned non-zero exit status 1.
DEBUG: Its error output was:
DEBUG: | In file included from /var/folders/d0/fwhpqq711mb_t2dlxnvv08t00000gn/T/conftest.yb3yVx.cpp:1:
DEBUG: | In file included from /usr/include/sys/quota.h:74:
DEBUG: | In file included from /usr/include/mach/boolean.h:73:
DEBUG: | /usr/include/mach/machine/boolean.h:35:2: error: architecture not supported
DEBUG: | #error architecture not supported
DEBUG: |  ^
DEBUG: | 1 error generated.

So that explains that part.  Now, what should we do to fix it?
Can you try the patches in bug 1428182 and see if those fix anything?

I tried moving some things around in the build system and ran into a horrible mess.  I think if we switched to using standalone toolchains from the NDK, that would fix things.  Our header setup is not the same as the NDK's standalone toolchain, and that's definitely part of the problem here.  But it's possible the patches in bug 1428182 would have a similar effect.
Flags: needinfo?(p.bertran)
(In reply to Nathan Froyd [:froydnj] from comment #15)
> Can you try the patches in bug 1428182 and see if those fix anything?
> 
> I tried moving some things around in the build system and ran into a
> horrible mess.  I think if we switched to using standalone toolchains from
> the NDK, that would fix things.  Our header setup is not the same as the
> NDK's standalone toolchain, and that's definitely part of the problem here. 
> But it's possible the patches in bug 1428182 would have a similar effect.

So I applied the 11 patches to the beta repository, then ran clobber, configure and build.
The original quota lib issue seems to be gone, but I now get a different error from the linker.

Here is the full build output: https://pastebin.com/Y8jqPJzM
And a shorter one (without clobber): https://pastebin.com/DgfV5i0j
Flags: needinfo?(p.bertran)
Flags: needinfo?(nfroyd)
Well, that's certainly much further along than before!  No thanks to a completely unhelpful error message, though...I can't think offhand of what would cause read-only relocations in the text section.  Usually linker errors mean that we've forgotten to wrap a system header, but I think that comes with a different error message...

jchen: do you have access to a Mac?  can you try building Android/x86 on a Mac with your patches from bug 1428182 and see if you can diagnose what's going on?
Flags: needinfo?(nfroyd) → needinfo?(nchen)
I don't have a Mac unfortunately. Maybe try NDK r16b?
Flags: needinfo?(nchen)
This, at least, seems to be addressed as of https://mail.mozilla.org/pipermail/mobile-firefox-dev/2018-October/002414.html.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
Product: Firefox for Android → Firefox Build System
Version: Firefox 58 → 58 Branch