Linking error on libfreebl3.so in Fennec x86 build (read-only segment has dynamic relocations failure linking)

RESOLVED FIXED in Firefox 64

Status

()

defect
RESOLVED FIXED
9 months ago
7 months ago

People

(Reporter: edgar, Assigned: nalexander)

Tracking

unspecified
Firefox 64
Points:
---

Firefox Tracking Flags

(firefox64 fixed)

Details

Attachments

(1 attachment, 1 obsolete attachment)

Reporter

Description

9 months ago
Gecko revision: 0c947d96e8f3
NDS version: android-ndk-r16b (I have tried android-ndk-r15c, but got same error)

I got the following error,

...
 0:26.35 js/src/gdb
 0:26.38 js/src/jsapi-tests
 0:26.39 libfreebl3.so
 0:26.56 /Users/edgarchen/.mozbuild/android-ndk-r16b/toolchains/x86-4.9/prebuilt/darwin-x86_64/lib/gcc/i686-linux-android/4.9.x/../../../../i686-linux-android/bin/ld: error: read-only segment has dynamic relocations
 0:26.58 clang: error: linker command failed with exit code 1 (use -v to see invocation)
 0:26.58 make[4]: *** [libfreebl3.so] Error 1
 0:26.58 make[3]: *** [security/nss/lib/freebl/freebl_freebl3/target] Error 2
 0:26.58 make[3]: *** Waiting for unfinished jobs....
 0:32.17 gdb-tests
 0:45.83 jsapi-tests
 0:49.66 make[2]: *** [compile] Error 2
 0:49.67 make[1]: *** [default] Error 2
 0:49.67 make: *** [build] Error 2
 0:49.72 280 compiler warnings present.
Hm I don't think we have seen this error. Have you tried NDK r17b or r18 beta?
Reporter

Comment 2

9 months ago
I saw this with NDK r17b as well. Haven't yet tried r18 beta, trying now.
This is a dupe of https://bugzilla.mozilla.org/show_bug.cgi?id=1449356, AFAICT.  I really don't have any idea what's going on; I think in #build we concluded this was a compiler bug.
Status: NEW → RESOLVED
Last Resolved: 9 months ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1449356
nfroyd: we're seeing this a lot trying to build target Android x86 on host macOS.  Does the solution from FreeBSD (!) at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230239 sound like a sensible thing to do, perhaps only if we're using `--enable-linker=lld`?
Flags: needinfo?(nfroyd)
Relevant error log is:

22:03.02 security/nss/lib/freebl/libfreebl3.so
22:03.02 /Users/k/.mozbuild/macosx64-clang/bin/ld.lld: error: can't create dynamic relocation R_386_PC32 against symbol: s_mpi_is_sse2 in readonly segment; recompile object files with -fPIC
22:03.02 >>> defined in mpcpucache.o
22:03.02 >>> referenced by mpi_x86.s:71
22:03.02 >>>               mpi_x86.o:(.text+0xE)
22:03.02 /Users/k/.mozbuild/macosx64-clang/bin/ld.lld: error: can't create dynamic relocation R_386_PC32 against symbol: s_mpi_is_sse2 in readonly segment; recompile object files with -fPIC
22:03.05 >>> defined in mpcpucache.o
22:03.05 >>> referenced by mpi_x86.s:170
22:03.05 >>>               mpi_x86.o:(.text+0x9F)
22:03.05 /Users/k/.mozbuild/macosx64-clang/bin/ld.lld: error: can't create dynamic relocation R_386_PC32 against symbol: s_mpi_is_sse2 in readonly segment; recompile object files with -fPIC
22:03.05 >>> defined in mpcpucache.o
22:03.05 >>> referenced by mpi_x86.s:267
22:03.05 >>>               mpi_x86.o:(.text+0x13D)
22:03.05 /Users/k/.mozbuild/macosx64-clang/bin/ld.lld: error: can't create dynamic relocation R_386_PC32 against symbol: s_mpi_is_sse2 in readonly segment; recompile object files with -fPIC
22:03.05 >>> defined in mpcpucache.o
22:03.06 >>> referenced by mpi_x86.s:394
22:03.06 >>>               mpi_x86.o:(.text+0x203)
22:03.06 clang-6.0: error: linker command failed with exit code 1 (use -v to see invocation)
22:03.06 make[4]: *** [libfreebl3.so] Error 1
22:03.06 make[3]: *** [security/nss/lib/freebl/freebl_freebl3/target] Error 2

(That's with lld, hence the informative error messages that we didn't see with GNU ld and/or gold.)
Wow... also: https://searchfox.org/mozilla-central/rev/65f9687eb192f8317b4e02b0b791932eff6237cc/Makefile.in#320-325 -- this issue has _history_.  So that suggests that we have a badly configured toolchain somehow that's getting a system header that it should, that I don't see locally but (most?) others do.
Comparing my config.status against a failing config.status, I see the following changes that seem relevant:

-    'DSO_LDOPTS': '-shared -Wl,-z,defs',
+    'DSO_LDOPTS': '-shared',

-    'GCC_USE_GNU_LD': '1',
+    'GCC_USE_GNU_LD': '',

-    'LLVM_CONFIG': '/usr/local/opt/llvm/bin/llvm-config',
+    'LLVM_CONFIG': '/opt/local/bin/llvm-config-mp-4.0',

-    'MOZ_CLANG_PATH': '/usr/local/Cellar/llvm/5.0.1/bin/clang',
+    'MOZ_CLANG_PATH': '/opt/local/libexec/llvm-4.0/bin/clang',

-    'MOZ_LIBCLANG_PATH': b'/usr/local/Cellar/llvm/5.0.1/lib',
+    'MOZ_LIBCLANG_PATH': b'/opt/local/libexec/llvm-4.0/lib',

I don't understand what DSO_LDOPTS controls.  But what I think this all means is that /usr/bin/clang (which is HOST_CC) is different for me and for people seeing this.  Although maybe all those MOZ_* clang things are different again? I get:

$ /usr/bin/clang --version
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

and

$ /usr/local/Cellar/llvm/5.0.1/bin/clang --version
clang version 5.0.1 (tags/RELEASE_501/final)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /usr/local/Cellar/llvm/5.0.1/bin

lina: could you figure out what the versions of clang in the snippets listed above are for you?

froyd: can we override all of these using HOST_CC/HOST_CXX?
Flags: needinfo?(lina)
Oh, sure enough, mine are different!

> $ /usr/bin/clang --version
> Apple LLVM version 10.0.0 (clang-1000.11.45.2)
> Target: x86_64-apple-darwin17.7.0
> Thread model: posix
> InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

> $ /opt/local/libexec/llvm-4.0/bin/clang --version
> clang version 4.0.1 (tags/RELEASE_401/final)
> Target: x86_64-apple-darwin17.7.0
> Thread model: posix
> InstalledDir: /opt/local/libexec/llvm-4.0/bin

The one I'm using now, that produced the log in comment 5:

> $ /Users/k/.mozbuild/macosx64-clang/bin/clang --version
> clang version 6.0.1 (tags/RELEASE_601/final) (llvm/tags/RELEASE_601/final 335538)
> Target: x86_64-apple-darwin17.7.0
> Thread model: posix
> InstalledDir: /Users/k/.mozbuild/macosx64-clang/bin

I can try upgrading the clang in `/opt/local` tomorrow, and seeing if that helps; that's the one installed by MacPorts. I'm surprised it shows up at all, though; shouldn't it be using the one I downloaded into `$HOME/.mozbuild/macosx64-clang` instead?
Flags: needinfo?(lina)
For folks whose builds work and whose builds don't, what is your CC (from config.status) and what do you get executing:

$CC -print-prog-name=ld

and then, what do you get for:

`$CC -print-prog-name=ld` -v

(the backticks are important there).

I swear we've looked at this on #build and figured out that:

1. Android builds on Macs fail.
2. Android builds on Linux work.
3. Android builds on Mac, but with the freebl objects from Linux, work.
4. Android builds on Linux, but with the freebl objects from Mac, work.

which strongly suggested that the Mac linker was somehow busted, but if some people can get Android builds on Mac to work, maybe there's something very specific going wrong.

We *could* make things work by using the FreeBSD fix, but that is a hack, and if we *really* had to, it would only be available on developer, not release, builds.  (You don't want relocations in your .text section/segment in general, because that requires your shared library's .text section/segment to be writable, which makes it both not shareable between multiple processes and read-write-execute code segments are a Bad Thing.  Android might even yell at you nowadays if you try to load objects with read-write-execute code segments.)
Flags: needinfo?(nfroyd)
Thanks for looking into this, Nathan! From config.status:

> 'CC': '/Users/k/.mozbuild/macosx64-clang/bin/clang -std=gnu99 --target=i686-linux-android'

Without backticks:

> $ /Users/k/.mozbuild/macosx64-clang/bin/clang -std=gnu99 --target=i686-linux-android -print-prog-name=ld -v
> clang version 6.0.1 (tags/RELEASE_601/final) (llvm/tags/RELEASE_601/final 335538)
> Target: i686--linux-android
> Thread model: posix
> InstalledDir: /Users/k/.mozbuild/macosx64-clang/bin
> Found candidate GCC installation: /Users/k/.mozbuild/macosx64-clang/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.9.4
> Selected GCC installation: /Users/k/.mozbuild/macosx64-clang/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.9.4
> Candidate multilib: .;@m64
> Candidate multilib: 32;@m32
> Selected multilib: 32;@m32
> /opt/local/bin/ld

With backticks:

> $ `/Users/k/.mozbuild/macosx64-clang/bin/clang -std=gnu99 --target=i686-linux-android -print-prog-name=ld` -v
> @(#)PROGRAM:ld  PROJECT:ld64-274.2
> configured to support archs: i386 x86_64 x86_64h armv6 armv7 armv7s armv7m armv7k arm64 (tvOS)
> LTO support using: LLVM version 3.9.1

> 1. Android builds on Macs fail.
> 2. Android builds on Linux work.

This matches my experience, too; Android builds fine in an Ubuntu VM. :-)
(In reply to Nathan Froyd [:froydnj] from comment #9)
> For folks whose builds work and whose builds don't, what is your CC (from
> config.status) and what do you get executing:
> 
> $CC -print-prog-name=ld
> 
> and then, what do you get for:
> 
> `$CC -print-prog-name=ld` -v
> 
> (the backticks are important there).

My builds work, both with and without --enable-linker=lld.

bash-3.2$ export CC='/Users/nalexander/.mozbuild/macosx64-clang-6/bin/clang -std=gnu99 --target=i686-linux-android'
bash-3.2$ $CC -v
clang version 6.0.1 (tags/RELEASE_601/final) (llvm/tags/RELEASE_601/final 335538)
Target: i686--linux-android
Thread model: posix
InstalledDir: /Users/nalexander/.mozbuild/macosx64-clang-6/bin
Found candidate GCC installation: /Users/nalexander/.mozbuild/macosx64-clang-6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.9.4
Selected GCC installation: /Users/nalexander/.mozbuild/macosx64-clang-6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.9.4
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: 32;@m32
clang-6.0: warning: argument unused during compilation: '-std=gnu99' [-Wunused-command-line-argument]

bash-3.2$ $CC -print-prog-name=ld
/Users/nalexander/.mozbuild/android-sdk-macosx/build-tools/23.0.3/i686-linux-android-ld
bash-3.2$ `$CC -print-prog-name=ld` -v
GNU gold (GNU Binutils 2.22.90.20120727) 1.11

bash-3.2$ `$CC -print-prog-name=ld.lld` -v
LLD 6.0.1 (tags/RELEASE_601/final 335538) (compatible with GNU linkers)
bash-3.2$ $CC -print-prog-name=ld.lld
/Users/nalexander/.mozbuild/macosx64-clang-6/bin/ld.lld

Is it possible that somehow gold is getting worked into the mix here in some way that avoids the issue for me?

> I swear we've looked at this on #build and figured out that:
> 
> 1. Android builds on Macs fail.
> 2. Android builds on Linux work.
> 3. Android builds on Mac, but with the freebl objects from Linux, work.
> 4. Android builds on Linux, but with the freebl objects from Mac, work.

This agrees with my memory.
 
> which strongly suggested that the Mac linker was somehow busted, but if some
> people can get Android builds on Mac to work, maybe there's something very
> specific going wrong.
> 
> We *could* make things work by using the FreeBSD fix, but that is a hack,
> and if we *really* had to, it would only be available on developer, not
> release, builds.  (You don't want relocations in your .text section/segment
> in general, because that requires your shared library's .text
> section/segment to be writable, which makes it both not shareable between
> multiple processes and read-write-execute code segments are a Bad Thing. 
> Android might even yell at you nowadays if you try to load objects with
> read-write-execute code segments.)

Thanks for this context.  And in fact, yes, Android should yell (post Android 23) about this, IIUC.

froyd: can you explain why GCC_USE_GNU_LD differs for me and for lina?
Flags: needinfo?(nfroyd)
To follow up: for Lina,

$ /Users/k/.mozbuild/macosx64-clang/bin/clang -std=gnu99 --target=i686-linux-android -print-prog-name=ld
/opt/local/bin/ld

$ /Users/k/.mozbuild/macosx64-clang/bin/clang -std=gnu99 --target=i686-linux-android -print-prog-name=ld.lld
/Users/k/.mozbuild/macosx64-clang/bin/ld.lld

So it seems that somehow we’re invoking `ld`, and for Lina that’s a (busted?) macOS linker, and for me that’s a (non-busted?) gold.
agi: could you provide the outputs in https://bugzilla.mozilla.org/show_bug.cgi?id=1489443#c9 for your busted macOS Android x86 objdir?  If you don't have the build, you should be able to just set up the mozconfig and then ./mach configure to get the details _without_ doing the actual build.
Flags: needinfo?(agi)
Nick, I think my setup is similar to Lina's

My config status has:

> $ cat obj-i686-linux-android/config.status | grep \'CC\'
>    'CC': '/Users/asferro/clang/clang+llvm-6.0.0-x86_64-apple-darwin/bin/clang -std=gnu99 --target=i686-linux-android',

This is the output of $CC --print-prog-name=ld

> /Users/asferro/clang/clang+llvm-6.0.0-x86_64-apple-darwin/bin/clang -std=gnu99 --target=i686-linux-android --print-prog-name=ld -v
> clang version 6.0.0 (tags/RELEASE_600/final)
> Target: i686--linux-android
> Thread model: posix
> InstalledDir: /Users/asferro/clang/clang+llvm-6.0.0-x86_64-apple-darwin/bin
> /usr/bin/ld

and this is the output of `$CC --print-prog-name=ld` -v

> `/Users/asferro/clang/clang+llvm-6.0.0-x86_64-apple-darwin/bin/clang -std=gnu99 --target=i686-linux-android --print-prog-name=ld` -v
> @(#)PROGRAM:ld  PROJECT:ld64-409.12
> BUILD 02:04:28 Aug 14 2018
> configured to support archs: armv6 armv7 armv7s arm64 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em arm64e arm64_32
> LTO support using: LLVM version 10.0.0, (clang-1000.11.45.2) (static support for 21, runtime is 21)
> TAPI support using: Apple TAPI version 10.0.0 (tapi-1000.11.8.2)

Thanks for looking at this!
Flags: needinfo?(agi)
So, what's really interesting is that I get GNU gold and everybody else gets some system linker.  GNU gold sets the GCC_USE_GNU_LD.  I think I get this 'cuz the build-tools are on my $PATH.  I'm trying a build with them not on my path and we'll see if I can reproduce the failure.
If I put my android sdk folder in the path

> export PATH=$PATH:$HOME/.mozbuild/android-sdk-macosx/build-tools/28.0.2/

I get the right linker!

> $CC --print-prog-name=ld
> /Users/asferro/.mozbuild/android-sdk-macosx/build-tools/28.0.2/i686-linux-android-ld
(In reply to Nick Alexander :nalexander [he/him] from comment #15)
> So, what's really interesting is that I get GNU gold and everybody else gets
> some system linker.  GNU gold sets the GCC_USE_GNU_LD.  I think I get this
> 'cuz the build-tools are on my $PATH.  I'm trying a build with them not on
> my path and we'll see if I can reproduce the failure.

I can reproduce the failure!  So the issue is definitely that in some way we're invoking a macOS system linker, which is not the specified linker, since I have --enable-linker=lld.  I notice that freebl is built from gyp definitions; is it possible that our conversion from gyp isn't doing the right thing vis-a-vis linker definitions?  I will try to figure out how to force the linker that our clang toolchain finds to be a more sensible option.
You may want to completely remove the garbage that we have for determining GCC_USE_GNU_LD and instead add the necessary logic to select_linker in toolchain.configure, which has a more complete picture of what toolchain we're using.
Flags: needinfo?(nfroyd)
(select_linker also adds in Android-specific flags to pick the correct tools--as/ld/etc.--which may address the situation of people not having things on their PATH)
The failing invocation is:

 0:07.07 security/nss/lib/freebl/libfreebl3.so
 0:07.07 rm -f libfreebl3.so
 0:07.07 /usr/local/Cellar/icecream/1.1/libexec/icecc/bin/clang -std=gnu99 --target=i686-linux-android -Qunused-arguments -isystem /Users/nalexander/.mozbuild/android-ndk-r15c/platforms/android-16/arch-x86/usr/include -isystem /Users/nalexander/.mozbuild/android-ndk-r15c/platforms/android-16/arch-x86/usr/include -gcc-toolchain /Users/nalexander/.mozbuild/android-ndk-r15c/toolchains/x86-4.9/prebuilt/darwin-x86_64 -D__ANDROID_API__=16 -fno-short-enums -fno-exceptions -fno-strict-aliasing -fno-math-errno -pipe -g -Oz -fno-omit-frame-pointer -funwind-tables -Qunused-arguments -Wall -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wclass-varargs -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Wstring-conversion -Wtautological-overlap-compare -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments  -fPIC -shared -Wl,-h,libfreebl3.so -o libfreebl3.so /Users/nalexander/Mozilla/objdirs/objdir-droid-icecc3/security/nss/lib/freebl/freebl_freebl3/libfreebl3_so.list   -L/Users/nalexander/.mozbuild/android-ndk-r15c/platforms/android-16/arch-x86/usr/lib -Wl,-rpath-link=/Users/nalexander/.mozbuild/android-ndk-r15c/platforms/android-16/arch-x86/usr/lib --sysroot=/Users/nalexander/.mozbuild/android-ndk-r15c/platforms/android-16/arch-x86 -Wl,--allow-shlib-undefined -gcc-toolchain /Users/nalexander/.mozbuild/android-ndk-r15c/toolchains/x86-4.9/prebuilt/darwin-x86_64 -fuse-ld=lld -Wl,-z,noexecstack -Wl,-z,text -Wl,-z,relro -Wl,-z,nocopyreloc -Wl,--build-id=sha1 -Wl,--hash-style=sysv -Wl,-rpath-link,/Users/nalexander/Mozilla/objdirs/objdir-droid-icecc3/dist/bin -Wl,-rpath-link,/usr/local/lib     ../../../../libnss3.so

Just FYI, icecream doesn't seem to make any difference -- I can succeed with and without it, and I can fail with it -- so I've been using it, 'cuz it's so much faster.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
The successful invocation is:

security/nss/lib/freebl/libfreebl3.so
rm -f libfreebl3.so
/usr/local/Cellar/icecream/1.1/libexec/icecc/bin/clang -std=gnu99 --target=i686-linux-android -Qunused-arguments -isystem /Users/nalexander/.mozbuild/android-ndk-r15c/platforms/android-16/arch-x86/usr/include -isystem /Users/nalexander/.mozbuild/android-ndk-r15c/platforms/android-16/arch-x86/usr/include -gcc-toolchain /Users/nalexander/.mozbuild/android-ndk-r15c/toolchains/x86-4.9/prebuilt/darwin-x86_64 -D__ANDROID_API__=16 -fno-short-enums -fno-exceptions -fno-strict-aliasing -fno-math-errno -pipe -g -Oz -fno-omit-frame-pointer -funwind-tables -Qunused-arguments -Wall -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wclass-varargs -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Wstring-conversion -Wtautological-overlap-compare -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments  -fPIC -shared -Wl,-z,defs -Wl,-h,libfreebl3.so -o libfreebl3.so /Users/nalexander/Mozilla/objdirs/objdir-droid-icecc2/security/nss/lib/freebl/freebl_freebl3/libfreebl3_so.list   -L/Users/nalexander/.mozbuild/android-ndk-r15c/platforms/android-16/arch-x86/usr/lib -Wl,-rpath-link=/Users/nalexander/.mozbuild/android-ndk-r15c/platforms/android-16/arch-x86/usr/lib --sysroot=/Users/nalexander/.mozbuild/android-ndk-r15c/platforms/android-16/arch-x86 -Wl,--allow-shlib-undefined -gcc-toolchain /Users/nalexander/.mozbuild/android-ndk-r15c/toolchains/x86-4.9/prebuilt/darwin-x86_64 -fuse-ld=lld -Wl,-z,noexecstack -Wl,-z,text -Wl,-z,relro -Wl,-z,nocopyreloc -Wl,--build-id=sha1 -Wl,--hash-style=sysv -Wl,-rpath-link,/Users/nalexander/Mozilla/objdirs/objdir-droid-icecc2/dist/bin -Wl,-rpath-link,/usr/local/lib     ../../../../libnss3.so -Wl,--version-script,out.freebl.def

The changes are an additional -Wl,-z,defs and -Wl,--version-script,out.freebl.def on success.

If I understand correctly -Wl,-z,defs just complains about undefined functions, so I doubt there's anything significant there.
The mpi_x86.s compilation invocation is identical.  The resulting mpi_x86.o files differ by a single byte: they have the objdir encoded in them, which has a single character difference.
This is a cheap way to address a simple problem that has a complicated
ideal solution.  The underlying issue is that in some situations, when
targeting Android, a macOS system `ld` is interrogated to determine if
a cross-compiling linker "is GNU ld" and a particular linker feature
is set in that situation.  The macOS system `ld` doesn't pass the "is
GNU ld", and the linker feature isn't set; that causes link failures,
even though the actual linker has nothing to do with the system `ld`.

The ideal solution is to test for linker capabilities dynamically.  We
do a lot of that in old-configure.in, and we don't do any of that in
toolchain.configure.  Rather than start testing in
toolchain.configure, we hard-code: a cheap solution to the immediate
problem.
Attachment #9016467 - Attachment description: Bug 1489443 - Use -Wl,--version-script based on linker kind. r?froydnj → Bug 1489443 - Set GCC_USE_GNU_LD based on linker kind.

Comment 24

8 months ago
Pushed by nalexander@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/91300d29898b
Set GCC_USE_GNU_LD based on linker kind. r=froydnj
I thought MinGW was tier 2.  I guess I said "back me out if that's the right call on IRC", but was it the right call?

In any case: tjr, can you explain what the correct fix here is for mingw?  This patch changes linkers that are "command line compatible with GNU ld (bfd)" to actually behave like that, even if they don't say "GNU" in their version information.  Clearly whatever is building MinGW has a linker.KIND that is GNU ld compatible, but is not in fact GNU ld compatible.
Flags: needinfo?(tom)

Comment 27

7 months ago
(In reply to Nick Alexander :nalexander [he/him] from comment #26)
> In any case: tjr, can you explain what the correct fix here is for mingw? 
> This patch changes linkers that are "command line compatible with GNU ld
> (bfd)" to actually behave like that, even if they don't say "GNU" in their
> version information.  Clearly whatever is building MinGW has a linker.KIND
> that is GNU ld compatible, but is not in fact GNU ld compatible.

It's the opposite: we are ld compatible, but this isn't detecting it as such.

We had several NSS patches that added mingw support to NSS, and they all altered the build scripts to detect MinGW as "OS == win and cc_use_gnu_ld == 1".  You can see comments to this effect in the gyp files: https://searchfox.org/mozilla-central/search?q=mingw&case=false&regexp=false&path=*.gyp

Your patch stops setting GCC_USE_GNU_LD for MinGW.

The problem is that select_linker is never run because is_linker_option_enabled is false.  (Because it detects the target kernel as WINNT.) Since select_linker is never run, we don't have a KIND, so we don't set GCC_USE_GNU_LD and we fail all over the place.

In theory the fix is simple: 

> -@depends(target)
> -def is_linker_option_enabled(target):
> +@depends(target, c_compiler)
> +def is_linker_option_enabled(target, compiler):
>      if target.kernel not in ('WINNT', 'SunOS'):
>          return True
> +    if target.kernel == 'WINNT' and compiler.type == 'clang':
> +        return True

In practice this doesn't work:

> /builds/worker/workspace/build/src/sccache2/sccache /builds/worker/workspace/build/src/clang/bin/x86_64-w64-mingw32-clang -std=gnu99 -Wl,--version 
> lld-link: error: undefined symbol: WinMain
> >>> referenced by libmingw32.a(lib64_libmingw32_a-crt0_c.o):(main)
> clang-7: error: linker command failed with exit code 1 (use -v to see invocation)

But we can do something dirty....

> @@ -1768,6 +1770,12 @@ def select_linker(linker, c_compiler, de
>          if retcode == 1 and 'Logging ld64 options' in stderr:
>              kind = 'ld64'
>  
> +        # MinGW can't just print you out a version without giving it a lot of
> +        # extra stuff, but we can detect it and just fudge our way through:
> +        if retcode == 1 and 'referenced by libmingw32.a' in stderr and \
> +           'lld-link: error' in stderr:
> +             kind = 'lld'
> +


Here's a try run, hopefully it works: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cc34ededc45166ecd8e409694d86c5491d4c06a7
Flags: needinfo?(tom)

Comment 28

7 months ago
(In reply to Tom Ritter [:tjr] from comment #27)
> Here's a try run, hopefully it works:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=cc34ededc45166ecd8e409694d86c5491d4c06a7

Try 2: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cc60af1c227ff591922802ed3b1294131b3ec008
The desired outcome of this change is that we'll set
`-Wl,--version-script` based on linker kind and not on the output of
`$LINKER -v`.

This is a cheap way to address a simple problem that has a complicated
ideal solution.  The underlying issue is that in some situations, when
targeting Android, a macOS system `ld` is interrogated to determine if
a cross-compiling linker "is GNU ld" and a particular linker feature
is set in that situation.  The macOS system `ld` doesn't pass the "is
GNU ld" test, and the linker feature isn't set; that causes link
failures, even though the actual linker has nothing to do with the
system `ld`.

The ideal solution is to test for linker capabilities dynamically.  We
do a lot of that in old-configure.in, and we don't do any of that in
toolchain.configure.  Rather than start testing in
toolchain.configure, we hard-code: a cheap solution to the immediate
problem.

MinGW suffers somewhat from the opposite problem: the linker "is GNU
ld" (compatible), but the checks aren't happening at all.  We enable
`select_linker` for MinGW to follow the flow here as well.
It would be better if you reopened D8471 and updated it, so that interdiff works.
Attachment #9017396 - Attachment is obsolete: true
Flags: needinfo?(nalexander)
(In reply to Mike Hommey [:glandium] from comment #30)
> It would be better if you reopened D8471 and updated it, so that interdiff
> works.

I have done this.  Mike, something demands `--help`, and it looks like actually making that happen infects many, many layers of meta-programming in `toolchain.configure`.  Can you provide guidance, preferably in the form of a patch that makes the configure lint checks pass?
Assignee: nobody → nalexander
Yeah... it's better to not have any indirect dependency on c_compiler in a `when`. Keeping that would mean configure --help would run the compiler checks... (which is why the --help dependencies are explicit: to make it apparent what bad things are going to happen if you do)

Your best bet is to change the definition of GCC_USE_GNU_LD itself to be select_linker.KIND in ('bfd', 'gold', 'lld') or (target.kernel == 'WINNT' and compiler.type == 'clang') or None
(In reply to Mike Hommey [:glandium] from comment #32)
> Yeah... it's better to not have any indirect dependency on c_compiler in a
> `when`. Keeping that would mean configure --help would run the compiler
> checks... (which is why the --help dependencies are explicit: to make it
> apparent what bad things are going to happen if you do)

I did not know this.  Thanks.

> Your best bet is to change the definition of GCC_USE_GNU_LD itself to be
> select_linker.KIND in ('bfd', 'gold', 'lld') or (target.kernel == 'WINNT'
> and compiler.type == 'clang') or None

I think that tjr explains why this doesn't work in https://bugzilla.mozilla.org/show_bug.cgi?id=1489443#c27: specifically, select_linker isn't invoked, for $REASONS.  Any other approach?
Flags: needinfo?(mh+mozilla)
I doesn't matter that select_linker is not invoked, since you'd be basing the choise on select_linker *or* target/compiler.
Flags: needinfo?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #34)
> I doesn't matter that select_linker is not invoked, since you'd be basing
> the choise on select_linker *or* target/compiler.

This works well enough to reland, I think: https://treeherder.mozilla.org/#/jobs?repo=try&revision=77ba5c8dfea5a928e9a636d2f145d2c819b35f35

Thanks Mike, thanks tjr.

Comment 36

7 months ago
Pushed by nalexander@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/63248a593eae
Set GCC_USE_GNU_LD based on linker kind. r=froydnj

Comment 37

7 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/63248a593eae
Status: REOPENED → RESOLVED
Last Resolved: 9 months ago7 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 64
You need to log in before you can comment on or make changes to this bug.