Closed Bug 1512541 Opened 9 months ago Closed 7 months ago

linking rkv fails on Linux in CI due to "mdb.c:9303: undefined reference to `memalign'"

Categories

(Firefox Build System :: General, defect)

defect
Not set

Tracking

(firefox67 fixed)

RESOLVED FIXED
mozilla67
Tracking Status
firefox67 --- fixed

People

(Reporter: myk, Assigned: glandium)

References

Details

Attachments

(1 file)

Building with the patch in bug 1490496 fails to link on Linux in CI due to "third_party/rust/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:9303: undefined reference to `memalign'", f.e. in <https://taskcluster-artifacts.net/dLYjZKdMSpujKDyMA-xivw/0/public/logs/live_backing.log>:

> INFO -  make[4]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/toolkit/library'
> INFO -  /builds/worker/workspace/build/src/sccache2/sccache /builds/worker/workspace/build/src/clang/bin/clang++ --target=i686-linux-gnu -Qunused-arguments -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -Qunused-arguments -Wall -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wwrite-strings -Wno-invalid-offsetof -Wclass-varargs -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Wc++1z-compat -Wc++2a-compat -Wcomma -Wimplicit-fallthrough -Werror=non-literal-null-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-enum-zero-compare -Wtautological-unsigned-zero-compare -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=return-std-move -Wno-error=atomic-alignment -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments -Wno-unknown-warning-option -Wno-return-type-c-linkage -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -march=pentium-m -msse -msse2 -mfpmath=sse -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -Xclang -load -Xclang /builds/worker/workspace/build/src/obj-firefox/build/clang-plugin/libclang-plugin.so -Xclang -add-plugin -Xclang moz-check -O2 -fno-omit-frame-pointer -funwind-tables -Werror  -fPIC -shared -Wl,-z,defs -Wl,--gc-sections -Wl,-h,libxul.so -o libxul.so /builds/worker/workspace/build/src/obj-firefox/toolkit/library/libxul_so.list   -lpthread -Wl,-z,noexecstack -Wl,-z,text -Wl,-z,relro -Wl,-z,nocopyreloc -Wl,-Bsymbolic-functions -Wl,--build-id=sha1 /builds/worker/workspace/build/src/toolkit/library/StaticXULComponents.ld -Wl,-rpath-link,/builds/worker/workspace/build/src/obj-firefox/dist/bin -Wl,-rpath-link,/usr/local/lib   ../../security/nss/lib/crmf/crmf_crmf/libcrmf.a ../../js/src/build/libjs_static.a /builds/worker/workspace/build/src/obj-firefox/i686-unknown-linux-gnu/release/libgkrust.a ../../security/sandbox/linux/libmozsandbox.so ../../config/external/nspr/pr/libnspr4.so ../../config/external/nspr/libc/libplc4.so ../../config/external/nspr/ds/libplds4.so ../../config/external/lgpllibs/liblgpllibs.so ../../security/nss/lib/nss/nss_nss3/libnss3.so ../../security/nss/lib/util/util_nssutil3/libnssutil3.so ../../security/nss/lib/smime/smime_smime3/libsmime3.so ../../config/external/sqlite/libmozsqlite3.so ../../security/nss/lib/ssl/ssl_ssl3/libssl3.so ../../widget/gtk/mozgtk/stub/libmozgtk_stub.so ../../widget/gtk/mozwayland/libmozwayland.so -Wl,--version-script,symverscript  -ldl  -lrt -lm -lX11 -lX11-xcb -lxcb -lXcomposite -lXcursor -lXdamage -lXext -lXfixes -lXi -lXrender -lpthread -ldl -lc -latomic -lfreetype -lfontconfig -Wl,--version-script,/builds/worker/workspace/build/src/build/unix/stdc++compat/hide_std.ld -ldbus-glib-1 -ldbus-1 -lgobject-2.0 -lglib-2.0 -latk-1.0 -lpangocairo-1.0 -lgdk_pixbuf-2.0 -lcairo-gobject -lpango-1.0 -lcairo -lgio-2.0 -lxcb-shm -lpangoft2-1.0 -lXt -lgthread-2.0
> INFO -  /builds/worker/workspace/build/src/binutils/bin/ld: /builds/worker/workspace/build/src/obj-firefox/i686-unknown-linux-gnu/release/libgkrust.a(mdb.o): in function `mdb_env_copyfd1':
> INFO -  /builds/worker/workspace/build/src/third_party/rust/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:9303: undefined reference to `memalign'
> INFO -  clang-7: error: linker command failed with exit code 1 (use -v to see invocation)

While building, memalign on that line generates warnings but not errors:

>    INFO -       Running `/builds/worker/workspace/build/src/sccache2/sccache /builds/worker/workspace/build/src/rustc/bin/rustc --crate-name crossbeam_deque /builds/worker/workspace/build/src/third_party/rust/crossbeam-deque-0.2.0/src/lib.rs --color never --crate-type lib --emit=dep-info,link -C opt-level=2 -C panic=abort -C codegen-units=1 -C metadata=1306a06ea3184152 -C extra-filename=-1306a06ea3184152 --out-dir /builds/worker/workspace/build/src/obj-firefox/i686-unknown-linux-gnu/release/deps --target i686-unknown-linux-gnu -C linker=/builds/worker/workspace/build/src/build/cargo-linker -L dependency=/builds/worker/workspace/build/src/obj-firefox/i686-unknown-linux-gnu/release/deps -L dependency=/builds/worker/workspace/build/src/obj-firefox/release/deps --extern crossbeam_epoch=/builds/worker/workspace/build/src/obj-firefox/i686-unknown-linux-gnu/release/deps/libcrossbeam_epoch-68705feaadc56cd7.rlib --extern crossbeam_utils=/builds/worker/workspace/build/src/obj-firefox/i686-unknown-linux-gnu/release/deps/libcrossbeam_utils-78b6a317a9c75716.rlib --cap-lints warn -C opt-level=2 -C debuginfo=2`
> WARNING -  cargo:warning=/builds/worker/workspace/build/src/third_party/rust/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:9303:18: warning: implicit declaration of function 'memalign' is invalid in C99 [-Wimplicit-function-declaration]
>    INFO -  cargo:warning=        my.mc_wbuf[0] = memalign(env->me_os_psize, MDB_WBUF*2);
>    INFO -  cargo:warning=                        ^
> WARNING -  cargo:warning=/builds/worker/workspace/build/src/third_party/rust/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:9303:16: warning: incompatible integer to pointer conversion assigning to 'char *' from 'int' [-Wint-conversion]
>    INFO -  cargo:warning=        my.mc_wbuf[0] = memalign(env->me_os_psize, MDB_WBUF*2);
>    INFO -  cargo:warning=                      ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    INFO -  cargo:warning=2 warnings generated.

I haven't been able to reproduce locally.
Looks like it has to do with feature test macros, liblmdb follows _GNU_SOURCE, while `memalign` requires "_POSIX_C_SOURCE >= 200112L || _XOPEN_SOURCE >= 600" accrording to https://linux.die.net/man/3/memalign.

Also, the man page says this function is obsolete to favor posix_memalign. Perhaps we should report this to the upstream.
Any chance this failure actually took place on Android?

Looking into the source code of liblmdb, this line [1] seems suspicious to me

From [2], memalign seems obsolete since Android 4.0

[1]: https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/mdb.c#L147
[2]: https://codereview.chromium.org/11364047/diff/1004/src/shared/platform/posix/aligned_malloc.c
Hmm, no, this was a Linux for desktop build.  The worker type is gecko-1-b-linux, and the Android dependencies aren't available on it:

> INFO -  js/src> checking for android platform directory... no
> INFO -  js/src> checking for android sysroot directory... no
> INFO -  js/src> checking for android system directory... no
> …
> INFO -  js/src> checking for android ndk version... no

Here's the tryserver push where I noticed the problem:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b78d6a6ba29e673afb929896127142cac987cfa&selectedJob=214744483

Note that most of the Linux builds failed on that push.  I haven't checked them all, but at least one other Linux build failed for the same reason.  My local build on an Ubuntu 18.04 VM succeeds, however.  And some of the Linux builds on that push succeeded, such as the Btup build and the asan builds.
Product: Toolkit → Firefox Build System
Blocks: 1515094

Here's a recent tryserver run showing the problem:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c2bf1f1e907ae1ac3ad1341ff1de660fc4739bab&selectedJob=223405469

For example, in https://taskcluster-artifacts.net/MNJCIy2YTTSA7jKRb0NC0g/0/public/logs/live_backing.log, the build fails with these errors:

INFO - /builds/worker/workspace/build/src/binutils/bin/ld: /builds/worker/workspace/build/src/obj-firefox/x86_64-unknown-linux-gnu/debug/libgkrust.a(mdb.o): in function `mdb_env_copyfd1':
INFO - /builds/worker/workspace/build/src/third_party/rust/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:9303: undefined reference to `memalign'
INFO - /builds/worker/workspace/build/src/binutils/bin/ld: /builds/worker/workspace/build/src/obj-firefox/x86_64-unknown-linux-gnu/debug/libgkrust.a(gkrust_shared-4eaa280e53d004b9.gkrust_shared.9pegd2vp-cgu.13.rcgu.o): in function `<gkrust_shared::moz_memory::GeckoAlloc as core::alloc::GlobalAlloc>::alloc':
INFO - /builds/worker/workspace/build/src/toolkit/library/rust/shared/lib.rs:307: undefined reference to `memalign'
INFO - /builds/worker/workspace/build/src/binutils/bin/ld: /builds/worker/workspace/build/src/toolkit/library/rust/shared/lib.rs:307: undefined reference to `memalign'
INFO - /builds/worker/workspace/build/src/binutils/bin/ld: /builds/worker/workspace/build/src/toolkit/library/rust/shared/lib.rs:307: undefined reference to `memalign'
INFO - /builds/worker/workspace/build/src/binutils/bin/ld: final link failed: nonrepresentable section on output

In https://taskcluster-artifacts.net/FVh7hodtRRicT5Uc1pLf1Q/0/public/logs/live_backing.log, the build complains about "DWARF error: mangled line number section (bad file number)" before failing:

INFO - /builds/worker/workspace/build/src/binutils/bin/ld: DWARF error: mangled line number section (bad file number)

Interestingly, failure doesn't only occur on mdb.c:9303 but also on toolkit/library/rust/shared/lib.rs:307 in those two previous builds and on jsutil.cpp:123 in https://taskcluster-artifacts.net/eF4N16l3QvGGmkexdUUt2w/0/public/logs/live_backing.log:

INFO - /builds/worker/workspace/build/src/binutils/bin/ld: ../../js/src/build/libjs_static.a(jsutil.i_o): in function `js::AllTheNonBasicVanillaNewAllocations()':
INFO - /builds/worker/workspace/build/src/js/src/jsutil.cpp:123: undefined reference to `memalign'

All of these builds have both `HAVE_MEMALIGN` and `HAVE_POSIX_MEMALIGN` defined. They all also warn:

WARNING - [lmdb-sys 0.8.0] cargo:warning=/builds/worker/workspace/build/src/third_party/rust/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:9303:18: warning: implicit declaration of function 'memalign' is invalid in C99 [-Wimplicit-function-declaration]
INFO - [lmdb-sys 0.8.0] cargo:warning= my.mc_wbuf[0] = memalign(env->me_os_psize, MDB_WBUF*2);
INFO - [lmdb-sys 0.8.0] cargo:warning= ^

But I don't know if that's related (nor how, if so). And I still haven't been able to reproduce locally.

Ted: do you have any suggestions for how I might go about reproducing and debugging this issue?

Flags: needinfo?(ted)

mdb.c defines HAVE_MEMALIGN and includes malloc.h only on Sun and Android:

#if defined(__sun) || defined(ANDROID)
/* Most platforms have posix_memalign, older may only have memalign */
#define HAVE_MEMALIGN	1
#include <malloc.h>
/* On Solaris, we need the POSIX sigwait function */
#if defined (__sun)
# define _POSIX_PTHREAD_SEMANTICS	1
#endif
#endif

(https://github.com/LMDB/lmdb/blob/LMDB_0.9.23/libraries/liblmdb/mdb.c#L112-L120)

It then calls memalign() if HAVE_MEMALIGN is defined:

#ifdef HAVE_MEMALIGN
	my.mc_wbuf[0] = memalign(env->me_os_psize, MDB_WBUF*2);
	if (my.mc_wbuf[0] == NULL) {
		rc = errno;
		goto done;
	}
#else
	{
		void *p;
		if ((rc = posix_memalign(&p, env->me_os_psize, MDB_WBUF*2)) != 0)
			goto done;
		my.mc_wbuf[0] = p;
	}
#endif

(https://github.com/LMDB/lmdb/blob/LMDB_0.9.23/libraries/liblmdb/mdb.c#L9339-L9352)

In this case, we aren't on Sun nor Android, so malloc.h isn't included, and HAVE_MEMALIGN isn't defined by mdb.c. But HAVE_MEMALIGN has already been defined by the Firefox build system, so mdb.c ends up calling memalign(), despite not having included malloc.h, which triggers the implicit-function-declaration warning and perhaps also causes the linker issue.

I still don't know why the problem doesn't reproduce locally, where the build system also sets HAVE_MEMALIGN. But perhaps a workaround (which doesn't require changing LMDB nor Firefox) is for the lmdb-rkv build.rs script to unset HAVE_MEMALIGN on systems that HAVE_POSIX_MEMALIGN. I'll tryserver that.

(In reply to Myk Melez [:myk] [@mykmelez] from comment #5)

I still don't know why the problem doesn't reproduce locally, where the build system also sets HAVE_MEMALIGN. But perhaps a workaround (which doesn't require changing LMDB nor Firefox) is for the lmdb-rkv build.rs script to unset HAVE_MEMALIGN on systems that HAVE_POSIX_MEMALIGN. I'll tryserver that.

Hmm, no this won't work, because HAVE_MEMALIGN isn't defined in the environment, it's defined in mozilla-config.h, which gets included via CFLAGS; and HAVE_MEMALIGN can't be subsequently undefined in the build.rs script, as far as I can tell.

Over in https://github.com/mozilla/lmdb-rs/pull/28, I work around the problem by undefining HAVE_MEMALIGN via a header file that the lmdb-sys build.rs script includes.

Ted: I'm still interested in your feedback on that approach, either here in this bug or there in that issue.

Note that with the stack of patches from bug 1522609 and bug 1522614, mozilla-config.h should end up force-included in mdb.c, although that wouldn't happen for e.g. asan builds, but we could change that.

err, but that was already the case and that's what the problem was.

Come to think of it, it would probably be better if we didn't force-include mozilla-config.h in rust-built C/C++ code.

Assignee: nobody → mh+mozilla

(In reply to Mike Hommey [:glandium] from comment #12)

Come to think of it, it would probably be better if we didn't force-include mozilla-config.h in rust-built C/C++ code.

Yeah, this sounds like the most sensible thing. Our autoconf defines are pretty specific to our needs.

Flags: needinfo?(ted)

Funny thing, fixing this breaks Android with:

third_party/rust/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:4853:53: error: use of undeclared identifier 'PTHREAD_MUTEX_ROBUST'

which is because mdb.c relies on the ANDROID macro, which is not something the compiler sets, __ANDROID__ is (and it works now because we do set ANDROID from mozilla-config.h). Interestingly, this was fixed upstream in http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=commit;h=0a954f1a67410dceb0fafe202bd6c2f2f409cb4a but didn't make it into a release.

Depends on: 1525219
Depends on: 1522609

(In reply to Mike Hommey [:glandium] from comment #15)

Interestingly, this was fixed upstream in http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=commit;h=0a954f1a67410dceb0fafe202bd6c2f2f409cb4a but didn't make it into a release.

FWIW, according to the commit log for the mdb.master branch (https://github.com/LMDB/lmdb/commits/mdb.master), which is where version 1.0 is being developed, it also didn't make it onto that branch until January 17 of this year, even though it was committed on April 12, 2017. I suspect that it was cherry-picked from some other branch, but I don't know which one that would be, since it clearly isn't the stable branch.

Other fixes have landed on mdb.master but not on the mdb.RE/0.9 branch (https://github.com/LMDB/lmdb/commits/mdb.RE/0.9), i.e. the current stable 0.9 release, which is what we're using in rkv. For example, https://github.com/mozilla/lmdb-rs/pull/23 requests switching to mdb.master to fix an issue with the initial size of database files on Windows.

(I don't intend to switch to mdb.master, which is an unstable development branch; but I'm open to backporting patches from it, and I expect we'll find more that we'll want to backport.)

Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/d408a5f9a9ea
Don't force-include mozilla-config.h in rust-built C/C++ code. r=mshal
Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
You need to log in before you can comment on or make changes to this bug.