linking rkv fails on Linux in CI due to "mdb.c:9303: undefined reference to `memalign'"
Categories
(Firefox Build System :: General, defect)
Tracking
(firefox67 fixed)
Tracking | Status | |
---|---|---|
firefox67 | --- | fixed |
People
(Reporter: myk, Assigned: glandium)
References
Details
Attachments
(1 file)
Comment 1•6 years ago
|
||
Comment 2•6 years ago
|
||
Reporter | ||
Comment 3•6 years ago
|
||
Updated•6 years ago
|
Reporter | ||
Comment 4•6 years ago
|
||
Here's a recent tryserver run showing the problem:
For example, in https://taskcluster-artifacts.net/MNJCIy2YTTSA7jKRb0NC0g/0/public/logs/live_backing.log, the build fails with these errors:
INFO - /builds/worker/workspace/build/src/binutils/bin/ld: /builds/worker/workspace/build/src/obj-firefox/x86_64-unknown-linux-gnu/debug/libgkrust.a(mdb.o): in function `mdb_env_copyfd1':
INFO - /builds/worker/workspace/build/src/third_party/rust/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:9303: undefined reference to `memalign'
INFO - /builds/worker/workspace/build/src/binutils/bin/ld: /builds/worker/workspace/build/src/obj-firefox/x86_64-unknown-linux-gnu/debug/libgkrust.a(gkrust_shared-4eaa280e53d004b9.gkrust_shared.9pegd2vp-cgu.13.rcgu.o): in function `<gkrust_shared::moz_memory::GeckoAlloc as core::alloc::GlobalAlloc>::alloc':
INFO - /builds/worker/workspace/build/src/toolkit/library/rust/shared/lib.rs:307: undefined reference to `memalign'
INFO - /builds/worker/workspace/build/src/binutils/bin/ld: /builds/worker/workspace/build/src/toolkit/library/rust/shared/lib.rs:307: undefined reference to `memalign'
INFO - /builds/worker/workspace/build/src/binutils/bin/ld: /builds/worker/workspace/build/src/toolkit/library/rust/shared/lib.rs:307: undefined reference to `memalign'
INFO - /builds/worker/workspace/build/src/binutils/bin/ld: final link failed: nonrepresentable section on output
In https://taskcluster-artifacts.net/FVh7hodtRRicT5Uc1pLf1Q/0/public/logs/live_backing.log, the build complains about "DWARF error: mangled line number section (bad file number)" before failing:
INFO - /builds/worker/workspace/build/src/binutils/bin/ld: DWARF error: mangled line number section (bad file number)
Interestingly, failure doesn't only occur on mdb.c:9303 but also on toolkit/library/rust/shared/lib.rs:307 in those two previous builds and on jsutil.cpp:123 in https://taskcluster-artifacts.net/eF4N16l3QvGGmkexdUUt2w/0/public/logs/live_backing.log:
INFO - /builds/worker/workspace/build/src/binutils/bin/ld: ../../js/src/build/libjs_static.a(jsutil.i_o): in function `js::AllTheNonBasicVanillaNewAllocations()':
INFO - /builds/worker/workspace/build/src/js/src/jsutil.cpp:123: undefined reference to `memalign'
All of these builds have both `HAVE_MEMALIGN` and `HAVE_POSIX_MEMALIGN` defined. They all also warn:
WARNING - [lmdb-sys 0.8.0] cargo:warning=/builds/worker/workspace/build/src/third_party/rust/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:9303:18: warning: implicit declaration of function 'memalign' is invalid in C99 [-Wimplicit-function-declaration]
INFO - [lmdb-sys 0.8.0] cargo:warning= my.mc_wbuf[0] = memalign(env->me_os_psize, MDB_WBUF*2);
INFO - [lmdb-sys 0.8.0] cargo:warning= ^
But I don't know if that's related (nor how, if so). And I still haven't been able to reproduce locally.
Ted: do you have any suggestions for how I might go about reproducing and debugging this issue?
Reporter | ||
Comment 5•6 years ago
•
|
||
mdb.c defines HAVE_MEMALIGN and includes malloc.h only on Sun and Android:
#if defined(__sun) || defined(ANDROID)
/* Most platforms have posix_memalign, older may only have memalign */
#define HAVE_MEMALIGN 1
#include <malloc.h>
/* On Solaris, we need the POSIX sigwait function */
#if defined (__sun)
# define _POSIX_PTHREAD_SEMANTICS 1
#endif
#endif
(https://github.com/LMDB/lmdb/blob/LMDB_0.9.23/libraries/liblmdb/mdb.c#L112-L120)
It then calls memalign() if HAVE_MEMALIGN is defined:
#ifdef HAVE_MEMALIGN
my.mc_wbuf[0] = memalign(env->me_os_psize, MDB_WBUF*2);
if (my.mc_wbuf[0] == NULL) {
rc = errno;
goto done;
}
#else
{
void *p;
if ((rc = posix_memalign(&p, env->me_os_psize, MDB_WBUF*2)) != 0)
goto done;
my.mc_wbuf[0] = p;
}
#endif
(https://github.com/LMDB/lmdb/blob/LMDB_0.9.23/libraries/liblmdb/mdb.c#L9339-L9352)
In this case, we aren't on Sun nor Android, so malloc.h isn't included, and HAVE_MEMALIGN isn't defined by mdb.c. But HAVE_MEMALIGN has already been defined by the Firefox build system, so mdb.c ends up calling memalign(), despite not having included malloc.h, which triggers the implicit-function-declaration warning and perhaps also causes the linker issue.
I still don't know why the problem doesn't reproduce locally, where the build system also sets HAVE_MEMALIGN. But perhaps a workaround (which doesn't require changing LMDB nor Firefox) is for the lmdb-rkv build.rs script to unset HAVE_MEMALIGN on systems that HAVE_POSIX_MEMALIGN. I'll tryserver that.
Reporter | ||
Comment 6•6 years ago
|
||
Reporter | ||
Comment 7•6 years ago
|
||
Reporter | ||
Comment 8•6 years ago
•
|
||
(In reply to Myk Melez [:myk] [@mykmelez] from comment #5)
I still don't know why the problem doesn't reproduce locally, where the build system also sets HAVE_MEMALIGN. But perhaps a workaround (which doesn't require changing LMDB nor Firefox) is for the lmdb-rkv build.rs script to unset HAVE_MEMALIGN on systems that HAVE_POSIX_MEMALIGN. I'll tryserver that.
Hmm, no this won't work, because HAVE_MEMALIGN isn't defined in the environment, it's defined in mozilla-config.h, which gets included via CFLAGS; and HAVE_MEMALIGN can't be subsequently undefined in the build.rs script, as far as I can tell.
Reporter | ||
Comment 9•6 years ago
|
||
Over in https://github.com/mozilla/lmdb-rs/pull/28, I work around the problem by undefining HAVE_MEMALIGN via a header file that the lmdb-sys build.rs script includes.
Ted: I'm still interested in your feedback on that approach, either here in this bug or there in that issue.
Assignee | ||
Comment 10•6 years ago
|
||
Note that with the stack of patches from bug 1522609 and bug 1522614, mozilla-config.h should end up force-included in mdb.c, although that wouldn't happen for e.g. asan builds, but we could change that.
Assignee | ||
Comment 11•6 years ago
|
||
err, but that was already the case and that's what the problem was.
Assignee | ||
Comment 12•6 years ago
|
||
Come to think of it, it would probably be better if we didn't force-include mozilla-config.h in rust-built C/C++ code.
Comment 13•6 years ago
|
||
FWIW, I can reproduce locally on a linux build with https://phabricator.services.mozilla.com/D17668 from bug 1429796.
Comment 14•6 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #12)
Come to think of it, it would probably be better if we didn't force-include mozilla-config.h in rust-built C/C++ code.
Yeah, this sounds like the most sensible thing. Our autoconf defines are pretty specific to our needs.
Assignee | ||
Comment 15•6 years ago
•
|
||
Funny thing, fixing this breaks Android with:
third_party/rust/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:4853:53: error: use of undeclared identifier 'PTHREAD_MUTEX_ROBUST'
which is because mdb.c relies on the ANDROID
macro, which is not something the compiler sets, __ANDROID__
is (and it works now because we do set ANDROID
from mozilla-config.h). Interestingly, this was fixed upstream in http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=commit;h=0a954f1a67410dceb0fafe202bd6c2f2f409cb4a but didn't make it into a release.
Assignee | ||
Comment 16•6 years ago
|
||
Reporter | ||
Comment 17•6 years ago
•
|
||
(In reply to Mike Hommey [:glandium] from comment #15)
Interestingly, this was fixed upstream in http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=commit;h=0a954f1a67410dceb0fafe202bd6c2f2f409cb4a but didn't make it into a release.
FWIW, according to the commit log for the mdb.master branch (https://github.com/LMDB/lmdb/commits/mdb.master), which is where version 1.0 is being developed, it also didn't make it onto that branch until January 17 of this year, even though it was committed on April 12, 2017. I suspect that it was cherry-picked from some other branch, but I don't know which one that would be, since it clearly isn't the stable branch.
Other fixes have landed on mdb.master but not on the mdb.RE/0.9 branch (https://github.com/LMDB/lmdb/commits/mdb.RE/0.9), i.e. the current stable 0.9 release, which is what we're using in rkv. For example, https://github.com/mozilla/lmdb-rs/pull/23 requests switching to mdb.master to fix an issue with the initial size of database files on Windows.
(I don't intend to switch to mdb.master, which is an unstable development branch; but I'm open to backporting patches from it, and I expect we'll find more that we'll want to backport.)
Comment 18•6 years ago
|
||
Comment 19•6 years ago
|
||
bugherder |
Description
•