Closed Bug 1626932 Opened 4 years ago Closed 4 years ago

trying to upgrade the Android SDK to version 29 triggers weird aidl execution failures

Categories

(Firefox Build System :: Android Studio and Gradle Integration, defect)

defect
Not set
normal

Tracking

(firefox76 fixed)

RESOLVED FIXED
mozilla76
Tracking Status
firefox76 --- fixed

People

(Reporter: froydnj, Assigned: froydnj)

References

Details

Attachments

(1 file)

Explanation forthcoming in the commit message for the patch I am about to post.

It started, as most things do, with a software upgrade.

Upgrading the Android SDK version produced mysterious build errors:

java.lang.RuntimeException: ... Error while executing process /builds/worker/fetches/android-sdk-linux/build-tools/29.0.3/aidl ...

Looking a little further back in the log, one found the mysterious
error:

/builds/worker/fetches/android-sdk-linux/build-tools/29.0.3/aidl: error while loading shared libraries: /builds/worker/fetches/clang/lib/libc++.so: file too short

And when one investigated that file, one found that all libc++.so contained
was:

INPUT(libc++.so.1 -lc++abi)

which was obviously not any kind of loadable shared library.

The libc++.so file was meant as a redirect, an artifact placed so people
could say -lc++ on their compiler command lines and things would Just Work.
The intent was that programs would link to libc++.so.1 (or whatever other
soversion was in use) and that they wouldn't have to worry about -lc++abi or
any other private implementation details of libc++. Why, then, was aidl
linking to libc++.so?

Looking at ldd aidl on a local machine says:

...
libc++.so => $HOME/.mozbuild/android-sdk-linux/build-tools/29.0.3/./lib64/libc++.so (0x00007f8b4b2d1000)
...

and poking at aidl with readelf --dynamic -W indicated a partial solution
to the problem:

...
0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/../lib64:$ORIGIN/lib64]
...

aidl contained a DT_RUNPATH entry for libc++.so that specified to the
linker "find my libc++.so at a path relative to my current binary location."
Presumably this was done so that aidl would link to a specific libc++.so
and not pick up a random libc++.so from the system on which aidl was
running.

...Except that setting LD_LIBRARY_PATH, which we specify during our builds for
various reasons, takes precedence over DT_RUNPATH. So invoking aidl was
looking for libc++.so along LD_LIBRARY_PATH, finding this linker script
that was meant to be used only by the compiler, and attempting to use it like
a real shared library.

Great, we understand the problem now; what do we do about it?

Unwinding our use of LD_LIBRARY_PATH is an involved task. One further
wrinkle that wasn't mentioned above is that not every Android build that we
run failed after the SDK update: our x86-64 fuzzing build was fine, as was the
ARM PGO instrumentation build. Presumably, those builds set LD_LIBRARY_PATH
in slightly different ways compared to our other builds, and those differences
in setting LD_LIBRARY_PATH contribute to aidl somehow not getting invoked
with LD_LIBRARY_PATH set. It is not obvious to me how difficult getting a
consistent LD_LIBRARY_PATH setting is.

This next part is conjecture -- informed conjecture, but conjecture
nonetheless. When we invoke gradle for the first time, a separate daemon
process is started, presumably to make various Java startup problems go away.
What we would like to have happen, and what appears to happen, is that
everything that gradle does is actually spawned from the daemon itself, not
from the invoked gradle process. So if we could get the daemon invoked with
the correct environment (i.e. no LD_LIBRARY_PATH present), everything would
just work, because aidl would be launched from an environment that will
respect its DT_RUNPATH setting.

This change assumes that gradle does not invoke things that depend on
LD_LIBRARY_PATH, which is not at all clear; I think there are cases where we
can go mach -> gradle -> mach -> ... -> clang. But those cases don't seem
to come up in automation (perhaps due to every build being a clobber), and if
they come up on local developer machines, local developer machines seem
unlikely to have LD_LIBRARY_PATH set (mandatory invocation of xkcd#1172).

All of that is to say that this patch takes a semi-hacky approach to the
problem: when we invoke gradle, we make sure that we're invoking it with an
environment that doesn't contain LD_LIBRARY_PATH. And doing so avoids all
of the problems outlined above.

Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7f9799e1d3d5
delete LD_LIBRARY_PATH before invoking gradle; r=firefox-build-system-reviewers,rstewart
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: