Closed Bug 1290638 Opened 8 years ago Closed 7 years ago

Shared objects (.so files) in Android build artifacts are not ELF binaries

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: robwu, Assigned: robwu)

References

Details

I am trying to get "mach xpcshell-test" to succeed on Fennec, with artifact builds. After setting up a working emulator (bug 1290627), I ran "mach xpcshell-test".


1. At first the command failed due to a missing xpcshell binary with the error "/data/local/xpcb/xpcw[18]: /data/local/xpcb/xpcshell: not found".
Luckily I already did a full Fennec build (with ndk and all) before trying out artifact builds, so I copied the file to objdir-frontend/dist/bin/xpcshell and retried.

2. Then it failed with the error "CANNOT LINK EXECUTABLE: could not load library "libxul.so" needed by "/data/local/xpcb/xpcshell"; caused by "libnss3.so" has bad ELF magic" (this was also reported in a comment on bug 1260916).
When I copied libnss3.so from the full Fennec build to objdir-frontend/dist/bin/libnss3.so, repackaged and reuploaded, the "bad ELF magic" message repeated (with libxul.so instead of libnss3.so).

The magic bytes of the .so files in the Android emulator look bad indeed (SeZz instead of <7F>ELF):
~> adb shell
root@generic:/ # for f in $(busybox find /data/local/xpcb -name '*.so'); do busybox printf "%-44s" "$f" && busybox head -c8 "$f" | busybox hexdump | busybox head -n1 ;done
/data/local/xpcb/libplugin-container.so     0000000 457f 464c 0101 0001
/data/local/xpcb/libplugin-container-pie.so 0000000 457f 464c 0101 0001
/data/local/xpcb/libmozglue.so              0000000 457f 464c 0101 0001
/data/local/xpcb/libxul.so                  0000000 6553 7a5a a6c5 0190
/data/local/xpcb/libsoftokn3.so             0000000 6553 7a5a 6c75 0001
/data/local/xpcb/libomxpluginkk.so          0000000 6553 7a5a 83d1 0000
/data/local/xpcb/libomxplugin.so            0000000 6553 7a5a 7c31 0000
/data/local/xpcb/libnssckbi.so              0000000 6553 7a5a 94df 0003
/data/local/xpcb/libnss3.so                 0000000 457f 464c 0101 0001
/data/local/xpcb/liblgpllibs.so             0000000 6553 7a5a 8776 0000
/data/local/xpcb/libfreebl3.so              0000000 6553 7a5a 5646 0002
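Note that busybox hexdump, in its default format, prints 16-bit words in the device's little-endian byte order, so each byte pair above appears swapped: 457f 464c is the byte sequence 7F 45 4C 46 ("\x7fELF") and 6553 7a5a is 53 65 5A 7A ("SeZz"). A small Python sketch that undoes the swap, assuming the word-per-column output shown above (the helper names are hypothetical):

```python
import struct


def hexdump_words_to_bytes(line):
    """Convert a default-format hexdump line ("offset word word ...") to raw bytes.

    Assumes every column after the offset is a 16-bit word printed in
    little-endian host order, as on the ARM emulator.
    """
    fields = line.split()
    words = [int(w, 16) for w in fields[1:]]  # skip the offset column
    return b"".join(struct.pack("<H", w) for w in words)


def describe_magic(data):
    """Name the file type suggested by the first four bytes."""
    if data.startswith(b"\x7fELF"):
        return "ELF"
    if data.startswith(b"SeZz"):
        return "szip"
    return "unknown"


for line in ["0000000 457f 464c 0101 0001", "0000000 6553 7a5a a6c5 0190"]:
    raw = hexdump_words_to_bytes(line)
    print(raw, describe_magic(raw))
```

Decoded this way, the emulator dump agrees byte for byte with the xxd listings of the same files.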

This is what my GOOD full Fennec build looks like:

~/mozilla-central/obj-android/dist/bin> for f in *.so ; do printf "%-30s" "$f" && head -c8 "$f" | xxd -c8 ;done
libfreebl3.so                 00000000: 7f45 4c46 0101 0100  .ELF....
liblgpllibs.so                00000000: 7f45 4c46 0101 0100  .ELF....
liblogalloc.so                00000000: 7f45 4c46 0101 0100  .ELF....
libmozglue.so                 00000000: 7f45 4c46 0101 0100  .ELF....
libnss3.so                    00000000: 7f45 4c46 0101 0100  .ELF....
libnssckbi.so                 00000000: 7f45 4c46 0101 0100  .ELF....
libomxplugin.so               00000000: 7f45 4c46 0101 0100  .ELF....
libomxpluginkk.so             00000000: 7f45 4c46 0101 0100  .ELF....
libplugin-container-pie.so    00000000: 7f45 4c46 0101 0100  .ELF....
libplugin-container.so        00000000: 7f45 4c46 0101 0100  .ELF....
libreplace_jemalloc.so        00000000: 7f45 4c46 0101 0100  .ELF....
libreplace_malloc.so          00000000: 7f45 4c46 0101 0100  .ELF....
libsoftokn3.so                00000000: 7f45 4c46 0101 0100  .ELF....
libxul.so                     00000000: 7f45 4c46 0101 0100  .ELF....

When I unpack the cached artifact in ~/.mozbuild/package-frontend/, I get files with the same headers as the bad .so files in the Android emulator:
~/.mozbuild/package-frontend/> 7z x f911cbb3a7e171c3-public%2Fbuild%2Ffennec-50.0a1.en-US.android-arm.apk assets/armeabi-v7a
~/.mozbuild/package-frontend> cd assets/armeabi-v7a
~/.mozbuild/package-frontend/assets/armeabi-v7a> for f in *.so ; do printf "%-30s" "$f" && head -c8 "$f" | xxd -c8 ;done
libfreebl3.so                 00000000: 5365 5a7a 4656 0200  SeZzFV..
liblgpllibs.so                00000000: 5365 5a7a 7687 0000  SeZzv...
libnss3.so                    00000000: 5365 5a7a ff0b 0e00  SeZz....
libnssckbi.so                 00000000: 5365 5a7a df94 0300  SeZz....
libomxplugin.so               00000000: 5365 5a7a 317c 0000  SeZz1|..
libomxpluginkk.so             00000000: 5365 5a7a d183 0000  SeZz....
libsoftokn3.so                00000000: 5365 5a7a 756c 0100  SeZzul..
libxul.so                     00000000: 5365 5a7a c5a6 9001  SeZz....
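The shell loops above can also be written as a short Python sketch that classifies every .so file in a directory by its first four bytes, which makes it easy to compare an unpacked artifact with a full build. The function names are illustrative and not part of mach:

```python
import os

ELF_MAGIC = b"\x7fELF"
SZIP_MAGIC = b"SeZz"


def classify(path):
    """Return "ELF", "szip" or "unknown" based on the file's first four bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == ELF_MAGIC:
        return "ELF"
    if head == SZIP_MAGIC:
        return "szip"
    return "unknown"


def scan(directory):
    """Map each *.so file in `directory` (non-recursive) to its detected format."""
    return {
        name: classify(os.path.join(directory, name))
        for name in sorted(os.listdir(directory))
        if name.endswith(".so")
    }
```

For example, scan("assets/armeabi-v7a") run from ~/.mozbuild/package-frontend should report "szip" for every library in the listing above, while the same call on obj-android/dist/bin should report "ELF" throughout.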

There seems to be a flaw in either the packaging process (corrupting the .so files before archiving the Fennec build) or the unpacking process (not properly unpacking the .so files). Please help with getting it to work.
This is intentional, at least in the APK: the Fennec library loader decompresses szip-ed .so files on the fly.

I forget how it's supposed to work for xpcshell tests; in fact, I'm not sure artifact builds support xpcshell tests on Android.  chmanchester, gbrown -- do you know if this did/should work?
Flags: needinfo?(gbrown)
Flags: needinfo?(cmanchester)
See Also: → 900508
It ends up here: http://searchfox.org/mozilla-central/rev/1112b7a5222b71a3b5b68bd531f50ded6bcbc770/testing/xpcshell/remotexpcshelltests.py#441

                # If the test package doesn't contain szip, it means files
                # are not szipped in the test package.

Clearly this assumption is wrong.
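One way to avoid relying on that assumption is to inspect each library's header and only require szip for files that are actually compressed. A minimal sketch of the idea (needs_decompression is a hypothetical helper, not a function in remotexpcshelltests.py):

```python
SZIP_MAGIC = b"SeZz"


def needs_decompression(path, szip):
    """Decide from the file header whether `path` must go through `szip -d`.

    `szip` is the path to a host szip binary, or None if none was found.
    Raises RuntimeError when the library is szip-compressed but there is no
    szip to decompress it, instead of pushing a file the device cannot load.
    """
    with open(path, "rb") as f:
        compressed = f.read(len(SZIP_MAGIC)) == SZIP_MAGIC
    if compressed and szip is None:
        raise RuntimeError("%s is szip-compressed, but no szip binary was found" % path)
    return compressed
```

With this check, the artifact-build case fails with an explicit error on the host instead of "bad ELF magic" on the device.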

When I copy the szip binary from my full build to the artifact build, the xpcshell tests run as expected:
~/mozilla-central> mkdir objdir-frontend/dist/bin/host
~/mozilla-central> cp obj-android/mozglue/linker/szip objdir-frontend/dist/bin/host/

So to fix this bug, make sure that the szip binary ends up in one of the expected locations:
mozilla-central/objdir-frontend/dist/host/bin/szip
mozilla-central/objdir-frontend/dist/bin/host/szip
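A minimal sketch of that lookup, assuming both paths are relative to the objdir and checked in the order listed (find_szip is a hypothetical name):

```python
import os


def find_szip(objdir):
    """Return the first szip binary found in the expected objdir locations, or None."""
    candidates = [
        os.path.join(objdir, "dist", "host", "bin", "szip"),
        os.path.join(objdir, "dist", "bin", "host", "szip"),
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

The cp workaround above puts the binary at the second candidate, which is why the tests then find it.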
See Also: 900508
I'm pretty sure we never got xpcshell tests for Android artifact builds working. The discussion at the time indicated there was limited value in this for typical android front end development tasks.
Flags: needinfo?(cmanchester)
It's time to get them to work because WebExtensions tests run in xpcshell. I'll take a look.
Assignee: nobody → rob
Status: NEW → ASSIGNED
After applying the patch at https://pastebin.mozilla.org/8888745 (and fixing an import error for the exception), removing ~/.mozbuild/package-frontend/*.processed.jar, and running mach build && mach package, I get the following error when running mach xpcshell-test:

OSError: [Errno 8] Exec format error

  File "/Users/rwu/mozilla-central/testing/xpcshell/mach_commands.py", line 390, in run_xpcshell_test
    return xpcshell.run_test(**params)
  File "/Users/rwu/mozilla-central/testing/xpcshell/mach_commands.py", line 228, in run_test
    xpcshell = remotexpcshelltests.XPCShellRemote(dm, options, log)
  File "/Users/rwu/mozilla-central/testing/xpcshell/remotexpcshelltests.py", line 275, in __init__
    self.setupUtilities()
  File "/Users/rwu/mozilla-central/testing/xpcshell/remotexpcshelltests.py", line 432, in setupUtilities
    self.pushLibs()
  File "/Users/rwu/mozilla-central/testing/xpcshell/remotexpcshelltests.py", line 458, in pushLibs
    out = subprocess.check_output([szip, '-d', localFile], stderr=subprocess.STDOUT)
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 567, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception

This is because the szip binary in the Android artifact archive is an ELF binary, which does not run on Mac.
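[Errno 8] is ENOEXEC: the kernel does not recognize the file as something it can execute. A clearer failure could come from checking the helper's format against the host before calling it. A rough sketch that relies only on the standard ELF and Mach-O magic numbers (binary_format and format_matches_host are hypothetical helpers):

```python
import sys

# Magic numbers as they appear in the first four bytes of the file.
ELF_MAGIC = b"\x7fELF"
MACHO_MAGICS = (
    b"\xce\xfa\xed\xfe",  # 32-bit Mach-O, little-endian
    b"\xcf\xfa\xed\xfe",  # 64-bit Mach-O, little-endian
    b"\xfe\xed\xfa\xce",  # 32-bit Mach-O, big-endian
    b"\xfe\xed\xfa\xcf",  # 64-bit Mach-O, big-endian
    b"\xca\xfe\xba\xbe",  # universal ("fat") binary; Java .class files share it
)


def binary_format(path):
    """Guess an executable's format from its first four bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == ELF_MAGIC:
        return "ELF"
    if magic in MACHO_MAGICS:
        return "Mach-O"
    return "unknown"


def format_matches_host(path, platform=sys.platform):
    """Coarse pre-flight check before exec'ing a helper such as szip.

    Only compares the container format with the host OS; it does not check
    the CPU architecture, so an ARM ELF binary still passes on x86 Linux.
    """
    if platform == "darwin":
        expected = "Mach-O"
    elif platform.startswith("linux"):
        expected = "ELF"
    else:
        return False
    return binary_format(path) == expected
```

On a Mac, the ELF szip taken from the Android artifact fails this check, so the harness could report a missing host szip instead of surfacing the raw OSError.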
I did not find an archived build of szip for Mac in Taskcluster, and artifact builds are documented as compilation-less builds, so I'm considering writing a cross-platform Python implementation to decode SeekableZStreams.

Please tell me that it's a stupid idea and that there is a better alternative (or that it is an appropriate workaround and that I should go ahead and port the decompression part of szip.cpp to Python 2.7+).
Flags: needinfo?(mh+mozilla)
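As a first step toward such a port, a sketch that only reads the fixed-size header at the start of an szip file. The field layout below is an assumption based on our reading of mozglue/linker/SeekableZStream.h and should be checked against that file; the names are ours, and actual decompression (per-chunk offsets, the optional dictionary, the filter) is not shown:

```python
import struct

# ASSUMED header layout (little-endian, no padding), 20 bytes in total:
# magic "SeZz", total size, chunk size, dictionary size, number of chunks,
# size of the last chunk, zlib windowBits, filter id.
HEADER = struct.Struct("<4sIHHIHbB")
FIELDS = ("magic", "total_size", "chunk_size", "dict_size",
          "n_chunks", "last_chunk_size", "window_bits", "filter")


def read_szip_header(data):
    """Parse the (assumed) szip header at the start of `data`."""
    if len(data) < HEADER.size:
        raise ValueError("too short for an szip header")
    header = dict(zip(FIELDS, HEADER.unpack_from(data)))
    if header["magic"] != b"SeZz":
        raise ValueError("not an szip file (bad magic)")
    return header
```

Under that assumption, the bytes after the magic in the libxul.so dump above (c5 a6 90 01) would be a total size of 0x0190a6c5, i.e. 26,257,093 bytes.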
Sorry, not much to add. I haven't been involved in artifact builds really.
Flags: needinfo?(gbrown)
(In reply to Rob Wu [:robwu] from comment #2)
>                 # If the test package doesn't contain szip, it means files
>                 # are not szipped in the test package.
> 
> Clearly this assumption is wrong.

I think that comment is strictly correct, where "test package" means a test*.zip from the production test environment -- not anything to do with artifact builds or a local build.

Perhaps one approach to consider is to allow using szip from $PATH (looked up with which or similar) when it isn't found in one of the expected locations.
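A minimal sketch of that fallback, assuming the caller supplies the expected locations (resolve_tool is a hypothetical name). It uses shutil.which, which needs Python 3.3+; on the Python 2.7 harness of the time, distutils.spawn.find_executable plays the same role:

```python
import os
import shutil


def resolve_tool(name, candidates):
    """Return the first existing file in `candidates`, else look `name` up on $PATH.

    Returns None if the tool is found nowhere.
    """
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    # Fall back to whatever the user has installed on $PATH.
    return shutil.which(name)
```

With this, a developer on a Mac could build or install a host szip once and put it on $PATH, without copying it into every objdir.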
Considering bug 1291424, let's pause here.
Flags: needinfo?(mh+mozilla)
I successfully ran xpcshell tests with artifact builds earlier this month, so I guess that the bug has been fixed.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Product: Core → Firefox Build System