Closed Bug 1606739 Opened 5 years ago Closed 5 years ago

Packaging libgraphitewasm.so fails in stage-package step during elfhack invocation

Categories: Firefox Build System :: General, defect
Version: 68 Branch
Type: defect
Priority: Not set
Severity: normal
Tracking: firefox74 fixed
Status: RESOLVED FIXED
Target Milestone: mozilla74
People: Reporter: gk, Assigned: glandium
Attachments: 2 files

We are trying to test RLBox with Graphite2 in Tor Browser which is based on ESR68. The currently backported set of patches[1] fails with
{{{
0:01.84 /usr/bin/make -j4 -s stage-package
0:11.36 ../../dist/firefox/libnspr4.so: Reduced by 7992 bytes
0:11.36 ../../dist/firefox/libplc4.so: No gain. Skipping
0:11.36 ../../dist/firefox/libplds4.so: Couldn't find .bss. Skipping
0:18.92 ../../dist/firefox/libxul.so: Reduced by 7147304 bytes
0:19.23 ../../dist/firefox/libmozgtk.so: Couldn't find .bss. Skipping
0:19.24 ../../dist/firefox/gtk2/libmozgtk.so: Couldn't find .bss. Skipping
0:19.25 ../../dist/firefox/libgraphitewasm.so: Traceback (most recent call last):
0:19.25 File "/var/tmp/build/firefox-f557e0b636c3/toolkit/mozapps/installer/packager.py", line 347, in <module>
0:19.25 main()
0:19.25 File "/var/tmp/build/firefox-f557e0b636c3/toolkit/mozapps/installer/packager.py", line 341, in main
0:19.25 copier.copy(args.destination)
0:19.25 File "/var/tmp/build/firefox-f557e0b636c3/python/mozbuild/mozpack/copier.py", line 431, in copy
0:19.25 copy_results.append((destfile, f.copy(destfile, skip_if_older)))
0:19.25 File "/var/tmp/build/firefox-f557e0b636c3/python/mozbuild/mozpack/files.py", line 310, in copy
0:19.25 elfhack(dest)
0:19.25 File "/var/tmp/build/firefox-f557e0b636c3/python/mozbuild/mozpack/executables.py", line 125, in elfhack
0:19.25 errors.fatal('Error executing ' + ' '.join(cmd))
0:19.25 File "/var/tmp/build/firefox-f557e0b636c3/python/mozbuild/mozpack/errors.py", line 103, in fatal
0:19.25 self._handle(self.FATAL, msg)
0:19.25 File "/var/tmp/build/firefox-f557e0b636c3/python/mozbuild/mozpack/errors.py", line 98, in _handle
0:19.25 raise ErrorMessage(msg)
0:19.25 mozpack.errors.ErrorMessage: Error: Error executing /var/tmp/build/firefox-f557e0b636c3/obj-x86_64-pc-linux-gnu/build/unix/elfhack/elfhack ../../dist/firefox/libgraphitewasm.so
0:19.27 make[1]: *** [stage-package] Error 1
0:19.27 make: *** [stage-package] Error 2
}}}
Since we are on the esr68 branch, we use a clang-based toolchain similar to Mozilla's, based on clang 8.0.1. GCC is 8.3.0, binutils is 2.31.1, and configure thinks the linker is gold. Our .mozconfig file is
{{{
. $topsrcdir/browser/config/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-@CONFIG_GUESS@
mk_add_options MOZ_APP_DISPLAYNAME="Tor Browser"
export MOZILLA_OFFICIAL=1
CC="clang --gcc-toolchain=/var/tmp/dist/gcc"
CXX="clang++ --gcc-toolchain=/var/tmp/dist/gcc"
HOST_CC=$CC
HOST_CXX=$CXX
export BINDGEN_CFLAGS='--gcc-toolchain=/var/tmp/dist/gcc'
ac_add_options --enable-optimize
ac_add_options --enable-official-branding
ac_add_options --enable-default-toolkit=cairo-gtk3
ac_add_options --enable-tor-browser-update
ac_add_options --enable-signmar
ac_add_options --enable-verify-mar
ac_add_options --disable-strip
ac_add_options --disable-install-strip
ac_add_options --disable-tests
ac_add_options --disable-debug
ac_add_options --disable-crashreporter
ac_add_options --disable-webrtc
ac_add_options --disable-eme
ac_add_options --enable-proxy-bypass-protection
ac_add_options MOZ_TELEMETRY_REPORTING=
}}}

Note that I currently use commit 87b7a019472770f08d49cf3b558867dc76ea74eb of the wasi-sdk because of the clang 8.0.1 requirement, in case that matters.

I've uploaded the resulting library for further inspection.[2]

[1] https://gitweb.torproject.org/user/gk/tor-browser.git/log/?h=bug_32380_v6
[2] https://people.torproject.org/~gk/testbuilds/libgraphitewasm.so{.asc}

elfhack fails on the libgraphitewasm.so file but doesn't dump any useful info when it does so (Segmentation fault (core dumped)). I'm not familiar enough with elfhack to understand why that might be happening. According to the README, the tool isn't awfully robust, especially on non-libxul targets, so I guess that shouldn't be terribly shocking.
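Since the packaging traceback only reports "Error executing ...", one way to see how elfhack died is to run the binary directly and decode its exit status. The following is a minimal sketch, assuming a POSIX system and Python 3; `describe_exit` and `run_elfhack` are hypothetical helper names and are not part of mozpack.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative code.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-returncode} ({name})"
    return f"exited with status {returncode}"


def run_elfhack(elfhack_path, library_path):
    """Run elfhack on one library and describe how the process ended.

    Both paths are supplied by the caller (hypothetical helper, not mozpack).
    """
    result = subprocess.run([elfhack_path, library_path])
    return describe_exit(result.returncode)
```

A segmentation fault shows up here as a negative return code that maps to SIGSEGV, which the generic "Error executing" message hides.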

I assume elfhack is making some assumptions about the library that aren't true of libgraphitewasm.so. We can either fix elfhack so it won't freak out or always skip it for wasm targets.

[Note: the previous version of this comment had incorrect info which I edited out.]

Flags: needinfo?(mh+mozilla)
Attached file Valgrind crash dump

Valgrind dump attached. Looks like some sort of bad pointer math somewhere.

What appears to be happening is that we're doing relocations on the dynamic section (.dynamic), which we get here:

https://searchfox.org/mozilla-central/source/build/unix/elfhack/elfhack.cpp#764

We look for the .rela.dyn section for the relocations:

https://searchfox.org/mozilla-central/source/build/unix/elfhack/elfhack.cpp#770

And then, eventually, we want to look at the symtab associated with that section:

https://searchfox.org/mozilla-central/source/build/unix/elfhack/elfhack.cpp#830

And that symtab is nullptr, which crashes in predictable ways. We try to get the symtab here:

https://searchfox.org/mozilla-central/source/build/unix/elfhack/elf.cpp#491

The shdr.sh_link field is set correctly in the binary, pointing at .dynsym, so I guess the next question is why the getSection call there fails. Or maybe we're not using the correct parent for when we construct the ElfSection for .rela.dyn?
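To double-check a section's sh_link independently of elfhack, a section header record can be decoded by hand. This is a minimal sketch, assuming a little-endian ELF64 file (consistent with the x86_64 build in the report); `parse_shdr64` is a hypothetical helper, the field order follows the standard Elf64_Shdr structure, and the sample values are made up.

```python
import struct
from collections import namedtuple

# Elf64_Shdr layout (little-endian): two 32-bit words, four 64-bit words,
# two 32-bit words, two 64-bit words; 64 bytes in total.
SHDR64 = struct.Struct("<IIQQQQIIQQ")
Shdr = namedtuple(
    "Shdr",
    "sh_name sh_type sh_flags sh_addr sh_offset sh_size "
    "sh_link sh_info sh_addralign sh_entsize",
)

SHT_RELA = 4  # section type used by .rela.dyn


def parse_shdr64(raw):
    """Decode one 64-byte Elf64_Shdr record from raw bytes."""
    return Shdr(*SHDR64.unpack(raw[: SHDR64.size]))


# Synthetic header (made-up values) for a RELA section whose sh_link is 1.
sample = SHDR64.pack(0, SHT_RELA, 2, 0x1000, 0x1000, 48, 1, 0, 8, 24)
header = parse_shdr64(sample)
print(header.sh_type == SHT_RELA, header.sh_link)
```

For a real file, the record for section i would be read at e_shoff + i * e_shentsize, both of which come from the ELF header.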

(In reply to Nathan Froyd [:froydnj] from comment #3)

> The shdr.sh_link field is set correctly in the binary, pointing at .dynsym, so I guess the next question is why the getSection call there fails. Or maybe we're not using the correct parent for when we construct the ElfSection for .rela.dyn?

The long story short is that this all happens while we're initializing the section array for the Elf object:

https://searchfox.org/mozilla-central/source/build/unix/elfhack/elf.cpp#179-185

we're initializing section 1, the .dynsym section, which is a symtab-y section, so we wind up in ElfSymtab_Section where we want to ask for a section for some symbol:

https://searchfox.org/mozilla-central/source/build/unix/elfhack/elf.cpp#827-828

and so we start to create ElfSections on demand, until we get to the point where we hit an infinite recursion guard (!):

https://searchfox.org/mozilla-central/source/build/unix/elfhack/elf.cpp#310

and we return nullptr up the chain, with predictably bad results later.

I think we could work around this by changing lucetc to lay out the binary differently (?). The other option is some surgery on elfhack to make it less dependent on the particular ordering of the sections.

> until we get to the point where we hit an infinite recursion guard (!):

The path that leads to that is that while creating the ElfSymtab_Section, one symbol (_DYNAMIC) refers to the .dynamic section, which makes us initialize section 11, initializing an ElfDynamic_Section. While doing that, we initialize an ElfLocation for DT_RELA, which points to section 5, which we initialize an ElfRel_Section for, which has a sh_link of 1, so ElfSection initializes link trying to get section 1...

> I think we could work around this by changing lucetc to lay out the binary differently (?). The other option is some surgery on elfhack to make it less dependent on the particular ordering of the sections.

The problem is not the order of the sections. It's that _DYNAMIC symbol. An elfhack workaround would be to start by initializing the .dynamic section.
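The cycle described above can be reproduced with a toy model. This is a sketch in Python, not elfhack's C++ code: `ToyElf`, `IN_PROGRESS`, and the dependency table are hypothetical, and only the section indices (1, 5, 11) and the three references between them come from the comments above. It illustrates that the edge cut by the recursion guard depends on which section is initialized first.

```python
IN_PROGRESS = object()  # stands in for the "being initialized" marker

# Toy dependency table: each section holds one reference to another section.
# Indices follow the comments above; the rest is made up for illustration.
DEPENDS_ON = {
    1: 11,   # .dynsym: the _DYNAMIC symbol lives in .dynamic
    11: 5,   # .dynamic: DT_RELA points at .rela.dyn
    5: 1,    # .rela.dyn: sh_link names .dynsym
}
NAMES = {1: ".dynsym", 5: ".rela.dyn", 11: ".dynamic"}


class ToyElf:
    def __init__(self, dynamic_first):
        self.sections = {}
        order = sorted(NAMES)
        if dynamic_first:
            # Workaround under test: initialize .dynamic before the rest.
            order = [11] + [i for i in order if i != 11]
        for index in order:
            self.get_section(index)

    def get_section(self, index):
        current = self.sections.get(index)
        if current is IN_PROGRESS:
            return None  # recursion guard: give up on this reference
        if current is None:
            self.sections[index] = IN_PROGRESS
            ref = self.get_section(DEPENDS_ON[index])
            self.sections[index] = {"name": NAMES[index], "ref": ref}
        return self.sections[index]


broken = ToyElf(dynamic_first=False)
fixed = ToyElf(dynamic_first=True)
print(broken.sections[5]["ref"])          # .rela.dyn's link in the failing order
print(fixed.sections[5]["ref"]["name"])   # .rela.dyn's link with .dynamic first
```

In this model, starting from .dynsym leaves .rela.dyn without its symtab link, while starting from .dynamic moves the cut to the _DYNAMIC symbol's section reference. The sketch does not establish how real elfhack treats that symbol.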

Flags: needinfo?(mh+mozilla)
Assignee: nobody → mh+mozilla
Status: NEW → ASSIGNED
Pushed by mh@glandium.org: https://hg.mozilla.org/integration/autoland/rev/45d5e4ff7007 Initialize the .dynamic section first. r=froydnj
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74