Open Bug 1213698 Opened 4 years ago Updated 2 years ago

error: undefined reference to 'dlsym' if building with ASan and GCC (Tor 17509)

Categories

(Firefox Build System :: General, defect, P3)

x86_64
Linux
defect

Tracking

(Not tracked)

REOPENED

People

(Reporter: gk, Assigned: gk)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tor][tor-standalone])

Attachments

(1 file)

Trying to build Firefox with GCC and Address Sanitizer breaks with

/path/to/firefox/intl/icu/source/common/putil.cpp:2103: error: undefined reference to 'dlsym'
collect2: error: ld returned 1 exit status

That is not an ICU-only issue as building without it breaks the build later with the same error message.

There is no fancy .mozconfig involved. Basically only

export CFLAGS="-fsanitize=address -Dxmalloc=myxmalloc"
export CXXFLAGS="-fsanitize=address -Dxmalloc=myxmalloc"
export LDFLAGS="-fsanitize=address"

ac_add_options --enable-address-sanitizer
ac_add_options --disable-jemalloc
ac_add_options --disable-elf-hack

ac_add_options --enable-optimize

ac_add_options --disable-strip
ac_add_options --disable-install-strip
ac_add_options --disable-tests
ac_add_options --disable-debug
Attached is the config.log. For some reason -ldl is said to be not needed (contrary to a non-ASan build). But the compilation error seems to indicate the opposite.
This does not happen with GCC 4.9.x.
Summary: error: undefined reference to 'dlsym' if building with ASan and GCC 5.2.1 → error: undefined reference to 'dlsym' if building with ASan and GCC 5
So, with GCC revision 215527 the check whether we need to specify -ldl explicitly, done by testing with dlopen(), no longer works if one is building with ASan support. What patch would Mozilla merge to fix that? For instance, testing with dlsym() instead seems to solve the issue for me. Would that be an acceptable option?
Flags: needinfo?(mh+mozilla)
Why is the dlopen() test failing?
Flags: needinfo?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #4)
> Why is the dlopen() test failing?

You mean why -ldl does not get added to the linker flags although it fails later with

/path-to/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:167: error: undefined reference to 'dlerror'
/path-to/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:176: error: undefined reference to 'dlsym'
/path-to/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:176: error: undefined reference to 'dlsym'
collect2: error: ld returned 1 exit status

?

Good question. I could not come up with an answer yet. Bisecting the ASan changes is not really working as the problem does not happen with clang and all the changes got squashed into one commit, rev 215527. I looked over them a couple of times but nothing immediately jumped out at me.

If you or anybody else has some ideas why dlopen() does not do the trick here while dlsym() (e.g.) does and especially on what kind of patch you would accept, I'd be happy to do the (remaining) work.
Whiteboard: [tor]
Did you try to run the configure test independently and see how/why it fails?
(In reply to Mike Hommey [:glandium] from comment #6)
> Did you try to run the configure test independently and see how/why it fails?

Yes. Here are the results (commands run + output):

conftest.c
----------
char dlopen();

int main() {
dlopen()
; return 0; }

no ASan:
--------

gcc -o conftest -std=gnu99 -fgnu89-inline -fno-strict-aliasing -fno-math-errno -Wl,-z,noexecstack -Wl,-z,text -Wl,--build-id -B /path/to/obj-x86_64-unknown-linux-gnu/build/unix/gold -Wl,-Bsymbolic conftest.c
/tmp/ccEvighn.o:conftest.c:function main: error: undefined reference to 'dlopen'
collect2: error: ld returned 1 exit status

ASan:
-----

gcc -o conftest -fsanitize=address -Dxmalloc=myxmalloc -std=gnu99 -fgnu89-inline -fno-strict-aliasing -fno-math-errno  -fsanitize=address -Wl,-z,noexecstack -Wl,-z,text -Wl,--build-id -B /path/to/obj-x86_64-unknown-linux-gnu/build/unix/gold conftest.c

If I change conftest.c to

char dlsym();

int main() {
dlsym()
; return 0; }

I get

no ASan:
--------

gcc -o conftest -std=gnu99 -fgnu89-inline -fno-strict-aliasing -fno-math-errno -Wl,-z,noexecstack -Wl,-z,text -Wl,--build-id -B /path/to/obj-x86_64-unknown-linux-gnu/build/unix/gold -Wl,-Bsymbolic conftest.c
/tmp/ccFf85Zg.o:conftest.c:function main: error: undefined reference to 'dlsym'
collect2: error: ld returned 1 exit status

ASan:
-----

gcc -o conftest -fsanitize=address -Dxmalloc=myxmalloc -std=gnu99 -fgnu89-inline -fno-strict-aliasing -fno-math-errno  -fsanitize=address -Wl,-z,noexecstack -Wl,-z,text -Wl,--build-id -B /path/to/obj-x86_64-unknown-linux-gnu/build/unix/gold conftest.c
/tmp/ccTBNSTL.o:conftest.c:function main: error: undefined reference to 'dlsym'
collect2: error: ld returned 1 exit status

Looking at the verbose output does not give me any hints either. The ASan one basically differs only in the added ASan-specific bits.
There should be a -ldl on both commands, added by 
LIBS="-l$i  $ac_func_search_save_LIBS"

where $i is "dl".
(In reply to Mike Hommey [:glandium] from comment #8)
> There should be a -ldl on both commands, added by 
> LIBS="-l$i  $ac_func_search_save_LIBS"
> 
> where $i is "dl".

I totally agree with that, and adding them manually works fine, but that is not what the configure script does (which is why this bug report exists):

ASan case:
----------

configure:11799: checking for library containing dlopen
configure:11817: gcc -o conftest -fsanitize=address -Dxmalloc=myxmalloc -std=gnu99 -fgnu89-inline -fno-strict-aliasing -fno-math-errno  -fsanitize=address -Wl,-z,noexecstack -Wl,-z,text -Wl,--build-id -B /home/thomas/Arbeit/Tor/tor-browser/obj-x86_64-unknown-linux-gnu/build/unix/gold conftest.c  1>&5
configure:11857: checking for dlfcn.h

non-ASan case:
--------------

configure:11799: checking for library containing dlopen
configure:11817: gcc -o conftest  -std=gnu99 -fgnu89-inline -fno-strict-aliasing -fno-math-errno   -Wl,-z,noexecstack -Wl,-z,text -Wl,--build-id -B /home/thomas/Arbeit/Tor/tor-browser/obj-x86_64-unknown-linux-gnu/build/unix/gold conftest.c  1>&5
/tmp/cc9U7CRo.o:conftest.c:function main: error: undefined reference to 'dlopen'
collect2: error: ld returned 1 exit status
configure: failed program was:
#line 11806 "configure"
#include "confdefs.h"
/* Override any gcc2 internal prototype to avoid an error.  */
/* We use char because int might match the return type of a gcc2
    builtin and then its argument prototype would still apply.  */
char dlopen();

int main() {
dlopen()
; return 0; }
configure:11839: gcc -o conftest  -std=gnu99 -fgnu89-inline -fno-strict-aliasing -fno-math-errno   -Wl,-z,noexecstack -Wl,-z,text -Wl,--build-id -B /home/thomas/Arbeit/Tor/tor-browser/obj-x86_64-unknown-linux-gnu/build/unix/gold conftest.c -ldl   1>&5
configure:11857: checking for dlfcn.h

If I convince the configure script to use dlsym for testing then it works as expected, hence my question if such a patch would be acceptable.
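
To make the idea concrete, here is a minimal sketch of a check that probes every dl* entry point instead of only dlopen. It is not Mozilla's configure code: the names probe_dl_libs, links_symbol and CONFTEST_CC are made up for illustration, and the conftest body simply mirrors the one shown above.

```shell
# Sketch only. probe_dl_libs, links_symbol and CONFTEST_CC are hypothetical
# names, not part of the real configure script.
CONFTEST_CC=${CONFTEST_CC:-cc}

# links_symbol SYM LIBS: succeed if a call to SYM links with LIBS appended
# after the source file (same ordering as the configure commands above).
links_symbol() {
  ls_dir=$(mktemp -d) || return 1
  printf 'char %s();\nint main() {\n%s();\nreturn 0; }\n' "$1" "$1" \
    > "$ls_dir/conftest.c"
  # Word splitting of the flag variables is intentional.
  if $CONFTEST_CC -o "$ls_dir/conftest" ${CFLAGS-} ${LDFLAGS-} \
       "$ls_dir/conftest.c" $2 >/dev/null 2>&1; then
    ls_rc=0
  else
    ls_rc=1
  fi
  rm -rf "$ls_dir"
  return "$ls_rc"
}

# Print "-ldl" if any dl* symbol needs it, print nothing if none does,
# and fail if a symbol cannot be linked even with -ldl.
probe_dl_libs() {
  for pd_sym in dlopen dlsym dlerror dlclose; do
    if ! links_symbol "$pd_sym" ""; then
      if links_symbol "$pd_sym" "-ldl"; then
        echo "-ldl"
        return 0
      fi
      return 1
    fi
  done
  return 0
}
```

It could be used as LIBS="$(probe_dl_libs) $LIBS". Because it stops at the first symbol that fails to link on its own, a toolchain that resolves dlopen but not dlsym would still end up with -ldl.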
(In reply to Georg Koppen from comment #9)
> ASan case:
> ----------
> 
> configure:11799: checking for library containing dlopen
> configure:11817: gcc -o conftest -fsanitize=address -Dxmalloc=myxmalloc
> -std=gnu99 -fgnu89-inline -fno-strict-aliasing -fno-math-errno 
> -fsanitize=address -Wl,-z,noexecstack -Wl,-z,text -Wl,--build-id -B
> /home/thomas/Arbeit/Tor/tor-browser/obj-x86_64-unknown-linux-gnu/build/unix/
> gold conftest.c  1>&5

So, in fact, the question is why does this command *not* fail without -ldl?
Yes. As I said above that behavior changed with GCC's r215527 but I am not sure how to debug that further.
(In reply to Mike Hommey [:glandium] from comment #10)
> (In reply to Georg Koppen from comment #9)
> > ASan case:
> > ----------
> > 
> > configure:11799: checking for library containing dlopen
> > configure:11817: gcc -o conftest -fsanitize=address -Dxmalloc=myxmalloc
> > -std=gnu99 -fgnu89-inline -fno-strict-aliasing -fno-math-errno 
> > -fsanitize=address -Wl,-z,noexecstack -Wl,-z,text -Wl,--build-id -B
> > /home/thomas/Arbeit/Tor/tor-browser/obj-x86_64-unknown-linux-gnu/build/unix/
> > gold conftest.c  1>&5
> 
> So, in fact, the question is why does this command *not* fail without -ldl?

It looks like libasan defines a weak dlopen symbol, but not a weak dlsym symbol.
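
For anyone who wants to check this locally, a rough sketch follows. The helper name dl_exports is made up for this comment; it only filters the output of nm -D --defined-only, and it assumes the usual "address type name" column layout.

```shell
# dl_exports: read `nm -D --defined-only` output on stdin and print
# "TYPE NAME" for each exported dl* entry point (hypothetical helper).
dl_exports() {
  awk '{ name = $NF; sub(/@.*/, "", name) }   # drop any symbol version suffix
       name ~ /^(dlopen|dlsym|dlclose|dlerror)$/ { print $(NF-1), name }'
}
```

Something like nm -D --defined-only "$(gcc -print-file-name=libasan.so)" | dl_exports should then list which of these symbols the shared libasan exports, with a type of W marking a weak definition. This assumes a shared libasan.so is installed for the compiler in question.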
so something that uses both dlopen and dlsym, which is a rightful combination, gets dlopen from libasan and dlsym from libdl? That sounds awful... why are they doing this?
(In reply to Mike Hommey [:glandium] from comment #13)
> so something that uses both dlopen and dlsym, which is a rightful
> combination, gets dlopen from libasan and dlsym from libdl? That sounds
> awful... why are they doing this?

It looks like they want to intercept dlopen, which I would expect is in llvm's version too? so I'm curious why we don't see a problem there, but I don't have an llvm checkout to poke at that.

What libasan has is a weak alias, so this probably works fine, but I'm curious why it's publicly visible. It's explicitly __attribute__((visibility("default"))) though, so I assume there is a good reason.
(In reply to Mike Hommey [:glandium] from comment #13)
> so something that uses both dlopen and dlsym, which is a rightful
> combination, gets dlopen from libasan and dlsym from libdl? That sounds
> awful... why are they doing this?

(In reply to Trevor Saunders (:tbsaunde) from comment #14)
> It looks like they want to intercept dlopen, which I would expect is in
> llvm's version too? so I'm curious why we don't see a problem there, but I
> don't have an llvm checkout to poke at that.

Agreed: it's to intercept dlopen() and dlclose() in order to track what's loaded and where; I assume it's just passing through the library handles and doesn't need to know about symbol lookups, so doesn't intercept dlsym.  Clang's ASan does the same thing.

As for why Clang works while GCC 5 doesn't: empirically, it looks like Clang adds a bunch of extra libraries when linking with -fsanitize=address, namely, -lpthread -lrt -lm -ldl -lgcc_s.

Or, as found by searching the Clang source for "ldl":

static void linkSanitizerRuntimeDeps(const ToolChain &TC,
                                     ArgStringList &CmdArgs) {
  // Force linking against the system libraries sanitizers depends on
  // (see PR15823 why this is necessary).
  CmdArgs.push_back("--no-as-needed");
  CmdArgs.push_back("-lpthread");
  CmdArgs.push_back("-lrt");
  CmdArgs.push_back("-lm");
  // There's no libdl on FreeBSD.
  if (TC.getTriple().getOS() != llvm::Triple::FreeBSD)
    CmdArgs.push_back("-ldl");
}

So, see also https://llvm.org/bugs/show_bug.cgi?id=15823 (which I think GCC solved differently; I've already shaved enough yaks for this comment, but I notice that GCC uses a shared libasan.so by default).
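
The driver difference described above can be checked without building anything. This is only a sketch: driver_adds_libdl is a made-up helper name, and it relies on the -### option, which makes both gcc and clang print the commands they would run instead of executing them.

```shell
# driver_adds_libdl DRIVER: succeed if DRIVER's planned ASan link line
# contains -ldl (hypothetical helper).
driver_adds_libdl() {
  # -### prints the planned commands on stderr. Split on spaces and quotes
  # so that quoted and unquoted arguments both end up one per line.
  "$1" '-###' -fsanitize=address -x c /dev/null 2>&1 |
    tr -s ' "' '\n\n' | grep -qx -e '-ldl'
}
```

For example: for cc in gcc clang; do driver_adds_libdl "$cc" && echo "$cc passes -ldl"; done. The result depends on the installed compiler versions and how they were configured, so treat it as a diagnostic rather than a definitive answer.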
In the meantime, I simply added -ldl to the LDFLAGS environment variable for a local build of the ASan version of C-C TB with GCC 5 (and GCC 6).
I think for a local build this is fine (?)
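
A small sketch of that workaround as a reusable shell function, so the flag is not appended twice when the environment setup is sourced more than once. The name ensure_ldl is made up here.

```shell
# ensure_ldl: append -ldl to LDFLAGS unless it is already present
# (hypothetical helper for a local build environment).
ensure_ldl() {
  case " ${LDFLAGS-} " in
    *" -ldl "*) ;;                              # already there, nothing to do
    *) LDFLAGS="${LDFLAGS:+$LDFLAGS }-ldl" ;;   # append, adding a space if needed
  esac
  export LDFLAGS
}
```

Calling ensure_ldl after setting LDFLAGS="-fsanitize=address" leaves LDFLAGS as "-fsanitize=address -ldl", and a second call changes nothing.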
Priority: -- → P3
Summary: error: undefined reference to 'dlsym' if building with ASan and GCC 5 → error: undefined reference to 'dlsym' if building with ASan and GCC 5 (Tor 17509)
Blocks: meta_tor
Whiteboard: [tor] → [tor][tor-standalone]
We'd like to get a fix for this bug into ESR52. Mike, what kind of fix would be your preferred one?
Flags: needinfo?(mh+mozilla)
Add a check for dlsym?
Flags: needinfo?(mh+mozilla)
Assignee: nobody → gk
Hi,

Again, automatic detection and configuration is very good.

In the meantime, I have worked around the issue by manually adding
"-ldl" to my MOZCONFIG file thusly:

# Mandatory flags for ASan
export ASANFLAGS="-fsanitize=address -Dxmalloc=myxmalloc -fPIC"
export CFLAGS="$ASANFLAGS $CFLAGS -fno-delete-null-pointer-checks "
export CXXFLAGS="$ASANFLAGS $CXXFLAGS -fno-delete-null-pointer-checks "
export LDFLAGS="-fsanitize=address -ldl"   # <=== NOTE the addition.

The addition of -fno-delete-null-pointer-checks is a workaround for a GCC 6 issue with comm-central TB.
(I believe that these exported environment variables also take effect during configuration.
Come to think of it, since the build invokes makefiles and other files that are created during configuration,
the MOZCONFIG setup is the right place for them.)

TIA
We're not using gcc 5 anymore, going to wontfix this
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX
(In reply to Tom Ritter [:tjr] from comment #20)
> We're not using gcc 5 anymore, going to wontfix this

Yes, but the issue is not gone. :) With 6.4.0 and without an explicit -ldl I still get errors like

/home/thomas/Arbeit/Tor/tor-browser/xpcom/glue/standalone/nsXPCOMGlue.cpp:116: error: undefined reference to 'dlsym'

I've removed the GCC version information in case that's what confused you.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Summary: error: undefined reference to 'dlsym' if building with ASan and GCC 5 (Tor 17509) → error: undefined reference to 'dlsym' if building with ASan and GCC (Tor 17509)
Product: Core → Firefox Build System