Closed Bug 554854 Opened 10 years ago Closed 9 years ago

Install a newer glibc on Linux build slaves (linux32 symbols missing since breakpad update)

Categories

(Release Engineering :: General, defect, P3)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: nthomas, Assigned: joduinn)

References

Details

(Whiteboard: [buildslaves][triagefollowup])

Noticed this as part of 554754, where we're trying to resolve symbols when the zip file contains only an empty manifest. Don't have a super precise smoking gun, but we took 70-80 seconds to run 'make buildsymbols' prior to updating to breakpad 554, and now it's consistently < 5 seconds. 

starttime (PDT)		revision	elapsed	checkin
			
20/03/10 05:05		d03261d7898a	71	Masayuki
20/03/10 08:09		1cce6e1d3089	80	josh
20/03/10 09:16		339565aa8cfc		Breakpad 554
20/03/10 09:42		3a726b459ad3		first bustage fix for breakpad
20/03/10 09:46		6290189437fc	5	2nd bustage fix for breakpad
20/03/10 10:31		0154b0cea0f2	10	bas
20/03/10 10:37		4b4a1b1cb99a	22	bas
20/03/10 10:49		d9808811bb01	15	serge
20/03/10 14:58		333a791d201e	2	khuey decom
20/03/10 16:12		333a791d201e	4	

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=45338ed951a7&tochange=333a791d201e

The log now has 
Processing file: ./dist/bin/TestDNS
/builds/slave/mozilla-central-linux/build/obj-firefox/dist/host/bin/dump_syms: /usr/lib/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by /builds/slave/mozilla-central-linux/build/obj-firefox/dist/host/bin/dump_syms)
which wasn't present a week ago. We have glibc-2.5-12 on the linux ref platform.
More relevantly, we have:
$ rpm -qa | grep libstdc
libstdc++-4.1.1-52.el5
compat-libstdc++-33-3.2.3-61
compat-libstdc++-296-2.96-138
libstdc++-devel-4.1.1-52.el5

$ rpm -q --provides libstdc++-4.1.1-52.el5
libstdc++.so.6  
libstdc++.so.6(CXXABI_1.3)  
libstdc++.so.6(CXXABI_1.3.1)  
libstdc++.so.6(GLIBCXX_3.4)  
libstdc++.so.6(GLIBCXX_3.4.1)  
libstdc++.so.6(GLIBCXX_3.4.2)  
libstdc++.so.6(GLIBCXX_3.4.3)  
libstdc++.so.6(GLIBCXX_3.4.4)  
libstdc++.so.6(GLIBCXX_3.4.5)  
libstdc++.so.6(GLIBCXX_3.4.6)  
libstdc++.so.6(GLIBCXX_3.4.7)  
libstdc++.so.6(GLIBCXX_3.4.8)  
libstdc++ = 4.1.1-52.el5
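
For anyone who wants to repeat the comparison above without reading the lists by eye, here is a minimal Python sketch. It assumes you have already captured the loader error line and the `rpm -q --provides` output as text; the function names are made up for illustration, and the sample strings are shortened copies of the output quoted above.

```python
import re

def provided_versions(rpm_provides):
    """Collect the version tags that appear as libstdc++.so.6(TAG) in rpm output."""
    return set(re.findall(r"libstdc\+\+\.so\.6\(([A-Z]+_[0-9.]+)\)", rpm_provides))

def missing_version(loader_error):
    """Extract the tag from a "version `TAG' not found" loader message, or None."""
    match = re.search(r"version `([^']+)' not found", loader_error)
    return match.group(1) if match else None

# Shortened samples of the text quoted in this comment.
provides = """\
libstdc++.so.6
libstdc++.so.6(CXXABI_1.3)
libstdc++.so.6(GLIBCXX_3.4)
libstdc++.so.6(GLIBCXX_3.4.8)
"""
error = ("dump_syms: /usr/lib/libstdc++.so.6: version `GLIBCXX_3.4.9' "
         "not found (required by dump_syms)")

wanted = missing_version(error)
status = "provided" if wanted in provided_versions(provides) else "not provided"
print(wanted, status)
```

With the sample text, the tag the binary asks for is one step past the newest tag the installed package lists, which matches the failure in the log.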
So, we hit this same problem with our Firefox binaries in bug 526868. The problem there was resolved by not using std::ostream, because that's what introduced the dependency on the new ABI. It seems silly to tiptoe around ABI compat on a binary that's only ever going to be run on the build machine itself. I think if we're going to be building things with a version of GCC capable of producing a dependency on this version of glibc, we're going to have to have this version of glibc available on the build machines.
I'm punting this back over to RelEng. I think we really need to fix the build slaves so they can run this binary, considering that it was built by GCC on the slave itself.

I can back out the Breakpad update to fix the immediate bustage, but we really do want to land this update, and I don't think we should have to jump through hoops changing the code to make it work.
Component: Breakpad Integration → Release Engineering
Product: Toolkit → mozilla.org
QA Contact: breakpad.integration → release
Version: Trunk → other
Summary: linux32 symbols missing since breakpad update → Install a newer glibc on Linux build slaves (linux32 symbols missing since breakpad update)
OK, getting a recent glibc (GLIBCXX_3.4.9 is what we want, it looks like) onto our build slaves is going to be a bit of a pain. We certainly need to build it ourselves, and we need to figure out if this affects the actual build and link of Firefox itself.

It *sounds* like we might be able to get away with just adding it to LD_LIBRARY_PATH during steps which run dump_syms. If that works, we don't need to worry about Firefox compat at all.

If we need to add it to the global LD_LIBRARY_PATH, we should test that Firefox still runs on older glibc versions.
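
A rough Python sketch of the per-step idea, assuming the step is launched from a script: the extra directory is added to LD_LIBRARY_PATH only in the child process's environment, so nothing else on the slave sees it. The helper names and the /tools/gcc/lib path are placeholders, not the real buildbot configuration.

```python
import os
import subprocess

def env_with_lib_dir(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + existing if existing else "")
    return env

def run_symbol_step(command, lib_dir):
    """Run one build step with the extra library directory visible only to that step."""
    return subprocess.run(command, env=env_with_lib_dir(lib_dir), check=True)

# Hypothetical usage; the directory below is a placeholder.
# run_symbol_step(["make", "buildsymbols"], "/tools/gcc/lib")
```

Because the environment is copied rather than modified in place, the compile and link steps for Firefox itself would keep running with the unmodified search path.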
I have no idea how gcc/glibc ABI compat works, but yeah, we should definitely make sure this doesn't affect our Firefox builds.
Okay, we were able to implement a workaround:
http://hg.mozilla.org/mozilla-central/rev/33d05f60932b

I'd still like this bug fixed so we don't hit it again in the future, since we're likely to keep using more C++ features. However, it's not high priority right now.
With the workaround check-in of linking dump_syms statically, I am no longer able to build under Linux with crashreporter enabled.  I get a seemingly nonsensical error:

/usr/bin/ld: cannot find -lm
I believe that means you're missing a static libm for whatever architecture you're targeting. Someone mentioned a "glibc-static" package on IRC, perhaps you can see if that's available to install on your distro.
(In reply to comment #8)
> I believe that means you're missing a static libm for whatever architecture
> you're targeting. Someone mentioned a "glibc-static" package on IRC, perhaps
> you can see if that's available to install on your distro.

Ah.  It does make sense now.  And that did fix the issue.  Thanks.
Priority: -- → P3
I'm confused, probably by my ignorance of our buildsystem --- why does building a program with g++ on the build slave construct a program that the build slave can't execute?  If the static linker finds a particular version of libstdc++, then I'd expect the dynamic linker to be able to find it as well.
I can't say I know exactly what's going on, but our build slaves are running a fairly old Linux (CentOS 5), on which we've installed (in a custom location) a more recent GCC (4.3.3, IIRC). The system libc is not new enough (if you use enough C++ to bump your libstdc++ ABI requirement), and so running the binary fails. I don't know if our gcc install has its own libstdc++ that works or not.
The gcc/g++ 4.3.3 binaries and libs have been installed in their own directory, which is not included in LD_LIBRARY_PATH, nor the system path.  We're not doing anything during linking to encode this directory into our binaries, so there's not really any way for the apps to find the gcc 4.3.3 libs.

What would the effects be of modifying /etc/ld.so.conf?
Oh, that makes total sense.  The compiler itself knows to search the directory containing the libstdc++ that was built along with it --- everything in the GNU Compiler Collection uses binary-relative paths: if the compiler is run as /PATH/bin/g++, then it looks in /PATH/lib for static and shared libraries.

So the static linker finds the newer libstdc++.so, and cites its symbol versions in the executable it's linking, but then the dynamic linker can't find that libstdc++.so.

I'll bet listing the new G++'s lib directory in /etc/ld.so.conf would allow the executable to run.  But it would also mean that if Firefox itself started depending on newer versions of symbols, as found in G++'s lib directory, we wouldn't get any errors trying to run it.  We don't want to allow Firefox to refer to anything newer than it does now, do we?  That sounds like an important decision.
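
To make the mismatch concrete, here is a toy Python model of the two searches. It is only a sketch: the directory layout, the version tags attached to each copy of the library, and the function names are all made up for illustration, and the real linker and loader search rules are more involved than a simple ordered list.

```python
import posixpath

def compiler_lib_dir(compiler_path):
    """Derive PREFIX/lib from PREFIX/bin/g++, the binary-relative lookup described above."""
    return posixpath.join(posixpath.dirname(posixpath.dirname(compiler_path)), "lib")

def first_match(library, search_dirs, installed):
    """Return (directory, newest_tag) for the first search directory holding the library."""
    for directory in search_dirs:
        if library in installed.get(directory, {}):
            return directory, installed[directory][library]
    return None

# Hypothetical layout: the system copy is older than the copy next to the compiler.
installed = {
    "/usr/lib": {"libstdc++.so.6": "GLIBCXX_3.4.8"},
    "/tools/gcc/lib": {"libstdc++.so.6": "GLIBCXX_3.4.9"},
}
link_dirs = [compiler_lib_dir("/tools/gcc/bin/g++"), "/usr/lib"]  # link-time search
run_dirs = ["/usr/lib"]                                           # run-time search

print(first_match("libstdc++.so.6", link_dirs, installed))
print(first_match("libstdc++.so.6", run_dirs, installed))
```

In this model, adding the compiler's lib directory to run_dirs is the equivalent of the /etc/ld.so.conf change: the run-time search would then land on the same copy the link-time search used.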
Just in case someone besides me is thinking this through for the first time:

Even in a newer libstdc++, most of the symbols are marked with older version numbers.  For example, in a libstdc++ built from recent sources, most symbols are marked GLIBCXX_3.4, with a few GLIBCXX_3.4.9's and GLIBCXX_3.4.14's scattered here and there.

If my program only ever refers to symbols marked GLIBCXX_3.4, then I can run it against older shared libraries.  But if the static linker satisfies a symbol reference in my code with a definition in a shared library marked GLIBCXX_3.4.14, then my executable will require that version of that symbol.  This is why it matters which facilities from libstdc++.so the code actually uses.
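
A small Python sketch of that last point, under the simplifying assumption that a library offering a given GLIBCXX tag also offers every earlier one. The symbol names in the table are invented; only the version tags come from the discussion above.

```python
def version_key(tag):
    """Turn 'GLIBCXX_3.4.14' into (3, 4, 14) so tags compare numerically, not as text."""
    return tuple(int(part) for part in tag.split("_", 1)[1].split("."))

def required_version(used_symbols, symbol_versions):
    """Return the newest version tag among the symbols the program actually references."""
    return max((symbol_versions[name] for name in used_symbols), key=version_key)

# Made-up symbol table: most symbols carry the base tag, a few carry newer ones.
symbol_versions = {
    "old_facility": "GLIBCXX_3.4",
    "newer_facility": "GLIBCXX_3.4.9",
    "newest_facility": "GLIBCXX_3.4.14",
}

print(required_version(["old_facility"], symbol_versions))
print(required_version(["old_facility", "newer_facility"], symbol_versions))
```

The numeric key matters because a plain string comparison would sort GLIBCXX_3.4.14 before GLIBCXX_3.4.9.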
Whiteboard: [buildslaves]
Is it OK to remove HOST_LDFLAGS += -static for our Fedora build? We don't like to link against static libraries, because it could eventually cause security issues (for example, if we forget to rebuild when a security problem is fixed in libc).
This only links dump_syms statically. This is a host binary that is not shipped with Firefox.
This should basically be WFM since the patches to not depend on newer libstdc++ landed, right?
As I wrote in bug 661914, the stdcxx-compat hack is opt-in, so we'd need to update the dump_syms Makefile in order to use it, but apart from that, it should be enough. Note that IIRC, our build infrastructure uses a precompiled dump_syms that was linked against gcc 4.3's libstdc++ and doesn't work with the system libstdc++ on our buildbots. We could also rebuild the version that our build scripts download and be done with the requirement of setting LD_LIBRARY_PATH to gcc 4.3's lib directory altogether (I think dump_syms is the last bit that requires it). That would only work for m-c and aurora at the moment, though.
We only use a prebuilt dump_syms on windows, we build it during the build on other platforms.
(In reply to comment #19)
> We only use a prebuilt dump_syms on windows, we build it during the build on
> other platforms.

Ah now I remember, that's the minidump_stackwalk program that had this problem.
(In reply to comment #20)
> Ah now I remember, that's the minidump_stackwalk program that had this
> problem.

So was this solved by getting the statically-linked version in place?
Assignee: nobody → joduinn
Whiteboard: [buildslaves] → [buildslaves][triagefollowup]
(In reply to comment #21)
> (In reply to comment #20)
> > Ah now I remember, that's the minidump_stackwalk program that had this
> > problem.
> 
> So was this solved by getting the statically-linked version in place?

Most probably, but the linux64 binary is not statically linked. Note that, as written in comment 18, using the stdcxx-compat hack should work equally well (instead of statically linking).
bug 610311 has a patch with a statically linked minidump_stackwalk that I forgot to land. I bugged coop about this a few weeks ago, but probably not hard enough. :)
(In reply to comment #23)
> bug 610311 has a patch with a statically linked minidump_stackwalk that I
> forgot to land. I bugged coop about this a few weeks ago, but probably not
> hard enough. :)

The statically linked minidump_stackwalk *is* landed, but it's only for linux, not linux64. (And I don't see a linux64 patch in that bug.)
Oh, hah. I thought they were symlinks. In any event, it should be trivial to build the same thing for linux64.
(In reply to comment #25)
> Oh, hah. I thought they were symlinks. In any event, it should be trivial to
> build the same thing for linux64.

Just copied the existing statically linked version over for linux64 in https://bugzilla.mozilla.org/show_bug.cgi?id=610311#c23

Anything else to do here?
I'm sure we'll hit this again at some point in the future, but we've gotten alright at mitigating it at this point.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Product: mozilla.org → Release Engineering