Closed Bug 221217 Opened 16 years ago Closed 16 years ago

gtk2+xft builds since 1st Oct don't work

Categories

(Firefox Build System :: General, defect, critical)

x86
Linux
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: andy, Assigned: bryner)

References

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20030928 Firebird/0.7+ (persist,gtk2+xft)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20030928 Firebird/0.7+ (persist,gtk2+xft)

The last gtk2+xft build which works on my pentium 4, Mandrake 9.1 box is the
build from the 29th of Sept.

Reproducible: Always

Steps to Reproduce:
1.Download
2.Untar
3.Launch

Actual Results:  
Nothing

Expected Results:  
Firebird should open

See http://forums.mozillazine.org/viewtopic.php?t=27600 and
http://forums.mozillazine.org/viewtopic.php?t=27771
Oct 4th nightly and my own CVS build segfault during startup.

/home/mconnor/moztree/mozilla/dist/bin/run-mozilla.sh: line 454: 14605
Segmentation fault      "$prog" ${1+"$@"}

build issue?
Assignee: blake → bryner
Status: UNCONFIRMED → NEW
Component: General → build-config
Ever confirmed: true
QA Contact: mpconnor
Severity: normal → critical
There's not much I can do about this unless someone can post a stack trace of
the crash.
I backed out the fix for bug 128668 in my local tree, and I'm up and running on
the trunk, going to rebuild as a debug with a clean tree and get a stacktrace to
attach
Keywords: regression
I have this problem in nightlies of Firebird and my cvs builds of thunderbird.
I tracked it down to uriloader/exthandler/unix/nsGNOMERegistry.cpp

It segfaults on this line

  _gnome_program_init("Gecko", "1.0", _libgnome_module_info_get(),
                    1, argv, NULL);

I have gtk2 installed, but not the gnome library, therefore I assume both
_gnome_program_init and _libgnome_module_info_get()  are going to be null.

Commenting this line out in my thunderbird cvs build made the problem go away.

I can do a thunderbird stack dump if it will help, but this is where it segfaults.
I have no Firebird build with symbols and I can't get it to build from cvs.

Recent Mozilla CVS trunk build has the same problem.
This is not only Mozilla Fierbird's problem.
Ok, so here's the odd part.  It should be aborting the GNOME stuff way before
that if it couldn't find either the gnome_program_init() symbol or the
libgnome_module_info_get() symbol.  Are you saying libgnome-2.so is not present
on your system?

If you could print out the values of the following variables at the line where
it segfaults, it would help:

_gnome_program_init
_libgnome_module_info_get
gnomeLib
The code logic seemed wrong to me, the only time the function will be exited is
when PR_FindFunctionSymbol fails and the only time that is called is if
PR_LoadLibrary is successful which isn't the case if you don't have the library.
So i changed:

if (gconfLib) {
  GET_LIB_FUNCTION(gconf, gconf_client_get_default);
  GET_LIB_FUNCTION(gconf, gconf_client_get_string);
}

to

if (!gconfLib) {
  PR_Free(libName);
  CleanUp();
  return;
}

GET_LIB_FUNCTION(gconf, gconf_client_get_default);
GET_LIB_FUNCTION(gconf, gconf_client_get_string);

and the same for the other PR_LoadLibrarys and it loaded fine.
Making the change Heitham suggested fixes it here.
None of the libraries it tries to load are on my system, so it never attempts to
resolve any of the symbols. Thus it get's to the gnome init call without bombing
out, but with undefined values in the required symbols.
Good catch.  I'll post a patch.
Attached patch patch (obsolete) — Splinter Review
This ensures that we successfully open each library before we proceed.
Attachment #132764 - Flags: superreview?(blizzard)
Attachment #132764 - Flags: review?(bzbarsky)
Comment on attachment 132764 [details] [diff] [review]
patch

r=bzbarsky
Attachment #132764 - Flags: review?(bzbarsky) → review+
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20031006 Firebird/0.7+

I looked this too on my Red Hat Linux 9 system which GNOME2 installed.
Although bryner said that 'libgnome-2.so', plain Red Hat 9 has not
'libgnome-2.so' but 'libgnome-2.so.0', so I made a symlink 'libgnome-2.so' to
'libgnome-2.so.0', then recent build work fine!
Is this library name problem?
Attachment #132764 - Flags: superreview?(blizzard) → superreview+
The fixes suggested in comments #4, 7 and 11 all work for me but the Firebird I
get is crashy, particularly if there are any mime type issues involved. For
example if I try to use the BBC radio player ( http://www.bbc.co.uk/radio/ and
click listen). It may be caused by another checkin, but the problem wasn't there
a week ago. Also, the fix suggested in comment #12 doesn't work for me. The
official builds still don't work and if I put this symlink in it stops the
builds I've made with these fixes from working.
Wow, good catch.  The libgnome-2.so symlink is only created if you install the
libgnome-devel rpm.  This is bad.  I'll come up with a new patch to address this.
I don't have *any* gnome components installed at all.
As a nasty hack/test I created a symlink from
/usr/lib/libgtk-x11-2.0.so.0.200.1
to /usr/lib/libgnome-2.so
and Firebird loaded with no problems that I could see.
I figured that way it would be able to load the lib, but bomb out when trying to
resolve the first symbol, thus not get to the init gnome part.
Seems to work here anyway.
I tried the bbc site and nothing happened when I clicked on any of the live
streaming links, but I have never set anything up to happen anyway. It did not
crash.
I was also unable to get a crash on the BBC Radio site, after I hacked it
locally to look for non-existant libraries instead of the real ones.  A stack
trace for that would be helpful.
Attached patch patch v2Splinter Review
Also fix it to look for the right suffix after .so.

I tested this both with the library present and with it not present and did not
see a crash in either case.
Attachment #132764 - Attachment is obsolete: true
Attachment #132858 - Flags: review?(blizzard)
Further experimentation has shown that the crashes I talked about in comment 13
were caused by the Tabbrowser extension. Without TBE installed Firebird works
fine. Clearly as Firebird changes, extensions have to be altered to fit the
changes. The new patch works well. Last night I clicked on a link and firebird
new to open it with the realplayer even though I hadn't yet moved the plugin
into the plugins folder. And today I downloaded a word document and Firebird
knew to open it with Open Office without me having to navigate to it. Good work
people. Thankyou.
You're hard-coding the current soname. Is that really a good idea? It 
would be unfortunate if mozilla stopped working because gnome changed 
something.
*** Bug 221475 has been marked as a duplicate of this bug. ***
tenthumbs: it's no worse than a link-time dependency.  We're saying that we're
expecting the API present in that particular major version of the shared
library.  Who knows if future major versions will have the same API?
*** Bug 221904 has been marked as a duplicate of this bug. ***
Comment on attachment 132858 [details] [diff] [review]
patch v2

sr=darin

i just wrote nearly the same patch in the dup bug! ;-)	... spent several hours
tracking this down on my arm-linux build without a functioning debugger!!
Attachment #132858 - Flags: superreview+
*** Bug 220964 has been marked as a duplicate of this bug. ***
Brian Ryner: in some ways it is worse. With a link-time dependency you
can figure out what's missing with ldd. Here you would either have to 
know
or run strace to find out. Running strings wouldn't be very productive
because you've split the name into two strings.

Also if someone wanted to compile against a new library they couldn't
without editing the source code (presuming they even knew where to 
look).

At the very least you need some configure-time method to specify the
library name.

It also seems to me that you need a run-time override mechanism to avoid
potential "it just stopped working the way it used to" bug reports.
There's precedent. See the font.freetype2.shared-library pref in
gfx/src/freetype.

Actually, if it were me, I would do something in the middle, namely, 
have
a stub library whose only purpose is to provide addresses of certain
functions. With a link-time dependency it would allow you to discover
quickly if you have the functionality you need.
I'm just not convinced that this will be as big of a problem as you make it out
to be.  There's precedent for a lot of things in Mozilla that don't necessarily
need to be repeated.  Can you point me to any configuration in which this
doesn't work as-is?
> There's precedent for a lot of things in Mozilla that don't 
> necessarily need to be repeated.

I agree and this is one of them. It seems unreasonable to me to dlopen a
library you don't control by just assuming that the current conditions
will persist. You just don't know what will happen. A future version 
might actually be backwards compatible but you lose that.

I may have missed something but it seems that there's no fallback option
here. At the very least I suggest trying to load libfoo.so if loading
libfoo.so.12 fails.

> Can you point me to any configuration in which this doesn't work 
as-is?

Assuming you haven't introduced any bugs then your patch, by definition,
works with current configurations so the answer is exactly what you
expect. I think the real question is what happens tomorrow, or next 
month, or next year.
Brian pointed me to this bug so here's my recommendation.

If you dlopen() some DSO you expect the interface to have a specific interface
and semantics.  The dlsym() interface might be able to guarantee interfaces for
C++ code through name mangling but this isn't the case for C and on the
semantics side the issue cannot be resolved at all by dlsym() itself.  Therefore
you have to record the requirements on the DSO and this is done by using the
SONAME.  Run

  readelf -d /lib/libc.so.6 | fgrep SONAME

to see the SONAME.  This is the identification ld.so uses internally and the
"contract" the DSO maintainer follows is that either the DSO is ABI compatible
(not only API compatible) or the SONAME is changed.  The SONAME is the string
the linker adds as the file name to the dynamically linked binaries.

To do the same of programs using dlopen() it is necessary to use the SONAME as
the file name.  In glibc we have, for this reason, a file named
<gnu/lib-names.h>.  An equivalent file unfortunately doesn't exist for (m)any
other projects but it is easy enough to maintain separately.

Re the ldd issue.  Yes, there is no easy way to find the dependency list without
running the application.  This is an documentation issue and maybe a hint that a
little option list --list-dependencies could to used to print the list.  The
argument that one needs strace to find out what is needed is bogus.  If your
program doesn not print an understandable message after dlopen() failed this is
entirely the applications fault; it should provide better output.

Re unnecessarily changing SONAME (i.e., the ABI stays the same).  This only
means the DSO maintainer is clueless and is unnecessary causing costs.  Two
solutions: find a different maintainer, or create a symlink with the old SONAME.

In any case, using the .so name, even if it is only in addition to testing the
SONAME, is wrong.  Very wrong.  The .so name is exclusively the link name.  It
is meant to change.  Maybe even from one day to the other on the same system,
and also going back in time.  If I want to recreate the build environment from 2
years ago I'll change some .so names.  So, don't use them at dlopen()-time if
the file has a SONAME.
Attachment #132858 - Flags: review?(blizzard) → review?(bzbarsky)
So... could someone explain to me how Ulrich's comment 28 relates to the "patch
v2"?  If I understand it correctly, it's saying that what LoadVersionedLibrary
is doing is wrong?
bz: boiled down, ulrich is supporting brian's inclusion of this line:

  versionLibName.Append(libVersion);

because it means that we will be passing the library's SONAME to dlopen (e.g.,
gconf-2.so.4 instead of gconf-2.so).  likewise, he is saying that it is not a
good idea to failover to gconf-2.so, if gconf-2.so.4 cannot be loaded, because
that could potentially refer to a completely different ABI.
Ulrich Drepper: I agree that the soname is the right thing to use. I
suggested using the .so name as a fallback just because of all the
clueless packages out there. I should think more before I speak.

I'm wondering how dlsym reacts if there are versioned symbols and you give
it an unversioned name. Is it possible that the function could be
completely different on the target machine than on the build machine? I'm
wondering if it's necessary to know the versioned names at compile time to
guarantee that the function you get is the one you want? I realize almost
no one uses versioning but it's always possible. (It's also a nice idea
that should be used more often.)

What I'm getting from all this is that mozilla really needs to know the
correct soname at build time on the build machine. If you want to embed
the name in the code this means some configure stuff to determine this.
There's the possible question of whether mozilla will get the correct
function at run time.

It seems to me that mozilla winds up reproducing parts of the static and
dynamic linkers functionality. I'll repeat that I think the best approach
is to create a stub library that just returns the addresses of certain
functions in the desired library when asked. The static linker handles all
the grubby details of finding the soname, embedding it, etc. At runtime
mozilla dlopen's the stub and the target machine's dynamic linker then
decides if it can load the stub. It's probably faster, and better, to let
the system do the dirty work.
>What I'm getting from all this is that mozilla really needs to know the
>correct soname at build time on the build machine. If you want to embed
>the name in the code this means some configure stuff to determine this.

why on earth do we need or want to dynamically discover the right SONAME at
build time?  brian has written code that is designed to only work with, for
example, major version 4 of gconf-2.  it is not right for any other major
version of that library to be loaded in its place.  so, it makes sense for the
specific major version numbers to be hard coded into the code next to where they
are being used.   anything else is an unnecessary maintenance headache in my
opinion.
Comment on attachment 132858 [details] [diff] [review]
patch v2

jrb sits in the cube next to me.  He says it's safe to make assumptions about
the ABI for these libraries.
Attachment #132858 - Flags: review?(bzbarsky) → review+
checked in.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Are we sure this is fixed? I compiled a gtk2 Firebird build a couple of hours
following the checkin and my build still does not run. Here's what I get when I
attempt to run it:

/builds/firebird$ mozilla/dist/bin/MozillaFirebird
mozilla/dist/bin/MozillaFirebird-bin: relocation error:
mozilla/dist/bin/MozillaFirebird-bin: undefined symbol: __vt_14nsXPIDLCString

The system is Debian; I never had a problem prior to the checkin for bug 128668.

I have to say I don't understand why we went and used GNOME application
associations in the first place, since that's pretty much useless for us KDE
users since there doesn't seem to be a way to associate KDE apps (particularly
kmail) to protocols (I could be wrong, but gconf documentation is incredibly
confusing).
You can associate any application with a protocol via GConf.  RedHat 9 has a
tool to do it (though I've had mixed luck using it - hopefully it will
improve)... but what you can do for sure is create the registry key:

/desktop/gnome/url-handlers/mailto/command

as a string which is the command to run.  The gconf-editor app lets you create
these keys.

The undefined symbol looks totally unrelated to me; I'd expect to see that sort
of thing if it was picking up an outdated libxpcom from /usr/lib.
This is really a me too type of message, but the checked in fix solves the
problem completely for both firebird & thunderbird on my machine.
Thankyou all very much :p)
v
Status: RESOLVED → VERIFIED
Component: Build Config → General
Product: Firefox → Firefox Build System
You need to log in before you can comment on or make changes to this bug.