Closed
Bug 221217
Opened 21 years ago
Closed 21 years ago
gtk2+xft builds since 1st Oct don't work
Categories
(Firefox Build System :: General, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: andy, Assigned: bryner)
References
Details
(Keywords: regression)
Attachments
(1 file, 1 obsolete file)
3.25 KB,
patch
|
blizzard
:
review+
darin.moz
:
superreview+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20030928 Firebird/0.7+ (persist,gtk2+xft)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20030928 Firebird/0.7+ (persist,gtk2+xft)
The last gtk2+xft build which works on my pentium 4, Mandrake 9.1 box is the
build from the 29th of Sept.
Reproducible: Always
Steps to Reproduce:
1.Download
2.Untar
3.Launch
Actual Results:
Nothing
Expected Results:
Firebird should open
See http://forums.mozillazine.org/viewtopic.php?t=27600 and
http://forums.mozillazine.org/viewtopic.php?t=27771
Comment 1•21 years ago
|
||
Oct 4th nightly and my own CVS build segfault during startup.
/home/mconnor/moztree/mozilla/dist/bin/run-mozilla.sh: line 454: 14605
Segmentation fault "$prog" ${1+"$@"}
build issue?
Assignee: blake → bryner
Status: UNCONFIRMED → NEW
Component: General → build-config
Ever confirmed: true
QA Contact: mpconnor
Updated•21 years ago
|
Severity: normal → critical
Assignee | ||
Comment 2•21 years ago
|
||
There's not much I can do about this unless someone can post a stack trace of
the crash.
Comment 3•21 years ago
|
||
I backed out the fix for bug 128668 in my local tree, and I'm up and running on
the trunk, going to rebuild as a debug with a clean tree and get a stacktrace to
attach
Keywords: regression
Comment 4•21 years ago
|
||
I have this problem in nightlies of Firebird and my cvs builds of thunderbird.
I tracked it down to uriloader/exthandler/unix/nsGNOMERegistry.cpp
It segfaults on this line
_gnome_program_init("Gecko", "1.0", _libgnome_module_info_get(),
1, argv, NULL);
I have gtk2 installed, but not the gnome library, therefore I assume both
_gnome_program_init and _libgnome_module_info_get() are going to be null.
Commenting this line out in my thunderbird cvs build made the problem go away.
I can do a thunderbird stack dump if it will help, but this is where it segfaults.
I have no Firebird build with symbols and I can't get it to build from cvs.
Comment 5•21 years ago
|
||
Recent Mozilla CVS trunk build has the same problem.
This is not only Mozilla Fierbird's problem.
Assignee | ||
Comment 6•21 years ago
|
||
Ok, so here's the odd part. It should be aborting the GNOME stuff way before
that if it couldn't find either the gnome_program_init() symbol or the
libgnome_module_info_get() symbol. Are you saying libgnome-2.so is not present
on your system?
If you could print out the values of the following variables at the line where
it segfaults, it would help:
_gnome_program_init
_libgnome_module_info_get
gnomeLib
Comment 7•21 years ago
|
||
The code logic seemed wrong to me, the only time the function will be exited is
when PR_FindFunctionSymbol fails and the only time that is called is if
PR_LoadLibrary is successful which isn't the case if you don't have the library.
So i changed:
if (gconfLib) {
GET_LIB_FUNCTION(gconf, gconf_client_get_default);
GET_LIB_FUNCTION(gconf, gconf_client_get_string);
}
to
if (!gconfLib) {
PR_Free(libName);
CleanUp();
return;
}
GET_LIB_FUNCTION(gconf, gconf_client_get_default);
GET_LIB_FUNCTION(gconf, gconf_client_get_string);
and the same for the other PR_LoadLibrarys and it loaded fine.
Comment 8•21 years ago
|
||
Making the change Heitham suggested fixes it here.
None of the libraries it tries to load are on my system, so it never attempts to
resolve any of the symbols. Thus it get's to the gnome init call without bombing
out, but with undefined values in the required symbols.
Assignee | ||
Comment 9•21 years ago
|
||
Good catch. I'll post a patch.
Assignee | ||
Comment 10•21 years ago
|
||
This ensures that we successfully open each library before we proceed.
Assignee | ||
Updated•21 years ago
|
Attachment #132764 -
Flags: superreview?(blizzard)
Attachment #132764 -
Flags: review?(bzbarsky)
Comment 11•21 years ago
|
||
Comment on attachment 132764 [details] [diff] [review]
patch
r=bzbarsky
Attachment #132764 -
Flags: review?(bzbarsky) → review+
Comment 12•21 years ago
|
||
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20031006 Firebird/0.7+
I looked this too on my Red Hat Linux 9 system which GNOME2 installed.
Although bryner said that 'libgnome-2.so', plain Red Hat 9 has not
'libgnome-2.so' but 'libgnome-2.so.0', so I made a symlink 'libgnome-2.so' to
'libgnome-2.so.0', then recent build work fine!
Is this library name problem?
Updated•21 years ago
|
Attachment #132764 -
Flags: superreview?(blizzard) → superreview+
Reporter | ||
Comment 13•21 years ago
|
||
The fixes suggested in comments #4, 7 and 11 all work for me but the Firebird I
get is crashy, particularly if there are any mime type issues involved. For
example if I try to use the BBC radio player ( http://www.bbc.co.uk/radio/ and
click listen). It may be caused by another checkin, but the problem wasn't there
a week ago. Also, the fix suggested in comment #12 doesn't work for me. The
official builds still don't work and if I put this symlink in it stops the
builds I've made with these fixes from working.
Assignee | ||
Comment 14•21 years ago
|
||
Wow, good catch. The libgnome-2.so symlink is only created if you install the
libgnome-devel rpm. This is bad. I'll come up with a new patch to address this.
Comment 15•21 years ago
|
||
I don't have *any* gnome components installed at all.
As a nasty hack/test I created a symlink from
/usr/lib/libgtk-x11-2.0.so.0.200.1
to /usr/lib/libgnome-2.so
and Firebird loaded with no problems that I could see.
I figured that way it would be able to load the lib, but bomb out when trying to
resolve the first symbol, thus not get to the init gnome part.
Seems to work here anyway.
I tried the bbc site and nothing happened when I clicked on any of the live
streaming links, but I have never set anything up to happen anyway. It did not
crash.
Assignee | ||
Comment 16•21 years ago
|
||
I was also unable to get a crash on the BBC Radio site, after I hacked it
locally to look for non-existant libraries instead of the real ones. A stack
trace for that would be helpful.
Assignee | ||
Comment 17•21 years ago
|
||
Also fix it to look for the right suffix after .so.
I tested this both with the library present and with it not present and did not
see a crash in either case.
Attachment #132764 -
Attachment is obsolete: true
Assignee | ||
Updated•21 years ago
|
Attachment #132858 -
Flags: review?(blizzard)
Reporter | ||
Comment 18•21 years ago
|
||
Further experimentation has shown that the crashes I talked about in comment 13
were caused by the Tabbrowser extension. Without TBE installed Firebird works
fine. Clearly as Firebird changes, extensions have to be altered to fit the
changes. The new patch works well. Last night I clicked on a link and firebird
new to open it with the realplayer even though I hadn't yet moved the plugin
into the plugins folder. And today I downloaded a word document and Firebird
knew to open it with Open Office without me having to navigate to it. Good work
people. Thankyou.
Comment 19•21 years ago
|
||
You're hard-coding the current soname. Is that really a good idea? It
would be unfortunate if mozilla stopped working because gnome changed
something.
Comment 20•21 years ago
|
||
*** Bug 221475 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 21•21 years ago
|
||
tenthumbs: it's no worse than a link-time dependency. We're saying that we're
expecting the API present in that particular major version of the shared
library. Who knows if future major versions will have the same API?
Assignee | ||
Comment 22•21 years ago
|
||
*** Bug 221904 has been marked as a duplicate of this bug. ***
Comment 23•21 years ago
|
||
Comment on attachment 132858 [details] [diff] [review]
patch v2
sr=darin
i just wrote nearly the same patch in the dup bug! ;-) ... spent several hours
tracking this down on my arm-linux build without a functioning debugger!!
Attachment #132858 -
Flags: superreview+
Comment 24•21 years ago
|
||
*** Bug 220964 has been marked as a duplicate of this bug. ***
Comment 25•21 years ago
|
||
Brian Ryner: in some ways it is worse. With a link-time dependency you
can figure out what's missing with ldd. Here you would either have to
know
or run strace to find out. Running strings wouldn't be very productive
because you've split the name into two strings.
Also if someone wanted to compile against a new library they couldn't
without editing the source code (presuming they even knew where to
look).
At the very least you need some configure-time method to specify the
library name.
It also seems to me that you need a run-time override mechanism to avoid
potential "it just stopped working the way it used to" bug reports.
There's precedent. See the font.freetype2.shared-library pref in
gfx/src/freetype.
Actually, if it were me, I would do something in the middle, namely,
have
a stub library whose only purpose is to provide addresses of certain
functions. With a link-time dependency it would allow you to discover
quickly if you have the functionality you need.
Assignee | ||
Comment 26•21 years ago
|
||
I'm just not convinced that this will be as big of a problem as you make it out
to be. There's precedent for a lot of things in Mozilla that don't necessarily
need to be repeated. Can you point me to any configuration in which this
doesn't work as-is?
Comment 27•21 years ago
|
||
> There's precedent for a lot of things in Mozilla that don't
> necessarily need to be repeated.
I agree and this is one of them. It seems unreasonable to me to dlopen a
library you don't control by just assuming that the current conditions
will persist. You just don't know what will happen. A future version
might actually be backwards compatible but you lose that.
I may have missed something but it seems that there's no fallback option
here. At the very least I suggest trying to load libfoo.so if loading
libfoo.so.12 fails.
> Can you point me to any configuration in which this doesn't work
as-is?
Assuming you haven't introduced any bugs then your patch, by definition,
works with current configurations so the answer is exactly what you
expect. I think the real question is what happens tomorrow, or next
month, or next year.
Comment 28•21 years ago
|
||
Brian pointed me to this bug so here's my recommendation.
If you dlopen() some DSO you expect the interface to have a specific interface
and semantics. The dlsym() interface might be able to guarantee interfaces for
C++ code through name mangling but this isn't the case for C and on the
semantics side the issue cannot be resolved at all by dlsym() itself. Therefore
you have to record the requirements on the DSO and this is done by using the
SONAME. Run
readelf -d /lib/libc.so.6 | fgrep SONAME
to see the SONAME. This is the identification ld.so uses internally and the
"contract" the DSO maintainer follows is that either the DSO is ABI compatible
(not only API compatible) or the SONAME is changed. The SONAME is the string
the linker adds as the file name to the dynamically linked binaries.
To do the same of programs using dlopen() it is necessary to use the SONAME as
the file name. In glibc we have, for this reason, a file named
<gnu/lib-names.h>. An equivalent file unfortunately doesn't exist for (m)any
other projects but it is easy enough to maintain separately.
Re the ldd issue. Yes, there is no easy way to find the dependency list without
running the application. This is an documentation issue and maybe a hint that a
little option list --list-dependencies could to used to print the list. The
argument that one needs strace to find out what is needed is bogus. If your
program doesn not print an understandable message after dlopen() failed this is
entirely the applications fault; it should provide better output.
Re unnecessarily changing SONAME (i.e., the ABI stays the same). This only
means the DSO maintainer is clueless and is unnecessary causing costs. Two
solutions: find a different maintainer, or create a symlink with the old SONAME.
In any case, using the .so name, even if it is only in addition to testing the
SONAME, is wrong. Very wrong. The .so name is exclusively the link name. It
is meant to change. Maybe even from one day to the other on the same system,
and also going back in time. If I want to recreate the build environment from 2
years ago I'll change some .so names. So, don't use them at dlopen()-time if
the file has a SONAME.
Assignee | ||
Updated•21 years ago
|
Attachment #132858 -
Flags: review?(blizzard) → review?(bzbarsky)
Comment 29•21 years ago
|
||
So... could someone explain to me how Ulrich's comment 28 relates to the "patch
v2"? If I understand it correctly, it's saying that what LoadVersionedLibrary
is doing is wrong?
Comment 30•21 years ago
|
||
bz: boiled down, ulrich is supporting brian's inclusion of this line:
versionLibName.Append(libVersion);
because it means that we will be passing the library's SONAME to dlopen (e.g.,
gconf-2.so.4 instead of gconf-2.so). likewise, he is saying that it is not a
good idea to failover to gconf-2.so, if gconf-2.so.4 cannot be loaded, because
that could potentially refer to a completely different ABI.
Comment 31•21 years ago
|
||
Ulrich Drepper: I agree that the soname is the right thing to use. I
suggested using the .so name as a fallback just because of all the
clueless packages out there. I should think more before I speak.
I'm wondering how dlsym reacts if there are versioned symbols and you give
it an unversioned name. Is it possible that the function could be
completely different on the target machine than on the build machine? I'm
wondering if it's necessary to know the versioned names at compile time to
guarantee that the function you get is the one you want? I realize almost
no one uses versioning but it's always possible. (It's also a nice idea
that should be used more often.)
What I'm getting from all this is that mozilla really needs to know the
correct soname at build time on the build machine. If you want to embed
the name in the code this means some configure stuff to determine this.
There's the possible question of whether mozilla will get the correct
function at run time.
It seems to me that mozilla winds up reproducing parts of the static and
dynamic linkers functionality. I'll repeat that I think the best approach
is to create a stub library that just returns the addresses of certain
functions in the desired library when asked. The static linker handles all
the grubby details of finding the soname, embedding it, etc. At runtime
mozilla dlopen's the stub and the target machine's dynamic linker then
decides if it can load the stub. It's probably faster, and better, to let
the system do the dirty work.
Comment 32•21 years ago
|
||
>What I'm getting from all this is that mozilla really needs to know the
>correct soname at build time on the build machine. If you want to embed
>the name in the code this means some configure stuff to determine this.
why on earth do we need or want to dynamically discover the right SONAME at
build time? brian has written code that is designed to only work with, for
example, major version 4 of gconf-2. it is not right for any other major
version of that library to be loaded in its place. so, it makes sense for the
specific major version numbers to be hard coded into the code next to where they
are being used. anything else is an unnecessary maintenance headache in my
opinion.
Comment 33•21 years ago
|
||
Comment on attachment 132858 [details] [diff] [review]
patch v2
jrb sits in the cube next to me. He says it's safe to make assumptions about
the ABI for these libraries.
Attachment #132858 -
Flags: review?(bzbarsky) → review+
Assignee | ||
Comment 34•21 years ago
|
||
checked in.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment 35•21 years ago
|
||
Are we sure this is fixed? I compiled a gtk2 Firebird build a couple of hours
following the checkin and my build still does not run. Here's what I get when I
attempt to run it:
/builds/firebird$ mozilla/dist/bin/MozillaFirebird
mozilla/dist/bin/MozillaFirebird-bin: relocation error:
mozilla/dist/bin/MozillaFirebird-bin: undefined symbol: __vt_14nsXPIDLCString
The system is Debian; I never had a problem prior to the checkin for bug 128668.
I have to say I don't understand why we went and used GNOME application
associations in the first place, since that's pretty much useless for us KDE
users since there doesn't seem to be a way to associate KDE apps (particularly
kmail) to protocols (I could be wrong, but gconf documentation is incredibly
confusing).
Assignee | ||
Comment 36•21 years ago
|
||
You can associate any application with a protocol via GConf. RedHat 9 has a
tool to do it (though I've had mixed luck using it - hopefully it will
improve)... but what you can do for sure is create the registry key:
/desktop/gnome/url-handlers/mailto/command
as a string which is the command to run. The gconf-editor app lets you create
these keys.
The undefined symbol looks totally unrelated to me; I'd expect to see that sort
of thing if it was picking up an outdated libxpcom from /usr/lib.
Comment 37•21 years ago
|
||
This is really a me too type of message, but the checked in fix solves the
problem completely for both firebird & thunderbird on my machine.
Thankyou all very much :p)
Updated•6 years ago
|
Component: Build Config → General
Product: Firefox → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•