Closed Bug 221217 Opened 21 years ago Closed 21 years ago

gtk2+xft builds since 1st Oct don't work

Categories

(Firefox Build System :: General, defect)

x86
Linux
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: andy, Assigned: bryner)

References

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20030928 Firebird/0.7+ (persist,gtk2+xft) Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20030928 Firebird/0.7+ (persist,gtk2+xft) The last gtk2+xft build which works on my pentium 4, Mandrake 9.1 box is the build from the 29th of Sept. Reproducible: Always Steps to Reproduce: 1.Download 2.Untar 3.Launch Actual Results: Nothing Expected Results: Firebird should open See http://forums.mozillazine.org/viewtopic.php?t=27600 and http://forums.mozillazine.org/viewtopic.php?t=27771
Oct 4th nightly and my own CVS build segfault during startup. /home/mconnor/moztree/mozilla/dist/bin/run-mozilla.sh: line 454: 14605 Segmentation fault "$prog" ${1+"$@"} build issue?
Assignee: blake → bryner
Status: UNCONFIRMED → NEW
Component: General → build-config
Ever confirmed: true
QA Contact: mpconnor
Severity: normal → critical
There's not much I can do about this unless someone can post a stack trace of the crash.
I backed out the fix for bug 128668 in my local tree, and I'm up and running on the trunk, going to rebuild as a debug with a clean tree and get a stacktrace to attach
Keywords: regression
I have this problem in nightlies of Firebird and my cvs builds of thunderbird. I tracked it down to uriloader/exthandler/unix/nsGNOMERegistry.cpp It segfaults on this line _gnome_program_init("Gecko", "1.0", _libgnome_module_info_get(), 1, argv, NULL); I have gtk2 installed, but not the gnome library, therefore I assume both _gnome_program_init and _libgnome_module_info_get() are going to be null. Commenting this line out in my thunderbird cvs build made the problem go away. I can do a thunderbird stack dump if it will help, but this is where it segfaults. I have no Firebird build with symbols and I can't get it to build from cvs.
Recent Mozilla CVS trunk build has the same problem. This is not only Mozilla Fierbird's problem.
Ok, so here's the odd part. It should be aborting the GNOME stuff way before that if it couldn't find either the gnome_program_init() symbol or the libgnome_module_info_get() symbol. Are you saying libgnome-2.so is not present on your system? If you could print out the values of the following variables at the line where it segfaults, it would help: _gnome_program_init _libgnome_module_info_get gnomeLib
The code logic seemed wrong to me, the only time the function will be exited is when PR_FindFunctionSymbol fails and the only time that is called is if PR_LoadLibrary is successful which isn't the case if you don't have the library. So i changed: if (gconfLib) { GET_LIB_FUNCTION(gconf, gconf_client_get_default); GET_LIB_FUNCTION(gconf, gconf_client_get_string); } to if (!gconfLib) { PR_Free(libName); CleanUp(); return; } GET_LIB_FUNCTION(gconf, gconf_client_get_default); GET_LIB_FUNCTION(gconf, gconf_client_get_string); and the same for the other PR_LoadLibrarys and it loaded fine.
Making the change Heitham suggested fixes it here. None of the libraries it tries to load are on my system, so it never attempts to resolve any of the symbols. Thus it get's to the gnome init call without bombing out, but with undefined values in the required symbols.
Good catch. I'll post a patch.
Attached patch patch (obsolete) — Splinter Review
This ensures that we successfully open each library before we proceed.
Attachment #132764 - Flags: superreview?(blizzard)
Attachment #132764 - Flags: review?(bzbarsky)
Comment on attachment 132764 [details] [diff] [review] patch r=bzbarsky
Attachment #132764 - Flags: review?(bzbarsky) → review+
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20031006 Firebird/0.7+ I looked this too on my Red Hat Linux 9 system which GNOME2 installed. Although bryner said that 'libgnome-2.so', plain Red Hat 9 has not 'libgnome-2.so' but 'libgnome-2.so.0', so I made a symlink 'libgnome-2.so' to 'libgnome-2.so.0', then recent build work fine! Is this library name problem?
Attachment #132764 - Flags: superreview?(blizzard) → superreview+
The fixes suggested in comments #4, 7 and 11 all work for me but the Firebird I get is crashy, particularly if there are any mime type issues involved. For example if I try to use the BBC radio player ( http://www.bbc.co.uk/radio/ and click listen). It may be caused by another checkin, but the problem wasn't there a week ago. Also, the fix suggested in comment #12 doesn't work for me. The official builds still don't work and if I put this symlink in it stops the builds I've made with these fixes from working.
Wow, good catch. The libgnome-2.so symlink is only created if you install the libgnome-devel rpm. This is bad. I'll come up with a new patch to address this.
I don't have *any* gnome components installed at all. As a nasty hack/test I created a symlink from /usr/lib/libgtk-x11-2.0.so.0.200.1 to /usr/lib/libgnome-2.so and Firebird loaded with no problems that I could see. I figured that way it would be able to load the lib, but bomb out when trying to resolve the first symbol, thus not get to the init gnome part. Seems to work here anyway. I tried the bbc site and nothing happened when I clicked on any of the live streaming links, but I have never set anything up to happen anyway. It did not crash.
I was also unable to get a crash on the BBC Radio site, after I hacked it locally to look for non-existant libraries instead of the real ones. A stack trace for that would be helpful.
Attached patch patch v2Splinter Review
Also fix it to look for the right suffix after .so. I tested this both with the library present and with it not present and did not see a crash in either case.
Attachment #132764 - Attachment is obsolete: true
Attachment #132858 - Flags: review?(blizzard)
Blocks: 221475
Further experimentation has shown that the crashes I talked about in comment 13 were caused by the Tabbrowser extension. Without TBE installed Firebird works fine. Clearly as Firebird changes, extensions have to be altered to fit the changes. The new patch works well. Last night I clicked on a link and firebird new to open it with the realplayer even though I hadn't yet moved the plugin into the plugins folder. And today I downloaded a word document and Firebird knew to open it with Open Office without me having to navigate to it. Good work people. Thankyou.
You're hard-coding the current soname. Is that really a good idea? It would be unfortunate if mozilla stopped working because gnome changed something.
*** Bug 221475 has been marked as a duplicate of this bug. ***
tenthumbs: it's no worse than a link-time dependency. We're saying that we're expecting the API present in that particular major version of the shared library. Who knows if future major versions will have the same API?
*** Bug 221904 has been marked as a duplicate of this bug. ***
Comment on attachment 132858 [details] [diff] [review] patch v2 sr=darin i just wrote nearly the same patch in the dup bug! ;-) ... spent several hours tracking this down on my arm-linux build without a functioning debugger!!
Attachment #132858 - Flags: superreview+
*** Bug 220964 has been marked as a duplicate of this bug. ***
Brian Ryner: in some ways it is worse. With a link-time dependency you can figure out what's missing with ldd. Here you would either have to know or run strace to find out. Running strings wouldn't be very productive because you've split the name into two strings. Also if someone wanted to compile against a new library they couldn't without editing the source code (presuming they even knew where to look). At the very least you need some configure-time method to specify the library name. It also seems to me that you need a run-time override mechanism to avoid potential "it just stopped working the way it used to" bug reports. There's precedent. See the font.freetype2.shared-library pref in gfx/src/freetype. Actually, if it were me, I would do something in the middle, namely, have a stub library whose only purpose is to provide addresses of certain functions. With a link-time dependency it would allow you to discover quickly if you have the functionality you need.
I'm just not convinced that this will be as big of a problem as you make it out to be. There's precedent for a lot of things in Mozilla that don't necessarily need to be repeated. Can you point me to any configuration in which this doesn't work as-is?
> There's precedent for a lot of things in Mozilla that don't > necessarily need to be repeated. I agree and this is one of them. It seems unreasonable to me to dlopen a library you don't control by just assuming that the current conditions will persist. You just don't know what will happen. A future version might actually be backwards compatible but you lose that. I may have missed something but it seems that there's no fallback option here. At the very least I suggest trying to load libfoo.so if loading libfoo.so.12 fails. > Can you point me to any configuration in which this doesn't work as-is? Assuming you haven't introduced any bugs then your patch, by definition, works with current configurations so the answer is exactly what you expect. I think the real question is what happens tomorrow, or next month, or next year.
Brian pointed me to this bug so here's my recommendation. If you dlopen() some DSO you expect the interface to have a specific interface and semantics. The dlsym() interface might be able to guarantee interfaces for C++ code through name mangling but this isn't the case for C and on the semantics side the issue cannot be resolved at all by dlsym() itself. Therefore you have to record the requirements on the DSO and this is done by using the SONAME. Run readelf -d /lib/libc.so.6 | fgrep SONAME to see the SONAME. This is the identification ld.so uses internally and the "contract" the DSO maintainer follows is that either the DSO is ABI compatible (not only API compatible) or the SONAME is changed. The SONAME is the string the linker adds as the file name to the dynamically linked binaries. To do the same of programs using dlopen() it is necessary to use the SONAME as the file name. In glibc we have, for this reason, a file named <gnu/lib-names.h>. An equivalent file unfortunately doesn't exist for (m)any other projects but it is easy enough to maintain separately. Re the ldd issue. Yes, there is no easy way to find the dependency list without running the application. This is an documentation issue and maybe a hint that a little option list --list-dependencies could to used to print the list. The argument that one needs strace to find out what is needed is bogus. If your program doesn not print an understandable message after dlopen() failed this is entirely the applications fault; it should provide better output. Re unnecessarily changing SONAME (i.e., the ABI stays the same). This only means the DSO maintainer is clueless and is unnecessary causing costs. Two solutions: find a different maintainer, or create a symlink with the old SONAME. In any case, using the .so name, even if it is only in addition to testing the SONAME, is wrong. Very wrong. The .so name is exclusively the link name. It is meant to change. Maybe even from one day to the other on the same system, and also going back in time. If I want to recreate the build environment from 2 years ago I'll change some .so names. So, don't use them at dlopen()-time if the file has a SONAME.
Attachment #132858 - Flags: review?(blizzard) → review?(bzbarsky)
So... could someone explain to me how Ulrich's comment 28 relates to the "patch v2"? If I understand it correctly, it's saying that what LoadVersionedLibrary is doing is wrong?
bz: boiled down, ulrich is supporting brian's inclusion of this line: versionLibName.Append(libVersion); because it means that we will be passing the library's SONAME to dlopen (e.g., gconf-2.so.4 instead of gconf-2.so). likewise, he is saying that it is not a good idea to failover to gconf-2.so, if gconf-2.so.4 cannot be loaded, because that could potentially refer to a completely different ABI.
Ulrich Drepper: I agree that the soname is the right thing to use. I suggested using the .so name as a fallback just because of all the clueless packages out there. I should think more before I speak. I'm wondering how dlsym reacts if there are versioned symbols and you give it an unversioned name. Is it possible that the function could be completely different on the target machine than on the build machine? I'm wondering if it's necessary to know the versioned names at compile time to guarantee that the function you get is the one you want? I realize almost no one uses versioning but it's always possible. (It's also a nice idea that should be used more often.) What I'm getting from all this is that mozilla really needs to know the correct soname at build time on the build machine. If you want to embed the name in the code this means some configure stuff to determine this. There's the possible question of whether mozilla will get the correct function at run time. It seems to me that mozilla winds up reproducing parts of the static and dynamic linkers functionality. I'll repeat that I think the best approach is to create a stub library that just returns the addresses of certain functions in the desired library when asked. The static linker handles all the grubby details of finding the soname, embedding it, etc. At runtime mozilla dlopen's the stub and the target machine's dynamic linker then decides if it can load the stub. It's probably faster, and better, to let the system do the dirty work.
>What I'm getting from all this is that mozilla really needs to know the >correct soname at build time on the build machine. If you want to embed >the name in the code this means some configure stuff to determine this. why on earth do we need or want to dynamically discover the right SONAME at build time? brian has written code that is designed to only work with, for example, major version 4 of gconf-2. it is not right for any other major version of that library to be loaded in its place. so, it makes sense for the specific major version numbers to be hard coded into the code next to where they are being used. anything else is an unnecessary maintenance headache in my opinion.
Comment on attachment 132858 [details] [diff] [review] patch v2 jrb sits in the cube next to me. He says it's safe to make assumptions about the ABI for these libraries.
Attachment #132858 - Flags: review?(bzbarsky) → review+
checked in.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Are we sure this is fixed? I compiled a gtk2 Firebird build a couple of hours following the checkin and my build still does not run. Here's what I get when I attempt to run it: /builds/firebird$ mozilla/dist/bin/MozillaFirebird mozilla/dist/bin/MozillaFirebird-bin: relocation error: mozilla/dist/bin/MozillaFirebird-bin: undefined symbol: __vt_14nsXPIDLCString The system is Debian; I never had a problem prior to the checkin for bug 128668. I have to say I don't understand why we went and used GNOME application associations in the first place, since that's pretty much useless for us KDE users since there doesn't seem to be a way to associate KDE apps (particularly kmail) to protocols (I could be wrong, but gconf documentation is incredibly confusing).
You can associate any application with a protocol via GConf. RedHat 9 has a tool to do it (though I've had mixed luck using it - hopefully it will improve)... but what you can do for sure is create the registry key: /desktop/gnome/url-handlers/mailto/command as a string which is the command to run. The gconf-editor app lets you create these keys. The undefined symbol looks totally unrelated to me; I'd expect to see that sort of thing if it was picking up an outdated libxpcom from /usr/lib.
This is really a me too type of message, but the checked in fix solves the problem completely for both firebird & thunderbird on my machine. Thankyou all very much :p)
v
Status: RESOLVED → VERIFIED
Component: Build Config → General
Product: Firefox → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: