Closed Bug 193442 Opened 22 years ago Closed 22 years ago

getting the GRE working without LD_LIBRARY_PATH on linux

Categories

(Core Graveyard :: Embedding: GRE Core, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.4alpha

People

(Reporter: blizzard, Assigned: dougt)

References

Details

Attachments

(4 files, 10 obsolete files)

4.71 KB, patch
alecf
: review+
sspitzer
: approval1.4b+
Details | Diff | Splinter Review
1.82 KB, patch
netscape
: review+
sspitzer
: approval1.4b+
Details | Diff | Splinter Review
9.25 KB, patch
darin.moz
: superreview+
sspitzer
: approval1.4b+
Details | Diff | Splinter Review
1.31 KB, patch
netscape
: review+
Details | Diff | Splinter Review
This is the holy grail of embedding on Linux. I would really like to have this properly working on Linux by 1.3, if possible. Here's the problem: Some of the components that we include are linked with various shared libs (libmozjs.so, libgkgfx.so, etc). In order for those files to to be automatically loaded by ld.so when xpcom opens the component shared library, the library has to be located somewhere in the library path, usually controlled at least on linux by /etc/ld.so.conf and the LD_LIBRARY_PATH environment variable. However, once a program has been started there's no way for that program to change its library path. You have to use an explicit dlopen(). Now, given that set of restrictions here are the requirements as I understand them, or really as I would like to see it done: 1. The location of the GRE on Linux should be runtime configurable. 2. Clients using the GRE should be able to locate the GRE package using a config file, through the standard rules. 3. No part of the GRE should be included in the library path so that you can have more than one version of it installed if you want. (Well, except for nspr but that's another bug.) And now some possible solutions: Doug's approach was to change the value of LD_LIBRARY_PATH at run time after the GRE has been located. However, it turns out that the value of that is only checked once at startup and changing it has no effect. So we need a new approach. The solution here requires coming up with a method of using dlopen() to load any dependent libraries of any components before that component is loaded. We can't get the dependency information directly from the shared library because to do so we would have to open it and trigger the dynamic linker error. This means that this information has to be stored outside of the shared library. I thought it might be possible to store this information in the .xpt files, but shaver talked me down. There isn't a clear 1 to 1 relationship between shared libraries and the .xpt files and besides, .xpt files describe interfaces, not shared libraries. We also might install some kind of text file or resource file for each of the shared libraries that is linked against other libraries, but to me this seems like a maintenance and installer headache. The information about linking is only loosely linked with that shared library. Another idea that I came up with was to store that information in the shared library itself, much like we have NSGetModule right now. If we can guarantee that LD_LIBRARY_PATH is set during registration (which doesn't seem entirely unreasonable) we can query the shared library, compile a list of dependencies and store that information in the registry so that information is available at run time before any components are loaded. It also tightly binds the information to the shared library in question and it is automatically updated when the shared library is updated. The last listed here is my personal favorite since it will solve the problem in an elegant and easy to use manner. If we do decide to do it this way, there is at least one wrinkle. We have to use SONAMEs for the shared libraries. If you use dlopen() and just load the shared library without a SONAME, the DT_NEEDED field in the component doesn't know which shared library you mean and will execute a search, which is what we want to avoid. However, if the shared library is linked against a SONAME, not just a shared library by name the SONAME from the previously loaded shared library will satisfy the dependency. No going out to disk, no nothing. I think we need to do SONAME anyway, so this isn't all bad. Just a little more work.
Depends on: 53727
cls informs me that we already set SONAME and looking at at least one of the libraries, we seem to. One problem solved, I think.
OK, I've talked with both dougt and cls about the xpcom changes that need to happen and the build changes that need to happen as well. cls is going to try to provide me with a way to provide a define that will allow me to provide a list of gre-specific libraries that need to be loaded before a component library is loaded. This list will look something like this: FOO=libgkgfx.so libmozjs.so that will have to be transformed into a null terminated list of libraries that need to be loaded: char *required_libs[] = { "libgkgfx.so", "libmozjs.so", NULL }; From there, dougt is going to add the required bits to the NS_IMPL_GETMODULE* macros so that if some magic define is set with this list of null terminated list of character strings (like how components works now) it will call back into xpcom, letting xpcom know the list of required libraries. The actual details should be worked out here. If we need to make changes to lots of makefiles, I'm perfectly willing to do the grunt work.
Attached patch incomplete patch v.1 (obsolete) — Splinter Review
This patch contains the bulk of the backend work to store the dependent libraries. It also contains some cleanup work in component loading including removal of obsolete loading, removal of absolute file paths in persistent stores when using the GRE, and removal of linear (by name) dll searches. What is left is to answer - how do we want the component telling xpcom what its dependent libs are? Currently there are two ideas on the table: (1) Add an interface which the component manager will implemements (something that is QI'able from the nsIComponentManager passed into RegisterSelf). From the components nsIModule::RegisterSelf, it may use this interface to state its depends. (2) Add a new entry point on the DLL which returns a string of the depend libraries. This optional entry point will be loaded by xpcom prior to the call to register self. I am leaning toward solution (1).
That patch doesn't cache the dependent libs -- without caching this will be a perf hit. I will fix that up.
I'm wondering if this can't be done simply by reusing EXTRA_DSO_LIBS. EXTRA_DSO_LIBS already contains the basename of the mozilla libraries that we link against. When you include rules.mk, EXTRA_DSO_LIBS contains the full link list of names. The main problem with piggybacking onto that variable is that not every mozilla shared library dependency is listed in those variables (xpcom, js & nspr have their own variables). Another alternative would be to just parse the library names out of the list of link flags (EXTRA_DSO_LDOPTS & maybe EXTRA_LIBS) used to create shared libs. That would remove the need to touch each of the 100+ component build makefiles.
Attached patch Add define using dep list (obsolete) — Splinter Review
Attachment #114815 - Attachment is obsolete: true
A couple of notes: I added support to the generic module code so that a component can specify a list of libraries which should be loaded prior to XPCOM loading that component. All a component has to do is something like: NS_IMPL_NSGETMODULE_WITH_CTOR_DTOR_DEPEND(component_name, moduleInfo, ctor, dtor, array_of_library_names) Along with the cls build changes, we can do something like this: Index: nsNetModule.cpp =================================================================== RCS file: /cvsroot/mozilla/netwerk/build/nsNetModule.cpp,v retrieving revision 1.86 diff -u -1 -0 -r1.86 nsNetModule.cpp --- nsNetModule.cpp 22 Jan 2003 03:50:13 -0000 1.86 +++ nsNetModule.cpp 19 Feb 2003 03:19:04 -0000 @@ -898,12 +898,16 @@ }, { NS_CACHESERVICE_CLASSNAME, NS_CACHESERVICE_CID, NS_CACHESERVICE_CONTRACTID, nsCacheService::Create }, }; -NS_IMPL_NSGETMODULE_WITH_DTOR(necko_core_and_primary_protocols, gNetModuleInfo, - nsNeckoShutdown) +const char* dependlibs[] = {DEPENDENT_LIBS "\0"}; +NS_IMPL_NSGETMODULE_WITH_CTOR_DTOR_DEPEND(necko_core_and_primary_protocols, + gNetModuleInfo, + nsnull, + nsNeckoShutdown, + dependlibs) I really do not like having to change all of the call sites of NS_IMPL_NSGETMODULE to use this form. (if I do this, I will probably get rid of the shorter forms). Maybe we should just add another macro? I am trying to think of a better way. The problem is that this macro expands into something that calls into XPCOM and passes data into it. This data is the only connection between xpcom and components that use the generic module code. so, if we are going to use the existing generic module code, we must pass this list using the above method. Also, I am not sure i want ALL library dependencies listed or resolved automatically by XPCOM. For example, of the 12 libraries that libnecko.so requires, only 1 non system library may not have loaded by the time xpcom gets around to loading necko. Hence there will be 11 PR_LoadLibrary class for nothing. Furthermore, there are 80 or so libraries that xpcom knows about. This just feels like it is going to kill startup time. I think that either we need to restrict the libraries to mozilla things (stuff usually found in bin/) or cache these lookups by name or something... (blizzard tells me to measure first - good advice. lets find out.) Alec, can you think of a better way to pass an array of string into the generic module code? Do you have any preference on converting existing code to use this new form? (note that the patch to necko should have been #ifdef'ed so that if that define does exist, things still will be happy).
Severity: normal → critical
OS: Linux → All
Priority: -- → P1
Hardware: PC → All
Target Milestone: --- → mozilla1.3final
Attached patch current state. (obsolete) — Splinter Review
Attachment #114750 - Attachment is obsolete: true
Attachment #114819 - Attachment is obsolete: true
alec, darin, blizzard, can you please review the last patch.
Comment on attachment 114912 [details] [diff] [review] current state. adding it to my queue
Attachment #114912 - Flags: superreview?(alecf)
Comment on attachment 114912 [details] [diff] [review] current state. minor comments: - years ago warren made a non-modifying nsCRT::strtok() - it would save you the strdup. - it seems like we should only explicity load libraries that are not in the path. I know that this is added complexity, but it seems like the perforance win would be worth it. You'd obviously have to do some platform-specific parsing of LD_LIBRARY_PATH and so forth. The concern I have is that for some things, like shell32.dll, we only make one call into it, and do it rarely. I think the DLL is not actually loaded until the first call is made.
blizzard and I think that before we optimize, we should get numbers.
At least on linux there isn't an LD_LIBRARY_PATH set by default. You would have to parse /etc/ld.so.conf and I don't want to go there. We start getting into dark places when we're second guessing the loader. Anyway, I think the real solution here is to try to filter out certain files out of the list if dependent libs instead of trying to filter the list of libs in xpcom. The make rules that generate that are pretty simple right now and work but could be improved.
Attached patch incorporated feedback (obsolete) — Splinter Review
this patch does not enable dependent library loading -- we need to adjust the generic module macros to enable this functionality.
Attachment #114912 - Attachment is obsolete: true
Comment on attachment 115592 [details] [diff] [review] incorporated feedback >Index: xpcom/components/nsComponentManager.cpp >+ NR_StartupRegistry(); not meant to be part of the patch, right? >+ if (mGREComponentsDir) { ... >+ mGREComponentsOffset = componentDescriptor.Length(); >+ } does this compile? i don't see mGREComponentsOffset declared anywhere. can this be removed? > // fall through to QI as anything QIable is a superset of what canbe s/canbe/can be/ >+ if (!extraData) >+ { >+ //name,last_modification_date >+ PR_fprintf(fd, >+ "%s,%lld\n", >+ entry->GetName(), >+ entry->GetDate()); >+ } >+ else >+ { >+ //name,last_modification_date, optional-data >+ PR_fprintf(fd, >+ "%s,%lld,%s\n", >+ entry->GetName(), >+ entry->GetDate(), >+ extraData); >+ } nitty: wouldn't something like this also work? const char *fmt; if (extraData) fmt = "%s,%lld,%s\n"; else fmt = "%s,%lld\n"; PR_fprintf(fd, fmt, entry->GetName(), entry->GetDate(), extraData); function with (...) argument has parameters pushed and popped by caller, so this should be ok. >+ /* absolute names include volume info on Mac, so persistent descriptor */ >+ rv = aSpec->GetNativePath(persistentDescriptor); >+ if (NS_FAILED(rv)) >+ return rv; >+ return MakeRegistryName(persistentDescriptor.get(), XPCOM_ABSCOMPONENT_PREFIX, aRegistryName); ok, can you clean up this persistent descriptor isn't a persistent descriptor business? >+ *_retval = (char*)nsMemory::Clone(opData, strlen(opData)+1); equivalent to: *_retval = ToNewCString(nsDependentCString(opData)); maybe just a bit nicer to read. >Index: xpcom/components/nsIComponentLoaderManager.idl >+ string getOptionalData(in nsIFile file, in string loaderString); >+ void setOptionalData(in nsIFile file, in string loaderString, in string value); for getters like this it is usually nice to use ACString because then the caller can use a nsCAutoString instead of a nsXPIDLCString. >Index: xpcom/reflect/xptinfo/src/xptiManifest.cpp what will happen when older xpti.dat files exist. are we counting on the installer to take care of forcing a re-gen of xpti.dat? >Index: xpfe/bootstrap/Makefile.in >-GRE_BUILD = 1; >+GRE_BUILD = 0; what's this all about? overall this patch looks pretty good to me.
>for getters like this it is usually nice to use ACString because then the >caller can use a nsCAutoString instead of a nsXPIDLCString. remember, I don't want to use our string classes in this interface? Because of the fact that I want this used in the glue macros and didn't want to use nsEmbedString. >what will happen when older xpti.dat files exist. are we counting on >the installer to take care of forcing a re-gen of xpti.dat? This is a nonissue. We are adding a new type to the manifest list, not changing existing entries. Everything else fixed.... new patch coming up....
Attached patch includes darin's feedback (obsolete) — Splinter Review
Attachment #115592 - Attachment is obsolete: true
Attached patch more feedback. (obsolete) — Splinter Review
cleans up the nsIGenericModule.h macro's (last patch removed too much stuff).
Attachment #115682 - Attachment is obsolete: true
Comment on attachment 115775 [details] [diff] [review] more feedback. blizzard, could look over this patch as well?
Attachment #115775 - Flags: superreview?(alecf)
Attachment #115775 - Flags: review?(darin)
Comment on attachment 115775 [details] [diff] [review] more feedback. + /* absolute names include volume info on Mac, so persistent descriptor */ hmm? me not understand. I'm also not sure why nsIComponentLoaderManager is no longer scriptable (I'm not sure I see much value in it being scriptable, but I don't see why not..) + if (strncmp(values[1], "gre", 3)) I'm slightly concerned that this only tests if the PREFIX of the string is gre. what if this is somehow "great" or the persisten descriptor starts with "gre" - won't this be a false match? other than those minor things, this seems ok. sr=alecf of course along with reviews from darin and blizzard. Sorry I couldn't spend too much more time with this.
Attachment #115775 - Flags: superreview?(alecf) → superreview+
Comment on attachment 115775 [details] [diff] [review] more feedback. r=darin
Attachment #115775 - Flags: review?(darin) → review+
Fix landed on the trunk this afternoon.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Target Milestone: mozilla1.3final → mozilla1.4alpha
Apparently, beos' gcc & mingw's windres don't like the -DDEPENDENT_LIBS="\"foo\", \"bar\"," construct. We're going to have to find a better way to add those defines or limit that hack to linux.
I don't think we need it on win32 since changing PATH on win32 actually updates the library path. That doesn't work on Linux which is why we need this code. OSX might need it, but I can't be sure - no one has tested it.
reopening since we still need to land changes to the generic module code to utilize this support.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I was just looking at the startup graphs, and they all show a nice 10-15% slowdown on startup in the time period corresponding to this checkin: http://tegu.mozilla.org/graph/multiquery.cgi?&testname=startup&tboxes=comet,luna,sleestack,mecca,facedown,openwound,rheeeet&days=7&autoscale=1&units=&ltype=&points=&avg=1 What's up with that? ;)
Identified the performance problem. relanded changes last night.
so don't keep us in suspense...:) what was it?
two problems: (a) A logic error in xptiManifest.cpp Was doing this: if (PL_strcmp(values[1], "gre")) instead of this: if (0 != PL_strcmp(values[1], "gre")) b) The performance problem still exists, but isn't hit in builds using a local GRE. This takes 200ms: > { > nsCOMPtr<nsIFile> greDirectory; > NS_GetSpecialDirectory(NS_GRE_COMPONENT_DIR, getter_AddRefs(greDirectory)); > nsCOMPtr<nsILocalFile> lFile = do_QueryInterface(greDirectory); > if (!lFile) > goto out; > > lFile->GetPersistentDescriptor(str); a pattern too familiar to not investigate.
Attachment #114912 - Flags: superreview?(alecf)
re: comment #25 We will want this support on windows. Windows has a dll search ordering where the path variable is checked last.
So what's left to finish this support up?
the first thing that we need is to revive the DEPENDENT_LIB define. I have local changes which change the generic module code to use this define. Let me clean the up.
Attachment #121555 - Flags: review?(alecf)
Comment on attachment 121555 [details] [diff] [review] Enables generic module code. r/sr=alecf
Attachment #121555 - Flags: review?(alecf) → review+
Attachment #121555 - Flags: approval1.4b?
Comment on attachment 121555 [details] [diff] [review] Enables generic module code. a=sspitzer
Attachment #121555 - Flags: approval1.4b? → approval1.4b+
Checking in nsIGenericFactory.h; /cvsroot/mozilla/xpcom/glue/nsIGenericFactory.h,v <-- nsIGenericFactory.h new revision: 1.37; previous revision: 1.36 done Now we need a build/config change which will set DEPENDENT_LIBS.
One of Chris Seawood's patches here already contains the build-fu to generate the dependent libs.
it does, but I didn't think he was happy with it.
applying cls's patch which enables dependent libs requires about 100ms. I am going to add a cache.
Blocks: 202326
Attached patch More on dependent libs (obsolete) — Splinter Review
Adds a loaded dependent lib cache. (getting perf data now) Fixes a reregistration problem where the dependent lib name would not be deleted.
Attachment #122167 - Flags: review?(alecf)
Attachment #122167 - Flags: approval1.4b?
Attached patch adds USE_DEPENDENT_LIB (obsolete) — Splinter Review
Attachment #122168 - Flags: review?(seawood)
Attachment #122168 - Flags: approval1.4b?
I'll a= after you have r=, ok?
Comment on attachment 122168 [details] [diff] [review] adds USE_DEPENDENT_LIB Where's the autoconf.mk.in changes?
Attachment #122168 - Flags: review?(seawood)
Attachment #122168 - Flags: review-
Attachment #122168 - Flags: approval1.4b?
Attachment #122168 - Attachment is obsolete: true
Attachment #122206 - Flags: review?(seawood)
Attachment #122206 - Flags: approval1.4b?
Attachment #122206 - Flags: review?(seawood) → review+
Comment on attachment 122206 [details] [diff] [review] adds USE_DEPENDENT_LIB v.2 a=sspitzer
Attachment #122206 - Flags: approval1.4b? → approval1.4b+
Comment on attachment 122167 [details] [diff] [review] More on dependent libs clearing a request. dougt, do you still need a on this? if so, please request.
Attachment #122167 - Flags: approval1.4b?
yup, for 1.4. See 202326 for details on why. Working on getting a patch.
Flags: blocking1.4b?
Flags: blocking1.4?
Comment on attachment 122167 [details] [diff] [review] More on dependent libs >Index: nsNativeComponentLoader.cpp >+ mLoadedDependentLibs = new nsHashtable(16, PR_TRUE); shouldn't your ctor initialize this? and when do you destroy it? does it just leak? >Index: xcDll.cpp >+ NS_ASSERTION(m_loader->mLoadedDependentLibs, "huh?"); >+ if (!m_loader->mLoadedDependentLibs) >+ return PR_TRUE; nit: if (!m_loader->mLoadedDependentLibs) { NS_ERROR("huh? no dependent libs"); return PR_TRUE; } >-#if defined(XP_UNIX) && !defined(MACOSX) >-#if defined(XP_UNIX) && !defined(MACOSX) i'm curious.. why do we want to enable this for mach-o now? >Index: xcDll.h >+ nsCOMPtr<nsIFile> m_dllSpec; >+ PRLibrary *m_instance; >+ nsIModule *m_moduleObject; >+ nsNativeComponentLoader *m_loader; >+ PRBool m_markForUnload; looks funny to have the m_'s not line up, but whatever.
Attachment #122167 - Flags: superreview-
Attached patch includes darin's comments (obsolete) — Splinter Review
Attachment #115775 - Attachment is obsolete: true
Attachment #122333 - Attachment is obsolete: true
Comment on attachment 122334 [details] [diff] [review] all of darin's comments. sr=darin although i'm not sure why it is necessary to dynamically allocate the hash tables.
Attachment #122334 - Flags: superreview+
Attachment #122167 - Attachment is obsolete: true
Attached patch config patchSplinter Review
Attachment #122344 - Flags: review?(seawood)
Attachment #122344 - Flags: approval1.4b?
Comment on attachment 122334 [details] [diff] [review] all of darin's comments. a=sspitzer
Attachment #122334 - Flags: approval1.4b+
only patch 122344 is left. this patch actually enables everything.
Attachment #122344 - Flags: review?(seawood) → review+
Comment on attachment 122206 [details] [diff] [review] adds USE_DEPENDENT_LIB v.2 Doug, why is USE_DEPENDENT_LIBS disabled for BeOS and Win32 with mingw?
Comment on attachment 122344 [details] [diff] [review] config patch a=asa (on behalf of drivers) for checkin to 1.4b
Attachment #122344 - Flags: approval1.4b? → approval1.4b+
wtc, the beos compiler and the mingw compiler both have problems with the nested quotes in the define passed on the command line.
I attempted to land, but the clobbers boxes turned oranged. The error was: ERROR: profile /mnt/4/tinderbox/brad/Linux_2.4.20abackupboy3_Clobber/.mozilla/default does not exist I am backing out my change until i figure this out. (i will just change the ifdef in rules.mk.)
(maybe that wasn't the real error)
All parts have been checked in now and tbox's are still green. Marking fixed.
Status: REOPENED → RESOLVED
Closed: 22 years ago22 years ago
Resolution: --- → FIXED
Woo! It's party time!
Flags: blocking1.4b?
Flags: blocking1.4?
sorry, just went through bugmail on this - in the future please don't dynamically allocate hashtables with new/delete - it is FAR cheaper to make them member variables directly.
I will fix it soon. see bug 204634
Attachment #122167 - Flags: review?(alecf)
The unloading of the dependent libraries at xcDLL.cpp nsDll::Load() is leading to 204804 on HP-UX PA 32. As the comment says: "should we unload later - or even at all?" Without this unloading, everything seems to work fine on HP-UX 232 #if defined(XP_UNIX) 233 // Unload any of library dependencies we loaded earlier. The assumption 234 // here is that the component will have a "internal" reference count to 235 // the dependency library we just loaded. 236 // XXX should we unload later - or even at all? 237 238 if (extraData != nsnull) 239 { 240 PRInt32 arrayCount = dependentLibArray.Count(); 241 for (PRInt32 index = 0; index < arrayCount; index++) 242 PR_UnloadLibrary((PRLibrary*)dependentLibArray.ElementAt(index)); 243 } 244 #endif This might be working on other platforms, because of the different behaviour of HP-UX PA32 shl_unload() which unloads the shared library without any reference count checks.
I see not real reason to unload. Feel free to attach a patch to 204804 which removes unloading and I will review it, or just assign 204804 to me.
bug 204804 is assigned to dougt@meer.net, but its still at UNCONFIRMED status. I tried reassigning, but bugzilla says I am not a "sufficiently empowered user" to reassign, since it requires a status change from UNCONFIRMED to NEW.
HP is on the short bus. If we're going to disable unloading, please make an #ifdef HPUX
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: