Last Comment Bug 785102 - Two different libxul.so used while populating startupcache
: Two different libxul.so used while populating startupcache
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Build Config (show other bugs)
: unspecified
: x86_64 Linux
: -- normal (vote)
: mozilla17
Assigned To: Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
:
: Gregory Szorc [:gps]
Mentors:
Depends on: 785748
Blocks: 784691
  Show dependency treegraph
 
Reported: 2012-08-23 08:57 PDT by Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
Modified: 2012-08-27 03:46 PDT (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch to use dist/firefox/run-mozilla.sh (1.22 KB, patch)
2012-08-23 09:46 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
mh+mozilla: review-
Details | Diff | Splinter Review
Handle windows too (1.87 KB, patch)
2012-08-23 11:30 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
mh+mozilla: review-
Details | Diff | Splinter Review
and os 2 (1.66 KB, patch)
2012-08-23 12:01 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
mh+mozilla: review+
Details | Diff | Splinter Review
Hack: just do not execute osfile_shared_allthreads.jsm while building the startup cache, (1.58 KB, patch)
2012-08-24 07:03 PDT, David Teller [:Yoric] (please use "needinfo")
mh+mozilla: review-
Details | Diff | Splinter Review
Semi-hack: Loading libxul lazily from osfile_shared_allthreads.jsm (5.80 KB, patch)
2012-08-24 12:17 PDT, David Teller [:Yoric] (please use "needinfo")
mh+mozilla: review+
Details | Diff | Splinter Review

Description Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-08-23 08:57:59 PDT
xpcshell links with libxul.so and the script that populates the startupcache dlopens libxul.so. The problem is that with elfhack they are not the same libxul.so.
Comment 1 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-08-23 09:27:10 PDT
This happens during "make package". We use bin/run-mozilla.sh which causes ld.so to load the libxul.so in dist/bin. The script then dlopens dist/firefox/libxul.so.
Comment 2 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-08-23 09:32:27 PDT
Using dist/firefox/run-mozilla.sh fixes the problem by making both ld.so and dlopen use the libxul.so in dist/firefox.

The dlopen is from osfile_shared_allthreads.jsm:

     let libxul = ctypes.open(OS.Constants.Path.libxul)
Comment 3 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-08-23 09:38:08 PDT
A quick summary from IRC:

We can fix this by
* Using dist/firefox/run-mozilla.sh instead of dist/bin/run-mozilla.sh
* Passing just a filename in osfile_shared_allthreads.jsm
* Fixing the path computation in osfile_shared_allthreads.jsm

Looks like openbsd requires a full path to dlopen.
Comment 4 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-08-23 09:46:01 PDT
Created attachment 654664 [details] [diff] [review]
patch to use dist/firefox/run-mozilla.sh
Comment 5 Mike Hommey [:glandium] 2012-08-23 10:04:15 PDT
Comment on attachment 654664 [details] [diff] [review]
patch to use dist/firefox/run-mozilla.sh

I'm pretty sure this will break windows builds, because they don't have run-mozilla.sh at all.
Comment 6 David Teller [:Yoric] (please use "needinfo") 2012-08-23 11:13:17 PDT
(In reply to Mike Hommey [:glandium] from comment #5)
> Comment on attachment 654664 [details] [diff] [review]
> patch to use dist/firefox/run-mozilla.sh
> 
> I'm pretty sure this will break windows builds, because they don't have
> run-mozilla.sh at all.

On the other hand, if we can solve the Windows problem, using run-mozilla.sh would make startupcache much less error-prone (see e.g. bug 784368).

However, for the current bug, I can add a hack in osfile_shared_allthreads.jsm to not open libxul if we are currently building the startup cache.

Of course, it might be interesting if there is a good way to detect that we are building the startup cache. For the moment, the only technique I know is to check whether |NS_GetSpecialDirectory("ProfD", ...)| success.
Comment 7 Mike Hommey [:glandium] 2012-08-23 11:19:48 PDT
(In reply to David Rajchenbach Teller [:Yoric] from comment #6)
> (In reply to Mike Hommey [:glandium] from comment #5)
> > Comment on attachment 654664 [details] [diff] [review]
> > patch to use dist/firefox/run-mozilla.sh
> > 
> > I'm pretty sure this will break windows builds, because they don't have
> > run-mozilla.sh at all.
> 
> On the other hand, if we can solve the Windows problem, using run-mozilla.sh
> would make startupcache much less error-prone (see e.g. bug 784368).

Looks like this is the same bug.

> However, for the current bug, I can add a hack in
> osfile_shared_allthreads.jsm to not open libxul if we are currently building
> the startup cache.

Was there a problem, besides openbsd, loading "libxul.so" or "XUL" or "xul.dll", without a full path?
Comment 8 David Teller [:Yoric] (please use "needinfo") 2012-08-23 11:21:44 PDT
(In reply to Mike Hommey [:glandium] from comment #7)
> > However, for the current bug, I can add a hack in
> > osfile_shared_allthreads.jsm to not open libxul if we are currently building
> > the startup cache.
> 
> Was there a problem, besides openbsd, loading "libxul.so" or "XUL" or
> "xul.dll", without a full path?

IIRC, MacOS X. I can double-check if necessary.
Comment 9 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-08-23 11:30:18 PDT
Created attachment 654713 [details] [diff] [review]
Handle windows too

https://tbpl.mozilla.org/?tree=Try&rev=6170aaa5fe0a
Comment 10 Mike Hommey [:glandium] 2012-08-23 11:46:42 PDT
Comment on attachment 654713 [details] [diff] [review]
Handle windows too

Review of attachment 654713 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/mozapps/installer/packager.mk
@@ +461,5 @@
>  PRECOMPILE_GRE=$$PWD
>  endif
>  
> +ifneq (WINNT,$(OS_ARCH))
> +RUN_FROM_PWD = "$$PWD/run-mozilla.sh"

Sorry to throw the ball again, but i just realized two things:
- OS2 would also be broken by this (yes, OS2 ; I know some IBM people were still building Firefox on it)
- applications built against a xulrunner sdk would likely be broken too.

So, you should safeguard for OS_ARCH==OS2 and LIBXUL_SDK, and still fallback to $(_ABS_RUN_TEST_PROGRAM)
Comment 11 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-08-23 12:01:33 PDT
Created attachment 654722 [details] [diff] [review]
and os 2
Comment 12 Mike Hommey [:glandium] 2012-08-23 12:14:35 PDT
Comment on attachment 654722 [details] [diff] [review]
and os 2

Review of attachment 654722 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/mozapps/installer/packager.mk
@@ +469,5 @@
> +ifeq (OS2,$(OS_ARCH))
> +# FIXME: this is untested. Is OS/2 using the correct XUL?
> +RUN_TEST_PROGRAM = $(topsrcdir)/build/os2/test_os2.cmd "$(LIBXUL_DIST)"
> +else
> +ifneq (WINNT,$(OS_ARCH))

if you ifneq(,$(filter WINNT OS2,$(OS_ARCH)), you shouldn't need the OS2 part, which, by the way, sets RUN_TEST_PROGRAM instead of RUN_FROM_PWD.
Comment 13 David Teller [:Yoric] (please use "needinfo") 2012-08-23 12:16:33 PDT
Note that bug 775588 (just landed) may have fixed the issue.
Comment 14 Ryan VanderMeulen [:RyanVM] 2012-08-23 19:17:17 PDT
https://hg.mozilla.org/mozilla-central/rev/7f81af866697
Comment 15 Mike Hommey [:glandium] 2012-08-23 23:47:01 PDT
RUN_FROM_PWD is not set in the LIBXUL_SDK case :(
Comment 16 Mark Banner (:standard8, limited time in Dec) 2012-08-24 02:27:05 PDT
We should really start moving away from using run-mozilla.sh, the only remaining use seems to be to set library paths, and we also don't really need to ship it on at least some, but probably all platforms (e.g. bug 715089).
Comment 17 Mark Banner (:standard8, limited time in Dec) 2012-08-24 06:28:38 PDT
Unfortunately this broke Thunderbird and SeaMonkey as they don't ship run-mozilla.sh on mac, and therefore that doesn't get written into the app directory where the command is run from.

As all run-mozilla.sh is really giving us is the library path definitions, I just checked in a change to fix that, and the case of RUN_FROM_PWD not being set in the libxul_sdk with r=glandium over irc:

https://hg.mozilla.org/mozilla-central/rev/b3c4235d1300
Comment 18 David Teller [:Yoric] (please use "needinfo") 2012-08-24 07:03:42 PDT
Created attachment 655002 [details] [diff] [review]
Hack: just do not execute osfile_shared_allthreads.jsm while building the startup cache,

As promised, here is a hack that could resolve the issue by just not loading libxul.so.
How can I reproduce said issue? I can't seem to find it on TryServer.
Comment 19 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-08-24 07:23:59 PDT
(In reply to David Rajchenbach Teller [:Yoric] from comment #18)
> Created attachment 655002 [details] [diff] [review]
> Hack: just do not execute osfile_shared_allthreads.jsm while building the
> startup cache,
> 
> As promised, here is a hack that could resolve the issue by just not loading
> libxul.so.
> How can I reproduce said issue? I can't seem to find it on TryServer.

You can just do a try run with clang.

https://hg.mozilla.org/try/rev/b0fa449f6855

should do it. It includes my patch, you can just replace that part with yours.
Comment 20 Mark Banner (:standard8, limited time in Dec) 2012-08-24 08:32:26 PDT
(In reply to Mark Banner (:standard8) from comment #17)
> Unfortunately this broke Thunderbird and SeaMonkey as they don't ship
> run-mozilla.sh on mac, and therefore that doesn't get written into the app
> directory where the command is run from.
> 
> As all run-mozilla.sh is really giving us is the library path definitions, I
> just checked in a change to fix that, and the case of RUN_FROM_PWD not being
> set in the libxul_sdk with r=glandium over irc:
> 
> https://hg.mozilla.org/mozilla-central/rev/b3c4235d1300

Unfortunately this broke linux64 debug with a segfault, so I backed out just the linux part of that changeset:

https://hg.mozilla.org/mozilla-central/rev/85634e93f08d
Comment 21 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-08-24 09:12:36 PDT
> Unfortunately this broke linux64 debug with a segfault, so I backed out just
> the linux part of that changeset:
> 
> https://hg.mozilla.org/mozilla-central/rev/85634e93f08d

Is that correct? I am afraid OS X will still be loading XUL twice, just not manifesting the same problem as linux.
Comment 22 Mike Hommey [:glandium] 2012-08-24 09:16:27 PDT
Comment on attachment 655002 [details] [diff] [review]
Hack: just do not execute osfile_shared_allthreads.jsm while building the startup cache,

Not worth doing that way. Either we work around the issue on the xpcshell launching side, or you find a way to actually load the right library, which, on at least some platforms, is just a matter of loading the file name instead of a full path.
Comment 23 David Teller [:Yoric] (please use "needinfo") 2012-08-24 10:14:17 PDT
It seems to me that the "right" solution would be for the cache builder to not execute scripts when it only needs to parse them. This bug, as well as the issue that prevented bug 775588 from landing are just side-effects of the cache builder executing code that is not meant to be executed in raw xpcshell.

In the meantime, rewriting some code to make loading of libxul lazy.
Comment 24 David Teller [:Yoric] (please use "needinfo") 2012-08-24 12:17:07 PDT
Created attachment 655091 [details] [diff] [review]
Semi-hack: Loading libxul lazily from osfile_shared_allthreads.jsm

Here is a second version of the workaround. This time, we ensure that libxul is loaded lazily, which should be sufficient for this bug. Still rather fragile, of course.
Comment 25 David Teller [:Yoric] (please use "needinfo") 2012-08-26 07:42:10 PDT
Lazy loading seems to work: https://tbpl.mozilla.org/?tree=Try&rev=aca0058ce521
Comment 26 Mike Hommey [:glandium] 2012-08-27 01:21:17 PDT
Comment on attachment 655091 [details] [diff] [review]
Semi-hack: Loading libxul lazily from osfile_shared_allthreads.jsm

Review of attachment 655091 [details] [diff] [review]:
-----------------------------------------------------------------

This should probably go in a separate bug, now that this one has been addressed at the packaging level.

::: toolkit/components/osfile/osfile_shared_allthreads.jsm
@@ +33,5 @@
>  
> +     // Define a lazy getter for a property
> +     let defineLazyGetter = function(object, name, getter) {
> +       Object.defineProperty(object, name, {
> +         configurable: true,

What is this variable for?

Note the implementation in js/xpconnect/loader/XPCOMUtils.jsm is way simpler. I'd rather you used that one.

@@ +884,5 @@
> +         return exports.OS.Shared.libxul.declare("osfile_ns_free",
> +           ctypes.default_abi,
> +           /*return*/ Types.void_t.implementation,
> +           /*ptr*/ Types.voidptr_t.implementation);
> +       });

You're changing how NS_Free is defined, and don't define wstrdup, but I guess they were not intended to be exported?
Comment 27 David Teller [:Yoric] (please use "needinfo") 2012-08-27 03:33:29 PDT
> This should probably go in a separate bug, now that this one has been addressed at the packaging level.

Re-filed as bug 785828. However, comment 16, comment 17 and comment 19 gave me the impression that the current patch is not satisfactory.

> > +     // Define a lazy getter for a property
> > +     let defineLazyGetter = function(object, name, getter) {
> > +       Object.defineProperty(object, name, {
> > +         configurable: true,
> 
> What is this variable for?

Without |configurable|, I cannot replace the getter with something else.
 
> Note the implementation in js/xpconnect/loader/XPCOMUtils.jsm is way
> simpler. I'd rather you used that one.

It uses deprecated methods that do not work in strict mode, so no can do.

> @@ +884,5 @@
> > +         return exports.OS.Shared.libxul.declare("osfile_ns_free",
> > +           ctypes.default_abi,
> > +           /*return*/ Types.void_t.implementation,
> > +           /*ptr*/ Types.voidptr_t.implementation);
> > +       });
> 
> You're changing how NS_Free is defined, and don't define wstrdup, but I
> guess they were not intended to be exported?

Indeed, they are not really meant to be exported, at least not yet.

Note You need to log in before you can comment on or make changes to this bug.