Throw away wrapper shell script on unix and lazily load libxul

RESOLVED FIXED in mozilla7

Status

()

Toolkit
Startup and Profile System
RESOLVED FIXED
7 years ago
6 years ago

People

(Reporter: zwol, Assigned: glandium)

Tracking

(Depends on: 1 bug, Blocks: 2 bugs)

unspecified
mozilla7
x86
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(7 attachments, 16 obsolete attachments)

919 bytes, text/x-csrc
Details
118 bytes, application/octet-stream
Details
1.74 KB, patch
Details | Diff | Splinter Review
11.28 KB, patch
Details | Diff | Splinter Review
12.49 KB, patch
Details | Diff | Splinter Review
11.57 KB, patch
Details | Diff | Splinter Review
355 bytes, text/plain
Details
(Reporter)

Description

7 years ago
Bug 550310 observes that the 'firefox' and 'run-mozilla.sh' wrapper scripts add significant startup overhead on systems where they are used (Unix that is not OSX, if I understand correctly).  We should see if we can get rid of them.  The purpose of the wrapper scripts is to tell the dynamic linker where our private libraries live, so here are three possible ways to get rid of them:

1) If all the shared libraries installed by an --enable-libxul build have sufficiently stable public ABIs, we could give them SONAMEs, install them in /usr/lib, install firefox-bin etc. as just plain 'firefox' etc., and throw the wrapper scripts away in an installed configuration.  (We would still need a wrapper script for running out of the build tree, but that's not the case we care about for performance, and that script could be much simpler.)  The SONAMEs we assigned would presumably involve the Gecko version -- it seems to me that we do make pretty strong binary compatibility guarantees within one Gecko version.

2) If we decide we want to go for static linkage of most of those libraries into the 'firefox' binary anyway (bug 525013) we could make firefox-on-top-of-xulrunner the normal release build configuration (I don't know how this works but I know several Linux distributions do it), put xulrunner in /usr/bin, and only worry about binary compatibility for the libraries that are still dynamic (presumably just libxpcom.so at that point, plus system stuff?)

3) If (1) and (2) are not feasible we could look into expanded use of $ORIGIN, which used to be a Solaris-ism but I believe is now quite widely available.  Here's what my manpages say about that:

       $ORIGIN
              ld.so understands the string $ORIGIN (or equivalently ${ORIGIN})
              in  an  rpath specification to mean the directory containing the
              application  executable.  Thus,  an   application   located   in
              somedir/app   could   be  compiled  with  gcc  -Wl,-rpath,'$ORI‐
              GIN/../lib' so that it finds an  associated  shared  library  in
              somedir/lib  no matter where somedir is located in the directory
              hierarchy.

That would have the advantage of not needing the shell script even when running out of the build directory, and not having to worry about expanded binary compatibility constraints, but the disadvantage of maybe not working universally.
#1 is not an option
The static build and XULRunner are incompatible options
#3 is worth looking into (I think there's already a bug, but we can use this one). I think that would drop support for older versions of ld.so, so we should figure out exactly when it was added and which Linux versions we'd be breaking.

Although I'm surprised that, if we fix the arithmetic in bug 550310, we'd still be seing a lot of overhead on Linux systems: typically execing a single process is very lightweight. Do we have numbers measuring the actual cost?
(Assignee)

Comment 2

7 years ago
FWIW, the only really interesting thing in the firefox and run-mozilla scripts are the -g and --debugger options. The rest is pretty pointless nowadays with the xpcom standalone glue. On Debian, we don't even use these scripts, and have our own wrapper script that only provide -g and --debugger options, on top of some extra sugar for dsp wrappers (esddsp, artsdsp, aoss, etc.), and a check to prevent users to shoot themselves in the foot when running through sudo (yes, there are people doing that and they end up thrashing their profile as a result)
(Assignee)

Comment 3

7 years ago
(In reply to comment #1)
> #3 is worth looking into (I think there's already a bug, but we can use this
> one). I think that would drop support for older versions of ld.so, so we should
> figure out exactly when it was added and which Linux versions we'd be breaking.

According to glibc's git, version 2.0.96, released in 1998.
(Reporter)

Comment 4

7 years ago
(In reply to comment #1)
> #1 is not an option

Please elaborate.

> The static build and XULRunner are incompatible options

That seems like it should just get fixed.

> Although I'm surprised that, if we fix the arithmetic in bug 550310, we'd still
> be seing a lot of overhead on Linux systems: typically execing a single process
> is very lightweight. Do we have numbers measuring the actual cost?

It's shell startup overhead that I was concerned with.  That's a lot cheaper than it used to be, though.  I'll attach a test program that runs the same minimal just-call-exit binary directly, via /bin/sh, and via /bin/bash.  This is what I get on my computer -- numbers are in seconds:

direct: 0.000165
    sh: 0.000578
  bash: 0.001584

So that's less than two milliseconds of overhead even with the heavyweight shell.
(Reporter)

Comment 5

7 years ago
Created attachment 433066 [details]
shell overhead benchmark
(Reporter)

Comment 6

7 years ago
Created attachment 433067 [details]
source for /tmp/true (x86-64 assembly)

Yeah, I wrote a minimal just-exit binary in assembler, because /bin/true itself costs on the order of 600 microseconds, comparable to the numbers we're measuring :-(
It seems $ORIGIN is what you want to use.

Speaking of the Linux distributor's perspective a few comments:
1) for Firefox-on-xulrunner that doesn't make a real difference as basically
everything is covered by the glue stuff and xulrunner system registration
2) we still (and probably will always) have a small wrapper script to set up
some environment (but not using run-mozilla.sh since around 8 years for that
(back in mozilla (the suite) times)
3) it's probably possible to get rid of the wrapper script for Thunderbird or
alike if ORIGIN is used

Actually all discussion is basically just about Mozilla.com's release builds
for Linux. While I don't have numbers I'd think that many people are not using
these because their handling is pretty inconvenient.
We currently have a stable method of doing binary linking (using XPCOM glue dynamic linking). This allows binary extensions to link against libxpcom.so and be compatible with multiple major versions if they are written to do so. Using a versioning scheme which changes the soname at each major release would make this difficult/impossible, and would increase the likelihood of loading two versions of XPCOM in the same process, which is obviously bad.
(Assignee)

Comment 9

7 years ago
Isn't the firefox-bin on mozilla release builds using the standalone glue under the hood ? As such, isn't loading libxpcom.so from the gredir ? Then, the nspr and nss libs are loaded by being listed in dependentlibs.list. Then what exactly is LD_LIBRARY_PATH supposed to add ?
No, mozilla release builds don't use the standalone glue/xulrunner stub (they aren't firefox+xulrunner, just an all-in-one build).

Updated

7 years ago
Blocks: 554421
I have a patch in bug 557027 that fixes the Fennec build to use -rpath=$ORIGIN and it works quite nicely on my Ubuntu system. We should just do the same thing for the Firefox build and ditch the wrapper script.
(Reporter)

Comment 12

7 years ago
FWIW I just did some experiments with a program that lets you change the rpath after the fact (it's called 'patchelf' and it's a little buggy, but not too much for experiments) and it appears that we would need -rpath=$ORIGIN:$ORIGIN/.. for every .so in components/, as well as -rpath=$ORIGIN for the main executable.  I'm not sure.  The difference between DT_RPATH and DT_RUNPATH (which I have not been able to find clear documentation on) may be coming into play here.

Nonetheless, I'm all for this approach.  I would like to know what the implications for a firefox-atop-xulrunner build are, though, since that seems to be how the distros do it.

(In reply to comment #8)
> We currently have a stable method of doing binary linking (using XPCOM glue
> dynamic linking). This allows binary extensions to link against libxpcom.so and
> be compatible with multiple major versions if they are written to do so. Using
> a versioning scheme which changes the soname at each major release would make
> this difficult/impossible, and would increase the likelihood of loading two
> versions of XPCOM in the same process, which is obviously bad.

If it is possible for binary extensions to link against libxpcom.so and be compatible with multiple major versions, then the hypothetical SONAME doesn't need bumping as often as every major release, ne?  I was just being conservative.

(In reply to comment #2)
> FWIW, the only really interesting thing in the firefox and run-mozilla scripts
> are the -g and --debugger options.

As I understand it, the -g/--debugger mode of the script is only necessary because the script exists in the first place -- if there is no script, I can just do "gdb --args ./dist/bin/firefox file://whatever" and it'll work.  Yes/no?

> On Debian, we don't even use these scripts, and have
> our own wrapper script that only provide -g and --debugger options, on top of
> some extra sugar for dsp wrappers (esddsp, artsdsp, aoss, etc.), 

I am completely unfamiliar with dsp wrappers.  Is there something we could be doing internal to the executable that would obviate the need for them?

> and a check to
> prevent users to shoot themselves in the foot when running through sudo (yes,
> there are people doing that and they end up thrashing their profile as a
> result)

The thought of people running firefox with root privileges fills me with nameless dread, but it's their funeral, I suppose.  This appears to be that check: 

if [ "${SUDO_USER}" ] && [ "${SUDO_USER}" != "${USER}" ]; then
    SUDO_HOME=`getent passwd ${SUDO_USER} | cut -f6 -d:`
    if [ "${SUDO_HOME}" = "${HOME}" ]; then
        echo "You shouldn't really run Iceweasel through sudo WITHOUT the -H opt
ion." >&2
        echo "Continuing as if you used the -H option." >&2
        HOME=`getent passwd ${USER} | cut -f6 -d:`
        if [ -z "${HOME}" ]; then
            echo "Could not find the correct home directory. Please use the -H o
ption of sudo." >&2
        fi
    fi
fi

Do you think it makes sense to merge that into early app startup?  Wouldn't be difficult.
(Assignee)

Comment 13

7 years ago
(In reply to comment #12)
> FWIW I just did some experiments with a program that lets you change the rpath
> after the fact (it's called 'patchelf' and it's a little buggy, but not too
> much for experiments) and it appears that we would need
> -rpath=$ORIGIN:$ORIGIN/.. for every .so in components/, as well as
> -rpath=$ORIGIN for the main executable.  I'm not sure.  The difference between
> DT_RPATH and DT_RUNPATH (which I have not been able to find clear documentation
> on) may be coming into play here.

You shouldn't need any rpath on components.

> Nonetheless, I'm all for this approach.  I would like to know what the
> implications for a firefox-atop-xulrunner build are, though, since that seems
> to be how the distros do it.

Much less of a problem, since the standalone glue doesn't need rpath at all.

> (In reply to comment #2)
> As I understand it, the -g/--debugger mode of the script is only necessary
> because the script exists in the first place -- if there is no script, I can
> just do "gdb --args ./dist/bin/firefox file://whatever" and it'll work. 
> Yes/no?

Yes

> > On Debian, we don't even use these scripts, and have
> > our own wrapper script that only provide -g and --debugger options, on top of
> > some extra sugar for dsp wrappers (esddsp, artsdsp, aoss, etc.), 
> 
> I am completely unfamiliar with dsp wrappers.  Is there something we could be
> doing internal to the executable that would obviate the need for them?

dsp wrappers redirect /dev/dsp to the corresponding audio daemon (esd, artsd, pulseaudio, etc.) or in the case of aoss, redirects it on alsa output. I'm not entirely sure this is necessary nowadays for firefox itself, but I guess this is still the case for the main contender: the flash plugin. One of the problems firefox has, and which got worse with the <video> support is that it has several ways it can interact with audio hardware: through libcanberra or libesd from nsISound, through alsa from libsoundfish, and through whatever a plugin might use.

> The thought of people running firefox with root privileges fills me with
> nameless dread, but it's their funeral, I suppose.  This appears to be that
> check: 
> 
> if [ "${SUDO_USER}" ] && [ "${SUDO_USER}" != "${USER}" ]; then
>     SUDO_HOME=`getent passwd ${SUDO_USER} | cut -f6 -d:`
>     if [ "${SUDO_HOME}" = "${HOME}" ]; then
>         echo "You shouldn't really run Iceweasel through sudo WITHOUT the -H
> opt
> ion." >&2
>         echo "Continuing as if you used the -H option." >&2
>         HOME=`getent passwd ${USER} | cut -f6 -d:`
>         if [ -z "${HOME}" ]; then
>             echo "Could not find the correct home directory. Please use the -H
> o
> ption of sudo." >&2
>         fi
>     fi
> fi
> 
> Do you think it makes sense to merge that into early app startup?  Wouldn't be
> difficult.

I think it would be better if that wasn't necessary at all, and if firefox wasn't destroying profiles when it doesn't quite have the right to write it, but has enough rights to destroy it. Though since the switch to places-based bookmarks, it might have gotten better (the worst case being the loss of bookmarks, previously)
I'll throw together a candidate patch and we can see what it does.
Assignee: nobody → ted.mielczarek
Created attachment 466732 [details] [diff] [review]
Ditch wrapper script, use rpath=$ORIGIN

This should be most of what we need. I guess we'll need a similar fix for Solaris. I know rpath=$ORIGIN is supported there, but I don't know the specifics. I have no idea what BSDs support in this regard.
(Assignee)

Comment 16

7 years ago
It's overkill to a) have everything copied over to the dist directory and b) have all binaries and library include that rpath.

IMHO, it would be better to have that rpath added to the firefox binary and probably libxul.so, too. But only these two.
Oops, I intended to do this only |ifdef PROGRAM| so that only binaries would get it. Guess I forgot to add that to the patch.
(Assignee)

Comment 18

7 years ago
IIRC how rpath works, |ifdef PROGRAM| won't be enough, because libxul.so needs to have an rpath too ; except if you link all the libraries to the firefox binary.
We're shipping Fennec on Maemo with this exact configuration, and it works fine.
(Assignee)

Comment 20

7 years ago
Confirmed, I just checked, and it works with rpath on the program only. I must have been confused with something else.

Note, however, that the change in js/src would make the js shell also have an rpath, which may not be entirely expected, especially on standalone builds of the js engine.
I don't think rpath=$ORIGIN is any worse than anything else. (And the JS shell is more of a testing tool than anything end-users would really use.) We keep config/rules.mk and js/src/config/rules.mk in sync just to preserve our sanity.
(Reporter)

Comment 22

7 years ago
I don't know if it's *supposed* to be this way, but I've had to use run-mozilla.sh to invoke the js shell in the past, because it wanted some library from dist/lib that it wasn't getting.

Possibly related, the shell pulls in a bunch of NSPR symbols, even though (I was under the impression that) js/ avoided using NSPR:

$ nm --dynamic ./js | grep ' U PR_'
                 U PR_AtomicDecrement
                 U PR_AtomicIncrement
                 U PR_AtomicSet
                 U PR_CreateThread
                 U PR_DestroyCondVar
                 U PR_DestroyLock
                 U PR_FindFunctionSymbol
                 U PR_FindSymbol
                 U PR_GetCurrentThread
                 U PR_IntervalNow
                 U PR_IntervalToMilliseconds
                 U PR_JoinThread
                 U PR_LoadLibraryWithFlags
                 U PR_Lock
                 U PR_NewCondVar
                 U PR_NewLock
                 U PR_NewThreadPrivateIndex
                 U PR_NotifyAllCondVar
                 U PR_NotifyCondVar
                 U PR_SetThreadPrivate
                 U PR_TicksPerSecond
                 U PR_UnloadLibrary
                 U PR_Unlock
                 U PR_WaitCondVar
Yeah, we compile spidermonkey as threadsafe, which pulls in some NSPR. (Which is probably the thing from dist/bin it wanted.)
Created attachment 471472 [details] [diff] [review]
get rid of firefox shell script
Attachment #466732 - Attachment is obsolete: true
(Assignee)

Comment 25

7 years ago
Just a small note on the binary name.

One of the reasons I'm still providing a firefox-bin executable is that historically, on unix, because of the startup script, some desktop environments would pick firefox-bin as the name of the application they need to restart (in desktop session management, most of the time). I never checked if that was still a concern, but if that is, a symlink could be helpful.
(Assignee)

Comment 26

7 years ago
(In reply to comment #24)
> Created attachment 471472 [details] [diff] [review]
> get rid of firefox shell script

You also need an update in browser/installer/package-manifest.in
Yeah, this is just a rebased patch. I was kind of just testing my bzexport extension and had this patch handy. :)
(In reply to comment #25)
> One of the reasons I'm still providing a firefox-bin executable is that
> historically, on unix, because of the startup script, some desktop environments
> would pick firefox-bin as the name of the application they need to restart (in
> desktop session management, most of the time).

gnome_client_set_restart_command tells the session manager the name of the binary, but I guess there could be an issue on systems that don't have libgnomeui.  (There's probably a less GNOME way to do that.)
(Reporter)

Comment 29

7 years ago
I've been sort of thinking of tackling a GTK+3 update project.  Innn my copious free time.

Comment 30

7 years ago
Can we run dist/bin/firefox-bin with RPATH on Linux?
It doesn't work on Solaris because RPATH=$ORIGIN doesn't follow symbolic links.
So we have
http://mxr.mozilla.org/mozilla-central/source/build/unix/run-mozilla.sh#300

I agree with Mike Hommey, we should at least leave firefox-bin as a symbolic link.
OS: Linux → Windows CE
OS: Windows CE → Linux
Ginn: no, it can't be a symbolic link. That's why I added the NSDISTMODE = copy in this patch:
https://bugzilla.mozilla.org/attachment.cgi?id=471472&action=diff#a/config/config.mk_sec1

Adding a firefox-bin symlink should be possible. It might be a little tricky because during packaging we explicitly dereference symlinks, but I can probably work something out.

Comment 32

7 years ago
FWIW:  (since this got mentioned in #573958)

The dynamic loader on Linux (at least) also understands $LIB, which expands to lib or lib64 according to whether it's used in a32- or 64-bi executable.  But, as I write this, I realize that is of more use to LD_LIBRARY_PATH setting (for a script that may call 32- or 64-bit binaries along the way) than an RPATH setting (which is obviously in a fixed-binary setting - that of the executable containing it).

>> The difference between DT_RPATH and DT_RUNPATH (which I have not
>> been able to find clear documentation on)

My understanding of the difference is based on how Solaris used to document it.  One of them, RPATH, was used before LD_LIBRARY_PATH (and hence could not be overridden by it) and the other, RUNPATH, was used after.  At one point (last century) I'm sure I had Solaris executables with *only* RPATH set, that was non-overridable.  Can't reproduce it now, though.  Still have the RPATH-only executables, but LD_LIBRARY_PATH overrides it.

See:

  http://www.mail-archive.com/nix-dev@cs.uu.nl/msg02578.html
(Assignee)

Comment 33

7 years ago
I just remembered an issue we hit a while ago with the wrapper script gone: other gecko application with the wrapper script broke the application without.

For example, you have both thunderbird and firefox. You run thunderbird with the wrapper script and firefox without. This means thunderbird sets MOZILLA_FIVE_HOME. When firefox is not already opened and the user opens an url link from thunderbird with firefox, firefox starts with MOZILLA_FIVE_HOME set and tries to use it.

I don't know if that still applies, but I thought it was worth mentioning.

Comment 34

7 years ago
Ahhh - another example of global names for app-specific values causing problems (LD_LIBRARY_PATH being the Prime Example).
If Firefox is built to start without a script why would it look for MOZILLA_FIVE_HOME at all?  And if it wants to provide a variable for debug/test then it should use something specific - such as MOZILLA_FIREFOX_HOME.
(and Thunderbird could use MOZILLA_THUNDERBIRD_HOME - you can easily set two variables to have the same value, but you can't set one variable to two different values at the same time).
(Assignee)

Comment 35

6 years ago
Created attachment 519866 [details] [diff] [review]
PoC using the standalone glue

This is a proof of concept of replacing the firefox binary with one using the standalone glue to load libxul.so. This method has two advantages: we don't need to rely on rpath, and it allows to more easily hook things such as bug 632404.

This patch works on Mac and Linux, though on Mac tests fail because of a script doing "find . -maxdepth 4 -type f -name firefox-bin", which fails because firefox-bin, while kept for compatibility, is a symbolic link.

It however fails to link on Windows with the following message:
nsBrowserApp.obj : error LNK2019: unresolved external symbol "void __cdecl NS_SetHasLoadedNewDLLs(void)" (?NS_SetHasLoadedNewDLLs@@YAXXZ) referenced in function "long __stdcall patched_LdrLoadDll(unsigned short *,unsigned long *,struct _UNICODE_STRING *,void * *)" (?patched_LdrLoadDll@@YGJPAGPAKPAU_UNICODE_STRING@@PAPAX@Z)
nsBrowserApp.obj : error LNK2019: unresolved external symbol _NS_SetDllDirectory referenced in function _wmain
firefox.exe : fatal error LNK1120: 2 unresolved externals

This is related to the DLL blacklisting. I wonder if we could initialize it in XRE_main instead of wmain.

Last but not least, I do think the GetBinaryPath function, that is mostly a copy of XRE_GetBinaryPath from toolkit/xre/nsAppRunner.cpp should be put as a helper function with a proper API in the XPCOM standalone glue.
(Assignee)

Comment 36

6 years ago
I'm also getting a small leak during reftest:
nsTraceRefcntImpl::DumpStatistics: 1181 entries
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 396 bytes during test execution
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 3 instances of nsLocalFile with size 124 bytes each (372 bytes total)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 3 instances of nsStringBuffer with size 8 bytes each (24 bytes total)
(Assignee)

Updated

6 years ago
Depends on: 642450
(Assignee)

Comment 37

6 years ago
Created attachment 519935 [details] [diff] [review]
PoC using the standalone glue, v2

Same one, but against the build-system branch, and with the leaks fixed.
Attachment #519866 - Attachment is obsolete: true
(Assignee)

Updated

6 years ago
Blocks: 632404
(Assignee)

Comment 38

6 years ago
Created attachment 522981 [details] [diff] [review]
part 1 - Move DLL blocklist in XRE, and  inline NS_SetDllDirectory

This solves the linking problems from comment 35, while keeping compatibility with seamonkey, that still relies on XRE_WANT_DLL_BLOCKLIST. Seamonkey would need the same change as what is done to browser/app/nsBrowserApp.cpp, and once this is done, some cleanup to toolkit/xre/nsWindowsDllBlocklist.cpp and toolkit/xre/nsWindowsWMain.cpp can occur (I'll attach the cleanup patch for reference, but that will have to be a followup bug, I guess)
Assignee: ted.mielczarek → mh+mozilla
Attachment #471472 - Attachment is obsolete: true
Attachment #519935 - Attachment is obsolete: true
Attachment #522981 - Flags: review?(benjamin)
(Assignee)

Comment 39

6 years ago
Created attachment 522982 [details] [diff] [review]
part 2 - Load the XUL library with the standalone glue and get rid of the startup shell script

I do understand that we want to get rid of the glue and the xpcom stub, but the standalone glue currently is the most convenient way to do this. I think we can still remove the glue later, by creating a better API for this dynamic loading type of embedding, which really doesn't need to be very elaborated, but can be a separate bug.
Attachment #522982 - Flags: review?(benjamin)
(Assignee)

Comment 40

6 years ago
Created attachment 522983 [details] [diff] [review]
cleanup patch, getting rid of XRE_WANT_DLL_BLOCKLIST
Attachment #522981 - Flags: review?(benjamin) → review+
Comment on attachment 522982 [details] [diff] [review]
part 2 - Load the XUL library with the standalone glue and get rid of the startup shell script

On linux the binary will still be firefox-bin, but on mac it will be firefox?

I really don't like copying the binary-path code like that. Can we move it into a .h-inl file which is included from the couple different places it is needed, or maybe move it into the glue code?
Attachment #522982 - Flags: review?(benjamin) → review-
(Assignee)

Comment 42

6 years ago
(In reply to comment #41)
> Comment on attachment 522982 [details] [diff] [review] [review]
> part 2 - Load the XUL library with the standalone glue and get rid of the
> startup shell script
> 
> On linux the binary will still be firefox-bin, but on mac it will be firefox?

It's firefox everywhere, with a compatibility symbolic link kept on linux and mac.

> I really don't like copying the binary-path code like that. Can we move it
> into a .h-inl file which is included from the couple different places it is
> needed, or maybe move it into the glue code?

The problem is that the version in toolkit returns a nsILocalFile while the version in this patch returns a char *. It can't return a nsILocalFile because the glue is not initialized yet, since the path is to be determined exactly to initialize the glue...
> The problem is that the version in toolkit returns a nsILocalFile while the
> version in this patch returns a char *. It can't return a nsILocalFile
> because the glue is not initialized yet, since the path is to be determined
> exactly to initialize the glue...

Make an intermediate .h-inl function which fills a char* buffer and wrap XRE_GetBinaryPath around it...
Blocks: 648581
(Assignee)

Updated

6 years ago
Attachment #522981 - Attachment filename: dll → bug552864-1
(Assignee)

Comment 44

6 years ago
Created attachment 532579 [details] [diff] [review]
part 0.5 - Move environment sanitization in wmain

This moves the code from bug 642469 in wmain instead of doing it on
the first NS_SetDllDirectory call.
Attachment #532579 - Flags: review?(benjamin)
(Assignee)

Comment 45

6 years ago
Created attachment 532580 [details] [diff] [review]
part 1 - Move DLL blocklist in XRE, and  inline NS_SetDllDirectory

Refreshed for context changes in toolkit/xre/nsWindowsDllBlocklist.cpp
(Assignee)

Updated

6 years ago
Attachment #522981 - Attachment is obsolete: true
(Assignee)

Comment 46

6 years ago
Note that we'll need to revert bug 627591 first, but this will be replaced with bug 632404 landing at the same time.
(Assignee)

Comment 47

6 years ago
Created attachment 532674 [details] [diff] [review]
part 1.5 - Move XRE_GetBinaryPath code in a dedicated class

It happens that nothing sets XRE_BINARY_PATH anymore, so I also
removed that part.
Attachment #532674 - Flags: review?(benjamin)
(Assignee)

Updated

6 years ago
Attachment #522982 - Attachment filename: glue → bug552864-2
(Assignee)

Comment 48

6 years ago
Created attachment 532675 [details] [diff] [review]
part 2 - Load the XUL library with the standalone glue and get rid of the startup shell script

The difference with the previous version is that it relies on part 1.5 instead
of using its own copy of GetBinaryPath.
Attachment #532675 - Flags: review?(benjamin)
(Assignee)

Updated

6 years ago
Attachment #522982 - Attachment is obsolete: true
Comment on attachment 532579 [details] [diff] [review]
part 0.5 - Move environment sanitization in wmain

I want ehsan to look at this also.
Attachment #532579 - Flags: review?(ehsan)
Attachment #532579 - Flags: review?(benjamin)
Attachment #532579 - Flags: review+
Comment on attachment 532579 [details] [diff] [review]
part 0.5 - Move environment sanitization in wmain

Under the new setup, how is wmain executed?  (I need to know who calls it, and which processes does it get run in).

Also, what is the purpose of this patch?  Is it just refactoring stuff?
(Assignee)

Comment 52

6 years ago
wmain is still the first function executed when the program starts (modulo static initializers). That doesn't change. AIUI, all processes run it.
(Assignee)

Comment 53

6 years ago
(In reply to comment #52)
> wmain is still the first function executed when the program starts (modulo
> static initializers). That doesn't change. AIUI, all processes run it.

browser/app/nsBrowserApp.cpp:#include "nsWindowsWMain.cpp"
ipc/app/MozillaRuntimeMain.cpp:#include "nsWindowsWMain.cpp"
ipc/ipdl/test/cxx/app/TestIPDL.cpp:#include "nsWindowsWMain.cpp"
mobile/app/nsBrowserApp.cpp:#include "nsWindowsWMain.cpp"
xulrunner/app/nsXULRunnerApp.cpp:#include "nsWindowsWMain.cpp"
xulrunner/stub/nsXULStub.cpp:#include "nsWindowsWMain.cpp"

this means both the main process and plugin-container run it.
wmain is the app entry point on Windows: it takes unicode arguments (WCHAR) and converts them to UTF8 so that we can be fully unicode-aware on Windows.
(In reply to comment #54)
> wmain is the app entry point on Windows: it takes unicode arguments (WCHAR)
> and converts them to UTF8 so that we can be fully unicode-aware on Windows.

Yeah, I was just checking to make sure that we're not changing that by passing something else besides -ENTRY:wmainCRTStarup to the linker with these patches.
Comment on attachment 532579 [details] [diff] [review]
part 0.5 - Move environment sanitization in wmain

We don't want NS_SetDllDirectory or SanitizeEnvironmentVariables to be called as plugin process startup <http://mxr.mozilla.org/mozilla-central/source/ipc/app/MozillaRuntimeMain.cpp#52>.  But we _do_ want SanitizeEnvironmentVariables to be called the first time NS_SetDllDirectory is called.  Which makes this patch wrong.
Attachment #532579 - Flags: review?(ehsan) → review-
Also, it seems to me that these patches will make NS_SetDllDirectory to be called at plugin process startup, which will break some plugins...
(Assignee)

Comment 58

6 years ago
(In reply to comment #56)
> Comment on attachment 532579 [details] [diff] [review] [review]
> part 0.5 - Move environment sanitization in wmain
> 
> We don't want NS_SetDllDirectory or SanitizeEnvironmentVariables to be
> called as plugin process startup
> <http://mxr.mozilla.org/mozilla-central/source/ipc/app/MozillaRuntimeMain.
> cpp#52>.  But we _do_ want SanitizeEnvironmentVariables to be called the
> first time NS_SetDllDirectory is called.  Which makes this patch wrong.

Mmmmm... How exactly does the following not do what you want?

#ifndef XRE_DONT_PROTECT_DLL_LOAD
  SanitizeEnvironmentVariables();
  mozilla::NS_SetDllDirectory(L"");
#endif
(Assignee)

Comment 59

6 years ago
FWIW, this is a try run including the full stack (this bug + bug 632404 + bug 657297)
http://tbpl.mozilla.org/?tree=Try&rev=8919b890936e
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mh@glandium.org-8919b890936e/
As I said to glandium on IRC, I think we should move SanitizeEnvironmentVariables calls to the callers of NS_SetDllDirectory.
(Assignee)

Comment 61

6 years ago
Created attachment 533349 [details] [diff] [review]
part 1 - Move DLL blocklist in XRE, and inline NS_SetDllDirectory and environment sanitization, which now needs to be called manually

This merges part 0.5 and part 1, because that was going to be messy.
This implements what we discussed on IRC.
Attachment #533349 - Flags: review?(ehsan)
(Assignee)

Updated

6 years ago
Attachment #532580 - Attachment is obsolete: true
(Assignee)

Updated

6 years ago
Attachment #532579 - Attachment is obsolete: true
(Assignee)

Comment 62

6 years ago
Created attachment 533352 [details] [diff] [review]
part 2 - Load the XUL library with the standalone glue and get rid of the startup shell script

Refreshed
Attachment #533352 - Flags: review?(benjamin)
(Assignee)

Updated

6 years ago
Attachment #532675 - Attachment is obsolete: true
Attachment #532675 - Flags: review?(benjamin)
(Assignee)

Comment 63

6 years ago
New Try push for the new patch queue.
http://tbpl.mozilla.org/?tree=Try&rev=72cc4fe65a0d
Attachment #533349 - Flags: review?(ehsan) → review+
(Assignee)

Updated

6 years ago
Depends on: 658273
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mh@glandium.org-72cc4fe65a0d/ is the link to the build in case anyone testing needs it.
It took a little while, but I made sure to test all the bugs Ehsan mentioned in the private comment, with the exception of the last one which I couldn't figure out how to check. I found no regressions. Hopefully that's sufficient.
Just to be complete, I used the try builds in comment #65 on the OSs in the bugs were reported on.
(Assignee)

Comment 71

6 years ago
I tested the remaining one with success (that is, the try build does the right thing).
(Assignee)

Comment 72

6 years ago
Comment on attachment 532674 [details] [diff] [review]
part 1.5 - Move XRE_GetBinaryPath code in a dedicated class

Tentatively try someone else. Mike or Ted, would you have some time to review this? I'd like to land it this week-end.
Attachment #532674 - Flags: review?(ted.mielczarek)
Attachment #532674 - Flags: review?(shaver)
Attachment #532674 - Flags: review?(benjamin)
(Assignee)

Updated

6 years ago
Attachment #533352 - Flags: review?(ted.mielczarek)
Attachment #533352 - Flags: review?(shaver)
Attachment #533352 - Flags: review?(benjamin)
Comment on attachment 533352 [details] [diff] [review]
part 2 - Load the XUL library with the standalone glue and get rid of the startup shell script

Not qualified to do this one, sorry. Kyle maybe? Taras?
Attachment #533352 - Flags: review?(shaver)
(Assignee)

Comment 74

6 years ago
Created attachment 534048 [details] [diff] [review]
part 1.5 - Move XRE_GetBinaryPath code in a dedicated class.

Updated with review comments from IRC
Attachment #534048 - Flags: review?(tglek)
(Assignee)

Updated

6 years ago
Attachment #532674 - Attachment is obsolete: true
Attachment #532674 - Flags: review?(ted.mielczarek)
Attachment #532674 - Flags: review?(shaver)

Comment 75

6 years ago
Comment on attachment 533352 [details] [diff] [review]
part 2 - Load the XUL library with the standalone glue and get rid of the startup shell script

r+ for the C++ bits. This is mostly a trickier platform-independent equivalent of what I did with windows linker switches in bug 627591.
Attachment #533352 - Flags: review+
(Assignee)

Comment 76

6 years ago
Created attachment 534067 [details] [diff] [review]
part 1.5 - Move XRE_GetBinaryPath code in a dedicated class.

Brown paper bag fix
Attachment #534067 - Flags: review?(tglek)
(Assignee)

Updated

6 years ago
Attachment #534048 - Attachment is obsolete: true
Attachment #534048 - Flags: review?(tglek)

Comment 77

6 years ago
Comment on attachment 534067 [details] [diff] [review]
part 1.5 - Move XRE_GetBinaryPath code in a dedicated class.

Looks good
Attachment #534067 - Flags: review?(tglek) → review+
(Assignee)

Comment 78

6 years ago
Comment on attachment 533352 [details] [diff] [review]
part 2 - Load the XUL library with the standalone glue and get rid of the startup shell script

for the build bits.
Attachment #533352 - Flags: review?(ted.mielczarek) → review?(khuey)
(Assignee)

Updated

6 years ago
Summary: Investigate throwing away wrapper shell scripts on Unix-not-OSX (Linux, *BSD, Solaris) → Throw away wrapper shell script on unix and lazily load libxul
Comment on attachment 533352 [details] [diff] [review]
part 2 - Load the XUL library with the standalone glue and get rid of the startup shell script

Review of attachment 533352 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/app/Makefile.in
@@ +94,1 @@
>  PROGRAM = $(MOZ_APP_NAME)$(BIN_SUFFIX)

I love this so much. Does this need to be fixed separately for other apps? I know I already fixed this for Fennec on Linux a while back, but presumably we ought to fix Thunderbird and maybe XULRunner.

While you're here you should change:
http://mxr.mozilla.org/mozilla-central/source/build/binary-location.mk#46
and probably:
http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/installer/linux/rpm/mozilla.desktop#48

@@ -104,2 @@
>  
> -ifdef BUILD_STATIC_LIBS

This code is all getting removed in bug 648911, but that is a huge patch and probably already bitrotted anyway. :-/

@@ +185,5 @@
>  
>  ifneq (,$(filter-out OS2 WINNT,$(OS_ARCH)))
>  
> +libs:: 
> +	ln -sf $(MOZ_APP_NAME)$(BIN_SUFFIX) $(DIST)/bin/$(MOZ_APP_NAME)-bin$(BIN_SUFFIX)

On Mac, does this survive as a symlink when we do all the silly rsync'ing to the app bundle?

::: xpinstall/packager/Packager.pm
@@ +317,5 @@
>        }
>      }
>      unlink("$destpath$destname$destsuffix") if ( -e "$destpath$destname$destsuffix");
> +    # If source is a symbolic link pointing in the same directory, create a
> +    # symbolic link

So awful, but nice hack.
Attachment #533352 - Flags: review?(khuey) → review+
(Assignee)

Comment 80

6 years ago
(In reply to comment #79)
> Comment on attachment 533352 [details] [diff] [review] [review]
> part 2 - Load the XUL library with the standalone glue and get rid of the
> startup shell script
> 
> Review of attachment 533352 [details] [diff] [review] [review]:
> -----------------------------------------------------------------
> 
> ::: browser/app/Makefile.in
> @@ +94,1 @@
> >  PROGRAM = $(MOZ_APP_NAME)$(BIN_SUFFIX)
> 
> I love this so much. Does this need to be fixed separately for other apps? I
> know I already fixed this for Fennec on Linux a while back, but presumably
> we ought to fix Thunderbird and maybe XULRunner.

I was planning to file a followup bug to have a generic app launcher, actually. There's no reason why we need to continue to duplicate that initialization code in mobile/app, xulrunner/app, suite/app, mail/app, and other places.

> While you're here you should change:
> http://mxr.mozilla.org/mozilla-central/source/build/binary-location.mk#46
> and probably:
> http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/installer/
> linux/rpm/mozilla.desktop#48

That wouldn't be safe until we fix other applications, I believe.

> @@ -104,2 @@
> >  
> > -ifdef BUILD_STATIC_LIBS
> 
> This code is all getting removed in bug 648911, but that is a huge patch and
> probably already bitrotted anyway. :-/
> 
> @@ +185,5 @@
> >  
> >  ifneq (,$(filter-out OS2 WINNT,$(OS_ARCH)))
> >  
> > +libs:: 
> > +	ln -sf $(MOZ_APP_NAME)$(BIN_SUFFIX) $(DIST)/bin/$(MOZ_APP_NAME)-bin$(BIN_SUFFIX)
> 
> On Mac, does this survive as a symlink when we do all the silly rsync'ing to
> the app bundle?

It looks like so, as it is a symlink in the .dmg archive.
(Assignee)

Comment 81

6 years ago
(In reply to comment #80)
> It looks like so, as it is a symlink in the .dmg archive.

Sadly, the updater doesn't seem to support symbolic links and simply removes the file :(
I think the best short term option would be to copy firefox to firefox-bin (it's only a few dozens kilobytes).
(In reply to comment #80)
> > While you're here you should change:
> > http://mxr.mozilla.org/mozilla-central/source/build/binary-location.mk#46
> > and probably:
> > http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/installer/
> > linux/rpm/mozilla.desktop#48
> 
> That wouldn't be safe until we fix other applications, I believe.

Okay, make sure you remember this bit in the followup.

(In reply to comment #81)
> (In reply to comment #80)
> > It looks like so, as it is a symlink in the .dmg archive.
> 
> Sadly, the updater doesn't seem to support symbolic links and simply removes
> the file :(
> I think the best short term option would be to copy firefox to firefox-bin
> (it's only a few dozens kilobytes).

Okay, I can deal with that.
(Assignee)

Comment 83

6 years ago
Created attachment 534275 [details] [diff] [review]
part 2 - Load the XUL library with the standalone glue and get rid of the startup shell script.

As I'll be landing (added a comment discussed with Taras on irc, and replaced ln -sf with cp -p)
(Assignee)

Updated

6 years ago
Attachment #533352 - Attachment is obsolete: true
(Assignee)

Comment 84

6 years ago
http://hg.mozilla.org/mozilla-central/rev/360237ec30a5
http://hg.mozilla.org/mozilla-central/rev/148fab7bda95
http://hg.mozilla.org/mozilla-central/rev/ad137668a558
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla6
(Assignee)

Updated

6 years ago
Blocks: 658847
(Assignee)

Updated

6 years ago
Blocks: 658850
(Assignee)

Comment 85

6 years ago
Backed out because of leak test bustage
http://hg.mozilla.org/mozilla-central/rev/aa83abd4fd01
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 86

6 years ago
So, what went wrong is that while most stuff work, a few of our leak reporting is happening with atexit(). Except that at the time the process exits, the function that was registered is nowhere, since libxul is unloaded. This is why, e.g. we don't unload gconf in the system pref service.

Interestingly, this only affected linux debug builds.

The chromium code comes with an AtExitManager which would allow to avoid this problem.
(Assignee)

Comment 87

6 years ago
(In reply to comment #86)
> Interestingly, this only affected linux debug builds.

And this is as such because on top of the atexit problem, the malloc/free/... wrappers from nsTraceMalloc.c are not used.
(Assignee)

Comment 88

6 years ago
And to be more specific, only trace malloc is broken. Other leak tests were still working (xpcom bloat log, string stats...)
(Assignee)

Updated

6 years ago
Depends on: 658995
(Assignee)

Comment 89

6 years ago
Created attachment 534508 [details] [diff] [review]
part 1 - Move DLL blocklist in XRE, and inline NS_SetDllDirectory and environment sanitization, which now needs to be called manually.

Refreshed after bug 648911
(Assignee)

Updated

6 years ago
Attachment #533349 - Attachment is obsolete: true
(Assignee)

Comment 90

6 years ago
Created attachment 534509 [details] [diff] [review]
part 1.5 - Move XRE_GetBinaryPath code in a dedicated class.

Refreshed after bug 648911
(Assignee)

Updated

6 years ago
Attachment #534067 - Attachment is obsolete: true
(Assignee)

Comment 91

6 years ago
Created attachment 534510 [details] [diff] [review]
part 2 - Load the XUL library with the standalone glue and get rid of the startup shell script.

Refreshed after bug 648911
(Assignee)

Updated

6 years ago
Attachment #534275 - Attachment is obsolete: true
(Assignee)

Comment 92

6 years ago
Created attachment 539437 [details] [diff] [review]
part 2 - Load the XUL library with the standalone glue and get rid of the startup shell script.

Refreshed (context changes)
(Assignee)

Updated

6 years ago
Attachment #534510 - Attachment is obsolete: true
(Assignee)

Comment 93

6 years ago
http://hg.mozilla.org/integration/mozilla-inbound/rev/4a319872b760
http://hg.mozilla.org/integration/mozilla-inbound/rev/46974e5278ff
http://hg.mozilla.org/integration/mozilla-inbound/rev/7b390fa15157
Whiteboard: [inbound]
Target Milestone: mozilla6 → ---
I merged this from m-i to m-c, but it will be backed out shortly while we investigate Talos regressions:
http://hg.mozilla.org/mozilla-central/rev/7b390fa15157
http://hg.mozilla.org/mozilla-central/rev/46974e5278ff
http://hg.mozilla.org/mozilla-central/rev/4a319872b760
(Assignee)

Comment 95

6 years ago
http://hg.mozilla.org/integration/mozilla-inbound/rev/8ae2a073b044
http://hg.mozilla.org/integration/mozilla-inbound/rev/555171a0ca68
http://hg.mozilla.org/integration/mozilla-inbound/rev/c4bbac1f178b
http://hg.mozilla.org/mozilla-central/rev/8ae2a073b044
http://hg.mozilla.org/mozilla-central/rev/555171a0ca68
http://hg.mozilla.org/mozilla-central/rev/c4bbac1f178b
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
Target Milestone: --- → mozilla7

Comment 97

6 years ago
After this fix, I have to export NSDISTMODE=copy to compile firefox, otherwise it won't work.

See comment #31.
(Assignee)

Comment 98

6 years ago
(In reply to comment #97)
> After this fix, I have to export NSDISTMODE=copy to compile firefox,
> otherwise it won't work.
> 
> See comment #31.

This is unlikely to be the same thing as comment #31. Please file a followup bug with the error you get when you build without NSDISTMODE=copy.

Updated

6 years ago
Depends on: 665509
Depends on: 666657
This looks like it broke running with -g, unfortunately.  Was that an expected outcome?
(Assignee)

Comment 100

6 years ago
Yes, you either run it through run-mozilla.sh, or, like any other program, just run with gdb --args. We need to update the wiki.
Both of those solutions fail if you have a script that sets up the right profile and runs firefox with the given args (and you need to do profile setup because otherwise we stomp all over ourselves when multiple versions of firefox are involved).   You used to be able to pass -g to the script, it would pass all its args to firefox, and things would Just Work.  Now it's way more complicated, since the -g needs to be passed to run-mozilla.sh but the other args need to be passed to firefox...

Given that this is a startup performance improvement, would it make any sense to allow _debug_ builds to continue using the old setup?  It's not like we're worried about startup perf there.
(Reporter)

Comment 102

6 years ago
We could stick code at the very beginning of main (or possibly XRE_main, not sure) that scans argv for "-g" and "-d" and if it finds them, splices them out and execs the debugger.
(Assignee)

Comment 103

6 years ago
(In reply to comment #102)
> We could stick code at the very beginning of main (or possibly XRE_main, not
> sure) that scans argv for "-g" and "-d" and if it finds them, splices them
> out and execs the debugger.

That'd be main(), but I'm not a big fan of keeping that legacy option. Maybe we should bring this up on dev-platform?
Created attachment 541679 [details]
rungdb.sh

This script will run gdb like -g does:

rungdb.sh dist/bin/firefox -profile ~/test-profile -no-remote
(Assignee)

Comment 105

6 years ago
(In reply to comment #104)
> Created attachment 541679 [details]
> rungdb.sh
> 
> This script will run gdb like -g does:
> 
> rungdb.sh dist/bin/firefox -profile ~/test-profile -no-remote

gdb --args dist/bin/firefox -profile ~/test-profile -no-remote
works just as well ;)

But bz's usecase is more subtle than that...
(Assignee)

Updated

6 years ago
Depends on: 670788
Blocks: 673899

Updated

6 years ago
Depends on: 675585
(Assignee)

Updated

6 years ago
Depends on: 677247

Updated

6 years ago
Depends on: 678438
(Assignee)

Updated

6 years ago
Depends on: 689719
Blocks: 668869
Blocks: 722262
You need to log in before you can comment on or make changes to this bug.