fold all remaining shared libs(-nspr -gnome components) into libxul

Status: NEW
Assignee: Unassigned
Product: Core
Component: Build Config
Opened: 7 years ago
Updated: 2 years ago
Reporter: (dormant account)
Depends on: 3 bugs
Blocks: 3 bugs
Version: unspecified
Platform: x86, Linux
Firefox tracking flags: blocking2.0 -
Whiteboard: [ts]
Attachments: 1 attachment, 1 obsolete attachment

(Reporter)

Description

7 years ago
After more measurement, it turned out that bug 525013 was overly ambitious. Having a single static binary does save on relocations, but it is complicated and mostly precludes workarounds for inefficient library loaders (bug 554421).

Having libraries in a single giant binary reduces the amount of random I/O on startup. A single library also allows for better compiler optimization, which will further reduce the amount of I/O.
(Reporter)

Updated

7 years ago
Whiteboard: [ts]
Note that this approach will still cost us ~160k relocations on startup for static data (mostly vtables). That might not be a big deal, since those relocations are just math and we can most-likely do the I/O for those all in one disk read.
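For a rough sense of how much data those relocations involve, here is a back-of-the-envelope sketch. It assumes 32-bit ELF REL-style entries of 8 bytes each (two 32-bit words per Elf32_Rel) and takes "~160k" as 160,000; the entry size and the helper name are assumptions for illustration, not figures from this bug.

```python
# Size of one Elf32_Rel entry: r_offset (4 bytes) + r_info (4 bytes).
# Assumption: REL-style relocations on 32-bit x86; other targets differ.
ELF32_REL_ENTRY_BYTES = 8


def reloc_table_bytes(num_relocs, entry_bytes=ELF32_REL_ENTRY_BYTES):
    """Estimate the on-disk size of a relocation table (hypothetical helper)."""
    return num_relocs * entry_bytes


# "~160k" from the comment above, taken here as 160,000 entries.
estimate = reloc_table_bytes(160_000)
print(f"{estimate} bytes (~{estimate / 2**20:.2f} MiB)")
```

Under those assumptions the table is on the order of a megabyte, which is the kind of size a single contiguous read could plausibly cover.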
(Reporter)

Comment 2

7 years ago
(In reply to comment #1)
> Note that this approach will still cost us ~160k relocations on startup for
> static data (mostly vtables). That might not be a big deal, since those
> relocations are just math and we can most-likely do the I/O for those all in
> one disk read.

On Linux that's not a problem, as prelink takes care of them.
bug 534471 is the big one here. I'll talk to wtc and see if we can get that moving along. Aside from that we currently have:
libmozalloc.so - part of the patch in bug 525013 made this a static lib, will double-check with cjones that putting it in libxul would be ok
libmozjs.so - easy enough to statically link
libmozsqlite3.so - bug 525013 made us statically link this at the cost of linking it both into firefox and NSS
libxpcom.so - this is the XPCOM glue, I think we need this (bsmedberg?)
Depends on: 534471
We need it as long as we want to maintain binary compatibility. As soon as we can drop binary compat, we can and should drop xpcom.dll.
(Reporter)

Comment 5

7 years ago
(In reply to comment #4)
> We need it as long as we want to maintain binary compatibility. As soon as we
> can drop binary compat, we can and should drop xpcom.dll.

I think for the purposes of this bug that's a 'yes'. Let's take the same approach as before: add a mozconfig option (--only-xul?) and land it. Once landed, we can tweak that configuration until it is enough of an improvement (or the time is right) to justify binary breakage.
Blocks: 447581
One of Firefox's present capabilities is to be able to load third party
crypto libraries that utilize their crypto hardware gizmos (e.g.
"smart cards" or "usb tokens"), and in fact, Firefox's own crypto software
works as one of those modules, so that ALL of Firefox's crypto, whether its
own or third party, works the same way, via the same shared library API.
I encourage you to preserve that aspect of NSS.  That means continuing to
preserve some (a few), not all, of NSS's shared libs as separate libs.

Comment 7

7 years ago
Ted: as I noted in the two WARNINGs in bug 534471 comment 0,
neither I nor the NSS team supports that patch, and you lose
the FIPS validation status of the NSS software crypto module.
Firefox needs to stay FIPS-validated.  Using NSS as a static
library is okay if you will continue to provide a Firefox
build that uses NSS shared libraries.
Whether Firefox needs to remain FIPS-validated is a product decision that we can make based on the costs and benefits.
Depends on: 562313
What is the effect on Ts of leaving NSS as a dynamic library? Sorry if I missed it (could be we don't know yet because other dynamic libs need to be linked into libxul).

/be
WTC: what does the NSS team suggest to get equivalent performance gains in this key area?

I agree with Benjamin: we should carefully evaluate what the FIPS needs are.  If we have to pay to get Firefox-with-static-NSS FIPS certified, and we care enough about Firefox-as-FIPS to do that, I wouldn't rule it out.  It would mean that distributors who wanted to modify Firefox would have to do their own FIPS validation on the result, but since they also employ the NSS team that doesn't want to take the changes, they are free to choose their own adventure there too.
WTC: should Firefox just move to using the same NSS that Chrome does?  It seems like the patch in question is considered OK for Chrome to ship[*], and I can't think of much NSS evolution that I would want to track at the expense of this important performance characteristic.

[*] http://src.chromium.org/viewvc/chrome/trunk/deps/third_party/nss/README.chromium?revision=45059&view=markup -- Local Modifications

It might be that it means that we need to fund some work on an NSS fork to contain the NSS shared library behaviour, fix other pain like the RNG-initialization situation, and then maybe re-FIPS it in the configuration that we choose to ship in Firefox.  I'm certainly willing to entertain bearing those costs if it leads to the expected improvements in user experience.

Comment 12

7 years ago
shaver: Chrome's use of NSS is hidden behind an SSLClientSocket
interface, and Chrome has a second implementation of SSLClientSocket
using Windows SChannel.  If necessary, a user can instruct Chrome
(by passing a command-line option) to use Windows SChannel and rely
on the FIPS validation of the Windows system crypto module.

Today, Mozilla is the only distributor of Firefox for Windows.
It's prohibitively expensive for another group to produce and
maintain a FIPS validated version of Firefox for Windows.  So
the Firefox users who must use FIPS-validated products are
counting on Mozilla.
I don't know why it's prohibitively expensive for another group -- AFAIK the companies and organizations who are dealing with this stuff are much bigger than Mozilla is, in terms of their revenue and resources.  But that's another issue -- we can certify Firefox if we decide that we need to.  (Those Firefox users didn't even get a Firefox 3, by my understanding, so I'm not sure how significant a group they are.)

We could also just ship a "slower, but if you care about FIPS, go ahead" version of FF, with NSS linked dynamically and crawling temporary directories all over the place, etc.  If there is a large community of users who need FIPS, I expect that we'll be able to find contributors to help us maintain that.
Can someone answer my question from comment 9?

Crawling temporary directories for entropy is a separate issue, and (I thought) fixed. Kinda a cheap shot :-/.

/be
(Reporter)

Comment 15

7 years ago
> What is the effect on Ts of leaving NSS as a dynamic library? Sorry if I missed
> it (could be we don't know yet because other dynamic libs need to be linked
> into libxul).

Here is my coldstartup perspective.

With prefetch off on Windows, the NSS DLLs cause ~20% as many page faults as xul.dll (75 vs 396). The story is similar on Linux.
When Windows prefetch is on here, it seems to correctly preload NSS (which suggests low overhead).

Having said that, the relative NSS overhead will increase as we optimize the layout of libxul. That 20% figure inches towards 50% once we unleash PGO or icegrind on our binary. The problem is that NSS is broken up into many files (instead of 0), which prevents us from making effective use of readahead, etc.
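The arithmetic behind those percentages can be sketched as follows. The two fault counts are the ones quoted above; the helper names are made up for illustration, and the second calculation assumes the NSS fault count stays fixed while libxul's layout improves, which is an assumption rather than a measurement.

```python
def relative_faults(lib_faults, xul_faults):
    """Page faults of a library expressed as a fraction of xul.dll's faults."""
    return lib_faults / xul_faults


def xul_faults_for_share(lib_faults, share):
    """xul.dll fault count at which lib_faults equals `share` of it."""
    return lib_faults / share


NSS_FAULTS, XUL_FAULTS = 75, 396  # figures quoted in the comment above

print(f"today: {relative_faults(NSS_FAULTS, XUL_FAULTS):.0%}")
# Assumption: NSS stays at 75 faults while libxul's fault count drops.
print(f"50% is reached at {xul_faults_for_share(NSS_FAULTS, 0.5):.0f} xul faults")
```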

I'm also hoping to be able to strip out parts of NSS that we don't use (via objcopy?); that seems like it would be easier with a static NSS. (In reply to comment #9)
I believe, based on my experience in bugs about entropy collection, APIs, fixing outright bugs in ARM code, adding support for new platforms, and so forth, that Mozilla and the NSS team are not aligned with respect to the relative importance of various changes.  I propose to resolve that tension by cutting the FIPSian knot, such that Mozilla can bear the costs of FIPS certification for Firefox (something that we were asked to contribute to financially the last time it was pursued by either Sun or Red Hat), as we would also reap for our users the benefit of being able to make FIPS-breaking changes to our crypto subsystem.

The NSS team will be able to continue to work on the shared-library system for their uses, and we can specialize it for ours, such that not all of our users pay for the FIPS-supporting overhead, and not all of theirs need lose FIPS certification in order for us to support new platforms or improve performance for our use case.
Bleh, I missed a couple of libs on Linux:
components/libdbusservice.so
components/libmozgnome.so
components/libnkgnomevfs.so

these are a pain right now because:
a) They link to the xpcom glue and
b) They link directly to system libraries and expect to fail if those libs don't exist, making the component unavailable.

I'll file a separate bug about them.
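As a loose illustration of point (b), the sketch below shows the general failure mode: a load attempt fails because a library is absent, and the caller simply treats the component as unavailable. It uses Python's ctypes rather than the XPCOM component loader, and the library name and helper function are hypothetical.

```python
import ctypes


def try_load_component(system_lib_name):
    """Return a library handle, or None if the library cannot be loaded."""
    try:
        return ctypes.CDLL(system_lib_name)
    except OSError:
        # The loader could not find or open the library, so the
        # component that depends on it is reported as unavailable.
        return None


# Hypothetical library name chosen so that the load is expected to fail.
handle = try_load_component("libhypothetical_missing_gnome_dep.so.0")
print("component available" if handle is not None else "component unavailable")
```

Folding such components into libxul would remove this per-component failure point, which is presumably why they need separate handling.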
Created attachment 443139 [details] [diff] [review]
Maximum libxul, v1.

Here's a quick first pass. If you add export MOZ_MAXIMUM_LIBXUL=1 to your mozconfig, this patch will then get rid of libmozalloc.so, libmozjs.so, and libmozsqlite3.so.

bsmedberg suggests we could keep the xpcom glue, but fold it into libxul instead, and export the symbols from there.
Depends on: 563628
Created attachment 443330 [details] [diff] [review]
Maximum libxul, v2.

This also gets rid of the xpcom glue, folding it into libxul instead. Components can link against libxul to pick up the necessary symbols.

With this + the patches from bug 562313, I'm down to 16 shared libs on a Linux build, of which 11 (!) are NSPR+NSS, 1 is the null plugin, and 3 are the components I filed bug 563628 on.
Attachment #443139 - Attachment is obsolete: true
I got Firefox to build with static NSPR+NSS. I had to tweak the patches in the dep bugs and some other things. If you'd like to try it the easiest way right now would be to clone my mq:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/mq/
and apply up to the nss-static-moz patch.

The build starts, but crashes as soon as it inits NSS. I'll look at that tomorrow.
Also I've only tested on Linux currently, so this may or may not build elsewhere.
Doesn't build on Windows or OS X yet; I'm working on that. I've updated the patch queue: if you push the patches up to and including "configure-hardcode-max-libxul", you'll get the right kind of build by default.
(Reporter)

Comment 23

7 years ago
Note: on Linux this depends on properly passing PGO flags to every binary within the fat xul. Preliminary Linux testing showed that a fat xul is currently a regression; proper binary layout should make it a significant win.

Will test on Windows when this builds there.
Depends on: 564511
(Reporter)

Updated

7 years ago
Blocks: 577741
(Reporter)

Updated

7 years ago
Depends on: 577522
(Reporter)

Updated

7 years ago
Depends on: 580407

Comment 24

7 years ago
Getting rid of -fPIC on ELF x86 systems should free up one extra register for the compiler and avoid expensive function prologues, so runtime speedups should be easy to measure.

It is weird that this leads to a regression.  Do you have more precise numbers on what slows down?

Comment 25

7 years ago
Also, some old numbers on the cost of -fPIC are here: http://www.ucw.cz/~hubicka/papers/amd64/node4.html

The numbers are old, but nothing fundamental has changed.  -fPIC on x86 should cost about 10%-20% of runtime and a few percent of code size.  So performance-critical stuff should probably be linked statically into the main binary.  Perhaps integration of JavaScript will help then...
I don't believe that we can remove -fPIC without disabling ASLR, and losing its security benefits.  Am I mistaken?
-fPIC does not cost nearly that much if you are using primarily hidden-visibility symbols, as we are.
(Reporter)

Comment 28

7 years ago
Ted is confident we can get this in for ff4. It's a long-overdue startup improvement.
blocking2.0: --- → ?
Summary: fold all remaining shared libs into libxul → fold all remaining shared libs(-nspr -gnome components) into libxul
I'll sort out the deps soon, but our goal will be to fold everything except NSPR and NSS into libxul. We'll leave that as a followup post-Firefox 4.

Comment 30

7 years ago
Hidden visibility reduces the cost of PLT/GOT usage and allows more automatic inlining. It does not give you back the PIC register, nor does it reduce the prologue costs. So on x86, PIC is still quite expensive (I had numbers on that too, but don't seem to be able to find them).  x86-64 is better because of IP-relative addressing, but the expense was still high enough that we did not go for a PIC-everywhere compilation model.
At this point this is not going to block, but I'd still take a patch up through beta4 (ships 20-Aug).
blocking2.0: ? → -
(Reporter)

Updated

7 years ago
Blocks: 622908
(Reporter)

Updated

6 years ago
Depends on: 648407
Assignee: ted.mielczarek → nobody
(In reply to Taras Glek (:taras) from comment #15)
> I'm also hoping to be able to strip out parts of nss that we don't use(via
> objcopy?), seems like that would be easier with a static nss.(In reply to
> comment #9)

That is bug 611781.
Blocks: 611781
Can we close this as wontfix now that we had to split things out of xul because of windows pgo?
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #33)
> Can we close this as wontfix now that we had to split things out of xul
> because of windows pgo?

I talked to some Chrome developers and they said they're able to build their big DLL *with* PGO by building a bunch of static libraries with PGO and then linking those static libraries together. I don't know if that approach will work for us, but we should verify that we've tried it because if it works, it solves a lot of problems (e.g. no need for gkmedia.dll anymore).
If nothing else, this could still be useful for win64 builds some day.
(In reply to Brian Smith (:bsmith) from comment #34)
> (In reply to Rafael Ávila de Espíndola (:espindola) from comment #33)
> > Can we close this as wontfix now that we had to split things out of xul
> > because of windows pgo?
> 
> I talked to some Chrome developers and they said they're able to build their
> big DLL *with* PGO by building a bunch of static libraries with PGO and then
> linking those static libraries together.

I tried that. But there's apparently no such thing as static libraries with PGO. All you end up with is a collection of AST files that still need to be PGOed when linking them all together in the final link.

I think what they mean by PGO is LTCG, and LTCG alone sucks less memory than PGO. LTCG is hardly a big performance bump. PGO is.
Also, I'm still not convinced there's value in having a big fat library when a big chunk of it is stuff like WebRTC, which is used once in a while, if at all.
It's useful from a linkage standpoint, which is the biggest pain with things like WebRTC right now. For things that are essentially standalone libraries like NSPR and NSS there's not as much benefit (except that we use NSPR everywhere, so we probably would benefit from cross-module optimization).

Also, for the record, we enabled LTCG before we enabled PGO, and it was a fair perf win, but PGO was a bigger win on top of that.
(In reply to Ted Mielczarek [:ted] from comment #38)
> It's useful from a linkage standpoint

But it hurts at runtime.
No longer blocks: 611781