After more measurement, it turned out that bug 525013 was overly ambitious. Having a single static binary does save on relocations, but it is complicated and mostly precludes workarounds for inefficient library loaders (bug 554421). Having libraries in a single giant binary reduces the amount of random I/O on startup. A single library also allows for better compiler optimization, which will further reduce the amount of I/O.
Note that this approach will still cost us ~160k relocations on startup for static data (mostly vtables). That might not be a big deal, since those relocations are just math and we can most likely do the I/O for those all in one disk read.
(In reply to comment #1)
> Note that this approach will still cost us ~160k relocations on startup for
> static data (mostly vtables). That might not be a big deal, since those
> relocations are just math and we can most-likely do the I/O for those all in
> one disk read.

On Linux that's not a problem, as prelink takes care of them.
bug 534471 is the big one here. I'll talk to wtc and see if we can get that moving along. Aside from that we currently have:

libmozalloc.so - part of the patch in bug 525013 made this a static lib; will double-check with cjones that putting it in libxul would be ok
libmozjs.so - easy enough to statically link
libmozsqlite3.so - bug 525013 made us statically link this at the cost of linking it both into firefox and NSS
libxpcom.so - this is the XPCOM glue, I think we need this (bsmedberg?)
We need it as long as we want to maintain binary compatibility. As soon as we can drop binary compat, we can and should drop xpcom.dll.
(In reply to comment #4)
> We need it as long as we want to maintain binary compatibility. As soon as we
> can drop binary compat, we can and should drop xpcom.dll.

I think for the purposes of this bug that's a 'yes'. Let's take the same approach as before, add a mozconfig (--only-xul?) and land it. Once landed we can tweak that configuration until it is better enough (or the time is right) to justify binary breakage.
One of Firefox's present capabilities is the ability to load third-party crypto libraries that utilize crypto hardware gizmos (e.g. "smart cards" or "usb tokens"), and in fact, Firefox's own crypto software works as one of those modules, so that ALL of Firefox's crypto, whether its own or third party, works the same way, via the same shared library API. I encourage you to preserve that aspect of NSS. That means continuing to preserve some (a few, not all) of NSS's shared libs as separate libs.
Ted: as I noted in the two WARNINGs in bug 534471 comment 0, neither I nor the NSS team supports that patch, and you lose the FIPS validation status of the NSS software crypto module. Firefox needs to stay FIPS-validated. Using NSS as a static library is okay if you will continue to provide a Firefox build that uses NSS shared libraries.
Whether Firefox needs to remain FIPS-validated is a product decision that we can make based on the costs and benefits.
What is the effect on Ts of leaving NSS as a dynamic library? Sorry if I missed it (could be we don't know yet because other dynamic libs need to be linked into libxul). /be
WTC: what does the NSS team suggest to get equivalent performance gains in this key area? I agree with Benjamin: we should carefully evaluate what the FIPS needs are. If we have to pay to get Firefox-with-static-NSS FIPS certified, and we care enough about Firefox-as-FIPS to do that, I wouldn't rule it out. It would mean that distributors who wanted to modify Firefox would have to do their own FIPS validation on the result, but since they also employ the NSS team that doesn't want to take the changes, they are free to choose their own adventure there too.
WTC: should Firefox just move to using the same NSS that Chrome does? It seems like the patch in question is considered OK for Chrome to ship [*], and I can't think of much NSS evolution that I would want to track at the expense of this important performance characteristic.

It might be that we need to fund some work on an NSS fork to contain the NSS shared library behaviour, fix other pain like the RNG-initialization situation, and then maybe re-FIPS it in the configuration that we choose to ship in Firefox. I'm certainly willing to entertain bearing those costs if it leads to the expected improvements in user experience.

[*] http://src.chromium.org/viewvc/chrome/trunk/deps/third_party/nss/README.chromium?revision=45059&view=markup -- Local Modifications
shaver: Chrome's use of NSS is hidden behind an SSLClientSocket interface, and Chrome has a second implementation of SSLClientSocket using Windows SChannel. If necessary, a user can instruct Chrome (by passing a command-line option) to use Windows SChannel and rely on the FIPS validation of the Windows system crypto module. Today, Mozilla is the only distributor of Firefox for Windows. It's prohibitively expensive for another group to produce and maintain a FIPS validated version of Firefox for Windows. So the Firefox users who must use FIPS-validated products are counting on Mozilla.
I don't know why it's prohibitively expensive for another group -- AFAIK the companies and organizations who are dealing with this stuff are much bigger than Mozilla is, in terms of their revenue and resources. But that's another issue -- we can certify Firefox if we decide that we need to. (Those Firefox users didn't even get a Firefox 3, by my understanding, so I'm not sure how significant a group they are.) We could also just ship a "slower, but if you care about FIPS, go ahead" version of FF, with NSS linked dynamically and crawling temporary directories all over the place, etc. If there is a large community of users who need FIPS, I expect that we'll be able to find contributors to help us maintain that.
Can someone answer my question from comment 9? Crawling temporary directories for entropy is a separate issue, and (I thought) fixed. Kinda a cheap shot :-/. /be
(In reply to comment #9)
> What is the effect on Ts of leaving NSS as a dynamic library? Sorry if I missed
> it (could be we don't know yet because other dynamic libs need to be linked
> into libxul).

Here is my cold-startup perspective. With prefetch off on Windows, the NSS DLLs cause ~20% as many page faults as xul.dll (75 vs. 396). The story is similar on Linux. When Windows prefetch is on here, it seems to correctly preload NSS (which suggests low overhead). Having said that, relative NSS overhead will increase as we optimize the layout of libxul; that 20% figure inches towards 50% once we unleash PGO or icegrind on our binary. The problem is that NSS is broken up into many files (instead of 0), which prevents us from making effective use of readahead, etc. I'm also hoping to be able to strip out parts of NSS that we don't use (via objcopy?); that seems like it would be easier with a static NSS.
I believe, based on my experience in bugs about entropy collection, APIs, fixing outright bugs in ARM code, adding support for new platforms, and so forth, that Mozilla and the NSS team are not aligned with respect to the relative importance of various changes. I propose to resolve that tension by cutting the FIPSian knot, such that Mozilla can bear the costs of FIPS certification for Firefox (something that we were asked to contribute to financially the last time it was pursued by either Sun or Red Hat), as we would also reap for our users the benefit of being able to make FIPS-breaking changes to our crypto subsystem. The NSS team will be able to continue to work on the shared-library system for their uses, and we can specialize it for ours, such that not all of our users pay for the FIPS-supporting overhead, and not all of theirs need lose FIPS certification in order for us to support new platforms or improve performance for our use case.
Bleh, I missed a couple of libs on Linux:

components/libdbusservice.so
components/libmozgnome.so
components/libnkgnomevfs.so

these are a pain right now because:
a) They link to the xpcom glue, and
b) They link directly to system libraries and expect to fail if those libs don't exist, making the component unavailable.

I'll file a separate bug about them.
Created attachment 443139 [details] [diff] [review]
Maximum libxul, v1.

Here's a quick first pass. If you add export MOZ_MAXIMUM_LIBXUL=1 to your mozconfig, this patch will get rid of libmozalloc.so, libmozjs.so and libmozsqlite3.so. bsmedberg suggests we could keep the xpcom glue but fold it into libxul instead, and export the symbols from there.
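For anyone wanting to try this, a hypothetical mozconfig might look like the following. Only the MOZ_MAXIMUM_LIBXUL line comes from this patch; the objdir name and the rest are a made-up optimized-build skeleton, not part of the patch.

```shell
# Sketch of a mozconfig enabling the fat-libxul build.
. $topsrcdir/browser/config/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-max-libxul
ac_add_options --enable-optimize
export MOZ_MAXIMUM_LIBXUL=1
```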
Created attachment 443330 [details] [diff] [review] Maximum libxul, v2. This also gets rid of the xpcom glue, folding it into libxul instead. Components can link against libxul to pick up the necessary symbols. With this + the patches from bug 562313, I'm down to 16 sharedlibs on a Linux build, of which 11(!) are NSPR+NSS, 1 is the null plugin, and 3 are the components I filed bug 563628 on.
I got Firefox to build with static NSPR+NSS. I had to tweak the patches in the dep bugs and some other things. If you'd like to try it the easiest way right now would be to clone my mq: http://hg.mozilla.org/users/tmielczarek_mozilla.com/mq/ and apply up to the nss-static-moz patch. The build starts, but crashes as soon as it inits NSS. I'll look at that tomorrow.
Also I've only tested on Linux currently, so this may or may not build elsewhere.
Doesn't build on Windows or OS X yet, I'm working on that. I've updated the patchqueue, if you push the patches up to and including "configure-hardcode-max-libxul" you'll get the right kind of build by default.
Note: on Linux this depends on properly passing PGO flags to every binary within the fat xul. Preliminary Linux testing showed that a fat xul is currently a regression; proper binary layout should make it a significant win. I will test on Windows when this builds there.
Getting rid of -fPIC on ELF x86 systems should free up an extra register for the compiler and avoid expensive function prologues, so runtime speedups should be easily measurable. It is weird that this leads to a regression. Do you have more precise numbers on what slows down?
I don't believe that we can remove -fPIC without disabling ASLR, and losing its security benefits. Am I mistaken?
-fPIC does not cost nearly that much if you are using primarily hidden-visibility symbols, as we are.
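The hidden-visibility effect can be sketched with a toy shared library (assuming gcc and GNU nm on a Linux box; vis.c and the symbol names are made up for illustration). With -fvisibility=hidden, only symbols explicitly marked default end up in the dynamic symbol table, so intra-library calls bypass the PLT:

```shell
# Build a shared lib where only one symbol is exported.
cat > vis.c <<'EOF'
__attribute__((visibility("default"))) int api_entry(void);
int internal_detail(void) { return 42; }  /* hidden by -fvisibility=hidden */
int api_entry(void) { return internal_detail(); }
EOF
gcc -shared -fPIC -fvisibility=hidden -o libvis.so vis.c
# Only api_entry survives as a dynamic symbol; internal_detail does not,
# so the call to it is resolved at static link time, not through the PLT:
nm -D --defined-only libvis.so
```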
Ted is confident we can get this in for ff4. It's a long-overdue startup improvement.
I'll sort out the deps soon, but our goal will be to fold everything except NSPR and NSS into libxul. We'll leave that as a followup post-Firefox 4.
Hidden visibility reduces the cost of PLT/GOT usage and allows more automatic inlining. It does not give you back the PIC register, nor does it reduce the prologue costs. So on x86, PIC is still quite expensive (I had numbers on that too, but I don't seem to be able to find them). x86-64 is better because of IP-relative addressing, but the expense was still high enough that we did not go for a PIC-everywhere compilation model.
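The extra indirection -fPIC imposes can be observed directly in the generated assembly. A quick sketch, assuming gcc on x86-64 (demo.c and the output file names are made up): with -fPIC, access to an interposable global goes through the GOT; without it, the access is direct.

```shell
# Compile the same function with and without -fPIC and compare codegen.
cat > demo.c <<'EOF'
int counter;
int bump(void) { return ++counter; }
EOF
gcc -O2 -S -o nopic.s demo.c
gcc -O2 -fPIC -S -o pic.s demo.c
# On x86-64 the -fPIC version loads counter's address via the GOT
# (counter@GOTPCREL(%rip)); the non-PIC version accesses it directly.
grep counter pic.s nopic.s
```

On 32-bit x86 the difference is larger still, since the PIC base has to be materialized in a register (the prologue cost Honza refers to).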
At this point this is not going to block, but I'd still take a patch up through beta4 (ships 20-Aug).
(In reply to Taras Glek (:taras) from comment #15)
> I'm also hoping to be able to strip out parts of nss that we don't use (via
> objcopy?); that seems like it would be easier with a static nss.

That is bug 611781.
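The static-linking analogue of the objcopy idea quoted above is link-time section garbage collection. A minimal sketch, assuming gcc and GNU ld (lib.c and main.c are made-up stand-ins for NSS objects):

```shell
# A library with one referenced and one unreferenced function.
cat > lib.c <<'EOF'
int used(void)   { return 1; }
int unused(void) { return 2; }  /* never referenced by main */
EOF
cat > main.c <<'EOF'
int used(void);
int main(void) { return used(); }
EOF
# Put each function in its own section, then let the linker drop
# every section nothing references:
gcc -O2 -ffunction-sections -fdata-sections -c lib.c main.c
gcc -Wl,--gc-sections -o app main.o lib.o
```

With a static NSS the linker can discard unreferenced code this way automatically; with shared libnss, every exported symbol has to stay.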
Can we close this as wontfix now that we had to split things out of xul because of windows pgo?
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #33)
> Can we close this as wontfix now that we had to split things out of xul
> because of windows pgo?

I talked to some Chrome developers and they said they're able to build their big DLL *with* PGO by building a bunch of static libraries with PGO and then linking those static libraries together. I don't know if that approach will work for us, but we should verify that we've tried it, because if it works, it solves a lot of problems (e.g. no need for gkmedia.dll anymore).
If nothing else, this could still be useful for win64 builds some day.
(In reply to Brian Smith (:bsmith) from comment #34)
> (In reply to Rafael Ávila de Espíndola (:espindola) from comment #33)
> > Can we close this as wontfix now that we had to split things out of xul
> > because of windows pgo?
>
> I talked to some Chrome developers and they said they're able to build their
> big DLL *with* PGO by building a bunch of static libraries with PGO and then
> linking those static libraries together.

I tried that. But there's apparently no such thing as static libraries with PGO. All you end up with is a collection of AST files that still need to be PGOed when linking them all together in the final link. I think what they mean by PGO is LTCG, and LTCG alone sucks less memory than PGO. LTCG is hardly a big performance bump. PGO is.
Also, I'm still not convinced there's value in having a big fat library when a big chunk of it is stuff like webrtc, which is used once in a while if at all.
It's useful from a linkage standpoint, which is the biggest pain with things like WebRTC right now. For things that are essentially standalone libraries like NSPR and NSS there's not as much benefit (except that we use NSPR everywhere, so we probably would benefit from cross-module optimization). Also, for the record, we enabled LTCG before we enabled PGO, and it was a fair perf win, but PGO was a bigger win on top of that.
(In reply to Ted Mielczarek [:ted] from comment #38) > It's useful from a linkage standpoint But it hurts at runtime.