Closed Bug 1478124 Opened 6 years ago Closed 5 years ago

Use static hash tables/DAFSAs for static component registration

Categories

(Core :: XPCOM, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox67 --- fixed

People

(Reporter: kmag, Assigned: kmag)

References

(Blocks 1 open bug)

Details

(Whiteboard: [overhead:100k])

Attachments

(14 files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
We currently have two large hashtables of registered components, one of contract IDs and one of CIDs. Together, these use about 150K per process.

We could save most of this per-process overhead by just using static tables for static components, with dynamic hashtables holding only overrides for JS components.

That would probably also simplify a lot of things. It would require us to define static modules in JSON or python files, and then generate headers and static data modules from their contents. That would be a big change, but it would also allow us to remove a lot of the duplication and indirection of our current module definition process.
Depends on: 1492584
An option here might be to switch to using something like manifest files uniformly for all component registrations, and adding build-system support for codegen (like the build system support we have for XPIDL).

If we wanted, we could actually use XPIDL itself as the file with component registrations, which would let us avoid adding new build system support. We already generate C++ code from this information using xptcodegen for XPConnect, so it wouldn't be too hard to add it on. It may feel like a conflation of concerns however.

That logic also already uses my perfecthash.py code for lookup by both name and nsID, so it would be pretty easy to just add one more table.
(In reply to :Nika Layzell from comment #1)
> An option here might be to switch to using something like manifest files
> uniformly for all component registrations, and adding build-system support
> for codegen (like the build system support we have for XPIDL).

That was my plan, yeah. It looks like it's going to be a pain in the ass, but doable.

> If we wanted, we could actually use XPIDL itself as the file with component
> registrations, which would let us avoid adding new build system support.

Hm... that might be a good idea...
(In reply to Kris Maglione [:kmag] from comment #2)
> > If we wanted, we could actually use XPIDL itself as the file with component
> > registrations, which would let us avoid adding new build system support.
> 
> Hm... that might be a good idea...

I think you could do this as an incremental change this way.

Say we support a block like this:

    // UUID is specified as an attribute (for alignment with interfaces *shrug*)
    //
    // NOTE: I'm vaguely interested in killing the need to specify interface uuids 
    // as we can generate deterministic uuids based on the interface's name.
    //
    // For most components this might be a good idea too because people only 
    // construct them through their contractid.
    [uuid(aeaee24c-1054-49e6-a5be-3ad4fbdb00bd)]
    component nsSomeClass { // Name exists for debugging purposes. May be good to include in final binary?
                            // We could also use this to generate a default constructor without needing an
                            // explicit declaration if we wanted.

        // Component can technically have multiple (or no) contractIDs, so no reason
        // to not support it *shrug*.
        contractid "@mozilla.org/random/cid;1";

        // The constructor probably needs some headers, so we need to allow them 
        // to be included, and CDATA entries should probably go into the generated
        // code here.
        #include "nsSomeHeader.h"
        %{C++
        static void SomeRandomCode() { ... }
        %}

        // -- Mechanisms for describing constructor strategy (mutually exclusive) --

        // Basic generic constructors
        [init(Init)]       // Optional [init(...)] attribute to specify auto init method.
        class nsSomeClass; // This type will be 'new'-ed with the default ctor.

        // Custom implemented constructors. Will be called like current C++ ctors.
        constructor nsSomeClassConstructor;

        // JS constructors (This may not be in the first version because I don't 
        // think you can write one of these with the NSMODULE_DEFN syntax currently)
        jsimpl "resource://...jsFilename.js";
    }

    // These probably need to be separate, because the value isn't always a contractid.
    category "name" "entry" "value"; 
    // Alternative syntax perhaps? (IIRC names are basically always identifiers + '-')
    category name["entry"] = "value";

Not sure if that's the best mechanism but it would fit with the syntax of the rest of .idl files decently well.

We'd then include it in the XPT file next to the list of interfaces, and in xptcodegen we could generate a NSMODULE_DEFN(...) = ...; style definition. Once other NSMODULE_DEFN defintions are gone, this can be shifted over to a static table.
For static components, I don't intend to allow removing or replacing CID
entries, only contract ID entries. And I would generally prefer, when
restoring overrides of those classes, to not create a new dynamic factory
entry for the contract ID.

We already have the ability to mock components without either of those issues,
but registering a new CID entry for the mock (without unregistering the
original), and then restoring the original by calling `registerFactory` with a
null factory object.

This patch updates our existing mocks to behave that way, and paves the way
for the rest of the patches.
The current implementations of GetService are slightly different for contract
IDs than they are for CIDs, but I'm pretty sure that's unintentional.

This patch factors out the common parts of the two implementations, which
should prevent them from diverging in the future, and avoids the need to make
the same changes in multiple places in the following patches.
Currently, when we build the component registry at startup, we exclude any
entry with a process selector which doesn't match the current process. When we
switch to static lookup tables, however, that check is going to have to happen
for every lookup, since we can't alter the table at runtime.

That may not matter much, given how expensive the rest of the component lookup
code is relative to ProcessMatchesSelector, but it's also easy and cheap
enough to generate a lookup table for all possible ProcessSelector values, and
do a quick index check instead.
This aggregates a list of all static component manifests in the tree, and
writes them out to a `manifests-lists.json` file, which is read by the codegen
scripts in the next patch.

It slightly abuses the IDL lists machinery, given that these aren't
technically IDL files. But the semantics are similar enough that it seemed
like the best option.
This patch essentially creates a separate, static component database for
statically-defined CID and contract ID entries, and gives it precedence over
the runtime DB. It combines the two separate databases by updating existing
code to use lookup functions which understand both databases, and then access
all entries through wrappers which defer to the appropriate underlying type.

Static component entries require no runtime relocations, and require no
writable data allocation aside from one pointer-sized BSS entry per CID, and
one bit of BSS per contract ID.

To achieve this, all strings in the static lookup tables are stored as indexes
into a static string table, all constructor functions live in a switch
statement which compiles to a relative jump table, and all writable data for
static entries is accessed by indexed lookups into BSS arrays.

We also avoid creating nsIFactory entries for static components when possible
by adding a CreateInstance method to nsFactoryEntry and the corresponding
entry wrapper to directly call the appropriate constructor method, and only
create a factory object when required by external code.
We have tons of code in the component manager which stringifies nsIDs so that
it can print the result. The standard stringification process is pretty
bloated, and makes the code difficult to update. And, frankly, I mostly just
got tired of copying it around.

This patch adds a helper which stringifies a nsID to a nsAutoCString, which
can be used as a temporary value in a single statement, rather than requiring
a separate local variable and function call for each operation.
The static XPCOM manifest format makes it easy to define a component in a
single place, without separate contract ID and CID macro definitions in
headers, and variable constants in module files. Without any other changes,
however, those macros are still required in order to create instances of or
retrieve services for the component.

This patch solves that problem by allowing component definitions to include an
explicit component name, and adding helpers for each named component to
Components.h:

  mozilla::components::<Name>::CID() to retrieve its class ID.

  mozilla::components::<Name>::Create() to create a new instance.

  mozilla::components::<Name>::Service() to retrieve its service instance.

These getters have the benefit of doing full compile-time sanity checking,
with no possibilty of using a mistyped contract ID string, or a macro constant
which has gotten out of sync with the component entry.

Moreover, when possible, these getters are optimized to operate on module
entries directly, without going through expensive hash lookups or virtual
calls.
Those .conf files are imho a step in the wrong direction, except if we have a policy to never accept new components anymore. If anything, that's a step in the opposite direction bug 938437 went.
Note: These patches update the largest modules to use static config files, which gets us down to about 230 dynamic entries, and a hash table capacity of 256, and about 60K or per-process overhead. There's a long tail of about 150 more, which which would get us down to a capacity of 128, and save some tens of K more. I'll tackle those in a follow-up.
(In reply to Mike Hommey [:glandium] from comment #18)
> Those .conf files are imho a step in the wrong direction, except if we have
> a policy to never accept new components anymore. If anything, that's a step
> in the opposite direction bug 938437 went.

As much as I'd love to have a policy of never accepting new components, there's still too much that relies on them:

 - Protocol handlers
 - About handlers
 - Object serialization
 - Content viewer handlers
 - ...

For the cases where we can easily avoid them creating new components, though, yeah, the policy should be not to create them.
(In reply to Kris Maglione [:kmag] from comment #20)
> (In reply to Mike Hommey [:glandium] from comment #18)
> > Those .conf files are imho a step in the wrong direction, except if we have
> > a policy to never accept new components anymore. If anything, that's a step
> > in the opposite direction bug 938437 went.
> 
> As much as I'd love to have a policy of never accepting new components,
> there's still too much that relies on them:
> 
>  - Protocol handlers
>  - About handlers
>  - Object serialization
>  - Content viewer handlers
>  - ...

Do you think it's worthwhile to port those things to use mechanisms other than component registrations (for example like what I did for authentication providers in bug 1502774)?  Arguably all of the examples you've listed are abuses of component registrations now that we don't support old-style extensions.
(In reply to :Ehsan Akhgari from comment #21)
> Do you think it's worthwhile to port those things to use mechanisms other
> than component registrations (for example like what I did for authentication
> providers in bug 1502774)?  Arguably all of the examples you've listed are
> abuses of component registrations now that we don't support old-style
> extensions.

Maybe. The possibility certainly occurred to me, and I've really never liked the idea of having to construct a long string and do a hash lookup for it every time we want to access one of these things.

But I also don't really have a better solution, at the moment. One of the nice things about the current model is that we can define things like protocol handlers in various places around the tree, where the code for that protocol logically belongs, without having some sort of central static registry, or having to register them at runtime.

And, in particular, with the static component registry model, there really isn't any runtime memory or computational cost to registering the things.

So if we wanted to stop using the component registry for these things, we'd need something else with similar properties. And I don't know what such a thing would look like, or if it would be worth implementing.

(In reply to Mike Hommey [:glandium] from comment #18)

Those .conf files are imho a step in the wrong direction, except if we have
a policy to never accept new components anymore. If anything, that's a step
in the opposite direction bug 938437 went.

I don't think the situation in bug 938437 and here are comparable.

I got to part 8a and realized that the component definitions are full Python files. My initial reaction was "uh, no, we should do something else". But what else are we going to do? We need some kind of build-time configurability for the components we're including, so a static JSON-y file is out. The component definitions contain so much C++-specific stuff that it seems quite odd to put all the information into moz.build files.

I suppose you could have static JSON-y files, with separate files for components that are conditionally included, and then conditionally build up a list of files in moz.build, but then you'd have a bunch of small, single-entry files, which doesn't seem super-helpful. We do have a sort of analogue in category auto-registration, but I'm not sure we want to replicate that model here.

Making the configuration files be Python is probably the most reasonable thing to do.

(In reply to Nathan Froyd [:froydnj] from comment #23)

I got to part 8a and realized that the component definitions are full Python
files. My initial reaction was "uh, no, we should do something else". But
what else are we going to do? We need some kind of build-time configurability
for the components we're including, so a static JSON-y file is out. The
component definitions contain so much C++-specific stuff that it seems quite
odd to put all the information into moz.build files.

I suppose you could have static JSON-y files, with separate files for
components that are conditionally included, and then conditionally build up a
list of files in moz.build, but then you'd have a bunch of small, single-entry
files, which doesn't seem super-helpful.

Yeah, I went back and forth on this. I initially wanted to use JSON files, and I
thought about allowing preprocessing for the places where we needed to do things
conditionally, but when I looked at what we needed for existing modules, it
seemed like it would wind up being much messier than just doing it in Python.
And we already had some prior art in Bindings.conf which seemed to work pretty
well.

I did also consider just putting the definitions in moz.build files, and I it
would probably work well enough, but it just didn't really feel right. That's
not the kind of information we usually use moz.build files for, so a separate
manifest file seemed to make more sense.

https://hg.mozilla.org/integration/mozilla-inbound/rev/bedaa9c437ad30ea88bdc0e8fc83f4a2e980812e
Bug 1478124: Part 1 - Update component mocks to replace and restore components sanely. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/538e40c5ee1336a7ba467f0f4128dcddf97ef75d
Bug 1478124: Part 2 - Factor out common GetService code. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/dd00365ebb55a06b4d6896bc86dd0fc94482d805
Bug 1478124: Part 3 - Add a lookup table for ProcessMatchesSelector. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/f500020a273a27c66bf2166505a0e97bbc34a214
Bug 1478124: Part 4a - Add XPCOM_MANIFESTS moz.build variable for XPCOM component manifests. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/b8d2dfdfc324c53ce5aacc822ce52d4e2bfdc31a
Bug 1478124: Part 4b - Support loading components from static lookup tables. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/1ddd80d9e91a17c01f0a8a73036810042a0ab080
Bug 1478124: Part 5 - Add AutoIDString helper, and use where appropriate. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/929fd654c9dfc3222e1972faadea3cc066e51654
Bug 1478124: Part 6 - Add helpers for creating/inspecting static modules. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/5d85deac61c2ee54a69525de8bdfff4be72d224c
Bug 1478124: Part 7 - Add docs for static component manifests. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/d94039b199437180309264cb4c206ae7ebb7d21d
Bug 1478124: Part 8a - Update toolkit module to use a static component manifest. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/b0444e0bc801f828b49f9953a73498cf5ff5024b
Bug 1478124: Part 8b - Update DocShell module to use a static component manifest. r=bzbarsky

https://hg.mozilla.org/integration/mozilla-inbound/rev/21f4fda0315963e42bae8784c63116f00ee0fa92
Bug 1478124: Part 8c - Update Layout module to use a static component manifest. r=Ehsan

https://hg.mozilla.org/integration/mozilla-inbound/rev/496aaf774697f817a689ee0d59f2f866fdb16801
Bug 1478124: Part 8d - Update netwerk module to use a static component manifest. r=mayhemer

https://hg.mozilla.org/integration/mozilla-inbound/rev/012fd0107204da802f04b7c133b33a5dd22123a4
Bug 1478124: Part 8e - Update XPCOM module to use a static component manifest. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/8dacce59fcc0c966a3753b3ced9b1afd0043475a
Bug 1478124: Part 8f - Update NSS module to use a static component manifest. r=keeler
https://hg.mozilla.org/integration/mozilla-inbound/rev/9014cbf1d71b97d7ca017f17de40c9c2c27534eb
Bug 1478124: Part 1 - Update component mocks to replace and restore components sanely. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/60843989ea28bd35c31e4e666c7eb37ba1a425f0
Bug 1478124: Part 2 - Factor out common GetService code. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/1b419ed6e5cb5a01f9ee6345a30a2ab6c26d7248
Bug 1478124: Part 3 - Add a lookup table for ProcessMatchesSelector. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/81324e422e17402f9c773de0eefdd70704cbd91c
Bug 1478124: Part 4a - Add XPCOM_MANIFESTS moz.build variable for XPCOM component manifests. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/29a6f9f0ba6dbb216591c9bdd91f63e9ca6a7080
Bug 1478124: Part 4b - Support loading components from static lookup tables. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/302ccff16b7b5746ad877a2db3e84982115c158f
Bug 1478124: Part 5 - Add AutoIDString helper, and use where appropriate. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/e61622f485158ba95d864c259ddebc0640131b1d
Bug 1478124: Part 6 - Add helpers for creating/inspecting static modules. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/a7e8dd650fee6a3e5bab8ee52dcd8d8dd84f7c76
Bug 1478124: Part 7 - Add docs for static component manifests. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/6a412d8f24dbd09f001034a8ad369c40d9ebe7b0
Bug 1478124: Part 8a - Update toolkit module to use a static component manifest. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/f2f20322a757baa9c2be0f309456c2a41a63ae7d
Bug 1478124: Part 8b - Update DocShell module to use a static component manifest. r=bzbarsky

https://hg.mozilla.org/integration/mozilla-inbound/rev/1b7d8d7be13d4befd087089938a9bec464c9e3e8
Bug 1478124: Part 8c - Update Layout module to use a static component manifest. r=Ehsan

https://hg.mozilla.org/integration/mozilla-inbound/rev/4a36400ab63cbc7ff3758c1a6fc1d1a209d9d131
Bug 1478124: Part 8d - Update netwerk module to use a static component manifest. r=mayhemer

https://hg.mozilla.org/integration/mozilla-inbound/rev/c4df3fabb4e639156559b1d7bae31ba83a620f17
Bug 1478124: Part 8e - Update XPCOM module to use a static component manifest. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/066365f5d7422a1558fc40578150606b5942050f
Bug 1478124: Part 8f - Update NSS module to use a static component manifest. r=keeler
Depends on: 1524450
Blocks: 1524687
Blocks: 1524688
Blocks: 1528437
Regressions: 1545381
Regressions: 1530660
Flags: needinfo?(kmaglione+bmo)
Regressions: 1700420
No longer regressions: 1700420
Blocks: 1773770
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: