Closed Bug 905210 Opened 11 years ago Closed 7 years ago

the setup of VMFunctions for the JIT generates large-ish static initializers

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: froydnj, Assigned: away, Mentored)

References

(Blocks 1 open bug)

Details

(Whiteboard: [lang=c++])

Attachments

(1 file)

All the VMFunctions for the JIT are initialized thusly:

typedef ...SomeOpFn...;
static const VMFunction SomeOpInfo = FunctionInfo<SomeOpFn>(ion::SomeOpImpl);

(side note: the use of templates for all this is very nice)

For better or for worse, GCC compiles this into a static initializer that does:

{
  FunctionInfo<SomeOpFn> tmp(ion::SomeOpImpl);
  SomeOpInfo = tmp;
  ...repeat for numerous other functions...
}

which spends a good bit of code to initialize |tmp| on-stack and then performs an out-of-line call for the copy constructor of SomeOpInfo.  For all the VMFunctions on ARM, this is about 12K of static initializers.  I haven't run the numbers, but I estimate this is easily 2x the amount of code for all the other static initializers in Gecko (libxul) combined.

Declaring a separate FunctionInfo enables the compiler to eliminate the on-stack initialization, which is a win (down to maybe 3-4K of static initializers), but still leaves static initializers to perform the copy constructor for VMFunction.

AFAICS, we don't dynamically create VMFunctions; it would be nice if we could specify all the relevant VMFunctions in some table somewhere and avoid the runtime initialization (copying the VMFunctions and initializing next pointers, etc.) entirely.
(In reply to Nathan Froyd (:froydnj) from comment #0)
> All the VMFunctions for the JIT are initialized thusly:
> 
> typedef ...SomeOpFn...;
> static const VMFunction SomeOpInfo = FunctionInfo<SomeOpFn>(ion::SomeOpImpl);
> 
> (side note: the use of templates for all this is very nice)

In fact, we could get rid of the typedef, but having it provides a compilation error at the VMFunction declaration if the prototype change.  This act as some kind of reminder for both the person who is making the patch and the reviewer.

> For better or for worse, GCC compiles this into a static initializer that
> does:
> 
> {
>   FunctionInfo<SomeOpFn> tmp(ion::SomeOpImpl);
>   SomeOpInfo = tmp;
>   ...repeat for numerous other functions...
> }
> 
> which spends a good bit of code to initialize |tmp| on-stack and then
> performs an out-of-line call for the copy constructor of SomeOpInfo.  For
> all the VMFunctions on ARM, this is about 12K of static initializers.  I
> haven't run the numbers, but I estimate this is easily 2x the amount of code
> for all the other static initializers in Gecko (libxul) combined.

Whoa … I guess the number of VMFunction grew fast.  I insisted that Brian Hackett provides benchmark results to know the start-up impact  both in terms of time & space consumed by all the VMFunctions created at the time.  If we have 12 KB of static variables, then we should be scared by the number of CodeGen that we produce for these trampoline, because the VMFunctions are likely smaller than the generated code.

Before Bug 786146, we were initializing all these VMFunctions as a per-codegen basis.  This had to move to the static variables, as we cannot allocate during parallel compilation.

A suggestion made at that time, was to mark all the callVM sites with place-holders, and allocate the right one at the time when the code is linked.

Another approach, which might be a bit more chalenging but might bring some interesting feature to the JS engine, would be to re-use the ForkJoin slice allocation system, which is creating a dedicated chunk of memory for all the allocations.  And at the end we can just merge back the pages when we finish the compilation.

> AFAICS, we don't dynamically create VMFunctions; it would be nice if we
> could specify all the relevant VMFunctions in some table somewhere and avoid
> the runtime initialization (copying the VMFunctions and initializing next
> pointers, etc.) entirely.

One of the point that I would like to keep is that a VMFunction must be defined near its call-site.

I think we can easily switch back to the place-holder version that we used to have before Bug 786146.  And generates repatch labels with an associated index if the trampoline has not yet been generated.

I can (work on / mentor / review) this bug later if needed.
Blocks: 786146
Another crazy idea for handling this: All our supported compilers support assigning variable names to user-defined sections.  So we could, in theory, do something like:

1) Define new sections (only needed for MSVC);
2) Make sure to annotate definitions of VMFunctions with appropriate section attributes
3) Modify expandlibs_exec.py to ensure that all the things in this new section are emitted to the appropriate contiguous place in the output file (only necessary for Linux/Android; not quite sure if we can do this on Mac...)
4) Run through the array of VMFunctions to initialize everything at runtime.

Downsides are that it's somewhat complicated and requires embedders to know about the special semantics of how VMFunctions are initialized.
Assignee: general → martin
Whiteboard: [mentor=nbp][lang=c++]
(In reply to Nicolas B. Pierron [:nbp] from comment #1)
> A suggestion made at that time, was to mark all the callVM sites with
> place-holders, and allocate the right one at the time when the code is
> linked.

So we can't allocate during parallel compilation, but it's safe to do so during linking?
(In reply to Martin Törnwall [:mtornwall] from comment #3)
> (In reply to Nicolas B. Pierron [:nbp] from comment #1)
> > A suggestion made at that time, was to mark all the callVM sites with
> > place-holders, and allocate the right one at the time when the code is
> > linked.
> 
> So we can't allocate during parallel compilation, but it's safe to do so
> during linking?

Yes, but allocating during the link phase implies that we need to patch all locations which are referring to one wrapper of a VMFunction.
Mentor: nicolas.b.pierron
Whiteboard: [mentor=nbp][lang=c++] → [lang=c++]
Hello,
Is the bug still available to fix?

Thanks
(In reply to vinayakagarwal6996 from comment #5)
> Is the bug still available to fix?

Yes, I think this issue is still present, our VMFunction initialization did not changed.

The initial issue, which matters for the start-up of Firefox for Android, is the time taken to register all the VMFunctions.

VMFunctions have a next field which as far as I know is the only source which is not known at compilation-time.  If we were to move the "next" field initialization out of the VMFunction, then we could use const-expr constructor and enforce that each FunctionInfo is a const-expr, and that VMFunction are only taking references to these const-expr.
I noticed these large initializers while working on bug 1334254.

On Windows, in the generated code for VMFunction(const VMFunction& o), the compiler is actually doing "*this = o;" by hand, copying field by field. Given that everything is static data except for the "next" piece, it would be nice if the compiler could recognize that and populate the fields appropriately. I rewrote the copy constructor with a member initializer list, and it seems to have worked. The VMFunction fields are now mostly pre-populated, and the runtime constructor code for each XYZInfo becomes merely:

0:000> u 1800ce6b0
00000001`800ce6b0 488d0d79b11903  lea     rcx,[xul!js::jit::DoNewArrayInfo (00000001`83269830)]
00000001`800ce6b7 e964905602      jmp     xul!js::jit::VMFunction::addToFunctions (00000001`82637720)
Blocks: 1334254
This removes about 20k from Win64 xul.dll.

The only piece that's really necessary is the member initializer list. However, I recognize that I'm breaking a bit of a safety net in that now you have to remember to add new fields to this ctor. To atone for this, I made the neighbor ctor be constexpr so that you're forced to initialize all fields, and hopefully it should be easy to visually verify that the two constructors stay in sync.

Nathan, want to see if this also works on ARM?
Attachment #8831842 - Flags: feedback?(nfroyd)
Comment on attachment 8831842 [details] [diff] [review]
Member initializers

Review of attachment 8831842 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for taking over this bug :)
I don't think the removal of the "safety net" is much of a concern as people modify this when they need to extend it, and so are likely adding a test at the same time.

Also, this code has been quite stable for years now, so we should be quite safe already.

r=nbp with the following nit fixed.

::: js/src/jit/VMFunctions.h
@@ +245,5 @@
>          outParamRootType(outParamRootType),
>          extraValuesToPop(extraValuesToPop),
>          expectTailCall(expectTailCall)
>      {
> +#ifdef DEBUG

I do not understand why it is necessary to add a #ifdef DEBUG around MOZ_ASSERT statements?
Is windows compiler const-expr unable to remove   “do{}while(0)”  statements?  Should we consider another form of void statements such as   “(void)0”  ?

@@ +252,3 @@
>          MOZ_ASSERT_IF(outParam != Type_Void, returnType == Type_Bool);
>          MOZ_ASSERT(returnType == Type_Bool ||
>                     returnType == Type_Object);

nit: Move these assertions to the next constructor, and remove the #ifndef around the constexpr keyword.
Attachment #8831842 - Flags: feedback?(nfroyd) → review+
(In reply to Nicolas B. Pierron [:nbp] from comment #10)
> I do not understand why it is necessary to add a #ifdef DEBUG around
> MOZ_ASSERT statements?
> Is windows compiler const-expr unable to remove   “do{}while(0)” 
> statements?  Should we consider another form of void statements such as  
> “(void)0”  ?

constexpr constructors are defined by the language in C++11 to not have a constructor body, not even no-op constructor bodies: [dcl.constexpr]p4 or thereabouts.  In later versions, this restriction is relaxed, but this is what we have to deal with currently.
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/097237b12c7a
Use a member initializer list in VMFunction for better codegen with static data. r=nbp
Assignee: martin → dmajor
https://hg.mozilla.org/mozilla-central/rev/097237b12c7a
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: