Closed Bug 758509 Opened 12 years ago Closed 12 years ago

Create JSScript before doing bytecode generation

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla16

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

(Whiteboard: [js:t])

Attachments

(9 files, 1 obsolete file)

6.90 KB, patch
luke
: review+
Details | Diff | Splinter Review
26.80 KB, patch
luke
: review+
Details | Diff | Splinter Review
14.02 KB, patch
luke
: review+
Details | Diff | Splinter Review
13.88 KB, patch
luke
: review+
Details | Diff | Splinter Review
13.98 KB, patch
luke
: review+
Details | Diff | Splinter Review
16.31 KB, patch
luke
: review+
Details | Diff | Splinter Review
14.36 KB, patch
luke
: review+
Details | Diff | Splinter Review
33.59 KB, patch
luke
: review+
Details | Diff | Splinter Review
14.71 KB, patch
luke
: review+
Details | Diff | Splinter Review
Attached patch draft patch (obsolete) — Splinter Review
This draft patch takes a step towards splitting JSScript into two parts, one
(larger) holding bytecode and related things, and the other holding
everything else.  This should make lazy bytecode generation easier.

JSScript::NewScript() is a new function that creates a JSScript and
initializes some of the non-bytecode-related fields -- principals,
noScriptRval, version, staticLevel.  There are some more fields that can
receive the same treatment.

The old JSScript::NewScript() is now called JSScript::init, and
JSScript:NewScriptFromEmitter is now called JSScript::initFromEmitter.
They're both no longer static.

Good things:

- BytecodeEmitter (BCE) now has a pointer to the JSScript being created.
  (Previously the JSScript was only created at the end.)  Because we can
  initialize some fields in JSScript up-front, we don't need to carry around
  that data in BCE or elsewhere.  For example, I've removed
  BCE::noScriptRval and Parser::{principals,originPrincipals}.

- initFromEmitter() is simpler.

Bad things:

- Initializing JSScript in two steps could lead to problems (e.g. forgetting
  to initialize a field).

- The patch hasn't been updated for the s/RootedVar/Rooted/ renaming.  I'm
  also not confident with my use of the new rooting APIs.

- It feels a bit like I've pushed around a lot of code without that much
  effect.

Ultimately it would make sense to split JSScript into two -- say JSScript and
js::CompiledScript -- but I haven't got that far yet.

Just looking for feedback at the moment.  It's obviously incomplete, but does it feel like it's heading in the right direction?
Attachment #627104 - Flags: feedback?(luke)
Attachment #627104 - Flags: feedback?(bhackett1024)
Whiteboard: [js:t]
Comment on attachment 627104 [details] [diff] [review]
draft patch

I didn't fully appreciate the benefits of splitting JSScript before (I blame it on jetlag), but this actually looks pretty nice.  Having the JSScript created before emitting seems righteous in the way you described.  Going further, I wonder if we could avoid jsscript.cpp poking directly at BytecodeEmitter.
Attachment #627104 - Flags: feedback?(luke) → feedback+
> Going
> further, I wonder if we could avoid jsscript.cpp poking directly at
> BytecodeEmitter.

Thinking superficially:  I think NewScriptFromEmitter() should probably be in frontend/BytecodeEmitter.cpp, but it hasn't been worth the effort to change.

Thinking deeper: BytecodeEmitter (BCE) duplicates a lot of the stuff in JSScript.  That data gets copied into JSScript in NewScriptFromEmitter().  Ideally, we'd get rid of all this duplication and just generate all the bytecode and related stuff directly into the JSScript -- BCE would then be a thin wrapper around JSScript, basically.

However, this is difficult because JSScript::data is packed so tightly. You can't create it until you know exactly how many consts, objects, regexps, bytecodes, srcnotes, etc, you have.  Some other JSScript fields are just scalars and so this problem doesn't occur (nTypeSets comes to mind), but generating scalar values directly into JSScript and variable-length data indirectly via BCE would be a bit confusing.
(In reply to Nicholas Nethercote [:njn] from comment #2)
> However, this is difficult because JSScript::data is packed so tightly. You
> can't create it until you know exactly how many consts, objects, regexps,
> bytecodes, srcnotes, etc, you have.  Some other JSScript fields are just
> scalars and so this problem doesn't occur (nTypeSets comes to mind), but
> generating scalar values directly into JSScript and variable-length data
> indirectly via BCE would be a bit confusing.

FWIW, JITScript and the various JM ICs have the same problem, and have parallel structures in mjit::Compiler which get filled in and then copied over as compilation wraps up.  I don't think this design is terrible, but it's best if the correspondence between the temporary and final data is clear (ideally with a single shared definition), which is not really the case for JM.
Comment on attachment 627104 [details] [diff] [review]
draft patch

Looks good.  I would suggest though, that starting with the structural changes (splitting JSScript up) and then working through how that affects the places where those fields are accessed will make the change easier and cleaner, e.g. this troublesome stuff with partially initialized scripts floating around shouldn't exist.  That's the approach I took doing objshrink and it worked pretty well.
Attachment #627104 - Flags: feedback?(bhackett1024) → feedback+
> Looks good.  I would suggest though, that starting with the structural
> changes (splitting JSScript up) and then working through how that affects
> the places where those fields are accessed will make the change easier and
> cleaner, e.g. this troublesome stuff with partially initialized scripts
> floating around shouldn't exist.  That's the approach I took doing objshrink
> and it worked pretty well.

I have a different patch that does that :)  It's really boring, though, so I worked on this one for a bit because it at least does the create-JSScript-earlier thing, which had an element of risk/interest.  I'll combine the two.

bhackett, do you think a single big patch is the way to go?  I can't really see how to naturally break this up into smaller patches.
(In reply to Nicholas Nethercote [:njn] from comment #5)
> bhackett, do you think a single big patch is the way to go?  I can't really
> see how to naturally break this up into smaller patches.

Sure, that's fine.
Summary: Split JSScript in two → Create JSScript before doing bytecode generation
This just removes some dead things.
Attachment #627104 - Attachment is obsolete: true
Attachment #630854 - Flags: review?(luke)
Currently you can't create a JSScript without also filling in heaps of
bytecode-related fields (via NewScript or NewScriptFromEmitter).

This patch changes that.  It:

- Adds CreateScript(), which just creates the JSScript and PodZeroes it.
  BytecodeEmitter can now get a pointer to the JSScript.

- Renames NewScript() as partiallyInit() and NewScriptFromEmitter() as
  fullyInitFromEmitter() and makes them non-static methods.

- Adds fullyInitTrivial() for the special trivial script created for
  Function.prototype -- this is nice because it moves the
  JSScript-initialization details from vm/GlobalObject.cpp to jsscript.cpp.

This is a step towards lazy bytecode generation.

This will also allow moving as much initialization of JSScript as possible
from partiallyInit/fullInitFromEmitter/fullInitTrivial into CreateScript().
The following patches do lots of this (and CreateScript will end up with
lots of arguments).  Benefits of this:

- It allows some state to be removed from BytecodeEmitter.

- It makes much of the JSScript initialization more explicit, which makes
  the characteristics of each JSScript more obvious at its creation point.
Attachment #630855 - Flags: review?(luke)
This patch initializes JSScript::noScriptVal in CreateScript().  This
removes the need for BytecodeEmitter::noScriptVal.
Attachment #630856 - Flags: review?(luke)
This patch initializes JSScript::version in CreateScript().  This allows
version getters to be removed from Parser and BytecodeEmitter.
Attachment #630857 - Flags: review?(luke)
This patch initializes JSScript::{principals,originPrincipals} in
CreateScript().  This removes the need for storing them in Parser, and also
allows us to do the "principals implies originPrincipals" manipulation in a
single place instead of two.
Attachment #630858 - Flags: review?(luke)
This patch initializes JSScript::compileAndGo in CreateScript().  This
allows Parser::compileAndGo to become private.
Attachment #630859 - Flags: review?(luke)
This patch initializes JSScript::savedCallerFun in CreateScript().  The only
callsite where it can be true is the one in CompileScript(), and so I moved
the code that inspects compileAndGo/isFunctionFrame() there, which removes
the need for some subsequent assertions.
Attachment #630860 - Flags: review?(luke)
> This patch initializes JSScript::savedCallerFun in CreateScript().  The only
> callsite where it can be true is the one in CompileScript(), and so I moved
> the code that inspects compileAndGo/isFunctionFrame() there, which removes
> the need for some subsequent assertions.

Hmm.  This patch preserves the existing behaviour, which is that if a script has savedCallerFun==true (which only happens for eval scripts), then every nested script within it will also have savedCallerFun==true.  But that suggests that script->objects[0] is set to the enclosing function, which seems entirely wrong.

If it is wrong, then it's easy to fix in this patch -- I can just change |parent->savedCallerFun| to |false| in BytecodeEmitter.cpp.  Without this patch (and those preceding it) it's a bit more work to fix.

AFAICT, that change would also allow Parser::callerFrame to be removed.
eserves the existing behaviour, which is that if a script
> has savedCallerFun==true (which only happens for eval scripts), then every
> nested script within it will also have savedCallerFun==true.  But that
> suggests that script->objects[0] is set to the enclosing function, which
> seems entirely wrong.

Just to clarify:  it's correct for the eval script, but seems wrong for scripts nested within the eval script.
Depends on: 761914
SharedContext::staticLevel is shared by Parser and BytecodeEmitter.  But
JSScript also has a staticLevel, so this patch moves
SharedContext::staticLevel into TreeContext, which the Parser uses, and the
JSScript's staticLevel is used by BytecodeEmitter.

This required replacing SharedContext* arguments to several functions with
TreeContext* parameters.
Attachment #631291 - Flags: review?(luke)
This patch initializes JSScript::globaObject in CreateScript().  This
allows BytecodeEmitter::needScriptGlobal to be removed.
Attachment #631292 - Flags: review?(luke)
Comment on attachment 631292 [details] [diff] [review]
Part 8: Initialize JSScript::globalObject in CreateScript()

Review of attachment 631292 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/BytecodeCompiler.cpp
@@ +105,5 @@
>      if (!tc.init())
>          return NULL;
>  
>      bool savedCallerFun = compileAndGo && callerFrame && callerFrame->isFunctionFrame();
> +    js::GlobalObject *globalObject = needScriptGlobal ? GetCurrentGlobal(cx) : NULL;

Why the 'js::'?
Comment on attachment 630854 [details] [diff] [review]
Part 0: Remove some dead things

Appetizers
Attachment #630854 - Flags: review?(luke) → review+
Comment on attachment 630855 [details] [diff] [review]
Part 1: Create JSScript before starting bytecode generation

Review of attachment 630855 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/BytecodeEmitter.cpp
@@ +2652,5 @@
>              return false;
>          bce->switchToMain();
>      }
>  
> +    if (!EmitTree(cx, bce, body) || Emit1(cx, bce, JSOP_STOP) < 0)

Since you are already breaking this up, could you give EmitTree and Emit1 their own separate 'if (!...) return false' statements?

::: js/src/jsscript.cpp
@@ +1229,5 @@
> +
> +    script->noScriptRval = true;
> +
> +    script->code[0] = JSOP_STOP;
> +    script->notes()[0] = SRC_NULL;

Could this reuse partiallyInit (for symmetry with fullyInitFromEmitter)?

@@ +1490,5 @@
>  
> +    if (data) {
> +        JS_POISON(data, 0xdb, computedSizeOfData());
> +        fop->free_(data);
> +    }

So one fundamental change is that the GC can observe allocated-but-not-fully-initialized scripts.  Mostly these functions check "if has X" before doing anything, so I don't see any immediate problems.  However, I would add a comment to the top of JSScript::markChildren and JSScript::finalize explaining that callers should expect a script in the state left by JSScript::CreateScript (allocate :) before being fully initialized.  Also, could you add a test (using gczeal or other) that causes GC/finalization of such a script so that way we don't find out about bugs (in this patch or future) via browser failures.

::: js/src/jsscript.h
@@ +575,5 @@
> +                       uint32_t nobjects, uint32_t nregexps, uint32_t ntrynotes, uint32_t nconsts,
> +                       uint16_t nClosedArgs, uint16_t nClosedVars, uint32_t nTypeSets,
> +                       JSVersion version);
> +    bool fullyInitTrivial(JSContext *cx, JSVersion version);  // inits a JSOP_STOP-only script
> +    bool fullyInitFromEmitter(JSContext *cx, js::BytecodeEmitter *bce);

JSScript::CreateScript sounds a bit redundant.  I would suggest 'JSScript::construct' but this usually implies that the resulting object has finished initialization which is not the case here.  How about JSScript::allocate?
Attachment #630855 - Flags: review?(luke) → review+
Attachment #630856 - Flags: review?(luke) → review+
Attachment #630857 - Flags: review?(luke) → review+
Comment on attachment 630858 [details] [diff] [review]
Part 4: Initialize JSScript::{principals,originPrincipals} in CreateScript()

Nice
Attachment #630858 - Flags: review?(luke) → review+
Attachment #630859 - Flags: review?(luke) → review+
Comment on attachment 631291 [details] [diff] [review]
Part 7: Initialize JSScript::staticLevel in CreateScript()

Review of attachment 631291 [details] [diff] [review]:
-----------------------------------------------------------------

On the subject of TreeContext, could you remove the #include "TreeContext-inl.h from BytecodeEmitter.cpp?

::: js/src/frontend/Parser.cpp
@@ +1064,4 @@
>  {
> +    /* Initialize non-default members of funtc->sc. */
> +    funtc->sc->blockidGen = outertc->sc->blockidGen;
> +    if (!GenerateBlockId(funtc->sc, funtc->sc->bodyid))

I don't see why this change is necessary; seems worse.

::: js/src/jsscript.cpp
@@ +585,5 @@
>          script = JSScript::CreateScript(cx, !!(scriptBits & (1 << SavedCallerFun)),
>                                          /* principals = */ NULL, /* originPrincipals = */ NULL,
>                                          /* compileAndGo = */ false,
> +                                        !!(scriptBits & (1 << NoScriptRval)), version_,
> +                                        /* staticLevel = */ 0);

As the height of JSScript::CreateScript increases, I think it would be more aesthetically pleasing to put a single parameter on each line rather.
Attachment #631291 - Flags: review?(luke) → review+
Attachment #631292 - Flags: review?(luke) → review+
Attachment #630860 - Flags: review?(luke) → review+
> > +    if (data) {
> > +        JS_POISON(data, 0xdb, computedSizeOfData());
> > +        fop->free_(data);
> > +    }
>
> So one fundamental change is that the GC can observe
> allocated-but-not-fully-initialized scripts.  Mostly these functions check
> "if has X" before doing anything, so I don't see any immediate problems. 
> However, I would add a comment to the top of JSScript::markChildren and
> JSScript::finalize explaining that callers should expect a script in the
> state left by JSScript::CreateScript (allocate :) before being fully
> initialized.

Good idea.


> Also, could you add a test (using gczeal or other) that causes
> GC/finalization of such a script so that way we don't find out about bugs
> (in this patch or future) via browser failures.

I'm not sure how to do this -- just do |gczeal(2,1)| and run a bunch of eval scripts in a loop and hope the case is hit?  Or is there a more targeted way to do it?  I can't work out how to deliberately do it from C++ using the JSAPI, let alone in JS code...

FWIW, if I add |JS_ASSERT(data)| to JSScript::finalize(), I get 10 failures during jit_test.py and 430 failures during jstests.py.  In other words, it becomes very crashy.  Knowing that, do you still think it's worth having a standalone test for this?
(In reply to Nicholas Nethercote [:njn] from comment #23)
> FWIW, if I add |JS_ASSERT(data)| to JSScript::finalize(), I get 10 failures
> during jit_test.py and 430 failures during jstests.py.

Ah good.  It makes sense that this would be frequent: any failed compilation should leave a script in such a state.  No additional test needed there.  What if you put the same assert in JSScript::markChildren; any hits?  If not, one easy way should be gczeal(2,1) and then make a bunch of let blocks (each one allocates a StaticBlockObject).
> What if you put the same assert in JSScript::markChildren; any hits?  If
> not, one easy way should be gczeal(2,1) and then make a bunch of let blocks
> (each one allocates a StaticBlockObject).

No hits in jit-tests or jstests, but it was easy to write a test case with let blocks that hit that case.  Thanks for the suggestion!
Blocks: 764714
> On the subject of TreeContext, could you remove the #include
> "TreeContext-inl.h from BytecodeEmitter.cpp?

Unfortunately not;  it defines the SharedContext constructor, which BytecodeEmitter.cpp uses.

There's definitely a case for renaming TreeContext{.cpp,.h,-inl.h} as SharedContext{.cpp,.h,-inl.h} and moving the TreeContext class into Parser.h (and also renaming TreeContext as ParseContext), but that's beyond the scope of this bug.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: