Closed Bug 586533 Opened 11 years ago Closed 11 years ago

TM: Flag rarely used JSStackFrame members

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

Details

Attachments

(6 files, 5 obsolete files)

13.48 KB, patch
Details | Diff | Splinter Review
37.76 KB, patch
luke
: review+
Details | Diff | Splinter Review
75.14 KB, patch
luke
: review+
Details | Diff | Splinter Review
15.30 KB, patch
luke
: review+
Details | Diff | Splinter Review
168.28 KB, patch
luke
: review+
Details | Diff | Splinter Review
49.06 KB, patch
luke
: review+
Details | Diff | Splinter Review
Attached patch Patch flagging callobj, argsobj (obsolete) — Splinter Review
It would be nice to flag JSStackFrame members rarely used in JIT code, stubs and natives so that they can be left uninitialized during frame construction.  Ones I see (naively, probably wrong on some of these) include imacpc (covered by bug 586358), callobj, argsobj, script, function, annotation, scopeChain, blockChain, hookData, maybe thisv and rval.

Attached patch covers callobj and argsobj, passes trace-tests and a couple jstests failures.  I doubt this will improve benchmarks for TM much; JM constructs more frames, and the goal is to simplify frame construction enough that work can start on inlining frame construction and things it is blocking (scripted call ICs), without waiting for frame evisceration to finish.

All these writes close together in frame construction are expensive; microbenchmarking C programs (large grain of salt) for 1.2 million calls times 10 writes is 15-20ms on my machine.
Attached patch patch fixing regressions (obsolete) — Splinter Review
This patch gets the same failures for me as clean TM.
Attachment #465056 - Attachment is obsolete: true
Attachment #465099 - Flags: review?(lw)
Out of curiosity, does this have a measurable effect on TM/JM?

As I was saying earlier today, I have already made a good number of these changes in my WIP argv/argc removal patch, so I'd like to make a few syntactic requests:
- Spelling out ArgumentsObject and CallObject is a is a bit long.  Perhaps just getArgsObj/getCallObj()?
- Could getCallObj()/getArgsObj() return a JSObject&, since its non-null?  We're starting to do this more.
- Also, for returning a nullable object, we tend to use 'maybe'.  So perhaps getArgumentsObjectOptional could just be maybeArgsObj, and the same for maybeCallObj.

More substantial comments:
- I don't think this syncForTrace() is the right strategy to deal with argsobj/callobj and the tracer.  It looks like callobj is only accessed in a single case in setCallProp that can be changed to interpreted JSStackFrame::flags appropriately, and argsobj isn't accessed from trace at all, but rather just during compilation and when entering/leaving trace.  (We can talk about these in more detail if you like.)  But basically, I don't think it should be necessary to introduce this concept of 'syncing' a stack frame or littering calls to it in various places.
- in js_NewGenerator, I think newfp->flags = ... will clobber the flag info set by setCallObject above.  I think the call/args obj stealing can be moved below setting the flags.  With this there is no need to clearCallObject() in the else branch.
bhackett:  cdleary added assertions like this in his imacpc patch:

  +    JS_ASSERT(!fp->hasIMacroPC());

everywhere that a "fp->imacpc = NULL;" assignment was removed.  (Well, the assert had to be moved after fp->flags was assigned.)  I think you should do the same for callobj and argsobj.
Here's a patch doing this for annotations.  I wrote the patch before I saw this bug, it'll conflict horribly, sorry.  It'll also conflict with cdleary's imacpc patch (which is against TM instead of JM, just to confuse things further).  I tried to rebase cdleary's patch against JM and apply my patch on top of it, but had trace-test failures with cdleary's patch, I don't know why.

But at least it shows that you can do this easily for annotations.  I measured the change, it's a 0.1% instruction reduction for JM on SS.

To avoid us all stepping on each other's toes, one person should do all of these changes, and to a single repo (cdleary's patch hasn't landed yet, so it could still be rebased to JM if necessary).  bhackett, cdleary, if either of you want to take it that's fine, or I can do it if you have other useful stuff to work on.  I'm about to sign off for today, let me know what you decide.
Why shouldn't this stuff land on tracemonkey first, to minimize JM/TM diff to the changes essential to JM?

/be
(In reply to comment #2)
> Out of curiosity, does this have a measurable effect on TM/JM?
> 
> As I was saying earlier today, I have already made a good number of these
> changes in my WIP argv/argc removal patch, so I'd like to make a few syntactic
> requests:
> - Spelling out ArgumentsObject and CallObject is a is a bit long.  Perhaps just
> getArgsObj/getCallObj()?
> - Could getCallObj()/getArgsObj() return a JSObject&, since its non-null? 
> We're starting to do this more.
> - Also, for returning a nullable object, we tend to use 'maybe'.  So perhaps
> getArgumentsObjectOptional could just be maybeArgsObj, and the same for
> maybeCallObj.
> 
> More substantial comments:
> - I don't think this syncForTrace() is the right strategy to deal with
> argsobj/callobj and the tracer.  It looks like callobj is only accessed in a
> single case in setCallProp that can be changed to interpreted
> JSStackFrame::flags appropriately, and argsobj isn't accessed from trace at
> all, but rather just during compilation and when entering/leaving trace.  (We
> can talk about these in more detail if you like.)  But basically, I don't think
> it should be necessary to introduce this concept of 'syncing' a stack frame or
> littering calls to it in various places.
> - in js_NewGenerator, I think newfp->flags = ... will clobber the flag info set
> by setCallObject above.  I think the call/args obj stealing can be moved below
> setting the flags.  With this there is no need to clearCallObject() in the else
> branch.

I'll make the API changes you suggest.  I think the syncForTrace is pretty nasty too; it's in there because if it's not then the tracer breaks, and I don't know how to fix the tracer, even which path leads to the errors (random assertion failures in jsrecursion.cpp, corruption elsewhere).  The tracer is taking the address of argsobj for specific frames, but it also seems to generate code which accesses that offset for an unknown frame, and even with TMFLAGS=full I don't know what happens to these pointers.

Alternative: make patches that just hide the fields in JSStackFrame behind an interface (get/setCallObj etc.) but doesn't change the semantics of the system at all.  Land on TM.  Then, either in TM or JM, make local modifications to jsinterp.h and code constructing frames that gets the intended behavior (and sleeks down frame construction), for experimentation/measuring.

Would this conflict with your argc/argv patch?  This will be quick to put together and I'm happy to do it.
This patch flags blockChain and scopeChain, passes trace-tests / jstests.
This patch hides callobj and argsobj behind an interface, and does not change the semantics of accessing them.
Attachment #465099 - Attachment is obsolete: true
Attachment #465375 - Flags: review?(lw)
Attachment #465099 - Flags: review?(lw)
Comment on attachment 465375 [details] [diff] [review]
patch for callobj/argsobj preserving semantics

Awesome, thanks!
Attachment #465375 - Flags: review?(lw) → review+
This patch hides scopeChain and blockChain behind an interface, and does not change the semantics of accessing them.
Attachment #465375 - Attachment is obsolete: true
Attachment #465410 - Flags: review?(lw)
Attachment #465375 - Attachment is obsolete: false
Attachment #465347 - Attachment is obsolete: true
(In reply to comment #5)
> Why shouldn't this stuff land on tracemonkey first, to minimize JM/TM diff to
> the changes essential to JM?

My motivation in doing the annotation patch against JM was to measure the effect.  But landing on TM first is a good idea.
bhackett, I'll take your addition of new patches as indicating that you're happy to do all the work for this bug.  Please reply if I've misinterpreted.
Yes, I'm happy to work on this bug.  My biggest interest right now is scripted calls, the rough plan is:

- Abstract away JSStackFrame internals with access methods (TM).
- Replace with new internals that initialize most fields lazily (TM or JM).
- Get fast path and ICs working for scripted calls/new with lazy stack frame.  This should be pretty close to what is needed once stack frame evisceration finishes.
(In reply to comment #14)
That all sounds good.  How does the hoisting-ensureSpace check and merging-with-inlineCallCount-check fit in to this plan?
Attachment #465410 - Flags: review?(lw) → review+
(In reply to comment #15)
> (In reply to comment #14)
> That all sounds good.  How does the hoisting-ensureSpace check and
> merging-with-inlineCallCount-check fit in to this plan?

Those are in the fairly open-ended third part, optimizations to do once the call fast path is in place and the big wad of frame writes is (mostly) removed.
this was checked in, but I backed it out, since it didn't compile on any platform. kind of a dealbreaker.
This builds on OSX x64 and passes trace-tests, jstests, xpcshell-tests, and jsreftests.
Attachment #465410 - Attachment is obsolete: true
Attachment #465641 - Flags: review?(lw)
This patch hides the JSStackFrame members annotation, debugHook and callerVersion behind an interface, and does not change the semantics of accessing them.
Attachment #465664 - Flags: review?(lw)
This patch hides the JSStackFrame members fun, script, thisv and rval behind an interface, and does not change the semantics of accessing them.
Attachment #465665 - Flags: review?(lw)
Comment on attachment 465641 [details] [diff] [review]
scopeChain/blockChain build break fix

>@@ -5666,24 +5666,17 @@ SynthesizeFrame(JSContext* cx, const Fra
>-    fp->blockChain = fi.block;
>-    fp->blockChain = fi.block;
>-#ifdef DEBUG
>-    if (fi.block != fp->blockChain) {
>-        for (JSObject* obj = fi.block; obj != fp->blockChain; obj = obj->getParent())
>-            JS_ASSERT(obj);
>-    }
>-#endif

lol.
Attachment #465641 - Flags: review?(lw) → review+
Attachment #465664 - Flags: review?(lw) → review+
Comment on attachment 465665 [details] [diff] [review]
patch for fun/script/thisv/rval preserving semantics

Phew!  This will also make the removal patch of these members smaller, so thanks.

>-JS_STATIC_ASSERT(offsetof(JSStackFrame, rval) % sizeof(Value) == 0);
>-JS_STATIC_ASSERT(offsetof(JSStackFrame, thisv) % sizeof(Value) == 0);
>+// FIXME
>+//JS_STATIC_ASSERT(offsetof(JSStackFrame, rval) % sizeof(Value) == 0);
>+//JS_STATIC_ASSERT(offsetof(JSStackFrame, thisv) % sizeof(Value) == 0);

It would be nice to keep these on as things get shuffled around in the stack frame.  JS_STATIC_ASSERT/offsetof have all sorts of annoying restrictions when used in class context; one thing I've found that works is:

struct JSStackFrame {
 ...
 void staticAsserts() {
   JS_STATIC_ASSERT(offsetof(JSStackFrame, rval) % sizeof(Value) == 0);
   ...
 }
};

r+ with that and assuming benchmarks don't show the compiler doing anything silly like not inlining.
Attachment #465665 - Flags: review?(lw) → review+
(In reply to comment #18)
> Created attachment 465641 [details] [diff] [review]
> scopeChain/blockChain build break fix
> 
> This builds on OSX x64 and passes trace-tests, jstests, xpcshell-tests, and
> jsreftests.

http://hg.mozilla.org/tracemonkey/rev/b630d28840ea
(In reply to comment #19)
> Created attachment 465664 [details] [diff] [review]
> patch for annotation/debugHook/callerVersion preserving semantics
> 
> This patch hides the JSStackFrame members annotation, debugHook and
> callerVersion behind an interface, and does not change the semantics of
> accessing them.

http://hg.mozilla.org/tracemonkey/rev/0bc3ab72c26b
(In reply to comment #20)
> Created attachment 465665 [details] [diff] [review]
> patch for fun/script/thisv/rval preserving semantics
> 
> This patch hides the JSStackFrame members fun, script, thisv and rval behind an
> interface, and does not change the semantics of accessing them.

http://hg.mozilla.org/tracemonkey/rev/13e72485d286
http://hg.mozilla.org/mozilla-central/rev/d8e6eff0f77e
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
One more lazifiable field.  savedPC will also be lazified for JM (compute using ncode), but it's pretty much private to JSStackFrame/JSContext as is.
Assignee: general → bhackett1024
Status: RESOLVED → REOPENED
Attachment #467439 - Flags: review?(lw)
Resolution: FIXED → ---
Comment on attachment 467439 [details] [diff] [review]
patch for argc preserving semantics

The argv-removal patch has:

  Value *formalArgs();
  Value *actualArgs();

(which are disjoint sets in the case of argc > fun->nargs).  To parallel those nice, could you s/getArgCount/numActualArgs/ and s/getFormalCount/numFormalArgs/ ?  (The XCount to numX change is just a personal preference to hear up front, when reading the identifier in some bigger expression, that this is a number, and not the array.)
Fixes for comment 28.
Attachment #467439 - Attachment is obsolete: true
Attachment #467449 - Flags: review?(lw)
Attachment #467439 - Flags: review?(lw)
Comment on attachment 467449 [details] [diff] [review]
revised argc patch

Thanks!  Also, I forgot, for consistency: s/setArgCount/setNumActualArgs/
Attachment #467449 - Flags: review?(lw) → review+
Style guide should say we prefer noun phrases (empty not isEmpty) for infallible getters and predicates, but there's obviously a mix in the current source. That's another reason to prefer numFormalArgs to getArgCount.

The formal vs. actual distinction is crisp, better still than trying to use param for formal and arg for actual.

/be
Random vaguely-related comment.  If you're interested in measuring
which offsets of which objects see a lot of action and which don't
(iow, finding hot vs cold fields) by observing jsengine as it runs,
the DHAT tool in bug 576861 might be a good starting point.  It can't
currently measure exactly that, but it sounds to me feasible to extend
it to do so.
A related property is which fields have far more writes than reads, which makes them good candidates for lazifying.  It's not too hard to do this for specific objects (JSStackFrame, JSContext), but an analysis that could extend this to the rest of JS and the browser would be great.
(In reply to comment #32)
Julian, that sounds like an awesome tool; have you considered applying it to the fields of JSContext?
(In reply to comment #34)
> have you considered applying it to the fields of JSContext?

At the moment it can only map memory accesses to specific blocks,
but it doesn't build up a histogram, for each allocation stack,
of the number of times each offset from the start of the block
is accessed.  If you see what I mean.  And that histogram is
needed to find what you want, viz, offsets which are never or
very rarely accessed.  Probably doable in a day's hacking tho.
(In reply to comment #35)
> Probably doable in a day's hacking tho.

That could really pay off: JSContext is large and I would think only a small subset of the fields are hot.

A related valgrind tool that keeps coming up in discussions is one that counts unaligned loads/stores.  We've had at least 3 cases of serious perf regressions from this.  I wouldn't be surprised if such a tool uncovered a few more.  Also, we could use it to track the overall number and catch such regressions automatically.
See also the static analysis (complementary, want both) at bug 492185. FYI,

/be
Status: REOPENED → ASSIGNED
(In reply to comment #36)
>
> A related valgrind tool that keeps coming up in discussions is one that counts
> unaligned loads/stores.  We've had at least 3 cases of serious perf regressions
> from this.  I wouldn't be surprised if such a tool uncovered a few more.  Also,
> we could use it to track the overall number and catch such regressions
> automatically.

Here's one I prepared earlier: bug 476122.  I haven't run it since I wrote it 18 months ago, but there's a good chance it'll work out of the box.
> (In reply to comment #35)
> > Probably doable in a day's hacking tho.
> 
> That could really pay off: JSContext is large and I would think only a small
> subset of the fields are hot.

Working on it now.

> (In reply to comment #37)
> See also the static analysis (complementary, want both) at bug 492185. FYI,

Will need some kind of static support for full automation, since at
best the tool can produce a histogram of number of memory references
per-block-offset for each allocation stack.  But it doesn't know the
C++ type of any (heap allocated) object since that information is not
present in Dwarf3.  Maybe Taras has some magic trick that can help.
In the short term we can work around this merely by peering at Fx
sources to see the type of object allocated at each allocation point.
http://hg.mozilla.org/mozilla-central/rev/76ede469d9a3
Status: ASSIGNED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
No longer blocks: 557378
This can be marked FIXED, right?
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.