Last Comment Bug 747288 - Generate faster jitcode for DOM getters/setters
: Generate faster jitcode for DOM getters/setters
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: x86 Mac OS X
: -- normal (vote)
: ---
Assigned To: Eric Faust [:efaust]
:
Mentors:
Depends on: 747287 757932 766447
Blocks: 747285
  Show dependency treegraph
 
Reported: 2012-04-19 20:24 PDT by Boris Zbarsky [:bz]
Modified: 2012-08-08 17:05 PDT (History)
13 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Whole system patch for DOM getter specialization (84.29 KB, patch)
2012-06-28 12:09 PDT, Eric Faust [:efaust]
no flags Details | Diff | Splinter Review
Remove DOMPropertyDispatch() (17.66 KB, patch)
2012-06-28 21:54 PDT, Eric Faust [:efaust]
no flags Details | Diff | Splinter Review
Matching JSC (89.63 KB, patch)
2012-06-29 19:50 PDT, Eric Faust [:efaust]
no flags Details | Diff | Splinter Review
Patch (41.95 KB, patch)
2012-07-11 19:10 PDT, Eric Faust [:efaust]
sstangl: review+
nicolas.b.pierron: feedback+
Details | Diff | Splinter Review
Rename methodjit's jitInfo to avoid name collision (12.69 KB, patch)
2012-07-12 20:44 PDT, Eric Faust [:efaust]
bhackett1024: review+
Details | Diff | Splinter Review
Testcase that exercises this in a polymorphic setting (440 bytes, text/html)
2012-07-17 07:55 PDT, Boris Zbarsky [:bz]
no flags Details

Description Boris Zbarsky [:bz] 2012-04-19 20:24:47 PDT
This is bug 747285 part (C).  Might need to be further broken up.
Comment 1 Eric Faust [:efaust] 2012-06-28 04:16:25 PDT
First blush runs of called bottom halves yields the following runtimes on a bz microbenchmark that calls XMLHttpRequest.readyState 10,000,000 times.

JM + IM: ~125ms, though I've seen it as low as 110
JM + IM with just calling the original native getter: ~137ms
JM: ~140ms
IM: ~135ms
Safari: 90ms.

We've made up about half the distance.
Comment 2 Boris Zbarsky [:bz] 2012-06-28 09:23:45 PDT
Patches please?  Slightly curious as to how much time we spend in the jitcode vs the binding now....
Comment 3 Eric Faust [:efaust] 2012-06-28 12:09:46 PDT
Created attachment 637625 [details] [diff] [review]
Whole system patch for DOM getter specialization
Comment 4 Boris Zbarsky [:bz] 2012-06-28 14:30:07 PDT
With that patch, on the readyState testcase, I see numbers approximately like so (on x86-64):

  58% ion code.  I have no real visibility into this, but whatever we can do to reduce the
      amount of code here would presumably help.  This is about 6800 samples at 20us
      sample interval.
  19% self code in js::ion::DOMPropertyDispatch.  This is almost entirely function call
      overhead plus the branching that js::GetReservedSlot ends up having to do.  Inlining
      DOMPropertyDispatch into the jitcode, if at all possible would eliminate pretty much
      all of this, I think.  About 2200 samples here.
  14% self code in specialized_get_readyState.  This is basically 36% register gunk on
      entry, 12% doing the actual call into XHR (but not including time in XHR!), 15%
      doing INT_TO_JSVAL, 37% register gunk on exit.  Wish we could inline this too!  Can
      we win by using a better calling convention, perhaps?  About 1600 samples here.
   7% nsXMLHttpRequest::GetReadyState.  About 800 samples.

Total samples, 11800 or so.

For comparison, before this patch we had 13000 samples, of which about 6100 were in JM jitcode, 6000 in the DOM JSNative, and 800 in GetReadyState.  So we've cut down the time in the DOM code from 6800 samples to 2400 or so, but now have an extra 2200 in DOMPropertyDispatch and 2800 in jitcode.
Comment 5 Bill McCloskey (:billm) 2012-06-28 14:35:12 PDT
Boris, could you try changing the branch in GetReservedSlot to be an assertion that the slot being accessed is always a fixed slot? I think this might be true, and if it's not, we could probably make it true.
Comment 6 Boris Zbarsky [:bz] 2012-06-28 15:38:56 PDT
Reserved slot access on globals uses non-fixed slots because globals have so many reserved slots at the moment.
Comment 7 Eric Faust [:efaust] 2012-06-28 21:54:13 PDT
Created attachment 637781 [details] [diff] [review]
Remove DOMPropertyDispatch()

Apply on top of other patch.

Runtimes on microbenchmark now < 100ms. Running about 97ms or so with *and* without JM.
Comment 8 Eric Faust [:efaust] 2012-06-28 22:41:00 PDT
As for moving boxing the result to jitcode, that's probably also doable, if the bindings pass type information in the JSJitInfo, then the callsite can generate appropriate boxing and the bindings can just fill a pointer to the stack with the desired return value. I don't know at present how this interacts with the calls that need to JS_WrapValue().
Comment 9 Boris Zbarsky [:bz] 2012-06-29 00:05:00 PDT
Hmm.  So boxing is a bit of a pain in many cases (e.g. boxing up an nsINode involves some function calls to get the JSObject*, etc).  Doing it in the simple readyState case, where the return value is int, is easy, of course.
Comment 10 Eric Faust [:efaust] 2012-06-29 00:08:03 PDT
OK, but the current boxing method involves a function call to do the boxing. We can do even coarse boxing given the JSObject*, for example in the expensive case. I'ts unclear whether this is actually a win, but it would shave a function call.
Comment 11 Boris Zbarsky [:bz] 2012-06-29 00:10:12 PDT
> OK, but the current boxing method involves a function call to do the boxing.

Er, it does?  Where?  All the hot paths should be inlined, at least into the binding C++ method....
Comment 12 Eric Faust [:efaust] 2012-06-29 00:12:56 PDT
Indeed. It seems remarkably unlikely that the contents of jsval.h are not inlined, as you say.

Looking for broader win, is there a reasonable way we can try and slide the whole bottom half to jitcode somehow?
Comment 13 Boris Zbarsky [:bz] 2012-06-29 00:21:51 PDT
So in order...

Looking at a profile with that last patch, 72% of the time is jitcode.  19% of time is self time in the specialized getter (almost entirely call overhead stuff).  8% is in GetReadyState itself.  We're dow to 9600 samples, so about 20% faster than the previous patch, as expected.  Still a bit slower than Safari, but they have it somewhat easier because their props are not on the proto...  Still, it feels like we should be able to do just as well here.

As far as comment 12 goes the answer is "maybe".  In a simple case like this one (getter with a simple-to-convert return value), it's maybe doable.  If the return value were more complicated, or if this were a setter with a non-simple-to-convert argument, I think it's pretty hard to do in jitcode.

Even for the getter it's a bit hard, though, because you have to know how to call the underlying C++ getter, and we don't have a specific calling convention for those.  In this case it's a non-virtual out-of-line function, but it could have been virtual.  Or it could have been inlined.

In any case, the biggest win bar inlining all the C++ into jitcode at this point would be trying to figure out exactly what the jitcode is doing during those 72% of the time and seeing what we can do to reduce it.  Apart from that, starting to look pretty good!  ;)
Comment 14 Eric Faust [:efaust] 2012-06-29 18:22:27 PDT
Now with Bug 769853 closed, those patches applied to IonMonkey tip run the same speed as JSC, both with and without JM.
Comment 15 Eric Faust [:efaust] 2012-06-29 19:50:15 PDT
Created attachment 638074 [details] [diff] [review]
Matching JSC
Comment 16 Boris Zbarsky [:bz] 2012-06-29 22:16:38 PDT
Hmm.  I'm still seeing this be slower than Safari... maybe I need to rebuild with -fomit-frame-pointer?
Comment 17 Boris Zbarsky [:bz] 2012-06-29 22:17:14 PDT
Specifically, I see about 90 for safari, 120 for our 64-bit code, 130-140 for our 32-bit code.
Comment 18 Eric Faust [:efaust] 2012-06-30 00:17:04 PDT
I am building with --disable-debug --enable-optimize. If we are still really 140ms slow on 32 bit, then that's something I will have to look at. Do we have profiles?
Comment 19 Eric Faust [:efaust] 2012-06-30 01:26:07 PDT
Timings were generated on a x86-64 build done with clang, which is compiling -fomit-frame-pointer. Were the numbers you were seeing always higher than those I had been reporting?
Comment 20 Boris Zbarsky [:bz] 2012-07-07 23:54:34 PDT
> Were the numbers you were seeing always higher than those I had been reporting?

Yes, for my own builds.  About comparable for Safari.

I just tried a 64-bit build with gcc and -fomit-frame-pointer, and that gets me down to about 100 or so.  clang numbers tomorrow.
Comment 21 Boris Zbarsky [:bz] 2012-07-08 10:16:56 PDT
Same number with clang.
Comment 22 Eric Faust [:efaust] 2012-07-11 19:10:12 PDT
Created attachment 641315 [details] [diff] [review]
Patch
Comment 23 Boris Zbarsky [:bz] 2012-07-11 19:54:16 PDT
One note: Since this is using 0 explicitly, not the #define for the actual slot number, can you please add a comment next to that #define saying that if it ever changes this code needs to change?  Similar for anything else about the data structures involved that's baked into the JIT C++, though I _think_ this should be the only dependency.
Comment 24 Eric Faust [:efaust] 2012-07-11 20:15:07 PDT
(In reply to Boris Zbarsky (:bz) from comment #23)
> One note: Since this is using 0 explicitly, not the #define for the actual
> slot number, can you please add a comment next to that #define saying that
> if it ever changes this code needs to change?  Similar for anything else
> about the data structures involved that's baked into the JIT C++, though I
> _think_ this should be the only dependency.

Absolutely. I would have used the #define itself, but it hardly seems reasonable to move that from DOMJSClass.h to somewhere the engine could get at it.
Comment 25 :Ms2ger 2012-07-12 02:09:55 PDT
Comment on attachment 641315 [details] [diff] [review]
Patch

Review of attachment 641315 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/CodeGenerator.cpp
@@ +3762,5 @@
> +
> +    masm.checkStackAlignment();
> +
> +    /* Make Space for the outparam */
> +    masm.adjustStack((int)-sizeof(Value));

Should this be -int(sizeof(Value))?

@@ +3828,5 @@
> +    DebugOnly<uint32> initialStack = masm.framePushed();
> +
> +    masm.checkStackAlignment();
> +
> +    // Push thei argument. Rooting will happen at GC time.

thei?

::: js/src/ion/shared/IonFrames-shared.h
@@ +1,5 @@
> +/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*-
> + * vim: set ts=4 sw=4 et tw=79:
> + *
> + * ***** BEGIN LICENSE BLOCK *****
> + * Version: MPL 1.1/GPL 2.0/LGPL 2.1

MPL2
Comment 26 Eric Faust [:efaust] 2012-07-12 20:44:14 PDT
Created attachment 641718 [details] [diff] [review]
Rename methodjit's jitInfo to avoid name collision

Renames jitInfo in methodjit to mJITInfo, and makes associated method name changes.
Comment 27 Sean Stangl [:sstangl] 2012-07-16 12:57:45 PDT
Comment on attachment 641315 [details] [diff] [review]
Patch

Review of attachment 641315 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable, although I understand that a bunch of code is going to be changed/removed by an upcoming patch. r+ with a bunch of irritating little nits and some non-nits that look like they come from other patches.

::: js/src/ion/CodeGenerator.cpp
@@ +3785,5 @@
> +        return false;
> +
> +    masm.setupUnalignedABICall(4, JSContextReg);
> +
> +    masm.loadJSContext(JSContextReg);

It's technically unsafe to load into the register passed into setupUnalignedABICall() before the completion of callWithABI(). It doesn't look like we're changing the implementation any time soon, though, and it works currently.

::: js/src/ion/IonBuilder.cpp
@@ +4887,3 @@
>              // Otherwise try using the prototype.
>              curObj = typeObj->proto;
> +        } else {

The previous block guarantees that curObj is set. Instead of duplicating the thinkDOM-updating code, just let the code in this else block always execute after the "if (!curObj)" block and remove the !curObj thinkDOM-updating code.

@@ +4888,5 @@
>              curObj = typeObj->proto;
> +        } else {
> +            // Check the DOM-ness of the singleton.
> +            types::TypeObject *objType = curObj->getType(cx);
> +            thinkDOM = thinkDOM && !objType->hasAnyFlags(types::OBJECT_FLAG_NON_DOM);

non-nit: This should be OBJECT_FLAG_DOM. Besides being better from a readability standpoint to ask questions positively, the default (zeroed) state of a positive assertion is non-DOM, requiring no additional action from us.

@@ +4983,5 @@
>  
>              curType = obj->getType(cx);
>          }
>  
> +        // Freeze the types as being DOM objects if they are

nit: // Freeze any DOM objects.

@@ +4985,5 @@
>          }
>  
> +        // Freeze the types as being DOM objects if they are
> +        if (thinkDOM) {
> +            // Asking the question adds the freeze

nit: it's classy to end comments with a period.

Also, this behavior isn't particularly obvious. It's generally a bad sign if we feel the need to add a comment to explain the effect of a single function call. Would it be cleaner to have a FreezeObjectFlags() function, then call that from both here and HasObjectFlags()?

@@ +4986,5 @@
>  
> +        // Freeze the types as being DOM objects if they are
> +        if (thinkDOM) {
> +            // Asking the question adds the freeze
> +            bool wasntDOM = types::TypeSet::HasObjectFlags(cx, curType,

This either needs to be DebugOnly<bool>, or ignored. Or, instead, if we have a FreezeObjectFlags() function, then it should be safe to do:

JS_ASSERT(HasObjectFlags(cx, ..., OBJECT_FLAG_DOM));
FreezeObjectFlags(cx, ..., OBJECT_FLAG_DOM));

In opt builds it obviously works. In debug builds, if HasObjectFlags(), then the freeze isn't set, so FreezeObjectFlags() isn't redundant.

@@ +5020,5 @@
> +    return true;
> +}
> +
> +static bool
> +TestShouldDOMCall(JSContext *cx, bool isDOM, types::TypeSet *inTypes, HandleFunction func)

Would prefer removing the isDOM parameter and moving the logic to the callsites.

@@ +5024,5 @@
> +TestShouldDOMCall(JSContext *cx, bool isDOM, types::TypeSet *inTypes, HandleFunction func)
> +{
> +    if (!isDOM || !func->isNative() || !func->jitinfo())
> +        return false;
> +    // If all the DOM objects flowing through are legal with this

nit: vertical whitespace before comment. Period at end of comment.

@@ +5045,5 @@
> +            curType = curObj->getType(cx);
> +        }
> +
> +        JSObject *typeProto = curType->proto;
> +        RootedObject proto(cx, typeProto);

RootedObject proto(cx, curType->proto);

@@ +5225,5 @@
> +        RootedFunction setter(cx, commonSetter);
> +        if (TestShouldDOMCall(cx, isDOM, types, setter)) {
> +            MSetDOMProperty *set = MSetDOMProperty::New(setter->jitinfo()->op, obj, value);
> +            if (!set)
> +                return false;

Don't need failure check.

::: js/src/ion/LIR-Common.h
@@ +568,5 @@
> +        this->setOperand(0, ObjectReg);
> +    }
> +
> +  public:
> +

nit: remove whitespace after public:

::: js/src/ion/MIR.h
@@ +4534,5 @@
> +    {
> +        return new MSetDOMProperty(func, obj, val);
> +    }
> +
> +    const JSJitPropertyOp fun() {

nit: const before "{". Also object() and value().

@@ +4543,5 @@
> +        return getOperand(0);
> +    }
> +
> +    MDefinition *value()
> +    {

nit: { on previous line. Remove vertical whitespace between this and object().

@@ +4575,5 @@
> +    {
> +        return new MGetDOMProperty(func, obj, isInfallible);
> +    }
> +
> +    const JSJitPropertyOp fun() {

nit: const before "{". Also isInfallible() and object().

::: js/src/ion/shared/CodeGenerator-shared-inl.h
@@ +135,5 @@
>  #error "Unknown"
>  #endif
>  }
>  
> +static inline ValueOperand

can just be inline.

@@ +144,5 @@
> +#elif defined(JS_PUNBOX64)
> +    (void)type;
> +    return ValueOperand(payload);
> +#else
> +#error "Unknown"

nit: # error "Unknown"

::: js/src/ion/shared/IonFrames-shared.h
@@ +41,5 @@
> +#ifndef jsion_ionframes_shared_h__
> +#define jsion_ionframes_shared_h__
> +
> +#define ION_FRAME_DOMGETTER ((IonCode *)0x1)
> +#define ION_FRAME_DOMSETTER ((IonCode *)0x2)

Probably should include the file that defines IonCode.

@@ +43,5 @@
> +
> +#define ION_FRAME_DOMGETTER ((IonCode *)0x1)
> +#define ION_FRAME_DOMSETTER ((IonCode *)0x2)
> +
> +#endif

ultra-nit: #endif // jsion_ionframes_shared_h__
Comment 28 Boris Zbarsky [:bz] 2012-07-17 07:55:10 PDT
Created attachment 642953 [details]
Testcase that exercises this in a polymorphic setting

For what it's worth, in the presence of polymorphism this code is a _huge_ win over what we used to have.  10x or more faster once we hit the limit on IC size, for example.
Comment 29 Nicolas B. Pierron [:nbp] 2012-07-18 10:20:07 PDT
Comment on attachment 641315 [details] [diff] [review]
Patch

Review of attachment 641315 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/CodeGenerator.cpp
@@ +3763,5 @@
> +    masm.checkStackAlignment();
> +
> +    /* Make Space for the outparam */
> +    masm.adjustStack((int)-sizeof(Value));
> +    masm.movePtr(StackPointer, ValueReg);

These 2 instructions can be done in one on ARM.  It might be worth adding

  masm.stackAlloc(sizeof(Value), ValueReg);

for this.

Have a look at retn implementation, except that you are looking for a PreIndex and not a PostIndex.

@@ +3770,5 @@
> +
> +    ValueOperand temp = GetTempValue(PrivateReg, JSContextReg);
> +    // GetReservedSlot(obj, 0).toPrivate()
> +    masm.loadValue(Address(ObjectReg, JSObject::getFixedSlotOffset(0)), temp);
> +    masm.unboxPrivate(temp, PrivateReg);

You can implement loadPrivate instead of unboxing, so you won't have to load extra memory in an unused register.

@@ +3783,5 @@
> +
> +    if (!markSafepointAt(safepointOffset, ins))
> +        return false;
> +
> +    masm.setupUnalignedABICall(4, JSContextReg);

Why unaligned ?  You have a masm.checkStackAlignment(), so we should be able to know how the stack is aligned at compile time and from what I read I guess it is just unaligned by 4.

So you can create a setupMissAlignedABICall which will save 2 instructions.

@@ +3868,5 @@
> +    masm.jump(&success);
> +
> +    {
> +        masm.bind(&exception);
> +        masm.handleException();

It sounds like we should share the handleExceptions between all calls and move them to a shared outOfLine path.  This will save us the success jump. I will open a follow-up bug for that.

@@ +3871,5 @@
> +        masm.bind(&exception);
> +        masm.handleException();
> +    }
> +    masm.bind(&success);
> +    masm.adjustStack(IonDOMExitFrameLayout::Size());

I would prefer a leaveDOMExitFrame for this one.  Such as we can have some sort of scoping of the DOMExitFrame.

::: js/src/ion/IonFrames.cpp
@@ +524,5 @@
>      }
>  
> +    if (frame.isDOMExit()) {
> +        IonDOMExitFrameLayout *dom = frame.exitFrame()->DOMExit();
> +        gc::MarkObjectRoot(trc, dom->thisObjAddress(), "ion-dom-args");

Sounds good for getters and setters.

::: js/src/ion/IonMacroAssembler.h
@@ +462,5 @@
>      }
>  
> +    void enterFakeDOMFrame(void *codeVal) {
> +        linkExitFrame();
> +        Push(ImmWord(uintptr_t(codeVal)));

JS_ASSERT(codeVal == ION_FRAME_DOMGETTER || codeVal == ION_FRAME_DOMSETTER);

::: js/src/ion/Lowering.cpp
@@ +1783,5 @@
> +
> +    LSetDOMProperty *lir = new LSetDOMProperty(tempFixed(CallTempReg0),
> +                                               useFixed(ins->object(), CallTempReg1),
> +                                               tempFixed(CallTempReg2),
> +                                               tempFixed(CallTempReg3));

You might save at least 4 moves by using IntArgReg[0-3] on ARM and x64.

@@ +1796,5 @@
> +{
> +    LGetDOMProperty *lir = new LGetDOMProperty(tempFixed(CallTempReg0),
> +                                               useFixed(ins->object(), CallTempReg1),
> +                                               tempFixed(CallTempReg2),
> +                                               tempFixed(CallTempReg3));

same here.
Comment 30 Ryan VanderMeulen [:RyanVM] 2012-07-28 18:35:08 PDT
https://hg.mozilla.org/mozilla-central/rev/b921323fa99e
Comment 31 Eric Faust [:efaust] 2012-08-08 17:05:28 PDT
https://hg.mozilla.org/projects/ionmonkey/rev/30894762f1fc

Note You need to log in before you can comment on or make changes to this bug.