Closed Bug 747288 Opened 12 years ago Closed 12 years ago

Generate faster jitcode for DOM getters/setters

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: efaust)

References

Details

Attachments

(4 files, 2 obsolete files)

This is bug 747285 part (C).  Might need to be further broken up.
Assignee: general → nicolas.b.pierron
Depends on: 757932
Depends on: 766447
First blush runs of called bottom halves yields the following runtimes on a bz microbenchmark that calls XMLHttpRequest.readyState 10,000,000 times.

JM + IM: ~125ms, though I've seen it as low as 110
JM + IM with just calling the original native getter: ~137ms
JM: ~140ms
IM: ~135ms
Safari: 90ms.

We've made up about half the distance.
Patches please?  Slightly curious as to how much time we spend in the jitcode vs the binding now....
Assignee: nicolas.b.pierron → efaust
Status: NEW → ASSIGNED
With that patch, on the readyState testcase, I see numbers approximately like so (on x86-64):

  58% ion code.  I have no real visibility into this, but whatever we can do to reduce the
      amount of code here would presumably help.  This is about 6800 samples at 20us
      sample interval.
  19% self code in js::ion::DOMPropertyDispatch.  This is almost entirely function call
      overhead plus the branching that js::GetReservedSlot ends up having to do.  Inlining
      DOMPropertyDispatch into the jitcode, if at all possible would eliminate pretty much
      all of this, I think.  About 2200 samples here.
  14% self code in specialized_get_readyState.  This is basically 36% register gunk on
      entry, 12% doing the actual call into XHR (but not including time in XHR!), 15%
      doing INT_TO_JSVAL, 37% register gunk on exit.  Wish we could inline this too!  Can
      we win by using a better calling convention, perhaps?  About 1600 samples here.
   7% nsXMLHttpRequest::GetReadyState.  About 800 samples.

Total samples, 11800 or so.

For comparison, before this patch we had 13000 samples, of which about 6100 were in JM jitcode, 6000 in the DOM JSNative, and 800 in GetReadyState.  So we've cut down the time in the DOM code from 6800 samples to 2400 or so, but now have an extra 2200 in DOMPropertyDispatch and 2800 in jitcode.
Boris, could you try changing the branch in GetReservedSlot to be an assertion that the slot being accessed is always a fixed slot? I think this might be true, and if it's not, we could probably make it true.
Reserved slot access on globals uses non-fixed slots because globals have so many reserved slots at the moment.
Attached patch Remove DOMPropertyDispatch() (obsolete) — Splinter Review
Apply on top of other patch.

Runtimes on microbenchmark now < 100ms. Running about 97ms or so with *and* without JM.
As for moving boxing the result to jitcode, that's probably also doable, if the bindings pass type information in the JSJitInfo, then the callsite can generate appropriate boxing and the bindings can just fill a pointer to the stack with the desired return value. I don't know at present how this interacts with the calls that need to JS_WrapValue().
Hmm.  So boxing is a bit of a pain in many cases (e.g. boxing up an nsINode involves some function calls to get the JSObject*, etc).  Doing it in the simple readyState case, where the return value is int, is easy, of course.
OK, but the current boxing method involves a function call to do the boxing. We can do even coarse boxing given the JSObject*, for example in the expensive case. I'ts unclear whether this is actually a win, but it would shave a function call.
> OK, but the current boxing method involves a function call to do the boxing.

Er, it does?  Where?  All the hot paths should be inlined, at least into the binding C++ method....
Indeed. It seems remarkably unlikely that the contents of jsval.h are not inlined, as you say.

Looking for broader win, is there a reasonable way we can try and slide the whole bottom half to jitcode somehow?
So in order...

Looking at a profile with that last patch, 72% of the time is jitcode.  19% of time is self time in the specialized getter (almost entirely call overhead stuff).  8% is in GetReadyState itself.  We're dow to 9600 samples, so about 20% faster than the previous patch, as expected.  Still a bit slower than Safari, but they have it somewhat easier because their props are not on the proto...  Still, it feels like we should be able to do just as well here.

As far as comment 12 goes the answer is "maybe".  In a simple case like this one (getter with a simple-to-convert return value), it's maybe doable.  If the return value were more complicated, or if this were a setter with a non-simple-to-convert argument, I think it's pretty hard to do in jitcode.

Even for the getter it's a bit hard, though, because you have to know how to call the underlying C++ getter, and we don't have a specific calling convention for those.  In this case it's a non-virtual out-of-line function, but it could have been virtual.  Or it could have been inlined.

In any case, the biggest win bar inlining all the C++ into jitcode at this point would be trying to figure out exactly what the jitcode is doing during those 72% of the time and seeing what we can do to reduce it.  Apart from that, starting to look pretty good!  ;)
Now with Bug 769853 closed, those patches applied to IonMonkey tip run the same speed as JSC, both with and without JM.
Attached patch Matching JSCSplinter Review
Attachment #637625 - Attachment is obsolete: true
Attachment #637781 - Attachment is obsolete: true
Hmm.  I'm still seeing this be slower than Safari... maybe I need to rebuild with -fomit-frame-pointer?
Specifically, I see about 90 for safari, 120 for our 64-bit code, 130-140 for our 32-bit code.
I am building with --disable-debug --enable-optimize. If we are still really 140ms slow on 32 bit, then that's something I will have to look at. Do we have profiles?
Timings were generated on a x86-64 build done with clang, which is compiling -fomit-frame-pointer. Were the numbers you were seeing always higher than those I had been reporting?
> Were the numbers you were seeing always higher than those I had been reporting?

Yes, for my own builds.  About comparable for Safari.

I just tried a 64-bit build with gcc and -fomit-frame-pointer, and that gets me down to about 100 or so.  clang numbers tomorrow.
Same number with clang.
Attached patch PatchSplinter Review
Attachment #641315 - Flags: review?(sstangl)
One note: Since this is using 0 explicitly, not the #define for the actual slot number, can you please add a comment next to that #define saying that if it ever changes this code needs to change?  Similar for anything else about the data structures involved that's baked into the JIT C++, though I _think_ this should be the only dependency.
(In reply to Boris Zbarsky (:bz) from comment #23)
> One note: Since this is using 0 explicitly, not the #define for the actual
> slot number, can you please add a comment next to that #define saying that
> if it ever changes this code needs to change?  Similar for anything else
> about the data structures involved that's baked into the JIT C++, though I
> _think_ this should be the only dependency.

Absolutely. I would have used the #define itself, but it hardly seems reasonable to move that from DOMJSClass.h to somewhere the engine could get at it.
Comment on attachment 641315 [details] [diff] [review]
Patch

Review of attachment 641315 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/CodeGenerator.cpp
@@ +3762,5 @@
> +
> +    masm.checkStackAlignment();
> +
> +    /* Make Space for the outparam */
> +    masm.adjustStack((int)-sizeof(Value));

Should this be -int(sizeof(Value))?

@@ +3828,5 @@
> +    DebugOnly<uint32> initialStack = masm.framePushed();
> +
> +    masm.checkStackAlignment();
> +
> +    // Push thei argument. Rooting will happen at GC time.

thei?

::: js/src/ion/shared/IonFrames-shared.h
@@ +1,5 @@
> +/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*-
> + * vim: set ts=4 sw=4 et tw=79:
> + *
> + * ***** BEGIN LICENSE BLOCK *****
> + * Version: MPL 1.1/GPL 2.0/LGPL 2.1

MPL2
Renames jitInfo in methodjit to mJITInfo, and makes associated method name changes.
Attachment #641718 - Flags: review?(sstangl)
Attachment #641718 - Flags: review?(sstangl) → review?(dvander)
Attachment #641718 - Flags: review?(dvander) → review?(bhackett1024)
Comment on attachment 641315 [details] [diff] [review]
Patch

Review of attachment 641315 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable, although I understand that a bunch of code is going to be changed/removed by an upcoming patch. r+ with a bunch of irritating little nits and some non-nits that look like they come from other patches.

::: js/src/ion/CodeGenerator.cpp
@@ +3785,5 @@
> +        return false;
> +
> +    masm.setupUnalignedABICall(4, JSContextReg);
> +
> +    masm.loadJSContext(JSContextReg);

It's technically unsafe to load into the register passed into setupUnalignedABICall() before the completion of callWithABI(). It doesn't look like we're changing the implementation any time soon, though, and it works currently.

::: js/src/ion/IonBuilder.cpp
@@ +4887,3 @@
>              // Otherwise try using the prototype.
>              curObj = typeObj->proto;
> +        } else {

The previous block guarantees that curObj is set. Instead of duplicating the thinkDOM-updating code, just let the code in this else block always execute after the "if (!curObj)" block and remove the !curObj thinkDOM-updating code.

@@ +4888,5 @@
>              curObj = typeObj->proto;
> +        } else {
> +            // Check the DOM-ness of the singleton.
> +            types::TypeObject *objType = curObj->getType(cx);
> +            thinkDOM = thinkDOM && !objType->hasAnyFlags(types::OBJECT_FLAG_NON_DOM);

non-nit: This should be OBJECT_FLAG_DOM. Besides being better from a readability standpoint to ask questions positively, the default (zeroed) state of a positive assertion is non-DOM, requiring no additional action from us.

@@ +4983,5 @@
>  
>              curType = obj->getType(cx);
>          }
>  
> +        // Freeze the types as being DOM objects if they are

nit: // Freeze any DOM objects.

@@ +4985,5 @@
>          }
>  
> +        // Freeze the types as being DOM objects if they are
> +        if (thinkDOM) {
> +            // Asking the question adds the freeze

nit: it's classy to end comments with a period.

Also, this behavior isn't particularly obvious. It's generally a bad sign if we feel the need to add a comment to explain the effect of a single function call. Would it be cleaner to have a FreezeObjectFlags() function, then call that from both here and HasObjectFlags()?

@@ +4986,5 @@
>  
> +        // Freeze the types as being DOM objects if they are
> +        if (thinkDOM) {
> +            // Asking the question adds the freeze
> +            bool wasntDOM = types::TypeSet::HasObjectFlags(cx, curType,

This either needs to be DebugOnly<bool>, or ignored. Or, instead, if we have a FreezeObjectFlags() function, then it should be safe to do:

JS_ASSERT(HasObjectFlags(cx, ..., OBJECT_FLAG_DOM));
FreezeObjectFlags(cx, ..., OBJECT_FLAG_DOM));

In opt builds it obviously works. In debug builds, if HasObjectFlags(), then the freeze isn't set, so FreezeObjectFlags() isn't redundant.

@@ +5020,5 @@
> +    return true;
> +}
> +
> +static bool
> +TestShouldDOMCall(JSContext *cx, bool isDOM, types::TypeSet *inTypes, HandleFunction func)

Would prefer removing the isDOM parameter and moving the logic to the callsites.

@@ +5024,5 @@
> +TestShouldDOMCall(JSContext *cx, bool isDOM, types::TypeSet *inTypes, HandleFunction func)
> +{
> +    if (!isDOM || !func->isNative() || !func->jitinfo())
> +        return false;
> +    // If all the DOM objects flowing through are legal with this

nit: vertical whitespace before comment. Period at end of comment.

@@ +5045,5 @@
> +            curType = curObj->getType(cx);
> +        }
> +
> +        JSObject *typeProto = curType->proto;
> +        RootedObject proto(cx, typeProto);

RootedObject proto(cx, curType->proto);

@@ +5225,5 @@
> +        RootedFunction setter(cx, commonSetter);
> +        if (TestShouldDOMCall(cx, isDOM, types, setter)) {
> +            MSetDOMProperty *set = MSetDOMProperty::New(setter->jitinfo()->op, obj, value);
> +            if (!set)
> +                return false;

Don't need failure check.

::: js/src/ion/LIR-Common.h
@@ +568,5 @@
> +        this->setOperand(0, ObjectReg);
> +    }
> +
> +  public:
> +

nit: remove whitespace after public:

::: js/src/ion/MIR.h
@@ +4534,5 @@
> +    {
> +        return new MSetDOMProperty(func, obj, val);
> +    }
> +
> +    const JSJitPropertyOp fun() {

nit: const before "{". Also object() and value().

@@ +4543,5 @@
> +        return getOperand(0);
> +    }
> +
> +    MDefinition *value()
> +    {

nit: { on previous line. Remove vertical whitespace between this and object().

@@ +4575,5 @@
> +    {
> +        return new MGetDOMProperty(func, obj, isInfallible);
> +    }
> +
> +    const JSJitPropertyOp fun() {

nit: const before "{". Also isInfallible() and object().

::: js/src/ion/shared/CodeGenerator-shared-inl.h
@@ +135,5 @@
>  #error "Unknown"
>  #endif
>  }
>  
> +static inline ValueOperand

can just be inline.

@@ +144,5 @@
> +#elif defined(JS_PUNBOX64)
> +    (void)type;
> +    return ValueOperand(payload);
> +#else
> +#error "Unknown"

nit: # error "Unknown"

::: js/src/ion/shared/IonFrames-shared.h
@@ +41,5 @@
> +#ifndef jsion_ionframes_shared_h__
> +#define jsion_ionframes_shared_h__
> +
> +#define ION_FRAME_DOMGETTER ((IonCode *)0x1)
> +#define ION_FRAME_DOMSETTER ((IonCode *)0x2)

Probably should include the file that defines IonCode.

@@ +43,5 @@
> +
> +#define ION_FRAME_DOMGETTER ((IonCode *)0x1)
> +#define ION_FRAME_DOMSETTER ((IonCode *)0x2)
> +
> +#endif

ultra-nit: #endif // jsion_ionframes_shared_h__
Attachment #641315 - Flags: review?(sstangl) → review+
For what it's worth, in the presence of polymorphism this code is a _huge_ win over what we used to have.  10x or more faster once we hit the limit on IC size, for example.
Attachment #641718 - Flags: review?(bhackett1024) → review+
Attachment #641315 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 641315 [details] [diff] [review]
Patch

Review of attachment 641315 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/CodeGenerator.cpp
@@ +3763,5 @@
> +    masm.checkStackAlignment();
> +
> +    /* Make Space for the outparam */
> +    masm.adjustStack((int)-sizeof(Value));
> +    masm.movePtr(StackPointer, ValueReg);

These 2 instructions can be done in one on ARM.  It might be worth adding

  masm.stackAlloc(sizeof(Value), ValueReg);

for this.

Have a look at retn implementation, except that you are looking for a PreIndex and not a PostIndex.

@@ +3770,5 @@
> +
> +    ValueOperand temp = GetTempValue(PrivateReg, JSContextReg);
> +    // GetReservedSlot(obj, 0).toPrivate()
> +    masm.loadValue(Address(ObjectReg, JSObject::getFixedSlotOffset(0)), temp);
> +    masm.unboxPrivate(temp, PrivateReg);

You can implement loadPrivate instead of unboxing, so you won't have to load extra memory in an unused register.

@@ +3783,5 @@
> +
> +    if (!markSafepointAt(safepointOffset, ins))
> +        return false;
> +
> +    masm.setupUnalignedABICall(4, JSContextReg);

Why unaligned ?  You have a masm.checkStackAlignment(), so we should be able to know how the stack is aligned at compile time and from what I read I guess it is just unaligned by 4.

So you can create a setupMissAlignedABICall which will save 2 instructions.

@@ +3868,5 @@
> +    masm.jump(&success);
> +
> +    {
> +        masm.bind(&exception);
> +        masm.handleException();

It sounds like we should share the handleExceptions between all calls and move them to a shared outOfLine path.  This will save us the success jump. I will open a follow-up bug for that.

@@ +3871,5 @@
> +        masm.bind(&exception);
> +        masm.handleException();
> +    }
> +    masm.bind(&success);
> +    masm.adjustStack(IonDOMExitFrameLayout::Size());

I would prefer a leaveDOMExitFrame for this one.  Such as we can have some sort of scoping of the DOMExitFrame.

::: js/src/ion/IonFrames.cpp
@@ +524,5 @@
>      }
>  
> +    if (frame.isDOMExit()) {
> +        IonDOMExitFrameLayout *dom = frame.exitFrame()->DOMExit();
> +        gc::MarkObjectRoot(trc, dom->thisObjAddress(), "ion-dom-args");

Sounds good for getters and setters.

::: js/src/ion/IonMacroAssembler.h
@@ +462,5 @@
>      }
>  
> +    void enterFakeDOMFrame(void *codeVal) {
> +        linkExitFrame();
> +        Push(ImmWord(uintptr_t(codeVal)));

JS_ASSERT(codeVal == ION_FRAME_DOMGETTER || codeVal == ION_FRAME_DOMSETTER);

::: js/src/ion/Lowering.cpp
@@ +1783,5 @@
> +
> +    LSetDOMProperty *lir = new LSetDOMProperty(tempFixed(CallTempReg0),
> +                                               useFixed(ins->object(), CallTempReg1),
> +                                               tempFixed(CallTempReg2),
> +                                               tempFixed(CallTempReg3));

You might save at least 4 moves by using IntArgReg[0-3] on ARM and x64.

@@ +1796,5 @@
> +{
> +    LGetDOMProperty *lir = new LGetDOMProperty(tempFixed(CallTempReg0),
> +                                               useFixed(ins->object(), CallTempReg1),
> +                                               tempFixed(CallTempReg2),
> +                                               tempFixed(CallTempReg3));

same here.
Attachment #641315 - Flags: feedback?(nicolas.b.pierron) → feedback+
Whiteboard: [leave open]
https://hg.mozilla.org/projects/ionmonkey/rev/30894762f1fc
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
You need to log in before you can comment on or make changes to this bug.