Closed
Bug 747288
Opened 11 years ago
Closed 11 years ago
Generate faster jitcode for DOM getters/setters
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: bzbarsky, Assigned: efaust)
References
Details
Attachments
(4 files, 2 obsolete files)
89.63 KB,
patch
|
Details | Diff | Splinter Review | |
41.95 KB,
patch
|
sstangl
:
review+
nbp
:
feedback+
|
Details | Diff | Splinter Review |
12.69 KB,
patch
|
bhackett1024
:
review+
|
Details | Diff | Splinter Review |
440 bytes,
text/html
|
Details |
This is bug 747285 part (C). Might need to be further broken up.
Updated•11 years ago
|
Assignee: general → nicolas.b.pierron
Assignee | ||
Comment 1•11 years ago
|
||
First blush runs of called bottom halves yields the following runtimes on a bz microbenchmark that calls XMLHttpRequest.readyState 10,000,000 times. JM + IM: ~125ms, though I've seen it as low as 110 JM + IM with just calling the original native getter: ~137ms JM: ~140ms IM: ~135ms Safari: 90ms. We've made up about half the distance.
![]() |
Reporter | |
Comment 2•11 years ago
|
||
Patches please? Slightly curious as to how much time we spend in the jitcode vs the binding now....
Assignee | ||
Comment 3•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Assignee: nicolas.b.pierron → efaust
Status: NEW → ASSIGNED
![]() |
Reporter | |
Comment 4•11 years ago
|
||
With that patch, on the readyState testcase, I see numbers approximately like so (on x86-64): 58% ion code. I have no real visibility into this, but whatever we can do to reduce the amount of code here would presumably help. This is about 6800 samples at 20us sample interval. 19% self code in js::ion::DOMPropertyDispatch. This is almost entirely function call overhead plus the branching that js::GetReservedSlot ends up having to do. Inlining DOMPropertyDispatch into the jitcode, if at all possible would eliminate pretty much all of this, I think. About 2200 samples here. 14% self code in specialized_get_readyState. This is basically 36% register gunk on entry, 12% doing the actual call into XHR (but not including time in XHR!), 15% doing INT_TO_JSVAL, 37% register gunk on exit. Wish we could inline this too! Can we win by using a better calling convention, perhaps? About 1600 samples here. 7% nsXMLHttpRequest::GetReadyState. About 800 samples. Total samples, 11800 or so. For comparison, before this patch we had 13000 samples, of which about 6100 were in JM jitcode, 6000 in the DOM JSNative, and 800 in GetReadyState. So we've cut down the time in the DOM code from 6800 samples to 2400 or so, but now have an extra 2200 in DOMPropertyDispatch and 2800 in jitcode.
Boris, could you try changing the branch in GetReservedSlot to be an assertion that the slot being accessed is always a fixed slot? I think this might be true, and if it's not, we could probably make it true.
![]() |
Reporter | |
Comment 6•11 years ago
|
||
Reserved slot access on globals uses non-fixed slots because globals have so many reserved slots at the moment.
Assignee | ||
Comment 7•11 years ago
|
||
Apply on top of other patch. Runtimes on microbenchmark now < 100ms. Running about 97ms or so with *and* without JM.
Assignee | ||
Comment 8•11 years ago
|
||
As for moving boxing the result to jitcode, that's probably also doable, if the bindings pass type information in the JSJitInfo, then the callsite can generate appropriate boxing and the bindings can just fill a pointer to the stack with the desired return value. I don't know at present how this interacts with the calls that need to JS_WrapValue().
![]() |
Reporter | |
Comment 9•11 years ago
|
||
Hmm. So boxing is a bit of a pain in many cases (e.g. boxing up an nsINode involves some function calls to get the JSObject*, etc). Doing it in the simple readyState case, where the return value is int, is easy, of course.
Assignee | ||
Comment 10•11 years ago
|
||
OK, but the current boxing method involves a function call to do the boxing. We can do even coarse boxing given the JSObject*, for example in the expensive case. I'ts unclear whether this is actually a win, but it would shave a function call.
![]() |
Reporter | |
Comment 11•11 years ago
|
||
> OK, but the current boxing method involves a function call to do the boxing.
Er, it does? Where? All the hot paths should be inlined, at least into the binding C++ method....
Assignee | ||
Comment 12•11 years ago
|
||
Indeed. It seems remarkably unlikely that the contents of jsval.h are not inlined, as you say. Looking for broader win, is there a reasonable way we can try and slide the whole bottom half to jitcode somehow?
![]() |
Reporter | |
Comment 13•11 years ago
|
||
So in order... Looking at a profile with that last patch, 72% of the time is jitcode. 19% of time is self time in the specialized getter (almost entirely call overhead stuff). 8% is in GetReadyState itself. We're dow to 9600 samples, so about 20% faster than the previous patch, as expected. Still a bit slower than Safari, but they have it somewhat easier because their props are not on the proto... Still, it feels like we should be able to do just as well here. As far as comment 12 goes the answer is "maybe". In a simple case like this one (getter with a simple-to-convert return value), it's maybe doable. If the return value were more complicated, or if this were a setter with a non-simple-to-convert argument, I think it's pretty hard to do in jitcode. Even for the getter it's a bit hard, though, because you have to know how to call the underlying C++ getter, and we don't have a specific calling convention for those. In this case it's a non-virtual out-of-line function, but it could have been virtual. Or it could have been inlined. In any case, the biggest win bar inlining all the C++ into jitcode at this point would be trying to figure out exactly what the jitcode is doing during those 72% of the time and seeing what we can do to reduce it. Apart from that, starting to look pretty good! ;)
Assignee | ||
Comment 14•11 years ago
|
||
Now with Bug 769853 closed, those patches applied to IonMonkey tip run the same speed as JSC, both with and without JM.
Assignee | ||
Comment 15•11 years ago
|
||
Attachment #637625 -
Attachment is obsolete: true
Attachment #637781 -
Attachment is obsolete: true
![]() |
Reporter | |
Comment 16•11 years ago
|
||
Hmm. I'm still seeing this be slower than Safari... maybe I need to rebuild with -fomit-frame-pointer?
![]() |
Reporter | |
Comment 17•11 years ago
|
||
Specifically, I see about 90 for safari, 120 for our 64-bit code, 130-140 for our 32-bit code.
Assignee | ||
Comment 18•11 years ago
|
||
I am building with --disable-debug --enable-optimize. If we are still really 140ms slow on 32 bit, then that's something I will have to look at. Do we have profiles?
Assignee | ||
Comment 19•11 years ago
|
||
Timings were generated on a x86-64 build done with clang, which is compiling -fomit-frame-pointer. Were the numbers you were seeing always higher than those I had been reporting?
![]() |
Reporter | |
Comment 20•11 years ago
|
||
> Were the numbers you were seeing always higher than those I had been reporting?
Yes, for my own builds. About comparable for Safari.
I just tried a 64-bit build with gcc and -fomit-frame-pointer, and that gets me down to about 100 or so. clang numbers tomorrow.
![]() |
Reporter | |
Comment 21•11 years ago
|
||
Same number with clang.
Assignee | ||
Comment 22•11 years ago
|
||
Attachment #641315 -
Flags: review?(sstangl)
![]() |
Reporter | |
Comment 23•11 years ago
|
||
One note: Since this is using 0 explicitly, not the #define for the actual slot number, can you please add a comment next to that #define saying that if it ever changes this code needs to change? Similar for anything else about the data structures involved that's baked into the JIT C++, though I _think_ this should be the only dependency.
Assignee | ||
Comment 24•11 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #23) > One note: Since this is using 0 explicitly, not the #define for the actual > slot number, can you please add a comment next to that #define saying that > if it ever changes this code needs to change? Similar for anything else > about the data structures involved that's baked into the JIT C++, though I > _think_ this should be the only dependency. Absolutely. I would have used the #define itself, but it hardly seems reasonable to move that from DOMJSClass.h to somewhere the engine could get at it.
Comment 25•11 years ago
|
||
Comment on attachment 641315 [details] [diff] [review] Patch Review of attachment 641315 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/ion/CodeGenerator.cpp @@ +3762,5 @@ > + > + masm.checkStackAlignment(); > + > + /* Make Space for the outparam */ > + masm.adjustStack((int)-sizeof(Value)); Should this be -int(sizeof(Value))? @@ +3828,5 @@ > + DebugOnly<uint32> initialStack = masm.framePushed(); > + > + masm.checkStackAlignment(); > + > + // Push thei argument. Rooting will happen at GC time. thei? ::: js/src/ion/shared/IonFrames-shared.h @@ +1,5 @@ > +/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*- > + * vim: set ts=4 sw=4 et tw=79: > + * > + * ***** BEGIN LICENSE BLOCK ***** > + * Version: MPL 1.1/GPL 2.0/LGPL 2.1 MPL2
Assignee | ||
Comment 26•11 years ago
|
||
Renames jitInfo in methodjit to mJITInfo, and makes associated method name changes.
Attachment #641718 -
Flags: review?(sstangl)
Assignee | ||
Updated•11 years ago
|
Attachment #641718 -
Flags: review?(sstangl) → review?(dvander)
![]() |
||
Updated•11 years ago
|
Attachment #641718 -
Flags: review?(dvander) → review?(bhackett1024)
Comment 27•11 years ago
|
||
Comment on attachment 641315 [details] [diff] [review] Patch Review of attachment 641315 [details] [diff] [review]: ----------------------------------------------------------------- Looks reasonable, although I understand that a bunch of code is going to be changed/removed by an upcoming patch. r+ with a bunch of irritating little nits and some non-nits that look like they come from other patches. ::: js/src/ion/CodeGenerator.cpp @@ +3785,5 @@ > + return false; > + > + masm.setupUnalignedABICall(4, JSContextReg); > + > + masm.loadJSContext(JSContextReg); It's technically unsafe to load into the register passed into setupUnalignedABICall() before the completion of callWithABI(). It doesn't look like we're changing the implementation any time soon, though, and it works currently. ::: js/src/ion/IonBuilder.cpp @@ +4887,3 @@ > // Otherwise try using the prototype. > curObj = typeObj->proto; > + } else { The previous block guarantees that curObj is set. Instead of duplicating the thinkDOM-updating code, just let the code in this else block always execute after the "if (!curObj)" block and remove the !curObj thinkDOM-updating code. @@ +4888,5 @@ > curObj = typeObj->proto; > + } else { > + // Check the DOM-ness of the singleton. > + types::TypeObject *objType = curObj->getType(cx); > + thinkDOM = thinkDOM && !objType->hasAnyFlags(types::OBJECT_FLAG_NON_DOM); non-nit: This should be OBJECT_FLAG_DOM. Besides being better from a readability standpoint to ask questions positively, the default (zeroed) state of a positive assertion is non-DOM, requiring no additional action from us. @@ +4983,5 @@ > > curType = obj->getType(cx); > } > > + // Freeze the types as being DOM objects if they are nit: // Freeze any DOM objects. @@ +4985,5 @@ > } > > + // Freeze the types as being DOM objects if they are > + if (thinkDOM) { > + // Asking the question adds the freeze nit: it's classy to end comments with a period. Also, this behavior isn't particularly obvious. It's generally a bad sign if we feel the need to add a comment to explain the effect of a single function call. Would it be cleaner to have a FreezeObjectFlags() function, then call that from both here and HasObjectFlags()? @@ +4986,5 @@ > > + // Freeze the types as being DOM objects if they are > + if (thinkDOM) { > + // Asking the question adds the freeze > + bool wasntDOM = types::TypeSet::HasObjectFlags(cx, curType, This either needs to be DebugOnly<bool>, or ignored. Or, instead, if we have a FreezeObjectFlags() function, then it should be safe to do: JS_ASSERT(HasObjectFlags(cx, ..., OBJECT_FLAG_DOM)); FreezeObjectFlags(cx, ..., OBJECT_FLAG_DOM)); In opt builds it obviously works. In debug builds, if HasObjectFlags(), then the freeze isn't set, so FreezeObjectFlags() isn't redundant. @@ +5020,5 @@ > + return true; > +} > + > +static bool > +TestShouldDOMCall(JSContext *cx, bool isDOM, types::TypeSet *inTypes, HandleFunction func) Would prefer removing the isDOM parameter and moving the logic to the callsites. @@ +5024,5 @@ > +TestShouldDOMCall(JSContext *cx, bool isDOM, types::TypeSet *inTypes, HandleFunction func) > +{ > + if (!isDOM || !func->isNative() || !func->jitinfo()) > + return false; > + // If all the DOM objects flowing through are legal with this nit: vertical whitespace before comment. Period at end of comment. @@ +5045,5 @@ > + curType = curObj->getType(cx); > + } > + > + JSObject *typeProto = curType->proto; > + RootedObject proto(cx, typeProto); RootedObject proto(cx, curType->proto); @@ +5225,5 @@ > + RootedFunction setter(cx, commonSetter); > + if (TestShouldDOMCall(cx, isDOM, types, setter)) { > + MSetDOMProperty *set = MSetDOMProperty::New(setter->jitinfo()->op, obj, value); > + if (!set) > + return false; Don't need failure check. ::: js/src/ion/LIR-Common.h @@ +568,5 @@ > + this->setOperand(0, ObjectReg); > + } > + > + public: > + nit: remove whitespace after public: ::: js/src/ion/MIR.h @@ +4534,5 @@ > + { > + return new MSetDOMProperty(func, obj, val); > + } > + > + const JSJitPropertyOp fun() { nit: const before "{". Also object() and value(). @@ +4543,5 @@ > + return getOperand(0); > + } > + > + MDefinition *value() > + { nit: { on previous line. Remove vertical whitespace between this and object(). @@ +4575,5 @@ > + { > + return new MGetDOMProperty(func, obj, isInfallible); > + } > + > + const JSJitPropertyOp fun() { nit: const before "{". Also isInfallible() and object(). ::: js/src/ion/shared/CodeGenerator-shared-inl.h @@ +135,5 @@ > #error "Unknown" > #endif > } > > +static inline ValueOperand can just be inline. @@ +144,5 @@ > +#elif defined(JS_PUNBOX64) > + (void)type; > + return ValueOperand(payload); > +#else > +#error "Unknown" nit: # error "Unknown" ::: js/src/ion/shared/IonFrames-shared.h @@ +41,5 @@ > +#ifndef jsion_ionframes_shared_h__ > +#define jsion_ionframes_shared_h__ > + > +#define ION_FRAME_DOMGETTER ((IonCode *)0x1) > +#define ION_FRAME_DOMSETTER ((IonCode *)0x2) Probably should include the file that defines IonCode. @@ +43,5 @@ > + > +#define ION_FRAME_DOMGETTER ((IonCode *)0x1) > +#define ION_FRAME_DOMSETTER ((IonCode *)0x2) > + > +#endif ultra-nit: #endif // jsion_ionframes_shared_h__
Attachment #641315 -
Flags: review?(sstangl) → review+
![]() |
Reporter | |
Comment 28•11 years ago
|
||
For what it's worth, in the presence of polymorphism this code is a _huge_ win over what we used to have. 10x or more faster once we hit the limit on IC size, for example.
Updated•11 years ago
|
Attachment #641718 -
Flags: review?(bhackett1024) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #641315 -
Flags: feedback?(nicolas.b.pierron)
Comment 29•11 years ago
|
||
Comment on attachment 641315 [details] [diff] [review] Patch Review of attachment 641315 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/ion/CodeGenerator.cpp @@ +3763,5 @@ > + masm.checkStackAlignment(); > + > + /* Make Space for the outparam */ > + masm.adjustStack((int)-sizeof(Value)); > + masm.movePtr(StackPointer, ValueReg); These 2 instructions can be done in one on ARM. It might be worth adding masm.stackAlloc(sizeof(Value), ValueReg); for this. Have a look at retn implementation, except that you are looking for a PreIndex and not a PostIndex. @@ +3770,5 @@ > + > + ValueOperand temp = GetTempValue(PrivateReg, JSContextReg); > + // GetReservedSlot(obj, 0).toPrivate() > + masm.loadValue(Address(ObjectReg, JSObject::getFixedSlotOffset(0)), temp); > + masm.unboxPrivate(temp, PrivateReg); You can implement loadPrivate instead of unboxing, so you won't have to load extra memory in an unused register. @@ +3783,5 @@ > + > + if (!markSafepointAt(safepointOffset, ins)) > + return false; > + > + masm.setupUnalignedABICall(4, JSContextReg); Why unaligned ? You have a masm.checkStackAlignment(), so we should be able to know how the stack is aligned at compile time and from what I read I guess it is just unaligned by 4. So you can create a setupMissAlignedABICall which will save 2 instructions. @@ +3868,5 @@ > + masm.jump(&success); > + > + { > + masm.bind(&exception); > + masm.handleException(); It sounds like we should share the handleExceptions between all calls and move them to a shared outOfLine path. This will save us the success jump. I will open a follow-up bug for that. @@ +3871,5 @@ > + masm.bind(&exception); > + masm.handleException(); > + } > + masm.bind(&success); > + masm.adjustStack(IonDOMExitFrameLayout::Size()); I would prefer a leaveDOMExitFrame for this one. Such as we can have some sort of scoping of the DOMExitFrame. ::: js/src/ion/IonFrames.cpp @@ +524,5 @@ > } > > + if (frame.isDOMExit()) { > + IonDOMExitFrameLayout *dom = frame.exitFrame()->DOMExit(); > + gc::MarkObjectRoot(trc, dom->thisObjAddress(), "ion-dom-args"); Sounds good for getters and setters. ::: js/src/ion/IonMacroAssembler.h @@ +462,5 @@ > } > > + void enterFakeDOMFrame(void *codeVal) { > + linkExitFrame(); > + Push(ImmWord(uintptr_t(codeVal))); JS_ASSERT(codeVal == ION_FRAME_DOMGETTER || codeVal == ION_FRAME_DOMSETTER); ::: js/src/ion/Lowering.cpp @@ +1783,5 @@ > + > + LSetDOMProperty *lir = new LSetDOMProperty(tempFixed(CallTempReg0), > + useFixed(ins->object(), CallTempReg1), > + tempFixed(CallTempReg2), > + tempFixed(CallTempReg3)); You might save at least 4 moves by using IntArgReg[0-3] on ARM and x64. @@ +1796,5 @@ > +{ > + LGetDOMProperty *lir = new LGetDOMProperty(tempFixed(CallTempReg0), > + useFixed(ins->object(), CallTempReg1), > + tempFixed(CallTempReg2), > + tempFixed(CallTempReg3)); same here.
Attachment #641315 -
Flags: feedback?(nicolas.b.pierron) → feedback+
Assignee | ||
Updated•11 years ago
|
Whiteboard: [leave open]
Assignee | ||
Comment 31•11 years ago
|
||
https://hg.mozilla.org/projects/ionmonkey/rev/30894762f1fc
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
You need to log in
before you can comment on or make changes to this bug.
Description
•