Closed
Bug 890365
Opened 12 years ago
Closed 10 years ago
IonMonkey: inline NewDenseArray intrinsic in sequential execution mode, too
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
INVALID
People
(Reporter: till, Assigned: isk)
References
Details
(Whiteboard: [js:t])
Attachments
(1 file, 4 obsolete files)
|
12.76 KB,
patch
|
Details | Diff | Splinter Review |
As discussed on IRC, inlining this operation in sequential mode should hopefully speed up Array#map.
@shu, you said that it might be ok to remove the intrinsic entirely. That'd mean changing UnsafeSetElement to work in situations where the array might not be large enough, right?
| Assignee | ||
Comment 1•12 years ago
|
||
I'm interested in to fix this bug.
But I think my knowledge is lack to fix it.
Would you mind helping me to fix this bug?
I think the way to fix bug.
1. add condition to inline in IonBuilder::inlineNewDenseArrayForSequentialExecution
2. make MNewDenseArraySeq like MNewDenseArrayPar
3. make LNewDenseArraySeq like LNewDenseArrayPar
4. make CodeGenerator::visitNewDenseArraySeq
Flags: needinfo?(nicolas.b.pierron)
Comment 2•12 years ago
|
||
You can probably use MNewArray with NewArray_Allocating.
| Assignee | ||
Comment 3•12 years ago
|
||
I make WIP-patch using MNewArray and benchmark.
function add(v) {
return v + 5;
}
function f() {
var arr = new Array(500,1);
var t = new Date;
for (var i=0; i<1000000; i++)
var res = arr.map(add);
print(new Date - t);
}
f();
This benchmark is not fast by this patch.
I found NewDenseArray is not inlined because argument(length) has the value not constant.
MNewArray::New needs the constant length.
Is there good way except making new MNewDenseArraySeq ?
Attachment #8337344 -
Flags: feedback?(evilpies)
Comment 4•12 years ago
|
||
(In reply to iseki.m.aa from comment #1)
> I'm interested in to fix this bug.
> But I think my knowledge is lack to fix it.
> Would you mind helping me to fix this bug?
Who did the code for MNewDenseArrayPar? Would this person mentor you?
Flags: needinfo?(nicolas.b.pierron)
| Assignee | ||
Comment 5•12 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #4)
> (In reply to iseki.m.aa from comment #1)
> > I'm interested in to fix this bug.
> > But I think my knowledge is lack to fix it.
> > Would you mind helping me to fix this bug?
>
> Who did the code for MNewDenseArrayPar? Would this person mentor you?
How can I search the person who coding MNewDenseArrayPar?
Search from the patch?
Comment 6•12 years ago
|
||
(In reply to iseki.m.aa from comment #5)
> How can I search the person who coding MNewDenseArrayPar?
> Search from the patch?
You can use the blame / annotate? feature of hg which provide a similar output as:
http://hg.mozilla.org/mozilla-central/annotate/53d55d2d0a25/js/src/jit/MIR.h#l8795
| Assignee | ||
Comment 7•12 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #6)
> (In reply to iseki.m.aa from comment #5)
> > How can I search the person who coding MNewDenseArrayPar?
> > Search from the patch?
>
> You can use the blame / annotate? feature of hg which provide a similar
> output as:
>
> http://hg.mozilla.org/mozilla-central/annotate/53d55d2d0a25/js/src/jit/MIR.
> h#l8795
Thanks you for your comment.
| Assignee | ||
Comment 8•12 years ago
|
||
I found your names in http://hg.mozilla.org/mozilla-central/annotate/53d55d2d0a25/js/src/jit/MIR.h#l8795.
I'd like to fix this bug.
But my knowledge is lack to fix it.
Would you teach me as a mentor?
Flags: needinfo?(shu)
Comment 9•12 years ago
|
||
Comment on attachment 8337344 [details] [diff] [review]
WIP-bug890365.patch
I am not the right person here.
Attachment #8337344 -
Flags: feedback?(evilpies)
Comment 10•12 years ago
|
||
Hi Iseki,
Sorry for the late reply. I'll be happy to mentor you on this bug, but I am traveling tomorrow (Dec 3rd) and won't be available. Why don't you hop on IRC and ping me (shu) this week?
Flags: needinfo?(shu)
| Assignee | ||
Comment 11•12 years ago
|
||
(In reply to Shu-yu Guo [:shu] from comment #10)
Thanks for your comment.
> Why don't you hop on IRC and ping me (shu) this week?
That's a good idea. But I don't know how to use IRC well.
I live in Japan(UTC+9).
| Assignee | ||
Updated•12 years ago
|
Assignee: general → iseki.m.aa
Comment 12•12 years ago
|
||
(In reply to iseki.m.aa from comment #11)
> (In reply to Shu-yu Guo [:shu] from comment #10)
> > Why don't you hop on IRC and ping me (shu) this week?
> That's a good idea. But I don't know how to use IRC well.
> I live in Japan(UTC+9).
You will find all the help you need on this web page:
https://wiki.mozilla.org/IRC
Connect to the channel named "#jsapi".
You can contact me with my nickname (nbp), nicknames are often indicated on bugzilla accounts with the square bracket notation such as [:nbp] or [:shu].
See you soon on IRC.
| Assignee | ||
Updated•12 years ago
|
Status: NEW → ASSIGNED
Comment 13•12 years ago
|
||
Comment on attachment 8337344 [details] [diff] [review]
WIP-bug890365.patch
Review of attachment 8337344 [details] [diff] [review]:
-----------------------------------------------------------------
The patch looks fine for inlining calls to |NewDenseArray| with a constant argument, but that won't speed up Array.prototype.map as hoped for in comment 0, since map doesn't call |NewDenseArray| with a constant length. For that either |MNewArray| needs to be extended to work with non-constant length (with a VM call to |JSObject::ensureDenseElements| or something), or make a new MIR.
I'm giving this r+ since I don't think inlining calls to |NewDenseArray| with constant arguments can hurt, but to actually address comment 0 more is needed.
::: js/src/jit/MCallOptimize.cpp
@@ +1413,5 @@
>
> IonBuilder::InliningStatus
> IonBuilder::inlineNewDenseArrayForSequentialExecution(CallInfo &callInfo)
> {
> // not yet implemented; in seq. mode the C function is not so bad
Update the comment please.
@@ +1438,5 @@
> +
> + callInfo.unwrapArgs();
> +
> + MNewArray::AllocatingBehaviour allocating = MNewArray::NewArray_Allocating;
> + MNewArray *ins = MNewArray::New(alloc(), initLength, templateObject, allocating);
This needs rebasing against the current tip. This call needs 5 arguments now, you're missing the |initialHeap| argument. You can get that by calling |templateObject->type()->initialHeap(constraints())|.
Attachment #8337344 -
Flags: review+
| Assignee | ||
Comment 14•12 years ago
|
||
Thank you for your reviewing.
I have many question.
Could you tell me?
I will make a new MIR as NewDenseArraySeq.
1. Is it possible to allocate element without calling ABI like "ExtendArrayPar" and VM function.
2. What do OutOfLineNewArray do? I think OutOfLineNewArray can be used in NewDenseArraySeq like NewArray. But I could not understand due to read the code.
3. What do OutOfLineAbortPar do? I think this method needs only when parallel compilation.
4. What is the difference between NewArray and NewDenseArraySeq? Is it only that NewDenseArraySeq can accept variable length but NewArray can't accept variable length?
5. Why LNewDenseArray allocate three temporary object? In visitNewDenseArrayPar, tempReg0 is used as the register to accept the return value which ExtendArrayPar return and tempReg2 is used as second argument of ExtendArrayPar and tempReg1 is not used. Is it related to "coalescing" which speed up memory access?
6. What is the means of three argument in LInstructionHelper such as LInstructionHelper<1,2,3>.
7. Where do we receive the bound value by masm.bind()?
Comment 15•12 years ago
|
||
(In reply to iseki.m.aa from comment #14)
> Thank you for your reviewing.
>
> I have many question.
> Could you tell me?
> I will make a new MIR as NewDenseArraySeq.
>
Sorry for the late reply, I had forgotten about this.
> 1. Is it possible to allocate element without calling ABI like
> "ExtendArrayPar" and VM function.
>
You can write a stub (like the string concat stub [1]) using just the macro assembler, but the path for extending the array will require at least calling malloc, so you might as well just call back into the VM.
> 2. What do OutOfLineNewArray do? I think OutOfLineNewArray can be used in
> NewDenseArraySeq like NewArray. But I could not understand due to read the
> code.
>
OutOfLineNewArray is the out of line path. In our JIT, in order to increase the instruction cache locality, we often put slow paths "out of line", meaning we generate code for them at the end of the instruction stream, and jump to them in case the fast path cannot be taken.
In this particular case, we have fast paths for making GC things (see masm.newGCThing [2]). However, that doesn't always work depending on the state of the GC. If we cannot allocate something on the fast path, we fall back to the slow, out of line path.
> 3. What do OutOfLineAbortPar do? I think this method needs only when
> parallel compilation.
OutOfLineAbortPar generates code that aborts the parallel execution inside of a ForkJoin. Parallel compilation (that is, off-main-thread compilation is something different). You don't have to worry about this for the sequential path.
>
> 4. What is the difference between NewArray and NewDenseArraySeq? Is it only
> that NewDenseArraySeq can accept variable length but NewArray can't accept
> variable length?
You're right, but that's not the only difference. NewDenseArray ensures that we allocate the dense storage and initializes it for a certain length, but NewArray doesn't always allocate the storage.
NewDenseArray was made back when there was a greater distinction between normal and dense arrays. Maybe it's possible to merge the two MIR now, but I'm not entirely sure.
>
> 5. Why LNewDenseArray allocate three temporary object? In
> visitNewDenseArrayPar, tempReg0 is used as the register to accept the return
> value which ExtendArrayPar return and tempReg2 is used as second argument of
> ExtendArrayPar and tempReg1 is not used. Is it related to "coalescing" which
> speed up memory access?
>
tempReg1 is used in emitGCThingPar [3].
> 6. What is the means of three argument in LInstructionHelper such as
> LInstructionHelper<1,2,3>.
That means the LIR has defines 1 thing, takes 2 operands, and 3 temporaries. See [4].
>
> 7. Where do we receive the bound value by masm.bind()?
I'm not sure what you're asking here, that method is void and doesn't return anything. You use it to bind labels, for example:
Label skip;
masm.branchTestPtr(Assembler::NotEqual, ptr, Imm32(0xdeadbeef), &skip);
masm.printf("foo");
masm.bind(&skip);
is equivalent to the following pseudocode
if (ptr == 0xdeadbeef)
printf("foo");
Hope this helps,
[1] http://dxr.mozilla.org/mozilla-central/source/js/src/jit/CodeGenerator.cpp#4542
[2] http://dxr.mozilla.org/mozilla-central/source/js/src/jit/IonMacroAssembler.cpp#688
[3] http://dxr.mozilla.org/mozilla-central/source/js/src/jit/CodeGenerator.cpp#3359
[4] http://dxr.mozilla.org/mozilla-central/source/js/src/jit/LIR.h?from=LInstructionHelper#813
| Assignee | ||
Comment 16•11 years ago
|
||
Thank you for your answering.
I tried to make a new MIR as NewDenseArraySeq.
But it is not complete.
I have a question.
How to extend array like ExtendArrayPar in sequential?
Use NewGCThing?
Attachment #8337344 -
Attachment is obsolete: true
Attachment #8355053 -
Flags: feedback?(shu)
| Assignee | ||
Updated•11 years ago
|
Attachment #8355053 -
Flags: feedback?(shu)
| Assignee | ||
Comment 17•11 years ago
|
||
I have many question.
1. There are two way in visitNewArray[1] . One way is VMFunction, Another way is inlining. What cause call VMFunction? Lack of type information?
2. NewArray stub can allocate the arbitrary number of elements when length is constant. VMFunction[2] use "count" to allocate the arbitrary number of elements. But another way(using masm.newGCThing) don't use "count". How to allocate arbitrary the number of elements?
3. The method in Codegeneratot.cpp use ABI or VMFunction. Why use ABI instead of VMFunction or Why use VMFunction instead of ABI.
[1]http://dxr.mozilla.org/mozilla-central/source/js/src/jit/CodeGenerator.cpp#3134
[2]http://dxr.mozilla.org/mozilla-central/source/js/src/jit/VMFunctions.cpp#296
Comment 18•11 years ago
|
||
(In reply to Masaya Iseki[:isk](UTC+9) from comment #17)
> 3. The method in Codegeneratot.cpp use ABI or VMFunction. Why use ABI
> instead of VMFunction or Why use VMFunction instead of ABI.
Most of the time you want to use callVM with a VMFunction, there is more assertion guarding that you are doing things correctly.
Usually people are using callWithABI to handle function calls which are either not-effectful, or rarely which are on fast-path such as DOM functions.
VMFunction & CallVM is a generic way to avoid all trouble, with callWithABI you might win a little in terms of performances, but the overhead is on the developer to ensure that no GC can happen, or that it is marked correctly.
| Assignee | ||
Comment 19•11 years ago
|
||
I add NewDenseArraySeq node and stub.
But this patch causes *segmentation fault* in following script.
function add(v) {
return v + 5;
}
function f() {
var arr = new Array(500,1);
var t = new Date;
for (var i=0; i<1000000; i++)
var res = arr.map(add);
print(new Date - t);
}
f();
I investigate using gdb.
I found where segv is happened[1].
I don't know why segv is happen.
Would you tell me?
[1] http://dxr.mozilla.org/mozilla-central/source/js/src/jit/shared/CodeGenerator-shared.cpp#380
Attachment #8355053 -
Attachment is obsolete: true
Attachment #8356442 -
Flags: feedback?(shu)
| Assignee | ||
Comment 20•11 years ago
|
||
This patch make inlining NewDenseArray in sequential execution mode.
I run following script.
function add(v) {
return v + 5;
}
function f() {
var arr = new Array(500,1);
var t = new Date;
for (var i=0; i<1000000; i++)
var res = arr.map(add);
print(new Date - t);
}
f();
Before patch : 1691
After patch : 2104
This patch is slow.
Why don't be faster?
Attachment #8356442 -
Attachment is obsolete: true
Attachment #8356442 -
Flags: feedback?(shu)
Attachment #8357610 -
Flags: feedback?(shu)
| Assignee | ||
Comment 21•11 years ago
|
||
Comment on attachment 8357610 [details] [diff] [review]
Bug890365.patch
I misunderstand about syntax.
I thought the mean of *new Array(500,1)* and *vector<T>(500,1)* is the same.
But the mean of *new Array(500,1)* is [500,1].
I fix it and result is following.
Before patch: 13555
After patch : 5833
Attachment #8357610 -
Flags: feedback?(shu) → review?(shu)
Comment 22•11 years ago
|
||
Comment on attachment 8357610 [details] [diff] [review]
Bug890365.patch
Review of attachment 8357610 [details] [diff] [review]:
-----------------------------------------------------------------
This patch doesn't correctly ensure that the created array has the appropriate dense storage.
It's very possible that trying to inline this for sequential execution is unfruitful: ensuring dense storage is complex enough that it will always result in a VM call anyways.
::: js/src/jit/CodeGenerator.cpp
@@ -3145,5 @@
> JS_ASSERT(gen->info().executionMode() == SequentialExecution);
>
> Register objReg = ToRegister(lir->output());
>
> - JS_ASSERT(!lir->isCall());
Why was this line deleted?
@@ +3549,5 @@
> +
> + masm.newGCThing(objReg, templateObject, ool->entry(), lir->mir()->initialHeap());
> + masm.initGCThing(objReg, templateObject);
> +
> + masm.bind(ool->rejoin());
This code is wrong: on the fast path, when |newGCThing| succeeds, it doesn't actually allocate the dense storage.
::: js/src/jit/CodeGenerator.h
@@ +128,5 @@
> bool visitFloat32ToInt32(LFloat32ToInt32 *lir);
> bool visitNewSlots(LNewSlots *lir);
> bool visitNewArrayCallVM(LNewArray *lir);
> bool visitNewArray(LNewArray *lir);
> + bool visitNewArray(LNewDenseArraySeq *lir);
Remove this overload.
::: js/src/jit/MCallOptimize.cpp
@@ +1320,5 @@
> + JSObject *templateObject = inspector->getTemplateObjectForNative(pc, intrinsic_NewDenseArray);
> + if (!templateObject || templateObject->type() != typeObject)
> + return InliningStatus_NotInlined;
> +
> + callInfo.setFoldedUnchecked();
This function no longer exists.
::: js/src/jit/MIR.h
@@ +8939,5 @@
>
> +// Creates a dense array of the given length.
> +//
> +// Note: the template object should be an *empty* dense array!
> +class MNewDenseArraySeq : public MAryInstruction<1>
Nit: Change name to {L,M}NewDenseArray here and elsewhere. That is, without the Seq suffix. Only parallel nodes should have the Par suffix; nodes are sequential by default.
@@ +8951,5 @@
> + initialHeap_(initialHeap)
> + {
> + setOperand(0, length);
> + setResultType(MIRType_Object);
> + if (!templateObject->hasSingletonType())
Why are you checking if the templateObject has a singleton type?
::: js/src/jit/VMFunctions.cpp
@@ +278,5 @@
>
> JSObject*
> NewInitArray(JSContext *cx, uint32_t count, types::TypeObject *typeArg)
> {
> + count += 100;
Why are you adding 100 here?
Attachment #8357610 -
Flags: review?(shu) → review-
| Assignee | ||
Comment 23•11 years ago
|
||
Thank you for your reviewing.
Sorry previous patch had many useless code.
I will delete it.
> ::: js/src/jit/CodeGenerator.cpp
> @@ -3145,5 @@
> > JS_ASSERT(gen->info().executionMode() == SequentialExecution);
> >
> > Register objReg = ToRegister(lir->output());
> >
> > - JS_ASSERT(!lir->isCall());
> Why was this line deleted?
This reason is that |JS_ASSERT| is called in |saveLive|[1].
>@@ +3549,5 @@
>> +
>> + masm.newGCThing(objReg, templateObject, ool->entry(), lir->mir()->initialHeap());
>> + masm.initGCThing(objReg, templateObject);
>> +
>> + masm.bind(ool->rejoin());
>This code is wrong: on the fast path, when |newGCThing| succeeds, it doesn't actually allocate the dense >storage.
Is there a way to allocate the dense storage on the fast path?
[1] http://dxr.mozilla.org/mozilla-central/source/js/src/jit/shared/CodeGenerator-shared-inl.h#144
| Assignee | ||
Comment 24•11 years ago
|
||
I delete useless code and the fast path to allocate the dense storage.
Attachment #8357610 -
Attachment is obsolete: true
Attachment #8361914 -
Flags: review?(shu)
Comment 25•11 years ago
|
||
Comment on attachment 8361914 [details] [diff] [review]
Bug890365.patch
Review of attachment 8361914 [details] [diff] [review]:
-----------------------------------------------------------------
The patch is still not quite right, but more importantly, I don't think this approach can actually make us faster. The logic for extending dense elements is fairly complex and we end up having to do a |callVM| to some function that takes care of it anyways, and in the end it doesn't end up being any faster than just emitting a call to the |NewDenseArray| intrinsic.
I ran some numbers on a modified microbenchmark from comment 3 on my desktop:
function add(v) {
return v + 5;
}
function f() {
var arr = new Array(500,1);
var t = new Date;
for (var i=0; i<5000000; i++)
var res = arr.map(add);
print(new Date - t);
}
f();
Without patch: ~400ms
With patch: ~435ms
So this patch actually ends up causing a *slowdown*.
I think the only viable way moving forward is to write a fast stub (like the string concat stub) for the fast case of allocating and initializing new dense elements. This path wouldn't need to check for dynamic elements, reallocating, etc, and so seems conceptually simpler. It is still complicated, however, and even this simplified path might be too complex to be feasibly written in masm.
Iseki, sorry the callVM approach didn't work out to be any faster. Would you like to try at hand-inlining a fast path for ensuring dense elements? That is definitely not a "good first bug", though.
::: js/src/jit/CodeGenerator.cpp
@@ -3175,5 @@
> JS_ASSERT(gen->info().executionMode() == SequentialExecution);
>
> Register objReg = ToRegister(lir->output());
>
> - JS_ASSERT(!lir->isCall());
Why did you remove this assert?
@@ +3538,5 @@
> + pushArg(ImmGCPtr(type));
> + pushArg(lengthReg);
> +
> + if (!callVM(NewInitArrayInfo, lir))
> + return false;
NewInitArray doesn't ensure that dense storage is initialized just allocated, for that you need to call something like |JSObject::ensureDenseElementsNoPackedCheck|.
Attachment #8361914 -
Flags: review?(shu)
| Assignee | ||
Comment 26•11 years ago
|
||
(In reply to Shu-yu Guo [:shu] from comment #25)
> Iseki, sorry the callVM approach didn't work out to be any faster. Would you
> like to try at hand-inlining a fast path for ensuring dense elements? That
> is definitely not a "good first bug", though.
I'd like to try it.
Would you keep mentoring to me?
> ::: js/src/jit/CodeGenerator.cpp
> @@ -3175,5 @@
> > JS_ASSERT(gen->info().executionMode() == SequentialExecution);
> >
> > Register objReg = ToRegister(lir->output());
> >
> > - JS_ASSERT(!lir->isCall());
>
> Why did you remove this assert?
JS_ASSERT(!lir->isCall()) is called in saveLive() (Comment 23).
I delete it because it is duplicate.
Is it need to double check?
Comment 27•11 years ago
|
||
(In reply to Masaya Iseki[:isk](UTC+9) from comment #26)
> (In reply to Shu-yu Guo [:shu] from comment #25)
> > Iseki, sorry the callVM approach didn't work out to be any faster. Would you
> > like to try at hand-inlining a fast path for ensuring dense elements? That
> > is definitely not a "good first bug", though.
>
> I'd like to try it.
> Would you keep mentoring to me?
Sure, but keep in mind that I have no idea if that approach will work out either.
| Assignee | ||
Comment 28•11 years ago
|
||
I consider the step making generateNewDenseArray.
1. Allocate new GC thing.
2. Check whether GC thing is dense.
3. Return GC thing.
I have many question.
Would you please tell me?
1.How to call |newGCThing| in stub? We cannot get templateObject and initialHeap.
2.How to check whether |newGCThing| allocate the dense storage? or how to allocate the dense storage?
3.What is the |computeEffectiveAddress|[1]. I don't know what this do.
4.Why a fast stub is faster than a fast path?
[1]http://dxr.mozilla.org/mozilla-central/source/js/src/jit/shared/MacroAssembler-x86-shared.h#624
Flags: needinfo?(shu)
Comment 29•11 years ago
|
||
(In reply to Masaya Iseki[:isk](UTC+9) from comment #28)
> I consider the step making generateNewDenseArray.
> 1. Allocate new GC thing.
> 2. Check whether GC thing is dense.
> 3. Return GC thing.
>
> I have many question.
> Would you please tell me?
> 1.How to call |newGCThing| in stub? We cannot get templateObject and
> initialHeap.
> 2.How to check whether |newGCThing| allocate the dense storage? or how to
> allocate the dense storage?
> 3.What is the |computeEffectiveAddress|[1]. I don't know what this do.
> 4.Why a fast stub is faster than a fast path?
>
> [1]http://dxr.mozilla.org/mozilla-central/source/js/src/jit/shared/
> MacroAssembler-x86-shared.h#624
Iseki, I'm very sorry that this dropped off my radar!
1) You would probably have to pass them in somehow from the inline site.
2) You would have to read the array allocation VM functions like NewArray [1] to figure out if it's possible to extract a fast path you can inline in assembly.
3) On x86, that's the lea instruction.
4) I'm not sure what you mean by a "fast path" here. I suggested a stub because I imagine even a hand-written asm version of a fast path allocating a dense array is complicated enough that it wouldn't be something we want to inline. Instead, a stub is lightweight piece of reusable code with its own calling convention.
[1]http://dxr.mozilla.org/mozilla-central/source/js/src/jsarray.cpp?from=NewArray&case=true#3146
Iseki, unfortunately I am somewhat busy. I am happy to discuss higher level approaches, but I won't have time to answer questions at this level of implementation detail.
Flags: needinfo?(shu)
| Reporter | ||
Comment 30•10 years ago
|
||
The NewDenseArray intrinsics was removed a while ago.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → INVALID
You need to log in
before you can comment on or make changes to this bug.
Description
•