Closed Bug 1248289 Opened 5 years ago Closed 5 years ago

Inline _GetNextMapEntryForIterator.


(Core :: JavaScript Engine: JIT, defect)

Not set



Tracking Status
firefox47 --- affected
firefox48 --- fixed


(Reporter: arai, Assigned: arai)




(2 files, 1 obsolete file)

Inlining _GetNextMapEntryForIterator should improve Map iteration performance significantly.


var map = new Map();
for (var i=0; i<100; i++)
    map.set(i, i);

function f(m) {
    var res;
    for (var i = 10000; i--;) {
        for (var v of m) {
            res = v[0];
    return res;
var t = new Date;
print(new Date - t);

[result with WIP patch]

  before: 72 [ms]
  after:  26 [ms]
  before: 75 [ms]
  after:  24 [ms]

try is running
here's WIP patch.

Now I'm thinking it might be better to use dedicated object instead of array for mapIterationResultPair, as current JIT code does not support unboxed array, and actually we don't need it to be an array, but just need 2 slots that can be read/written with simple operation both in JIT and native function.
Assignee: nobody → arai.unmht
Comment on attachment 8719360 [details] [diff] [review]
Inline _GetNextMapEntryForIterator intrinsic.

I'll add a micro-benchmark to AWFY to detect regression comes from unboxed array when it's enabled.

I have one question.
in CodeGenerator::visitGetNextMapEntryForIterator, I saved/restored `iter` register with masm.push/pop because clobbering it causes crash.
in which case should we keep the value of input registers as is?
Attachment #8719360 - Attachment description: (WIP) Inline _GetNextMapEntryForIterator intrinsic. → Inline _GetNextMapEntryForIterator intrinsic.
Attachment #8719360 - Flags: review?(jdemooij)
Here's draft patch for AWFY.
will open PR after fixing this bug.
Comment on attachment 8719360 [details] [diff] [review]
Inline _GetNextMapEntryForIterator intrinsic.

Review of attachment 8719360 [details] [diff] [review]:

Sorry for the delay. This looks great, thanks for working on this!

::: js/src/jit/CodeGenerator.cpp
@@ +5664,5 @@
> +                      Register temp)
> +{
> +    masm.load32(Address(range, ValueMap::Range::offsetOfCount()), temp);
> +    masm.add32(Imm32(1), temp);
> +    masm.store32(temp, Address(range, ValueMap::Range::offsetOfCount()));

These 3 lines can be shortened to

masm.add32(Imm32(1), Address(range, ValueMap::Range::offsetOfCount());

(the code below is different because it uses |temp| afterwards)

@@ +5678,5 @@
> +    MOZ_ASSERT(ValueMap::sizeofImplData() == 24);
> +    masm.addPtr(Imm32(24), front);
> +
> +    masm.branchTestMagic(Assembler::NotEqual, Address(front, ValueMap::Entry::offsetOfKey()),
> +                         JS_HASH_KEY_EMPTY, temp, &done);

I think it should be possible to implement this without the scratch register (see another review comment). Then we can get rid of the load32 below, because |temp| will always hold |i|.

@@ +5692,5 @@
> +static inline void
> +ValueMapRangeDestruct(MacroAssembler& masm, Register range, Register temp0, Register temp1)
> +{
> +    const Register& next = temp0;
> +    const Register& prevp = temp1;

Nit: just |Register x = y;|

@@ +5729,5 @@
> +    masm.loadPtr(Address(range, ValueMap::Range::offsetOfHashTable()), dataLength);
> +    masm.load32(Address(dataLength, ValueMap::offsetOfImplDataLength()), dataLength);
> +    masm.branch32(Assembler::AboveOrEqual, temp, dataLength, &iterDone);
> +    {
> +        masm.push(iter);

To answer your question: it's not okay to modify operand registers, so saving/restoring this register is indeed the way to go (if we don't have another temp register available).

@@ +5731,5 @@
> +    masm.branch32(Assembler::AboveOrEqual, temp, dataLength, &iterDone);
> +    {
> +        masm.push(iter);
> +
> +        const Register& front = iter;

Nit: Register is tiny, so |Register front = iter;|

@@ +5736,5 @@
> +        ValueMapRangeFront(masm, range, temp, front);
> +
> +        size_t elementsOffset = NativeObject::offsetOfFixedElements();
> +        masm.moveValue(Address(front, ValueMap::Entry::offsetOfKey()),
> +                       Address(result, elementsOffset), temp);

We have to emit prebarriers and postbarriers for these stores. Also, we have to ensure the types of the key and value are part of the TypeSet for the array elements. That's pretty annoying... Maybe we could mark this array as having unknown properties though... It'd be nice if we could get rid of the array. I'll think about it more.

We should also add an assert to the C++ code to check the elements are indeed 'fixed', because this code relies on that.

@@ +5738,5 @@
> +        size_t elementsOffset = NativeObject::offsetOfFixedElements();
> +        masm.moveValue(Address(front, ValueMap::Entry::offsetOfKey()),
> +                       Address(result, elementsOffset), temp);
> +        masm.moveValue(Address(front, ValueMap::Entry::offsetOfValue()),
> +                       Address(result, elementsOffset + sizeof(Value)), temp);

Nit: s/moveValue/storeValue/ ? We usually use move* for moving a constant/register to a register.

::: js/src/jit/Lowering.cpp
@@ +2592,5 @@
> +{
> +    MOZ_ASSERT(ins->iter()->type() == MIRType_Object);
> +    MOZ_ASSERT(ins->result()->type() == MIRType_Object);
> +    define(new(alloc()) LGetNextMapEntryForIterator(useRegisterAtStart(ins->iter()),
> +                                                    useRegisterAtStart(ins->result()),

Using *AtStart means it's okay if one of the temps or the output get assigned the same register as the operand. I think this should be useRegister, as the code uses the operands after writing to temps.

::: js/src/jit/x86/MacroAssembler-x86.h
@@ +813,5 @@
> +                         Label* label)
> +    {
> +        branchTestMagic(cond, valaddr, label);
> +        load32(ToPayload(valaddr), temp);
> +        branch32(cond, temp, Imm32(why), label);

You can combine the load32 and branch32 in a single branch32 right?

Also, we should change the JSWhyMagic enum in js/public/Value.h to have uint32_t storage: compilers may otherwise store it as int8 or int16 and leave the remaining bits garbage.
Attachment #8719360 - Flags: review?(jdemooij) → feedback+
Thank you for reviewing :D

We could get rid of the array by splitting _GetNextMapEntryForIterator into following 4 functions:
  * checks if the iteration is done
  * returns front element's key
  * returns front element's value
  * pops front element

will try implementing it and checking performance.
with comment #5's way, we can get rid of the barrier issue, but the type issue is still there.

Calling `setResultType(MIRType_Value)` for functions that returns front element's key/value makes it much slower.
with WIP patch, 74 (before) vs 179 (after) in comment #0's case.

Calling `setResultType(MIRType_Int32)` makes it faster, but of course this is wrong.
Maybe the performance improvement observed in comment #0 was based on this incorrect type assumption? (it's using int32 for all pairs)

if there's a way to predict the all key's and value's type sets, we could still improve the performance with comment #5's way in some case.
So for the array stores we have:

* The pre-barrier: I think we actually don't need this one, because the previous value in the array is |null|, is that right? We could easily assert that.

* The type check: we could mark the array as having unknown types for the JSID_VOID property (JSID_VOID is used for elements) after we create the array (a call to an intrinsic in the global scope).

* That leaves the post barrier: when the array is tenured and the element is in the nursery. We can pretty easily inline this (maybe we can factor out the MPostWriteBarrier code).

What do you think?
Thanks!  Implemented those 3 ppints, and it seems working locally, at least on x64, keeping the improved performance :)
It needs some more branch related MacroAssembler methods, so will work on this after the cleanup in bug 1245112.
1. Added assertion for invariants in MapIteratorObject::next.

  a. for pre-barrier, array elements are null
  b. for post-barrier, array is tenured
  c. for element store, elements are fixed and there are at least 2 capacity  (can it be exact 2 capacity?)

2. Changed resultPairObj initialization to _CreateMapIterationResultPair self-hosting intrinsic, to

  a. create the array as tenured
  b. add unknown type for elements

3. Added post-barrier code to CodeGenerator::visitGetNextMapEntryForIterator

It checks if either key or value is an object in nursery.  If so, jumps to ool code MapIteratorResultPairPostBarrier with result array as an argument.

MapIteratorResultPairPostBarrier calls PostWriteElementBarrier for both key (0-th element) and value (1-th element).
Do we have to avoid calling PostWriteElementBarrier when the elemnt is not an object?

4. Added Address+JSWhyMagic variant for branchTestMagic

5. Added Address variant for branchValueIsNurseryObject

Green on try run:
Attachment #8719360 - Attachment is obsolete: true
Attachment #8726166 - Flags: review?(jdemooij)
>  MOZ_ASSERT(resultPairObj->getDenseInitializedLength() >= 2);
forgot to update the patch.

I meant following
    MOZ_ASSERT(resultPairObj->getDenseInitializedLength() == 2);
Comment on attachment 8726166 [details] [diff] [review]
Inline _GetNextMapEntryForIterator intrinsic.

Review of attachment 8726166 [details] [diff] [review]:

Thanks, looks great! r=me with comments below addressed.

::: js/src/builtin/MapObject.cpp
@@ +233,5 @@
>      return false;
>  }
> +/* static */ bool
> +MapIteratorObject::createResultPair(JSContext* cx, MutableHandleObject result)

Nit: remove the outparam and return JSObject* (nullptr on failure)

@@ +249,5 @@
> +    resultPairObj->setDenseInitializedLength(2);
> +    resultPairObj->initDenseElement(0, NullValue());
> +    resultPairObj->initDenseElement(1, NullValue());
> +
> +    AddTypePropertyId(cx, resultPairObj, JSID_VOID, TypeSet::UnknownType());

Nit: Please add a brief comment before this line: // See comments in MapIteratorObject::next.

::: js/src/jit/CodeGenerator.cpp
@@ +5752,5 @@
> +        masm.storeValue(keyAddress, Address(result, elementsOffset), temp);
> +        masm.storeValue(valueAddress, Address(result, elementsOffset + sizeof(Value)), temp);
> +
> +        OutOfLineCode* ool = oolCallVM(MapIteratorResultPairPostBarrierInfo, lir, ArgList(result),
> +                                       StoreNothing());

Hm usually the callVM has to be the last thing we do for a LIR instruction. Also, this function can't GC so it seems more efficient to do a callWithABI to PostWriteBarrier. Then you also don't need to add MapIteratorResultPairPostBarrier.

Maybe we can share some code with LPostWriteBarrier, so we can just call callPostWriteBarrier(..)?

(It probably doesn't have to be an OOL path here, you could emit it here and then jump to or over it.)

::: js/src/jit/x86/MacroAssembler-x86.h
@@ +149,5 @@
>      }
>      void storeValue(ValueOperand val, BaseIndex dest) {
>          storeValue(val, Operand(dest));
>      }
> +    void storeValue(const Address& src, const Address& dest, Register temp) {

Nit: add these asserts:

MOZ_ASSERT(src.base != temp);
MOZ_ASSERT(dest.base != temp);
Attachment #8726166 - Flags: review?(jdemooij) → review+
Can PostWriteBarrier be used here?
I thought I should call PostWriteElementBarrier for 0th and 1st elements of resultPair object.
Calling PostWriteBarrier with resultPair object does same thing?
Flags: needinfo?(jdemooij)
Sorry, I re-read the implementation and noticed that they're same thing, as I'm going to call PostWriteElementBarrier for whole elements.
Flags: needinfo?(jdemooij)
opened to track regression from unboxed array.
I'm experiencing strange crash only on windows.
might take some time to investigate
So, the crash was caused by |offsetof(Range, ht)|, where ht is a reference member.
just changing |OrderedHashTable& ht;| to |OrderedHashTable* ht;| would be a simple fix for this.
Attachment #8731916 - Flags: review?(sphink)
Comment on attachment 8731916 [details] [diff] [review]
Part 0: Change OrderedHashTable::Range::ht member from a reference to a pointer to use offsetof.

Review of attachment 8731916 [details] [diff] [review]:

::: js/src/ds/OrderedHashTable.h
@@ +283,5 @@
>      class Range
>      {
>          friend class OrderedHashTable;
> +        OrderedHashTable* ht;

Can you add a comment // Cannot be a reference since we need to be able to do offsetof(Range, ht)
Attachment #8731916 - Flags: review?(sphink) → review+
Bug 1248289 - Part 0: Change OrderedHashTable::Range::ht member from a reference to a pointer to use offsetof. r=sfink
Bug 1248289 - Part 1: Inline _GetNextMapEntryForIterator intrinsic. r=jandem
For what it's worth, C++11 [support.types]p4:

The macro offsetof(type, member-designator) accepts a restricted set of type arguments in this International Standard. If type is not a standard-layout class (Clause 9), the results are undefined.

And [class]p7:

A standard-layout class is a class that:
— has no non-static data members of type non-standard-layout class (or array of such types) or reference,

So not only is MSVC permitted to do whatever it does for this particular offsetof, but it *could* have done arbitrarily stupid things for all the other offsetofs in this class, because of this single reference field.

Moral of the story: don't use references in code where offsetof is going to be used.  (Which basically means don't use references in anything the JITs might touch.)
Thank you for the info :D
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Depends on: 1264823
You need to log in before you can comment on or make changes to this bug.