Optimize array_join

RESOLVED FIXED in Firefox 57



JavaScript Engine
29 days ago
8 days ago


(Reporter: djvj, Assigned: djvj)


(Blocks: 1 bug)


Firefox Tracking Flags

(firefox57 fixed)


(Whiteboard: [qf:p1])


(2 attachments, 3 obsolete attachments)



29 days ago
From the latest measurements after StringSplitString and Array_push optimizations (see bug 1365361 comment 8), array_join is leading the pack with 6x more time spent than the next builtin (RegExpMatcher).  Anecdotally, ehsan says array_join spends 22x more time than RegExpMatcher.

Ehsan noted that in his profiles, array_join was spending a lot of time calling in to |ValueToStringBufferSlow|.
Also note bug 1377343.
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #1)
> Also note bug 1377343.

Yeah I think a large portion of this is that bug and bug 1376948. In that case the time is not in join itself but in the ToPrimitive code under it, there are a number of ways to optimize that.
(In reply to Jan de Mooij [:jandem] from comment #2)
> (In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from
> comment #1)
> > Also note bug 1377343.
> Yeah I think a large portion of this is that bug and bug 1376948. In that
> case the time is not in join itself but in the ToPrimitive code under it,
> there are a number of ways to optimize that.

A significant amount of time ends up being spent in the prototype chain search in MaybeHasInterestingSymbolProperty() <https://searchfox.org/mozilla-central/rev/3a3af33f513071ea829debdfbc628caebcdf6996/js/src/jsobjinlines.h#274>.
Whiteboard: [qf] → [qf:p1]

Comment 4

23 days ago
Did some quick measurements of how we call array_join, using the only hammer I know: printf instrumentation ;)

Looking at speedometer, 67% of calls to array_join being arrays of pure strings.  48% of calls to array_push are arrays of strings of length <= 1.

Looking at general browsing, 93% of calls to array_join are with arrays of pure strings.  81% of calls to array_join are with arrays of strings of length <= 3.

First obvious step here is just to fastpath array_push when it's on a small array (<= 10 elements) of strings.  Basic strategy:

Stage 1:
Add a helper fastpath that checks for plain arrays of length <= 10, consisting only of strings.
Helper function scans array, confirms elements are all strings, and sum up their lengths while scanning.  Then we allocate a single result string and fill it.

Stage 2:
Add baseline call optimization that calls out to fastpath helper function.

Comment 5

23 days ago
Implemented the vm-only fastpaths yesterday and they yielded marginal improvements.  A couple percent or so total on a microbench specifically targeted at the optimization.  Most of the work being done in the join is result string allocation and copying the actual chardata, so there's not much to be gained from fastpathing the control logic around that.

We're skipping straight to stage 2: optimize this in baseline, and then Ion.

Ion handles array_join in MCallOptimize, but the codegen for the ArrayJoin IR instruction is just a vmcall into the array_join vmfunction.  This is the best option when strings are actually allocated, but not the best option in the high percentage cases above where the array is empty or contains a single string.  Eliminating the VMCall overhead for these would be a significant win, and requires no allocation by the VM.

We can optimize this in baseline and store information about whether we see 0-length or singleton-string arrays at the site.  Ion can use that as a hint to emit code for ArrayJoin that checks for empty or singleton-string arrays and handles them without vmcall.

Comment 6

22 days ago
Created attachment 8891076 [details] [diff] [review]

Prelim patch for optimizing join on empty and length-1 arrays in baseline.  debug build is passing reftests (90% complete so far, still running) on my box.

Comment 7

18 days ago
Created attachment 8892071 [details] [diff] [review]

This patch adds baseline stubs to optimize |array.join| for empty and singleton element arrays.  A microbench of joins on those types of arrays shows a speedup of 100x on the optimized cases.

Additional instrumented runs of general browsing and speedometer show the following:

In speedometer, about 45% of all calls to Array.join are with empty or one-element arrays.  In general browsing across top-tier sites, more than 80% of all calls to Array.join are with empty or one-element arrays.

Additional work: this check should be extended to Ion as well - and would yield significant benefits.
Attachment #8891076 - Attachment is obsolete: true
Attachment #8892071 - Flags: review?(tcampbell)

Comment 8

18 days ago
Try run here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6d32477040d3e20501dc3d50b1a850489bd21539

Overall should save us around 400ms (on my Ryzen 1600) across a full run of speedometer v2.

Comment 9

18 days ago
Just measured the calls to array.join from Ion in Speedometer2, and discovered that it is much more favourable to optimization: more than 95% of calls to Array.join that happen from within Ion are for arrays of length 0 and 1.

Google docs scrolling on page uses large numbers of Joins on arrays of length 1.  Otherwise in general browsing, array.join gets exercised less often overall across all sites, but shows up heavily on some sites.  Google docs hits it hard.

Overall ratio of 0-and-1-length arrays is 50%.

Fixing this in ion (handling 0 and 1 length cases) should save us about ~630k VMCalls on a single run of speedometer v2, and also 50% of all calls to Array.join during general browsing.

Comment 10

18 days ago
New try run with correct try syntax :)


Comment 11

18 days ago
Created attachment 8892114 [details] [diff] [review]

Preliminary ion optimization patch.

Comment 12

18 days ago
Created attachment 8892167 [details] [diff] [review]

Updated ion optimization patch.  This drops cost of joins about 66% on my targeted microbench.  It's the ion corollary to the baseline optimization.
Attachment #8892114 - Attachment is obsolete: true

Comment 13

18 days ago
Comment on attachment 8892167 [details] [diff] [review]

This looks mostly OK in try.  The scary oranges are not related or systemic it seems.  Probably good for review.

Attachment #8892167 - Flags: review?(tcampbell)
Perhaps I'm off-base, but it is my understanding that |argc| is a known quantity on the caller side. Therefore your tryAttachArrayJoin code should statically know if it is 0-args or 1-args (or unsupported). As a result, you should be able to use LoadStackValue instead of LoadStackValueDynamic. You would have two different stubs generated, but it seems better than what you have now.
Flags: needinfo?(kvijayan)

Comment 15

16 days ago
Technically, array_join with argc > 1 is fine.  Only the first arg is looked at, and it has to be a string.  The current approach catches all calls regardless of argc.

If we specialize on argc, we generate a different stub for each argc value.  This should only lead to the creation of 2 stubs most of the time, but I dislike leaving hidden cliffs around.  We'll come around a year from now and for some inscrutable reason there's some super-abstracted framework code that calls array.join() with all sorts of different argcounts.

I mean, I'm not married to this approach, but it seems like a reasonable goal to want to avoid that cliff.
Flags: needinfo?(kvijayan)
Comment on attachment 8892167 [details] [diff] [review]

Review of attachment 8892167 [details] [diff] [review]:

Nice and straight-forward :)

::: js/src/jit/CodeGenerator.cpp
@@ +9275,5 @@
> +
> +        // Check for length == 0
> +        Label notEmpty;
> +        masm.branch32(Assembler::NotEqual, length, Imm32(0), &notEmpty);
> +        const JSAtomState& names = GetJitContext()->runtime->names();

In Bug 1386646, we are changing these to |gen->runtime->names()|

@@ +9292,5 @@
> +        // At this point, 'output' can be used as a scratch register, since we're
> +        // guaranteed to succeed.
> +        Register str = masm.extractString(elem0, output);
> +        if (str != output)
> +            masm.movePtr(str, output);

|masm.unboxString(elem0, output);| should be sufficient

::: js/src/jit/shared/LIR-shared.h
@@ +6162,5 @@
>      const MArrayJoin* mir() const {
>          return mir_->toArrayJoin();
>      }
> +    const LDefinition* output() {
> +        return this->getDef(0);

Nit: drop the |this->| since the accessors below don't use it
Attachment #8892167 - Flags: review?(tcampbell) → review+
Comment on attachment 8892071 [details] [diff] [review]

Review of attachment 8892071 [details] [diff] [review]:

I'm not sold on your argument for dynamic argc and would prefer avoiding it. It is fixed per-callsite and we aren't giving this special argc treatment to any other call fast paths. If anything, I think your concerns of weird frameworks and cliffs is more relevant for spreadcalls where extra arguments wouldn't be surprising. Making spreadcalls better in baseline is a whole other problem.

We should probably plumb |constructing_| into CallIRGenerator and check if we don't support it. Code currently works, but I'd rather we by more explicit than just checking |thisv.isObject|.

Do we need new testcases? (eg, does jit-test still pass if you comment out individual guards?)

::: js/src/jit/BaselineCacheIRCompiler.cpp
@@ +2115,5 @@
> +    //  Callee
> +    //  ThisValue
> +    //  Arg0  |
> +    //  ...   |-- args.
> +    //  ArgN  | <-- Top of stack.

If we were to use this helper under JSOP_NEW, there would be a |new.target| on stack. At very least need a comment here and at loadStackValueDynamic special handling needed if we may be constructing.

::: js/src/jit/CacheIR.cpp
@@ +3898,5 @@
>  }
>  bool
> +CallIRGenerator::tryAttachArrayJoin()
> +{

Let's add constructing_ to CallIRGenerator and then assert it is false here and don't try attaching for constructing. Your check for thisval_.isObject() currently prevents problems, but seems like a landmine since thisval.isObject doesn't imply !constructing_ in Ion (template objects).

@@ +3926,5 @@
> +    AutoAssertNoPendingException aanpe(cx_);
> +    Int32OperandId argcId(writer.setInputOperandId(0));
> +
> +    // Ensure argc < 1.
> +    writer.guardSpecificInt32Immediate(argcId, 1, Assembler::LessThanOrEqual);

Based on this, we wouldn't handle argc > 1 anyways.

@@ +3959,5 @@
> +
> +    trackAttached("ArrayJoin");
> +    return true;
> +
> +    return false;


::: js/src/jit/CacheIR.h
@@ +582,4 @@
>          writeOpWithOperandId(CacheOp::GuardSpecificSymbol, sym);
>          addStubField(uintptr_t(expected), StubField::Type::Symbol);
>      }
> +    void guardSpecificInt32Immediate(Int32OperandId operand, int32_t expected,

Good addition

::: js/src/jit/MacroAssembler.h
@@ +1215,4 @@
>      inline void branchTestBoolean(Condition cond, const ValueOperand& value, Label* label)
>          DEFINED_ON(arm, arm64, mips32, mips64, x86_shared);
> +    inline void branchTestString(Condition cond, const Address& address, Label* label) PER_SHARED_ARCH;

Make sure this passes configurations on try. Win32 is failing to build for me.
Attachment #8892071 - Flags: review?(tcampbell)

Comment 18

10 days ago
Created attachment 8895013 [details] [diff] [review]

Updated patch.  Should address issues.
Attachment #8892071 - Attachment is obsolete: true
Attachment #8895013 - Flags: review?(tcampbell)
Comment on attachment 8895013 [details] [diff] [review]

Review of attachment 8895013 [details] [diff] [review]:

Looks good. Make sure to handle the JSOP_CALL_IGNORES_RV case too.

::: js/src/jit/CacheIR.cpp
@@ +3940,5 @@
> +    //  1: ThisValue
> +    //  0: Arg0 [optional] <-- Top of stack.
> +
> +    // Guard callee is the |js::array_join| native function.
> +    uint32_t calleeIndex = (argc_ == 0) ? 1 : 2;


@@ +3978,4 @@
>  CallIRGenerator::tryAttachStub()
>  {
> +    // Only optimize on JSOP_CALL.  No fancy business for now.
> +    if (op_ != JSOP_CALL)

|| JSOP_CALL_IGNORES_RV. Otherwise we regress Array.prototype.push
Attachment #8895013 - Flags: review?(tcampbell) → review+

Comment 20

9 days ago
This looks clean.  Try run is somewhat dirty but it's due to general try issues and other problems at tree tip.


Pushing the baseline patch now.  Adding leave-open tag to allow for landing of ion optimization patch.

Comment 21

9 days ago
Pushed by kvijayan@mozilla.com:
Optimize Array.join in baseline for empty and single-item arrays. r=tcampbell


9 days ago
Keywords: leave-open

Comment 22

9 days ago
Pushed by kvijayan@mozilla.com:
Optimize Array.join in ion for empty and single-item arrays. r=tcampbell
Assignee: nobody → kvijayan
Keywords: leave-open

Comment 23

8 days ago
Last Resolved: 8 days ago
status-firefox57: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in before you can comment on or make changes to this bug.