BaselineJIT: Add IC stub for String_split

RESOLVED FIXED in Firefox 56

Status

()

enhancement
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: djvj, Assigned: djvj)

Tracking

(Blocks 2 bugs)

unspecified
mozilla56
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox56 fixed)

Details

(Whiteboard: [qf:p1])

Attachments

(3 attachments, 14 obsolete attachments)

50.10 KB, patch
tcampbell
: review+
Details | Diff | Splinter Review
13.27 KB, patch
djvj
: review+
Details | Diff | Splinter Review
4.24 KB, patch
djvj
: review+
Details | Diff | Splinter Review
Assignee

Description

2 years ago
This is another top hit on the tick counts when looking at time spent in builtins.

The String_split function is self-hosted, and fast-paths the |String.split(String)| case to the |StringSplitString| intrinsic.

The intrinsic has the usual value-unboxing and rooting overheads, but also spends a bunch of time identifying the right ObjectGroup to use for the resulting array, which involves a lookup on the script using a pc.  This can all be static-ized at stub-creation time, and passed in.

The kernel function for this fast-path already exists:
str_split_string(cx, group, string, sep, limit)

This can be easily be directly invoked from the baseline IC stub, skipping a bunch of overhead.
Assignee

Comment 1

2 years ago
Posted patch optimize-string-split.patch (obsolete) — Splinter Review
saving WIP patch.  I have the optimization criteria written out, and the basic CacheIR infrastructure started.
Assignee: nobody → kvijayan
Whiteboard: [qf] → [qf:p1]
Assignee

Comment 2

2 years ago
Posted patch optimize-string-split.patch (obsolete) — Splinter Review
Updated patch that should do all the things.  Compiles.  Gonna test then put up for review after fixing any (inevitable) bugs I find.
Attachment #8871458 - Attachment is obsolete: true
Assignee

Comment 3

2 years ago
Posted patch optimize-string-split.patch (obsolete) — Splinter Review
Updated patch that builds and seems to run fine in shell.  Still needs testing in try and in browser.  Need perf numbers as well.
Attachment #8872459 - Attachment is obsolete: true
Assignee

Comment 4

2 years ago
This patch seems to inhibit IonMonkey optimization of the call site.  Presumably because IonMonkey doesn't get type-information for StringSPlitString since it's never used (as it's bypassed by Baseline entirely), and probably never collects type-info.

So I updated the patch last night to just have baseline optimize the StringSplitString intrinsic, instead of the wrapping self-hosted function.  The problem of no-ion-optimization still shows up.  Looking at the MCallOptimization for IntrinsicStringSplitString, the type-info for the input arguments is not resolving to MIRType_String, and instead staying MIRType_Value.

Not sure why this is happening, but it's probably some weird interaction.
Assignee

Comment 5

2 years ago
Updated patch to optimize just the StringSplitString intrinsic in baseline.
Assignee

Comment 6

2 years ago
Figured out the issue.  Ion was using the inspected baseline stubs to do its MCallOptimize for StringSplit and constant-input StringSplit.  Since my updates removed its usage, it was failing.

The following steps fix the issue:

1. Update the baseline code to keep attaching the "ConstStringSplit" stub if possible.
2. Update the ion MCallOptimize code's non-const StringSplit optimization to not rely on the baseline stub existing, and instead just use the types of the inputs and callee function, as well as directly querying the objectgroup.

Patches incoming.
Assignee

Comment 7

2 years ago
Baseline string-split optimization patch.

Renames the old StringSplit stub to ConstStringSplit.  Adds a new CacheIR-based string splitting stub for non-const split optimization.
Attachment #8872767 - Attachment is obsolete: true
Attachment #8872987 - Attachment is obsolete: true
Assignee

Comment 8

2 years ago
Ion patch to optimize non-const StringSplit directly instead of relying on an ICCall_Native stub existing on the baseline IC chain.
Assignee

Comment 9

2 years ago
This patch adds a somewhat faster path for splitting strings when the splitter is a single char, and there is no limit on the size of the array.  This is by far the most common usage for the string split code.
Assignee

Comment 10

2 years ago
Naveed suggested using a single ObjectGroup for the results of StringSplitString.  This would allow us to make the group lookup constant across the interpeter, baseline fallback stub, and ion compilation.

Right now, we look up the ObjectGroup based on call-site, but the call-site for StringSplit is always one of 3 locations, all within self-hosted functions.  A hashtable lookup for each of these, and a different array ObjectGroup for each of them, doesn't really make sense.
Assignee

Comment 11

2 years ago
Updated patch to optimize non-const string split in baseline.
Attachment #8873506 - Attachment is obsolete: true
Assignee

Comment 12

2 years ago
Posted patch 2-update-ionmonkey.patch (obsolete) — Splinter Review
Updated patch to optimize non-const string splits in ion.
Attachment #8873507 - Attachment is obsolete: true
Assignee

Comment 13

2 years ago
Updated patch to fast-path single-char separator splitting in the VM.
Attachment #8873552 - Attachment is obsolete: true
Assignee

Comment 14

2 years ago
Patch to use a single ObjectGroup for all results of StringSplitString.
Assignee

Comment 15

2 years ago
With all of these patches together, I see about a 40% speedup in a string-split microbench (when running the benchmark with all jits enabled, not just baseline).
Assignee

Comment 17

2 years ago
Latest try run for this seems good.  Putting up for review.
Attachment #8873890 - Attachment is obsolete: true
Attachment #8874582 - Flags: review?(tcampbell)
Comment on attachment 8874582 [details] [diff] [review]
1-update-baseline-string-split.patch

Review of attachment 8874582 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great

::: js/src/jit/BaselineCacheIRCompiler.cpp
@@ +1832,5 @@
> +BaselineCacheIRCompiler::emitLoadStackValue()
> +{
> +    ValueOperand val = allocator.defineValueRegister(masm, reader.valOperandId());
> +    uint32_t idx = reader.uint32Immediate();
> +    Address addr(masm.getStackPointer(), allocator.stackPushed() + sizeof(void*) + idx*sizeof(Value));

CacheRegisterAllocator::addressOf

@@ +1944,5 @@
> +      case CacheKind::Call:
> +        MOZ_ASSERT(numInputs == 1);
> +        allocator.initInputLocation(0, R0.scratchReg(), JSVAL_TYPE_INT32);
> +#if defined(JS_NUNBOX32)
> +        // availableGeneralRegs can't know that GetName/BindName is only using

Nit: s|GetName/BindName|Call|

::: js/src/jit/BaselineIC.cpp
@@ +2422,4 @@
>  
>      // Try attaching a call stub.
>      bool handled = false;
> +    if (!optimizeAfterCall) {

MOZ_ASSERT(optStrategy == OptStrategy::None);

@@ +2472,5 @@
>          return false;
>  
> +    if (optimizeAfterCall && !handled && optStrategy != CallIRGenerator::OptStrategy::None) {
> +        if (gen.tryAttachStub()) {
> +            ICStub* newStub = AttachBaselineCacheIRStub(cx, gen.writerRef(), gen.cacheKind(),

Maybe a comment why this is after the operation? You gave me a reason but I've since forgotten.

::: js/src/jit/CacheIR.cpp
@@ +12,5 @@
>  #include "jit/BaselineIC.h"
>  #include "jit/CacheIRSpewer.h"
>  #include "jit/IonCaches.h"
>  
> +#include "vm/SelfHosting.h"

Should this be a few lines down?

::: js/src/jit/CacheIR.h
@@ +384,5 @@
>          MOZ_ASSERT(nextInstructionId_ > 0);
>          operandLastUsed_[opId.id()] = nextInstructionId_ - 1;
>      }
>  
> +    void writeStringLiteral(const char* str) {

My preference would be to just writePointer() since only use is debug tooling.

@@ +391,5 @@
> +            buffer_.writeByte((strWord >> (i * 8)) & 0xFF);
> +        }
> +    }
> +    void writeInt32Immediate(int32_t i32) {
> +        buffer_.writeFixedUint32_t(uint32_t(i32));

I wonder if we should just use writeUnsigned (compressed form). The raw bytes of CacheIR are used as a lookup key.

@@ +399,5 @@
> +    }
> +    void writePointer(void* ptr) {
> +        uintptr_t ptrWord = reinterpret_cast<uintptr_t>(ptr);
> +        for (unsigned i = 0; i < sizeof(uintptr_t); i++) {
> +            buffer_.writeByte((ptrWord >> (i*8)) & 0xFF);

Perhaps add to CompartBuffer so serdes code is in one place.

@@ +531,5 @@
>          buffer_.writeByte(uint32_t(kind));
>      }
> +    void guardIsNativeFunction(ObjOperandId obj, JSNative nativeFunc) {
> +        writeOpWithOperandId(CacheOp::GuardIsNativeFunction, obj);
> +        writePointer(reinterpret_cast<void*>(nativeFunc));

JS_FUNC_TO_DATA_PTR

@@ +964,5 @@
> +    void callPrintString(const char* str) {
> +        writeOp(CacheOp::CallPrintString);
> +        writeStringLiteral(str);
> +    }
> +    void breakpoint() {

Good idea.

@@ +1406,5 @@
> +    HandleValue callee_;
> +    HandleValue thisval_;
> +    HandleValueArray args_;
> +
> +    bool hasCachedStrategy_;

mozilla::Maybe?

::: js/src/jit/CacheIRCompiler.cpp
@@ +1415,5 @@
> +    const Class* clasp = &JSFunction::class_;
> +    masm.branchTestObjClass(Assembler::NotEqual, obj, scratch, clasp, failure->label());
> +
> +    // Ensure function native matches.
> +    masm.cmpPtr(Address(obj, JSFunction::offsetOfNativeOrScript()), ImmPtr(nativeFunc));

masm.branchPtr
Attachment #8874582 - Flags: review?(tcampbell) → review+
Assignee

Comment 19

2 years ago
Ok, so the second patch to update Ion to use groups was failing in try.  My patch updated the MStringSplit instruction to not take a template object, and instead use a computed group directly.  This interfered with the Recovery instruction for MStringSplit which expected the template object as an operand, so it could extract the group from it to pass to the str_split_string vm method.

MIR doesn't have good support for encoding ObjectGroups as constants, either, so it seemed unwieldy to just add the ObjectGroup as an MConstant operand to the MStringSplit instruction.

One easy way to resolve this is to pull forward the unify-string-split-groups patch, which changes the result of all |StringSplitString| calls to the same object group (cached on ObjectGroupCompartment).

So I merged the two patches and fixed up some things and jit-tests seems to be running through locally.
Assignee

Comment 20

2 years ago
Posted patch 2-update-inmonkey.patch (obsolete) — Splinter Review
Try run for new update-ion patch looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f2f13a77569a6e5d7f6ae28e4921a3a173179f6

Putting up for review.
Attachment #8873892 - Attachment is obsolete: true
Attachment #8873894 - Attachment is obsolete: true
Attachment #8875456 - Flags: review?(tcampbell)
Assignee

Comment 21

2 years ago
Updated patch for VM fast-path for string splitting using a single-char separator.  After some GC-related issues, and correctness issues that came up with the previous version, the new one seems to be passing reftests and green on try.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=19021365d9e8e087302ccfd91aa7bdf14595c467
Attachment #8873893 - Attachment is obsolete: true
Assignee

Updated

2 years ago
Attachment #8876239 - Flags: review?(jorendorff)
Comment on attachment 8875456 [details] [diff] [review]
2-update-inmonkey.patch

Review of attachment 8875456 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Add a few comments about where little fragments came from.

::: js/src/jit/MIR.h
@@ +7500,3 @@
>      public MixPolicy<StringPolicy<0>, StringPolicy<1> >::Data
>  {
> +    CompilerObjectGroup group_;

Bug 1301343 requires you to implement MStringSplit::appendRoots()

::: js/src/vm/ObjectGroup.cpp
@@ +1761,5 @@
> +    group = makeGroup(cx, clasp, tagged, /* initialFlags = */ 0);
> +    if (!group)
> +        return nullptr;
> +
> +    if (cx->options().unboxedArrays()) {

Can we have a comment referencing ObjectGroup::allocationSiteGroup or common up the code? The motivation is unclear in this context otherwise.

::: js/src/vm/ObjectGroup.h
@@ +622,5 @@
>  
>      // Table for referencing types of objects keyed to an allocation site.
>      AllocationSiteTable* allocationSiteTable;
>  
> +    // A single per-compartment ObjectGroup for all calls to StringSplitString.

Maybe clarify why? StringSplitString only used by self-hosted

@@ +623,5 @@
>      // Table for referencing types of objects keyed to an allocation site.
>      AllocationSiteTable* allocationSiteTable;
>  
> +    // A single per-compartment ObjectGroup for all calls to StringSplitString.
> +    ReadBarrieredObjectGroup stringSplitStringGroup;

In future would be nice to generalize this a bit for other cases. Maybe use something like mozilla::EnumeratedArray to bundle up different special groups.
Attachment #8875456 - Flags: review?(tcampbell) → review+
Assignee

Comment 23

2 years ago
(In reply to Ted Campbell [:tcampbell] from comment #18)
> Comment on attachment 8874582 [details] [diff] [review]
> 1-update-baseline-string-split.patch
> ::: js/src/jit/BaselineIC.cpp
> @@ +2422,4 @@
> >  
> >      // Try attaching a call stub.
> >      bool handled = false;
> > +    if (!optimizeAfterCall) {
> 
> MOZ_ASSERT(optStrategy == OptStrategy::None);

I would prefer not to add this assert.  It would technically be true now, but become not true once we add other Cache-IR paths for pre-call optimized stubs.  Assertions should be about logical invariants, and this feels to particular to circumstance.


> Maybe a comment why this is after the operation? You gave me a reason but
> I've since forgotten.

I added a simple comment.


> Should this be a few lines down?

This satisfies check_spidermonkey_style.py, and I got it to do so by following its directions.  I'm very wary of that script, and I do just enough to make it not shout at me.  Should it be lower?  I don't know.  What's the reasoning for ordering header includes?
Attachment #8874582 - Attachment is obsolete: true
Ah, you are correct about #includes. It should be above the inlines https://wiki.mozilla.org/JavaScript:SpiderMonkey:Coding_Style#Header_files . (I'd move it above the empty line though). r+
Attachment #8876285 - Flags: review+
Assignee

Comment 25

2 years ago
Updated ionmonkey patch with review comments addressed.
Attachment #8875456 - Attachment is obsolete: true
Assignee

Comment 26

2 years ago
Comment on attachment 8876288 [details] [diff] [review]
2-update-inmonkey.patch

Forwarding r+ from earlier review.
Attachment #8876288 - Flags: review+
Comment on attachment 8876239 [details] [diff] [review]
3-fastpath-string-split-single-char.patch

Review of attachment 8876239 [details] [diff] [review]:
-----------------------------------------------------------------

Nice! Just some drive-by comments :)

::: js/src/jsstr.cpp
@@ +2957,5 @@
> +{
> +    // Count the number of occurrences of patCh within text.
> +    uint32_t count = 0;
> +    for (const TextChar* cur = text; cur < text + textLen; cur++) {
> +        if (static_cast<char16_t>(*cur) == patCh)

Consider using FirstCharMatcherUnrolled and maybe FirstCharMatcher8bit here and below.

@@ +2973,5 @@
> +    // Step 13.
> +    size_t index = 0;
> +
> +    // Step 14.
> +    while (index != textLen) {

Maybe add a fast path here for count == 0?

@@ +2975,5 @@
> +
> +    // Step 14.
> +    while (index != textLen) {
> +        MOZ_ASSERT(lastEndIndex <= index);
> +        if (static_cast<char16_t>(text[index]) == patCh) {

What is perf like if we go through the string just once? If we're worried about realloc cost for splits.append we could overallocate.

@@ +3006,5 @@
> +        RootedValue strValue(cx, StringValue(cx->runtime()->emptyString));
> +        return NewCopiedArrayTryUseGroup(cx, group, &strValue.get(), 1);
> +    }
> +
> +    AutoStableStringChars linearChars(cx);

We should avoid AutoStableStringChars for fast paths like this, it's pretty slow because it has to copy inline strings and flattens external strings. What is perf like for a string of length 20 or so?

@@ +3013,5 @@
> +
> +    if (linearChars.isLatin1()) {
> +        return SplitSingleCharHelper(cx, str, linearChars.latin1Chars(), strLength, ch, group);
> +    } else {
> +        return SplitSingleCharHelper(cx, str, linearChars.twoByteChars(), strLength, ch, group);

Nit: no else after return
Comment on attachment 8876239 [details] [diff] [review]
3-fastpath-string-split-single-char.patch

Review of attachment 8876239 [details] [diff] [review]:
-----------------------------------------------------------------

Fun review! Thanks. r=me with comments addressed.

::: js/src/jsstr.cpp
@@ +2963,5 @@
> +    }
> +
> +    // Step 3 (reordered).
> +    AutoValueVector splits(cx);
> +    if (!splits.reserve(count))

count + 1, right?

@@ +2974,5 @@
> +    size_t index = 0;
> +
> +    // Step 14.
> +    while (index != textLen) {
> +        MOZ_ASSERT(lastEndIndex <= index);

This assertion seems overly obvious, but ok. Consider changing the loop to read:

    // Steps 13-14.
    for (size_t index = 0; index < textLen; index++) {
        ...
    }

Or if you want extra wonky code, you could change the condition to `splits.length() < count`... which then only needs to be checked on first entering the loop and when appending, and saves rescanning the last field of str. (Seems like a long shot whether that'll save measurably more time than you're already saving, but who knows, I wouldn't have guessed this was a bottleneck to begin with.)

@@ +2980,5 @@
> +            size_t subLength = size_t(index - lastEndIndex);
> +            JSString* sub = NewDependentString(cx, str, lastEndIndex, subLength);
> +            if (!sub || !splits.append(StringValue(sub)))
> +                return nullptr;
> +            lastEndIndex = index+1;

Style nit: spaces around `+`.

@@ +3001,5 @@
> +
> +    // Step 12.
> +    size_t strLength = str->length();
> +    if (strLength == 0) {
> +        // Splitting a zero-length string with a single-char yields [""]

Unless this is worth optimizing for, or the helper doesn't handle it correctly somehow, please delete this special case.
Attachment #8876239 - Flags: review?(jorendorff) → review+

Comment 29

2 years ago
Pushed by kvijayan@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7a9a6334ee2e
Add CacheIR stub for String_split. r=tcampbell

Comment 30

2 years ago
Pushed by kvijayan@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fc6159614e01
Unify StringSplitString ObjectGroup and fix Ion MCallOptimize. r=tcampbell
Assignee

Updated

2 years ago
Keywords: leave-open
Assignee

Comment 31

2 years ago
(In reply to Jan de Mooij [:jandem] from comment #27)
> Comment on attachment 8876239 [details] [diff] [review]
> 3-fastpath-string-split-single-char.patch
> 
> Review of attachment 8876239 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Nice! Just some drive-by comments :)
> 
> ::: js/src/jsstr.cpp
> @@ +2957,5 @@
> > +{
> > +    // Count the number of occurrences of patCh within text.
> > +    uint32_t count = 0;
> > +    for (const TextChar* cur = text; cur < text + textLen; cur++) {
> > +        if (static_cast<char16_t>(*cur) == patCh)
> 
> Consider using FirstCharMatcherUnrolled and maybe FirstCharMatcher8bit here
> and below.

I tried modifying the algorithm to use those and it was a tad slower.  I'm not sure why exactly, as I expect the unrolled loop to be fast, but the simple loop seems to do well even after I modifed my micro-bench to run through long strings.

> Maybe add a fast path here for count == 0?

Done.

> What is perf like if we go through the string just once? If we're worried
> about realloc cost for splits.append we could overallocate.

Tried skipping it, seems to make things slightly slower.  Also tried only doing pre-allocation for long input strings, but it still seemed worse.

> We should avoid AutoStableStringChars for fast paths like this, it's pretty
> slow because it has to copy inline strings and flattens external strings.
> What is perf like for a string of length 20 or so?

Perf is still improved compared to not having it.  The max inline string length is 16 bytes on 64-bit systems, that's not much to copy - two pointers worth.  I had a variant of the patch that used non-gc allocations to fallibly try the split, but switched to this because it seemed better.
(In reply to Kannan Vijayan [:djvj] from comment #31)
> Perf is still improved compared to not having it.  The max inline string
> length is 16 bytes on 64-bit systems, that's not much to copy - two pointers
> worth.

Fat inline strings store 24 bytes, but sure maybe it's okay. We can revisit if it shows up in profiles. Thanks for checking!
Assignee

Comment 34

2 years ago
(In reply to Jason Orendorff [:jorendorff] from comment #28)
> Fun review! Thanks. r=me with comments addressed.
> 
> ::: js/src/jsstr.cpp
> @@ +2963,5 @@
> > +    }
> > +
> > +    // Step 3 (reordered).
> > +    AutoValueVector splits(cx);
> > +    if (!splits.reserve(count))
> 
> count + 1, right?

Yep. Done.

> 
> @@ +2974,5 @@
> > +    size_t index = 0;
> > +
> > +    // Step 14.
> > +    while (index != textLen) {
> > +        MOZ_ASSERT(lastEndIndex <= index);
> 
> This assertion seems overly obvious, but ok. Consider changing the loop to
> read:
> 
>     // Steps 13-14.
>     for (size_t index = 0; index < textLen; index++) {
>         ...
>     }

Assertion removed and done.

> Or if you want extra wonky code, you could change the condition to
> `splits.length() < count`... which then only needs to be checked on first
> entering the loop and when appending, and saves rescanning the last field of
> str. (Seems like a long shot whether that'll save measurably more time than
> you're already saving, but who knows, I wouldn't have guessed this was a
> bottleneck to begin with.)

Feels too clever for my tastes.  The loop with some extra cleanup afterward is clunky, but more plain about its objective, IMHO.  I'd like to leave it as is, if you don't mind.

> Unless this is worth optimizing for, or the helper doesn't handle it
> correctly somehow, please delete this special case.

Yeah, removed.  The inner procedure handles the case fine, especially with zero-count opt added as suggested by jandem.
Assignee

Comment 35

2 years ago
Updated string split vm fastpath patch with review comments addressed.  Try run is here, looking reasonably green:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa4dec4031bbe27d7030bd66836d45583a875e61
Attachment #8876239 - Attachment is obsolete: true
Attachment #8878584 - Flags: review+

Comment 36

2 years ago
Pushed by kvijayan@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c291abc4ae2c
Optimize single-char splitting in VM. r=jorendorff.
Assignee

Updated

2 years ago
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED

Updated

2 years ago
Keywords: leave-open
Target Milestone: --- → mozilla56
You need to log in before you can comment on or make changes to this bug.