Closed Bug 977966 Opened 6 years ago Closed 5 years ago

Optimize String#split('foo').join('bar') pattern (IonMonkey part)

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla34

People

(Reporter: till, Assigned: vash, Mentored)

References

Details

(Whiteboard: [lang=c++][js:p2])

Attachments

(4 files, 20 obsolete files)

3.55 KB, patch
nbp
: review+
Details | Diff | Splinter Review
25.46 KB, patch
vash
: review+
Details | Diff | Splinter Review
9.11 KB, patch
till
: review+
Details | Diff | Splinter Review
1.97 KB, patch
till
: review+
Details | Diff | Splinter Review
Splitting a string on one string pattern and joining it using another is a very common practice. I wonder if it might be possible to check if both split and join are the original methods, and the argument to split is a string, and optimize the entire pattern to get rid of the temporary array creation entirely.
(In reply to Till Schneidereit [:till] (on vacation March 1st to 23rd) from comment #0)
> check if both split and join are the original methods

Yes, this would be in js/src/jit/MCallOptimize.cpp (we already have MStringSplit, we would need to add MArrayJoin), and then in a later optimization, we can fold the 2 into one MStringReplace instruction.  Note that such optimization will only work with, 

  str.split("foo").join("bar")

but not with

  var x = str.split("foo");
  …
  x.join("bar");

because we might have other instructions capturing the state of x with an MResumePoints, and this kind of optimization cannot be done before Bug 878503.

So, when doing the replacement of these instructions, we would need to check that there is only one MUse of the result of the split instruction.
Whiteboard: [mentor=nbp][lang=c++]
(In reply to Nicolas B. Pierron [:nbp] from comment #1)
> Note that such optimization will only work with, 
> 
>   str.split("foo").join("bar")
> 
> but not with
> 
>   var x = str.split("foo");
>   …
>   x.join("bar");
> 
> because we might have other instructions capturing the state of x with an
> MResumePoints, and this kind of optimization cannot be done before Bug
> 878503.
> 
> So, when doing the replacement of these instructions, we would need to check
> that there is only one MUse of the result of the split instruction.

Yeah, that's what I thought. Still, the exact pattern of using split and join chained together is very popular, so this should give some nice wins in real-world code.
Whiteboard: [mentor=nbp][lang=c++] → [mentor=nbp][lang=c++][js:p2]
can you assign me the bug please ?
hi Aditya: please feel free to work on this bug or ask questions. Assigning a bug to a person is just a formality.

I see you have volunteered to fix four other bugs yesterday. I recommend focusing on just one first bug at a time. :)
Hi, I would like to work on this bug but I'm completely new to Mozilla development. I have completed setting up mozilla environment, so can you tell me how to go about it? Thank you.
(In reply to Debkanya Mazumder from comment #5)
> Hi, I would like to work on this bug but I'm completely new to Mozilla
> development. I have completed setting up mozilla environment, so can you
> tell me how to go about it? Thank you.

Hi Debkanya,

I see that you ask to be mentored on old bugs, I suggest you focus on this one for now.

This bug should be easy as soon as you understand how MStringSplit and MStringReplace are working.  I suggest you to work as follow:

 1/ Make a first patch to implement MArrayJoin.
 2/ Make a second patch to replace MStringSplit & MArrayJoin by MStringReplace.
Assignee: general → rimjhim.mazumder17
Status: NEW → ASSIGNED
Hi Nicolas, 

Thank you. I intend to work on this bug for now, and I appreciate all the help since I am completely new. I will start working on the bug and get back as soon as possible.
Hey Debkanya, are you still working on this? If so, can you give us an update on where you are, or, if you're stuck, what's giving you problems? We'd gladly support you with feedback and suggestions.

You can also join us on IRC (https://wiki.mozilla.org/IRC) in the #jsapi channel, where we can discuss things in realtime.
Flags: needinfo?(rimjhim.mazumder17)
I have a doubt which I'd like to be answered. Is it necessary to implement StringReplace using MArrayJoin? 
Can it be implemented using the following procedure? 

Considering all the "foo" in str is to be replaced by "/". 

string str="barfoobarfooulz";
string str2="foo";

found=str.find(str2);  [This returns all the position of "foo" in str] 

str.replace(found,str2.length(),"/"); [replaces "foo" with "/"]  

which gives the result bar/bar/ulz. 

Can this MStringReplace be implemented using the above method?
Flags: needinfo?(rimjhim.mazumder17)
(In reply to Debkanya Mazumder from comment #9)
> I have a doubt which I'd like to be answered. Is it necessary to implement
> StringReplace using MArrayJoin? 
> Can it be implemented using the following procedure? 
> 
> Considering all the "foo" in str is to be replaced by "/". 
> 
> string str="barfoobarfooulz";
> string str2="foo";
> 
> found=str.find(str2);  [This returns all the position of "foo" in str] 
> 
> str.replace(found,str2.length(),"/"); [replaces "foo" with "/"]  
> 
> which gives the result bar/bar/ulz. 
> 
> Can this MStringReplace be implemented using the above method?

Hi Debkanya, you could give it a try if you want, maybe using a self-hosted implementation (i.e. written in JS, see also [1]), but the performance of our RegExp engine is such that I doubt it will be a win and would rather implement the approach suggested by nbp in comment 6. Ideally, we'd try both techniques, compare them when being run on micro-benchmarks (and / or tracked benchmarks, as shown in [2]), and would choose the best among these. In practice, implementing both of them will take much time, while implementing directly the suggested approach will surely be a win.

[1] https://developer.mozilla.org/en-US/docs/SpiderMonkey/Internals/self-hosting
[2] http://arewefastyet.com/
What we want here is a method that is substantially faster than what we have right now. The method you're proposing implements the split().join() pattern in terms of other language features, but, as far as I can tell, in a way that would be faster than what we have now.

Note also that String#replace doesn't work the way your example would need it to work: it takes the string to search for (or a regular expression) as its first argument and the string to replace it with as the second: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace

More importantly, we already have an implementation of MStringReplace, so the task here is to identify the pattern "MStringSplit, immediately followed by MArrayJoin", and completely replace that with MStringReplace. Implementing MArrayJoin is a good first step for that. (And useful in itself, I think.)
Ah, I didn't see bbouvier's comment before posting mine. I still don't think that it makes sense to implement anything else but the technique suggested by nbp.
Okay, thank you!
Is the use of vector header file from standard library allowed when implementing MArrayJoin in MCallOptimize.cpp?
Attachment #8413341 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8413341 [details] [diff] [review]
rev1 - Implemented MArrayJoin in MCallOptimize.CPP.

Review of attachment 8413341 [details] [diff] [review]:
-----------------------------------------------------------------

IonMonkey is using SSA form [1], where the graph is composed of MBasicBlock which contains a list of MDefinitions.  One MDefinition corresponds to one assignment, and also one function.
MArrayJoin is supposed to be the abstract representation of the Array.join JavaScript function:

  var x = arr.join("/");  // In JavaScript

  // In MCallOptimize.cpp
  MDefinition *array = …;
  MDefinition *str = …;
  […]

  MDefinition *join = MArrayJoin::New(alloc(), array, str);
  […]

[1] http://en.wikipedia.org/wiki/Static_single_assignment_form

::: js/src/jit/MCallOptimize.cpp
@@ +1876,5 @@
>      return InliningStatus_Inlined;
>  }
>  
> +class MArrayJoin
> +{

Have a look at other function call optimizations on Arrays, such as MArrayPush [2], which is also applied on an Array and also take one argument.

[2] http://dxr.mozilla.org/mozilla-central/source/js/src/jit/MIR.h?from=MArrayPush#6257
Attachment #8413341 - Flags: review?(nicolas.b.pierron) → review-
Mentor: nicolas.b.pierron
Whiteboard: [mentor=nbp][lang=c++][js:p2] → [lang=c++][js:p2]
Attached patch rev2 - Implement MArrayJoin (obsolete) — Splinter Review
Attachment #8448693 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8448693 [details] [diff] [review]
rev2 - Implement MArrayJoin

Review of attachment 8448693 [details] [diff] [review]:
-----------------------------------------------------------------

I cannot give a review because this patch is incomplete.
There is no code to produce a MArrayJoin (MCallOptimize.cpp), and no code to lower the MIR to a LIR instruction (Lowering.cpp & LIR-Common.h) and no code to generate the code (CodeGenerator.cpp) which needs to be executed for such instruction.
Still this is a good step forward compare to the previous patch :)

::: js/src/jit/LOpcodes.h
@@ +133,5 @@
>      _(ConcatPar)                    \
>      _(CharCodeAt)                   \
>      _(FromCharCode)                 \
>      _(StringSplit)                  \
> +	_(ArrayJoin)                    \

style-nit: avoid tabs.

::: js/src/jit/MCallOptimize.cpp
@@ -1958,5 @@
>  
> -class MArrayJoin
> -{
> -    vector<string> str;
> -    string res;

Do not base your changes on a bad patch.

::: js/src/jit/MIR.h
@@ +4488,5 @@
> +      : MBinaryInstruction(object, sep),
> +        typeObject_(templateObject->type())
> +    {
> +        setResultType(MIRType_String);
> +		setMovable(); 

style-nit: avoid tabs and trailing whitespace.

@@ +4517,5 @@
> +    bool possiblyCalls() const {
> +        return true;
> +    }
> +    AliasSet getAliasSet() const {
> +        return AliasSet::None();

I would be surprised if this instruction does not alias anything.  AliasSet::None() would mean that this instruction does not make any side effect (AliasSet::Store(…)) and that it does not depends on any side effects (AliasSet::Load(…)).  Having an AliasSet::None() would mean that we could hoist MArrayJoin before a function call.

  function mutateArray(arr) {
    arr[0] = "H";
    arr[1] = "e";
  }

  function f(c) {
    var arr = ["B", "a", "l", "l", "o"];
    mutateArray(arr);
    return arr.join(""); // == "Hello"
  }

::: js/src/jit/MOpcodes.h
@@ +74,5 @@
>      _(ConcatPar)                                                            \
>      _(CharCodeAt)                                                           \
>      _(FromCharCode)                                                         \
>      _(StringSplit)                                                          \
> +	_(ArrayJoin)                                                            \

style-nit: avoid tabs.

::: js/src/jit/ParallelSafetyAnalysis.cpp
@@ +169,5 @@
>      SAFE_OP(ConcatPar)
>      UNSAFE_OP(CharCodeAt)
>      UNSAFE_OP(FromCharCode)
>      UNSAFE_OP(StringSplit)
> +	UNSAFE_OP(ArrayJoin)

style-nits: avoid tabs.
Attachment #8448693 - Flags: review?(nicolas.b.pierron) → feedback+
How do I implement the changes without the bad patch?
(In reply to Debkanya Mazumder from comment #19)
> How do I implement the changes without the bad patch?

If you are using mercurial queues, you need to delete the patch [1].

[1] https://developer.mozilla.org/en-US/docs/Mercurial_Queues
Thank you!
Attached patch Implement MArrayJoin. (obsolete) — Splinter Review
Attachment #8459333 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8459333 [details] [diff] [review]
Implement MArrayJoin.

Review of attachment 8459333 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/CodeGenerator.cpp
@@ +6091,5 @@
> +{
> +    pushArg(ToRegister(lir->separator()));
> +    pushArg(ToRegister(lir->array()));
> +
> +    return callVM(ArrayJoinInfo, lir);

Can you make tests for this code and add a helper function, such as the arguments types are matching the argument type of the function.

> typedef bool(*ArrayJoinFn)(JSContext *, unsigned, Value *);

  ToRegister(lir->array())  is not supposed to contain an  unsigned.
  ToRegister(lir->separator())  is not supposed to contain an  Value *.

::: js/src/jit/LIR-Common.h
@@ +4223,5 @@
>      }
>  };
>  
> +class LArrayJoin : public LCallInstructionHelper<1, 2, 1>
> +{

Add a TypePolicy to ensure that the arguments are an Object and a String.

::: js/src/jit/Lowering.cpp
@@ +2763,5 @@
> +LIRGenerator::visitArrayJoin(MArrayJoin *ins)
> +{
> +    JS_ASSERT(ins->type() == MIRType_String);
> +    JS_ASSERT(ins->array()->type() == MIRType_Object);
> +    JS_ASSERT(ins->sep()->type() == MIRType_String);

As there is no type policy (see comment above), these assertions do not hold.

  Array.join.call("not an Array", { notAString: true });
Attachment #8459333 - Flags: review?(nicolas.b.pierron)
Attachment #8459333 - Flags: feedback+
Attached patch patch.v2.diff (obsolete) — Splinter Review
Attachment #8459333 - Attachment is obsolete: true
Attachment #8460510 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8460510 [details] [diff] [review]
patch.v2.diff

Review of attachment 8460510 [details] [diff] [review]:
-----------------------------------------------------------------

I am ok, with the Ion part of the patch, but I am not a specialist of the Array part, Waldo might know better.
In Any case, I think it would be better to just provide an _impl function used by Ion, which share the same backend, even if we have to duplicate a bit of the logic to get the length.

::: js/src/jit-test/tests/ion/bug977966.js
@@ +19,5 @@
> +var uceFault_join = eval(uneval(uceFault).replace('uceFault', 'uceFault_join'))
> +function join(i) {
> +    var x = [i, i].join("->");
> +    if (uceFault_join(i) || uceFault_join(i)) {
> +        assertEq(x, "99->99");

You don't need ucefault for these test cases, unless you are making attempts at handling the equivalent recover instruction.

::: js/src/jsarray.cpp
@@ -1044,4 @@
>  {
>      // This method is shared by Array.prototype.join and
>      // Array.prototype.toLocaleString. The steps in ES5 are nearly the same, so
>      // the annotations in this function apply to both toLocaleString and join.

Please, try to keep this comment valid.

@@ -1048,5 @@
>  
>      // Step 1
> -    RootedObject obj(cx, ToObject(cx, args.thisv()));
> -    if (!obj)
> -        return false;

http://www.ecma-international.org/ecma-262/5.1/#sec-15.4.4.5

@@ +1127,5 @@
> +
> +
> +    RootedLinearString sepstr(cx);
> +    if (!Locale && args.hasDefined(0)) {
> +        JSString *s = ToString<CanGC>(cx, args[0]);

This does not have the same semantic as specified in the ecma standard.  The access of GetLengthProperty is observable.

Add a test case which break with the current proposal, where the behaviour of ToString(separator) depends on the call of GetLengthProperty being done before.  Something similar to:

  var lengthWasCalled = false;
  var obj = {"0": "", "1": ""};
  Object.defineProperty(obj, "length", {
    get : function(){ lengthWasCalled = true; return 2; },
    enumerable : true,
    configurable : true
  });

  var res = Array.join.call(obj, { valueOf: function () {
    if (lengthWasCalled)
      return "good";
    else
      return "bad";
  } })

  assertEq(res, "good");
Attachment #8460510 - Flags: review?(nicolas.b.pierron) → feedback+
Attached patch patch.v3.diff (obsolete) — Splinter Review
in array_join_impl, sep is well defined while array isn't.
Attachment #8466943 - Flags: feedback?(nicolas.b.pierron)
Attached patch patch.v4.diff (obsolete) — Splinter Review
Attachment #8466950 - Flags: feedback?(nicolas.b.pierron)
Attachment #8466943 - Attachment is obsolete: true
Attachment #8466943 - Flags: feedback?(nicolas.b.pierron)
Attachment #8460510 - Attachment is obsolete: true
Comment on attachment 8466950 [details] [diff] [review]
patch.v4.diff

Review of attachment 8466950 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +6862,5 @@
>  
> +class MArrayJoin
> +    : public MBinaryInstruction,
> +      public ObjectPolicy<0>,
> +      public StringPolicy<1>

Use a MixPolicy.

@@ +6871,5 @@
> +        setResultType(MIRType_String);
> +    }
> +  public:
> +    INSTRUCTION_HEADER(ArrayJoin)
> +    static MArrayJoin *New(TempAllocator &alloc, MDefinition *array, MDefinition *sep)

There this no typePolicy function on the MIR Instruction, add one.  This will highlight the above error also.

::: js/src/jsarray.cpp
@@ +1045,5 @@
>      // This method is shared by Array.prototype.join and
>      // Array.prototype.toLocaleString. The steps in ES5 are nearly the same, so
>      // the annotations in this function apply to both toLocaleString and join.
>  
> +    // Steps 1 to 6, done by called function

comment: should be done by the caller (?)

@@ +1205,5 @@
>      return ArrayJoin<false>(cx, args);
>  }
>  
> +JSString *
> +js::array_join_impl(JSContext *cx, HandleValue array, HandleString sep)

Move this function to VMFunction.cpp.

@@ +1209,5 @@
> +js::array_join_impl(JSContext *cx, HandleValue array, HandleString sep)
> +{
> +    // This method is shared by Array.prototype.join and
> +    // Array.prototype.toLocaleString. The steps in ES5 are nearly the same, so
> +    // the annotations in this function apply to both toLocaleString and join.

This method is not used for Array.prototype.toLocaleString.
Attachment #8466950 - Flags: feedback?(nicolas.b.pierron) → feedback+
Attached patch patch.v5.diff (obsolete) — Splinter Review
Attachment #8466950 - Attachment is obsolete: true
Attachment #8467026 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8467026 [details] [diff] [review]
patch.v5.diff

Review of attachment 8467026 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/Lowering.cpp
@@ +2766,5 @@
> +    JS_ASSERT(ins->array()->type() == MIRType_Object);
> +    JS_ASSERT(ins->sep()->type() == MIRType_String);
> +
> +    LArrayJoin *lir = new(alloc()) LArrayJoin(useFixed(ins->array(), CallTempReg1),
> +                                              useFixed(ins->sep(), CallTempReg2));

CallTempReg should no longer be added, use GetTempRegForIntArg instead.

::: js/src/jit/MIR.h
@@ +6887,5 @@
> +    bool possiblyCalls() const {
> +        return true;
> +    }
> +    virtual AliasSet getAliasSet() const {
> +        return AliasSet::None();

This is incorrect.
This instruction is reading the content of an array.
Attachment #8467026 - Flags: review?(nicolas.b.pierron) → feedback+
Attached patch patch.v6.diff (obsolete) — Splinter Review
Attachment #8467026 - Attachment is obsolete: true
Attachment #8467062 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8467062 [details] [diff] [review]
patch.v6.diff

Review of attachment 8467062 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work :)
Upload a new version with the fix style issue and let the second part of the bug done ;)

::: js/src/jit/Lowering.cpp
@@ +2765,5 @@
> +    JS_ASSERT(ins->type() == MIRType_String);
> +    JS_ASSERT(ins->array()->type() == MIRType_Object);
> +    JS_ASSERT(ins->sep()->type() == MIRType_String);
> +
> +    LArrayJoin *lir = new(alloc()) LArrayJoin(useRegisterAtStart(ins->array()), useRegisterAtStart(ins->sep()));

style-nit: wrap this line in two (100 columns max).
Attachment #8467062 - Flags: review?(nicolas.b.pierron) → review+
Attached patch patch.v7.diff (obsolete) — Splinter Review
Forgot to change Store to Load in MIR.h. Fixed it with the style fix.
Attachment #8467062 - Attachment is obsolete: true
Attachment #8467080 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8467080 [details] [diff] [review]
patch.v7.diff

Review of attachment 8467080 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/CodeGenerator.cpp
@@ +10,5 @@
>  #include "mozilla/Attributes.h"
>  #include "mozilla/DebugOnly.h"
>  #include "mozilla/MathAlgorithms.h"
>  
> +#include "jsarray.h"

nit: this include is no longer needed as jit::ArrayJoin is in VMFunctions.h
Attachment #8467080 - Flags: review?(nicolas.b.pierron) → review+
Attached patch patch.v8.diff (obsolete) — Splinter Review
Attachment #8467080 - Attachment is obsolete: true
Attachment #8467090 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8467090 [details] [diff] [review]
patch.v8.diff

Review of attachment 8467090 [details] [diff] [review]:
-----------------------------------------------------------------

R+ from the previous patch.
Attachment #8467090 - Flags: review?(nicolas.b.pierron) → review+
Attached patch patch.v9.diff (obsolete) — Splinter Review
Remove the resume point IonBuilder::inlineArrayJoin as discussed with nbp.
Attachment #8467090 - Attachment is obsolete: true
Attachment #8467731 - Flags: review+
Attached patch patch.p2.diff (obsolete) — Splinter Review
Second part of the bug.
Attachment #8467746 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8467746 [details] [diff] [review]
patch.p2.diff

Review of attachment 8467746 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, but they are still some corner cases that we want to avoid, as the array is mutable.

::: js/src/jit/MIR.cpp
@@ +3242,5 @@
>  
> +MDefinition *
> +MArrayJoin::foldsTo(TempAllocator &alloc) {
> +    MDefinition *arr = array();
> +    if (arr->isStringSplit()) {

We only want to do so if MArrayJoin is the only use of the array.

function split_join_2(i) {
  var x = (i + "-" + i).split("-")
  x.push("" + i);
  var res = x.join("->");
  assertEq(x, i + "->" + i + "->" + i);
  return i;
}

::: js/src/jsarray.cpp
@@ +1087,5 @@
>                  return nullptr;
>          } else {
>              CharSeparatorOp<jschar> op(c);
>              if (!ArrayJoinKernel<Locale>(cx, op, obj, length, sb))
> +                return nullptr;

This should be part of the previous patch.
Attachment #8467746 - Flags: review?(nicolas.b.pierron) → feedback+
Attached patch patch.v10.diff (obsolete) — Splinter Review
Move a shunk to this commit.
Attachment #8467731 - Attachment is obsolete: true
Attachment #8468312 - Flags: review+
Attached patch patch.p2.v2.diff (obsolete) — Splinter Review
Attachment #8467746 - Attachment is obsolete: true
Attachment #8468315 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8468315 [details] [diff] [review]
patch.p2.v2.diff

Review of attachment 8468315 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome \o/
Upload the new patch with the additional test case and I will send both patches to the try server later today.

::: js/src/jit/MIR.cpp
@@ +3248,5 @@
> +        MDefinition *pattern = arr->toStringSplit()->separator();
> +        MDefinition *replacement = sep();
> +
> +        return MStringReplace::New(alloc, string, pattern, replacement);
> +    }

nit: add a comment to mention that the MStringSplit can be recovered on bailout as we are removing its last remaining use, which implies the a.split(b).join(c) is replaced by a.replace(b, c), but if the result of the split is captured by resume points then it will be executed on the bailout path.

Also add a test case to test that we will recover the result of MStringSplit, such as:

function resumeHere() { bailout(); }
function split_join_3(i) {
  var x = (i + "-" + i).split("-");
  resumeHere();
  var res = x.join("->");
  assertEq(res, i + "->" + i);
  return i;
}
Attachment #8468315 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8468315 [details] [diff] [review]
patch.p2.v2.diff

Review of attachment 8468315 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry, I missed on of the corner cases in the previous review, where the MStringSplit will not be removed.

::: js/src/jit/MIR.cpp
@@ +3242,5 @@
>  
> +MDefinition *
> +MArrayJoin::foldsTo(TempAllocator &alloc) {
> +    MDefinition *arr = array();
> +    if (arr->isStringSplit() && arr->hasOneDefUse()) {

Note: we also want to check that the string split is not captured by an observable operand of a resume point, as we do before optimizing with recover instructions.  Otherwise we might end up running the String split in addition to the String replace.  An example of such case would be:

function trip(i) {
  if (i == 99)
    assertEq(myjoin.arguments[0][0], "" + i)
}
function myjoin(i, x) {
  trip(i);
  return x.join("->");
}
function split_join_4(i) {
  var x = (i + "-" + i).split("-");
  var res = myjoin(x);
  assertEq(res, i + "->" + i);
  return i;
}

arguments are observable, as somebody can ask the argument object which captures the result of the string split.  This is limitation of the recover instructions that we would have to remove later.
Attachment #8468315 - Flags: review+ → feedback+
Attached patch patch.p2.v3.diff (obsolete) — Splinter Review
Attachment #8468378 - Flags: review?(nicolas.b.pierron)
Attachment #8468315 - Attachment is obsolete: true
Comment on attachment 8468378 [details] [diff] [review]
patch.p2.v3.diff

Review of attachment 8468378 [details] [diff] [review]:
-----------------------------------------------------------------

see comment 43.

You might also want to report the performance improvement on a micro-benchmark dedicated to this optimization, see
  https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Hacking_Tips#Benchmarking_with_sub-milliseconds_%28JS_shell%29
Attachment #8468378 - Flags: review?(nicolas.b.pierron)
Attached patch patch.p2.v4.diff (obsolete) — Splinter Review
Attachment #8468378 - Attachment is obsolete: true
Attachment #8468460 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8468460 [details] [diff] [review]
patch.p2.v4.diff

Review of attachment 8468460 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.cpp
@@ +3247,5 @@
> +        // We're replacing foo.split(bar).join(baz) by
> +        // foo.replace(bar, baz).  MStringSplit could be recovered by
> +        // a bailout.  As we are removing its last use, and its result
> +        // could be captured by a resume point, this MStringSplit will
> +        // be executed on the bailout path.

This comment does not match the condition which is above.  If there is only one use, then the String split is not captured in any resume point, and only used as argument of MArrayJoin, which is fine for what this bug is aiming at.

So either update the comment to reflect that MStringSplit is only by the MArrayJoin and nothing else, or use hasLiveDefUse() after flagging the current instruction as recovered on bailout (and reseting the flag after if not folded)
Attachment #8468460 - Flags: review?(nicolas.b.pierron)
Attached patch patch.p2.v5.diff (obsolete) — Splinter Review
Attachment #8468460 - Attachment is obsolete: true
Attachment #8468517 - Flags: review?(nicolas.b.pierron)
Attached patch patch.p2.v6.diff (obsolete) — Splinter Review
Attachment #8468517 - Attachment is obsolete: true
Attachment #8468517 - Flags: review?(nicolas.b.pierron)
Attachment #8468527 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8468527 [details] [diff] [review]
patch.p2.v6.diff

Review of attachment 8468527 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is the same as the previous patch.

::: js/src/jit/MIR.cpp
@@ +3245,5 @@
> +    MDefinition *arr = array();
> +    if (!arr->isStringSplit())
> +        return this;
> +
> +    bool wasRecoverOnBailout = arr->isRecoveredOnBailout();

here:
  arr->isRecoveredOnBailout() == false.

@@ +3248,5 @@
> +
> +    bool wasRecoverOnBailout = arr->isRecoveredOnBailout();
> +    arr->setRecoveredOnBailout();
> +    if (!arr->hasLiveDefUses()) {
> +        // We're replacing foo.split(bar).join(baz) by

un set the recovered on bailout flag after this condition.
Attachment #8468527 - Flags: review?(nicolas.b.pierron)
Attached patch patch.p2.v7.diff (obsolete) — Splinter Review
Attachment #8468527 - Attachment is obsolete: true
Attachment #8469081 - Flags: review?(nicolas.b.pierron)
Attached patch patch.p2.v8.diff (obsolete) — Splinter Review
Attachment #8469081 - Attachment is obsolete: true
Attachment #8469081 - Flags: review?(nicolas.b.pierron)
Attachment #8469131 - Flags: review?(nicolas.b.pierron)
Attached patch patch.p2.v9.diffSplinter Review
Bench with:

========
var str = "i";
var replace = "i";
for (var i = 100; i < 100; ++i) {
    str += "-i";
    str += "->i";
}

function split_join(i) {
    return (str + i).split("-").join("->");
}

var sum = 0;

for (var i = 0; i < 10000000; ++i) {
    sum += split_join(i).length;
}
========

Before patch.p2.v9.diff:
8592.118ms
After:
5148.276ms

41% faster.
Attachment #8469131 - Attachment is obsolete: true
Attachment #8469131 - Flags: review?(nicolas.b.pierron)
Attachment #8469248 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8469248 [details] [diff] [review]
patch.p2.v9.diff

Review of attachment 8469248 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, this sounds good :)
I'll fix the patch locally and push it to the try server.

::: js/src/jit/MIR.cpp
@@ +3248,5 @@
> +        return this;
> +
> +    this->setRecoveredOnBailout();
> +    if (arr->hasLiveDefUses()) {
> +        this->setNotRecoveredOnBailout();

style-nit: Remove "this->", as they are not needed to disambiguate anything.

@@ +3261,5 @@
> +    MDefinition *string = arr->toStringSplit()->string();
> +    MDefinition *pattern = arr->toStringSplit()->separator();
> +    MDefinition *replacement = sep();
> +
> +    setNotRecoveredOnBailout();

note that there is no "this->" here.
Attachment #8469248 - Flags: review?(nicolas.b.pierron) → review+
Attachment #8413341 - Attachment is obsolete: true
Attachment #8448693 - Attachment is obsolete: true
Assignee: rimjhim.mazumder17 → davidmoreirafr
Attached patch patch.v11.diffSplinter Review
Attachment #8468312 - Attachment is obsolete: true
Attachment #8469391 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/4168157414b5
https://hg.mozilla.org/mozilla-central/rev/9a6455a2eb93
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
`x.split(y).join(z)` and `x.replace(y, z)` aren't equivalent, due to replacement patterns - for instance, `"abc".replace("a", "$'")` evaluates to "bcbc".

This causes the following:

function repl(x, y, z) {
  return x.split(y).join(z);
}
print(repl("abc", "a", "$'"));
print(repl("abc", "a", "$'"));

to print $'bc, bcbc with --ion-eager --ion-offthread-compile=off. Should I open a new bug for this?
Apart from which the replacement also needs to be global, and not just affect the first occurrence:

print(repl("aa", "a", "c"));
print(repl("aa", "a", "c"));

prints cc, ca.

And behavior when the pattern string is empty is also different...
Ugh, thanks for noticing, Simon.

nbp, can you back out and work on a fix with vash? Ideally, the new version should contain tests for the differences Simon noted.
Status: RESOLVED → REOPENED
Flags: needinfo?(nicolas.b.pierron)
Resolution: FIXED → ---
If the string is a flat replacement, such as split().join(), then we reset
the dollarIndex to UINT32_MAX.  An UINT32_MAX dollarIndex is used to
indicate that no dollar signed was found in the pattern, which is the same
as not interpreting any of the dollar sign in the pattern.
Attachment #8474780 - Flags: review?(till)
Comment on attachment 8474780 [details] [diff] [review]
IonMonkey: split().join() no longer interpret replacement patterns.

Review of attachment 8474780 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. One nit: the description should probably be something like "Make IonMonkey-optimized split().join() not interpret replacement patterns".

Also, this doesn't fix the second issue Simon identified, so I assume another patch is coming?
Attachment #8474780 - Flags: review?(till) → review+
(In reply to Till Schneidereit [:till] from comment #64)
> Also, this doesn't fix the second issue Simon identified, so I assume
> another patch is coming?

Oh, right, I did not noticed this was a different one.  The problem is that String.replace only replace the first occurence in the string and we need to replace all of them as String.split.  I think this is doable but this require some tweaking to make this more with a few modifications.

Where this problem becomes complex is that we can make this work by using one of the SpiderMonkey extension to string replace, the only issue being that this extension register the last matches in RegExpStatics.  We need to a add a returnsCapture flag to replaceData to either avoid adding the last MatchPairs, or to discard that last one.  Sadly the MatchPairs are mostly accessed throught the RegExpStatics, which means that the second approach might be easier to implement at first.

Note, this is already an issue with SpiderMonkey extension of String.replace.  I think I will just no-op the optimization for the time being until we fix the String.replace to use flat-regexp without mutating the RegExpStatics.
(In reply to Nicolas B. Pierron [:nbp] {N/A 22-29/08} from comment #66)
> Note, this is already an issue with SpiderMonkey extension of
> String.replace.  I think I will just no-op the optimization for the time
> being until we fix the String.replace to use flat-regexp without mutating
> the RegExpStatics.

That's unfortunate, but given that the uplift is on Friday and this isn't *that* important an optimization, it makes a lot of sense.
Comment on attachment 8475232 [details] [diff] [review]
Disable split().join() optimization.

Review of attachment 8475232 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks

::: js/src/jit-test/tests/ion/bug977966.js
@@ +79,5 @@
>      assertEq(s.split("-").join("$`$&$'"), i + "$`$&$'" + i);
>      assertEq(s.replace("-", "$`$&$'"), "" + i + i + "-" + i + i);
>  }
>  
> +// Check that, as opposed to String.replace we are doing a global replacement as String.split do.

uber-nits: comma after String.replace, and `s/do/does`

::: js/src/jit/MIR.cpp
@@ +3380,5 @@
>  
>  MDefinition *
>  MArrayJoin::foldsTo(TempAllocator &alloc) {
> +    // :TODO: Enable this optimization after fixing Bug 977966 test cases.
> +    if (true)

why not just an unconditional `return this`?
Attachment #8475232 - Flags: review?(till) → review+
(In reply to Till Schneidereit [:till] from comment #69)
> ::: js/src/jit/MIR.cpp
> @@ +3380,5 @@
> >  
> >  MDefinition *
> >  MArrayJoin::foldsTo(TempAllocator &alloc) {
> > +    // :TODO: Enable this optimization after fixing Bug 977966 test cases.
> > +    if (true)
> 
> why not just an unconditional `return this`?

This was an attempt at muting compilers, but after checking with gcc4.8 and clang3.3, I do not see any warning about unreachable code, so I will remove this if before pushing.
Flags: needinfo?(nicolas.b.pierron)
Thanks for taking care of that. (I just saw the commit fly by in the push log and thought it felt a little wrong.)

::: js/src/jit-test/tests/ion/bug977966.js
@@ +79,5 @@
>      assertEq(s.split("-").join("$`$&$'"), i + "$`$&$'" + i);
>      assertEq(s.replace("-", "$`$&$'"), "" + i + i + "-" + i + i);
>  }
>  
> +// Check that, as opposed to String.replace we are doing a global replacement as String.split do.

Might as well add:

assertEq("".split("").join("x"), "");
assertEq("abc".split("").join("x"), "axbxc");

which was the last thing from comment 61.
(In reply to Simon Lindholm from comment #71)
> assertEq("".split("").join("x"), "");
> assertEq("abc".split("").join("x"), "axbxc");

oh, right.  I'll will add these test cases too.

Then maybe we should take a different approach and instead of basing our optimization on str_replace, we should base it on str_split, which will build a rope instead of an array of dependent strings.
https://hg.mozilla.org/integration/mozilla-inbound/rev/d94086070578

I've push the test cases of both patches and comment 71 tests with the last patch (attachment 8475232 [details] [diff] [review]) which disables this feature for the time being.

I did not push the previous patch (attachment 8474780 [details] [diff] [review]) as I am no longer sure this is the right approach anymore.
Keywords: leave-open
Depends on: 1105727
Hi!

Can I work on this bug?
(In reply to Victor Carlquist from comment #75)
> Can I work on this bug?

This bug has some code associated to it, it would be better to close this one and make another one to solve the rest of the problem.

Fixing the remaining issue implies moving some code around to reuse what we do with split, except that we want to construct a new string at the same time instead of constructing an array.
Are you sure you still want to work on this bug?

I opened Bug 1112537 for this purpose.
This bug fixed the most of the Ion part of the problem, now we are lacking the JS engine part to provide a good substitute to map to.
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Keywords: leave-open
Resolution: --- → FIXED
Summary: Optimize String#split('foo').join('bar') pattern → Optimize String#split('foo').join('bar') pattern (IonMonkey part)
Depends on: 1137624
You need to log in before you can comment on or make changes to this bug.