Closed Bug 1112537 Opened 10 years ago Closed 8 years ago

Optimize String#split('foo').join('bar') pattern

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: nbp, Assigned: victorcarlquist)

References

Details

(Whiteboard: [lang=c++][js:p2])

Attachments

(2 files, 21 obsolete files)

2.18 KB, patch
nbp
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
12.02 KB, patch
nbp
: review+
Details | Diff | Splinter Review
+++ This bug was initially created as a clone of Bug #977966 +++

Splitting a string on one string pattern and joining it using another is a very common practice. I wonder if it might be possible to check if both split and join are the original methods, and the argument to split is a string, and optimize the entire pattern to get rid of the temporary array creation entirely.

So far the code for folding the ArrayJoin and the StringSplit is disabled, as this does not map to a StringReplace which has different set of corner cases.

If we want to optimize this case to do the deforestation of the Array, then we should implement a clone of String.split which produce a String instead of an Array by interleaving a separator.
Hi Nicolas, 

> If we want to optimize this case to do the deforestation of the Array, then we should implement a clone of String.split which produce a String instead of an Array by interleaving a separator.

I'm wondering, we could create a flag that will be true if the next instruction in split is a join instruction, and in CodeGenerate we can use this to decide if we'll call "normal" split or call the clone split, what do you think about it?

How can I check if the next instruction in StringSplit is a Join instruction on Ion?

Thanks in advanced :)
Flags: needinfo?(nicolas.b.pierron)
(In reply to Victor Carlquist from comment #1)
> Hi Nicolas, 
> 
> > If we want to optimize this case to do the deforestation of the Array, then we should implement a clone of String.split which produce a String instead of an Array by interleaving a separator.
> 
> I'm wondering, we could create a flag that will be true if the next
> instruction in split is a join instruction, and in CodeGenerate we can use
> this to decide if we'll call "normal" split or call the clone split, what do
> you think about it?

This part is already done in IonMonkey.
The missing part is the rewriting of the js_str_split function such that it can produce a string instead of an array.
Flags: needinfo?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #2)
> (In reply to Victor Carlquist from comment #1)
> > Hi Nicolas, 
> > 
> > > If we want to optimize this case to do the deforestation of the Array, then we should implement a clone of String.split which produce a String instead of an Array by interleaving a separator.
> > 
> > I'm wondering, we could create a flag that will be true if the next
> > instruction in split is a join instruction, and in CodeGenerate we can use
> > this to decide if we'll call "normal" split or call the clone split, what do
> > you think about it?
> 
> This part is already done in IonMonkey.
> The missing part is the rewriting of the js_str_split function such that it
> can produce a string instead of an array.


[1] http://dxr.mozilla.org/mozilla-central/source/js/src/jsstr.cpp#3871
Attached patch WIP - bug1112537.patch (obsolete) — Splinter Review
Hi Nicolas!

This patch is faster with:
> var str = "i";
> for (var i = 1; i < 100; ++i) {
>    str += "-i";
>    str += "->i";
> }
> for(var i = 0; i < 40000; i++ )
>    str.split("").join("->");

before: 609ms
after: 425ms

But is slower with:
> for(var i = 0; i < 40000; i++ )
> str.split("-").join("->");

before: 601ms
after: 820ms

I made the same test with:
> for(var i = 0; i < 40000; i++ )
> str.replace(/-/g, "->");

It got 881ms, 280ms more than |split.join|.

I'll study a new approach to create a replace more faster.

Thanks!
Attachment #8539829 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 8539829 [details] [diff] [review]
WIP - bug1112537.patch

Review of attachment 8539829 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work, even if I am not yet convinced by the isReturnString flag.

::: js/src/jit/CodeGenerator.cpp
@@ +6195,5 @@
> +        pushArg(ToRegister(lir->string()));
> +        pushArg(ImmGCPtr(lir->mir()->typeObject()));
> +        callVM(StringSplitInfo, lir);
> +    } else {
> +        masm.movePtr(ToRegister(lir->string()), ToRegister(lir->output()));

If we were to do so, we should do it in the Lowering, but as mentioned below, we don't want to do it that way.

::: js/src/jit/MIR.cpp
@@ +3951,2 @@
>      MDefinition *arr = array();
> +    if (!arr->isStringSplit() || !arr->hasOneUse())

This condition will fail because if a resume point captures MStringSplit result, as MStringSplit is supposed to allocate and array, and the allocation is fallible.

> function mySplit(str, sep) { return str.split(sep); }
> function myReplace(str, sep, rep) { return mySplit(str, sep).join(rep); }

In the case above, the MStringSplit result is captured by the entry resume point of the block which is after the inlined block for mySplit.

Using recovered instructions was a way to keep the MStringSplit in the resume points (if captured in more than one), while still optimizing the MArrayJoin case, by folding the 2 instructions into a MStringReplace.

Note that instructions flagged as RecoveredOnBailout are not generating any code, but they are still use in cases where we need to return to baseline sooner.  Normally you shouldn't have to modify this code in any way.

@@ -3956,5 @@
>          return this;
>  
> -    this->setRecoveredOnBailout();
> -    if (arr->hasLiveDefUses()) {
> -        this->setNotRecoveredOnBailout();

This code was testing:  arr->hasOneLiveDef(this)

::: js/src/jit/MIR.h
@@ +6127,4 @@
>    : public MTernaryInstruction,
>      public MixPolicy<StringPolicy<0>, StringPolicy<1> >::Data
>  {
> +    bool isReturnString_;

This flag does not seems needed, except for making a work-around for the cases where there is only one use, and this use is MArrayJoin.  By using setRecoveredOnBailout, we do not have to change the CodeGen nor anything else in this instruction.

::: js/src/jsstr.cpp
@@ +3345,2 @@
>  {
> +    if (!pattern->length() && flatReplacement) {

Wouldn't it make more sense to create a new function for the flatReplacement code?

@@ +3346,5 @@
> +    if (!pattern->length() && flatReplacement) {
> +        StringBuffer sb(cx);
> +        char16_t c;
> +        unsigned i = 0;
> +        for (; i < string->length()-1; ++i) {

nit: add spaces around the "-".

@@ +3349,5 @@
> +        unsigned i = 0;
> +        for (; i < string->length()-1; ++i) {
> +            if (!string->getChar(cx, i, &c))
> +                return false;
> +            if (!sb.append(char16_t(c)))

nit: no need for a cast, knowing that c is already a char16_t.

@@ +3356,5 @@
> +                return false;
> +        }
> +        if (!string->getChar(cx, i, &c))
> +            return false;
> +        if (!sb.append(char16_t(c)))

nit: same here.
Attachment #8539829 - Flags: feedback?(nicolas.b.pierron) → feedback+
Assignee: nobody → victorcarlquist
Status: NEW → ASSIGNED
I didn't look at the patch but just noticed this part in the review:

> > +        if (!string->getChar(cx, i, &c))
> > +            return false;
> > +        if (!sb.append(char16_t(c)))

getChar() is pretty slow, please don't call it in a loop for each char. You can use latin1OrTwoByteChar() but even faster is to use separate paths for latin1/twobyte (maybe with templates).
(In reply to Jan de Mooij [:jandem] from comment #6)

Thanks Jan!
Attached patch bug1112537v3.patch (obsolete) — Splinter Review
This patch is faster.
before: 609ms
after: 4ms

Thanks!
Attachment #8539829 - Attachment is obsolete: true
Attachment #8540484 - Flags: review?(nicolas.b.pierron)
Attached patch bug1112537v3.patch (obsolete) — Splinter Review
This patch is faster.
before: 609ms
after: 4ms

Thanks!
Attachment #8540484 - Attachment is obsolete: true
Attachment #8540484 - Flags: review?(nicolas.b.pierron)
Attachment #8540489 - Flags: review?(nicolas.b.pierron)
(In reply to Victor Carlquist from comment #9)
> Created attachment 8540489 [details] [diff] [review]
> bug1112537v3.patch
> 
> This patch is faster.
> before: 609ms
> after: 4ms
> 
> Thanks!

I made the test with a constant... Sorry, the real result is:
before: 609ms
after: 530ms
Comment on attachment 8540489 [details] [diff] [review]
bug1112537v3.patch

Review of attachment 8540489 [details] [diff] [review]:
-----------------------------------------------------------------

This is awesome :)

I am sure we can do even better with the following suggestions.

::: js/src/jsstr.cpp
@@ +3331,5 @@
> +static JSString *
> +StrFlatReplaceGlobal(JSContext *cx, JSLinearString *str, JSLinearString *pat, JSLinearString *rep)
> +{
> +    StringBuffer sb(cx);
> +    if (!pat->length()) {

nit: Add a comment to say "The pattern is empty, so we interleave the replacement string in-between each character."

@@ +3339,5 @@
> +        if(!sb.reserve(length))
> +            return nullptr;
> +
> +        for (unsigned i = 0; i < str->length() - 1; ++i, ++textChars) {
> +            if (!sb.append(*textChars))

As you did a "reserve" of the right size and there is no way this code can overflow the buffer, you might want to use infallibleAppend here, and below.

@@ +3352,5 @@
> +    }
> +
> +    // If it's true, we are sure that the result's length is, at least, the same
> +    // length as |str->length()|.
> +    if(pat->length() >= rep->length()) {

comment-nit: at least?  lower or equal?

@@ +3355,5 @@
> +    // length as |str->length()|.
> +    if(pat->length() >= rep->length()) {
> +        if(!sb.reserve(str->length()))
> +            return nullptr;
> +    }

Would it make sense to also reserve some space for the else-case?

@@ +3361,5 @@
> +    size_t start = 0;
> +    JSString *sub;
> +    for (;;) {
> +        int match = StringMatch(str, pat, start);
> +        if (match < 0) {

nit: use "break;" and move the content of this if after the loop.

@@ +3367,5 @@
> +                return str;
> +            sub = NewDependentString(cx, str, start, str->length() - start);
> +            if (!sub)
> +                return nullptr;
> +            if (!sb.append(sub))

A dependent string is a small data-structure used to keep a pointer to the original string, this is useful to avoid larger allocations.

But in this case, we append the content of the original string into the StringBuffer, so we should probably avoid this allocation and use

  sb.appendSubstring(str, start, str->length() - start)

And if you move the textChars just below the StringBuffer, then you might also be able to do:

  sb.append(textChars + start, str->length() - start)

in which case this avoid checking for the encoding of "str" every time an append is made, as Jan suggested with the template.

@@ +3374,5 @@
> +        }
> +        sub = NewDependentString(cx, str, start, match - start);
> +        if (!sub)
> +            return nullptr;
> +        if (!sb.append(sub))

same here.

@@ +3383,5 @@
> +    }
> +}
> +
> +bool
> +js::str_flat_replace_string(JSContext *cx, HandleString string, HandleString pattern,

nit: Add a comment above this function to mention that it is meant to be identical to "str.split(pattern).join(replacement)" except that we do some deforestation optimization in Ion using this function instead of the last 2.
Attachment #8540489 - Flags: review?(nicolas.b.pierron) → feedback+
Comment on attachment 8540489 [details] [diff] [review]
bug1112537v3.patch

Review of attachment 8540489 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +6942,4 @@
>  {
>    private:
>  
> +    bool isFlatReplacement_;

You will have to either prevent using recover instruction if this flag is set, or use recover instructions with this falg.  Have a look at js/src/jit/Recover.cpp RStringReplace.
Attached patch bug1112537v4.patch (obsolete) — Splinter Review
Thanks for your reply :)

I made the changes that you suggested, this patch is faster now!

before: 609ms~
after:  350ms~

Thanks again.
Attachment #8540489 - Attachment is obsolete: true
Attachment #8540743 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8540743 [details] [diff] [review]
bug1112537v4.patch

Review of attachment 8540743 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +3337,5 @@
> +
> +    // The pattern is empty, so we interleave the replacement string in-between
> +    // each character.
> +    if (!pat->length()) {
> +        const CharT *repChars = rep->chars<CharT>(nogc);

This is a good idea, but you might want to add a different template parameter for the replacement string.  Otherwise this might be a problem for

   str.split("").join("\u03c0")

can you add a test case with the above example?

@@ +3363,5 @@
> +    for (;;) {
> +        int match = StringMatch(str, pat, start);
> +        if (match < 0)
> +            break;
> +        sb.append(textChars + start, match - start);

We still have to check for the return value of append functions ;)

@@ +3370,5 @@
> +        start = match + pat->length();
> +    }
> +    if (sb.empty())
> +        return str;
> +    sb.append(textChars + start, str->length() - start);

same here.
Attachment #8540743 - Flags: review?(nicolas.b.pierron) → feedback+
Attached patch bug1112537v5.patch (obsolete) — Splinter Review
Thanks Nicolas!
Attachment #8540743 - Attachment is obsolete: true
Attachment #8540793 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8540793 [details] [diff] [review]
bug1112537v5.patch

Review of attachment 8540793 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome, this patch looks good!
Can you fix the following nits a create a new patch with a proper subject[1] such that somebody else can take this patch and land it for you ;)

[1] https://developer.mozilla.org/en-US/docs/Mercurial_FAQ#How_can_I_generate_a_patch_for_somebody_else_to_check-in_for_me.3F

(asking Jan for feedback in case I missed something in StrFlatReplaceGlobal)

::: js/src/jsstr.cpp
@@ +3327,4 @@
>  
>  static const uint32_t ReplaceOptArg = 2;
>  
> +template <typename TextChar,typename RepChar>

style-nit: add a space after the comma.

@@ +3342,5 @@
> +
> +    // The pattern is empty, so we interleave the replacement string in-between
> +    // each character.
> +    if (!pat->length()) {
> +        const RepChar *repChars = rep->chars<RepChar>(nogc);

nit: Move this one just below textChars.

@@ +3370,5 @@
> +        if (match < 0)
> +            break;
> +        if (!sb.append(textChars + start, match - start))
> +            return nullptr;
> +        if (!sb.append(rep))

nit: sb.append(repChars, rep->length())
Attachment #8540793 - Flags: review?(nicolas.b.pierron)
Attachment #8540793 - Flags: review+
Attachment #8540793 - Flags: feedback?(jdemooij)
> +        size_t length = rep->length() * (str->length() - 1) + str->length();

This could overflow, right? Also, what about the case str->length() == 0?
True, you are right! 
|size_t| is small, I'll change it to |unsigned long int|, thanks!

I found another problem:  
"abc".split("a").join("")  
it will return "abc", but the right result is "bc".
|unsigned long int| is an uint32_t on Windows. If you want a 64-bit type you should use uint64_t.
Oh, and of course it's a uint32_t on 32-bit platforms as well. Either way it's no better than size_t :)
Ah! I can't change this type, because the |sb.reserve(size_t len)| method has a size_t parameter.
Comment on attachment 8540793 [details] [diff] [review]
bug1112537v5.patch

Review of attachment 8540793 [details] [diff] [review]:
-----------------------------------------------------------------

Nice patch! Some comments below that should be addressed.

::: js/src/jit/VMFunctions.cpp
@@ +1118,5 @@
> +    MOZ_ASSERT(pattern);
> +    MOZ_ASSERT(repl);
> +    RootedValue rval(cx);
> +
> +    if (!str_flat_replace_string(cx, string, pattern, repl, &rval))

Can we change str_flat_replace_string to return a JSString * so that we can remove this function and call str_flat_replace_string directly?

::: js/src/jsstr.cpp
@@ +3336,5 @@
> +        if (!sb.ensureTwoByteChars())
> +            return nullptr;
> +    }
> +
> +    AutoCheckCannotGC nogc;

The rooting analysis will complain because the sb.finishString() below can GC. Maybe you can move the StringBuffer to the caller and pass a "StringBuffer &sb" to this function, then call sb.finishString() in the caller?

The caller can then also do the sb.ensureTwoByteChars().

(It's a bit annoying but it also avoids some code duplication due to the templates :)

@@ +3337,5 @@
> +            return nullptr;
> +    }
> +
> +    AutoCheckCannotGC nogc;
> +    const TextChar *textChars = str->chars<TextChar>(nogc);

Nit: it might be clearer to rename "str" to "text" or textChars to strChars.

@@ +3344,5 @@
> +    // each character.
> +    if (!pat->length()) {
> +        const RepChar *repChars = rep->chars<RepChar>(nogc);
> +        size_t length = rep->length() * (str->length() - 1) + str->length();
> +        if(!sb.reserve(length))

Simon is right, we should handle overflow. You could use CheckedInt, something like this:

CheckedInt<uint32_t> strLength(rep->length());
CheckedInt<uint32_t> repLength(rep->length());
CheckedInt<uint32_t> length = repLength * (strLength - 1) + strLength;
if (!length.isValid()) {
    js_ReportAllocationOverflow(cx);
    return false;
}
... use length.value() ...

Unfortunately you also have to special-case the str->length() == 0 case, to avoid underflow in strLength - 1, we don't want to throw an error in that case.

You can also use fallible append instead of reserve/infallibleAppend. Is it much slower?

@@ +3358,5 @@
> +    }
> +
> +    // If it's true, we are sure that the result's length is, at least, the same
> +    // length as |str->length()|.
> +    if(rep->length() >= pat->length()) {

Nit: space after 'if'
Attachment #8540793 - Flags: feedback?(jdemooij)
This patch executes a replacement when the script has the pattern str.split("foo").join("a").

Following some results:
When the string pattern is empty:

> for(var i = 0; i < 40000; i++ )
>    str.split("").join("->");

before: 609ms~
after:  235ms~

With a ordinary replacement: 

> for(var i = 0; i < 40000; i++ )
>    str.split("-").join("->");

before: 601ms~
after:  360ms~
Attachment #8540793 - Attachment is obsolete: true
Attachment #8541285 - Flags: review?(jdemooij)
(In reply to Jan de Mooij [:jandem] from comment #22)
> Comment on attachment 8540793 [details] [diff] [review]
 
> You can also use fallible append instead of reserve/infallibleAppend. Is it
> much slower?

Hi Jan, 

Yes, I did the test with fallible append, it got 326ms and infallibleAppend got 250ms.

Thanks for your nice reply!
Comment on attachment 8541285 [details] [diff] [review]
Optimize str.split('foo').join('bar') pattern.

Review of attachment 8541285 [details] [diff] [review]:
-----------------------------------------------------------------

Nice improvements :)

::: js/src/jit-test/tests/ion/bug977966.js
@@ +76,5 @@
> +    var s1 = "ab";
> +    assertEq(s1.split("").join("\u03c0"), "a\u03c0b");
> +    var s2 = i + "\u03c0" + i;
> +    assertEq(s2.split("\u03c0").join("-"), i + "-" + i);
> +}

Can you add test cases for comment 17 (both overflow & underflow) & comment 18?

try {
    var s = "  ";
    for (var i = 1; i < 10; i++)
        s = s.split("").join(s); // 2^(2^i)
} catch (exn) {
    assertEq(exn instanceof InternalError, true);
};

::: js/src/jsstr.cpp
@@ +3340,5 @@
> +    // each character.
> +    if (!pat->length()) {
> +        CheckedInt<uint32_t> strLength(str->length());
> +        CheckedInt<uint32_t> repLength(rep->length());
> +        CheckedInt<uint32_t> length = repLength * (strLength - 1) + strLength;

The check for str->length() is in another function, you might want to add an assertion at the beginning of this function:

  MOZ_ASSERT(str->length() > 0);
Attachment #8541285 - Flags: review+
(In reply to Nicolas B. Pierron [:nbp] from comment #25)
> Comment on attachment 8541285 [details] [diff] [review]
> Optimize str.split('foo').join('bar') pattern.

> 
> try {
>     var s = "  ";
>     for (var i = 1; i < 10; i++)
>         s = s.split("").join(s); // 2^(2^i)
> } catch (exn) {
>     assertEq(exn instanceof InternalError, true);
> };
> 

The patch is failing on this test.
This error is occurring in interpreter.
Flags: needinfo?(nicolas.b.pierron)
Attached patch bug1112537.patch (obsolete) — Splinter Review
Attachment #8541285 - Attachment is obsolete: true
Attachment #8541285 - Flags: review?(jdemooij)
Attachment #8541302 - Flags: review?(nicolas.b.pierron)
Attached patch OOM (obsolete) — Splinter Review
Hi Nicolas!
I found where is the "problem", but I am not sure if this patch is valid ;)
Well, when the ArrayJoin is called, it'll calculate the following formula:

res = seplen * (length -1)
sb.reserve(res);

if seplen = 1 and length = 2^(2^5), it's ok, it shows up OOM in my prompt because I only have 4GB.
but, when Ion optimizes it, in |StrFlatReplaceGlobal|, it'll calculate:

length = repLength * (strLength - 1) + strLength;

Ha! It'll overflow.
So, the two exceptions will be thrown on the same script.

Thanks.
Flags: needinfo?(nicolas.b.pierron)
Attachment #8541351 - Flags: review?(nicolas.b.pierron)
Attachment #8541351 - Flags: review?(nicolas.b.pierron) → feedback?(nicolas.b.pierron)
Comment on attachment 8541351 [details] [diff] [review]
OOM

Review of attachment 8541351 [details] [diff] [review]:
-----------------------------------------------------------------

Add a subject to this patch and set checkin flag.

https://developer.mozilla.org/en-US/docs/Mercurial_FAQ#How_can_I_generate_a_patch_for_somebody_else_to_check-in_for_me.3F

::: js/src/jsarray.cpp
@@ +1141,4 @@
>      // The separator will be added |length - 1| times, reserve space for that
>      // so that we don't have to unnecessarily grow the buffer.
>      size_t seplen = sepstr->length();
> +    CheckedInt<uint32_t> res = seplen * (length -1);

nit: CheckedInt<uint32_t> res = CheckedInt<uint32_t>(seplen) * (length - 1);

@@ +1147,3 @@
>          return nullptr;
> +    }
> +    if (length > 0 && !sb.reserve(res.value())) {

if (length > 0 && (!res.isValid() || !sb.reserve(res.value()))) {

Note that the underflow is handled by the 'length > 0'.
Attachment #8541351 - Flags: feedback?(nicolas.b.pierron) → review+
Comment on attachment 8541302 [details] [diff] [review]
bug1112537.patch

Review of attachment 8541302 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, thanks for going to all these iterations :)
Attachment #8541302 - Flags: review?(nicolas.b.pierron) → review+
Awesome!
Thank you very much Nicolas!
Attached patch Overflow message instead of oom (obsolete) — Splinter Review
Attachment #8541351 - Attachment is obsolete: true
Attachment #8542166 - Flags: checkin?
Comment on attachment 8542166 [details] [diff] [review]
Overflow message instead of oom

Review of attachment 8542166 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsarray.cpp
@@ +1141,5 @@
>      // The separator will be added |length - 1| times, reserve space for that
>      // so that we don't have to unnecessarily grow the buffer.
>      size_t seplen = sepstr->length();
> +    CheckedInt<uint32_t> res = CheckedInt<uint32_t>(seplen) * (length - 1);
> +    if (!res.isValid()) {

Please fix the previous nits before asking for checkin.
Attachment #8542166 - Flags: checkin?
Ouch, I'm sorry, I'll fix it.
Attachment #8542166 - Attachment is obsolete: true
Attachment #8542221 - Flags: checkin?
You should also post a successful try run [1] and set the checkin-needed keyword. Also, don't both patches in this bug need to land? You should label them as 'Part n: Description' so that the sheriffs know how to land them.

The checkin flag is only really useful if some attachments have already landed, but others still need to land.

[1] https://wiki.mozilla.org/ReleaseEngineering/TryServer
I've pushed the last 2 patches to our try-server, and requested to run the JS tests.

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=26bef624c372
Comment on attachment 8542221 [details] [diff] [review]
Part 1 - Overflow message instead of oom

Those 'e' failures on that try run look like they're related to this patch, so I'm going to pass on checking these in for now.
Flags: needinfo?(victorcarlquist)
Flags: needinfo?(nicolas.b.pierron)
Attachment #8542221 - Flags: checkin?
Hmm, I don't how many memory has the Try server, but I think that failure is connected with it.

When the memory ram avaliable is small, in ArrayJoin function on interpreter with the "oom path" applied, it'll return a InternalError object (Overflow) because the condition is :

> if (length > 0 && (!res.isValid() || !sb.reserve(res.value()))) {

but in JIT the "length" variable in StrFlatReplaceGlobal won't overflow, but it'll fall in:

>  if (!sb.reserve(length.value()))
>       return false;

It'll show up OOM message, so, the test below will fail:

> 
> try {
>     var s = "  ";
>     for (var i = 1; i < 10; i++)
>         s = s.split("").join(s); // 2^(2^i)
> } catch (exn) {
>     assertEq(exn instanceof InternalError, true);
> };
> 

I'll write the same condition on JIT, it'll solve the problem, I believe.

Thanks Wes!
Flags: needinfo?(victorcarlquist)
I wrote the same condition the ArrayJoin in StrFlatReplaceGlobal:

> if (!length.isValid() || !sb.reserve(length.value())) {
>    js_ReportAllocationOverflow(cx);
>    return false;
> }

Thanks!
Attachment #8541302 - Attachment is obsolete: true
Attachment #8542297 - Flags: review?(nicolas.b.pierron)
Attachment #8542221 - Attachment description: Overflow message instead of oom → Part 1 - Overflow message instead of oom
(In reply to Victor Carlquist from comment #41)
> Created attachment 8542297 [details] [diff] [review]
> Part 2 - Opt pattern "str".split().join()
> 
> I wrote the same condition the ArrayJoin in StrFlatReplaceGlobal:
> 
> > if (!length.isValid() || !sb.reserve(length.value())) {
> >    js_ReportAllocationOverflow(cx);
> >    return false;
> > }
> 
> Thanks!

The reserve function is already supposed to report an allocation overflow.

The String buffer[1] use Vector with the default allocator[2] (TempAllocPolicy).  This allocator is well returning the exception in case of allocation overflow[3].  So there is no need for making such exception here.

So we should probably modify the part 1 to avoid creating an exception on top of another exception.

The problem with the test case is not the error message but only the fact that it takes ages to run to completion.  I think this is just a matter of making the test case faster by only checking split_join_overflow ~5 times instead of 100 times.  You should probably make another outer loop.

[1] http://dxr.mozilla.org/mozilla-central/source/js/src/vm/StringBuffer.h?from=StringBuffer#30
[2] http://dxr.mozilla.org/mozilla-central/source/js/public/Vector.h#48
[3] http://dxr.mozilla.org/mozilla-central/source/js/src/jsalloc.cpp#20
Flags: needinfo?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #42)
> (In reply to Victor Carlquist from comment #41)
> > Created attachment 8542297 [details] [diff] [review]
> > Part 2 - Opt pattern "str".split().join()
> > 

Thanks for your reply :)
I changed the part 1, but it's reporting OOM instead of overflow...
Maybe it occurs because the allocation is falling in [1]: 

> if (usingInlineStorage()) {
>  return convertToHeapStorage(newCap);

It can report OOM [2].
 
[1] http://dxr.mozilla.org/mozilla-central/source/mfbt/Vector.h#812
[2] http://dxr.mozilla.org/mozilla-central/source/mfbt/Vector.h#715
Flags: needinfo?(nicolas.b.pierron)
I think a differential behaviour which either returns OOM or AllocationOverflow is not a big deal, I will ask (ni?) the fuzzing guys for feedback but this does not sounds like something we care much about.

(In reply to Victor Carlquist from comment #43)
> (In reply to Nicolas B. Pierron [:nbp] from comment #42)
> > (In reply to Victor Carlquist from comment #41)
> > > Created attachment 8542297 [details] [diff] [review]
> > > Part 2 - Opt pattern "str".split().join()
> > > 
> 
> Thanks for your reply :)
> I changed the part 1, but it's reporting OOM instead of overflow...
> Maybe it occurs because the allocation is falling in [1]: 
> 
> > if (usingInlineStorage()) {
> >  return convertToHeapStorage(newCap);
> 
> It can report OOM [2].
>  
> [1] http://dxr.mozilla.org/mozilla-central/source/mfbt/Vector.h#812
> [2] http://dxr.mozilla.org/mozilla-central/source/mfbt/Vector.h#715
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(gary)
Flags: needinfo?(choller)
(In reply to Nicolas B. Pierron [:nbp] from comment #44)
> I think a differential behaviour which either returns OOM or
> AllocationOverflow is not a big deal, I will ask (ni?) the fuzzing guys for
> feedback but this does not sounds like something we care much about.

No, their exit codes are not 0 so I don't think it matters for differential testing.
Flags: needinfo?(gary)
The test was fixed :)
Attachment #8542297 - Attachment is obsolete: true
Attachment #8542297 - Flags: review?(nicolas.b.pierron)
Hi, 

The OOM doesn't create InternalError object, so, the new test will fail (comment 26).

Should we keep the AllocationOverflow in part 1?

Thanks ;)
(In reply to Victor Carlquist from comment #47)
> The OOM doesn't create InternalError object, so, the new test will fail
> (comment 26).

Maybe you can check that this is one or the other.

> Should we keep the AllocationOverflow in part 1?

Only for   (length > 0 && !res.isValid())   , not for    (length > 0 && !sb.reserve(res.value())) .
Attachment #8543277 - Flags: review?(nicolas.b.pierron)
Thanks for your reply!
I changed the test ;)
Attachment #8542221 - Attachment is obsolete: true
Attachment #8542699 - Attachment is obsolete: true
Attachment #8543279 - Flags: review?(nicolas.b.pierron)
Attachment #8543277 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8543279 [details] [diff] [review]
Part 2 - Opt pattern "str".split().join()

Review of attachment 8543279 [details] [diff] [review]:
-----------------------------------------------------------------

Nice Work!
Attachment #8543279 - Flags: review?(nicolas.b.pierron) → review+
Great! Thank you!
Hi,
Can someone push the patches in the Try server, please?
(In reply to Victor Carlquist from comment #53)
> Can someone push the patches in the Try server, please?

Sorry, I forgot to do it the last time.  Feel free to needinfo your reviewer (me in this case) if does not do it within a day ;)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=48666413d183
The test failed with ASan... Can we disable this test when executing with ASan?
(I don't know how to make it), or should I delete the overflow test? :)
Flags: needinfo?(nicolas.b.pierron)
(In reply to Victor Carlquist from comment #55)
> The test failed with ASan... Can we disable this test when executing with
> ASan?
> (I don't know how to make it), or should I delete the overflow test? :)

Usually we try to avoid disabling tests, especially these kind of tests, but here this seems to be an issue with the Address Sanitizer.

I think we should surround the function split_join_overflow with:

if (getBuildConfiguration()['asan']) {
  …
} else {
  function split_join_overflow() {}
}
Flags: needinfo?(nicolas.b.pierron)
Thanks for your reply,
I've made the change that you suggested.
Attachment #8543279 - Attachment is obsolete: true
Attachment #8551813 - Flags: review?(nicolas.b.pierron)
Attachment #8551813 - Flags: review?(nicolas.b.pierron) → review+
I pushed these patches based on the greenish try push (comment 54) and simple fix (and the proper fix in the test case)

Congratulation, these changes are now on mozilla-inbound, and should be available on the next release of Firefox Nightly ;)

https://hg.mozilla.org/integration/mozilla-inbound/rev/076426ec9ed6
https://hg.mozilla.org/integration/mozilla-inbound/rev/fe340da3fb4c

Is there any other issue that you might be interested in?
Flags: needinfo?(choller)
(In reply to Nicolas B. Pierron [:nbp] from comment #58)

Awesome! Thank you ;)

I'll search for other issue, I have nothing in my mind yet.
Backed out for causing the jit-test failures noted in bug 1092547 comment 64 on OSX and Win8.
https://hg.mozilla.org/integration/mozilla-inbound/rev/c8b693e4b0de
Thanks Ryan.
Hi,
I reduced the loop from 100 to 50 iterations in bug977966.js and rebased the patch.
Attachment #8551813 - Attachment is obsolete: true
Attachment #8552081 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8552081 [details] [diff] [review]
Part 2 - Opt pattern "str".split().join()

Review of attachment 8552081 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/ion/bug977966.js
@@ +146,4 @@
>  }
> +
> +for (var i = 0; i < 5; i++)
> +   split_join_overflow();

Can you move this to another test case, I think the timeout is more likely coming from this test case than any of the previous one.
Attachment #8552081 - Flags: review?(nicolas.b.pierron) → feedback+
I created a bug1112537.js file and moved that test into it.

Thanks!
Attachment #8552081 - Attachment is obsolete: true
Attachment #8552462 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8552462 [details] [diff] [review]
Part 2 - Opt pattern "str".split().join()

Review of attachment 8552462 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with the following modification ;)

::: js/src/jit-test/tests/ion/bug1112537.js
@@ +1,2 @@
> +if (getBuildConfiguration()['asan']) {
> +    function split_join_overflow() {}

if (getBuildConfiguration()['asan'])
  quit();
Attachment #8552462 - Flags: review?(nicolas.b.pierron) → review+
Attachment #8552462 - Attachment is obsolete: true
Attachment #8553119 - Flags: review+
Attachment #8553119 - Flags: checkin?
(In reply to Nicolas B. Pierron [:nbp] from comment #65)
> Comment on attachment 8552462 [details] [diff] [review]
> Part 2 - Opt pattern "str".split().join()
> 
> Review of attachment 8552462 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r+ with the following modification ;)

Thanks again!
Attachment #8543277 - Flags: checkin+
Attachment #8553119 - Flags: checkin? → checkin+
Backed out Part 2 for Win8 jit-test failures again. Please verify that this is green on Try before requesting checkin again.
https://hg.mozilla.org/integration/mozilla-inbound/rev/6940eaeea289

https://treeherder.mozilla.org/logviewer.html#?job_id=5831323&repo=mozilla-inbound
Keywords: leave-open
Attachment #8553119 - Flags: checkin+ → checkin-
Time out again...

> for (var i = 1; i < 10; i++)
>    s = s.split("").join(s); // 2^(2^i)

Maybe should we reduce the loop from 10 to 6 iterations?
Flags: needinfo?(nicolas.b.pierron)
(In reply to Victor Carlquist from comment #70)
> Time out again...
> 
> > for (var i = 1; i < 10; i++)
> >    s = s.split("").join(s); // 2^(2^i)
> 
> Maybe should we reduce the loop from 10 to 6 iterations?

The problem of reducing the number of iterations, is that we are no longer going to be testing the Jit anymore.  One option might be to use either

  gcparam("maxBytes", 1234567890 /* some value */);

or

  oomAfterAllocations(5);

the second might be harder to use for matching precisely the string/array allocation, as we also have to allocate the JitCode.
Flags: needinfo?(nicolas.b.pierron)
Fixed the test, 

Thanks :)
Attachment #8553119 - Attachment is obsolete: true
Attachment #8554149 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8554149 [details] [diff] [review]
Part 2 - Opt pattern "str".split().join()

Review of attachment 8554149 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/ion/bug1112537.js
@@ +16,5 @@
> +
> +for (var i = 0; i < 9; i++)
> +    split_join_overflow();       
> +
> +// Exercises the allocation in JIT.

Is this enough to compile & optimize in IonMonkey?
You might want to reuse the  setJitCompilerOption  prelude calls to tell Ion to compile sooner than expected, but wait something like 20 iteration such that Type inference can infer the result of the function calls.

@@ +19,5 @@
> +
> +// Exercises the allocation in JIT.
> +s = "  ";
> +for (var i = 0; i < 9; i++)
> +    split_join_overflow();       

style-nit: remove trailing whitespaces. (same in the previous loop)
Attachment #8554149 - Flags: review?(nicolas.b.pierron)
You are right! That loop wasn't enough to compile & optimize :)

I needed to set baseline.warn.up and ion.warn.up with low values to test ion without --ion-eager flag.

Thanks.
Attachment #8554149 - Attachment is obsolete: true
Attachment #8554738 - Flags: review?(nicolas.b.pierron)
This looks good, I've pushed the patches to try, and we will see if it continue to fail on Win8 ;)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c687f60275e6
Attachment #8554738 - Flags: review?(nicolas.b.pierron) → review+
(In reply to Nicolas B. Pierron [:nbp] from comment #76)
> This looks good, I've pushed the patches to try, and we will see if it
> continue to fail on Win8 ;)
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=c687f60275e6

The jit tests weren't activated on Win8 :)
Flags: needinfo?(nicolas.b.pierron)
(In reply to Victor Carlquist from comment #77)
> (In reply to Nicolas B. Pierron [:nbp] from comment #76)
> > This looks good, I've pushed the patches to try, and we will see if it
> > continue to fail on Win8 ;)
> > 
> > https://treeherder.mozilla.org/#/jobs?repo=try&revision=c687f60275e6
> 
> The jit tests weren't activated on Win8 :)

Thanks for the notification, … I pushed the same patches again with a different set of options and hope this would check the right things.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0e6d254f1174
Flags: needinfo?(nicolas.b.pierron)
I will try to modify the test case, and see if I can make it work.
Thanks Nicolas!
(In reply to Nicolas B. Pierron [:nbp] from comment #44)
> I think a differential behaviour which either returns OOM or
> AllocationOverflow is not a big deal, I will ask (ni?) the fuzzing guys for
> feedback but this does not sounds like something we care much about.

It's fine if one run hits js_ReportOutOfMemory and another hits js_ReportAllocationOverflow. The former fprints a message (in JS_MORE_DETERMINISTIC builds), and compareJIT won't compare the output if either run prints that message.

I would worry if one run hits js_ReportAllocationOverflow, causing a JS exception, while another run succeeds. Can that happen? If so, js_ReportAllocationOverflow should fprint a similar message (in JS_MORE_DETERMINISTIC builds).
Flags: needinfo?(nicolas.b.pierron)
(In reply to Jesse Ruderman from comment #81)

Hi Jesse,
If we attempt to alloc some memory, but the memory isn't enough, both will hit js_ReportOutOfMemory.

> I would worry if one run hits js_ReportAllocationOverflow, causing a JS
> exception, while another run succeeds. Can that happen? 

In this patch we can get success in interpreter/baseline but doesn't in Ion, 
because we are using two different formulas to reserve memory:
Formula 1 (used in interpreter/baseline):
> CheckedInt<uint32_t> res = CheckedInt<uint32_t>(seplen) * (length - 1);
Formula 2 (used in Ion):
> CheckedInt<uint32_t> length = repLength * (strLength - 1) + strLength;

So, if interpreter/baseline (formula 1) does not hit js_ReportAllocationOverflow, Ion (formula 2) might hit it.
But it is unlikely that the interpreter/baseline causes JS exception and Ion doesn't.
Hi, 
Is this still a valid bug?

If so, we could rewrite the test, using the oomTest function to test the overflow (and rebase the patch).

What do you think?
(In reply to Victor Carlquist from comment #83)
> Hi, 
> Is this still a valid bug?

Yes, I think this is still a valid case to optimize.

> If so, we could rewrite the test, using the oomTest function to test the
> overflow (and rebase the patch).
> 
> What do you think?

Yes, the oomTest function should answer Jesse concerns, and we would no longer need to take all the memory as well.

Thanks for looking into it again :)
Flags: needinfo?(nicolas.b.pierron)
Attached patch Patch. (obsolete) — Splinter Review
Microbench:

var str = "i";
for (var i = 1; i < 100; i++ ) {
    str += "-i";
    str += "->i";
}
for(var i = 0; i < 40000; i++ )
    if (i % 2 == 0)
        str = str.split("-").join(">");
    else
        str = str.split(">").join("-");

Before: 1.111s
After:  0.487s

For some reason, we weren't inlining the join function but this patch does it now.
When we execute the test bug1112537.js with --no-baseline, it is showing up the following message: 
> s.split is not a function
Is it a bug?
I think this test can answer the Jesse's question.

Thanks.
Attachment #8554738 - Attachment is obsolete: true
Attachment #8699619 - Flags: review?(nicolas.b.pierron)
(In reply to Victor Carlquist from comment #85)
> For some reason, we weren't inlining the join function but this patch does
> it now.

That's because of bug 1137624. Please make sure we fix all issues reported here.
Comment on attachment 8699619 [details] [diff] [review]
Patch.

Review of attachment 8699619 [details] [diff] [review]:
-----------------------------------------------------------------

> When we execute the test bug1112537.js with --no-baseline, it is showing up the following message: 
> > s.split is not a function

This might be an related to the returned value of ArrayJoin, maybe?

::: js/src/jit-test/tests/ion/bug1112537.js
@@ +10,5 @@
> +var s = "--";
> +
> +oomTest(() => {
> +    try {
> +        s = s.split("-").join("->");

remove the "s = " from this test case, to make sure we have the same work-load.
Also make this a real function, which fill an object given as argument with the value of inIon, such that we can ensure that the function is running in Ion before using the oomTest function.

function test(obj) {
  obj.res = s.split("-").join("->");
  obj.inIon = inIon();
}

var context = { res: "", inIon: false };

oomTest(() => test(context));
while (!context.inIon)
  test(context);
oomTest(() => test(context));

We should no longer see any internal error in this test case.  And we only want to check if OOM are reported as OOM.
The try catch is not necessary as the oomTest function catch OOM for us.

@@ +34,5 @@
> +    assertEq(oomInBaseline, true);
> +
> +// if baseline failed, Ion has to fail too.
> +if (compiledOnIon) 
> +    assertEq(oomInIon, true);
\ No newline at end of file

There is no need for these assertions, and it is even more miss-leading as the failure are not happening at the same location.
Attachment #8699619 - Flags: review?(nicolas.b.pierron) → feedback+
Attached patch Patch (obsolete) — Splinter Review
Thank you for your review, again =D

I fixed the test case.
We need to fix the ArrayJoin before landing it.
Attachment #8699619 - Attachment is obsolete: true
Attachment #8700095 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8700095 [details] [diff] [review]
Patch

Review of attachment 8700095 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MCallOptimize.cpp
@@ +66,5 @@
>        // Array natives.
>        case InlinableNative::Array:
>          return inlineArray(callInfo);
> +      case InlinableNative::ArrayJoin:
> +        return inlineArrayJoin(callInfo);

I will do that while rebasing & landing a patch that I forgot on Bug 1137624.
Attachment #8700095 - Flags: review?(nicolas.b.pierron) → review+
Depends on: 1137624
I was thinking, we could optimize the replace function too.
I made the following microbench:

var str = "i";
for (var i = 1; i < 100; i++ ) {
    str += "-i";
    str += "->i";
}
for(var i = 0; i < 40000; i++ )
    if (i % 2 == 0)
        str = str.replace(/-/g, ">");
    else
        str = str.replace(/>/g, "-");

It takes 1.158s while the other microbench in comment 85 took 0.487s.
Attached patch Patch rebased.Splinter Review
The last patch was hitting |!iter->hasLiveDefUses(), at jit/IonAnalysis.cpp:2343| after the patch on bug 1137624 was landed.

So, now we are setting the ArrayJoin as recoveredOnBailout in MArrayJoin::foldsTo, it solves the issue.
Attachment #8700095 - Attachment is obsolete: true
Attachment #8702065 - Flags: review?(nicolas.b.pierron)
(In reply to Victor Carlquist from comment #90)
> I was thinking, we could optimize the replace function too.
> 
>         str = str.replace(/>/g, "-");

I suggest you try that in another Bug ;)

I would also expect that RegExp have semantic which include side effects which records matches.  So we would have to mirror the same if we want to do so.
Comment on attachment 8702065 [details] [diff] [review]
Patch rebased.

Review of attachment 8702065 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work, let's push it to try and ask for checkin-needed.

::: js/src/jit/MIR.h
@@ +7681,5 @@
>  
>      bool writeRecoverData(CompactBufferWriter& writer) const override;
>      bool canRecoverOnBailout() const override {
> +        if (isFlatReplacement_)
> +            return false;

We should open a follow-up bug to add flat replacement bit to the recover instruction.
Attachment #8702065 - Flags: review?(nicolas.b.pierron) → review+
Blocks: 1235403
https://hg.mozilla.org/mozilla-central/rev/31c27281f518
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: