Cache String.prototype.split

RESOLVED FIXED in Firefox 38

Status

()

defect
RESOLVED FIXED
8 years ago
4 years ago

People

(Reporter: luke, Assigned: victorcarlquist)

Tracking

unspecified
mozilla38
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox38 fixed)

Details

Attachments

(3 attachments, 24 obsolete attachments)

4.34 KB, patch
luke
: feedback+
Details | Diff | Splinter Review
4.65 KB, text/plain
Details
10.95 KB, patch
victorcarlquist
: review+
Details | Diff | Splinter Review
(Reporter)

Description

8 years ago
On this micro-bench:

  for (var i = 0; i < 500000; ++i)
      a = "done fail unicorns ponies esperanto".split(" ");

v8 is more than 2x faster.  Super-optimized string matching? Fast array creation?  Generation GC ftw?  No, a cache that maps (string, pattern) to results.  The question asked by this bug is whether this was added to cheat on some unrealistic benchmark or whether this is optimizing real-world usage.  Seems simple enough to instrument a browser and measure.

To wit, on this loop:

  for (var i = 0; i < 500000; ++i)
      a = ("done fail unicorns ponies esperanto" + i).split(" ");

we are 25% faster in the shell on my machine.

"Yo cash ain't nothin' but trash" -- Fats Waller (via brendan)
"Ain't misbehavin'" -- v8
(Reporter)

Comment 2

8 years ago
From http://code.google.com/p/v8/source/detail?r=9164:

"Optimize the common obfuscator pattern where ["foo","bar","baz"]
gets converted fo "foo,bar,baz".split(",").  If the inputs are
symbols we cache the result and make the substrings into symbols."

Obfuscators.  I wonder if we should be benchmarking more of them.
Alternatively, is it reasonable to constant-fold this pattern?  Or do something like:

if (String.prototype.split == WHAT_WE_EXPECT)
  return <constant-folded-result>;
else
  return <call-it>
(Reporter)

Comment 4

7 years ago
Totally.  TI can tell us 'split' is the real split so, in the best case, we wouldn't even need a test.  We could just emit code to build the array.  Doing this from jit (IonMonkey, preferably) would be easiest.  When IonMonkey is on stable and on mozilla-central, perhaps this would be a [good first bug]?

Comment 5

7 years ago
Frankly I don't think it's a huge win.  I have mainly seen it in js1k entries, where it annoyed me and it was an easy change so I did it.  I don't know of a real benchmark that uses it, but I did fall over http://jsperf.com/occurence-counting-2 which I guess is what caused this bug to be filed.
(In reply to Erik Corry from comment #5)
> Frankly I don't think it's a huge win.

http://jsperf.com/character-counting/3 Firefox is ahead in every test ’cept for the .split() case

Comment 7

7 years ago
That hardly qualifies as a real benchmark.

Comment 8

7 years ago
Also, the split version gets the wrong answer for example on the input string ".a.b.".
Luke, I'd like to take this on as a side project to get to know the js engine. Can you provide some tips on what files to look into and a general starting direction?
Assignee: general → jaws
Status: NEW → ASSIGNED
To return the spirit, I am going to help you out here. :)

So String.prototype.split is implemented in jsstr.cpp as js::str_split. We already do caching for other stuff like eval or Math.sin etc. MathCache from my point of view is the cleanest implementation and you can find that mostly in jsmath.h and it's used int jsmath.cpp.

The EvalCache uses strings for lookups and needs to guard on more conditions so it might be more related to this case. You want to look at EvalCacheLookup, EvalCacheHashPolicy, EvalCache in jscntx.h The EvalCache needs some special class to handle function destruction and construction and thus is implemented in Eval.cpp as EvalScriptGuard. You probably won't need such complexity.

Of course in the end you should make sure that we are as good as v8 and don't regress anything. (v8 seems to only to this of Atoms/symbols, so memory usage shouldn't increase because we never garbage collect atoms anyway.)
(Reporter)

Comment 11

7 years ago
(In reply to Tom Schuster [:evilpie] from comment #10) 
> Of course in the end you should make sure that we are as good as v8 and
> don't regress anything. (v8 seems to only to this of Atoms/symbols, so
> memory usage shouldn't increase because we never garbage collect atoms
> anyway.)

Atoms do indeed get GC'd.  I'd consider purging the cache with the other caches that point to GC things in PurgeRuntime (js/src/jsgc.cpp).
(In reply to Erik Corry from comment #8)
> Also, the split version gets the wrong answer for example on the input
> string ".a.b.".

A bit OT, but as far as I can tell, it gets the answer wrong for every string, as it should be s.split(".").length-1 - a change that is reflected in later revisions of the jsperf.
Assignee: jaws → general
Status: ASSIGNED → NEW
(Reporter)

Comment 13

6 years ago
Someone just pointed out that jQuery uses "x".split("y") expression which have a constant value.  It'd be great to measure this on a jQuery benchmark or on some heavy jQuery sites.
Blocks: 917839
(Assignee)

Comment 14

5 years ago
I would like to work on this bug. As this would be my first bug I would like additional guidance.
(Reporter)

Comment 15

5 years ago
A good starting point would be to instrument a browser to measure how often this pattern is used and how much time we could save from this optimization.   The implementation of String.prototype.splitis js::str_split in js/src/jsstr.cpp.  String literals are "atomized" (allocated once), so a good approximation of "literal".split("literal") would be test whether the 'this' and separator arguments of str_split are atoms (JSString::isAtom()).  It'd also be good to accumulate the total time spent in str_split for cases where we have atom.split(atom).  After getting these measurements working, I'd browser around some popular sites (particularly those using jQuery).
(Assignee)

Comment 16

5 years ago
Hi luke(In reply to Luke Wagner [:luke] from comment #15)
> A good starting point would be to instrument a browser to measure how often
> this pattern is used and how much time we could save from this optimization.
> The implementation of String.prototype.splitis js::str_split in
> js/src/jsstr.cpp.  String literals are "atomized" (allocated once), so a
> good approximation of "literal".split("literal") would be test whether the
> 'this' and separator arguments of str_split are atoms (JSString::isAtom()). 
> It'd also be good to accumulate the total time spent in str_split for cases
> where we have atom.split(atom).  After getting these measurements working,
> I'd browser around some popular sites (particularly those using jQuery).

Hi Luke, I'm 'cached' the last split executed. I'll write code to measure how often this pattern is used

JSObject *
js::str_split_string(JSContext *cx, HandleTypeObject type, HandleString str, HandleString sep)
{
    static Rooted<JSLinearString*> cachedString;
    static RootedObject cachedAobj;

    Rooted<JSLinearString*> linearStr(cx, str->ensureLinear(cx));
    if (!linearStr)
        return nullptr;

    Rooted<JSLinearString*> linearSep(cx, sep->ensureLinear(cx));
    if (!linearSep)
        return nullptr;
    
    RootedObject aobj(cx);
    if(cachedString != linearStr){
        cachedString = linearStr;
    
        uint32_t limit = UINT32_MAX;

        if (linearSep->length() == 0) {
            aobj = CharSplitHelper(cx, linearStr, limit);
        } else {
            SplitStringMatcher matcher(cx, linearSep);
            aobj = SplitHelper(cx, linearStr, limit, matcher, type);
        }

        if (!aobj)
            return nullptr;

        aobj->setType(type);    
        cachedAobj = aobj;
           
    }else
        aobj = cachedAobj;
    return aobj;
}

--------------------------
tested:
var start = new Date().getTime();
for (var i = 0; i < 500000; ++i)
      a = ("done fail unicorns ponies esperanto").split(" ");
var end = new Date().getTime();
var time = end - start;

------------------------------
before: 
9064
after:
6338 

So,  I'm doing it right?
(Assignee)

Comment 17

5 years ago
(In reply to Victor Carlquist from comment #16)
> Hi luke(In reply to Luke Wagner [:luke] from comment #15)
> > A good starting point would be to instrument a browser to measure how often
> > this pattern is used and how much time we could save from this optimization.
> > The implementation of String.prototype.splitis js::str_split in
> > js/src/jsstr.cpp.  String literals are "atomized" (allocated once), so a
> > good approximation of "literal".split("literal") would be test whether the
> > 'this' and separator arguments of str_split are atoms (JSString::isAtom()). 
> > It'd also be good to accumulate the total time spent in str_split for cases
> > where we have atom.split(atom).  After getting these measurements working,
> > I'd browser around some popular sites (particularly those using jQuery).
> 
> Hi Luke, I'm 'cached' the last split executed. I'll write code to measure
> how often this pattern is used
> 
> JSObject *
> js::str_split_string(JSContext *cx, HandleTypeObject type, HandleString str,
> HandleString sep)
> {
>     static Rooted<JSLinearString*> cachedString;
>     static RootedObject cachedAobj;
> 
>     Rooted<JSLinearString*> linearStr(cx, str->ensureLinear(cx));
>     if (!linearStr)
>         return nullptr;
> 
>     Rooted<JSLinearString*> linearSep(cx, sep->ensureLinear(cx));
>     if (!linearSep)
>         return nullptr;
>     
>     RootedObject aobj(cx);
>     if(cachedString != linearStr){
>         cachedString = linearStr;
>     
>         uint32_t limit = UINT32_MAX;
> 
>         if (linearSep->length() == 0) {
>             aobj = CharSplitHelper(cx, linearStr, limit);
>         } else {
>             SplitStringMatcher matcher(cx, linearSep);
>             aobj = SplitHelper(cx, linearStr, limit, matcher, type);
>         }
> 
>         if (!aobj)
>             return nullptr;
> 
>         aobj->setType(type);    
>         cachedAobj = aobj;
>            
>     }else
>         aobj = cachedAobj;
>     return aobj;
> }
> 
> --------------------------
> tested:
> var start = new Date().getTime();
> for (var i = 0; i < 500000; ++i)
>       a = ("done fail unicorns ponies esperanto").split(" ");
> var end = new Date().getTime();
> var time = end - start;
> 
> ------------------------------
> before: 
> 9064
> after:
> 6338 
> 
> So,  I'm doing it right?

Sorry, this code is wrong.
(Reporter)

Comment 18

5 years ago
Hi!  It looks like you're off to a good start.  One issue is that, in general, we can't keep anything in static variables, especially not Rooted (they need to be on the stack, since they are kept in a LIFO linked list).  Instead, you'd probably want to keep some weak JSString pointer in the JSRuntime that gets cleared on GC (at least for now, maybe there is a better way).
(Reporter)

Comment 19

5 years ago
Also, in the future, you can post 'work in progress' patches as attachments to the bug, which make them a bit easier to read.
The ideal place to do this is in the jits.

The Baseline IC can easily generate an optimized stub for the |literalString.split(literalString)| case, and can hold template array result object within the stub.

Note that the |split| optimization will always need to allocate a new array, not just return the same one, since the result of |split| needs to be a newly allocated array.

Ion can handle these cases as well.  |TI| will tell us that the |.split| is the canonical String.prototype.split, and Ion can generate inline MNewArray and MStoreElement(Constant) instructions for the split call.
In fact, Ion already does some optimizations for |String#split|.  See MCallOptimize.cpp, IonBuilder::inlineStringSplit.  Currently the "optimization" is only to carry the TypeObject for the resulting array into the split function.

Optimizing this out can go the following route:

1. Baseline generates optimized JSOP_CALL stubs for constantString.split(constantString)
2. Baseline's optimized stub contains the template Array that is copied and returned.
3. Ion modifies IonBuilder::inlineStringSplit to check for optimized case (constant string inputs), and if so, inspects baseline ICs, pulls out template array object from the baseline stub, and generates inline MIR instructions for constructing and populating the returned array.

Sticking a global cache for the result of the last split string seems unappealing compared to the general approach above.
Also note bug 977966, which might be an interesting next task for whoever ends up implementing this.
(Assignee)

Comment 23

5 years ago
(In reply to Kannan Vijayan [:djvj] from comment #21)
> In fact, Ion already does some optimizations for |String#split|.  See
> MCallOptimize.cpp, IonBuilder::inlineStringSplit.  Currently the
> "optimization" is only to carry the TypeObject for the resulting array into
> the split function.
> 
> Optimizing this out can go the following route:
> 
> 1. Baseline generates optimized JSOP_CALL stubs for
> constantString.split(constantString)
> 2. Baseline's optimized stub contains the template Array that is copied and
> returned.
> 3. Ion modifies IonBuilder::inlineStringSplit to check for optimized case
> (constant string inputs), and if so, inspects baseline ICs, pulls out
> template array object from the baseline stub, and generates inline MIR
> instructions for constructing and populating the returned array.
> 
> Sticking a global cache for the result of the last split string seems
> unappealing compared to the general approach above.

Ok, I'll follow this route, thanks Kannan
(Assignee)

Comment 24

5 years ago
Hi! I have some questions, 
How I check if parameter is a constant string or not? and where can I find documentation about CallInfo members? =)
(Assignee)

Comment 25

5 years ago
I cached all split on this patch, but, in the future, I want to cache only constant string.
(Assignee)

Comment 26

5 years ago
sorry. this is right code.

I cached all split on this patch, but, in the future, I want to cache only constant string.
During Ion compilation, to find out if a MInstruction is a constant, you do the following:

  if (ins->isConstant()) {
    Value val = ins->toConstant()->value();
    ...
  }

However, I would advise starting this with a simple Baseline optimized stub for JSOP_CALL that implements a constant string split.  The baseline stub can be written to hold the template array object that will be copied and returned.

Once that's done, we can turn to enabling Ion optimization.  When ion-optimizing the call to str_split, Ion can inspect the baseline stub chain for the call operation to see if there are any StringSplit stubs with matching inputs.  If there are (likely), it can pull the template array object from the stub, pull each individual element, and directly generate inline instructions for NewArray + ElementStore.
(Assignee)

Comment 28

5 years ago
Hi!
I copied and modified MathCache to support JSString and JSObject (StrCache), but I don't know if it is correct. 
Also, I don't understand how to work inlineStringSplit, can you help me?
Attachment #8439500 - Attachment is obsolete: true
Attachment #8439505 - Attachment is obsolete: true
Attachment #8440465 - Flags: feedback?(kvijayan)
(Assignee)

Comment 29

5 years ago
Posted patch work in progress (no Jit) (obsolete) — Splinter Review
This patch is cleaner than the other patch..

Also, I ran the SunSpider benchmark, but the results wasn't satisfactory.

Maybe using jit, it'll become faster.
Attachment #8440465 - Attachment is obsolete: true
Attachment #8440465 - Flags: feedback?(kvijayan)
Attachment #8441739 - Flags: feedback+
(Assignee)

Comment 30

5 years ago
Posted file SunSpider benchmark (obsolete) —
Comment on attachment 8441739 [details] [diff] [review]
work in progress (no Jit)

Setting the feedback flag to ?kannan.

Victor, the way these flags work is that you request someone's feedback (or review or a number of other things) by setting the flag to "?" and entering their name in the field to the right of the flag. That person then gives feedback and sets the flag to "+" or "-" or resets it entirely, which usually means something like "I like the approach, but there are enough issues here that I'm not comfortable giving a +".

Adding the right person to the request is important because bugzilla sends out a special email notification that will stand out in the flood of bugzilla emails most developers get on a daily basis.
Attachment #8441739 - Flags: feedback+ → feedback?(kvijayan)
Great you're working on it, but we should not forget to do the measurements Luke suggested in comment 15. This optimization is a bit of a hack and we should have some (measurable) justification for adding it...
(Assignee)

Comment 33

5 years ago
(In reply to Till Schneidereit [:till] from comment #31)
> Comment on attachment 8441739 [details] [diff] [review]
> work in progress (no Jit)
> 
> Setting the feedback flag to ?kannan.
> 
> Victor, the way these flags work is that you request someone's feedback (or
> review or a number of other things) by setting the flag to "?" and entering
> their name in the field to the right of the flag. That person then gives
> feedback and sets the flag to "+" or "-" or resets it entirely, which
> usually means something like "I like the approach, but there are enough
> issues here that I'm not comfortable giving a +".
> 
> Adding the right person to the request is important because bugzilla sends
> out a special email notification that will stand out in the flood of
> bugzilla emails most developers get on a daily basis.

Ok! Thanks. Till
(Assignee)

Comment 34

5 years ago
I want to know if I modified the correct files, or not.

In negative case, which file should I modify?

Thanks!
(Assignee)

Comment 35

5 years ago
(In reply to Luke Wagner [:luke] from comment #15)
> A good starting point would be to instrument a browser to measure how often
> this pattern is used and how much time we could save from this optimization.
> The implementation of String.prototype.splitis js::str_split in
> js/src/jsstr.cpp.  String literals are "atomized" (allocated once), so a
> good approximation of "literal".split("literal") would be test whether the
> 'this' and separator arguments of str_split are atoms (JSString::isAtom()). 
> It'd also be good to accumulate the total time spent in str_split for cases
> where we have atom.split(atom).  After getting these measurements working,
> I'd browser around some popular sites (particularly those using jQuery).

Hi Luke.

To measure the spent time in str_split, can I use clock() function?
And how can I print this result?
(Reporter)

Comment 36

5 years ago
(In reply to Victor Carlquist from comment #35)
I'd use PRMJ_Now() to measure 'before' and 'after'; you can then convert this into milliseconds via double(after - before) / PRMJ_USEC_PER_MSEC.  For ad hoc measurements, you can accumulate the total time in some global variable and printf the final reuse in, say, JS_DestroyRuntime (so, when the browser closes), or in some obscure JS function you can trigger from the JS console (personally, I like math_toSource in jsmath.cpp).
(Assignee)

Comment 37

5 years ago
(In reply to Luke Wagner [:luke] from comment #36)
> (In reply to Victor Carlquist from comment #35)
> I'd use PRMJ_Now() to measure 'before' and 'after'; you can then convert
> this into milliseconds via double(after - before) / PRMJ_USEC_PER_MSEC.  For
> ad hoc measurements, you can accumulate the total time in some global
> variable and printf the final reuse in, say, JS_DestroyRuntime (so, when the
> browser closes), or in some obscure JS function you can trigger from the JS
> console (personally, I like math_toSource in jsmath.cpp).

Thanks Luke! =)
(Assignee)

Comment 38

5 years ago
Posted file Measure - possible future gain (obsolete) —
Attachment #8442939 - Flags: feedback?(luke)
(Assignee)

Comment 39

5 years ago
Attachment #8442939 - Attachment is obsolete: true
Attachment #8442939 - Flags: feedback?(luke)
Attachment #8442942 - Flags: feedback?(luke)
(Reporter)

Comment 40

5 years ago
Comment on attachment 8442942 [details] [diff] [review]
Measure - future possible  gain

Review of attachment 8442942 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +3755,5 @@
>  /* ES5 15.5.4.14 */
>  bool
>  js::str_split(JSContext *cx, unsigned argc, Value *vp)
>  {
> +	double start, endAtom;

You probably want to use PRMJ_Now() since JS_Now() is a JS_PUBLIC_API function and thus incurs some calling overhead.  Second, you might want to use int64_t since this is the return type of PRMJ_Now() and it avoid the potential for PRMJ_Now()'s result falling outside the 2^53 integer range of doubles.

@@ +3771,5 @@
>          return false;
>      AddTypePropertyId(cx, type, JSID_VOID, Type::StringType());
>  
> +    if(str.get()->isAtom() && args.get(0).toString()->isAtom())
> +    	endAtom =  JS_Now();

This will assert/crash in the unlikely case that args.get(0) is not a string.  I think you need the additional condition args.get(0).isString().

@@ +3845,5 @@
>          return false;
>  
> +    if(str.get()->isAtom() && args.get(0).toString()->isAtom()){
> +    	milli_Optimize += double(endAtom - start) / PRMJ_USEC_PER_MSEC;
> +    	milli_No_Optimize += double(JS_Now() - start) / PRMJ_USEC_PER_MSEC;

I was thinking something more like: take the time before/after the portion of split that you can optimize and then:
  if (optimizable)
    savedTime += delta;
  totalTime += delta;
Attachment #8442942 - Flags: feedback?(luke)
(Assignee)

Comment 41

5 years ago
Attachment #8442942 - Attachment is obsolete: true
Attachment #8443087 - Flags: feedback?(luke)
(Reporter)

Comment 42

5 years ago
Comment on attachment 8443087 [details] [diff] [review]
Measure - future possible gain

Looks good!
Attachment #8443087 - Flags: feedback?(luke) → feedback+
(Assignee)

Comment 43

5 years ago
(In reply to Luke Wagner [:luke] from comment #42)
> Comment on attachment 8443087 [details] [diff] [review]
> Measure - future possible gain
> 
> Looks good!

Great! 
Luke, what is the next step?
(Reporter)

Comment 44

5 years ago
I'd build a browser and browse around a bunch of sites (esp. popular sites) and see how often the  optimization applies.  You might want to move the printf() from JS_DestoryRuntime into something you can call from the JS console so you can print after each site.  Grepping http://code.jquery.com/jquery-2.1.1.js for "split", I do several instances of "literal".split("literal"), so that's encouraging.
(Assignee)

Comment 45

5 years ago
Hi!
I accessed some popular sites, and I got this results:

bbc.co.uk
Saved Time 0.983 ms
Total Time 3.123 ms
Gain 31.4761%

facebook.com
Saved Time 2.261 ms
Total Time 17.173 ms
Gain 13.166%

youtube.com
Saved Time 5.837 ms
Total Time 34.406 ms
Gain 16.9651%

yahoo.com
Saved Time 3.295 ms
Total Time 19.39 ms
Gain 16.9933%

wikipedia.org
Saved Time 0.081 ms
Total Time 1.74 ms
Gain 4.65517%

amazon.com
Saved Time 4.791 ms
Total Time 17.041 ms
Gain 28.1145%

cnn.com
Saved Time 0.947 ms
Total Time 7.342 ms
Gain 12.8984%

netflix.com
Saved Time 23.167 ms
Total Time 65.012 ms
Gain 35.635%
(Reporter)

Comment 46

5 years ago
That's pretty encouraging!

I realized that the measurement is a bit over-simplified, though: we can't optimize *any* split of an atom by an atom.  I guess the next step, then, is to measure how much we'd with with a real cache (cleared on GC, btw).  It'd be good to measure how much a general dynamically-sized cache wins over a small, fixed size cache which, it looks like, is what v8 has.

An additional statistic I'd be interested in: what is the distribution of sizes of these atom.split(atom) cases.  That is, are pages doing a few big splits() or many small splits.  If the wins are mostly big splits, then the JITs won't really benefit from this cache and we could probably get all the win from just changing the C++ str_split.  It looks like v8 does their caching in C++.
(Assignee)

Comment 47

5 years ago
So, How can we measure the saved time with real cache?

I would create three categories about size string:
small : size < 400
medium:  400 < size < 800
big: 800 < size 

I'll count the number of strings in each category, 
What do you think?
(Assignee)

Comment 48

5 years ago
Hi, I implemented additional statistic, following this rules:

size < 400
medium:  400 <= size < 800
big:  size >= 800

bbc.co.uk
Saved Time 14.267 ms
Total Time 26.472 ms
Gain 53.8947%
Small: 7214
Medium: 0
Big: 0

facebook.com
Saved Time 1.153 ms
Total Time 13.722 ms
Gain 8.40257%
Small: 598
Medium: 0
Big: 0

youtube.com
Saved Time 5.786 ms
Total Time 36.671 ms
Gain 15.7781%
Small: 2769
Medium: 2
Big: 2

yahoo.com
Saved Time 1.454 ms
Total Time 14.822 ms
Gain 9.80974%
Small: 893
Medium: 0
Big: 0

wikipedia.org
Saved Time 0.92 ms
Total Time 3.63 ms
Gain 25.3444%
Small: 320
Medium: 0
Big: 0

amazon.com
Saved Time 1.115 ms
Total Time 13.895 ms
Gain 8.02447%
Small: 760
Medium: 0
Big: 0

cnn.com
Saved Time 1.003 ms
Total Time 8.864 ms
Gain 11.3154%
Small: 816
Medium: 0
Big: 0

netflix.com
Saved Time 21.218 ms
Total Time 58.201 ms
Gain 36.4564%
Small: 7018
Medium: 0
Big: 0
(Reporter)

Comment 49

5 years ago
Well, this is all pretty encouraging; seems like this optimization will provide real improvements in practice.  What Kannan outlined in comment 27 sounds like a good strategy.  Ideally, we'd get copy-on-write arrays (bug 934450) so we wouldn't have to actually clone a new array each time (that's what v8 does), but since we are dealing with many small str.split's here, I expect we'll still get a pretty big win (in fact, for really small splits, we may be faster than v8 since we're not calling out to C++, doing a hash-table lookup, etc).
(Assignee)

Comment 50

5 years ago
(In reply to Luke Wagner [:luke] from comment #49)
Hi, This is really encouraging!

we would create a cache in each object storing the parameter and result, so when this object is called again, we'd return cache. 

what do you think?
(Reporter)

Comment 51

5 years ago
(In reply to Victor Carlquist from comment #50)
I'm not sure what you mean by "in each object".  In general, we can't cache the array object result (since it's mutable and thus each split() call needs a new instance (or COW).  I liked Kannan's suggestion which is to generate JIT code to efficiently create the new array clone.
(Assignee)

Comment 52

5 years ago
Hi, 
I'm sorry but I don't understand the Kannan's suggestion, how can I hold the template array object? (comment 27)

Thanks!
(Assignee)

Comment 53

5 years ago
Hi,
Where JSOP_CALL stub is generated?

Thanks!
(Assignee)

Comment 54

5 years ago
Hi,
I'm running some tests using |literal|.split(|literal|) e.g. 

for (var i = 0; i < 500000; ++i)
  a = ("done fail unicorns ponies esperanto").split(" ");
for (var i = 0; i < 500000; ++i)
  a = ("done fail unicorns ponies esperanto").split(" ");

"done fail unicorns ponies esperanto").split(" ");

but ins->isConstant() is never return 'true', only ins->isDefinition() return 'true'.

Is it normal?
(Assignee)

Comment 55

5 years ago
Posted patch work in progress (obsolete) — Splinter Review
Hi, 
This code result in "segmentation fault (core dumped)", and I don't know why.

Can I create the optimize in CodeGenerator.cpp?

Do I need to move the output register in any address memory to get right result?
Attachment #8441739 - Attachment is obsolete: true
Attachment #8441739 - Flags: feedback?(kvijayan)
Attachment #8453374 - Flags: feedback?(luke)
Assignee: general → nobody
(Reporter)

Comment 56

5 years ago
Comment on attachment 8453374 [details] [diff] [review]
work in progress

Review of attachment 8453374 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay, this got lost in my email stream.  I'm afraid I'm not all that familiar with bailouts and general Ion compilation.  Perhaps nbp can help you here?
Attachment #8453374 - Flags: feedback?(luke) → feedback?(nicolas.b.pierron)
Comment on attachment 8453374 [details] [diff] [review]
work in progress

Review of attachment 8453374 [details] [diff] [review]:
-----------------------------------------------------------------

I am not sure to see why you are getting SEGV, but I see multiple reasons why this patch would not work (see below), also, we should have assertion messages about the lack of safepoint when the oolCallVM is generated, are you using debug builds of SpiderMonkey?  If so you should have an assertion failure that you can track with gdb.

I think Kannan's idea is to avoid generating a MStringSplit MIR Instruction when we know that both arguments are constants, and instead make a MNewArray and it initialization, as if we were copying from a cache.  I think he's idea is to implement this in either IonBuilder::inlineStringSplit or MStringSplit::foldsTo.  Have a look at how we do that in IonBuilder::inlineArray [1]

style-nit: do not use tabs.

[1] http://dxr.mozilla.org/mozilla-central/source/js/src/jit/MCallOptimize.cpp#277

::: js/src/jit/CodeGenerator.cpp
@@ +5579,5 @@
> +	 // Load elements and length.
> +
> +	 OutOfLineCode *ool = oolCallVM(StringSplitInfo, lir, (ArgList(), ImmGCPtr(lir->mir()->typeObject()), str, sep ), StoreRegisterTo(output));
> +	 if(!ool)
> +		 return false;

This code define an out-of-line function call, but to execute it you have to add a:
  masm.jump(ool->entry());

Note that oolCallVM are only interesting if the call is optional and not mandatory.

::: js/src/jit/Lowering.cpp
@@ +2730,5 @@
>      JS_ASSERT(ins->separator()->type() == MIRType_String);
>  
>      LStringSplit *lir = new(alloc()) LStringSplit(useRegisterAtStart(ins->string()),
>                                                    useRegisterAtStart(ins->separator()));
> +    return assignSnapshot(lir, Bailout_NonPrimitiveInput) && define(lir, ins) /*&& assignSafepoint(lir, ins)*/;

If you are making a callVM or an oolCallVM, then we need to tell the GC about the content of the stack when we do the call.  This is what assignSafepoint is used for.

::: js/src/vm/Runtime.h
@@ +1101,5 @@
>      js::MathCache *maybeGetMathCache() {
>          return mathCache_;
>      }
> +    JSString *temp1;
> +    JSString *temp2;

I don't see any uses of these fields, is that part of another patch?
Attachment #8453374 - Flags: feedback?(nicolas.b.pierron)
(In reply to Victor Carlquist from comment #54)
> Hi,
> I'm running some tests using |literal|.split(|literal|) e.g. 
> 
> for (var i = 0; i < 500000; ++i)
>   a = ("done fail unicorns ponies esperanto").split(" ");
> for (var i = 0; i < 500000; ++i)
>   a = ("done fail unicorns ponies esperanto").split(" ");
> 
> "done fail unicorns ponies esperanto").split(" ");
> 
> but ins->isConstant() is never return 'true', only ins->isDefinition()
> return 'true'.
> 
> Is it normal?

Have you checked the inputs?  MStringSplit would never return true to isConstant(), on the other hand, its inputs will.
  ins->isStringSplit()
  ins->getOperand(0)->isConstant()
  ins->getOperand(1)->isConstant()
(Assignee)

Comment 59

5 years ago
Posted patch split.patch (obsolete) — Splinter Review
Attachment #8464763 - Flags: feedback?(kvijayan)
(Assignee)

Comment 60

5 years ago
(In reply to Victor Carlquist from comment #59)
> Created attachment 8464763 [details] [diff] [review]
> split.patch

I'm avoiding to generate MStringSplit instruction, but this patch is 10% slower, I don't know why...
Thanks!
Kannan, when would you be able to provide some feedback on Victor's patch?
Flags: needinfo?(kvijayan)
Comment on attachment 8464763 [details] [diff] [review]
split.patch

Review of attachment 8464763 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MCallOptimize.cpp
@@ +1198,5 @@
> +		for (uint32_t i = 0; i < initLength; ++i) {
> +			id = MConstant::New(alloc(), Int32Value(i));
> +			current->add(id);
> +
> +			MConstant *value = MConstant::NewAsmJS(alloc(), templateObjectSplit->getDenseOrTypedArrayElement(i), MIRType_Object);

Why are you using MConstant::NewAsmJs here instead of MConstant::New ?  You probably want MConstant::New.

@@ +1204,5 @@
> +			// There is normally no need for a post barrier on these writes
> +			// because the new array will be in the nursery. However, this
> +			// assumption is volated if we specifically requested pre-tenuring.
> +			if (ins->initialHeap() == gc::TenuredHeap)
> +				current->add(MPostWriteBarrier::New(alloc(), ins, value));

PostWriteBarrier should happen after the store, not before.
Attachment #8464763 - Flags: feedback?(kvijayan)
Sorry for the late reply.

Victor: The patch seems like it should be just fine, aside from the points mentioned.

I think the slowdown may be because when you try optimizing the constant case and fail, you simply return directly without doing the non-inlined (yet still very good) optimization.

So the main optimization code for stringSplit should try the constant optimization, and if that fails, go back to the regular optimization.

I just landed a small optimization that does something very similar, so you can take a look at how we usually organize that logic:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8da59dd9fc7f
Flags: needinfo?(kvijayan)
(Assignee)

Comment 64

5 years ago
Posted patch bug688219.patch (obsolete) — Splinter Review
This patch is working.
We could reduce the execution time:
>for (var i = 0; i < 50000; ++i) 
>   a = ("done fail unicorns ponies esperanto").split(" "); 
before: 26ms
after: 3ms
Attachment #8453374 - Attachment is obsolete: true
Attachment #8464763 - Attachment is obsolete: true
Attachment #8466482 - Flags: feedback?(kvijayan)
(Assignee)

Comment 65

5 years ago
Posted patch bug688219.patch (obsolete) — Splinter Review
This patch is working.
We could reduce the execution time:
>for (var i = 0; i < 50000; ++i) 
>   a = ("done fail unicorns ponies esperanto").split(" "); 
before: 26ms
after: 3ms
Attachment #8466482 - Attachment is obsolete: true
Attachment #8466482 - Flags: feedback?(kvijayan)
Attachment #8466495 - Flags: feedback?(kvijayan)
(Assignee)

Comment 66

5 years ago
(In reply to Victor Carlquist from comment #65)
> Created attachment 8466495 [details] [diff] [review]
> bug688219.patch

The a.length() is not returning correct value.
I'll try fix it.
(Assignee)

Comment 67

5 years ago
Posted patch bug-688219.patch (obsolete) — Splinter Review
I fixed the length bug.
Attachment #8466495 - Attachment is obsolete: true
Attachment #8466495 - Flags: feedback?(kvijayan)
Attachment #8466709 - Flags: feedback?(kvijayan)
Comment on attachment 8466709 [details] [diff] [review]
bug-688219.patch

Review of attachment 8466709 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MCallOptimize.cpp
@@ +1157,5 @@
>      }
>  
>      callInfo.setImplicitlyUsedUnchecked();
>  
> +    if(callInfo.thisArg()->isConstant() && callInfo.getArg(0)->isConstant()){

Please split the entire logic of this conditional out into a seperate method (maybe ::inlineConstantStringSplit).

Then, in |inlineStringSplit| you can do:

  InlineStatus constInlineStatus = inlineConstantStringSplit(...);
  if (constInlineStatus != InlineStatus_NotInlined)
    return constInlineStatus;

That way, even if the code within the conditional fails to inline, the generic stringSplit optimization code will kick in.
Attachment #8466709 - Flags: feedback?(kvijayan)
(Assignee)

Comment 69

5 years ago
Posted patch work in progress (obsolete) — Splinter Review
Created  inlineConstantStringSplit method.
Attachment #8466709 - Attachment is obsolete: true
Attachment #8466733 - Flags: feedback?(kvijayan)
(Assignee)

Comment 70

5 years ago
(In reply to Kannan Vijayan [:djvj] from comment #68)
> Comment on attachment 8466709 [details] [diff] [review]
> bug-688219.patch
> 
> Review of attachment 8466709 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/MCallOptimize.cpp
> @@ +1157,5 @@
> >      }
> >  
> >      callInfo.setImplicitlyUsedUnchecked();
> >  
> > +    if(callInfo.thisArg()->isConstant() && callInfo.getArg(0)->isConstant()){
> 
> Please split the entire logic of this conditional out into a seperate method
> (maybe ::inlineConstantStringSplit).
> 
> Then, in |inlineStringSplit| you can do:
> 
>   InlineStatus constInlineStatus = inlineConstantStringSplit(...);
>   if (constInlineStatus != InlineStatus_NotInlined)
>     return constInlineStatus;
> 
> That way, even if the code within the conditional fails to inline, the
> generic stringSplit optimization code will kick in.

Thanks for your reply!
Comment on attachment 8466733 [details] [diff] [review]
work in progress

Review of attachment 8466733 [details] [diff] [review]:
-----------------------------------------------------------------

This looks pretty good.  Just run it against the major benchmarks and make sure that nothing is regressing, run it through try to ensure that it's not introducing any new bugs, and you should be good to go.
Attachment #8466733 - Flags: feedback?(kvijayan) → feedback+
(Assignee)

Comment 72

5 years ago
Hi, Great!

I ran SunSpider and http://jsperf.com/occurence-counting-2 benchmark, I think nothing is regressing (string w/ pattern is a little bit slower comparing the two results) and I didn't find any new bugs.
Attachment #8441740 - Attachment is obsolete: true
Please check against Octane as well.  Sunspider runs extremely quickly, so a lot of JIT changes just won't show up on it, regression or improvement.
(Assignee)

Comment 74

5 years ago
Hi, 
I tried to run Octane, but this patch is crashing firefox... =(
(Assignee)

Comment 75

5 years ago
I am debugging the code, but I still don't know what's the problem.
Seems the raytrace test in Octane is failing.  
Maybe the memory isn't allocated in some test or the resumeAfter is failing;

> if (!resumeAfter(length))
>    return InliningStatus_Error;

Do you have any suggestion?
(Assignee)

Comment 76

5 years ago
Hi, 
I have executed Octane several times, sometimes the patch shows 'Segmentation fault' when it tries to call:
>  initLength = templateObjectSplit->as<ArrayObject>().length();

But, sometimes this code works correctly.

Does templateObjectSplit->as<ArrayObject>() mean that is null? In affirmative case, how can I verify this?

This problem occurs in raytrace.js:
> 882  var pixelSize = "5,5".split(','); //  $F('pixelSize').split(',');

Thanks!
(Assignee)

Comment 77

5 years ago
Posted patch bug-688219-fix.patch (obsolete) — Splinter Review
Hi,
I fixed the bug! but I needed to create a new variable to store length value.

Results of Octane:
before: 14374,4 (Average of 5)
after:  14490,8 (Average of 5)

Thanks in advance!
Attachment #8472618 - Flags: feedback?(kvijayan)
Nice work!

I think this proves the concept and shows some real gains, but I am uncomfortable with certain aspects of the current approach - in particular the caching of a single global template object and length.  This caches exactly one stringSplit operation across the entire runtime, and that's far more benchmark-specific than I like.

I think the right and more general way to do this is to use the BaselineJIT ICs for JSOP_CALL as caches that keep track of the TemplateObject and resulting array length, and have Ion use that during compilation.  This would allow for optimizing every site individually, and would work as a general optimization for STRING_LITERAL.split(STRING_LITERAL) idioms.

The thing is, most of the Ion work can stay the same.  We just need to take a step back and move the caching into Baseline ICs instead of keeping it on the Runtime.  This involves creating an optimized ICCall_StringSplit stub in Baseline, and using that to store the template object.  The template object can then be cloned on a per-site basis.

Does that sound reasonable?  I can create the appropriate bugs and take you through this work, but I think the end result will be pretty cool.  What do you think?
Attachment #8472618 - Flags: feedback?(kvijayan) → feedback+
Needinfo for comment 78
Flags: needinfo?(victorcarlquist)
(Assignee)

Comment 80

5 years ago
(In reply to Kannan Vijayan [:djvj] from comment #79)
> Needinfo for comment 78

True, you're right.
I'd like to work on this bug, your idea sound reasonable for me, I'll start to study BaselineJIT and ICs.
Flags: needinfo?(victorcarlquist)
(In reply to Kannan Vijayan [:djvj] from comment #78)
> This
> would allow for optimizing every site individually, and would work as a
> general optimization for STRING_LITERAL.split(STRING_LITERAL) idioms.

Not sure it's relevant, but note that this probably interacts with bug 977966 in some way.
(In reply to Victor Carlquist from comment #80)
> (In reply to Kannan Vijayan [:djvj] from comment #79)
> > Needinfo for comment 78
> 
> True, you're right.
> I'd like to work on this bug, your idea sound reasonable for me, I'll start
> to study BaselineJIT and ICs.

Awesome.  I'll create a bug for just optimizing this in baseline, and make it a blocker for this bug.  We can discuss the baseline specifics in there.

Thanks victor!
Depends on: 1054330
FWIW, we have COW arrays now, we could probably use this here.
Yeah, it was on my mind :) This can even be applied to the baseline optimization (the jitcode for the stub could CoW-clone the template array instead of doing a full copy).
So, now that the baseline work is done, the Ion optimization should be pretty quick, in fact.

The first step is to take a look at BaselineInspector (in js/src/jit/BaselineInspector.{h,cpp}).  Add a method to that class to check whether the baseline IC at a particular bytecode offset is an optimizable string split.

The signature of the method should look something like:

bool
BaselineInspector::isOptimizableCallStringSplit(jsbytecode *pc, JSString **stringOut, JSObject **objOut);

This should check the IC chain, see if it contains an ICCall_StringSplit stub, and if so return true (and also store the matching string and result array into stringOut and objOut, respectively).
(Assignee)

Comment 86

5 years ago
Posted patch WIP (obsolete) — Splinter Review
Hi!

I made the changes (:
I have some questions:
1) Should we check if callInfo.getArg(0) in Ion is equal to the expectedArg() in stub?
2) What's the difference between current->add() and current->push()?

Thanks!
Attachment #8466733 - Attachment is obsolete: true
Attachment #8472618 - Attachment is obsolete: true
Attachment #8498822 - Flags: feedback?(kvijayan)
Comment on attachment 8498822 [details] [diff] [review]
WIP

Review of attachment 8498822 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/BaselineInspector.cpp
@@ +424,5 @@
> +    for (ICStub *stub = entry.firstStub(); stub; stub = stub->next()) {
> +        if (stub->kind() == ICStub::Call_StringSplit) {
> +            *stringOut = stub->toCall_StringSplit()->expectedThis();
> +            *objOut = stub->toCall_StringSplit()->templateObject();
> +            break;

Instead of breaking here, you can just return true directly.

I also think it's good to assert here that the the |numOptimizedStubs| on the entry is exactly one, since that's how we coded the behaviour in Baseline.  (If StringSplit stub is attached, it's the only optimized stub attached).

::: js/src/jit/MCallOptimize.cpp
@@ +1208,5 @@
>  IonBuilder::InliningStatus
> +IonBuilder::inlineConstantStringSplit(CallInfo &callInfo)
> +{
> +    JSString *stringThis;
> +    JSObject *templateObject;

Move these down to right before the call to isOptimizableCallStringSplit.

@@ +1210,5 @@
> +{
> +    JSString *stringThis;
> +    JSObject *templateObject;
> +
> +    if (!callInfo.thisArg()->isConstant())

Add another conditional ensuring that the constant |this| is a string.  This will let you use a cheap check to shortcut out of the rest of the logic when the |this| is not a string, instead of making that check later after a bunch of other expensive checks.

@@ +1214,5 @@
> +    if (!callInfo.thisArg()->isConstant())
> +        return InliningStatus_NotInlined;
> +
> +    // Check if exist a template object in stub.
> +    if (inspector->isOptimizableCallStringSplit(pc, &stringThis, &templateObject)) {

This conditional can be reversed, and allow you to put the rest of the logic at a lower tab level, with:

if (!inspector->isOptimizableCallStringSplit(pc, &stringThis, &templateObject))
  return InliningStatus_NotInlined;

... rest of code ...


Also, you do want to confirm here that the string in the stub matches up with the constant string.  The two strings should be pointer-identical, since they're both derived from the bytecode literal string, so a pointer compare should be sufficient to check if they are equal.

The reason you want to double-check this is because you don't know for a fact how that MConstant for the |this| was generated.  Now, it's VERY VERY likely that they are the same, but I don't think I can guarantee that it will be this way 100% of the time.  There may be some rare corner case which allows an attacker to trick the IonBuilder into generating a MConstant string for the |this|, that differs from the baseline-recorded string.  Better be safe and check to be sure.
Attachment #8498822 - Flags: feedback?(kvijayan) → feedback+
(Assignee)

Comment 88

5 years ago
Posted patch bug688219WIP.patch (obsolete) — Splinter Review
I fixed the code.
Thanks.
Attachment #8498822 - Attachment is obsolete: true
Attachment #8499319 - Flags: feedback?(kvijayan)
Comment on attachment 8499319 [details] [diff] [review]
bug688219WIP.patch

Review of attachment 8499319 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  Address the minor re-orderings suggested here.  And then fill in the array-copy logic required for the inlining.

::: js/src/jit/BaselineInspector.cpp
@@ +426,5 @@
> +    if (entry.fallbackStub()->numOptimizedStubs() != 1)
> +        return false;
> +
> +    ICStub *stub = entry.firstStub();
> +    if (stub->kind() == ICStub::Call_StringSplit) {

Invert the conditional here.  The rest of the code has the style:

 if (failureCondition)
    return false;

 ... rest of code ...

Good to stick to that style here too.  So:

  if (stub->kind() != ICStub_CallStringSplit)
    return false;

  ...

::: js/src/jit/MCallOptimize.cpp
@@ +1235,5 @@
> +        return InliningStatus_NotInlined;
> +    }
> +
> +    const js::Value *strval = callInfo.thisArg()->toConstant()->vp();
> +    if (!strval->isString())

move this check up to immediately after the

  (callInfo.thisArg()->type() != MIRType_String)

check.  It's a cheap check whose failure can shortcut the rest of the expensive TI checks.

@@ +1239,5 @@
> +    if (!strval->isString())
> +        return InliningStatus_NotInlined;
> +
> +    JSString *str = strval->toString();
> +    if (str != stringThis)

Move this check to immediately after the |isOptimizableCallStringSplit| check.  This is another cheap check that can can fail early and save us the rest of the expensive type-inference checks.
Comment on attachment 8499319 [details] [diff] [review]
bug688219WIP.patch

Review of attachment 8499319 [details] [diff] [review]:
-----------------------------------------------------------------

Forgot to + the feedback.
Attachment #8499319 - Flags: feedback?(kvijayan) → feedback+
(Assignee)

Comment 91

5 years ago
Posted patch bug688219WIP.patch (obsolete) — Splinter Review
It's working!

micro-bench:
  for (var i = 0; i < 500000; ++i)
      a = "done fail unicorns ponies esperanto".split(" ");

no-patch:
237ms ~

with-patch:
7ms ~

Thanks.
Attachment #8499319 - Attachment is obsolete: true
Attachment #8500007 - Flags: feedback?(kvijayan)

Comment 92

5 years ago
Looks like we've largely settled on "good idea", so updating summary.
Summary: v8 has a String.prototype.split cache: benchmarketing or good idea? → Cache String.prototype.split
Comment on attachment 8500007 [details] [diff] [review]
bug688219WIP.patch

Review of attachment 8500007 [details] [diff] [review]:
-----------------------------------------------------------------

I've made a few comments, but I have a more fundamental question.  Why are you adding a new |templateObjectForJIT| to the Baseline IC stub?  I don't understand why there needs to be a new template object for this.

Could you walk me through the motivations that prompted the more extensive set of changes you made in this patch?  (Not saying that they're wrong, but I don't know why the |templateObjectForJIT| is needed, for example)

::: js/src/jit/MCallOptimize.cpp
@@ +1220,5 @@
> +
> +    // Check if exist a template object in stub.
> +    JSString *stringThis;
> +    JSObject *templateObject;
> +    JSObject *objCopy;

Initialize these pointers to nullptr.

@@ +1222,5 @@
> +    JSString *stringThis;
> +    JSObject *templateObject;
> +    JSObject *objCopy;
> +    if (!inspector->isOptimizableCallStringSplit(pc, &stringThis, &templateObject, &objCopy))
> +        return InliningStatus_NotInlined;

After, this, MOZ_ASSERT(stringThis); since it should be set.

@@ +1225,5 @@
> +    if (!inspector->isOptimizableCallStringSplit(pc, &stringThis, &templateObject, &objCopy))
> +        return InliningStatus_NotInlined;
> +
> +    if (!templateObject)
> +        return InliningStatus_NotInlined;

If |isOptimizableCallStringSplit| returned true, then templateObject must have been filled in, right?  Shouldn't this be a MOZ_ASSERT instead?

@@ +1228,5 @@
> +    if (!templateObject)
> +        return InliningStatus_NotInlined;
> +
> +    if (!objCopy)
> +        return InliningStatus_NotInlined;

Same as above, shouldn't this be a MOZ_ASSERT?
Attachment #8500007 - Flags: feedback?(kvijayan) → feedback+
(Assignee)

Comment 94

5 years ago
(In reply to Kannan Vijayan [:djvj] from comment #93)
> Comment on attachment 8500007 [details] [diff] [review]
> bug688219WIP.patch
> 
> Review of attachment 8500007 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I've made a few comments, but I have a more fundamental question.  Why are
> you adding a new |templateObjectForJIT| to the Baseline IC stub?  I don't
> understand why there needs to be a new template object for this.
> 

Thanks for your reply, 
My first idea was just use |templateObject| in |MNewArray| but it was showing:

> Assertion failure: isTenured(), at /js/src/gc/Heap.h:1159 

I'm not sure if the last patch has a good approach.
So, the main motivation to use |templateObjectForJIT| is to avoid the failure, shown above.
Flags: needinfo?(kvijayan)
Ah, I think the issue here is that the template object is getting created in the nursery, and objects in the nursery can't be used as constants in Ion.

I tried to find a better way of approaching this, but ultimately I think the way you chose was the best approach.  Ok, I'll take another look at this.
Flags: needinfo?(kvijayan)
Actually, the issue is not in baseline at all.  I don't know why MNewArray is a UnaryInstruction.  This makes no sense, because it doesn't assign its constant operand to any regsiter when lowering.  There's no reason that argument needs to be an MConstant.  MNewArray could just store a reference to the ArrayObject directly in one of its fields.

In fact, MNewArrayCopyOnWrite is a nullary instruction.  No reason MNewArray can't also be nullary.

The assert you were running into was because you can't build MConstants out of pointers to non-tenured objects.  But you shouldn't have to.

Lemme look into this more.
(In reply to Victor Carlquist from comment #77)
> Created attachment 8472618 [details] [diff] [review]
> bug-688219-fix.patch
> 
> Hi,
> I fixed the bug! but I needed to create a new variable to store length value.
> 
> Results of Octane:
> before: 14374,4 (Average of 5)
> after:  14490,8 (Average of 5)
> 
> Thanks in advance!

Which tests in octane improve with this patch?
(Assignee)

Comment 98

5 years ago
(In reply to Brian Hackett (:bhackett) from comment #97)
> (In reply to Victor Carlquist from comment #77)
> > Created attachment 8472618 [details] [diff] [review]
> > bug-688219-fix.patch
> Which tests in octane improve with this patch?

Sorry, I don't have the results anymore, I think, this patch improved Raytrace test, I'm not sure,  but I'll test it again :)
(Assignee)

Comment 99

5 years ago
(In reply to Victor Carlquist from comment #98)
> (In reply to Brian Hackett (:bhackett) from comment #97)
> > (In reply to Victor Carlquist from comment #77)
> > > Created attachment 8472618 [details] [diff] [review]
> > > bug-688219-fix.patch
> > Which tests in octane improve with this patch?
> 
> Sorry, I don't have the results anymore, I think, this patch improved
> Raytrace test, I'm not sure,  but I'll test it again :)

Sorry,
I did the test again with more accuracy (average of 15) and this patch does not have impact over the result of Octane...

Here is the results:
Richards		-0.41%
DeltaBlue		-0.78%
Crypto	        	-0.20%
RayTrace		-0.67%
EarleyBoyer	 	+0.39%
RegExp		        +1.34%
Splay		        -1.31%
SplayLatency		-1.19%
NavierStokes		+0.01%
PdfJS	 	        +2.22%
Mandreel		-1.51%
MandreelLatency		+2.52%
Gameboy		        -2.54%
CodeLoad		-0.25%
Box2D		        +0.47%
zlib		        +0.18%
Typescript		+1.25%
----		
Score (version 9)	-0.04%
(In reply to Kannan Vijayan [:djvj] from comment #96)
> Actually, the issue is not in baseline at all.  I don't know why MNewArray
> is a UnaryInstruction.  This makes no sense, because it doesn't assign its
> constant operand to any regsiter when lowering.  There's no reason that
> argument needs to be an MConstant.  MNewArray could just store a reference
> to the ArrayObject directly in one of its fields.
> 
> In fact, MNewArrayCopyOnWrite is a nullary instruction.  No reason MNewArray
> can't also be nullary.
> 
> The assert you were running into was because you can't build MConstants out
> of pointers to non-tenured objects.  But you shouldn't have to.
> 
> Lemme look into this more.

Flagging for needinfo so this doesn't get forgotten.
Status: NEW → ASSIGNED
Flags: needinfo?(kvijayan)
Assignee: nobody → victorcarlquist
(Assignee)

Comment 101

5 years ago
Posted patch WIP (obsolete) — Splinter Review
Hi!

As we change the |nursery| templateObject to |Tenured| templateObject in bug1086530, I think that we should keep it as Tenured because the field |templateObject_| in MNewArrayCopyOnWrite is |AlwaysTenured<>|, so anyway, the actual templateObject for MNewArray should be Tenured, independently if it'll be |MNullaryInstruction| (I'm not familiar with Ion to confirm it yet...), what do you thing about it?

This patch is working, but, as Jan and you suggested, I want to implement CoW, but I have no idea on how to create a CoW templateObject to MNewArrayCopyOnWrite... Could you help me about this?  

This patch have a regression (0m0.038s to 0m0.134s) on test below:

> var testString = "We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence,promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.";
> 
> function test(testString) {
>    return testString.split("e").length - 1;
> }
> 
> for(var i=0; i<1000000; i++) {
>     test(testString);
> }
 
Thanks in advanced :)
Attachment #8500007 - Attachment is obsolete: true
Attachment #8519599 - Flags: feedback?(kvijayan)
Comment on attachment 8519599 [details] [diff] [review]
WIP

Review of attachment 8519599 [details] [diff] [review]:
-----------------------------------------------------------------

Personally, I think it's better to get this optimization in as-is (with fixes) before worrying about the CoW implementation.

I looked into the CoW implementation briefly.  The main wrinkle there is that we have to create a peroper CoW TypeObject for the result of that site's StringSplit, and currently, the CoW type-generation is tied to array literals in the source code.

A good chunk of the logic is going to be shared between this implementation and that on anyway, so I think going ahead and landing something useful, and then improving it later is good.

::: js/src/jit/IonMacroAssembler.cpp
@@ +1103,5 @@
>                       Address(obj, NativeObject::offsetOfElements()));
>          } else if (ntemplate->is<ArrayObject>()) {
>              Register temp = slots;
> +            // Kannan: It's really necessary?
> +            //MOZ_ASSERT(!ntemplate->getDenseInitializedLength());

Not sure if this is OK to disable.  I think it should be fine, but I'd ask h4writer or jandem about it.

::: js/src/jit/MCallOptimize.cpp
@@ +1290,5 @@
> +    // const js::Value *argval = callInfo.getArg(0)->toConstant()->vp();
> +    // if (!argval->isString())
> +    //    return InliningStatus_NotInlined;
> +    // if (argval->toString()->length() == 0)
> +    //    return InliningStatus_NotInlined;

Why are the argVal checks disabled here?

@@ +1312,5 @@
> +    MOZ_ASSERT(templateObject->is<ArrayObject>());
> +
> +    JSString *str = strval->toString();
> +    if (str != stringThis)
> +        return InliningStatus_NotInlined;

Also check if (argval->toString() != str) here and if so, fail inlining.

@@ +1324,5 @@
> +    if (!key.maybeTypes())
> +        return InliningStatus_NotInlined;
> +
> +    if (!key.maybeTypes()->hasType(types::Type::StringType())) {
> +        key.freeze(constraints());

I'm not sure if this freeze is necessary.  If we don't see a StringType in the typeset, we choose not to optimize.  If the typeset later changes, it doesn't invalidate that choice.  I'd doublecheck with bhackett, but this should be removable.

@@ +1333,5 @@
> +    if (templateObject->getDenseInitializedLength() != initLength)
> +        return InliningStatus_NotInlined;
> +
> +    for (uint32_t i = 0; i < initLength; i++) {
> +        MConstant *value = MConstant::New(alloc(), templateObject->getDenseElement(i), constraints());

You're generating constant |values| here and then throwing them away.  Seems like a waste, since MDefinitions (of which MConstant is a subtype) are pretty large.  I think you should accumulate these in a vector, and use them later (since you need them anyway).

@@ +1335,5 @@
> +
> +    for (uint32_t i = 0; i < initLength; i++) {
> +        MConstant *value = MConstant::New(alloc(), templateObject->getDenseElement(i), constraints());
> +        if (!TypeSetIncludes(key.maybeTypes(), value->type(), value->resultTypeSet())) {
> +            key.freeze(constraints());

See comment above on freeze.
Attachment #8519599 - Flags: feedback?(kvijayan) → feedback+
(Assignee)

Comment 103

5 years ago
(In reply to Kannan Vijayan [:djvj] from comment #102)
> Comment on attachment 8519599 [details] [diff] [review]
> WIP
> 
> Review of attachment 8519599 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Personally, I think it's better to get this optimization in as-is (with
> fixes) before worrying about the CoW implementation.

Ok, I'll fixed what you spotted ;)
 
> ::: js/src/jit/IonMacroAssembler.cpp
> @@ +1103,5 @@
> >                       Address(obj, NativeObject::offsetOfElements()));
> >          } else if (ntemplate->is<ArrayObject>()) {
> >              Register temp = slots;
> > +            // Kannan: It's really necessary?
> > +            //MOZ_ASSERT(!ntemplate->getDenseInitializedLength());
> 
> Not sure if this is OK to disable.  I think it should be fine, but I'd ask
> h4writer or jandem about it.

The patch is failing on this assert because the templateObject is not empty (so is ntemplate->getDenseInitializedLength() > 0), if the disabling is wrong, I'll have to find a way to correct my patch to avoid that it falls in this.

Thanks for your reply!
(Assignee)

Comment 104

5 years ago
Posted patch bug688219.patch (obsolete) — Splinter Review
I made the changes.

Thanks again (:
Attachment #8519599 - Attachment is obsolete: true
Attachment #8521078 - Flags: feedback?(kvijayan)
Comment on attachment 8521078 [details] [diff] [review]
bug688219.patch

Review of attachment 8521078 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MCallOptimize.cpp
@@ +1280,5 @@
> +    if (!callInfo.thisArg()->isConstant())
> +        return InliningStatus_NotInlined;
> +
> +    if (callInfo.thisArg()->type() != MIRType_String)
> +        return InliningStatus_NotInlined;

Since you're checking the types of constant values directly later on, this conditional can be removed, and instead have an assert after the |isString()| checks below.

@@ +1291,5 @@
> +        return InliningStatus_NotInlined;
> +
> +    const js::Value *strval = callInfo.thisArg()->toConstant()->vp();
> +    if (!strval->isString())
> +        return InliningStatus_NotInlined;

As noted above.  After this, you can do

MOZ_ASSERT(callInfo.getArg(0)->type() == MIRType_String);
MOZ_ASSERT(callInfo.thisArg()->type() == MIRType_String);

@@ +1329,5 @@
> +    uint32_t initLength = templateObject->as<ArrayObject>().length();
> +    if (templateObject->getDenseInitializedLength() != initLength)
> +        return InliningStatus_NotInlined;
> +
> +    Vector<MConstant *, 0, SystemAllocPolicy> vec;

Name this vector something more meaningful, like |arrayValues|.

@@ +1333,5 @@
> +    Vector<MConstant *, 0, SystemAllocPolicy> vec;
> +    for (uint32_t i = 0; i < initLength; i++) {
> +        MConstant *value = MConstant::New(alloc(), templateObject->getDenseElement(i), constraints());
> +        if (!TypeSetIncludes(key.maybeTypes(), value->type(), value->resultTypeSet())) {
> +            vec.clearAndFree();

This is not necessary.  The vector will release it's heap allocations during destruction on the return below.

@@ +1336,5 @@
> +        if (!TypeSetIncludes(key.maybeTypes(), value->type(), value->resultTypeSet())) {
> +            vec.clearAndFree();
> +            return InliningStatus_NotInlined;
> +        }
> +        vec.append(value);

Vector append() can fail.  Return InliningStatus_Error if it does.

@@ +1345,5 @@
> +            getInlineReturnTypeSet()->convertDoubleElements(constraints());
> +    if (conversion == types::TemporaryTypeSet::AlwaysConvertToDoubles)
> +        templateObject->setShouldConvertDoubleElements();
> +    else
> +        templateObject->clearShouldConvertDoubleElements();

I don't even know if this whole code block concerning conversion-to-double is necessary.  We know that we're going to be returning an array of strings.  The existence or not of a convert-to-double flag on the object is irrelevant.

I think this whole section can be replaced with a simpler:

  if (conversion == types::TemporaryTypeSet::AlwaysConvertToDoubles)
    return InliningStatus_NotInlined;

@@ +1366,5 @@
> +    // jsop_initelem_array is doing because we do not expect to bailout
> +    // because the memory is supposed to be allocated by now.
> +    MConstant *id = nullptr;
> +    for (uint32_t i = 0; i < initLength; i++) {
> +       id = MConstant::New(alloc(), Int32Value(i));

The |id| var can be scoped inside the loop.

@@ +1374,5 @@
> +       if (conversion == types::TemporaryTypeSet::AlwaysConvertToDoubles) {
> +           MInstruction *valueDouble = MToDouble::New(alloc(), value);
> +           current->add(valueDouble);
> +           value = valueDouble->toConstant();
> +       } else

With the changes above, this conditional can be removed entirely, and replaced with |current->add(value);|.

@@ +1390,5 @@
> +    }
> +
> +    // Update the length.
> +    MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id);
> +    current->add(length);

This setLength will get as input the last |id| stored in the loop, which will be |initLength - 1|, which is wrong.  You need a new MConstant here:

MConstant *length = MConstant::New(alloc(), Int32Value(initLength));
current->add(length);

MSetInitializedLength *setLength = MSetInitializedLength::New(...);
current->add(setLength);

@@ +1392,5 @@
> +    // Update the length.
> +    MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id);
> +    current->add(length);
> +
> +    vec.clearAndFree();

This is not necessary.  The destructor will do it automatically on all exit paths.
Attachment #8521078 - Flags: feedback?(kvijayan) → feedback+
(Assignee)

Comment 106

5 years ago
(In reply to Kannan Vijayan [:djvj] from comment #105)
 
> @@ +1390,5 @@
> > +    }
> > +
> > +    // Update the length.
> > +    MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id);
> > +    current->add(length);
> 
> This setLength will get as input the last |id| stored in the loop, which
> will be |initLength - 1|, which is wrong.  You need a new MConstant here:
> 
> MConstant *length = MConstant::New(alloc(), Int32Value(initLength));
> current->add(length);
> 
> MSetInitializedLength *setLength = MSetInitializedLength::New(...);
> current->add(setLength);
> 

Hi Kannan, thanks for review!

We have a little problem.
When I set this with Int32Value(initLength), I got the following result:

> done,fail,unicorns,ponies,esperanto9.704187067161276e-101,

but the right result would be:

> done,fail,unicorns,ponies,esperanto

It occur because the |SetInitializedLength| is adding '1' over length, so it's overflowing...

(CodeGenerator.cpp)
>bool
>CodeGenerator::visitSetInitializedLength(LSetInitializedLength *lir)
>{
>    Address initLength(ToRegister(lir->elements()), ObjectElements::offsetOfInitializedLength());
>    Int32Key index = ToInt32Key(lir->index());
>    masm.bumpKey(&index, 1);
>    masm.storeKey(index, initLength);
>    // Restore register value if it is used/captured after.
>    masm.bumpKey(&index, -1);
>}

So, What should I do?
(In reply to Victor Carlquist from comment #106)
> (In reply to Kannan Vijayan [:djvj] from comment #105)
>  
> Hi Kannan, thanks for review!
> 
> We have a little problem.
> When I set this with Int32Value(initLength), I got the following result:
> 
> > done,fail,unicorns,ponies,esperanto9.704187067161276e-101,
> 
> but the right result would be:
> 
> > done,fail,unicorns,ponies,esperanto
> 
> It occur because the |SetInitializedLength| is adding '1' over length, so
> it's overflowing...
> 
> (CodeGenerator.cpp)
> >bool
> >CodeGenerator::visitSetInitializedLength(LSetInitializedLength *lir)
> >{
> >    Address initLength(ToRegister(lir->elements()), ObjectElements::offsetOfInitializedLength());
> >    Int32Key index = ToInt32Key(lir->index());
> >    masm.bumpKey(&index, 1);
> >    masm.storeKey(index, initLength);
> >    // Restore register value if it is used/captured after.
> >    masm.bumpKey(&index, -1);
> >}
> 
> So, What should I do?

Well that's super weird that it would do an incr/decr around the setInitializedLength.  It seems like SetInitializedLength is always called with the index of the last element set, and it internally increments it if necessary.  The way you were doing it before is correct, then.

Good find.
Flags: needinfo?(kvijayan)
(Assignee)

Comment 108

5 years ago
Posted patch bug688219.patch (obsolete) — Splinter Review
I think that this patch is OK.
Attachment #8521078 - Attachment is obsolete: true
Attachment #8523061 - Flags: review?(kvijayan)
Comment on attachment 8523061 [details] [diff] [review]
bug688219.patch

Review of attachment 8523061 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MCallOptimize.cpp
@@ +1380,5 @@
> +       if (ins->initialHeap() == gc::TenuredHeap)
> +           current->add(MPostWriteBarrier::New(alloc(), ins, value));
> +    }
> +    // Update the length.
> +    MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id);

I'm wondering if we actually need this.  MNewArray with the template object _should_ correctly set the intializedLength and the length.  In this case, this setLength is redundant.  Can you test without this line and see if it still works?
(Assignee)

Comment 110

5 years ago
(In reply to Kannan Vijayan [:djvj] from comment #109)
> Comment on attachment 8523061 [details] [diff] [review]
> bug688219.patch
> 
> Review of attachment 8523061 [details] [diff] [review]:
> -----------------------------------------------------------------

You are right! it works without that.
(Assignee)

Comment 111

5 years ago
Posted patch bug688219.patch (obsolete) — Splinter Review
Patch without |MSetInitializedLength| instruction.
Attachment #8523061 - Attachment is obsolete: true
Attachment #8523061 - Flags: review?(kvijayan)
Attachment #8523334 - Flags: review?(kvijayan)
(Assignee)

Comment 112

4 years ago
Posted patch patch rebased (obsolete) — Splinter Review
Attachment #8523334 - Attachment is obsolete: true
Attachment #8523334 - Flags: review?(kvijayan)
Attachment #8532964 - Flags: review?(kvijayan)
Comment on attachment 8532964 [details] [diff] [review]
patch rebased

Review of attachment 8532964 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MCallOptimize.cpp
@@ +1347,5 @@
> +        if (!TypeSetIncludes(key.maybeTypes(), value->type(), value->resultTypeSet()))
> +            return InliningStatus_NotInlined;
> +
> +        if (!arrayValues.append(value))
> +            return InliningStatus_NotInlined;

This should return InliningStatus_Error instead, as it's an OOM
Attachment #8532964 - Flags: review?(kvijayan) → review+
(Assignee)

Comment 114

4 years ago
Posted patch bug688219.patch (obsolete) — Splinter Review
I fixed that. (:

Thanks!
Attachment #8532964 - Attachment is obsolete: true
Attachment #8533738 - Flags: review?(kvijayan)
Sorry, I've been super busy and just got to this.  Reviewing now.
(Assignee)

Comment 116

4 years ago
No problem Kannan ;)
Comment on attachment 8533738 [details] [diff] [review]
bug688219.patch

Review of attachment 8533738 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry again.  This is great.  Everything looks pretty tight.  Have you benched the final result with the latest patch?
Attachment #8533738 - Flags: review?(kvijayan) → review+
(Assignee)

Comment 118

4 years ago
I'll bench and push it in try server.
Thanks again!
Hey Victor,

I rebased your patch (there were some changes to the names of various TI-related structs.  Types became ObjectGroups, etc.).  And pushed to try here:

 https://treeherder.mozilla.org/#/jobs?repo=try&revision=0edec69d7759

You can crib the rebased patch from there if you don't feel like doing it again (it's a bit of a pain in the butt to discover the new names).

I'm somewhat excited about benching this so I'm going to run the benches locally too.
Wow, just benched this on linux x86-64.. with a microbenchmark.  Results are sweet:

function foo() {
    var count = 0;
    for (var i = 0; i < 10000000; i++) {
        var arr = "zing,bing,bang".split(",");
        var arr2 = "foo|bar|boom".split("|");
        for (var j = 0; j < arr.length; j++)
            count += j;
        for (var j = 0; j < arr2.length; j++)
            count += j;
    }
    return count;
}

function main() {
    var d1 = new Date();
    var result = foo();
    var d2 = new Date();
    print("Time: " + (d2 - d1) + "  result=" + result);
}

main();

Reference build (without your patch):
Time: 3818  result=60000000
Time: 3786  result=60000000
Time: 4047  result=60000000
Time: 3796  result=60000000
Time: 3843  result=60000000


Optimized build (with your patch):
Time: 117  result=60000000
Time: 79  result=60000000
Time: 77  result=60000000
Time: 81  result=60000000
Time: 77  result=60000000

Averaging those times out:
3858ms on reference
86.2ms on optimized

Speedup of more than 40x on the microbench.  Nice :)
(Assignee)

Comment 121

4 years ago
Posted patch Patch (obsolete) — Splinter Review
I fixed that issue :)
 
I added the line:
 > MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id);

The RArrayState wasn't getting the right length value in Recover.

Thanks.
Attachment #8533738 - Attachment is obsolete: true
Attachment #8563578 - Flags: review?(kvijayan)
(Assignee)

Comment 122

4 years ago
(In reply to Kannan Vijayan [:djvj] from comment #120)

Awesome! It's very fast.
You helped me a lot, thank you!
Comment on attachment 8563578 [details] [diff] [review]
Patch

Review of attachment 8563578 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  If you can get this green on try, you have my blessing to land.  The last few changes have all been trivial.  So feel free to post a green 'try' link and then land this.

Good job on getting this done, and sorry again about the review delays on my end.

::: js/src/jit/MCallOptimize.cpp
@@ +1398,5 @@
> +    MOZ_ASSERT(templateObject);
> +    MOZ_ASSERT(templateObject->is<ArrayObject>());
> +
> +    JSString *str = strval->toString();
> +    if (str != stringThis)

The |str| var doesn't seem necessary here.

  if (strval->toString() != stringThis)

should be sufficient.

@@ +1402,5 @@
> +    if (str != stringThis)
> +        return InliningStatus_NotInlined;
> +
> +    str = argval->toString();
> +    if (str != stringArg)

likewise, here:

  if (argval->toString() != stringArg)

should be sufficient.
Attachment #8563578 - Flags: review?(kvijayan) → review+
(Assignee)

Comment 124

4 years ago
Posted patch PatchSplinter Review
I made the changes that Kannan suggested.
Attachment #8563578 - Attachment is obsolete: true
Attachment #8564090 - Flags: review+
(Assignee)

Comment 125

4 years ago
Comment on attachment 8564090 [details] [diff] [review]
Patch

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ac760a841d7e

Try is green, so I'm setting the checkin flag. 
Thanks!
Attachment #8564090 - Flags: checkin?

Updated

4 years ago
Attachment #8564090 - Flags: checkin?

Updated

4 years ago
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/a94fb6a46420
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Depends on: 1134074
You need to log in before you can comment on or make changes to this bug.