Closed Bug 688219 Opened 13 years ago Closed 10 years ago

Cache String.prototype.split

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox38 --- fixed

People

(Reporter: luke, Assigned: victorcarlquist)

References

Details

Attachments

(3 files, 24 obsolete files)

4.34 KB, patch
luke
: feedback+
Details | Diff | Splinter Review
4.65 KB, text/plain
Details
10.95 KB, patch
victorcarlquist
: review+
Details | Diff | Splinter Review
On this micro-bench:

for (var i = 0; i < 500000; ++i)
    a = "done fail unicorns ponies esperanto".split(" ");

v8 is more than 2x faster. Super-optimized string matching? Fast array creation? Generational GC ftw? No, a cache that maps (string, pattern) to results. The question asked by this bug is whether this was added to cheat on some unrealistic benchmark or whether this is optimizing real-world usage. Seems simple enough to instrument a browser and measure. To wit, on this loop:

for (var i = 0; i < 500000; ++i)
    a = ("done fail unicorns ponies esperanto" + i).split(" ");

we are 25% faster in the shell on my machine.

"Yo cash ain't nothin' but trash" -- Fats Waller (via brendan)
"Ain't misbehavin'" -- v8
From http://code.google.com/p/v8/source/detail?r=9164: "Optimize the common obfuscator pattern where ["foo","bar","baz"] gets converted to "foo,bar,baz".split(","). If the inputs are symbols we cache the result and make the substrings into symbols." Obfuscators. I wonder if we should be benchmarking more of them.
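The obfuscator pattern v8 targets can be shown with a small plain-JS sketch (the literals here are illustrative, not taken from any real obfuscator output):

```javascript
// Obfuscators often replace an array literal with a split over a
// joined string, so the same constant split runs over and over:
function parts() {
  // original source was effectively: return ["foo", "bar", "baz"];
  return "foo,bar,baz".split(",");
}

// Every call re-splits the same two constant strings, which is why a
// (string, separator) -> result cache pays off for this pattern.
const a = parts(); // → ["foo", "bar", "baz"]
```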
Alternatively, is it reasonable to constant-fold this pattern? Or do something like:

if (String.prototype.split == WHAT_WE_EXPECT)
    return <constant-folded-result>;
else
    return <call-it>;
Totally. TI can tell us 'split' is the real split, so in the best case we wouldn't even need a test; we could just emit code to build the array. Doing this from the JIT (IonMonkey, preferably) would be easiest. When IonMonkey is stable and on mozilla-central, perhaps this would be a [good first bug]?
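The WHAT_WE_EXPECT guard above can be sketched at script level (all names here are hypothetical; a real engine would perform this check in the JIT, not in JS):

```javascript
// Hypothetical constant-folding guard: if String.prototype.split is
// still the built-in, a compiler could replace the whole call with a
// precomputed array; otherwise it must fall back to a normal call.
const originalSplit = String.prototype.split;
const folded = "done fail unicorns ponies esperanto".split(" "); // computed once

function splitFast(s, sep) {
  if (String.prototype.split === originalSplit &&
      s === "done fail unicorns ponies esperanto" && sep === " ") {
    // Must hand back a fresh copy: callers may mutate the result.
    return folded.slice();
  }
  return s.split(sep); // generic path
}
```

Note the `.slice()`: even the folded path has to allocate, since each split call is specified to return a new array.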
Frankly I don't think it's a huge win. I have mainly seen it in js1k entries, where it annoyed me and it was an easy change so I did it. I don't know of a real benchmark that uses it, but I did fall over http://jsperf.com/occurence-counting-2 which I guess is what caused this bug to be filed.
(In reply to Erik Corry from comment #5)
> Frankly I don't think it's a huge win.
http://jsperf.com/character-counting/3 - Firefox is ahead in every test ’cept for the .split() case.
That hardly qualifies as a real benchmark.
Also, the split version gets the wrong answer for example on the input string ".a.b.".
Luke, I'd like to take this on as a side project to get to know the js engine. Can you provide some tips on what files to look into and a general starting direction?
Assignee: general → jaws
Status: NEW → ASSIGNED
In the spirit of helping you get started: :) String.prototype.split is implemented in jsstr.cpp as js::str_split. We already do caching for other things like eval or Math.sin. MathCache is, from my point of view, the cleanest implementation; you can find it mostly in jsmath.h, and it's used in jsmath.cpp. The EvalCache uses strings for lookups and needs to guard on more conditions, so it might be more related to this case. You want to look at EvalCacheLookup, EvalCacheHashPolicy, and EvalCache in jscntxt.h. The EvalCache needs a special class to handle function construction and destruction and is thus implemented in Eval.cpp as EvalScriptGuard. You probably won't need such complexity. Of course, in the end you should make sure that we are as good as v8 and don't regress anything. (v8 seems to do this only for atoms/symbols, so memory usage shouldn't increase because we never garbage collect atoms anyway.)
(In reply to Tom Schuster [:evilpie] from comment #10)
> (v8 seems to only to this of Atoms/symbols, so memory usage shouldn't
> increase because we never garbage collect atoms anyway.)
Atoms do indeed get GC'd. I'd consider purging the cache with the other caches that point to GC things in PurgeRuntime (js/src/jsgc.cpp).
(In reply to Erik Corry from comment #8)
> Also, the split version gets the wrong answer for example on the input
> string ".a.b.".
A bit OT, but as far as I can tell it gets the answer wrong for every string, as it should be s.split(".").length - 1 - a change that is reflected in later revisions of the jsperf.
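The off-by-one is easy to see concretely: split produces one more piece than there are separators (including empty pieces at the ends), so occurrence counting needs length - 1:

```javascript
// ".a.b." contains three "." separators.
const s = ".a.b.";
const pieces = s.split(".");  // ["", "a", "b", ""]

// Using pieces.length (4) over-counts by one; the separator count is
// pieces.length - 1 (3), which later jsperf revisions used.
const occurrences = pieces.length - 1;
```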
Assignee: jaws → general
Status: ASSIGNED → NEW
Someone just pointed out that jQuery uses "x".split("y") expressions which have constant values. It'd be great to measure this on a jQuery benchmark or on some heavy jQuery sites.
Blocks: 917839
I would like to work on this bug. As this would be my first bug I would like additional guidance.
A good starting point would be to instrument a browser to measure how often this pattern is used and how much time we could save from this optimization. The implementation of String.prototype.split is js::str_split in js/src/jsstr.cpp. String literals are "atomized" (allocated once), so a good approximation of "literal".split("literal") would be to test whether the 'this' and separator arguments of str_split are atoms (JSString::isAtom()). It'd also be good to accumulate the total time spent in str_split for the cases where we have atom.split(atom). After getting these measurements working, I'd browse around some popular sites (particularly those using jQuery).
Hi Luke,
(In reply to Luke Wagner [:luke] from comment #15)
> A good starting point would be to instrument a browser to measure how often
> this pattern is used and how much time we could save from this optimization.

I've cached the last split executed. I'll write code to measure how often this pattern is used:

JSObject *
js::str_split_string(JSContext *cx, HandleTypeObject type, HandleString str, HandleString sep)
{
    static Rooted<JSLinearString*> cachedString;
    static RootedObject cachedAobj;

    Rooted<JSLinearString*> linearStr(cx, str->ensureLinear(cx));
    if (!linearStr)
        return nullptr;

    Rooted<JSLinearString*> linearSep(cx, sep->ensureLinear(cx));
    if (!linearSep)
        return nullptr;

    RootedObject aobj(cx);
    if (cachedString != linearStr) {
        cachedString = linearStr;

        uint32_t limit = UINT32_MAX;

        if (linearSep->length() == 0) {
            aobj = CharSplitHelper(cx, linearStr, limit);
        } else {
            SplitStringMatcher matcher(cx, linearSep);
            aobj = SplitHelper(cx, linearStr, limit, matcher, type);
        }

        if (!aobj)
            return nullptr;

        aobj->setType(type);
        cachedAobj = aobj;
    } else {
        aobj = cachedAobj;
    }
    return aobj;
}

Tested with:

var start = new Date().getTime();
for (var i = 0; i < 500000; ++i)
    a = ("done fail unicorns ponies esperanto").split(" ");
var end = new Date().getTime();
var time = end - start;

before: 9064
after: 6338

So, am I doing it right?
(In reply to Victor Carlquist from comment #16)
> So, am I doing it right?
Sorry, this code is wrong.
Hi! It looks like you're off to a good start. One issue is that, in general, we can't keep anything in static variables, especially not Rooted (they need to be on the stack, since they are kept in a LIFO linked list). Instead, you'd probably want to keep some weak JSString pointer in the JSRuntime that gets cleared on GC (at least for now, maybe there is a better way).
Also, in the future, you can post 'work in progress' patches as attachments to the bug, which make them a bit easier to read.
The ideal place to do this is in the JITs. The Baseline IC can easily generate an optimized stub for the |literalString.split(literalString)| case, and can hold the template array result object within the stub. Note that the |split| optimization will always need to allocate a new array, not just return the same one, since the result of |split| needs to be a newly allocated array. Ion can handle these cases as well. TI will tell us that the |.split| is the canonical String.prototype.split, and Ion can generate inline MNewArray and MStoreElement(Constant) instructions for the split call.
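The fresh-array requirement above is observable from script, which is why a stub can only hold a template to copy, never the result itself. A minimal demonstration:

```javascript
// Two identical split calls must yield independent arrays: mutating
// one must not be visible through the other.
const a = "x,y".split(",");
a.push("z");                  // a is now ["x", "y", "z"]

const b = "x,y".split(",");   // must be a fresh ["x", "y"]
// If the engine had returned a cached array here, b.length would be 3.
```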
In fact, Ion already does some optimizations for |String#split|. See MCallOptimize.cpp, IonBuilder::inlineStringSplit. Currently the "optimization" is only to carry the TypeObject for the resulting array into the split function.

Optimizing this can go the following route:

1. Baseline generates optimized JSOP_CALL stubs for constantString.split(constantString).
2. Baseline's optimized stub contains the template array that is copied and returned.
3. Ion modifies IonBuilder::inlineStringSplit to check for the optimized case (constant string inputs), and if so, inspects the baseline ICs, pulls out the template array object from the baseline stub, and generates inline MIR instructions for constructing and populating the returned array.

Sticking a global cache for the result of the last split string seems unappealing compared to the general approach above.
Also note bug 977966, which might be an interesting next task for whoever ends up implementing this.
(In reply to Kannan Vijayan [:djvj] from comment #21)
> Sticking a global cache for the result of the last split string seems
> unappealing compared to the general approach above.
Ok, I'll follow this route, thanks Kannan!
Hi! I have some questions: how do I check whether a parameter is a constant string? And where can I find documentation about CallInfo members? =)
Attached file cached all split () (work in progress) (obsolete) —
I cached all splits in this patch, but in the future I want to cache only constant strings.
Attached file cached all split () (work in progress) (obsolete) —
Sorry, this is the right code. I cached all splits in this patch, but in the future I want to cache only constant strings.
During Ion compilation, to find out if an MInstruction is a constant, you do the following:

    if (ins->isConstant()) {
        Value val = ins->toConstant()->value();
        ...
    }

However, I would advise starting with a simple Baseline optimized stub for JSOP_CALL that implements a constant string split. The baseline stub can be written to hold the template array object that will be copied and returned. Once that's done, we can turn to enabling Ion optimization. When ion-optimizing the call to str_split, Ion can inspect the baseline stub chain for the call operation to see if there are any StringSplit stubs with matching inputs. If there are (likely), it can pull the template array object from the stub, pull out each individual element, and directly generate inline instructions for NewArray + ElementStore.
Hi! I copied and modified MathCache to support JSString and JSObject (StrCache), but I don't know if it is correct. Also, I don't understand how inlineStringSplit works; can you help me?
Attachment #8439500 - Attachment is obsolete: true
Attachment #8439505 - Attachment is obsolete: true
Attachment #8440465 - Flags: feedback?(kvijayan)
Attached patch work in progress (no Jit) (obsolete) — Splinter Review
This patch is cleaner than the other patch. Also, I ran the SunSpider benchmark, but the results weren't satisfactory. Maybe with the JIT it'll become faster.
Attachment #8440465 - Attachment is obsolete: true
Attachment #8440465 - Flags: feedback?(kvijayan)
Attachment #8441739 - Flags: feedback+
Attached file SunSpider benchmark (obsolete) —
Comment on attachment 8441739 [details] [diff] [review] work in progress (no Jit) Setting the feedback flag to ?kannan. Victor, the way these flags work is that you request someone's feedback (or review or a number of other things) by setting the flag to "?" and entering their name in the field to the right of the flag. That person then gives feedback and sets the flag to "+" or "-" or resets it entirely, which usually means something like "I like the approach, but there are enough issues here that I'm not comfortable giving a +". Adding the right person to the request is important because bugzilla sends out a special email notification that will stand out in the flood of bugzilla emails most developers get on a daily basis.
Attachment #8441739 - Flags: feedback+ → feedback?(kvijayan)
Great that you're working on it, but we should not forget to do the measurements Luke suggested in comment 15. This optimization is a bit of a hack, and we should have some (measurable) justification for adding it...
(In reply to Till Schneidereit [:till] from comment #31)
> Victor, the way these flags work is that you request someone's feedback (or
> review or a number of other things) by setting the flag to "?" and entering
> their name in the field to the right of the flag.
Ok! Thanks, Till.
I want to know whether I modified the correct files. If not, which files should I modify? Thanks!
(In reply to Luke Wagner [:luke] from comment #15)
> A good starting point would be to instrument a browser to measure how often
> this pattern is used and how much time we could save from this optimization.
Hi Luke. To measure the time spent in str_split, can I use the clock() function? And how can I print the result?
(In reply to Victor Carlquist from comment #35) I'd use PRMJ_Now() to measure 'before' and 'after'; you can then convert this into milliseconds via double(after - before) / PRMJ_USEC_PER_MSEC. For ad hoc measurements, you can accumulate the total time in some global variable and printf the final result in, say, JS_DestroyRuntime (so, when the browser closes), or in some obscure JS function you can trigger from the JS console (personally, I like math_toSource in jsmath.cpp).
(In reply to Luke Wagner [:luke] from comment #36)
Thanks Luke! =)
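The measurement pattern being discussed (accumulate before/after deltas for total and optimizable time) can be sketched as a script-level analog. This is purely illustrative: the real instrumentation is C++ inside str_split, and script cannot see atoms, so a string-typed separator stands in for the atom.split(atom) test:

```javascript
// Accumulate total time in split and the share spent in calls we
// could plausibly serve from a cache (string separator as a proxy).
let totalTime = 0, savedTime = 0;
const realSplit = String.prototype.split;

String.prototype.split = function (sep, limit) {
  const before = Date.now();
  const result = realSplit.call(this, sep, limit);
  const delta = Date.now() - before;
  totalTime += delta;
  if (typeof sep === "string")  // stand-in for the isAtom() check
    savedTime += delta;
  return result;
};
```

Printing savedTime / totalTime after a browsing session gives a gain estimate of the same shape as the per-site numbers reported in the thread.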
Attached file Measure - possible future gain (obsolete) —
Attachment #8442939 - Flags: feedback?(luke)
Attached patch Measure - future possible gain (obsolete) — Splinter Review
Attachment #8442939 - Attachment is obsolete: true
Attachment #8442939 - Flags: feedback?(luke)
Attachment #8442942 - Flags: feedback?(luke)
Comment on attachment 8442942 [details] [diff] [review]
Measure - future possible gain

Review of attachment 8442942 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +3755,5 @@
> /* ES5 15.5.4.14 */
> bool
> js::str_split(JSContext *cx, unsigned argc, Value *vp)
> {
> + double start, endAtom;

You probably want to use PRMJ_Now(), since JS_Now() is a JS_PUBLIC_API function and thus incurs some calling overhead. Second, you might want to use int64_t, since that is the return type of PRMJ_Now() and it avoids the potential for PRMJ_Now()'s result falling outside the 2^53 integer range of doubles.

@@ +3771,5 @@
> return false;
> AddTypePropertyId(cx, type, JSID_VOID, Type::StringType());
>
> + if(str.get()->isAtom() && args.get(0).toString()->isAtom())
> + endAtom = JS_Now();

This will assert/crash in the unlikely case that args.get(0) is not a string. I think you need the additional condition args.get(0).isString().

@@ +3845,5 @@
> return false;
>
> + if(str.get()->isAtom() && args.get(0).toString()->isAtom()){
> + milli_Optimize += double(endAtom - start) / PRMJ_USEC_PER_MSEC;
> + milli_No_Optimize += double(JS_Now() - start) / PRMJ_USEC_PER_MSEC;

I was thinking something more like: take the time before/after the portion of split that you can optimize, and then:

    if (optimizable)
        savedTime += delta;
    totalTime += delta;
Attachment #8442942 - Flags: feedback?(luke)
Attachment #8442942 - Attachment is obsolete: true
Attachment #8443087 - Flags: feedback?(luke)
Comment on attachment 8443087 [details] [diff] [review] Measure - future possible gain Looks good!
Attachment #8443087 - Flags: feedback?(luke) → feedback+
(In reply to Luke Wagner [:luke] from comment #42)
> Looks good!
Great! Luke, what is the next step?
I'd build a browser and browse around a bunch of sites (esp. popular sites) and see how often the optimization applies. You might want to move the printf() from JS_DestroyRuntime into something you can call from the JS console so you can print after each site. Grepping http://code.jquery.com/jquery-2.1.1.js for "split", I do see several instances of "literal".split("literal"), so that's encouraging.
Hi! I accessed some popular sites, and I got these results:

bbc.co.uk      Saved Time 0.983 ms   Total Time 3.123 ms   Gain 31.4761%
facebook.com   Saved Time 2.261 ms   Total Time 17.173 ms  Gain 13.166%
youtube.com    Saved Time 5.837 ms   Total Time 34.406 ms  Gain 16.9651%
yahoo.com      Saved Time 3.295 ms   Total Time 19.39 ms   Gain 16.9933%
wikipedia.org  Saved Time 0.081 ms   Total Time 1.74 ms    Gain 4.65517%
amazon.com     Saved Time 4.791 ms   Total Time 17.041 ms  Gain 28.1145%
cnn.com        Saved Time 0.947 ms   Total Time 7.342 ms   Gain 12.8984%
netflix.com    Saved Time 23.167 ms  Total Time 65.012 ms  Gain 35.635%
That's pretty encouraging! I realized that the measurement is a bit over-simplified, though: we can't optimize *every* split of an atom by an atom. I guess the next step, then, is to measure how much we'd win with a real cache (cleared on GC, btw). It'd be good to measure how much a general dynamically-sized cache wins over a small, fixed-size cache, which, it looks like, is what v8 has. An additional statistic I'd be interested in: what is the distribution of sizes of these atom.split(atom) cases? That is, are pages doing a few big splits or many small splits? If the wins are mostly big splits, then the JITs won't really benefit from this cache and we could probably get all the win by just changing the C++ str_split. It looks like v8 does its caching in C++.
So, how can we measure the saved time with a real cache? I would create three categories of string size:

small:  size < 400
medium: 400 <= size < 800
big:    size >= 800

I'll count the number of strings in each category. What do you think?
Hi, I implemented the additional statistic, following these rules:

small:  size < 400
medium: 400 <= size < 800
big:    size >= 800

bbc.co.uk      Saved Time 14.267 ms  Total Time 26.472 ms  Gain 53.8947%  Small: 7214  Medium: 0  Big: 0
facebook.com   Saved Time 1.153 ms   Total Time 13.722 ms  Gain 8.40257%  Small: 598   Medium: 0  Big: 0
youtube.com    Saved Time 5.786 ms   Total Time 36.671 ms  Gain 15.7781%  Small: 2769  Medium: 2  Big: 2
yahoo.com      Saved Time 1.454 ms   Total Time 14.822 ms  Gain 9.80974%  Small: 893   Medium: 0  Big: 0
wikipedia.org  Saved Time 0.92 ms    Total Time 3.63 ms    Gain 25.3444%  Small: 320   Medium: 0  Big: 0
amazon.com     Saved Time 1.115 ms   Total Time 13.895 ms  Gain 8.02447%  Small: 760   Medium: 0  Big: 0
cnn.com        Saved Time 1.003 ms   Total Time 8.864 ms   Gain 11.3154%  Small: 816   Medium: 0  Big: 0
netflix.com    Saved Time 21.218 ms  Total Time 58.201 ms  Gain 36.4564%  Small: 7018  Medium: 0  Big: 0
Well, this is all pretty encouraging; seems like this optimization will provide real improvements in practice. What Kannan outlined in comment 27 sounds like a good strategy. Ideally, we'd get copy-on-write arrays (bug 934450) so we wouldn't have to actually clone a new array each time (that's what v8 does), but since we are dealing with many small str.split's here, I expect we'll still get a pretty big win (in fact, for really small splits, we may be faster than v8 since we're not calling out to C++, doing a hash-table lookup, etc).
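For reference, the kind of small, fixed-size (string, separator) cache being compared against can be sketched in JS. The cache size, key scheme, and eviction policy here are illustrative assumptions, not v8's actual implementation:

```javascript
// Tiny fixed-size split cache: maps "str\u0000sep" to a split result
// and hands out a fresh copy on every hit, since each split call must
// return a newly allocated array. (The \u0000 key separator assumes
// inputs don't contain NUL characters.)
const CACHE_MAX = 8;              // illustrative size
const splitCache = new Map();

function cachedSplit(str, sep) {
  const key = str + "\u0000" + sep;
  let hit = splitCache.get(key);
  if (!hit) {
    hit = str.split(sep);
    if (splitCache.size >= CACHE_MAX)
      splitCache.delete(splitCache.keys().next().value); // evict oldest
    splitCache.set(key, hit);
  }
  return hit.slice();             // fresh array per call
}
```

With copy-on-write arrays, the `.slice()` copy would be deferred until a caller actually mutates the result.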
(In reply to Luke Wagner [:luke] from comment #49)
Hi, this is really encouraging! We could create a cache in each object, storing the parameter and the result, so when this object is called again we'd return the cache. What do you think?
(In reply to Victor Carlquist from comment #50) I'm not sure what you mean by "in each object". In general, we can't cache the array object result (since it's mutable and thus each split() call needs a new instance (or COW). I liked Kannan's suggestion which is to generate JIT code to efficiently create the new array clone.
Hi, I'm sorry but I don't understand Kannan's suggestion (comment 27): how can I hold the template array object? Thanks!
Hi, where is the JSOP_CALL stub generated? Thanks!
Hi, I'm running some tests using |literal|.split(|literal|), e.g.:

for (var i = 0; i < 500000; ++i)
    a = ("done fail unicorns ponies esperanto").split(" ");

but ins->isConstant() never returns true; only ins->isDefinition() returns true. Is that normal?
Attached patch work in progress (obsolete) — Splinter Review
Hi, this code results in a "segmentation fault (core dumped)" and I don't know why. Can I implement the optimization in CodeGenerator.cpp? Do I need to move the output register to some memory address to get the right result?
Attachment #8441739 - Attachment is obsolete: true
Attachment #8441739 - Flags: feedback?(kvijayan)
Attachment #8453374 - Flags: feedback?(luke)
Assignee: general → nobody
Comment on attachment 8453374 [details] [diff] [review] work in progress Review of attachment 8453374 [details] [diff] [review]: ----------------------------------------------------------------- Sorry for the delay, this got lost in my email stream. I'm afraid I'm not all that familiar with bailouts and general Ion compilation. Perhaps nbp can help you here?
Attachment #8453374 - Flags: feedback?(luke) → feedback?(nicolas.b.pierron)
Comment on attachment 8453374 [details] [diff] [review]
work in progress

Review of attachment 8453374 [details] [diff] [review]:
-----------------------------------------------------------------

I am not sure why you are getting a SEGV, but I see multiple reasons why this patch would not work (see below). Also, we should get assertion messages about the lack of a safepoint when the oolCallVM is generated; are you using debug builds of SpiderMonkey? If so, you should see an assertion failure that you can track with gdb.

I think Kannan's idea is to avoid generating an MStringSplit MIR instruction when we know that both arguments are constants, and instead emit an MNewArray and its initialization, as if we were copying from a cache. I think his idea is to implement this in either IonBuilder::inlineStringSplit or MStringSplit::foldsTo. Have a look at how we do that in IonBuilder::inlineArray [1].

style-nit: do not use tabs.

[1] http://dxr.mozilla.org/mozilla-central/source/js/src/jit/MCallOptimize.cpp#277

::: js/src/jit/CodeGenerator.cpp
@@ +5579,5 @@
> + // Load elements and length.
> +
> + OutOfLineCode *ool = oolCallVM(StringSplitInfo, lir, (ArgList(), ImmGCPtr(lir->mir()->typeObject()), str, sep ), StoreRegisterTo(output));
> + if(!ool)
> + return false;

This code defines an out-of-line function call, but to execute it you have to add a:

    masm.jump(ool->entry());

Note that oolCallVM is only interesting if the call is optional, not mandatory.

::: js/src/jit/Lowering.cpp
@@ +2730,5 @@
> JS_ASSERT(ins->separator()->type() == MIRType_String);
>
> LStringSplit *lir = new(alloc()) LStringSplit(useRegisterAtStart(ins->string()),
> useRegisterAtStart(ins->separator()));
> + return assignSnapshot(lir, Bailout_NonPrimitiveInput) && define(lir, ins) /*&& assignSafepoint(lir, ins)*/;

If you are making a callVM or an oolCallVM, then we need to tell the GC about the content of the stack when we do the call. This is what assignSafepoint is used for.

::: js/src/vm/Runtime.h
@@ +1101,5 @@
> js::MathCache *maybeGetMathCache() {
> return mathCache_;
> }
> + JSString *temp1;
> + JSString *temp2;

I don't see any uses of these fields; is this part of another patch?
Attachment #8453374 - Flags: feedback?(nicolas.b.pierron)
(In reply to Victor Carlquist from comment #54)
> but ins->isConstant() is never return 'true', only ins->isDefinition()
> return 'true'.
> Is it normal?

Have you checked the inputs? MStringSplit itself would never return true from isConstant(); its inputs, on the other hand, will:

    ins->isStringSplit()
    ins->getOperand(0)->isConstant()
    ins->getOperand(1)->isConstant()
Attached patch split.patch (obsolete) — Splinter Review
Attachment #8464763 - Flags: feedback?(kvijayan)
(In reply to Victor Carlquist from comment #59)
> Created attachment 8464763 [details] [diff] [review]
> split.patch
I'm avoiding generating the MStringSplit instruction, but this patch is 10% slower and I don't know why... Thanks!
Kannan, when would you be able to provide some feedback on Victor's patch?
Flags: needinfo?(kvijayan)
Comment on attachment 8464763 [details] [diff] [review]
split.patch

Review of attachment 8464763 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MCallOptimize.cpp
@@ +1198,5 @@
> + for (uint32_t i = 0; i < initLength; ++i) {
> + id = MConstant::New(alloc(), Int32Value(i));
> + current->add(id);
> +
> + MConstant *value = MConstant::NewAsmJS(alloc(), templateObjectSplit->getDenseOrTypedArrayElement(i), MIRType_Object);

Why are you using MConstant::NewAsmJS here instead of MConstant::New? You probably want MConstant::New.

@@ +1204,5 @@
> + // There is normally no need for a post barrier on these writes
> + // because the new array will be in the nursery. However, this
> + // assumption is violated if we specifically requested pre-tenuring.
> + if (ins->initialHeap() == gc::TenuredHeap)
> + current->add(MPostWriteBarrier::New(alloc(), ins, value));

The PostWriteBarrier should happen after the store, not before.
Attachment #8464763 - Flags: feedback?(kvijayan)
Sorry for the late reply. Victor: The patch seems like it should be just fine, aside from the points mentioned. I think the slowdown may be because when you try optimizing the constant case and fail, you simply return directly without doing the non-inlined (yet still very good) optimization. So the main optimization code for stringSplit should try the constant optimization, and if that fails, go back to the regular optimization. I just landed a small optimization that does something very similar, so you can take a look at how we usually organize that logic: https://hg.mozilla.org/integration/mozilla-inbound/rev/8da59dd9fc7f
Flags: needinfo?(kvijayan)
Attached patch bug688219.patch (obsolete) — Splinter Review
This patch is working. We could reduce the execution time of:

for (var i = 0; i < 50000; ++i)
    a = ("done fail unicorns ponies esperanto").split(" ");

before: 26ms
after: 3ms
Attachment #8453374 - Attachment is obsolete: true
Attachment #8464763 - Attachment is obsolete: true
Attachment #8466482 - Flags: feedback?(kvijayan)
Attached patch bug688219.patch (obsolete) — Splinter Review
This patch is working. We could reduce the execution time of:

for (var i = 0; i < 50000; ++i)
    a = ("done fail unicorns ponies esperanto").split(" ");

before: 26ms
after: 3ms
Attachment #8466482 - Attachment is obsolete: true
Attachment #8466482 - Flags: feedback?(kvijayan)
Attachment #8466495 - Flags: feedback?(kvijayan)
(In reply to Victor Carlquist from comment #65)
> Created attachment 8466495 [details] [diff] [review]
> bug688219.patch
a.length is not returning the correct value. I'll try to fix it.
Attached patch bug-688219.patch (obsolete) — Splinter Review
I fixed the length bug.
Attachment #8466495 - Attachment is obsolete: true
Attachment #8466495 - Flags: feedback?(kvijayan)
Attachment #8466709 - Flags: feedback?(kvijayan)
Comment on attachment 8466709 [details] [diff] [review] bug-688219.patch Review of attachment 8466709 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/MCallOptimize.cpp @@ +1157,5 @@ > } > > callInfo.setImplicitlyUsedUnchecked(); > > + if(callInfo.thisArg()->isConstant() && callInfo.getArg(0)->isConstant()){ Please split the entire logic of this conditional out into a separate method (maybe ::inlineConstantStringSplit). Then, in |inlineStringSplit| you can do: InlineStatus constInlineStatus = inlineConstantStringSplit(...); if (constInlineStatus != InlineStatus_NotInlined) return constInlineStatus; That way, even if the code within the conditional fails to inline, the generic stringSplit optimization code will kick in.
Attachment #8466709 - Flags: feedback?(kvijayan)
Attached patch work in progress (obsolete) — Splinter Review
Created inlineConstantStringSplit method.
Attachment #8466709 - Attachment is obsolete: true
Attachment #8466733 - Flags: feedback?(kvijayan)
(In reply to Kannan Vijayan [:djvj] from comment #68) > Comment on attachment 8466709 [details] [diff] [review] > bug-688219.patch > > Review of attachment 8466709 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: js/src/jit/MCallOptimize.cpp > @@ +1157,5 @@ > > } > > > > callInfo.setImplicitlyUsedUnchecked(); > > > > + if(callInfo.thisArg()->isConstant() && callInfo.getArg(0)->isConstant()){ > > Please split the entire logic of this conditional out into a seperate method > (maybe ::inlineConstantStringSplit). > > Then, in |inlineStringSplit| you can do: > > InlineStatus constInlineStatus = inlineConstantStringSplit(...); > if (constInlineStatus != InlineStatus_NotInlined) > return constInlineStatus; > > That way, even if the code within the conditional fails to inline, the > generic stringSplit optimization code will kick in. Thanks for your reply!
Comment on attachment 8466733 [details] [diff] [review] work in progress Review of attachment 8466733 [details] [diff] [review]: ----------------------------------------------------------------- This looks pretty good. Just run it against the major benchmarks and make sure that nothing is regressing, run it through try to ensure that it's not introducing any new bugs, and you should be good to go.
Attachment #8466733 - Flags: feedback?(kvijayan) → feedback+
Hi, Great! I ran SunSpider and the http://jsperf.com/occurence-counting-2 benchmark; I think nothing is regressing (the string-with-pattern case is a little slower when comparing the two results), and I didn't find any new bugs.
Attachment #8441740 - Attachment is obsolete: true
Please check against Octane as well. Sunspider runs extremely quickly, so a lot of JIT changes just won't show up on it, regression or improvement.
Hi, I tried to run Octane, but this patch is crashing Firefox... =(
I am debugging the code, but I still don't know what the problem is. It seems the raytrace test in Octane is failing. Maybe the memory isn't allocated in some test, or the resumeAfter call is failing: > if (!resumeAfter(length)) > return InliningStatus_Error; Do you have any suggestions?
Hi, I have run Octane several times; sometimes the patch hits a 'Segmentation fault' when it tries to call: > initLength = templateObjectSplit->as<ArrayObject>().length(); But sometimes this code works correctly. Could templateObjectSplit->as<ArrayObject>() be null? If so, how can I verify that? This problem occurs in raytrace.js: > 882 var pixelSize = "5,5".split(','); // $F('pixelSize').split(','); Thanks!
Attached patch bug-688219-fix.patch (obsolete) — Splinter Review
Hi, I fixed the bug, but I needed to create a new variable to store the length value. Results of Octane: before: 14374.4 (average of 5) after: 14490.8 (average of 5) Thanks in advance!
Attachment #8472618 - Flags: feedback?(kvijayan)
Nice work! I think this proves the concept and shows some real gains, but I am uncomfortable with certain aspects of the current approach - in particular the caching of a single global template object and length. This caches exactly one stringSplit operation across the entire runtime, and that's far more benchmark-specific than I like. I think the right and more general way to do this is to use the BaselineJIT ICs for JSOP_CALL as caches that keep track of the TemplateObject and resulting array length, and have Ion use that during compilation. This would allow for optimizing every site individually, and would work as a general optimization for STRING_LITERAL.split(STRING_LITERAL) idioms. The thing is, most of the Ion work can stay the same. We just need to take a step back and move the caching into Baseline ICs instead of keeping it on the Runtime. This involves creating an optimized ICCall_StringSplit stub in Baseline, and using that to store the template object. The template object can then be cloned on a per-site basis. Does that sound reasonable? I can create the appropriate bugs and take you through this work, but I think the end result will be pretty cool. What do you think?
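The per-site caching semantics described above can be modeled in a few lines of plain JS (an illustrative sketch only, not the actual Baseline IC implementation; the function and variable names are made up). The key constraint is that each call must return a fresh array, so the cached template object has to be cloned per call rather than returned directly:

```javascript
// Model of a per-call-site split cache (hypothetical names; not SpiderMonkey code).
// The cache remembers the last (this-string, separator) pair and the resulting
// template array; hits return a clone so callers can freely mutate their result.
function makeSplitCacheForSite() {
  let cachedThis = null;
  let cachedSep = null;
  let template = null;
  return function cachedSplit(str, sep) {
    if (str !== cachedThis || sep !== cachedSep) {
      cachedThis = str;
      cachedSep = sep;
      template = str.split(sep); // recompute the template on a miss
    }
    return template.slice(); // per-call clone of the template object
  };
}

const splitAtSite = makeSplitCacheForSite();
const a = splitAtSite("done fail unicorns ponies esperanto", " ");
const b = splitAtSite("done fail unicorns ponies esperanto", " ");
a[0] = "mutated"; // must not leak into later results
```

The clone step is exactly what makes a CoW template array attractive later: the clone becomes O(1) until someone writes to it.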
Attachment #8472618 - Flags: feedback?(kvijayan) → feedback+
Needinfo for comment 78
Flags: needinfo?(victorcarlquist)
(In reply to Kannan Vijayan [:djvj] from comment #79) > Needinfo for comment 78 True, you're right. I'd like to work on this bug; your idea sounds reasonable to me. I'll start studying BaselineJIT and ICs.
Flags: needinfo?(victorcarlquist)
(In reply to Kannan Vijayan [:djvj] from comment #78) > This > would allow for optimizing every site individually, and would work as a > general optimization for STRING_LITERAL.split(STRING_LITERAL) idioms. Not sure it's relevant, but note that this probably interacts with bug 977966 in some way.
(In reply to Victor Carlquist from comment #80) > (In reply to Kannan Vijayan [:djvj] from comment #79) > > Needinfo for comment 78 > > True, you're right. > I'd like to work on this bug, your idea sound reasonable for me, I'll start > to study BaselineJIT and ICs. Awesome. I'll create a bug for just optimizing this in baseline, and make it a blocker for this bug. We can discuss the baseline specifics in there. Thanks victor!
Depends on: 1054330
FWIW, we have COW arrays now, we could probably use this here.
Yeah, it was on my mind :) This can even be applied to the baseline optimization (the jitcode for the stub could CoW-clone the template array instead of doing a full copy).
So, now that the baseline work is done, the Ion optimization should be pretty quick, in fact. The first step is to take a look at BaselineInspector (in js/src/jit/BaselineInspector.{h,cpp}). Add a method to that class to check whether the baseline IC at a particular bytecode offset is an optimizable string split. The signature of the method should look something like: bool BaselineInspector::isOptimizableCallStringSplit(jsbytecode *pc, JSString **stringOut, JSObject **objOut); This should check the IC chain, see if it contains an ICCall_StringSplit stub, and if so return true (and also store the matching string and result array into stringOut and objOut, respectively).
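As a rough picture of what that inspector method has to do (a hypothetical plain-JS model, not the C++ BaselineInspector; the stub fields mirror the names used above): walk the linked chain of stubs at the pc's IC entry, and if a Call_StringSplit stub is present, report its recorded string and template object.

```javascript
// Hypothetical model of the IC-chain walk (not SpiderMonkey code).
// Returns { found, string, object } instead of out-parameters.
function isOptimizableCallStringSplit(firstStub) {
  for (let stub = firstStub; stub; stub = stub.next) {
    if (stub.kind === "Call_StringSplit") {
      return {
        found: true,
        string: stub.expectedThis,     // the matching string
        object: stub.templateObject,   // the result array template
      };
    }
  }
  return { found: false };
}

// A toy stub chain: one optimized StringSplit stub, then the fallback.
const chain = {
  kind: "Call_StringSplit",
  expectedThis: "a,b,c",
  templateObject: ["a", "b", "c"],
  next: { kind: "Call_Fallback", next: null },
};
const res = isOptimizableCallStringSplit(chain);
```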
Attached patch WIP (obsolete) — Splinter Review
Hi! I made the changes (: I have some questions: 1) Should we check whether callInfo.getArg(0) in Ion is equal to the expectedArg() in the stub? 2) What's the difference between current->add() and current->push()? Thanks!
Attachment #8466733 - Attachment is obsolete: true
Attachment #8472618 - Attachment is obsolete: true
Attachment #8498822 - Flags: feedback?(kvijayan)
Comment on attachment 8498822 [details] [diff] [review] WIP Review of attachment 8498822 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/BaselineInspector.cpp @@ +424,5 @@ > + for (ICStub *stub = entry.firstStub(); stub; stub = stub->next()) { > + if (stub->kind() == ICStub::Call_StringSplit) { > + *stringOut = stub->toCall_StringSplit()->expectedThis(); > + *objOut = stub->toCall_StringSplit()->templateObject(); > + break; Instead of breaking here, you can just return true directly. I also think it's good to assert here that the |numOptimizedStubs| on the entry is exactly one, since that's how we coded the behaviour in Baseline. (If StringSplit stub is attached, it's the only optimized stub attached). ::: js/src/jit/MCallOptimize.cpp @@ +1208,5 @@ > IonBuilder::InliningStatus > +IonBuilder::inlineConstantStringSplit(CallInfo &callInfo) > +{ > + JSString *stringThis; > + JSObject *templateObject; Move these down to right before the call to isOptimizableCallStringSplit. @@ +1210,5 @@ > +{ > + JSString *stringThis; > + JSObject *templateObject; > + > + if (!callInfo.thisArg()->isConstant()) Add another conditional ensuring that the constant |this| is a string. This will let you use a cheap check to shortcut out of the rest of the logic when the |this| is not a string, instead of making that check later after a bunch of other expensive checks. @@ +1214,5 @@ > + if (!callInfo.thisArg()->isConstant()) > + return InliningStatus_NotInlined; > + > + // Check if exist a template object in stub. > + if (inspector->isOptimizableCallStringSplit(pc, &stringThis, &templateObject)) { This conditional can be reversed, and allow you to put the rest of the logic at a lower tab level, with: if (!inspector->isOptimizableCallStringSplit(pc, &stringThis, &templateObject)) return InliningStatus_NotInlined; ... rest of code ... Also, you do want to confirm here that the string in the stub matches up with the constant string. 
The two strings should be pointer-identical, since they're both derived from the bytecode literal string, so a pointer compare should be sufficient to check if they are equal. The reason you want to double-check this is because you don't know for a fact how that MConstant for the |this| was generated. Now, it's VERY VERY likely that they are the same, but I don't think I can guarantee that it will be this way 100% of the time. There may be some rare corner case which allows an attacker to trick the IonBuilder into generating a MConstant string for the |this|, that differs from the baseline-recorded string. Better be safe and check to be sure.
Attachment #8498822 - Flags: feedback?(kvijayan) → feedback+
Attached patch bug688219WIP.patch (obsolete) — Splinter Review
I fixed the code. Thanks.
Attachment #8498822 - Attachment is obsolete: true
Attachment #8499319 - Flags: feedback?(kvijayan)
Comment on attachment 8499319 [details] [diff] [review] bug688219WIP.patch Review of attachment 8499319 [details] [diff] [review]: ----------------------------------------------------------------- Looks good. Address the minor re-orderings suggested here. And then fill in the array-copy logic required for the inlining. ::: js/src/jit/BaselineInspector.cpp @@ +426,5 @@ > + if (entry.fallbackStub()->numOptimizedStubs() != 1) > + return false; > + > + ICStub *stub = entry.firstStub(); > + if (stub->kind() == ICStub::Call_StringSplit) { Invert the conditional here. The rest of the code has the style: if (failureCondition) return false; ... rest of code ... Good to stick to that style here too. So: if (stub->kind() != ICStub_CallStringSplit) return false; ... ::: js/src/jit/MCallOptimize.cpp @@ +1235,5 @@ > + return InliningStatus_NotInlined; > + } > + > + const js::Value *strval = callInfo.thisArg()->toConstant()->vp(); > + if (!strval->isString()) move this check up to immediately after the (callInfo.thisArg()->type() != MIRType_String) check. It's a cheap check whose failure can shortcut the rest of the expensive TI checks. @@ +1239,5 @@ > + if (!strval->isString()) > + return InliningStatus_NotInlined; > + > + JSString *str = strval->toString(); > + if (str != stringThis) Move this check to immediately after the |isOptimizableCallStringSplit| check. This is another cheap check that can fail early and save us the rest of the expensive type-inference checks.
Comment on attachment 8499319 [details] [diff] [review] bug688219WIP.patch Review of attachment 8499319 [details] [diff] [review]: ----------------------------------------------------------------- Forgot to + the feedback.
Attachment #8499319 - Flags: feedback?(kvijayan) → feedback+
Attached patch bug688219WIP.patch (obsolete) — Splinter Review
It's working! micro-bench: for (var i = 0; i < 500000; ++i) a = "done fail unicorns ponies esperanto".split(" "); no patch: ~237ms with patch: ~7ms Thanks.
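Beyond the timing, a quick correctness check is worth running alongside the micro-bench (an illustrative harness, not part of the patch): the inlined constant-receiver/constant-separator path must be indistinguishable from a split whose receiver is built at runtime and so takes the generic path, and every call must still return a distinct array.

```javascript
// Constant this + constant argument: the candidate for the inlined path.
var constParts = "done fail unicorns ponies esperanto".split(" ");

// A runtime-concatenated receiver defeats the constant check, so this
// goes through the generic split path; the results must agree.
var prefix = "done fail unicorns";
var dynParts = (prefix + " ponies esperanto").split(" ");

// A second constant call: must yield an equal but distinct array object.
var again = "done fail unicorns ponies esperanto".split(" ");
```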
Attachment #8499319 - Attachment is obsolete: true
Attachment #8500007 - Flags: feedback?(kvijayan)
Looks like we've largely settled on "good idea", so updating summary.
Summary: v8 has a String.prototype.split cache: benchmarketing or good idea? → Cache String.prototype.split
Comment on attachment 8500007 [details] [diff] [review] bug688219WIP.patch Review of attachment 8500007 [details] [diff] [review]: ----------------------------------------------------------------- I've made a few comments, but I have a more fundamental question. Why are you adding a new |templateObjectForJIT| to the Baseline IC stub? I don't understand why there needs to be a new template object for this. Could you walk me through the motivations that prompted the more extensive set of changes you made in this patch? (Not saying that they're wrong, but I don't know why the |templateObjectForJIT| is needed, for example) ::: js/src/jit/MCallOptimize.cpp @@ +1220,5 @@ > + > + // Check if exist a template object in stub. > + JSString *stringThis; > + JSObject *templateObject; > + JSObject *objCopy; Initialize these pointers to nullptr. @@ +1222,5 @@ > + JSString *stringThis; > + JSObject *templateObject; > + JSObject *objCopy; > + if (!inspector->isOptimizableCallStringSplit(pc, &stringThis, &templateObject, &objCopy)) > + return InliningStatus_NotInlined; After this, MOZ_ASSERT(stringThis), since it should be set. @@ +1225,5 @@ > + if (!inspector->isOptimizableCallStringSplit(pc, &stringThis, &templateObject, &objCopy)) > + return InliningStatus_NotInlined; > + > + if (!templateObject) > + return InliningStatus_NotInlined; If |isOptimizableCallStringSplit| returned true, then templateObject must have been filled in, right? Shouldn't this be a MOZ_ASSERT instead? @@ +1228,5 @@ > + if (!templateObject) > + return InliningStatus_NotInlined; > + > + if (!objCopy) > + return InliningStatus_NotInlined; Same as above, shouldn't this be a MOZ_ASSERT?
Attachment #8500007 - Flags: feedback?(kvijayan) → feedback+
(In reply to Kannan Vijayan [:djvj] from comment #93) > Comment on attachment 8500007 [details] [diff] [review] > bug688219WIP.patch > > Review of attachment 8500007 [details] [diff] [review]: > ----------------------------------------------------------------- > > I've made a few comments, but I have a more fundamental question. Why are > you adding a new |templateObjectForJIT| to the Baseline IC stub? I don't > understand why there needs to be a new template object for this. > Thanks for your reply. My first idea was to just use |templateObject| in |MNewArray|, but it was failing with: > Assertion failure: isTenured(), at /js/src/gc/Heap.h:1159 I'm not sure the last patch takes a good approach. So, the main motivation for |templateObjectForJIT| is to avoid the failure shown above.
Flags: needinfo?(kvijayan)
Ah, I think the issue here is that the template object is getting created in the nursery, and objects in the nursery can't be used as constants in Ion. I tried to find a better way of approaching this, but ultimately I think the way you chose was the best approach. Ok, I'll take another look at this.
Flags: needinfo?(kvijayan)
Actually, the issue is not in baseline at all. I don't know why MNewArray is a UnaryInstruction. This makes no sense, because it doesn't assign its constant operand to any register when lowering. There's no reason that argument needs to be an MConstant. MNewArray could just store a reference to the ArrayObject directly in one of its fields. In fact, MNewArrayCopyOnWrite is a nullary instruction. No reason MNewArray can't also be nullary. The assert you were running into was because you can't build MConstants out of pointers to non-tenured objects. But you shouldn't have to. Lemme look into this more.
(In reply to Victor Carlquist from comment #77) > Created attachment 8472618 [details] [diff] [review] > bug-688219-fix.patch > > Hi, > I fixed the bug! but I needed to create a new variable to store length value. > > Results of Octane: > before: 14374,4 (Average of 5) > after: 14490,8 (Average of 5) > > Thanks in advance! Which tests in octane improve with this patch?
(In reply to Brian Hackett (:bhackett) from comment #97) > (In reply to Victor Carlquist from comment #77) > > Created attachment 8472618 [details] [diff] [review] > > bug-688219-fix.patch > Which tests in octane improve with this patch? Sorry, I don't have the results anymore, I think, this patch improved Raytrace test, I'm not sure, but I'll test it again :)
(In reply to Victor Carlquist from comment #98) > (In reply to Brian Hackett (:bhackett) from comment #97) > > (In reply to Victor Carlquist from comment #77) > > > Created attachment 8472618 [details] [diff] [review] > > > bug-688219-fix.patch > > Which tests in octane improve with this patch? > > Sorry, I don't have the results anymore, I think, this patch improved > Raytrace test, I'm not sure, but I'll test it again :) Sorry, I did the test again with more accuracy (average of 15), and this patch does not have an impact on the Octane results... Here are the results: Richards -0.41% DeltaBlue -0.78% Crypto -0.20% RayTrace -0.67% EarleyBoyer +0.39% RegExp +1.34% Splay -1.31% SplayLatency -1.19% NavierStokes +0.01% PdfJS +2.22% Mandreel -1.51% MandreelLatency +2.52% Gameboy -2.54% CodeLoad -0.25% Box2D +0.47% zlib +0.18% Typescript +1.25% ---- Score (version 9) -0.04%
(In reply to Kannan Vijayan [:djvj] from comment #96) > Actually, the issue is not in baseline at all. I don't know why MNewArray > is a UnaryInstruction. This makes no sense, because it doesn't assign its > constant operand to any regsiter when lowering. There's no reason that > argument needs to be an MConstant. MNewArray could just store a reference > to the ArrayObject directly in one of its fields. > > In fact, MNewArrayCopyOnWrite is a nullary instruction. No reason MNewArray > can't also be nullary. > > The assert you were running into was because you can't build MConstants out > of pointers to non-tenured objects. But you shouldn't have to. > > Lemme look into this more. Flagging for needinfo so this doesn't get forgotten.
Status: NEW → ASSIGNED
Flags: needinfo?(kvijayan)
Assignee: nobody → victorcarlquist
Attached patch WIP (obsolete) — Splinter Review
Hi! Since we changed the |nursery| templateObject to a |Tenured| templateObject in bug 1086530, I think we should keep it Tenured, because the field |templateObject_| in MNewArrayCopyOnWrite is |AlwaysTenured<>|; so the actual templateObject for MNewArray should be Tenured anyway, regardless of whether it becomes an |MNullaryInstruction| (I'm not familiar enough with Ion to confirm this yet). What do you think about it? This patch is working, but, as you and Jan suggested, I want to implement CoW, and I have no idea how to create a CoW templateObject for MNewArrayCopyOnWrite... Could you help me with this? This patch has a regression (0m0.038s to 0m0.134s) on the test below: > var testString = "We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence,promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America."; > > function test(testString) { > return testString.split("e").length - 1; > } > > for(var i=0; i<1000000; i++) { > test(testString); > } Thanks in advance :)
Attachment #8500007 - Attachment is obsolete: true
Attachment #8519599 - Flags: feedback?(kvijayan)
Comment on attachment 8519599 [details] [diff] [review] WIP Review of attachment 8519599 [details] [diff] [review]: ----------------------------------------------------------------- Personally, I think it's better to get this optimization in as-is (with fixes) before worrying about the CoW implementation. I looked into the CoW implementation briefly. The main wrinkle there is that we have to create a proper CoW TypeObject for the result of that site's StringSplit, and currently, the CoW type-generation is tied to array literals in the source code. A good chunk of the logic is going to be shared between this implementation and that one anyway, so I think going ahead and landing something useful, and then improving it later is good. ::: js/src/jit/IonMacroAssembler.cpp @@ +1103,5 @@ > Address(obj, NativeObject::offsetOfElements())); > } else if (ntemplate->is<ArrayObject>()) { > Register temp = slots; > + // Kannan: It's really necessary? > + //MOZ_ASSERT(!ntemplate->getDenseInitializedLength()); Not sure if this is OK to disable. I think it should be fine, but I'd ask h4writer or jandem about it. ::: js/src/jit/MCallOptimize.cpp @@ +1290,5 @@ > + // const js::Value *argval = callInfo.getArg(0)->toConstant()->vp(); > + // if (!argval->isString()) > + // return InliningStatus_NotInlined; > + // if (argval->toString()->length() == 0) > + // return InliningStatus_NotInlined; Why are the argVal checks disabled here? @@ +1312,5 @@ > + MOZ_ASSERT(templateObject->is<ArrayObject>()); > + > + JSString *str = strval->toString(); > + if (str != stringThis) > + return InliningStatus_NotInlined; Also check if (argval->toString() != str) here and if so, fail inlining. @@ +1324,5 @@ > + if (!key.maybeTypes()) > + return InliningStatus_NotInlined; > + > + if (!key.maybeTypes()->hasType(types::Type::StringType())) { > + key.freeze(constraints()); I'm not sure if this freeze is necessary. If we don't see a StringType in the typeset, we choose not to optimize. 
If the typeset later changes, it doesn't invalidate that choice. I'd doublecheck with bhackett, but this should be removable. @@ +1333,5 @@ > + if (templateObject->getDenseInitializedLength() != initLength) > + return InliningStatus_NotInlined; > + > + for (uint32_t i = 0; i < initLength; i++) { > + MConstant *value = MConstant::New(alloc(), templateObject->getDenseElement(i), constraints()); You're generating constant |values| here and then throwing them away. Seems like a waste, since MDefinitions (of which MConstant is a subtype) are pretty large. I think you should accumulate these in a vector, and use them later (since you need them anyway). @@ +1335,5 @@ > + > + for (uint32_t i = 0; i < initLength; i++) { > + MConstant *value = MConstant::New(alloc(), templateObject->getDenseElement(i), constraints()); > + if (!TypeSetIncludes(key.maybeTypes(), value->type(), value->resultTypeSet())) { > + key.freeze(constraints()); See comment above on freeze.
Attachment #8519599 - Flags: feedback?(kvijayan) → feedback+
(In reply to Kannan Vijayan [:djvj] from comment #102) > Comment on attachment 8519599 [details] [diff] [review] > WIP > > Review of attachment 8519599 [details] [diff] [review]: > ----------------------------------------------------------------- > > Personally, I think it's better to get this optimization in as-is (with > fixes) before worrying about the CoW implementation. Ok, I'll fix what you spotted ;) > ::: js/src/jit/IonMacroAssembler.cpp > @@ +1103,5 @@ > > Address(obj, NativeObject::offsetOfElements())); > > } else if (ntemplate->is<ArrayObject>()) { > > Register temp = slots; > > + // Kannan: It's really necessary? > > + //MOZ_ASSERT(!ntemplate->getDenseInitializedLength()); > > Not sure if this is OK to disable. I think it should be fine, but I'd ask > h4writer or jandem about it. The patch is failing on this assert because the templateObject is not empty (i.e. ntemplate->getDenseInitializedLength() > 0). If disabling it is wrong, I'll have to find a way to restructure my patch so it doesn't hit this assert. Thanks for your reply!
Attached patch bug688219.patch (obsolete) — Splinter Review
I made the changes. Thanks again (:
Attachment #8519599 - Attachment is obsolete: true
Attachment #8521078 - Flags: feedback?(kvijayan)
Comment on attachment 8521078 [details] [diff] [review] bug688219.patch Review of attachment 8521078 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/MCallOptimize.cpp @@ +1280,5 @@ > + if (!callInfo.thisArg()->isConstant()) > + return InliningStatus_NotInlined; > + > + if (callInfo.thisArg()->type() != MIRType_String) > + return InliningStatus_NotInlined; Since you're checking the types of constant values directly later on, this conditional can be removed, and instead have an assert after the |isString()| checks below. @@ +1291,5 @@ > + return InliningStatus_NotInlined; > + > + const js::Value *strval = callInfo.thisArg()->toConstant()->vp(); > + if (!strval->isString()) > + return InliningStatus_NotInlined; As noted above. After this, you can do MOZ_ASSERT(callInfo.getArg(0)->type() == MIRType_String); MOZ_ASSERT(callInfo.thisArg()->type() == MIRType_String); @@ +1329,5 @@ > + uint32_t initLength = templateObject->as<ArrayObject>().length(); > + if (templateObject->getDenseInitializedLength() != initLength) > + return InliningStatus_NotInlined; > + > + Vector<MConstant *, 0, SystemAllocPolicy> vec; Name this vector something more meaningful, like |arrayValues|. @@ +1333,5 @@ > + Vector<MConstant *, 0, SystemAllocPolicy> vec; > + for (uint32_t i = 0; i < initLength; i++) { > + MConstant *value = MConstant::New(alloc(), templateObject->getDenseElement(i), constraints()); > + if (!TypeSetIncludes(key.maybeTypes(), value->type(), value->resultTypeSet())) { > + vec.clearAndFree(); This is not necessary. The vector will release its heap allocations during destruction on the return below. @@ +1336,5 @@ > + if (!TypeSetIncludes(key.maybeTypes(), value->type(), value->resultTypeSet())) { > + vec.clearAndFree(); > + return InliningStatus_NotInlined; > + } > + vec.append(value); Vector append() can fail. Return InliningStatus_Error if it does. 
@@ +1345,5 @@ > + getInlineReturnTypeSet()->convertDoubleElements(constraints()); > + if (conversion == types::TemporaryTypeSet::AlwaysConvertToDoubles) > + templateObject->setShouldConvertDoubleElements(); > + else > + templateObject->clearShouldConvertDoubleElements(); I don't even know if this whole code block concerning conversion-to-double is necessary. We know that we're going to be returning an array of strings. The existence or not of a convert-to-double flag on the object is irrelevant. I think this whole section can be replaced with a simpler: if (conversion == types::TemporaryTypeSet::AlwaysConvertToDoubles) return InliningStatus_NotInlined; @@ +1366,5 @@ > + // jsop_initelem_array is doing because we do not expect to bailout > + // because the memory is supposed to be allocated by now. > + MConstant *id = nullptr; > + for (uint32_t i = 0; i < initLength; i++) { > + id = MConstant::New(alloc(), Int32Value(i)); The |id| var can be scoped inside the loop. @@ +1374,5 @@ > + if (conversion == types::TemporaryTypeSet::AlwaysConvertToDoubles) { > + MInstruction *valueDouble = MToDouble::New(alloc(), value); > + current->add(valueDouble); > + value = valueDouble->toConstant(); > + } else With the changes above, this conditional can be removed entirely, and replaced with |current->add(value);|. @@ +1390,5 @@ > + } > + > + // Update the length. > + MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id); > + current->add(length); This setLength will get as input the last |id| stored in the loop, which will be |initLength - 1|, which is wrong. You need a new MConstant here: MConstant *length = MConstant::New(alloc(), Int32Value(initLength)); current->add(length); MSetInitializedLength *setLength = MSetInitializedLength::New(...); current->add(setLength); @@ +1392,5 @@ > + // Update the length. 
> + MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id); > + current->add(length); > + > + vec.clearAndFree(); This is not necessary. The destructor will do it automatically on all exit paths.
Attachment #8521078 - Flags: feedback?(kvijayan) → feedback+
(In reply to Kannan Vijayan [:djvj] from comment #105) > @@ +1390,5 @@ > > + } > > + > > + // Update the length. > > + MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id); > > + current->add(length); > > This setLength will get as input the last |id| stored in the loop, which > will be |initLength - 1|, which is wrong. You need a new MConstant here: > > MConstant *length = MConstant::New(alloc(), Int32Value(initLength)); > current->add(length); > > MSetInitializedLength *setLength = MSetInitializedLength::New(...); > current->add(setLength); > Hi Kannan, thanks for the review! We have a little problem. When I set this with Int32Value(initLength), I got the following result: > done,fail,unicorns,ponies,esperanto9.704187067161276e-101, but the right result would be: > done,fail,unicorns,ponies,esperanto It occurs because |SetInitializedLength| adds 1 to the length, so it reads one element past the end of the array... (CodeGenerator.cpp) >bool >CodeGenerator::visitSetInitializedLength(LSetInitializedLength *lir) >{ > Address initLength(ToRegister(lir->elements()), ObjectElements::offsetOfInitializedLength()); > Int32Key index = ToInt32Key(lir->index()); > masm.bumpKey(&index, 1); > masm.storeKey(index, initLength); > // Restore register value if it is used/captured after. > masm.bumpKey(&index, -1); >} So, what should I do?
(In reply to Victor Carlquist from comment #106) > (In reply to Kannan Vijayan [:djvj] from comment #105) > > Hi Kannan, thanks for review! > > We have a little problem. > When I set this with Int32Value(initLength), I got the following result: > > > done,fail,unicorns,ponies,esperanto9.704187067161276e-101, > > but the right result would be: > > > done,fail,unicorns,ponies,esperanto > > It occur because the |SetInitializedLength| is adding '1' over length, so > it's overflowing... > > (CodeGenerator.cpp) > >bool > >CodeGenerator::visitSetInitializedLength(LSetInitializedLength *lir) > >{ > > Address initLength(ToRegister(lir->elements()), ObjectElements::offsetOfInitializedLength()); > > Int32Key index = ToInt32Key(lir->index()); > > masm.bumpKey(&index, 1); > > masm.storeKey(index, initLength); > > // Restore register value if it is used/captured after. > > masm.bumpKey(&index, -1); > >} > > So, What should I do? Well that's super weird that it would do an incr/decr around the setInitializedLength. It seems like SetInitializedLength is always called with the index of the last element set, and it internally increments it if necessary. The way you were doing it before is correct, then. Good find.
Flags: needinfo?(kvijayan)
Attached patch bug688219.patch (obsolete) — Splinter Review
I think that this patch is OK.
Attachment #8521078 - Attachment is obsolete: true
Attachment #8523061 - Flags: review?(kvijayan)
Comment on attachment 8523061 [details] [diff] [review] bug688219.patch Review of attachment 8523061 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/MCallOptimize.cpp @@ +1380,5 @@ > + if (ins->initialHeap() == gc::TenuredHeap) > + current->add(MPostWriteBarrier::New(alloc(), ins, value)); > + } > + // Update the length. > + MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id); I'm wondering if we actually need this. MNewArray with the template object _should_ correctly set the intializedLength and the length. In this case, this setLength is redundant. Can you test without this line and see if it still works?
(In reply to Kannan Vijayan [:djvj] from comment #109)
> Comment on attachment 8523061 [details] [diff] [review]
> bug688219.patch
>
> Review of attachment 8523061 [details] [diff] [review]:
> -----------------------------------------------------------------

You are right! It works without that.
Attached patch bug688219.patch (obsolete) — Splinter Review
Patch without |MSetInitializedLength| instruction.
Attachment #8523061 - Attachment is obsolete: true
Attachment #8523061 - Flags: review?(kvijayan)
Attachment #8523334 - Flags: review?(kvijayan)
Attached patch patch rebased (obsolete) — Splinter Review
Attachment #8523334 - Attachment is obsolete: true
Attachment #8523334 - Flags: review?(kvijayan)
Attachment #8532964 - Flags: review?(kvijayan)
Comment on attachment 8532964 [details] [diff] [review]
patch rebased

Review of attachment 8532964 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MCallOptimize.cpp
@@ +1347,5 @@
> +    if (!TypeSetIncludes(key.maybeTypes(), value->type(), value->resultTypeSet()))
> +        return InliningStatus_NotInlined;
> +
> +    if (!arrayValues.append(value))
> +        return InliningStatus_NotInlined;

This should return InliningStatus_Error instead, as it's an OOM.
Attachment #8532964 - Flags: review?(kvijayan) → review+
Attached patch bug688219.patch (obsolete) — Splinter Review
I fixed that. (: Thanks!
Attachment #8532964 - Attachment is obsolete: true
Attachment #8533738 - Flags: review?(kvijayan)
Sorry, I've been super busy and just got to this. Reviewing now.
No problem Kannan ;)
Comment on attachment 8533738 [details] [diff] [review]
bug688219.patch

Review of attachment 8533738 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry again. This is great. Everything looks pretty tight. Have you benched the final result with the latest patch?
Attachment #8533738 - Flags: review?(kvijayan) → review+
I'll bench and push it in try server. Thanks again!
Hey Victor, I rebased your patch (there were some changes to the names of various TI-related structs: Types became ObjectGroups, etc.) and pushed it to try here:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0edec69d7759

You can crib the rebased patch from there if you don't feel like doing it again (it's a bit of a pain in the butt to discover the new names). I'm somewhat excited about benching this, so I'm going to run the benches locally too.
Wow, just benched this on linux x86-64 with a microbenchmark. Results are sweet:

function foo() {
    var count = 0;
    for (var i = 0; i < 10000000; i++) {
        var arr = "zing,bing,bang".split(",");
        var arr2 = "foo|bar|boom".split("|");
        for (var j = 0; j < arr.length; j++)
            count += j;
        for (var j = 0; j < arr2.length; j++)
            count += j;
    }
    return count;
}

function main() {
    var d1 = new Date();
    var result = foo();
    var d2 = new Date();
    print("Time: " + (d2 - d1) + " result=" + result);
}

main();

Reference build (without your patch):
Time: 3818 result=60000000
Time: 3786 result=60000000
Time: 4047 result=60000000
Time: 3796 result=60000000
Time: 3843 result=60000000

Optimized build (with your patch):
Time: 117 result=60000000
Time: 79 result=60000000
Time: 77 result=60000000
Time: 81 result=60000000
Time: 77 result=60000000

Averaging those times out: 3858ms on reference, 86.2ms on optimized. A speedup of more than 40x on the microbench. Nice :)
Attached patch Patch (obsolete) — Splinter Review
I fixed that issue :) I added the line:

> MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id);

The RArrayState wasn't getting the right length value in Recover. Thanks.
Attachment #8533738 - Attachment is obsolete: true
Attachment #8563578 - Flags: review?(kvijayan)
(In reply to Kannan Vijayan [:djvj] from comment #120) Awesome! It's very fast. You helped me a lot, thank you!
Comment on attachment 8563578 [details] [diff] [review]
Patch

Review of attachment 8563578 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. If you can get this green on try, you have my blessing to land. The last few changes have all been trivial, so feel free to post a green 'try' link and then land this. Good job on getting this done, and sorry again about the review delays on my end.

::: js/src/jit/MCallOptimize.cpp
@@ +1398,5 @@
> +    MOZ_ASSERT(templateObject);
> +    MOZ_ASSERT(templateObject->is<ArrayObject>());
> +
> +    JSString *str = strval->toString();
> +    if (str != stringThis)

The |str| var doesn't seem necessary here; if (strval->toString() != stringThis) should be sufficient.

@@ +1402,5 @@
> +    if (str != stringThis)
> +        return InliningStatus_NotInlined;
> +
> +    str = argval->toString();
> +    if (str != stringArg)

Likewise here: if (argval->toString() != stringArg) should be sufficient.
Attachment #8563578 - Flags: review?(kvijayan) → review+
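The guards reviewed above only let the inlined path fire when both the receiver and the separator are the exact constant strings the compiler observed. A hypothetical plain-JS sketch of that shape (function and parameter names are illustrative, not the actual jit code):

```javascript
// Sketch of the inlined fast path: valid only when both operands match
// the constants seen at compile time; otherwise fall back to generic split.
function inlinedSplit(str, sep, expectedStr, expectedSep, precomputed) {
  if (str !== expectedStr || sep !== expectedSep)
    return str.split(sep);     // bail to the generic path
  return precomputed.slice();  // fresh array each call, like real split
}

var cached = ["zing", "bing", "bang"];
var fast = inlinedSplit("zing,bing,bang", ",", "zing,bing,bang", ",", cached);
// fast is a new array ["zing", "bing", "bang"], not the cached one
```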
Attached patch PatchSplinter Review
I made the changes that Kannan suggested.
Attachment #8563578 - Attachment is obsolete: true
Attachment #8564090 - Flags: review+
Comment on attachment 8564090 [details] [diff] [review]
Patch

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ac760a841d7e

Try is green, so I'm setting the checkin flag. Thanks!
Attachment #8564090 - Flags: checkin?
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Depends on: 1134074