Closed Bug 688219 Opened 13 years ago Closed 10 years ago

Cache String.prototype.split

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox38 --- fixed

People

(Reporter: luke, Assigned: victorcarlquist)

References

Details

Attachments

(3 files, 24 obsolete files)

4.34 KB, patch
luke
: feedback+
Details | Diff | Splinter Review
4.65 KB, text/plain
Details
10.95 KB, patch
victorcarlquist
: review+
Details | Diff | Splinter Review
On this micro-bench:

for (var i = 0; i < 500000; ++i)
    a = "done fail unicorns ponies esperanto".split(" ");

v8 is more than 2x faster. Super-optimized string matching? Fast array creation? Generational GC ftw? No, a cache that maps (string, pattern) to results. The question asked by this bug is whether this was added to cheat on some unrealistic benchmark or whether this is optimizing real-world usage. Seems simple enough to instrument a browser and measure. To wit, on this loop:

for (var i = 0; i < 500000; ++i)
    a = ("done fail unicorns ponies esperanto" + i).split(" ");

we are 25% faster in the shell on my machine.

"Yo cash ain't nothin' but trash" -- Fats Waller (via brendan)
"Ain't misbehavin'" -- v8
From http://code.google.com/p/v8/source/detail?r=9164: "Optimize the common obfuscator pattern where ["foo","bar","baz"] gets converted to "foo,bar,baz".split(","). If the inputs are symbols we cache the result and make the substrings into symbols." Obfuscators. I wonder if we should be benchmarking more of them.
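The obfuscator pattern v8 targets can be shown with a small plain-JS sketch (the literals here are illustrative, not taken from any real obfuscator output):

```javascript
// Obfuscators often replace an array literal with a split over a
// joined string, so the same constant split runs over and over:
function parts() {
  // original source was effectively: return ["foo", "bar", "baz"];
  return "foo,bar,baz".split(",");
}

// Every call re-splits the same two constant strings, which is why a
// (string, separator) -> result cache pays off for this pattern.
const a = parts(); // → ["foo", "bar", "baz"]
```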
Alternatively, is it reasonable to constant-fold this pattern? Or do something like:

if (String.prototype.split == WHAT_WE_EXPECT)
    return <constant-folded-result>;
else
    return <call-it>;
Totally. TI can tell us 'split' is the real split, so in the best case we wouldn't even need a test; we could just emit code to build the array. Doing this from the JIT (IonMonkey, preferably) would be easiest. When IonMonkey is stable and on mozilla-central, perhaps this would be a [good first bug]?
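The WHAT_WE_EXPECT guard above can be sketched at script level (all names here are hypothetical; a real engine would perform this check in the JIT, not in JS):

```javascript
// Hypothetical constant-folding guard: if String.prototype.split is
// still the built-in, a compiler could replace the whole call with a
// precomputed array; otherwise it must fall back to a normal call.
const originalSplit = String.prototype.split;
const folded = "done fail unicorns ponies esperanto".split(" "); // computed once

function splitFast(s, sep) {
  if (String.prototype.split === originalSplit &&
      s === "done fail unicorns ponies esperanto" && sep === " ") {
    // Must hand back a fresh copy: callers may mutate the result.
    return folded.slice();
  }
  return s.split(sep); // generic path
}
```

Note the `.slice()`: even the folded path has to allocate, since each split call is specified to return a new array.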
Frankly I don't think it's a huge win. I have mainly seen it in js1k entries, where it annoyed me and it was an easy change so I did it. I don't know of a real benchmark that uses it, but I did fall over http://jsperf.com/occurence-counting-2 which I guess is what caused this bug to be filed.
(In reply to Erik Corry from comment #5)
> Frankly I don't think it's a huge win.
http://jsperf.com/character-counting/3 - Firefox is ahead in every test ’cept for the .split() case.
That hardly qualifies as a real benchmark.
Also, the split version gets the wrong answer for example on the input string ".a.b.".
Luke, I'd like to take this on as a side project to get to know the js engine. Can you provide some tips on what files to look into and a general starting direction?
Assignee: general → jaws
Status: NEW → ASSIGNED
In the spirit of helping you get started: :) String.prototype.split is implemented in jsstr.cpp as js::str_split. We already do caching for other things like eval or Math.sin. MathCache is, from my point of view, the cleanest implementation; you can find it mostly in jsmath.h, and it's used in jsmath.cpp. The EvalCache uses strings for lookups and needs to guard on more conditions, so it might be more related to this case. You want to look at EvalCacheLookup, EvalCacheHashPolicy, and EvalCache in jscntxt.h. The EvalCache needs a special class to handle function construction and destruction and is thus implemented in Eval.cpp as EvalScriptGuard. You probably won't need such complexity. Of course, in the end you should make sure that we are as good as v8 and don't regress anything. (v8 seems to do this only for atoms/symbols, so memory usage shouldn't increase because we never garbage collect atoms anyway.)
(In reply to Tom Schuster [:evilpie] from comment #10)
> (v8 seems to only to this of Atoms/symbols, so memory usage shouldn't
> increase because we never garbage collect atoms anyway.)
Atoms do indeed get GC'd. I'd consider purging the cache with the other caches that point to GC things in PurgeRuntime (js/src/jsgc.cpp).
(In reply to Erik Corry from comment #8)
> Also, the split version gets the wrong answer for example on the input
> string ".a.b.".
A bit OT, but as far as I can tell it gets the answer wrong for every string, as it should be s.split(".").length - 1 - a change that is reflected in later revisions of the jsperf.
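The off-by-one is easy to see concretely: split produces one more piece than there are separators (including empty pieces at the ends), so occurrence counting needs length - 1:

```javascript
// ".a.b." contains three "." separators.
const s = ".a.b.";
const pieces = s.split(".");  // ["", "a", "b", ""]

// Using pieces.length (4) over-counts by one; the separator count is
// pieces.length - 1 (3), which later jsperf revisions used.
const occurrences = pieces.length - 1;
```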
Assignee: jaws → general
Status: ASSIGNED → NEW
Someone just pointed out that jQuery uses "x".split("y") expressions which have constant values. It'd be great to measure this on a jQuery benchmark or on some heavy jQuery sites.
Blocks: 917839
I would like to work on this bug. As this would be my first bug I would like additional guidance.
A good starting point would be to instrument a browser to measure how often this pattern is used and how much time we could save from this optimization. The implementation of String.prototype.split is js::str_split in js/src/jsstr.cpp. String literals are "atomized" (allocated once), so a good approximation of "literal".split("literal") would be to test whether the 'this' and separator arguments of str_split are atoms (JSString::isAtom()). It'd also be good to accumulate the total time spent in str_split for the cases where we have atom.split(atom). After getting these measurements working, I'd browse around some popular sites (particularly those using jQuery).
Hi Luke,
(In reply to Luke Wagner [:luke] from comment #15)
> A good starting point would be to instrument a browser to measure how often
> this pattern is used and how much time we could save from this optimization.

I've cached the last split executed. I'll write code to measure how often this pattern is used:

JSObject *
js::str_split_string(JSContext *cx, HandleTypeObject type, HandleString str, HandleString sep)
{
    static Rooted<JSLinearString*> cachedString;
    static RootedObject cachedAobj;

    Rooted<JSLinearString*> linearStr(cx, str->ensureLinear(cx));
    if (!linearStr)
        return nullptr;

    Rooted<JSLinearString*> linearSep(cx, sep->ensureLinear(cx));
    if (!linearSep)
        return nullptr;

    RootedObject aobj(cx);
    if (cachedString != linearStr) {
        cachedString = linearStr;

        uint32_t limit = UINT32_MAX;

        if (linearSep->length() == 0) {
            aobj = CharSplitHelper(cx, linearStr, limit);
        } else {
            SplitStringMatcher matcher(cx, linearSep);
            aobj = SplitHelper(cx, linearStr, limit, matcher, type);
        }

        if (!aobj)
            return nullptr;

        aobj->setType(type);
        cachedAobj = aobj;
    } else {
        aobj = cachedAobj;
    }
    return aobj;
}

Tested with:

var start = new Date().getTime();
for (var i = 0; i < 500000; ++i)
    a = ("done fail unicorns ponies esperanto").split(" ");
var end = new Date().getTime();
var time = end - start;

before: 9064
after: 6338

So, am I doing it right?
(In reply to Victor Carlquist from comment #16)
> So, am I doing it right?
Sorry, this code is wrong.
Hi! It looks like you're off to a good start. One issue is that, in general, we can't keep anything in static variables, especially not Rooted (they need to be on the stack, since they are kept in a LIFO linked list). Instead, you'd probably want to keep some weak JSString pointer in the JSRuntime that gets cleared on GC (at least for now, maybe there is a better way).
Also, in the future, you can post 'work in progress' patches as attachments to the bug, which make them a bit easier to read.
The ideal place to do this is in the JITs. The Baseline IC can easily generate an optimized stub for the |literalString.split(literalString)| case, and can hold the template array result object within the stub. Note that the |split| optimization will always need to allocate a new array, not just return the same one, since the result of |split| needs to be a newly allocated array. Ion can handle these cases as well. TI will tell us that the |.split| is the canonical String.prototype.split, and Ion can generate inline MNewArray and MStoreElement(Constant) instructions for the split call.
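The fresh-array requirement above is observable from script, which is why a stub can only hold a template to copy, never the result itself. A minimal demonstration:

```javascript
// Two identical split calls must yield independent arrays: mutating
// one must not be visible through the other.
const a = "x,y".split(",");
a.push("z");                  // a is now ["x", "y", "z"]

const b = "x,y".split(",");   // must be a fresh ["x", "y"]
// If the engine had returned a cached array here, b.length would be 3.
```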
In fact, Ion already does some optimizations for |String#split|. See MCallOptimize.cpp, IonBuilder::inlineStringSplit. Currently the "optimization" is only to carry the TypeObject for the resulting array into the split function.

Optimizing this can go the following route:

1. Baseline generates optimized JSOP_CALL stubs for constantString.split(constantString).
2. Baseline's optimized stub contains the template array that is copied and returned.
3. Ion modifies IonBuilder::inlineStringSplit to check for the optimized case (constant string inputs), and if so, inspects the baseline ICs, pulls out the template array object from the baseline stub, and generates inline MIR instructions for constructing and populating the returned array.

Sticking a global cache for the result of the last split string seems unappealing compared to the general approach above.
Also note bug 977966, which might be an interesting next task for whoever ends up implementing this.
(In reply to Kannan Vijayan [:djvj] from comment #21)
> Sticking a global cache for the result of the last split string seems
> unappealing compared to the general approach above.
Ok, I'll follow this route, thanks Kannan!
Hi! I have some questions: how do I check whether a parameter is a constant string? And where can I find documentation about CallInfo members? =)
Attached file cached all split () (work in progress) (obsolete) —
I cached all splits in this patch, but in the future I want to cache only constant strings.
Attached file cached all split () (work in progress) (obsolete) —
Sorry, this is the right code. I cached all splits in this patch, but in the future I want to cache only constant strings.
During Ion compilation, to find out if an MInstruction is a constant, you do the following:

    if (ins->isConstant()) {
        Value val = ins->toConstant()->value();
        ...
    }

However, I would advise starting with a simple Baseline optimized stub for JSOP_CALL that implements a constant string split. The baseline stub can be written to hold the template array object that will be copied and returned. Once that's done, we can turn to enabling Ion optimization. When ion-optimizing the call to str_split, Ion can inspect the baseline stub chain for the call operation to see if there are any StringSplit stubs with matching inputs. If there are (likely), it can pull the template array object from the stub, pull out each individual element, and directly generate inline instructions for NewArray + ElementStore.
Hi! I copied and modified MathCache to support JSString and JSObject (StrCache), but I don't know if it is correct. Also, I don't understand how inlineStringSplit works; can you help me?
Attachment #8439500 - Attachment is obsolete: true
Attachment #8439505 - Attachment is obsolete: true
Attachment #8440465 - Flags: feedback?(kvijayan)
Attached patch work in progress (no Jit) (obsolete) — Splinter Review
This patch is cleaner than the other patch. Also, I ran the SunSpider benchmark, but the results weren't satisfactory. Maybe with the JIT it'll become faster.
Attachment #8440465 - Attachment is obsolete: true
Attachment #8440465 - Flags: feedback?(kvijayan)
Attachment #8441739 - Flags: feedback+
Attached file SunSpider benchmark (obsolete) —
Comment on attachment 8441739 [details] [diff] [review] work in progress (no Jit) Setting the feedback flag to ?kannan. Victor, the way these flags work is that you request someone's feedback (or review or a number of other things) by setting the flag to "?" and entering their name in the field to the right of the flag. That person then gives feedback and sets the flag to "+" or "-" or resets it entirely, which usually means something like "I like the approach, but there are enough issues here that I'm not comfortable giving a +". Adding the right person to the request is important because bugzilla sends out a special email notification that will stand out in the flood of bugzilla emails most developers get on a daily basis.
Attachment #8441739 - Flags: feedback+ → feedback?(kvijayan)
Great that you're working on it, but we should not forget to do the measurements Luke suggested in comment 15. This optimization is a bit of a hack, and we should have some (measurable) justification for adding it...
(In reply to Till Schneidereit [:till] from comment #31)
> Victor, the way these flags work is that you request someone's feedback (or
> review or a number of other things) by setting the flag to "?" and entering
> their name in the field to the right of the flag.
Ok! Thanks, Till.
I want to know whether I modified the correct files. If not, which files should I modify? Thanks!
(In reply to Luke Wagner [:luke] from comment #15)
> A good starting point would be to instrument a browser to measure how often
> this pattern is used and how much time we could save from this optimization.
Hi Luke. To measure the time spent in str_split, can I use the clock() function? And how can I print the result?
(In reply to Victor Carlquist from comment #35) I'd use PRMJ_Now() to measure 'before' and 'after'; you can then convert this into milliseconds via double(after - before) / PRMJ_USEC_PER_MSEC. For ad hoc measurements, you can accumulate the total time in some global variable and printf the final result in, say, JS_DestroyRuntime (so, when the browser closes), or in some obscure JS function you can trigger from the JS console (personally, I like math_toSource in jsmath.cpp).
(In reply to Luke Wagner [:luke] from comment #36)
Thanks Luke! =)
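The measurement pattern being discussed (accumulate before/after deltas for total and optimizable time) can be sketched as a script-level analog. This is purely illustrative: the real instrumentation is C++ inside str_split, and script cannot see atoms, so a string-typed separator stands in for the atom.split(atom) test:

```javascript
// Accumulate total time in split and the share spent in calls we
// could plausibly serve from a cache (string separator as a proxy).
let totalTime = 0, savedTime = 0;
const realSplit = String.prototype.split;

String.prototype.split = function (sep, limit) {
  const before = Date.now();
  const result = realSplit.call(this, sep, limit);
  const delta = Date.now() - before;
  totalTime += delta;
  if (typeof sep === "string")  // stand-in for the isAtom() check
    savedTime += delta;
  return result;
};
```

Printing savedTime / totalTime after a browsing session gives a gain estimate of the same shape as the per-site numbers reported in the thread.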
Attached file Measure - possible future gain (obsolete) —
Attachment #8442939 - Flags: feedback?(luke)
Attached patch Measure - future possible gain (obsolete) — Splinter Review
Attachment #8442939 - Attachment is obsolete: true
Attachment #8442939 - Flags: feedback?(luke)
Attachment #8442942 - Flags: feedback?(luke)
Comment on attachment 8442942 [details] [diff] [review]
Measure - future possible gain

Review of attachment 8442942 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +3755,5 @@
> /* ES5 15.5.4.14 */
> bool
> js::str_split(JSContext *cx, unsigned argc, Value *vp)
> {
> + double start, endAtom;

You probably want to use PRMJ_Now(), since JS_Now() is a JS_PUBLIC_API function and thus incurs some calling overhead. Second, you might want to use int64_t, since that is the return type of PRMJ_Now() and it avoids the potential for PRMJ_Now()'s result falling outside the 2^53 integer range of doubles.

@@ +3771,5 @@
> return false;
> AddTypePropertyId(cx, type, JSID_VOID, Type::StringType());
>
> + if(str.get()->isAtom() && args.get(0).toString()->isAtom())
> + endAtom = JS_Now();

This will assert/crash in the unlikely case that args.get(0) is not a string. I think you need the additional condition args.get(0).isString().

@@ +3845,5 @@
> return false;
>
> + if(str.get()->isAtom() && args.get(0).toString()->isAtom()){
> + milli_Optimize += double(endAtom - start) / PRMJ_USEC_PER_MSEC;
> + milli_No_Optimize += double(JS_Now() - start) / PRMJ_USEC_PER_MSEC;

I was thinking something more like: take the time before/after the portion of split that you can optimize, and then:

    if (optimizable)
        savedTime += delta;
    totalTime += delta;
Attachment #8442942 - Flags: feedback?(luke)
Attachment #8442942 - Attachment is obsolete: true
Attachment #8443087 - Flags: feedback?(luke)
Comment on attachment 8443087 [details] [diff] [review] Measure - future possible gain Looks good!
Attachment #8443087 - Flags: feedback?(luke) → feedback+
(In reply to Luke Wagner [:luke] from comment #42)
> Looks good!
Great! Luke, what is the next step?
I'd build a browser and browse around a bunch of sites (esp. popular sites) and see how often the optimization applies. You might want to move the printf() from JS_DestroyRuntime into something you can call from the JS console so you can print after each site. Grepping http://code.jquery.com/jquery-2.1.1.js for "split", I do see several instances of "literal".split("literal"), so that's encouraging.
Hi! I accessed some popular sites, and I got these results:

bbc.co.uk      Saved Time 0.983 ms   Total Time 3.123 ms   Gain 31.4761%
facebook.com   Saved Time 2.261 ms   Total Time 17.173 ms  Gain 13.166%
youtube.com    Saved Time 5.837 ms   Total Time 34.406 ms  Gain 16.9651%
yahoo.com      Saved Time 3.295 ms   Total Time 19.39 ms   Gain 16.9933%
wikipedia.org  Saved Time 0.081 ms   Total Time 1.74 ms    Gain 4.65517%
amazon.com     Saved Time 4.791 ms   Total Time 17.041 ms  Gain 28.1145%
cnn.com        Saved Time 0.947 ms   Total Time 7.342 ms   Gain 12.8984%
netflix.com    Saved Time 23.167 ms  Total Time 65.012 ms  Gain 35.635%
That's pretty encouraging! I realized that the measurement is a bit over-simplified, though: we can't optimize *every* split of an atom by an atom. I guess the next step, then, is to measure how much we'd win with a real cache (cleared on GC, btw). It'd be good to measure how much a general dynamically-sized cache wins over a small, fixed-size cache, which, it looks like, is what v8 has. An additional statistic I'd be interested in: what is the distribution of sizes of these atom.split(atom) cases? That is, are pages doing a few big splits or many small splits? If the wins are mostly big splits, then the JITs won't really benefit from this cache and we could probably get all the win by just changing the C++ str_split. It looks like v8 does its caching in C++.
So, how can we measure the saved time with a real cache? I would create three categories of string size:

small:  size < 400
medium: 400 <= size < 800
big:    size >= 800

I'll count the number of strings in each category. What do you think?
Hi, I implemented the additional statistic, following these rules:

small:  size < 400
medium: 400 <= size < 800
big:    size >= 800

bbc.co.uk      Saved Time 14.267 ms  Total Time 26.472 ms  Gain 53.8947%  Small: 7214  Medium: 0  Big: 0
facebook.com   Saved Time 1.153 ms   Total Time 13.722 ms  Gain 8.40257%  Small: 598   Medium: 0  Big: 0
youtube.com    Saved Time 5.786 ms   Total Time 36.671 ms  Gain 15.7781%  Small: 2769  Medium: 2  Big: 2
yahoo.com      Saved Time 1.454 ms   Total Time 14.822 ms  Gain 9.80974%  Small: 893   Medium: 0  Big: 0
wikipedia.org  Saved Time 0.92 ms    Total Time 3.63 ms    Gain 25.3444%  Small: 320   Medium: 0  Big: 0
amazon.com     Saved Time 1.115 ms   Total Time 13.895 ms  Gain 8.02447%  Small: 760   Medium: 0  Big: 0
cnn.com        Saved Time 1.003 ms   Total Time 8.864 ms   Gain 11.3154%  Small: 816   Medium: 0  Big: 0
netflix.com    Saved Time 21.218 ms  Total Time 58.201 ms  Gain 36.4564%  Small: 7018  Medium: 0  Big: 0
Well, this is all pretty encouraging; seems like this optimization will provide real improvements in practice. What Kannan outlined in comment 27 sounds like a good strategy. Ideally, we'd get copy-on-write arrays (bug 934450) so we wouldn't have to actually clone a new array each time (that's what v8 does), but since we are dealing with many small str.split's here, I expect we'll still get a pretty big win (in fact, for really small splits, we may be faster than v8 since we're not calling out to C++, doing a hash-table lookup, etc).
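For reference, the kind of small, fixed-size (string, separator) cache being compared against can be sketched in JS. The cache size, key scheme, and eviction policy here are illustrative assumptions, not v8's actual implementation:

```javascript
// Tiny fixed-size split cache: maps "str\u0000sep" to a split result
// and hands out a fresh copy on every hit, since each split call must
// return a newly allocated array. (The \u0000 key separator assumes
// inputs don't contain NUL characters.)
const CACHE_MAX = 8;              // illustrative size
const splitCache = new Map();

function cachedSplit(str, sep) {
  const key = str + "\u0000" + sep;
  let hit = splitCache.get(key);
  if (!hit) {
    hit = str.split(sep);
    if (splitCache.size >= CACHE_MAX)
      splitCache.delete(splitCache.keys().next().value); // evict oldest
    splitCache.set(key, hit);
  }
  return hit.slice();             // fresh array per call
}
```

With copy-on-write arrays, the `.slice()` copy would be deferred until a caller actually mutates the result.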
(In reply to Luke Wagner [:luke] from comment #49)
Hi, this is really encouraging! We could create a cache in each object, storing the parameter and the result, so when this object is called again we'd return the cache. What do you think?
(In reply to Victor Carlquist from comment #50) I'm not sure what you mean by "in each object". In general, we can't cache the array object result (since it's mutable and thus each split() call needs a new instance (or COW). I liked Kannan's suggestion which is to generate JIT code to efficiently create the new array clone.
Hi, I'm sorry but I don't understand Kannan's suggestion (comment 27): how can I hold the template array object? Thanks!
Hi, where is the JSOP_CALL stub generated? Thanks!
Hi, I'm running some tests using |literal|.split(|literal|), e.g.:

for (var i = 0; i < 500000; ++i)
    a = ("done fail unicorns ponies esperanto").split(" ");

but ins->isConstant() never returns true; only ins->isDefinition() returns true. Is that normal?
Attached patch work in progress (obsolete) — Splinter Review
Hi, this code results in a "segmentation fault (core dumped)" and I don't know why. Can I implement the optimization in CodeGenerator.cpp? Do I need to move the output register to some memory address to get the right result?
Attachment #8441739 - Attachment is obsolete: true
Attachment #8441739 - Flags: feedback?(kvijayan)
Attachment #8453374 - Flags: feedback?(luke)
Assignee: general → nobody
Comment on attachment 8453374 [details] [diff] [review] work in progress Review of attachment 8453374 [details] [diff] [review]: ----------------------------------------------------------------- Sorry for the delay, this got lost in my email stream. I'm afraid I'm not all that familiar with bailouts and general Ion compilation. Perhaps nbp can help you here?
Attachment #8453374 - Flags: feedback?(luke) → feedback?(nicolas.b.pierron)
Comment on attachment 8453374 [details] [diff] [review]
work in progress

Review of attachment 8453374 [details] [diff] [review]:
-----------------------------------------------------------------

I am not sure why you are getting a SEGV, but I see multiple reasons why this patch would not work (see below). Also, we should get assertion messages about the lack of a safepoint when the oolCallVM is generated; are you using debug builds of SpiderMonkey? If so, you should see an assertion failure that you can track with gdb.

I think Kannan's idea is to avoid generating an MStringSplit MIR instruction when we know that both arguments are constants, and instead emit an MNewArray and its initialization, as if we were copying from a cache. I think his idea is to implement this in either IonBuilder::inlineStringSplit or MStringSplit::foldsTo. Have a look at how we do that in IonBuilder::inlineArray [1].

style-nit: do not use tabs.

[1] http://dxr.mozilla.org/mozilla-central/source/js/src/jit/MCallOptimize.cpp#277

::: js/src/jit/CodeGenerator.cpp
@@ +5579,5 @@
> + // Load elements and length.
> +
> + OutOfLineCode *ool = oolCallVM(StringSplitInfo, lir, (ArgList(), ImmGCPtr(lir->mir()->typeObject()), str, sep ), StoreRegisterTo(output));
> + if(!ool)
> + return false;

This code defines an out-of-line function call, but to execute it you have to add a:

    masm.jump(ool->entry());

Note that oolCallVM is only interesting if the call is optional, not mandatory.

::: js/src/jit/Lowering.cpp
@@ +2730,5 @@
> JS_ASSERT(ins->separator()->type() == MIRType_String);
>
> LStringSplit *lir = new(alloc()) LStringSplit(useRegisterAtStart(ins->string()),
> useRegisterAtStart(ins->separator()));
> + return assignSnapshot(lir, Bailout_NonPrimitiveInput) && define(lir, ins) /*&& assignSafepoint(lir, ins)*/;

If you are making a callVM or an oolCallVM, then we need to tell the GC about the content of the stack when we do the call. This is what assignSafepoint is used for.

::: js/src/vm/Runtime.h
@@ +1101,5 @@
> js::MathCache *maybeGetMathCache() {
> return mathCache_;
> }
> + JSString *temp1;
> + JSString *temp2;

I don't see any uses of these fields; is this part of another patch?
Attachment #8453374 - Flags: feedback?(nicolas.b.pierron)
(In reply to Victor Carlquist from comment #54)
> but ins->isConstant() is never return 'true', only ins->isDefinition()
> return 'true'.
> Is it normal?

Have you checked the inputs? MStringSplit itself would never return true from isConstant(); its inputs, on the other hand, will:

    ins->isStringSplit()
    ins->getOperand(0)->isConstant()
    ins->getOperand(1)->isConstant()
Attached patch split.patch (obsolete) — Splinter Review
Attachment #8464763 - Flags: feedback?(kvijayan)
(In reply to Victor Carlquist from comment #59)
> Created attachment 8464763 [details] [diff] [review]
> split.patch
I'm avoiding generating the MStringSplit instruction, but this patch is 10% slower and I don't know why... Thanks!
Kannan, when would you be able to provide some feedback on Victor's patch?
Flags: needinfo?(kvijayan)
Comment on attachment 8464763 [details] [diff] [review]
split.patch

Review of attachment 8464763 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MCallOptimize.cpp
@@ +1198,5 @@
> + for (uint32_t i = 0; i < initLength; ++i) {
> + id = MConstant::New(alloc(), Int32Value(i));
> + current->add(id);
> +
> + MConstant *value = MConstant::NewAsmJS(alloc(), templateObjectSplit->getDenseOrTypedArrayElement(i), MIRType_Object);

Why are you using MConstant::NewAsmJS here instead of MConstant::New? You probably want MConstant::New.

@@ +1204,5 @@
> + // There is normally no need for a post barrier on these writes
> + // because the new array will be in the nursery. However, this
> + // assumption is violated if we specifically requested pre-tenuring.
> + if (ins->initialHeap() == gc::TenuredHeap)
> + current->add(MPostWriteBarrier::New(alloc(), ins, value));

The PostWriteBarrier should happen after the store, not before.
Attachment #8464763 - Flags: feedback?(kvijayan)
Sorry for the late reply. Victor: The patch seems like it should be just fine, aside from the points mentioned. I think the slowdown may be because when you try optimizing the constant case and fail, you simply return directly without doing the non-inlined (yet still very good) optimization. So the main optimization code for stringSplit should try the constant optimization, and if that fails, go back to the regular optimization. I just landed a small optimization that does something very similar, so you can take a look at how we usually organize that logic: https://hg.mozilla.org/integration/mozilla-inbound/rev/8da59dd9fc7f
Flags: needinfo?(kvijayan)
Attached patch bug688219.patch (obsolete) — Splinter Review
This patch is working. We could reduce the execution time of:

for (var i = 0; i < 50000; ++i)
    a = ("done fail unicorns ponies esperanto").split(" ");

before: 26ms
after: 3ms
Attachment #8453374 - Attachment is obsolete: true
Attachment #8464763 - Attachment is obsolete: true
Attachment #8466482 - Flags: feedback?(kvijayan)
Attached patch bug688219.patch (obsolete) — Splinter Review
This patch is working. We could reduce the execution time of:

for (var i = 0; i < 50000; ++i)
    a = ("done fail unicorns ponies esperanto").split(" ");

before: 26ms
after: 3ms
Attachment #8466482 - Attachment is obsolete: true
Attachment #8466482 - Flags: feedback?(kvijayan)
Attachment #8466495 - Flags: feedback?(kvijayan)
(In reply to Victor Carlquist from comment #65)
> Created attachment 8466495 [details] [diff] [review]
> bug688219.patch
a.length is not returning the correct value. I'll try to fix it.
Attached patch bug-688219.patch (obsolete) — Splinter Review
I fixed the length bug.
Attachment #8466495 - Attachment is obsolete: true
Attachment #8466495 - Flags: feedback?(kvijayan)
Attachment #8466709 - Flags: feedback?(kvijayan)
Comment on attachment 8466709 [details] [diff] [review] bug-688219.patch Review of attachment 8466709 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/MCallOptimize.cpp @@ +1157,5 @@ > } > > callInfo.setImplicitlyUsedUnchecked(); > > + if(callInfo.thisArg()->isConstant() && callInfo.getArg(0)->isConstant()){ Please split the entire logic of this conditional out into a separate method (maybe ::inlineConstantStringSplit). Then, in |inlineStringSplit| you can do: InlineStatus constInlineStatus = inlineConstantStringSplit(...); if (constInlineStatus != InlineStatus_NotInlined) return constInlineStatus; That way, even if the code within the conditional fails to inline, the generic stringSplit optimization code will kick in.
Attachment #8466709 - Flags: feedback?(kvijayan)
Attached patch work in progress (obsolete) — Splinter Review
Created inlineConstantStringSplit method.
Attachment #8466709 - Attachment is obsolete: true
Attachment #8466733 - Flags: feedback?(kvijayan)
(In reply to Kannan Vijayan [:djvj] from comment #68) > Comment on attachment 8466709 [details] [diff] [review] > bug-688219.patch > > Review of attachment 8466709 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: js/src/jit/MCallOptimize.cpp > @@ +1157,5 @@ > > } > > > > callInfo.setImplicitlyUsedUnchecked(); > > > > + if(callInfo.thisArg()->isConstant() && callInfo.getArg(0)->isConstant()){ > > Please split the entire logic of this conditional out into a seperate method > (maybe ::inlineConstantStringSplit). > > Then, in |inlineStringSplit| you can do: > > InlineStatus constInlineStatus = inlineConstantStringSplit(...); > if (constInlineStatus != InlineStatus_NotInlined) > return constInlineStatus; > > That way, even if the code within the conditional fails to inline, the > generic stringSplit optimization code will kick in. Thanks for your reply!
Comment on attachment 8466733 [details] [diff] [review] work in progress Review of attachment 8466733 [details] [diff] [review]: ----------------------------------------------------------------- This looks pretty good. Just run it against the major benchmarks and make sure that nothing is regressing, run it through try to ensure that it's not introducing any new bugs, and you should be good to go.
Attachment #8466733 - Flags: feedback?(kvijayan) → feedback+
Hi, Great! I ran SunSpider and the http://jsperf.com/occurence-counting-2 benchmark; I think nothing is regressing (the string-with-pattern case is a little slower when comparing the two results), and I didn't find any new bugs.
Attachment #8441740 - Attachment is obsolete: true
Please check against Octane as well. Sunspider runs extremely quickly, so a lot of JIT changes just won't show up on it, regression or improvement.
Hi, I tried to run Octane, but this patch is crashing Firefox... =(
I am debugging the code, but I still don't know what the problem is. It seems the raytrace test in Octane is failing. Maybe the memory isn't allocated in some test, or the resumeAfter call is failing: > if (!resumeAfter(length)) > return InliningStatus_Error; Do you have any suggestions?
Hi, I have run Octane several times; sometimes the patch hits a 'Segmentation fault' when it tries to call: > initLength = templateObjectSplit->as<ArrayObject>().length(); But sometimes this code works correctly. Could templateObjectSplit->as<ArrayObject>() be null? If so, how can I verify that? This problem occurs in raytrace.js: > 882 var pixelSize = "5,5".split(','); // $F('pixelSize').split(','); Thanks!
Attached patch bug-688219-fix.patch (obsolete) — Splinter Review
Hi, I fixed the bug, but I needed to create a new variable to store the length value. Results of Octane: before: 14374.4 (average of 5) after: 14490.8 (average of 5) Thanks in advance!
Attachment #8472618 - Flags: feedback?(kvijayan)
Nice work! I think this proves the concept and shows some real gains, but I am uncomfortable with certain aspects of the current approach - in particular the caching of a single global template object and length. This caches exactly one stringSplit operation across the entire runtime, and that's far more benchmark-specific than I like. I think the right and more general way to do this is to use the BaselineJIT ICs for JSOP_CALL as caches that keep track of the TemplateObject and resulting array length, and have Ion use that during compilation. This would allow for optimizing every site individually, and would work as a general optimization for STRING_LITERAL.split(STRING_LITERAL) idioms. The thing is, most of the Ion work can stay the same. We just need to take a step back and move the caching into Baseline ICs instead of keeping it on the Runtime. This involves creating an optimized ICCall_StringSplit stub in Baseline, and using that to store the template object. The template object can then be cloned on a per-site basis. Does that sound reasonable? I can create the appropriate bugs and take you through this work, but I think the end result will be pretty cool. What do you think?
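The per-site caching semantics described above can be modeled in a few lines of plain JS (an illustrative sketch only, not the actual Baseline IC implementation; the function and variable names are made up). The key constraint is that each call must return a fresh array, so the cached template object has to be cloned per call rather than returned directly:

```javascript
// Model of a per-call-site split cache (hypothetical names; not SpiderMonkey code).
// The cache remembers the last (this-string, separator) pair and the resulting
// template array; hits return a clone so callers can freely mutate their result.
function makeSplitCacheForSite() {
  let cachedThis = null;
  let cachedSep = null;
  let template = null;
  return function cachedSplit(str, sep) {
    if (str !== cachedThis || sep !== cachedSep) {
      cachedThis = str;
      cachedSep = sep;
      template = str.split(sep); // recompute the template on a miss
    }
    return template.slice(); // per-call clone of the template object
  };
}

const splitAtSite = makeSplitCacheForSite();
const a = splitAtSite("done fail unicorns ponies esperanto", " ");
const b = splitAtSite("done fail unicorns ponies esperanto", " ");
a[0] = "mutated"; // must not leak into later results
```

The clone step is exactly what makes a CoW template array attractive later: the clone becomes O(1) until someone writes to it.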
Attachment #8472618 - Flags: feedback?(kvijayan) → feedback+
Needinfo for comment 78
Flags: needinfo?(victorcarlquist)
(In reply to Kannan Vijayan [:djvj] from comment #79) > Needinfo for comment 78 True, you're right. I'd like to work on this bug; your idea sounds reasonable to me. I'll start studying BaselineJIT and ICs.
Flags: needinfo?(victorcarlquist)
(In reply to Kannan Vijayan [:djvj] from comment #78) > This > would allow for optimizing every site individually, and would work as a > general optimization for STRING_LITERAL.split(STRING_LITERAL) idioms. Not sure it's relevant, but note that this probably interacts with bug 977966 in some way.
(In reply to Victor Carlquist from comment #80) > (In reply to Kannan Vijayan [:djvj] from comment #79) > > Needinfo for comment 78 > > True, you're right. > I'd like to work on this bug, your idea sound reasonable for me, I'll start > to study BaselineJIT and ICs. Awesome. I'll create a bug for just optimizing this in baseline, and make it a blocker for this bug. We can discuss the baseline specifics in there. Thanks victor!
Depends on: 1054330
FWIW, we have COW arrays now, we could probably use this here.
Yeah, it was on my mind :) This can even be applied to the baseline optimization (the jitcode for the stub could CoW-clone the template array instead of doing a full copy).
So, now that the baseline work is done, the Ion optimization should be pretty quick, in fact. The first step is to take a look at BaselineInspector (in js/src/jit/BaselineInspector.{h,cpp}). Add a method to that class to check whether the baseline IC at a particular bytecode offset is an optimizable string split. The signature of the method should look something like: bool BaselineInspector::isOptimizableCallStringSplit(jsbytecode *pc, JSString **stringOut, JSObject **objOut); This should check the IC chain, see if it contains an ICCall_StringSplit stub, and if so return true (and also store the matching string and result array into stringOut and objOut, respectively).
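As a rough picture of what that inspector method has to do (a hypothetical plain-JS model, not the C++ BaselineInspector; the stub fields mirror the names used above): walk the linked chain of stubs at the pc's IC entry, and if a Call_StringSplit stub is present, report its recorded string and template object.

```javascript
// Hypothetical model of the IC-chain walk (not SpiderMonkey code).
// Returns { found, string, object } instead of out-parameters.
function isOptimizableCallStringSplit(firstStub) {
  for (let stub = firstStub; stub; stub = stub.next) {
    if (stub.kind === "Call_StringSplit") {
      return {
        found: true,
        string: stub.expectedThis,     // the matching string
        object: stub.templateObject,   // the result array template
      };
    }
  }
  return { found: false };
}

// A toy stub chain: one optimized StringSplit stub, then the fallback.
const chain = {
  kind: "Call_StringSplit",
  expectedThis: "a,b,c",
  templateObject: ["a", "b", "c"],
  next: { kind: "Call_Fallback", next: null },
};
const res = isOptimizableCallStringSplit(chain);
```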
Attached patch WIP (obsolete) — Splinter Review
Hi! I made the changes (: I have some questions: 1) Should we check whether callInfo.getArg(0) in Ion is equal to the expectedArg() in the stub? 2) What's the difference between current->add() and current->push()? Thanks!
Attachment #8466733 - Attachment is obsolete: true
Attachment #8472618 - Attachment is obsolete: true
Attachment #8498822 - Flags: feedback?(kvijayan)
Comment on attachment 8498822 [details] [diff] [review] WIP Review of attachment 8498822 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/BaselineInspector.cpp @@ +424,5 @@ > + for (ICStub *stub = entry.firstStub(); stub; stub = stub->next()) { > + if (stub->kind() == ICStub::Call_StringSplit) { > + *stringOut = stub->toCall_StringSplit()->expectedThis(); > + *objOut = stub->toCall_StringSplit()->templateObject(); > + break; Instead of breaking here, you can just return true directly. I also think it's good to assert here that the |numOptimizedStubs| on the entry is exactly one, since that's how we coded the behaviour in Baseline. (If StringSplit stub is attached, it's the only optimized stub attached). ::: js/src/jit/MCallOptimize.cpp @@ +1208,5 @@ > IonBuilder::InliningStatus > +IonBuilder::inlineConstantStringSplit(CallInfo &callInfo) > +{ > + JSString *stringThis; > + JSObject *templateObject; Move these down to right before the call to isOptimizableCallStringSplit. @@ +1210,5 @@ > +{ > + JSString *stringThis; > + JSObject *templateObject; > + > + if (!callInfo.thisArg()->isConstant()) Add another conditional ensuring that the constant |this| is a string. This will let you use a cheap check to shortcut out of the rest of the logic when the |this| is not a string, instead of making that check later after a bunch of other expensive checks. @@ +1214,5 @@ > + if (!callInfo.thisArg()->isConstant()) > + return InliningStatus_NotInlined; > + > + // Check if exist a template object in stub. > + if (inspector->isOptimizableCallStringSplit(pc, &stringThis, &templateObject)) { This conditional can be reversed, and allow you to put the rest of the logic at a lower tab level, with: if (!inspector->isOptimizableCallStringSplit(pc, &stringThis, &templateObject)) return InliningStatus_NotInlined; ... rest of code ... Also, you do want to confirm here that the string in the stub matches up with the constant string. 
The two strings should be pointer-identical, since they're both derived from the bytecode literal string, so a pointer compare should be sufficient to check if they are equal. The reason you want to double-check this is because you don't know for a fact how that MConstant for the |this| was generated. Now, it's VERY VERY likely that they are the same, but I don't think I can guarantee that it will be this way 100% of the time. There may be some rare corner case which allows an attacker to trick the IonBuilder into generating a MConstant string for the |this|, that differs from the baseline-recorded string. Better be safe and check to be sure.
Attachment #8498822 - Flags: feedback?(kvijayan) → feedback+
Attached patch bug688219WIP.patch (obsolete) — Splinter Review
I fixed the code. Thanks.
Attachment #8498822 - Attachment is obsolete: true
Attachment #8499319 - Flags: feedback?(kvijayan)
Comment on attachment 8499319 [details] [diff] [review] bug688219WIP.patch Review of attachment 8499319 [details] [diff] [review]: ----------------------------------------------------------------- Looks good. Address the minor re-orderings suggested here. And then fill in the array-copy logic required for the inlining. ::: js/src/jit/BaselineInspector.cpp @@ +426,5 @@ > + if (entry.fallbackStub()->numOptimizedStubs() != 1) > + return false; > + > + ICStub *stub = entry.firstStub(); > + if (stub->kind() == ICStub::Call_StringSplit) { Invert the conditional here. The rest of the code has the style: if (failureCondition) return false; ... rest of code ... Good to stick to that style here too. So: if (stub->kind() != ICStub_CallStringSplit) return false; ... ::: js/src/jit/MCallOptimize.cpp @@ +1235,5 @@ > + return InliningStatus_NotInlined; > + } > + > + const js::Value *strval = callInfo.thisArg()->toConstant()->vp(); > + if (!strval->isString()) move this check up to immediately after the (callInfo.thisArg()->type() != MIRType_String) check. It's a cheap check whose failure can shortcut the rest of the expensive TI checks. @@ +1239,5 @@ > + if (!strval->isString()) > + return InliningStatus_NotInlined; > + > + JSString *str = strval->toString(); > + if (str != stringThis) Move this check to immediately after the |isOptimizableCallStringSplit| check. This is another cheap check that can fail early and save us the rest of the expensive type-inference checks.
Comment on attachment 8499319 [details] [diff] [review] bug688219WIP.patch Review of attachment 8499319 [details] [diff] [review]: ----------------------------------------------------------------- Forgot to + the feedback.
Attachment #8499319 - Flags: feedback?(kvijayan) → feedback+
Attached patch bug688219WIP.patch (obsolete) — Splinter Review
It's working! micro-bench: for (var i = 0; i < 500000; ++i) a = "done fail unicorns ponies esperanto".split(" "); no patch: ~237ms with patch: ~7ms Thanks.
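Beyond the timing, a quick correctness check is worth running alongside the micro-bench (an illustrative harness, not part of the patch): the inlined constant-receiver/constant-separator path must be indistinguishable from a split whose receiver is built at runtime and so takes the generic path, and every call must still return a distinct array.

```javascript
// Constant this + constant argument: the candidate for the inlined path.
var constParts = "done fail unicorns ponies esperanto".split(" ");

// A runtime-concatenated receiver defeats the constant check, so this
// goes through the generic split path; the results must agree.
var prefix = "done fail unicorns";
var dynParts = (prefix + " ponies esperanto").split(" ");

// A second constant call: must yield an equal but distinct array object.
var again = "done fail unicorns ponies esperanto".split(" ");
```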
Attachment #8499319 - Attachment is obsolete: true
Attachment #8500007 - Flags: feedback?(kvijayan)
Looks like we've largely settled on "good idea", so updating summary.
Summary: v8 has a String.prototype.split cache: benchmarketing or good idea? → Cache String.prototype.split
Comment on attachment 8500007 [details] [diff] [review] bug688219WIP.patch Review of attachment 8500007 [details] [diff] [review]: ----------------------------------------------------------------- I've made a few comments, but I have a more fundamental question. Why are you adding a new |templateObjectForJIT| to the Baseline IC stub? I don't understand why there needs to be a new template object for this. Could you walk me through the motivations that prompted the more extensive set of changes you made in this patch? (Not saying that they're wrong, but I don't know why the |templateObjectForJIT| is needed, for example) ::: js/src/jit/MCallOptimize.cpp @@ +1220,5 @@ > + > + // Check if exist a template object in stub. > + JSString *stringThis; > + JSObject *templateObject; > + JSObject *objCopy; Initialize these pointers to nullptr. @@ +1222,5 @@ > + JSString *stringThis; > + JSObject *templateObject; > + JSObject *objCopy; > + if (!inspector->isOptimizableCallStringSplit(pc, &stringThis, &templateObject, &objCopy)) > + return InliningStatus_NotInlined; After this, MOZ_ASSERT(stringThis), since it should be set. @@ +1225,5 @@ > + if (!inspector->isOptimizableCallStringSplit(pc, &stringThis, &templateObject, &objCopy)) > + return InliningStatus_NotInlined; > + > + if (!templateObject) > + return InliningStatus_NotInlined; If |isOptimizableCallStringSplit| returned true, then templateObject must have been filled in, right? Shouldn't this be a MOZ_ASSERT instead? @@ +1228,5 @@ > + if (!templateObject) > + return InliningStatus_NotInlined; > + > + if (!objCopy) > + return InliningStatus_NotInlined; Same as above, shouldn't this be a MOZ_ASSERT?
Attachment #8500007 - Flags: feedback?(kvijayan) → feedback+
(In reply to Kannan Vijayan [:djvj] from comment #93) > Comment on attachment 8500007 [details] [diff] [review] > bug688219WIP.patch > > Review of attachment 8500007 [details] [diff] [review]: > ----------------------------------------------------------------- > > I've made a few comments, but I have a more fundamental question. Why are > you adding a new |templateObjectForJIT| to the Baseline IC stub? I don't > understand why there needs to be a new template object for this. > Thanks for your reply. My first idea was to just use |templateObject| in |MNewArray|, but it was failing with: > Assertion failure: isTenured(), at /js/src/gc/Heap.h:1159 I'm not sure the last patch takes a good approach. So, the main motivation for |templateObjectForJIT| is to avoid the failure shown above.
Flags: needinfo?(kvijayan)
Ah, I think the issue here is that the template object is getting created in the nursery, and objects in the nursery can't be used as constants in Ion. I tried to find a better way of approaching this, but ultimately I think the way you chose was the best approach. Ok, I'll take another look at this.
Flags: needinfo?(kvijayan)
Actually, the issue is not in baseline at all. I don't know why MNewArray is a UnaryInstruction. This makes no sense, because it doesn't assign its constant operand to any register when lowering. There's no reason that argument needs to be an MConstant. MNewArray could just store a reference to the ArrayObject directly in one of its fields. In fact, MNewArrayCopyOnWrite is a nullary instruction. No reason MNewArray can't also be nullary. The assert you were running into was because you can't build MConstants out of pointers to non-tenured objects. But you shouldn't have to. Lemme look into this more.
(In reply to Victor Carlquist from comment #77) > Created attachment 8472618 [details] [diff] [review] > bug-688219-fix.patch > > Hi, > I fixed the bug! but I needed to create a new variable to store length value. > > Results of Octane: > before: 14374,4 (Average of 5) > after: 14490,8 (Average of 5) > > Thanks in advance! Which tests in octane improve with this patch?
(In reply to Brian Hackett (:bhackett) from comment #97) > (In reply to Victor Carlquist from comment #77) > > Created attachment 8472618 [details] [diff] [review] > > bug-688219-fix.patch > Which tests in octane improve with this patch? Sorry, I don't have the results anymore, I think, this patch improved Raytrace test, I'm not sure, but I'll test it again :)
(In reply to Victor Carlquist from comment #98) > (In reply to Brian Hackett (:bhackett) from comment #97) > > (In reply to Victor Carlquist from comment #77) > > > Created attachment 8472618 [details] [diff] [review] > > > bug-688219-fix.patch > > Which tests in octane improve with this patch? > > Sorry, I don't have the results anymore, I think, this patch improved > Raytrace test, I'm not sure, but I'll test it again :) Sorry, I did the test again with more accuracy (average of 15), and this patch does not have an impact on the Octane results... Here are the results: Richards -0.41% DeltaBlue -0.78% Crypto -0.20% RayTrace -0.67% EarleyBoyer +0.39% RegExp +1.34% Splay -1.31% SplayLatency -1.19% NavierStokes +0.01% PdfJS +2.22% Mandreel -1.51% MandreelLatency +2.52% Gameboy -2.54% CodeLoad -0.25% Box2D +0.47% zlib +0.18% Typescript +1.25% ---- Score (version 9) -0.04%
(In reply to Kannan Vijayan [:djvj] from comment #96) > Actually, the issue is not in baseline at all. I don't know why MNewArray > is a UnaryInstruction. This makes no sense, because it doesn't assign its > constant operand to any regsiter when lowering. There's no reason that > argument needs to be an MConstant. MNewArray could just store a reference > to the ArrayObject directly in one of its fields. > > In fact, MNewArrayCopyOnWrite is a nullary instruction. No reason MNewArray > can't also be nullary. > > The assert you were running into was because you can't build MConstants out > of pointers to non-tenured objects. But you shouldn't have to. > > Lemme look into this more. Flagging for needinfo so this doesn't get forgotten.
Status: NEW → ASSIGNED
Flags: needinfo?(kvijayan)
Assignee: nobody → victorcarlquist
Attached patch WIP (obsolete) — Splinter Review
Hi! Since we changed the |nursery| templateObject to a |Tenured| templateObject in bug 1086530, I think we should keep it Tenured, because the field |templateObject_| in MNewArrayCopyOnWrite is |AlwaysTenured<>|; so the actual templateObject for MNewArray should be Tenured anyway, regardless of whether it becomes an |MNullaryInstruction| (I'm not familiar enough with Ion to confirm this yet). What do you think about it? This patch is working, but, as you and Jan suggested, I want to implement CoW, and I have no idea how to create a CoW templateObject for MNewArrayCopyOnWrite... Could you help me with this? This patch has a regression (0m0.038s to 0m0.134s) on the test below: > var testString = "We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence,promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America."; > > function test(testString) { > return testString.split("e").length - 1; > } > > for(var i=0; i<1000000; i++) { > test(testString); > } Thanks in advance :)
Attachment #8500007 - Attachment is obsolete: true
Attachment #8519599 - Flags: feedback?(kvijayan)
Comment on attachment 8519599 [details] [diff] [review] WIP Review of attachment 8519599 [details] [diff] [review]: ----------------------------------------------------------------- Personally, I think it's better to get this optimization in as-is (with fixes) before worrying about the CoW implementation. I looked into the CoW implementation briefly. The main wrinkle there is that we have to create a proper CoW TypeObject for the result of that site's StringSplit, and currently, the CoW type-generation is tied to array literals in the source code. A good chunk of the logic is going to be shared between this implementation and that one anyway, so I think going ahead and landing something useful, and then improving it later is good. ::: js/src/jit/IonMacroAssembler.cpp @@ +1103,5 @@ > Address(obj, NativeObject::offsetOfElements())); > } else if (ntemplate->is<ArrayObject>()) { > Register temp = slots; > + // Kannan: It's really necessary? > + //MOZ_ASSERT(!ntemplate->getDenseInitializedLength()); Not sure if this is OK to disable. I think it should be fine, but I'd ask h4writer or jandem about it. ::: js/src/jit/MCallOptimize.cpp @@ +1290,5 @@ > + // const js::Value *argval = callInfo.getArg(0)->toConstant()->vp(); > + // if (!argval->isString()) > + // return InliningStatus_NotInlined; > + // if (argval->toString()->length() == 0) > + // return InliningStatus_NotInlined; Why are the argVal checks disabled here? @@ +1312,5 @@ > + MOZ_ASSERT(templateObject->is<ArrayObject>()); > + > + JSString *str = strval->toString(); > + if (str != stringThis) > + return InliningStatus_NotInlined; Also check if (argval->toString() != str) here and if so, fail inlining. @@ +1324,5 @@ > + if (!key.maybeTypes()) > + return InliningStatus_NotInlined; > + > + if (!key.maybeTypes()->hasType(types::Type::StringType())) { > + key.freeze(constraints()); I'm not sure if this freeze is necessary. If we don't see a StringType in the typeset, we choose not to optimize. 
If the typeset later changes, it doesn't invalidate that choice. I'd doublecheck with bhackett, but this should be removable. @@ +1333,5 @@ > + if (templateObject->getDenseInitializedLength() != initLength) > + return InliningStatus_NotInlined; > + > + for (uint32_t i = 0; i < initLength; i++) { > + MConstant *value = MConstant::New(alloc(), templateObject->getDenseElement(i), constraints()); You're generating constant |values| here and then throwing them away. Seems like a waste, since MDefinitions (of which MConstant is a subtype) are pretty large. I think you should accumulate these in a vector, and use them later (since you need them anyway). @@ +1335,5 @@ > + > + for (uint32_t i = 0; i < initLength; i++) { > + MConstant *value = MConstant::New(alloc(), templateObject->getDenseElement(i), constraints()); > + if (!TypeSetIncludes(key.maybeTypes(), value->type(), value->resultTypeSet())) { > + key.freeze(constraints()); See comment above on freeze.
Attachment #8519599 - Flags: feedback?(kvijayan) → feedback+
(In reply to Kannan Vijayan [:djvj] from comment #102) > Comment on attachment 8519599 [details] [diff] [review] > WIP > > Review of attachment 8519599 [details] [diff] [review]: > ----------------------------------------------------------------- > > Personally, I think it's better to get this optimization in as-is (with > fixes) before worrying about the CoW implementation. Ok, I'll fix what you spotted ;) > ::: js/src/jit/IonMacroAssembler.cpp > @@ +1103,5 @@ > > Address(obj, NativeObject::offsetOfElements())); > > } else if (ntemplate->is<ArrayObject>()) { > > Register temp = slots; > > + // Kannan: It's really necessary? > > + //MOZ_ASSERT(!ntemplate->getDenseInitializedLength()); > > Not sure if this is OK to disable. I think it should be fine, but I'd ask > h4writer or jandem about it. The patch is failing on this assert because the templateObject is not empty (i.e. ntemplate->getDenseInitializedLength() > 0). If disabling it is wrong, I'll have to find a way to restructure my patch so it doesn't hit this assert. Thanks for your reply!
Attached patch bug688219.patch (obsolete) — Splinter Review
I made the changes. Thanks again (:
Attachment #8519599 - Attachment is obsolete: true
Attachment #8521078 - Flags: feedback?(kvijayan)
Comment on attachment 8521078 [details] [diff] [review] bug688219.patch Review of attachment 8521078 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/MCallOptimize.cpp @@ +1280,5 @@ > + if (!callInfo.thisArg()->isConstant()) > + return InliningStatus_NotInlined; > + > + if (callInfo.thisArg()->type() != MIRType_String) > + return InliningStatus_NotInlined; Since you're checking the types of constant values directly later on, this conditional can be removed, and instead have an assert after the |isString()| checks below. @@ +1291,5 @@ > + return InliningStatus_NotInlined; > + > + const js::Value *strval = callInfo.thisArg()->toConstant()->vp(); > + if (!strval->isString()) > + return InliningStatus_NotInlined; As noted above. After this, you can do MOZ_ASSERT(callInfo.getArg(0)->type() == MIRType_String); MOZ_ASSERT(callInfo.thisArg()->type() == MIRType_String); @@ +1329,5 @@ > + uint32_t initLength = templateObject->as<ArrayObject>().length(); > + if (templateObject->getDenseInitializedLength() != initLength) > + return InliningStatus_NotInlined; > + > + Vector<MConstant *, 0, SystemAllocPolicy> vec; Name this vector something more meaningful, like |arrayValues|. @@ +1333,5 @@ > + Vector<MConstant *, 0, SystemAllocPolicy> vec; > + for (uint32_t i = 0; i < initLength; i++) { > + MConstant *value = MConstant::New(alloc(), templateObject->getDenseElement(i), constraints()); > + if (!TypeSetIncludes(key.maybeTypes(), value->type(), value->resultTypeSet())) { > + vec.clearAndFree(); This is not necessary. The vector will release its heap allocations during destruction on the return below. @@ +1336,5 @@ > + if (!TypeSetIncludes(key.maybeTypes(), value->type(), value->resultTypeSet())) { > + vec.clearAndFree(); > + return InliningStatus_NotInlined; > + } > + vec.append(value); Vector append() can fail. Return InliningStatus_Error if it does. 
@@ +1345,5 @@ > + getInlineReturnTypeSet()->convertDoubleElements(constraints()); > + if (conversion == types::TemporaryTypeSet::AlwaysConvertToDoubles) > + templateObject->setShouldConvertDoubleElements(); > + else > + templateObject->clearShouldConvertDoubleElements(); I don't even know if this whole code block concerning conversion-to-double is necessary. We know that we're going to be returning an array of strings. The existence or not of a convert-to-double flag on the object is irrelevant. I think this whole section can be replaced with a simpler: if (conversion == types::TemporaryTypeSet::AlwaysConvertToDoubles) return InliningStatus_NotInlined; @@ +1366,5 @@ > + // jsop_initelem_array is doing because we do not expect to bailout > + // because the memory is supposed to be allocated by now. > + MConstant *id = nullptr; > + for (uint32_t i = 0; i < initLength; i++) { > + id = MConstant::New(alloc(), Int32Value(i)); The |id| var can be scoped inside the loop. @@ +1374,5 @@ > + if (conversion == types::TemporaryTypeSet::AlwaysConvertToDoubles) { > + MInstruction *valueDouble = MToDouble::New(alloc(), value); > + current->add(valueDouble); > + value = valueDouble->toConstant(); > + } else With the changes above, this conditional can be removed entirely, and replaced with |current->add(value);|. @@ +1390,5 @@ > + } > + > + // Update the length. > + MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id); > + current->add(length); This setLength will get as input the last |id| stored in the loop, which will be |initLength - 1|, which is wrong. You need a new MConstant here: MConstant *length = MConstant::New(alloc(), Int32Value(initLength)); current->add(length); MSetInitializedLength *setLength = MSetInitializedLength::New(...); current->add(setLength); @@ +1392,5 @@ > + // Update the length. 
> + MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id); > + current->add(length); > + > + vec.clearAndFree(); This is not necessary. The destructor will do it automatically on all exit paths.
Attachment #8521078 - Flags: feedback?(kvijayan) → feedback+
(In reply to Kannan Vijayan [:djvj] from comment #105) > @@ +1390,5 @@ > > + } > > + > > + // Update the length. > > + MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id); > > + current->add(length); > > This setLength will get as input the last |id| stored in the loop, which > will be |initLength - 1|, which is wrong. You need a new MConstant here: > > MConstant *length = MConstant::New(alloc(), Int32Value(initLength)); > current->add(length); > > MSetInitializedLength *setLength = MSetInitializedLength::New(...); > current->add(setLength); > Hi Kannan, thanks for the review! We have a little problem. When I set this with Int32Value(initLength), I got the following result: > done,fail,unicorns,ponies,esperanto9.704187067161276e-101, but the right result would be: > done,fail,unicorns,ponies,esperanto It occurs because |SetInitializedLength| adds 1 to the length, so it reads one element past the end of the array... (CodeGenerator.cpp) >bool >CodeGenerator::visitSetInitializedLength(LSetInitializedLength *lir) >{ > Address initLength(ToRegister(lir->elements()), ObjectElements::offsetOfInitializedLength()); > Int32Key index = ToInt32Key(lir->index()); > masm.bumpKey(&index, 1); > masm.storeKey(index, initLength); > // Restore register value if it is used/captured after. > masm.bumpKey(&index, -1); >} So, what should I do?
(In reply to Victor Carlquist from comment #106) > (In reply to Kannan Vijayan [:djvj] from comment #105) > > Hi Kannan, thanks for review! > > We have a little problem. > When I set this with Int32Value(initLength), I got the following result: > > > done,fail,unicorns,ponies,esperanto9.704187067161276e-101, > > but the right result would be: > > > done,fail,unicorns,ponies,esperanto > > It occur because the |SetInitializedLength| is adding '1' over length, so > it's overflowing... > > (CodeGenerator.cpp) > >bool > >CodeGenerator::visitSetInitializedLength(LSetInitializedLength *lir) > >{ > > Address initLength(ToRegister(lir->elements()), ObjectElements::offsetOfInitializedLength()); > > Int32Key index = ToInt32Key(lir->index()); > > masm.bumpKey(&index, 1); > > masm.storeKey(index, initLength); > > // Restore register value if it is used/captured after. > > masm.bumpKey(&index, -1); > >} > > So, What should I do? Well that's super weird that it would do an incr/decr around the setInitializedLength. It seems like SetInitializedLength is always called with the index of the last element set, and it internally increments it if necessary. The way you were doing it before is correct, then. Good find.
Flags: needinfo?(kvijayan)
Attached patch bug688219.patch (obsolete) — Splinter Review
I think that this patch is OK.
Attachment #8521078 - Attachment is obsolete: true
Attachment #8523061 - Flags: review?(kvijayan)
Comment on attachment 8523061 [details] [diff] [review] bug688219.patch Review of attachment 8523061 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/MCallOptimize.cpp @@ +1380,5 @@ > + if (ins->initialHeap() == gc::TenuredHeap) > + current->add(MPostWriteBarrier::New(alloc(), ins, value)); > + } > + // Update the length. > + MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id); I'm wondering if we actually need this. MNewArray with the template object _should_ correctly set the intializedLength and the length. In this case, this setLength is redundant. Can you test without this line and see if it still works?
(In reply to Kannan Vijayan [:djvj] from comment #109)
> Comment on attachment 8523061 [details] [diff] [review]
> bug688219.patch
>
> Review of attachment 8523061 [details] [diff] [review]:
> -----------------------------------------------------------------

You are right! It works without that.
Attached patch bug688219.patch (obsolete) — Splinter Review
Patch without |MSetInitializedLength| instruction.
Attachment #8523061 - Attachment is obsolete: true
Attachment #8523061 - Flags: review?(kvijayan)
Attachment #8523334 - Flags: review?(kvijayan)
Attached patch patch rebased (obsolete) — Splinter Review
Attachment #8523334 - Attachment is obsolete: true
Attachment #8523334 - Flags: review?(kvijayan)
Attachment #8532964 - Flags: review?(kvijayan)
Comment on attachment 8532964 [details] [diff] [review]
patch rebased

Review of attachment 8532964 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MCallOptimize.cpp
@@ +1347,5 @@
> +    if (!TypeSetIncludes(key.maybeTypes(), value->type(), value->resultTypeSet()))
> +        return InliningStatus_NotInlined;
> +
> +    if (!arrayValues.append(value))
> +        return InliningStatus_NotInlined;

This should return InliningStatus_Error instead, as it's an OOM.
Attachment #8532964 - Flags: review?(kvijayan) → review+
Attached patch bug688219.patch (obsolete) — Splinter Review
I fixed that. (: Thanks!
Attachment #8532964 - Attachment is obsolete: true
Attachment #8533738 - Flags: review?(kvijayan)
Sorry, I've been super busy and just got to this. Reviewing now.
No problem Kannan ;)
Comment on attachment 8533738 [details] [diff] [review]
bug688219.patch

Review of attachment 8533738 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry again. This is great. Everything looks pretty tight. Have you benched the final result with the latest patch?
Attachment #8533738 - Flags: review?(kvijayan) → review+
I'll bench and push it in try server. Thanks again!
Hey Victor, I rebased your patch (there were some changes to the names of various TI-related structs: Types became ObjectGroups, etc.) and pushed it to try here:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0edec69d7759

You can crib the rebased patch from there if you don't feel like doing it again (it's a bit of a pain in the butt to discover the new names). I'm somewhat excited about benching this, so I'm going to run the benches locally too.
Wow, just benched this on linux x86-64 with a microbenchmark. Results are sweet:

function foo() {
    var count = 0;
    for (var i = 0; i < 10000000; i++) {
        var arr = "zing,bing,bang".split(",");
        var arr2 = "foo|bar|boom".split("|");
        for (var j = 0; j < arr.length; j++)
            count += j;
        for (var j = 0; j < arr2.length; j++)
            count += j;
    }
    return count;
}

function main() {
    var d1 = new Date();
    var result = foo();
    var d2 = new Date();
    print("Time: " + (d2 - d1) + " result=" + result);
}

main();

Reference build (without your patch):
Time: 3818 result=60000000
Time: 3786 result=60000000
Time: 4047 result=60000000
Time: 3796 result=60000000
Time: 3843 result=60000000

Optimized build (with your patch):
Time: 117 result=60000000
Time: 79 result=60000000
Time: 77 result=60000000
Time: 81 result=60000000
Time: 77 result=60000000

Averaging those times out: 3858ms on reference, 86.2ms on optimized. A speedup of more than 40x on the microbench. Nice :)
Attached patch Patch (obsolete) — Splinter Review
I fixed that issue :) I added the line:

> MSetInitializedLength *length = MSetInitializedLength::New(alloc(), elements, id);

The RArrayState wasn't getting the right length value in Recover. Thanks.
Attachment #8533738 - Attachment is obsolete: true
Attachment #8563578 - Flags: review?(kvijayan)
(In reply to Kannan Vijayan [:djvj] from comment #120) Awesome! It's very fast. You helped me a lot, thank you!
Comment on attachment 8563578 [details] [diff] [review]
Patch

Review of attachment 8563578 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. If you can get this green on try, you have my blessing to land. The last few changes have all been trivial, so feel free to post a green 'try' link and then land this. Good job on getting this done, and sorry again about the review delays on my end.

::: js/src/jit/MCallOptimize.cpp
@@ +1398,5 @@
> +    MOZ_ASSERT(templateObject);
> +    MOZ_ASSERT(templateObject->is<ArrayObject>());
> +
> +    JSString *str = strval->toString();
> +    if (str != stringThis)

The |str| var doesn't seem necessary here; if (strval->toString() != stringThis) should be sufficient.

@@ +1402,5 @@
> +    if (str != stringThis)
> +        return InliningStatus_NotInlined;
> +
> +    str = argval->toString();
> +    if (str != stringArg)

Likewise here: if (argval->toString() != stringArg) should be sufficient.
Attachment #8563578 - Flags: review?(kvijayan) → review+
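The guards reviewed above only let the inlined path fire when both the receiver and the separator are the exact constant strings the compiler observed. A hypothetical plain-JS sketch of that shape (function and parameter names are illustrative, not the actual jit code):

```javascript
// Sketch of the inlined fast path: valid only when both operands match
// the constants seen at compile time; otherwise fall back to generic split.
function inlinedSplit(str, sep, expectedStr, expectedSep, precomputed) {
  if (str !== expectedStr || sep !== expectedSep)
    return str.split(sep);     // bail to the generic path
  return precomputed.slice();  // fresh array each call, like real split
}

var cached = ["zing", "bing", "bang"];
var fast = inlinedSplit("zing,bing,bang", ",", "zing,bing,bang", ",", cached);
// fast is a new array ["zing", "bing", "bang"], not the cached one
```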
Attached patch PatchSplinter Review
I made the changes that Kannan suggested.
Attachment #8563578 - Attachment is obsolete: true
Attachment #8564090 - Flags: review+
Comment on attachment 8564090 [details] [diff] [review]
Patch

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ac760a841d7e

Try is green, so I'm setting the checkin flag. Thanks!
Attachment #8564090 - Flags: checkin?
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Depends on: 1134074