Created attachment 8875520 [details] microbenchmark I noticed that str_replace_string_raw() shows up in my profiles on the VanillaJS speedometer test. The attached microbenchmark which I stole from the speedometer code runs in ~750ms in chrome and ~1350ms in Firefox. I'm not all that much sure how important this is to the overall score, but I've certainly written real JS code like this in real life...
Note that profiling this under perf I noticed that str_replace_string_raw() is spending all of its time under the rope paths, and I think it may be due to the fact that this.defaultTemplate here turns into a rope to start with? https://github.com/WebKit/webkit/blob/b5dd69a53cc2803ed66675327560148b6232b31f/PerformanceTests/Speedometer/resources/todomvc/vanilla-examples/vanillajs/js/template.js#L33 If yes, that's really unfortunate, but it could be a nice trick to win some benchmark points here if turning that into a flat string is simple!
See also bug 1347489, where I optimized exactly this code. That was a pretty big win and my micro-benchmark was on par with Chrome after that, but yours is different. (In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #2) > and I think it may be due > to the fact that this.defaultTemplate here turns into a rope to start with? > https://github.com/WebKit/webkit/blob/ > b5dd69a53cc2803ed66675327560148b6232b31f/PerformanceTests/Speedometer/ > resources/todomvc/vanilla-examples/vanillajs/js/template.js#L33 The frontend is smart enough to constant fold this. The problem here is likely that each replace call will create ropes.
The main problem here is that RopeMatch deoptimizes because of this: * We don't want to do rope matching if there is a poor node-to-char ratio, * since this means spending a lot of time in the match loop below. So we exceed our threshold of 5 or so and we flatten instead of using rope matching. If I change sRopeMatchThresholdRatioLog2 from 5 to 4, we no longer flatten and the time improves from 1377 ms to 960 ms.
Most of the remaining time is allocating strings. Nursery-allocating strings (bug 903519) should help here by eliminating free list stuff and chunk allocation. I also noticed that FirstCharMatcher8bit and FirstCharMatcher16bit don't use memchr on OS X/Windows/Clang because "it's slow", but when I change FirstCharMatcher8bit to use memchr on all platforms your benchmark improves from 960 to 921 ms on OS X with Clang...
Created attachment 8875627 [details] [diff] [review] Patch Luke, as the original reviewer of the rope matching code, does this seem reasonable to you?
Comment on attachment 8875627 [details] [diff] [review] Patch Review of attachment 8875627 [details] [diff] [review]: ----------------------------------------------------------------- Yep!
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/65c3ca9e63c0 Change sRopeMatchThresholdRatioLog2 from 5 to 4 to flatten less eagerly. r=luke