Closed Bug 1371097 Opened 4 years ago Closed 4 years ago

String.replace() could be faster


(Core :: JavaScript Engine, defect)

Not set



Tracking Status
firefox55 --- fixed


(Reporter: ehsan, Assigned: jandem)


(Blocks 1 open bug)



(2 files)

Attached file microbenchmark
I noticed that str_replace_string_raw() shows up in my profiles on the VanillaJS speedometer test.  The attached microbenchmark which I stole from the speedometer code runs in ~750ms in chrome and ~1350ms in Firefox.  I'm not all that much sure how important this is to the overall score, but I've certainly written real JS code like this in real life...
Note that profiling this under perf I noticed that str_replace_string_raw() is spending all of its time under the rope paths, and I think it may be due to the fact that this.defaultTemplate here turns into a rope to start with?  If yes, that's really unfortunate, but it could be a nice trick to win some benchmark points here if turning that into a flat string is simple!
See also bug 1347489, where I optimized exactly this code. That was a pretty big win and my micro-benchmark was on par with Chrome after that, but yours is different.

(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #2)
> and I think it may be due
> to the fact that this.defaultTemplate here turns into a rope to start with?
> b5dd69a53cc2803ed66675327560148b6232b31f/PerformanceTests/Speedometer/
> resources/todomvc/vanilla-examples/vanillajs/js/template.js#L33

The frontend is smart enough to constant fold this. The problem here is likely that each replace call will create ropes.
The main problem here is that RopeMatch deoptimizes because of this:

     * We don't want to do rope matching if there is a poor node-to-char ratio,
     * since this means spending a lot of time in the match loop below.

So we exceed our threshold of 5 or so and we flatten instead of using rope matching. If I change sRopeMatchThresholdRatioLog2 from 5 to 4, we no longer flatten and the time improves from 1377 ms to 960 ms.
Most of the remaining time is allocating strings. Nursery-allocating strings (bug 903519) should help here by eliminating free list stuff and chunk allocation.

I also noticed that FirstCharMatcher8bit and FirstCharMatcher16bit don't use memchr on OS X/Windows/Clang because "it's slow", but when I change FirstCharMatcher8bit to use memchr on all platforms your benchmark improves from 960 to 921 ms on OS X with Clang...
Attached patch PatchSplinter Review
Luke, as the original reviewer of the rope matching code, does this seem reasonable to you?
Assignee: nobody → jdemooij
Attachment #8875627 - Flags: review?(luke)
Depends on: 1371215
Comment on attachment 8875627 [details] [diff] [review]

Review of attachment 8875627 [details] [diff] [review]:

Attachment #8875627 - Flags: review?(luke) → review+
Pushed by
Change sRopeMatchThresholdRatioLog2 from 5 to 4 to flatten less eagerly. r=luke
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.