String.replace() could be faster

RESOLVED FIXED in Firefox 55



JavaScript Engine
3 months ago
2 months ago


(Reporter: Ehsan, Assigned: jandem)


(Blocks: 1 bug)

Dependency tree / graph

Firefox Tracking Flags

(firefox55 fixed)



(2 attachments)

Created attachment 8875520 [details]

I noticed that str_replace_string_raw() shows up in my profiles on the VanillaJS speedometer test.  The attached microbenchmark which I stole from the speedometer code runs in ~750ms in chrome and ~1350ms in Firefox.  I'm not all that much sure how important this is to the overall score, but I've certainly written real JS code like this in real life...
(The real code in question:
Note that profiling this under perf I noticed that str_replace_string_raw() is spending all of its time under the rope paths, and I think it may be due to the fact that this.defaultTemplate here turns into a rope to start with?  If yes, that's really unfortunate, but it could be a nice trick to win some benchmark points here if turning that into a flat string is simple!

Comment 3

3 months ago
See also bug 1347489, where I optimized exactly this code. That was a pretty big win and my micro-benchmark was on par with Chrome after that, but yours is different.

(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #2)
> and I think it may be due
> to the fact that this.defaultTemplate here turns into a rope to start with?
> b5dd69a53cc2803ed66675327560148b6232b31f/PerformanceTests/Speedometer/
> resources/todomvc/vanilla-examples/vanillajs/js/template.js#L33

The frontend is smart enough to constant fold this. The problem here is likely that each replace call will create ropes.

Comment 4

3 months ago
The main problem here is that RopeMatch deoptimizes because of this:

     * We don't want to do rope matching if there is a poor node-to-char ratio,
     * since this means spending a lot of time in the match loop below.

So we exceed our threshold of 5 or so and we flatten instead of using rope matching. If I change sRopeMatchThresholdRatioLog2 from 5 to 4, we no longer flatten and the time improves from 1377 ms to 960 ms.

Comment 5

3 months ago
Most of the remaining time is allocating strings. Nursery-allocating strings (bug 903519) should help here by eliminating free list stuff and chunk allocation.

I also noticed that FirstCharMatcher8bit and FirstCharMatcher16bit don't use memchr on OS X/Windows/Clang because "it's slow", but when I change FirstCharMatcher8bit to use memchr on all platforms your benchmark improves from 960 to 921 ms on OS X with Clang...

Comment 6

3 months ago
Created attachment 8875627 [details] [diff] [review]

Luke, as the original reviewer of the rope matching code, does this seem reasonable to you?
Assignee: nobody → jdemooij
Attachment #8875627 - Flags: review?(luke)


3 months ago
Depends on: 1371215

Comment 7

3 months ago
Comment on attachment 8875627 [details] [diff] [review]

Review of attachment 8875627 [details] [diff] [review]:

Attachment #8875627 - Flags: review?(luke) → review+

Comment 8

3 months ago
Pushed by
Change sRopeMatchThresholdRatioLog2 from 5 to 4 to flatten less eagerly. r=luke

Comment 9

2 months ago
Last Resolved: 2 months ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.