Closed Bug 555395 Opened 15 years ago Closed 15 years ago

JS: measure potential benefit of ropes in avoiding String copying

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 571549

People

(Reporter: luke, Unassigned)

References

Details

From Moh and Andreas, it appears we are spending a lot of time memcpy'ing string data in general and on benchmarks. There are several ways to improve this; one is to do copying lazily using a linked rope structure like JSC is using. For example, when a JS string is used as a StringBuffer a la s = ... s += ... s += ... print(s) we can delay building a single contiguous s until s is observed (by the print) and avoid memcpy's for all the temporaries. This is already done at compile-time for single expressions with the concatn opcode (bug 419743), but of course, all good Java programmers know that its more efficient to use += as above, right? :P The first task, though, would be to measure whether these ropes would really buy us anything on real webapps and benchmarks we care about. (Andreas expressed confidence in Gregor for this task :) I think measurement could be done as follows: 0. turn off the optimization that grows strings in place in js_ConcatStrings; have each js_ConcatStrings call yields a new string 1. give each string a new flag member which is, by default, unset and indicates whether the string could be a rope 2. when js_ConcatStrings creates a new string, set the flag for the string and do delayed++ and delayedbytes += length_of_new_string. 3. when the contents of a string are read and its flag is set: unset the flag and do delayed-- and delayedbytes -= length_of_read_string. If I haven't forgotten anything, after running a program, 'delayed' and 'delayedBytes' should give an estimate of the savings. (A separate 'total' and 'totalBytes' that are incremented regardless would also be useful to give perspective.) Other than tracing (which should be turned off), JSString members are private, so it should be as easy as hooking JSString::chars(), flagChars(), and other members that would require flattening the rope (being careful to not count the reads doing by js_ConcatStrings, which of course wouldn't flatten.) Additionally, there might be other string-creating operations that we could make ropey. JSOP_CONCATN for one.
jscore picked up rope support semi-recently, although I don't think they've started to take advantage of it much yet, and I'm not sure how much it really gained them (maybe a percent going from changelog memory?).
FWIW, string-validate-input may benefit greatly from this optimization because it builds a huge string but never "observes" it. (Gotta love that SunSpider.) See bug 565841 for details.
From my measurements that I put in comment 6 of bug 565841, realloc is succeeding (meaning it doesn't have to copy the buffer) most of the time when growing a gigantic string, so Luke's description of how to test the benefit of ropes isn't really accurate. In the case where realloc succeeds, the only thing we gain is that we don't need to make a call to realloc (which, as described in bug 565841, has a good chance of being significant). Using ropes is essentially equivalent to the case where, at every concat, we are able to grow the left string, and we never have to copy memory during realloc. So, ideally, the time saved by ropes is (1) the time spent mallocing a buffer and copying the left string in the case where the left string can't be grown; plus (2) the time spent during realloc in the case where the left string can be grown. Note that this analysis assumes that every big string that every rope is eventually flattened, so string-validate-input could potentially be a bigger win than the data predicts. Here are some running times on my machine (running Mac OS X) for the important benchmarks/sets of benchmarks: Total Total Total Total Total Ideal program concat concat successful failed percent time time realloc realloc grow of time time time time saved ------------------------------------------------------------------------------- sunspider 914.52 42.8 13.76 9.09 8.24 2.41% sunspider string tests 299.28 36.46 13.11 8.58 5.81 6.32% v8 tests 8908.7 10.98 0.26 0.22 5.99 0.07% regexp-dna 50.01 0.73 0.02 0.02 0.28 0.61% string-base64 19.14 8.4 4.16 3.78 0 21.76% string-fasta 40.85 2.46 0 0 0.82 2.01% string-tagcloud 106.7 3.11 1.18 0.88 0.51 1.58% string-unpack-code 101.82 2.19 0 0 1.17 1.15% string-validate-input 43.34 21.38 8.29 3.97 2.61 25.15% Another observation is that several of the v8 tests have absolutely no string concatenation, which explains why the percentage is so low compared to sunspider. All units are in milliseconds, except for the last column. All of the time numbers were generated by putting rdtsc around the relevant parts of code. The concat time was measured separately from the rest. To convert from rdtsc units to milliseconds, I timed a run of v8 and converted from there. The digits of precision for these numbers is misleading; they were somewhat unstable (maybe accurate to +/- 10%), and, of course, the relative values matter more than the absolute times. Total program time is the amount of time between the constructor and destructor of a global variable I added. Total concat time is the sum of the time through all runs of js_ConcatStrings. Total concat realloc time is the total time spent in the line: s = (jschar *) cx->realloc(ls, (ln + rn + 1) * sizeof(jschar)); Total successful realloc time is the total time spent in reallocs where the pointer returned back was the same one that was sent in. Total failed grow time is the time spent in malloc and strncpy in the case where left doesn't have a buffer to grow. The "ideal percent of time saved" is (column 3 + column 5) / column 1.
Did you measure in a shell or in the browser?
(In reply to comment #4) > Did you measure in a shell or in the browser? All of this was in the shell.
With jemalloc or a regular system malloc? Which OS?
Am I reading this right that ropes are predicted to save 22ms on SunSpider? That's pretty good--it will be more like 6% once we are fast.
(In reply to comment #6) > With jemalloc or a regular system malloc? Which OS? The numbers above are with system malloc on Mac OS X 10.6. Here are the same numbers for a few quick tests in compiled Firefox: string-base64 15 16.77 5.12 4.62 2.6 51.41% string-tagcloud 98 11.21 2.16 1.93 2.46 4.71% string-validate-input 32 21.87 7.22 2.56 4.13 35.45% The first column was taken using the JS Date instead of rdtsc, which is why it's possible for the numbers to say that more time is spent in concat than in the entire benchmark for the string-base64 test. Most likely, the conversion from rdtsc units to ms was a bit off (for that reason, the right column of those three rows probably can't be trusted), but it doesn't change the relative values for anything in the table in comment 3. (In reply to comment #7) > Am I reading this right that ropes are predicted to save 22ms on SunSpider? > That's pretty good--it will be more like 6% once we are fast. Yeah, that's what the numbers say. Again, these are rough numbers, and they definitely shouldn't be used as a comparison point for changes as small as 6%, but hopefully they give a rough idea of the amount that would be saved with ropes. The analysis also doesn't take into account things like difference in cache performance.
malloc in the browser is significantly more expensive because we have more than one thread, so libc starts locking. Fire up a 2nd thread in the shell (let it do a malloc and then sit idle) and measure again and you will see a big difference in realloc cost (taking the malloc lock). Also, the browser users jemalloc which has very different (and likely better) realloc behavior. So you have to be careful with that, too.
> Also, the browser users jemalloc Not on Mac, at least not yet.
Bug 571549 is the bug for implementing ropes.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.