Chromium 3x faster than Gecko, concatenating two local strings

VERIFIED WORKSFORME

Status

Core
JavaScript Engine
VERIFIED WORKSFORME
8 years ago
3 years ago

People

(Reporter: gandalf, Unassigned)

Tracking

({perf, testcase})

Trunk

Firefox Tracking Flags

(Not tracked)

(Reporter)

Description

8 years ago
+++ This bug was initially created as a clone of Bug #157334 +++

Currently, on Windows XP and Mac OS, the testcase from bug 157334 (https://bugzilla.mozilla.org/attachment.cgi?id=96729) shows us running 3-4 times slower than Chromium.
(Reporter)

Updated

8 years ago
No longer blocks: 164421
No longer depends on: 157334, 608776

Comment 1

So I just benchmarked this in the shell (4M iterations) and got these times:
js -m: 175ms
jsc:   269ms
v8:    51ms

Since the benchmark doesn't use the result of the concatenation, this benchmark is purely timing rope creation / string allocation.

To confirm this, I changed the benchmark to store the result of each concatenation in an array (so that every concatenated string stays live).  A generational GC then can't collect anything and loses its big locality win.  Timings change to:
js -m: 310ms
jsc:   664ms
v8:    687ms
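
For readers who want to reproduce the two variants described above, here is a minimal sketch of such a harness. It is not the original testcase: the function names, the two input strings, and the iteration count are assumptions chosen for illustration, and absolute timings will differ from the ones quoted in this comment.

```javascript
// Hypothetical harness; names, strings, and iteration count are illustrative.

// Variant 1: the concatenation result is never read, so every string
// (or rope node) produced becomes garbage immediately.
function timeDeadConcat(iterations) {
  var str1 = "Hello, ";
  var str2 = "world";
  var str;
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    str = str1 + str2;
  }
  return Date.now() - start;
}

// Variant 2: every result is pushed into an array, so all of them stay live
// and no collector can reclaim them while the loop runs.
function timeLiveConcat(iterations) {
  var str1 = "Hello, ";
  var str2 = "world";
  var results = [];
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    results.push(str1 + str2);
  }
  return { ms: Date.now() - start, kept: results.length };
}

var N = 1000000;
console.log("dead results: " + timeDeadConcat(N) + "ms");
var live = timeLiveConcat(N);
console.log("live results: " + live.ms + "ms, " + live.kept + " strings kept");
```

Note that an optimizing JIT may remove the first loop entirely, in which case the first number no longer measures allocation at all.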

Tentatively depending on bug 619558.
Depends on: 619558

Comment 2

8 years ago
Is it the GC that makes the first test slower, or is it just the rope creation/string allocation itself?

Comment 3

It's the cache misses.  With its generational GC and a benchmark that only creates garbage, v8 keeps collecting and reallocating in the same region of memory, thereby avoiding lots of L2 misses.  We, on the other hand, march through new memory (thus taking L2 misses all the way) since our GCs are much less frequent.

To confirm this, I ran the original testcase (1M iterations) under cachegrind.  Our instruction count is actually *less* than v8's (so we are doing a good job generating code), but our simulated L2 miss rate is 8x v8's.
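
The locality argument can be illustrated with a toy model. The sketch below is not how either engine allocates; it only counts how many distinct cache lines a bump allocator touches when a small region is reused after each collection versus when allocation keeps marching through fresh memory. The 32-byte object size, 64 KB region, and 64-byte cache line are all assumed values.

```javascript
// Toy model only: sizes and the function name are assumptions for illustration.
function distinctLinesTouched(allocations, objectSize, regionSize) {
  var LINE = 64;           // assumed cache line size in bytes
  var touched = new Set(); // indices of cache lines written at least once
  var ptr = 0;             // bump pointer
  for (var i = 0; i < allocations; i++) {
    // Region full: pretend a minor GC found everything dead and reset.
    if (ptr + objectSize > regionSize) ptr = 0;
    var first = Math.floor(ptr / LINE);
    var last = Math.floor((ptr + objectSize - 1) / LINE);
    for (var line = first; line <= last; line++) touched.add(line);
    ptr += objectSize;
  }
  return touched.size;
}

// Small region that is collected and reused over and over.
var reused = distinctLinesTouched(100000, 32, 64 * 1024);
// No reuse: allocation marches through new memory the whole time.
var marching = distinctLinesTouched(100000, 32, Infinity);
console.log(reused, marching); // prints 1024 50000
```

In this model the reused region touches about 49 times fewer lines, which is the kind of working-set difference that shows up as a lower miss rate under a cache simulator.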

Comment 4

(In reply to comment #3)
> 
> Our instruction count is actually *less* than v8's (so we are doing a good job
> generating code), but our simulated L2 miss rate is 8x v8's.

Wow!  Nice analysis.  Good to see Cachegrind used to analyze cache misses, not just instruction counts :)

Comment 5

Why doesn't the JavaScript engine understand that the loop is pointless?

However with UBOUND 100000000:
Firefox 7.0a2 - time = 7305
Chromium 15 - time = 156
(Windows 7 64bit)

Maybe there are some other optimizations that Chromium does?

Comment 6

(In reply to Marco Castelluccio from comment #5)
> Why doesn't the JavaScript engine understand that the loop is pointless?

Are you asking about dead code elimination?

> However with UBOUND 100000000:
> Firefox 7.0a2 - time = 7305
> Chromium 15 - time = 156

What times do you get if you .push the result of each string concatenation in an array?

Comment 7

(In reply to Luke Wagner [:luke] from comment #6)
> Are you asking about dead code elimination?

Yes. For example, if you have a loop like that (but with integers), LLVM recognizes that it's pointless and doesn't compile it.

> What times do you get if you .push the result of each string concatenation
> in an array?

In this case, Firefox (1076) is faster than Chromium (1850).

var array = [];
for (var i = 0; i <= UBound; i++) {
  str = str1 + str2;
  array.push(str);
}

Comment 8

(In reply to Marco Castelluccio from comment #7)
> (In reply to Luke Wagner [:luke] from comment #6)
> > Are you asking about dead code elimination?
> 
> Yes.

Alright, then that would be a separate bug (one that is actually being considered for IonMonkey, btw), since this bug is about string concatenation, which is clearly not happening if the loop is being thrown away.
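
To make the distinction concrete, here is a hand-written illustration of what dead code elimination would amount to for a loop of this shape. Both function names are hypothetical, and nothing here claims that a particular engine performs this transformation; it only shows why a benchmark of such a loop stops measuring concatenation once the loop is removed.

```javascript
// Hypothetical example: function names are illustrative, not from the testcase.

// The loop assigns to `str`, but nothing reads `str` afterwards.
function concatInLoop(n, str1, str2) {
  var str;
  for (var i = 0; i <= n; i++) {
    // Dead store: with primitive strings this has no observable effect.
    str = str1 + str2;
  }
  return n;
}

// What an optimizer could reduce the function to, provided it knows both
// arguments are primitive strings (so `+` cannot run user code).
function concatInLoopAfterDce(n, str1, str2) {
  return n;
}

console.log(concatInLoop(1000, "a", "b") === concatInLoopAfterDce(1000, "a", "b"));
```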

> > What times do you get if you .push the result of each string concatenation
> > in an array?
> 
> In this case, Firefox (1076) is faster than Chromium (1850).

Right, then that would confirm comment 1.
(Reporter)

Comment 9

7 years ago
(In reply to Marco Castelluccio from comment #7)
> (In reply to Luke Wagner [:luke] from comment #6)
> > Are you asking about dead code elimination?
> 
> Yes. For example, if you have a loop like that (but with integers), LLVM
> recognizes that it's pointless and doesn't compile it.

DCE is part of IonMonkey, read more here - http://blog.mozilla.com/dmandelin/2011/04/22/mozilla-javascript-2011/

We should revisit this testcase with IonMonkey :)

Comment 10

6 years ago
(In reply to Zbigniew Braniecki [:gandalf] from comment #9)
> (In reply to Marco Castelluccio from comment #7)
> > (In reply to Luke Wagner [:luke] from comment #6)
> > > Are you asking about dead code elimination?
> > 
> > Yes. For example, if you have a loop like that (but with integers), LLVM
> > recognizes that it's pointless and doesn't compile it.
> 
> DCE is part of IonMonkey, read more here -
> http://blog.mozilla.com/dmandelin/2011/04/22/mozilla-javascript-2011/
> 
> We should revisit this testcase with IonMonkey :)

How far away is this?

Comment 11

6 years ago
Not sure what this bug is about, but based on my tests Chrome is 17 times faster at the moment.

Comment 12

(In reply to Worcester12345 from comment #10)
> How far away is this?

IonMonkey is getting close.  You can try out an experimental build here:
  http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-ionmonkey/

(In reply to avada from comment #11)
> Not sure what this bug is about, but based on my tests Chrome is 17 times
> faster at the moment.

Could you post the code you are measuring?  See also comment 1; JS engines don't do the actual concatenation when you add strings, they do it later, when you use the result.
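
As a rough illustration of that deferred concatenation, here is a toy rope. It is only a sketch of the general technique and not the representation used by any real engine; the `Rope` and `concat` names are made up for this example.

```javascript
// Toy rope, NOT any engine's real string representation.
// concat() is O(1): it only records the two halves.
function Rope(left, right) {
  this.left = left;
  this.right = right;
  this.length = left.length + right.length;
  this.flat = null; // filled in lazily on first use
}

// The characters are copied only when the string is actually used.
Rope.prototype.toString = function () {
  if (this.flat === null) {
    var parts = [];
    var stack = [this]; // explicit stack avoids deep recursion
    while (stack.length > 0) {
      var node = stack.pop();
      if (typeof node === "string") {
        parts.push(node);
      } else if (node.flat !== null) {
        parts.push(node.flat);
      } else {
        stack.push(node.right); // pushed first so left is visited first
        stack.push(node.left);
      }
    }
    this.flat = parts.join("");
    this.left = this.right = null; // drop the tree once flattened
  }
  return this.flat;
};

function concat(a, b) {
  return new Rope(a, b);
}

var r = concat(concat("a", "b"), "c");
console.log(r.length);  // prints 3, no characters copied yet
console.log(String(r)); // prints "abc", flattening happens here
```

In a loop that never reads the result, only the cheap node allocation runs, which is why the benchmark ends up measuring allocation and collection of dead nodes rather than string copying.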

Comment 13

6 years ago
I used the attachment linked in the first post, entered "9999999", and ran thirty tests. I got an average of almost 300ms. I then loaded the test in Chrome, copied in the same nines, did the same thing, and got a 17ms average.

Comment 14

Ok, same issue.  This bug isn't about concatenation; it's about allocating and GC'ing dead rope nodes.
(Reporter)

Comment 15

6 years ago
Test URL: https://bug157334.bugzilla.mozilla.org/attachment.cgi?id=96729
Platform: Mac OS X 10.7 64-bit, Core i7 2.3 GHz, 8 GB RAM
UBOUND: 99999999
Runs: 10


Firefox Nightly 20120724 = 2925.4
Chrome Canary 22.0.1203 = 127.7
Firefox Nightly Ion 20120724 = 124.8

=================

Modified to store the string concatenation results in an array (no DCE):

  var arr = [];
  for (var i = 0; i <= UBound; i++)
  {
      str = str1 + str2;
      arr.push(str);
  }


Firefox Nightly 20120724 = 526.3
Chrome Canary 22.0.1203 = 1600.9
Firefox Nightly Ion 20120724 = 181.2

Comment 16

5 years ago
IonMonkey fixed this specific test.
There are already other bugs open for concatenation speed.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → WORKSFORME
(Reporter)

Updated

5 years ago
Status: RESOLVED → VERIFIED