make perf using v8/jsshell

RESOLVED FIXED

Status

Product: L20n
Component: JS Library
Status: RESOLVED FIXED
Reported: 5 years ago
Last modified: 5 years ago

People

(Reporter: gandalf, Assigned: stas)


Attachments

(1 attachment)

(Reporter)

Description

5 years ago
Currently make perf uses node, which has different performance characteristics from v8/jsshell.

We should make it possible to test performance against pure engines.
(Reporter)

Comment 1

5 years ago
There are a few limitations here:

1) time resolution in jsshell/v8 is really low (Date.now() only gives milliseconds), which makes it impossible to reliably time a single cycle
2) no require

We need bug 886824 for 2) in order to use the target shell, but even then, the closest we can get is 1000 reps for decent resolution, and that triggers the JIT.

One idea to avoid that would be to use node to fire up jsshell/v8 on a separate js file that runs each operation (parse/compile/get) just once. This would avoid artificial jitting and would let us use node's high-resolution timings.

The downside is that we'll get an overhead of running another process.
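
A rough sketch of the idea, assuming node as the outer process (single-run.js is a hypothetical file that performs each operation once):

var execFile = require('child_process').execFile;

function timeOneRun(shell, done) {
  var start = process.hrtime();                  // node's high-resolution timer
  execFile(shell, ['single-run.js'], function () {
    var t = process.hrtime(start);               // [seconds, nanoseconds] elapsed
    done(t[0] * 1e6 + t[1] / 1e3);               // microseconds, spawn overhead included
  });
}

timeOneRun('js', function (us) {
  console.log(us.toFixed(1) + ' μs (includes process startup)');
});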
Depends on: 886824
(Reporter)

Comment 2

5 years ago
Here's a port of ./tools/perf/index.js that works with v8 and jsshell: http://pastebin.mozilla.org/2561840. It's just a POC for now.
Comment 3

5 years ago
(In reply to Zbigniew Braniecki [:gandalf] from comment #1)

> We need bug 886824 for 2) to use target shell, but even then, the closest we
> can get is 1000 reps for decent resolution and that triggers JIT.

We're doing 500x sampling anyway, so I guess the JIT always compiles something?
(Reporter)

Comment 4

5 years ago
(In reply to Staś Małolepszy :stas from comment #3)

> We're doing 500x sampling anyways, so I guess JIT always compiles something?

Yeah, try the make perf with -s 1 and you'll see massive diff on parser.
Comment 5

5 years ago
(In reply to Zbigniew Braniecki [:gandalf] from comment #4)
> Yeah, try the make perf with -s 1 and you'll see massive diff on parser.

I know, but at least on my setup, I'm consistently getting 2.3-2.6 times smaller results with --sample 500.  That factor of 2.3-2.6 is what JIT must be giving us.  I tend to think that as long as the factor is stable, we can still get some insight out of the data.

There are three things that one might want to achieve, and they require different approaches:

 - compare impact of code changes:  you don't need to know exactly how much time an operation takes.  You're only interested in deltas.  Jitting can get in the way, because you could write code that happens to jit well, but in general, as long as all code is jitted, the deltas should be stable.  This way you know whether you're making the code run faster or not.

 - compare how the same code runs in different engines:  again, jitting can influence the results.  I don't know by how much, so I can't say if deltas are enough.

 - know how much time a single operation takes:  this is only possible with a high-resolution timer (HRT) and should be run in separate instances of the engine to avoid jitting completely.  jsshell can run with jitting disabled, but v8 can't (it's always on).
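
For reference, the SpiderMonkey shell takes flags to switch the JITs off; the exact flag names vary between releases, so treat this as an example:

$ js --no-ion --no-baseline benchmark.jsshell.js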


(In reply to Zbigniew Braniecki [:gandalf] from comment #1)
> One idea to avoid that would be to use node to fire jsshell/v8 on another js
> file that runs each (parse/compile/get) just once. This will avoid
> artificial jitting and will allow us to use node's high resolution timings.
> 
> The downside is that we'll get an overhead of running another process.

I thought about this and the challenge is that the overhead that you mention makes node's HRT useless if it's available only to the outer process.  I experimented with a node module which spawns perf/index.js.  It takes at least 30ms-80ms to spawn (and also about 10MB of memory) as it launches an entirely new instance of V8.  So we're back to thinking in milliseconds.

So we'd need to probe the HRT in the inner process.   It seems that in vanilla V8/d8 there's nothing available.  In node, there's process.hrtime which is node's addition written in C++ [1].  In Firefox's JS shell, there's dateNow() [2].
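
A sketch of what a timer shim along these lines could look like, normalized to microseconds (the feature detection is my assumption, not code from the tree):

var now;
if (typeof process !== 'undefined' && process.hrtime) {
  // node: process.hrtime() returns [seconds, nanoseconds] [1]
  now = function () {
    var t = process.hrtime();
    return t[0] * 1e6 + t[1] / 1e3;
  };
} else if (typeof dateNow === 'function') {
  // SpiderMonkey shell: dateNow() returns fractional milliseconds [2]
  now = function () { return dateNow() * 1e3; };
} else {
  // vanilla d8: Date.now() only, millisecond resolution
  now = function () { return Date.now() * 1e3; };
}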

Alternatively, we could run the tests in the browser, where window.performance.now() is available (in Firefox and Chrome alike).  However, automating this would be a hurdle.

I think that the best way to move forward is to rewrite perf/index.js to be a node script spawning:

 - child node processes, 100 times, each of which runs the benchmark only once (to avoid jitting);  time of each operation is measured with process.hrtime() in the child process and is reported back to the spawning script,

 - child jsshell process, 100 times, each of which runs the benchmark only once (to avoid jitting);  time of each operation is measured in the child process with dateNow() and is reported back to the spawning script.

With n=100, the whole benchmark should run for less than 5 seconds.
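
The child side could be as small as this sketch (parse/compile/get stand in for the actual L20n entry points, and now() is a microsecond timer shim like the one above):

var write = (typeof print === 'function') ? print : console.log;

var t0 = now();
var ast = parse(source);        // parse, exactly once, on cold code
var t1 = now();
var env = compile(ast);         // compile
var t2 = now();
env.get('hello');               // get
var t3 = now();

// report back to the spawning script: one number per line, in microseconds
write([t1 - t0, t2 - t1, t3 - t2].join('\n'));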


[1] https://github.com/joyent/node/blob/master/src/node.cc#L1731-L1762
[2] https://developer.mozilla.org/en-US/docs/SpiderMonkey/Hacking_Tips#Benchmarking_with_sub-milliseconds_%28JS_shell%29
Comment 6

5 years ago
(In reply to Staś Małolepszy :stas from comment #5)
> (In reply to Zbigniew Braniecki [:gandalf] from comment #4)
> > Yeah, try the make perf with -s 1 and you'll see massive diff on parser.
> 
> I know, but at least on my setup, I'm consistently getting 2.3-2.6 times
> smaller results with --sample 500.  That factor of 2.3-2.6 is what JIT must
> be giving us.  I tend to think that as long as the factor is stable, we can
> still get some insight out of the data.

Actually, this is not true.  I just found a bug in the benchmark code, and the results for getting entities were underestimated.

Getting with --sample 500 (i.e., with JIT) takes 230 microseconds on average.  With --sample 1, it takes over 2 milliseconds, ten times as much!

I'm working on implementing the solution I outlined at the bottom of comment 5.
Comment 7

5 years ago
Created attachment 767711 [details] [diff] [review]
Use child_process.exec

I used child_process.exec to sequentially spawn node or jsshell processes which run a single iteration of the benchmark and report times in microseconds.  run.js then calculates the means and stdevs.
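
In spirit, the spawning side works like the sketch below; this is a simplified illustration of the approach, not the patch itself (see the attachment for the real code):

var exec = require('child_process').exec;

function sample(cmd, n, times, done) {
  if (n === 0) return done(times);
  exec(cmd, function (err, stdout) {
    // each child prints its parse/compile/get times in μs, one per line
    times.push(stdout.trim().split('\n').map(Number));
    sample(cmd, n - 1, times, done);   // sequential: one child at a time
  });
}

function mean(xs) {
  return xs.reduce(function (a, b) { return a + b; }, 0) / xs.length;
}

function stdev(xs) {
  var m = mean(xs);
  return Math.sqrt(mean(xs.map(function (x) { return (x - m) * (x - m); })));
}

sample('js benchmark.jsshell.js', 150, [], function (times) {
  var parses = times.map(function (t) { return t[0]; });
  console.log('parse: mean ' + mean(parses).toFixed(2) + ' μs, stdev ' +
              stdev(parses).toFixed(2));
});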

I changed the default sample size to 150, which should give us results with 99% confidence level and +/- 50 microseconds confidence interval.
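
A quick sanity check of that claim, assuming a stdev of about 240 μs (z = 2.576 at the 99% level):

var z = 2.576, s = 240, n = 150;
console.log((z * s / Math.sqrt(n)).toFixed(1) + ' μs');  // half-width ≈ 50.5 μs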

This patch requires the new patch I submitted in bug 886824.

Can you take a look at the code and the results?  I think I got the milli/micro/nanoseconds conversions right.  If so, the results on SpiderMonkey are not pretty:

(V8) $ make perf
parse: 
  mean:   779.75 μs
  stdev:  172.44
  sample: 150
compile: 
  mean:   471.64 μs
  stdev:  71.85
  sample: 150
get: 
  mean:   2247.31 μs
  stdev:  219.31
  sample: 150


(SpiderMonkey) $ make perf JSSHELL=~/moz/l20n/jsshell/js
parse: 
  mean:   2723.59 μs (+258%)
  stdev:  257.88
  sample: 150
compile: 
  mean:   1165.41 μs (+150%)
  stdev:  165.75
  sample: 150
get: 
  mean:   4613.57 μs (+107%)
  stdev:  459.67
  sample: 150
Assignee: nobody → stas
Status: NEW → ASSIGNED
Attachment #767711 - Flags: review?(gandalf)
Comment 8

5 years ago
(In reply to Staś Małolepszy :stas from comment #7)

> (V8) $ make perf
> parse: 
>   mean:   779.75 μs
>   stdev:  172.44
>   sample: 150
> compile: 
>   mean:   471.64 μs
>   stdev:  71.85
>   sample: 150
> get: 
>   mean:   2247.31 μs
>   stdev:  219.31
>   sample: 150

I'm lost.  I created a d8-specific benchmark file, based on benchmark.jsshell.js but with s/dateNow/Date.now/.  I know this only reports in milliseconds, but with --sample 150, I'm able to get a little bit more resolution:

./run.js d8 benchmark.d8.js
parse: 
  mean:   2040
  stdev:  476.01
  sample: 150
compile: 
  mean:   733.33
  stdev:  741.26
  sample: 150
get: 
  mean:   4866.67
  stdev:  3344.65
  sample: 150

This is much closer to SpiderMonkey...
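
For reference, the d8 port boils down to a timer shim at the top of the file, with the rest of benchmark.jsshell.js unchanged (a sketch, not the actual file):

// benchmark.d8.js: d8 has no dateNow(), so fall back to Date.now(),
// which only has millisecond resolution
var dateNow = function () { return Date.now(); };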
(Reporter)

Updated

5 years ago
Attachment #767711 - Flags: review?(gandalf) → review+
https://github.com/l20n/l20n.js/commit/53a5e92d5f2d35152d07c0515992c84aca3e8f62
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED