Closed Bug 488202 Opened 15 years ago Closed 13 years ago

Parallel JITing for TraceMonkey

Status:

RESOLVED WONTFIX

People

(Reporter: shengnan.cong, Unassigned)

Details

Attachments

(3 files, 5 obsolete files)

Patch 15 years ago Shengnan Cong 50.67 KB, patch
CompilerThread.h 15 years ago Shengnan Cong 5.41 KB, text/plain
CompilerThread.cpp 15 years ago Shengnan Cong 9.34 KB, text/plain
Patch for jsregexp.cpp 15 years ago Shengnan Cong 3.44 KB, patch
Fix for regexp-dna slowdown 15 years ago Shengnan Cong 3.59 KB, patch
New patch 15 years ago Shengnan Cong 37.57 KB, patch
CompilerThread.h (to be added in /js/nanojit) 15 years ago Shengnan Cong 5.04 KB, patch
CompilerThread.cpp (to be added in /js/nanojit) 15 years ago Shengnan Cong 7.47 KB, patch

Shengnan Cong

Reporter

Description

•

15 years ago

I have been working on off-loading the JITing to another thread so that the interpreter does not need to pause for compilation. The current status is as follows:

On Windows, there are reasonably good gains. On Mac, there are significant slowdowns, and I am still working on them.

Results from an Intel Core 2 Duo T7300 @ 2.00GHz with 1.96 GB RAM, running Windows XP:

SunSpider test                        Speedup
sunspider/3d-raytrace.js              6.70% 
sunspider/3d-cube.js                  6.40% 
sunspider/access-nbody.js             4.70% 
sunspider/crypto-aes.js               4.00% 
sunspider/access-fannkuch.js          3.80% 
sunspider/crypto-sha1.js              2.20% 
sunspider/string-unpack-code.js       1.90% 
sunspider/access-binary-trees.js      1.70% 
sunspider/date-format-xparb.js        1.70% 
sunspider/3d-morph.js                 1.50% 
sunspider/date-format-tofte.js        1.50% 
sunspider/string-fasta.js             1.40% 
sunspider/math-spectral-norm.js       1.30% 
sunspider/bitops-nsieve-bits.js       0.70% 
sunspider/access-nsieve.js            0.50% 
sunspider/math-partial-sums.js        0.50% 
sunspider/controlflow-recursive.js    0.20% 
sunspider/string-validate-input.js    0.20% 
sunspider/string-tagcloud.js          0.00% 
sunspider/bitops-3bit-bits-in-byte.js -0.10% 
sunspider/crypto-md5.js               -0.20% 
sunspider/math-cordic.js              -0.20% 
sunspider/bitops-bits-in-byte.js      -0.50% 
sunspider/string-base64.js            -2.00% 
sunspider/bitops-bitwise-and.js       -3.80% 
sunspider/regexp-dna.js               -12.60% 

Results from a Core 2 Duo 2.0GHz (2 cores) running Mac OS X Leopard:
t/3d-raytrace.js                         13.00% 
t/access-nbody.js                        4.36% 
t/access-fannkuch.js                      3.60% 
t/crypto-aes.js                           1.35% 
t/controlflow-recursive.js                0.98% 
t/math-partial-sums.js                    0.79% 
t/3d-cube.js                              0.24% 
t/bitops-nsieve-bits.js                    -0.40% 
t/math-spectral-norm.js                    -1.06% 
t/3d-morph.js                              -1.10% 
t/crypto-md5.js                            -1.29% 
t/access-nsieve.js                         -1.32% 
t/bitops-bits-in-byte.js                   -1.46% 
t/crypto-sha1.js                           -1.63% 
t/math-cordic.js                           -2.62% 
t/date-format-tofte.js                     -3.30% 
t/access-binary-trees.js                   -4.03% 
t/date-format-xparb.js                     -4.63% 
t/string-fasta.js                          -6.16% 
t/bitops-3bit-bits-in-byte.js              -6.42% 
t/bitops-bitwise-and.js                    -7.42% 
t/string-validate-input.js                 -10.17% 
t/string-unpack-code.js                    -10.93% 
t/string-tagcloud.js                       -14.64% 
t/string-base64.js                         -20.60% 
t/regexp-dna.js                            -70.99% 

Notes: 1. The code is based on a snapshot of mozilla-central from Mar 18.
       2. The timings were obtained by running the benchmarks in the js shell.
       3. raytrace occasionally crashes (one out of 30 runs).

Patch to come next.

Shengnan Cong

Reporter

Comment 1

•

15 years ago

Attached patch Patch (obsolete) — Details — Splinter Review

- Parallel JITting is enabled when PARALLEL_COMPILER is defined in avmplus.h.
- MEASURE_PAUSE is defined for timing.
- On Mac, DARWIN needs to be defined in avmplus.h and CompilerThread.h.

Shengnan Cong

Reporter

Comment 2

•

15 years ago

Attached file CompilerThread.h (obsolete) — Details

Additional file: CompilerThread.h to be placed in js/src/nanojit

Shengnan Cong

Reporter

Comment 3

•

15 years ago

Attached file CompilerThread.cpp (obsolete) — Details

Additional file: CompilerThread.cpp to be placed in js/src/nanojit

David Mandelin [:dmandelin]

Comment 4

•

15 years ago

First let me summarize the design to see if I understand: 

The basic design is to have a thread-safe worklist of things to compile or patch. Where the old code compiled, the new code adds something to the worklist. A compiler worker thread compiles code as it enters the worklist. The worklist is implemented with condition variables. 

The main other change is that some things that used to be attached to the lirbuf must be attached to the fragment, because the lirbuf data may be overwritten by the time the compiler thread gets to it. (By the way, I think this is a good development and maybe we should store all relevant data in fragment-specific storage instead of relying on the lirbuf.)
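To make the design summarized above concrete, here is a minimal sketch of a worklist guarded by a mutex and a condition variable, with one worker thread draining it. It uses standard C++ threading rather than whatever primitives the patch uses, and every name in it (`CompileWorklist`, `enqueue`, `run`) is hypothetical; it is not the CompilerThread interface from the attachments.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical names throughout; this is not the CompilerThread API from the patch.
class CompileWorklist {
public:
    CompileWorklist() : worker_([this] { run(); }) {}

    ~CompileWorklist() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            shutting_down_ = true;
        }
        ready_.notify_one();
        worker_.join();  // the worker drains any remaining jobs before exiting
    }

    // Called at the point where the sequential code used to compile inline.
    void enqueue(std::function<void()> job) {
        assert(job);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return shutting_down_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // only reachable once shutdown was requested
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();  // run the "compile" outside the lock so enqueue() never waits on it
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::function<void()>> jobs_;
    bool shutting_down_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};
```

A job here is just a callable; in the design described above it would capture the fragment (with its own copy of whatever it needs from the lirbuf) so that nothing is overwritten before the worker gets to it.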

--
What jumps out of the numbers to me is the slowdowns on regexp-dna. Maybe you should set it up so that using parallel recompilation can be controlled independently for the tracer and the regexp compiler. 

One thing to note is that in the current code, regexps are compiled on demand, i.e., just before the first time they are used in a match operation. In a parallel compilation setup, it probably makes more sense to queue them for compilation immediately after they are created.

Shengnan Cong

Reporter

Comment 5

•

15 years ago

Attached patch Patch for jsregexp.cpp (obsolete) — Details — Splinter Review

Modified jsregexp.cpp to queue the compilation of regexps earlier (before the match operation).

Shengnan Cong

Reporter

Comment 6

•

15 years ago

David, thanks for the comments. I am not sure how to make the parallel compilation independent of the tracer. It seems to me that the type specialization done by the tracer is tied to the compilation, and the two could be hard to separate.

I agree with you that it would make more sense to queue the regexps earlier for compilation. I modified the code as in the attached patch, but it does not seem to change performance much.

Shengnan Cong

Reporter

Comment 7

•

15 years ago

David, are you working on the patch? Please let me know if you need anything from me. Thanks.

David Mandelin [:dmandelin]

Comment 8

•

15 years ago

Shengnan, I am not currently working on that patch. For now, I read it and liked it. If there is anything in particular you'd *like* me to help with, let me know.

Shengnan Cong

Reporter

Comment 9

•

15 years ago

Attached patch Fix for regexp-dna slowdown (obsolete) — Details — Splinter Review

I have found the reason for the regexp slowdowns. Basically, the regexp interpreter is very slow. Interpreting even one iteration is much slower than waiting for the native code to be ready and then running the JITted code. Although I put the regexps in the compilation queue right after they are created, that may still not be early enough, and the slow interpreter may be triggered anyway.

So with this patch, the interpreter waits for the compilation when the native code is not ready, instead of interpreting the regexp. With the patch, regexp-dna performance has improved as follows:
on Windows: from -12.60% to -0.30%
on Mac:     from -70.99% to -16.7%  
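
The idea of queueing compilation at creation time and then blocking on the result, rather than falling back to a slow interpreter, can be sketched as follows. This is an illustration only: it uses `std::async` and `std::shared_future` in place of the dedicated compiler thread, and `NativeRegExp`, `compileRegExp`, and `RegExpObject` are hypothetical stand-ins, not code from jsregexp.cpp.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>

// Hypothetical stand-in for natively compiled regexp code:
// "matching" is just a substring search.
struct NativeRegExp {
    std::string pattern;
    bool match(const std::string& input) const {
        return input.find(pattern) != std::string::npos;
    }
};

// Hypothetical stand-in for the JIT compilation step.
inline NativeRegExp compileRegExp(std::string pattern) {
    return NativeRegExp{std::move(pattern)};
}

class RegExpObject {
public:
    // Start compiling as soon as the regexp is created, on another thread.
    explicit RegExpObject(std::string pattern)
        : native_(std::async(std::launch::async, compileRegExp,
                             std::move(pattern)).share()) {}

    // If the native code is not ready yet, wait for it instead of interpreting.
    bool test(const std::string& input) const {
        assert(native_.valid());
        return native_.get().match(input);  // get() blocks until compilation is done
    }

private:
    std::shared_future<NativeRegExp> native_;
};
```

The trade-off is the one described above: blocking only pays off when waiting for the compiler is cheaper than running even one iteration in the interpreter.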

I am still working on optimizations. I will be on vacation next week and will resume the work the week after.

Attachment #372730 - Attachment is obsolete: true

Nochum Sossonko [:Natch]

Updated

•

15 years ago

Attachment #372495 - Attachment is patch: false

Nochum Sossonko [:Natch]

Updated

•

15 years ago

Attachment #372496 - Attachment is patch: false

Shengnan Cong

Reporter

Comment 10

•

15 years ago

Attached patch New patch — Details — Splinter Review

I just merged my changes for parallel JITing with the latest TraceMonkey. Now, on both Mac and Windows, we get reasonably good speedups from using the parallelism between the JIT and the interpreter. Sunspider numbers follow. The speedups show the gain of the parallelized TM over the existing sequential version of TM on Core-2 Duo systems. On Mac, we have speedups in the range of [-2% to 18%], while on Windows we have speedups in the range [-3% to 15%]. I am wondering whether there is any larger workload that I can test with.

 
Sunspider Test	      Mac	Windows	   
t/3d-raytrace.js	17.7%	14.9%	   
t/crypto-sha1.js	11.5%	10.3%	   
t/date-format-xparb.js	6.6%	4.2%	   
t/access-nbody.js	6.2%	8.6%	   
t/access-fannkuch.js	4.3%	2.0%	   
t/math-spectral-norm.js	3.7%	2.4%	   
t/bitops-nsieve-bits.js	3.3%	-0.5%	   
t/bitops-bitwise-and.js	3.2%	-2.6%	   
t/string-unpack-code.js	2.4%	-0.1%	   
t/crypto-aes.js	2.4%	3.4%	   
t/bitops-bits-in-byte.js	2.2%	-0.6%	   
t/crypto-md5.js	2.2%	3.5%	   
t/date-format-tofte.js	1.9%	-0.6%	   
t/3d-cube.js	1.4%	3.0%	   
t/string-validate-input.js	1.4%	3.4%	   
t/string-tagcloud.js	0.9%	3.2%	   
t/math-cordic.js	0.9%	0.6%	   
t/string-base64.js	0.3%	0.9%	   
t/regexp-dna.js	0.1%	-0.3%	   
t/math-partial-sums.js	0.0%	0.1%	   
t/3d-morph.js	-0.2%	0.3%	   
t/controlflow-recursive.js	-0.3%	-0.9%	   
t/string-fasta.js	-0.3%	-0.8%	   
t/access-nsieve.js	-0.4%	1.1%	   
t/access-binary-trees.js	-1.7%	1.5%	   
t/bitops-3bit-bits-in-byte.js	-2.1%	-0.5%

Attachment #372494 - Attachment is obsolete: true

Attachment #377772 - Attachment is obsolete: true

Shengnan Cong

Reporter

Comment 11

•

15 years ago

Attached patch CompilerThread.h (to be added in /js/nanojit) — Details — Splinter Review

Attachment #372495 - Attachment is obsolete: true

Shengnan Cong

Reporter

Comment 12

•

15 years ago

Attached patch CompilerThread.cpp (to be added in /js/nanojit) — Details — Splinter Review

Attachment #372496 - Attachment is obsolete: true

David Mandelin [:dmandelin]

Comment 13

•

15 years ago

Coool. For now I think besides SS we mainly have the v8 benchmarks (in our tree at js/src/v8), Dromaeo (http://dromaeo.com/), and Peacekeeper (http://service.futuremark.com/peacekeeper/index.action).

From a combination of looking at your data, talking to Brendan, and making stuff up ;-) I wonder if it is better to parallelize longer traces than shorter ones. It would be interesting to create a version that compiles in parallel only if the length of the trace is greater than K (in who knows what units--LIR instructions?) and tune that K.
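
A sketch of what such a length threshold could look like, assuming the trace length is available as a count of LIR instructions. The function name, the enum, and the choice of units are all assumptions made for illustration; the value of K would have to be found by measurement.

```cpp
#include <cstddef>

enum class CompileMode { Inline, Parallel };

// Hypothetical policy: hand a trace to the compiler thread only if it is
// longer than k (here counted in LIR instructions); compile short traces
// inline as the sequential version does.
inline CompileMode chooseCompileMode(std::size_t lirInstructionCount, std::size_t k) {
    return lirInstructionCount > k ? CompileMode::Parallel : CompileMode::Inline;
}
```

For example, `chooseCompileMode(500, 100)` selects `CompileMode::Parallel`, while a 10-instruction trace with the same K is compiled inline.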

Shengnan Cong

Reporter

Comment 14

•

15 years ago

Thanks for the pointers. I will try them and post the results.

Good suggestion. Since the interpreter checks whether the compiled code is ready when it reaches a back edge, the compilation of a short trace may well finish before the interpreter hits the back edge again. Parallel JIT shows no benefit in such cases. Actually, in the new patch I disabled parallel JIT for regexps for the same reason. I will create a version as you suggested and post an update.
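
The back-edge check described in this comment can be pictured roughly as below: the compiler thread publishes a flag when the native code is installed, and the interpreter polls it each time it reaches the loop's back edge. The `Fragment` fields and function names here are hypothetical and only model the control flow; the counters stand in for actually interpreting or running native code.

```cpp
#include <atomic>

// Hypothetical per-fragment state shared between interpreter and compiler thread.
struct Fragment {
    std::atomic<bool> nativeReady{false};
    int iterationsInterpreted = 0;  // touched only by the interpreter thread
    int iterationsNative = 0;       // touched only by the interpreter thread
};

// Called by the compiler thread once the native code has been installed.
inline void publishNative(Fragment& f) {
    f.nativeReady.store(true, std::memory_order_release);
}

// Called by the interpreter each time it reaches the loop's back edge.
inline void onBackEdge(Fragment& f) {
    if (f.nativeReady.load(std::memory_order_acquire)) {
        ++f.iterationsNative;       // switch to the compiled trace
    } else {
        ++f.iterationsInterpreted;  // keep interpreting while the compiler works
    }
}
```

If compilation of a short trace completes before the next call to `onBackEdge`, the interpreter gets almost no useful work done in parallel, which is the no-benefit case mentioned above.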

Ryan VanderMeulen [:RyanVM]

Comment 15

•

13 years ago

Obsolete with the removal of tracejit.

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → WONTFIX
