Closed
Bug 545303
Opened 14 years ago
Closed 14 years ago
TM: non API function for JS_CallTracer
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: gwagner, Assigned: gwagner)
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(5 files, 3 obsolete files)
Luke noticed that all the GC JS_CALL_*_TRACER macros and the TRACE_JSVALS macro call JS_CallTracer which is a public API function. So I made a non-public js_CallTracer which contains the body of JS_CallTracer and gets called by JS_CallTracer and changed all the internal uses to call js_CallTracer directly. The surprising outcome is: TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.008x as fast 721.2ms +/- 0.2% 715.5ms +/- 0.3% significant ============================================================================= 3d: - 116.8ms +/- 0.4% 116.7ms +/- 0.5% cube: - 38.4ms +/- 1.0% 38.4ms +/- 1.0% morph: - 21.0ms +/- 0.0% 21.0ms +/- 0.0% raytrace: - 57.4ms +/- 0.6% 57.3ms +/- 0.6% access: ?? 95.3ms +/- 0.5% 95.4ms +/- 0.7% not conclusive: might be *1.001x as slow* binary-trees: - 19.7ms +/- 1.8% 19.6ms +/- 1.9% fannkuch: *1.018x as slow* 44.2ms +/- 0.7% 45.0ms +/- 0.0% significant nbody: - 19.5ms +/- 1.9% 19.5ms +/- 1.9% nsieve: 1.053x as fast 11.9ms +/- 1.9% 11.3ms +/- 3.1% significant bitops: ?? 31.1ms +/- 1.7% 31.3ms +/- 1.1% not conclusive: might be *1.006x as slow* 3bit-bits-in-byte: - 1.3ms +/- 26.6% 1.2ms +/- 25.1% bits-in-byte: *1.011x as slow* 9.0ms +/- 0.0% 9.1ms +/- 2.5% significant bitwise-and: ?? 1.8ms +/- 16.7% 2.0ms +/- 0.0% not conclusive: might be *1.111x as slow* nsieve-bits: - 19.0ms +/- 0.0% 19.0ms +/- 0.0% controlflow: *1.070x as slow* 7.1ms +/- 3.2% 7.6ms +/- 4.9% significant recursive: *1.070x as slow* 7.1ms +/- 3.2% 7.6ms +/- 4.9% significant crypto: 1.029x as fast 46.7ms +/- 2.0% 45.4ms +/- 1.8% significant aes: - 26.9ms +/- 3.4% 25.8ms +/- 2.9% md5: - 13.2ms +/- 2.3% 13.0ms +/- 0.0% sha1: - 6.6ms +/- 5.6% 6.6ms +/- 5.6% date: 1.019x as fast 104.6ms +/- 0.4% 102.7ms +/- 0.7% significant format-tofte: 1.023x as fast 54.5ms +/- 0.7% 53.3ms +/- 0.6% significant format-xparb: 1.014x as fast 50.1ms +/- 0.5% 49.4ms +/- 0.7% significant math: - 34.9ms +/- 2.5% 34.9ms +/- 1.8% cordic: ?? 15.5ms +/- 2.4% 15.6ms +/- 2.4% not conclusive: might be *1.006x as slow* partial-sums: ?? 12.7ms +/- 2.7% 12.8ms +/- 2.4% not conclusive: might be *1.008x as slow* spectral-norm: - 6.7ms +/- 5.2% 6.5ms +/- 5.8% regexp: ?? 36.1ms +/- 0.6% 36.3ms +/- 1.0% not conclusive: might be *1.006x as slow* dna: ?? 36.1ms +/- 0.6% 36.3ms +/- 1.0% not conclusive: might be *1.006x as slow* string: 1.014x as fast 248.6ms +/- 0.3% 245.2ms +/- 0.2% significant base64: ?? 11.2ms +/- 2.7% 11.4ms +/- 3.2% not conclusive: might be *1.018x as slow* fasta: 1.025x as fast 50.0ms +/- 0.0% 48.8ms +/- 0.6% significant tagcloud: 1.007x as fast 84.0ms +/- 0.0% 83.4ms +/- 0.4% significant unpack-code: 1.026x as fast 80.1ms +/- 0.5% 78.1ms +/- 0.3% significant validate-input: ?? 23.3ms +/- 1.5% 23.5ms +/- 1.6% not conclusive: might be *1.009x as slow*
Assignee | ||
Comment 1•14 years ago
|
||
Assignee: general → anygregor
Comment 2•14 years ago
|
||
Cool... I wasn't expecting SS to get any faster (I thought we didn't GC during SS?); what about the GC-heavy tests that you've been using to compare our GC to WebKit's in that other bug?
Assignee | ||
Comment 3•14 years ago
|
||
I modified one of Igors benchmark and run it on my Mac Pro: Processor Dual-Core Intel Xeon @ 2.66 GHz Number Of Processors: 2 test(); function test() { function generate_big_object_graph() { var root = {}; f(root, 17); return root; function f(parent, depth) { if (depth == 0) return; --depth; f(parent.a = {}, depth); f(parent.b = {}, depth); } } function f(obj) { with (obj) return arguments; } gc(); var start = Date.now(); x = null; x = f(generate_big_object_graph()); var create_end = Date.now(); gc(); var fullgc_end = Date.now(); x = null; gc(); var emptygc_end = Date.now(); var actual = ""; actual = " alloc: "+(create_end - start)+ "\n used : "+(fullgc_end - create_end) + "\n free : "+(emptygc_end - fullgc_end); print(actual); } Tip: alloc: 108 used : 22 free : 25 Patch: alloc: 107 used : 19 free : 24
Comment 4•14 years ago
|
||
So, if I am reading the benchmark correctly, the 'used' time is when there is a lot of marking going on, and that shows, depending on the variation in measurement, a 15% improvement. Cool!
Assignee | ||
Comment 5•14 years ago
|
||
Attachment #426155 -
Attachment is obsolete: true
Comment 6•14 years ago
|
||
Bonus points for renaming "Trace" to "GCTrace" or "GCScan" or something that won't be confusing forever. ;-) /be
Comment 7•14 years ago
|
||
(In reply to comment #2) > (I thought we didn't GC during SS?) Me too -- Gregor, what is the story? /be
(In reply to comment #6) > Bonus points for renaming "Trace" to "GCTrace" or "GCScan" or something that > won't be confusing forever. ;-) Holy hannah, yes.
Comment 9•14 years ago
|
||
(In reply to comment #6) > Bonus points for renaming "Trace" to "GCTrace" or "GCScan" or something that > won't be confusing forever. ;-) Lets rename it do the original name, js_CallMarker
Assignee | ||
Comment 10•14 years ago
|
||
I am very confused! Even with this little patch where I extract js_CallTracer and move the public JS_CallTracer function to jsapi.cpp I get 5-8 ms speedup on SS. I also tried it on Linux but nothing happens there. Could somebody try this on Mac as well?
Assignee | ||
Comment 11•14 years ago
|
||
How about js_CallGCMarker ?
Attachment #426266 -
Attachment is obsolete: true
Attachment #426358 -
Attachment is obsolete: true
Assignee | ||
Comment 12•14 years ago
|
||
I used this benchmark to collect more data on this patch: All numbers represent again cycles * 1E6. test(); function test() { function generate_big_object_graph() { var root = {}; f(root, 17); return root; function f(parent, depth) { if (depth == 0) return; --depth; f(parent.a = {}, depth); f(parent.b = {}, depth); } } function f(obj) { with (obj) return arguments; } for (var i = 0; i != 10; ++i) { gc(); x = null; x = f(generate_big_object_graph()); gc(); gc(); x = null; gc(); } }
Assignee | ||
Comment 13•14 years ago
|
||
The marking time reduces by about 5%.
Assignee | ||
Comment 14•14 years ago
|
||
Previous histograms were Mac results. These are Linux numbers. Amazing how the finalizer times vary between these two OS.
Assignee | ||
Comment 15•14 years ago
|
||
Assignee | ||
Comment 16•14 years ago
|
||
If I only had a PC I would be able to get some numbers for Windows... :)
Assignee | ||
Updated•14 years ago
|
Attachment #426781 -
Flags: review?(igor)
Comment 17•14 years ago
|
||
Comment on attachment 426781 [details] [diff] [review] patch js_locked_atom_tracer(JSDHashTable *tabl > JSAtomHashEntry *entry = TO_ATOM_ENTRY(hdr); > JSTracer *trc = (JSTracer *)arg; > > if (entry->keyAndFlags == 0) { > /* Ignore uninitialized entries during tracing. */ > return JS_DHASH_NEXT; > } > JS_SET_TRACING_INDEX(trc, "locked_atom", (size_t)number); >- JS_CallTracer(trc, ATOM_ENTRY_KEY(entry), >+ js_CallGCMarker(trc, ATOM_ENTRY_KEY(entry), > IS_STRING_TABLE(table) ? JSTRACE_STRING : JSTRACE_DOUBLE); Nit: align the second argument line.
Attachment #426781 -
Flags: review?(igor) → review+
Assignee | ||
Comment 18•14 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/3b25677f1fee
Assignee | ||
Updated•14 years ago
|
Whiteboard: fixed-in-tracemonkey
Comment 19•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/3b25677f1fee
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•