Closed
Bug 593931
Opened 14 years ago
Closed 14 years ago
inline js_GetPropertyHelper() and friends more aggressively
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: n.nethercote, Assigned: n.nethercote)
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(1 file, 2 obsolete files)
5.56 KB,
patch
|
cdleary
:
review+
|
Details | Diff | Splinter Review |
I noticed in string-fasta that js_GetProperty() is called a *lot*. Furthemore, js_GetProperty() calls js_GetPropertyHelper(), which calls js_LookupPropertyWithFlags() and js_NativeGet(). None of these calls are inlined, and Cachegrind tells me that the call overhead for each of them is exceeding the amount of work going on within the functions themselves. Just as an experiment I created copies of js_GetPropertyHelper(), js_LookupPropertyWithFlags() and js_NativeGet(), marked them JS_ALWAYS_INLINE, and ensured that the hot path would call these inlined versions. (The patch is attached.) The results were much better than I'd hoped. Cachegrind says: --------------------------------------------------------------- | millions of instructions executed | | total | on-trace (may overestimate) | --------------------------------------------------------------- | 95.193 94.541 (1.007x) | 25.159 25.123 (1.001x) | 3d-cube | 57.410 57.410 (------) | 23.099 23.099 (------) | 3d-morph | 116.739 116.486 (1.002x) | 44.065 44.049 (------) | 3d-raytrace | 73.043 70.055 (1.043x) | 15.220 15.052 (1.011x) | access-binary- | 133.642 133.643 (------) | 82.806 82.806 (------) | access-fannkuc | 37.526 37.529 (------) | 16.700 16.700 (------) | access-nbody | 61.690 61.692 (------) | 28.901 28.901 (------) | access-nsieve | 16.349 16.350 (------) | 3.249 3.249 (------) | bitops-3bit-bi | 45.911 45.912 (------) | 32.701 32.701 (------) | bitops-bits-in | 24.779 24.781 (------) | 12.019 12.019 (------) | bitops-bitwise | 63.619 63.621 (------) | 38.301 38.301 (------) | bitops-nsieve- | 49.008 49.009 (------) | 19.438 19.438 (------) | controlflow-re | 47.619 47.607 (------) | 5.964 5.964 (------) | crypto-md5 | 30.330 30.331 (------) | 6.809 6.809 (------) | crypto-sha1 | 97.604 97.613 (------) | 17.295 17.295 (------) | date-format-to | 83.587 81.923 (1.020x) | 9.767 9.672 (1.010x) | date-format-xp | 54.274 54.275 (------) | 31.292 31.292 (------) | math-cordic | 29.549 29.795 (0.992x) | 6.232 6.462 (0.964x) | math-partial-s | 31.019 31.025 (------) | 13.319 13.319 (------) | math-spectral- | 58.528 58.528 (------) | 34.592 34.592 (------) | regexp-dna | 39.745 39.747 (------) | 9.408 9.408 (------) | string-base64 | 118.456 106.890 (1.108x) | 25.728 25.027 (1.028x) | string-fasta | 139.147 139.575 (0.997x) | 17.703 17.733 (0.998x) | string-tagclou | 180.727 175.309 (1.031x) | 21.897 21.590 (1.014x) | string-unpack- | 58.414 58.414 (------) | 11.706 11.706 (------) | string-validat ------- | 1743.920 1722.075 (1.013x) | 553.382 552.318 (1.002x) | all --------------------------------------------------------------- | millions of instructions executed | | total | on-trace (may overestimate) | --------------------------------------------------------------- | 1966.223 1966.107 (------) | 1735.163 1735.157 (------) | v8-crypto | 1557.612 1553.712 (1.003x) | 958.631 957.417 (1.001x) | v8-deltablue | 1812.821 1738.561 (1.043x) | 531.263 527.090 (1.008x) | v8-earley-boye | 1460.158 1432.332 (1.019x) | 293.350 291.761 (1.005x) | v8-raytrace | 873.751 873.738 (------) | 374.289 374.288 (------) | v8-regexp | 1213.670 1210.363 (1.003x) | 1162.236 1159.012 (1.003x) | v8-richards | 1321.328 1306.473 (1.011x) | 131.794 131.562 (1.002x) | v8-splay ------- | 10205.566 10081.289 (1.012x) | 5186.730 5176.290 (1.002x) | all Measurements on my laptop suggest that the timing improvements are even better, giving perhaps a 5-8ms win for SS on the AWFY machines. The timings for V8 were less conclusive, maybe a 30ms win; this surprised me because I thought property accesses were more important in V8. Now obviously these functions are (surprisingly!) large, they each have multiple callers, and just duplicating them is bad. But there must be some way to rearrange them so the hot cases can be inlined to avoid the calling overhead. js_NativeGet() looks like the easiest, it almost always returns after the shape->hasDefaultGetter() test. Furthermore, js_GetProperty() is mostly called from this function: JSBool getProperty(JSContext *cx, jsid id, js::Value *vp) { js::PropertyIdOp op = getOps()->getProperty; return (op ? op : js_GetProperty)(cx, this, id, vp); } and I can't help wondering if this could be done better as well. Eg. could 'op' be made to point to js_GetProperty() rather than being NULL? Then the test wouldn't be necessary. The same could be done with lots of the other object ops.
Assignee | ||
Comment 1•14 years ago
|
||
Assignee | ||
Comment 2•14 years ago
|
||
Comment on attachment 473875 [details] Sunspider timings Sorry, those sunspider timings were meant to go on bug 595048.
Attachment #473875 -
Attachment is obsolete: true
Comment 3•14 years ago
|
||
(In reply to comment #0) > Created attachment 472564 [details] [diff] [review] > patch (against TM 52850:9d7c084a9a56) > > I noticed in string-fasta that js_GetProperty() is called a *lot*. But this is addressed by other open bugs: bug 580138 (bhackett-owned with partial patch) bug 560989 (unowned, dup bait -- gal!) Not sayin' we don't need to clean up js_GetPropertyHelper (and much in jsobj.cpp for that matter) but if we want fasta wins, this may be the wrong path. The issue is that infernal for-in loop and the use of the enumerated property name in [] to index the "next" property. Lotta predictability, not quite a PIC use case (mega-morphic in general), still looks winnable. /be
Assignee | ||
Comment 4•14 years ago
|
||
(In reply to comment #3) > (In reply to comment #0) > > Created attachment 472564 [details] [diff] [review] [details] > > patch (against TM 52850:9d7c084a9a56) > > > > I noticed in string-fasta that js_GetProperty() is called a *lot*. > > But this is addressed by other open bugs: > > bug 580138 (bhackett-owned with partial patch) > bug 560989 (unowned, dup bait -- gal!) > > Not sayin' we don't need to clean up js_GetPropertyHelper (and much in > jsobj.cpp for that matter) but if we want fasta wins, this may be the wrong > path. Good to know, but the inlining helps with some other cases too :)
Comment 5•14 years ago
|
||
Yeah, my studies of JSC and conversations with Olliej support inlining hot paths. The NATIVE_GET and NATIVE_SET ugly C macros in jsinterp.cpp date from an earlier inlining win. Don't want to overrotate on fasta's account, 'sall. It should get a more targeted fix (is getting, when bhackett returns from his search for the elusive GPO). /be
Assignee | ||
Comment 6•14 years ago
|
||
This patch creates always-inline variants of js_GetPropertyHelper(), js_LookupPropertyWithFlags() and js_NativeGet(). Cachegrind results: --------------------------------------------------------------- | millions of instructions executed | | total | on-trace (may overestimate) | --------------------------------------------------------------- | 101.241 100.576 (1.007x) | 47.530 47.494 (1.001x) | 3d-cube | 47.191 47.185 (------) | 25.565 25.565 (------) | 3d-morph | 108.085 107.784 (1.003x) | 43.385 43.368 (------) | 3d-raytrace | 44.014 41.022 (1.073x) | 14.868 14.699 (1.011x) | access-binary- | 112.003 111.996 (------) | 92.713 92.713 (------) | access-fannkuc | 38.313 38.292 (1.001x) | 17.191 17.190 (------) | access-nbody | 48.416 48.410 (------) | 28.185 28.185 (------) | access-nsieve | 16.376 16.370 (------) | 3.250 3.250 (------) | bitops-3bit-bi | 45.760 45.754 (------) | 32.524 32.524 (------) | bitops-bits-in | 24.795 24.789 (------) | 12.020 12.020 (------) | bitops-bitwise | 52.237 52.231 (------) | 37.968 37.968 (------) | bitops-nsieve- | 30.672 30.666 (------) | 17.541 17.541 (------) | controlflow-re | 47.544 47.543 (------) | 6.368 6.368 (------) | crypto-md5 | 30.179 30.171 (------) | 7.046 7.046 (------) | crypto-sha1 | 78.958 78.943 (------) | 22.471 22.471 (------) | date-format-to | 80.704 80.262 (1.006x) | 9.774 9.741 (1.003x) | date-format-xp | 54.165 54.155 (------) | 31.075 31.075 (------) | math-cordic | 29.387 29.626 (0.992x) | 6.237 6.468 (0.964x) | math-partial-s | 31.085 31.075 (------) | 13.416 13.416 (------) | math-spectral- | 58.289 58.283 (------) | 34.590 34.590 (------) | regexp-dna | 39.784 39.776 (------) | 9.414 9.414 (------) | string-base64 | 95.634 95.620 (------) | 24.845 24.845 (------) | string-fasta | 121.490 121.475 (------) | 17.444 17.444 (------) | string-tagclou | 174.195 168.893 (1.031x) | 21.614 21.316 (1.014x) | string-unpack- | 52.214 52.206 (------) | 8.604 8.604 (------) | string-validat ------- | 1562.742 1553.117 (1.006x) | 585.648 585.326 (1.001x) | all --------------------------------------------------------------- | millions of instructions executed | | total | on-trace (may overestimate) | --------------------------------------------------------------- | 1791.482 1791.358 (------) | 1628.001 1627.995 (------) | v8-crypto | 1640.548 1637.368 (1.002x) | 1109.452 1109.117 (------) | v8-deltablue | 1216.334 1140.729 (1.066x) | 528.420 524.162 (1.008x) | v8-earley-boye | 1246.646 1218.230 (1.023x) | 306.273 304.668 (1.005x) | v8-raytrace | 913.893 913.752 (------) | 374.975 374.968 (------) | v8-regexp | 1204.717 1204.406 (------) | 1163.749 1163.531 (------) | v8-richards | 1208.137 1201.750 (1.005x) | 127.181 126.666 (1.004x) | v8-splay ------- | 9221.761 9107.596 (1.013x) | 5238.054 5231.110 (1.001x) | all Timing-wise, it looks to be worth about 3ms on SS and maybe 50ms on V8. Thanks to cdleary for helping me realize that you could use wrapper functions like this to guarantee inlining at only some of the call sites of a function.
Attachment #472564 -
Attachment is obsolete: true
Attachment #476727 -
Flags: review?(cdleary)
Assignee | ||
Comment 7•14 years ago
|
||
cdleary: ping?
Comment 8•14 years ago
|
||
njn: sorry for the delay, reviewing now!
Comment 9•14 years ago
|
||
Comment on attachment 476727 [details] [diff] [review] patch (against TM 53640:dfcf5611ce02) Gotta love simple inlining wins!
Attachment #476727 -
Flags: review?(cdleary) → review+
Assignee | ||
Comment 10•14 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/46038daefd9f I re-ran Cachegrind, looks like the access-binary-trees improvement was cannibalized by a different patch (we're down to 24.6M instructions for it now), so the overall instruction count improvement dropped from 1.006x to 1.004x. Still not bad for an easy patch. I didn't get AWFY numbers because the regress-x86 page was disabled when I landed the patch :(
Whiteboard: fixed-in-tracemonkey
Comment 11•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/46038daefd9f
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•