Bug 649576 (Closed) - Extricate JSHashTable from JSAtomList death grip
Opened 10 years ago; closed 10 years ago
Categories: Core :: JavaScript Engine, enhancement
Status: RESOLVED FIXED
People: Reporter: cdleary; Assigned: cdleary
Whiteboard: fixed-in-tracemonkey
Attachments
(2 files, 20 obsolete files)
- 184.36 KB, patch, luke: review+
- 12.68 KB, patch, luke: review+
Per bug 647367 comment 2. On my Mac mini the patch shows no slowdown on parsemark, but I'm increasingly convinced parsemark is useless. I plan to try an njn-style Valgrind icount analysis, since that seemed to give much better measurements in his bugs.

Potential improvements on the WIP patch:

- Get rid of ALE_NEXT hacks; migrate those uses to sensible AtomList API(s).
- Rip out parse node recycling, clear decls across top-level statements, and free the arena to the mark before the current top-level statement.
- Recycle hashtables across top-level statements. If the tables get larger than a certain threshold of space wastefulness, I think we should free them immediately.
- Since the common case is a single piece of atom data (instead of a chain), make the hashtable value a tagged union of (single-data | arena-allocd-linked-list). This gets a little tricky because we declare |JSAtomListElement *|s in a number of places, so we would have to change all those sites to a value-type smart pointer that performs a re-lookup on a hashtable generation change. The benefit of this scheme is that we get exactly the js::HashTable performance in the common case.
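The tagged-union idea in the last bullet can be sketched roughly like this (all names here are hypothetical, not from the actual patch): the low bit of the word distinguishes a lone data pointer, the common case, from a pointer to an arena-style linked list.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a tagged-union map value: either a single data
// pointer (the common case) or a low-bit-tagged pointer to a linked list.
struct ListNode {
    void *data;
    ListNode *next;
};

class AtomMapValue {
    uintptr_t bits;
    static const uintptr_t LIST_TAG = 1;

  public:
    static AtomMapValue fromSingle(void *p) {
        AtomMapValue v;
        v.bits = reinterpret_cast<uintptr_t>(p);
        assert(!v.isList());  // pointers are assumed at least 2-byte aligned
        return v;
    }
    static AtomMapValue fromList(ListNode *l) {
        AtomMapValue v;
        v.bits = reinterpret_cast<uintptr_t>(l) | LIST_TAG;
        return v;
    }

    bool isList() const { return (bits & LIST_TAG) != 0; }
    void *asSingle() const {
        assert(!isList());
        return reinterpret_cast<void *>(bits);
    }
    ListNode *asList() const {
        assert(isList());
        return reinterpret_cast<ListNode *>(bits & ~LIST_TAG);
    }
};
```

In the common single-datum case a lookup then costs exactly one hash-table probe plus a tag check, with no chain traversal.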
Comment 1•10 years ago
woot
Comment 2•10 years ago (Assignee)
Forgot to mention: the hash tables are context-alloc'd, so that reallocs can happen naturally without a weird exponential amount of useless hashtable space in the arena. The recycling would let us reuse the allocated tables for JSAtomLists in non-root JSTreeContexts. Since we don't really know (or care) when parse nodes die and forfeit their hash table memory, this is a "natural" recycling at the top level statement boundaries where we also know that we can safely pop the arena space.
Comment 3•10 years ago
Parsemark the suite of tests is great. parsemark.py the script is less great. I did a Cachegrind run; the results aren't good (ignore the "compiled" columns, they're irrelevant here):

---------------------------------------------------------------
|            millions of instructions executed                |
|        total            |  compiled (may overestimate)      |
---------------------------------------------------------------
|   12.041  12.322 (0.977x) | 158529 148488 (1.068x) | 280slides-comb
|   45.938  47.021 (0.977x) | 802934 758036 (1.059x) | ball-pool-comb
|   23.495  24.456 (0.961x) | 379459 355498 (1.067x) | dojo
|   43.943  45.193 (0.972x) | 733092 697472 (1.051x) | effectgames-ga
|   11.332  11.770 (0.963x) | 170125 157844 (1.078x) | ga
|  239.103 249.239 (0.959x) |  4.834  4.472 (1.081x) | gmail-combined
|   70.134  71.941 (0.975x) |  1.262  1.186 (1.064x) | gravity-combin
|   17.037  17.668 (0.964x) | 263714 250121 (1.054x) | jquery-min
|   29.269  30.023 (0.975x) | 499957 471808 (1.060x) | jsgb-combined
|   43.855  45.375 (0.967x) | 759237 725382 (1.047x) | mochikit
|  197.685 207.897 (0.951x) |  3.652  3.479 (1.050x) | pipio-combined
|   10.148  10.403 (0.976x) | 122256 115829 (1.055x) | processing-twi
|   35.129  35.588 (0.987x) | 319697 301276 (1.061x) | sunspider-comb
|   14.421  14.965 (0.964x) | 224760 208180 (1.080x) | tetris-combine
|   52.882  55.446 (0.954x) | 926760 875973 (1.058x) | twitter-combin
|   54.672  56.361 (0.970x) | 915359 871601 (1.050x) | v8-combined
|    7.967   8.181 (0.974x) | 100088  94473 (1.059x) | yui-min
|  318.889 328.790 (0.970x) |  5.770  5.494 (1.050x) | zimbra-combine
-------
| 1227.949 1272.649 (0.965x) | 21.895 20.663 (1.060x) | all

I've attached the cg_diff output, which gives function-level changes.

Note that inlining causes some funny results, e.g.:

-5,628,012  /home/njn/moz/wsN/js/src/optg32/../jsatom.cpp:JSAtomList::rawLookup(JSAtom*, JSHashEntry**&)
 4,658,748  /home/njn/moz/wsN/js/src/optg32/../jshashtable.h:JSAtomList::rawLookup(JSAtom*, JSAtomRemoval*)

Here the counts for rawLookup are split across two files because rawLookup contains one or more calls to functions in jshashtable.h that have been inlined.

Looks like the main problem is that you're calling malloc and free a lot more. Getting rid of parse node recycling would be cool if it doesn't hurt performance. Last I checked only about 20% of nodes are recycled on typical code.
Comment 4•10 years ago (Assignee)
(In reply to comment #3) Thanks Nick! That's awesome data. I can reduce the allocation overhead in two ways: recycling the tables across top-level statements and arena allocing fewer nodes. I'll compare the arena stats before and after the patch to figure out which is more likely going to fix the allocation overhead.
Comment 5•10 years ago
I have a terrible idea about how to avoid generation checks on hash table entry references (on the grounds that branches are expensive): Have a datatype representing a reference to a table. Implementation: a pointer to a reference. However, these references add themselves to (on construction) and remove themselves from (on destruction) a list owned by the hash table. If we need to resize, we walk the list and fix up pointers immediately. The list would almost always be short, since the entry pointers don't usually live long. Probably overengineering; just occurred to me.
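A rough, self-contained sketch of the scheme in this comment (all names hypothetical, and ints stand in for real hash-table entries): an entry reference registers itself with the table on construction and unregisters on destruction, so a resize can walk the (almost always short) list of live references and retarget their raw pointers instead of every access paying a generation check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

class Table;

// Non-copyable RAII reference into table storage; lives on an intrusive
// doubly-linked list owned by the table so resizes can fix it up.
class EntryRef {
    friend class Table;
    Table *table;
    int *entry;              // raw pointer into table storage
    EntryRef *prev, *next;   // intrusive list links

    EntryRef(const EntryRef &);             // non-copyable
    EntryRef &operator=(const EntryRef &);

  public:
    EntryRef(Table *t, int *e);
    ~EntryRef();
    int &operator*() { return *entry; }
};

class Table {
    friend class EntryRef;
    std::vector<int> storage;
    EntryRef *refs;          // head of the live-reference list

  public:
    explicit Table(size_t n) : storage(n), refs(NULL) {}
    int *lookup(size_t i) { return &storage[i]; }

    // Simulate a resize that moves the entries: walk the live references
    // and retarget each raw pointer into the new storage.
    void grow() {
        int *oldBase = &storage[0];
        std::vector<int> bigger(storage.size() * 2);
        std::copy(storage.begin(), storage.end(), bigger.begin());
        storage.swap(bigger);               // old buffer stays alive in 'bigger'
        int *newBase = &storage[0];
        for (EntryRef *r = refs; r; r = r->next)
            r->entry = newBase + (r->entry - oldBase);
    }
};

EntryRef::EntryRef(Table *t, int *e)
  : table(t), entry(e), prev(NULL), next(t->refs)
{
    if (t->refs)
        t->refs->prev = this;
    t->refs = this;
}

EntryRef::~EntryRef()
{
    if (prev)
        prev->next = next;
    else
        table->refs = next;
    if (next)
        next->prev = prev;
}
```

The cost model is the appeal: construction/destruction touch only a couple of pointer fields, and the fix-up loop runs only on the rare resize, over a list that is nearly always empty or a single element.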
Comment 6•10 years ago
You know, it might be interesting to have that "entry reference" type just keep a ref count in the table when compiled with DEBUG, and assert if we ever add when the refcount is non-zero. It may be that we never hold references across an add, or could readily change the code to make that so.
Comment 7•10 years ago
(In reply to comment #0)
> - Rip out parse node recycling, clear decls across top level statements, and
>   free the arena to the mark before the current top-level statement.
> - Recycle hashtables across top level statements. If the tables get larger than
>   a certain threshold of space wastefulness, I think we should free them
>   immediately.
> - Since the common case is a single piece of atom data (instead of a chain),
>   make the hashtable value a tagged union of (single-data |
>   arena-allocd-linked-list).

Shouldn't these each be separate follow-on patches? And perhaps separate bugs? I wouldn't want to review the whole shebang at once.
Comment 8•10 years ago (Assignee)
(In reply to comment #5, comment #6, and comment #7)

Ideally, we could make it obvious that there are no pointers into table space kept across potential adds and bolster that with assertions to avoid future wrongdoing. There's no overhead associated with that. In a recursive descent parser, though, even with the assertions it's a bit tricky to ensure that it will be used correctly going forward -- calls into sub-productions seem innocent enough but can have unapparent call sequences that lead to adds. I don't think you want to rely on fuzzing catching newly introduced dangling-pointer vulnerabilities, but I may be a bit too far on the fearful-and-defensive side here. We can also measure the cost of doing nothing versus Jim's solution in comment 5 -- consing up smart-pointer-like structures on the stack seems like it shouldn't add much overhead and can guarantee us safety in the unlikely case.

I usually do each step that creates a working, test-passing build as its own patch, with multiple of these in a single bug to create useful units of work. I.e., if we can't make this beneficial without ripping out recycling, then that should be a part of this bug's patch queue. If not, I totally agree, we should make it a follow-up bug!
Comment 9•10 years ago
(In reply to comment #3)
> Getting rid of parse node recycling would be cool if it doesn't hurt
> performance. Last I checked only about 20% of nodes are recycled on typical
> code.

The reason I'd like to get rid of it is that the node recycling, combined with the decls retained between iterations of the big loop in compileScript, creates a pretty complicated set of invariants that I'm afraid people are going to trip over. When I went in there last fall to clean up an apparently trivial quadratic-time issue, it took me a lot of effort to figure out why things were working; it was much more subtle than I'd expected. See the comments for cleanFunctionList, those in PushNodeChildren, and PrepareNodeForMutation. The comment above RecycleTree beginning "You must not..." is the kind of thing I hope never to have to write.
Comment 10•10 years ago (Assignee)
Attachment #525633 - Attachment is obsolete: true
Attachment #526162 - Attachment is obsolete: true
Comment 11•10 years ago (Assignee)
This patch is much closer to performance parity.

MochiKit.js            100.97% (slower)   +696,246 icount
processing_twitch.js   101.59% (slower)   +229,873 icount
jquery.min.js          101.77% (slower)   +478,128 icount
dojo.js                101.78% (slower)   +670,803 icount
jsgb_combined.js       101.32% (slower)   +684,673 icount
280slides_combined.js  100.48% (slower)    +83,094 icount
gravity_combined.js    100.20% (slower)   +222,919 icount
twitter_combined.js    101.30% (slower) +1,231,109 icount
v8_combined.js         101.15% (slower) +1,068,429 icount
zimbra_combined.js     100.65% (slower) +3,505,647 icount
tetris_combined.js     101.05% (slower)   +238,876 icount
effectgames_galaxy.js  100.62% (slower)   +430,116 icount
pipio_combined.js      101.52% (slower) +5,478,139 icount
gmail_combined.js      100.26% (slower) +1,078,910 icount
sunspider_combined.js  100.35% (slower)   +196,844 icount
yui-min.js             101.19% (slower)   +125,771 icount
ball_pool_combined.js  100.55% (slower)   +404,643 icount
ga.js                  101.48% (slower)   +258,201 icount
Average percent: 101.01%

Here's the valgrind-based breakdown: The good news is that parsing gmail is ~.6Mcycles faster, reflecting the increased performance of js::HashTable. However, we end up using ~2.6Mcycles on memset because of map recycling.

We allocate ~1500 maps for the larger programs, and we see about a 1Mcycle bump in malloc time, 80% of which can be attributed to the map allocation path. This comes out to 533 cycles per map, which I think is probably a little bit unrealistic. Not sure of the details of valgrind's malloc implementation and accounting -- Nick, any perspective there?

I think I may still have a few performance ideas up my sleeve, so I'll spin on this for another day.
Attachment #535505 - Attachment is obsolete: true
Comment 12•10 years ago
(In reply to comment #11)
> We allocate ~1500 maps for the larger programs, and we see about a 1Mcycle
> bump in malloc time, 80% of which can be attributed to the the map
> allocation path. This comes out to 533 cycles per map, which I think is
> probably a little bit unrealistic. Not sure of the details of valgrind's
> malloc implementation and accounting -- Nick, any perspective there?

If you're running Cachegrind (or Callgrind), Valgrind doesn't replace malloc with its own version.

> I think I may still have a few performance ideas up my sleeve, so I'll spin
> on this for another day.

I'll do a measurement run on Monday with my scripts.
Comment 13•10 years ago (Assignee)
(In reply to comment #12)
> I'll do a measurement run on Monday with my scripts.

Do your scripts do a more complex cycle estimation than just icount? If so, and it doesn't require Valgrind-internals knowledge, I'd be interested in a blog entry or a quick how-to about how you set it up!

I'm actually trying a different tack that I hope will avoid most of the heavy map-clear overhead. I'm kind of obsessed with making this patch viable, so I should have something soon.
Comment 14•10 years ago
(In reply to comment #13)
> > I'll do a measurement run on Monday with my scripts.
>
> Do your scripts do a more complex cycle estimation than just icount?

No, but they can give you per-line icount (and cache/branch stats) and also give you function-level diffs of those stats.

> If so, and it doesn't require Valgrind-internals knowledge, I'd be
> interested in a blog entry or a quick how-to about how you set it up!

See bug 555106. I'm not sure how many bells and whistles I've added to my versions of the scripts since that bug was last modified.
Comment 15•10 years ago (Assignee)
(In reply to comment #14)
> No, but they can give you per-line icount (and cache/branch stats) and also
> give you function-level diffs of those stats.

Ah okay, thanks for the background! I just have a thing that does simple icount differentials for parsemark, since I can't get good execution-time variance out of that suite running it directly.

http://hg.mozilla.org/users/cleary_mozilla.com/spidermonkey_helpers/file/tip/vg_diff.py

Despite the coarseness of the measurement, I intend to make it go down! Or, failing that, at least demonstrate that the real-world performance is improved using luke's instrumentation branch. I'd like to win on all fronts, if possible, though. :-)
Comment 16•10 years ago
I tried measuring this. First, it wouldn't build with optimization enabled on my Linux box. I had to move the JSParseMapPool destructor into jsatominlines.h. Second, it crashed here:

==3959== Invalid read of size 4
==3959==    at 0x80AD3FB: JSParseMapPool::allocate() (jshashtable.h:89)
==3959==    by 0x811E3D2: js::Parser::parse(JSObject*) (jsatom.h:1042)
==3959==    by 0x804B19E: Parse(JSContext*, unsigned int, unsigned long long*) (js.cpp:4486)
==3959==    by 0x8295D19: js::Interpret(JSContext*, js::StackFrame*, unsigned int, js::InterpMode) (jscntxtinlines.h:277)
==3959==    by 0x80C7390: js::RunScript(JSContext*, JSScript*, js::StackFrame*) (jsinterp.cpp:613)
==3959==    by 0x80C765C: js::Execute(JSContext*, JSObject&, JSScript*, js::StackFrame*, unsigned int, js::Value*) (jsinterp.cpp:974)
==3959==    by 0x805AE90: JS_ExecuteScript (jsapi.cpp:4935)
==3959==    by 0x804E22E: Process(JSContext*, JSObject*, char*, int) (js.cpp:452)
==3959==    by 0x804ED81: Shell(JSContext*, int, char**, char**) (js.cpp:975)
==3959==    by 0x804EFD2: main (js.cpp:6051)
==3959==  Address 0x78 is not stack'd, malloc'd or (recently) free'd

I'll wait for a new patch.
Comment 17•10 years ago (Assignee)
Okay, this one seems a lot more promising -- a linear map that turns itself into a hash table after an insertion-count threshold. I haven't really performance tuned it yet and the Valgrind icount increase is generally under 0.3%. When we take data locality into account I think this will be a clear winner, but I'll look at some stats to see if there are any obvious tweaks I can make to match the usage behavior.

280slides_combined.js  0.33% (slower) icount    +57,153 /  17,157,259
MochiKit.js            0.23% (slower) icount   +162,445 /  71,778,386
ball_pool_combined.js  0.40% (slower) icount   +296,220 /  73,141,408
dojo.js                0.53% (slower) icount   +199,193 /  37,588,057
effectgames_galaxy.js  0.31% (slower) icount   +210,861 /  69,104,363
ga.js                  0.49% (slower) icount    +85,492 /  17,500,568
gmail_combined.js      0.10% (slower) icount   +403,481 / 422,632,513
gravity_combined.js    0.29% (slower) icount   +328,324 / 112,934,758
jquery.min.js          0.64% (slower) icount   +173,079 /  26,944,963
jsgb_combined.js       0.48% (slower) icount   +247,726 /  51,749,415
pipio_combined.js      0.21% (slower) icount   +741,187 / 360,758,754
processing_twitch.js   0.39% (slower) icount    +57,071 /  14,468,190
sunspider_combined.js  0.15% (slower) icount    +81,945 /  55,920,531
tetris_combined.js     0.36% (slower) icount    +81,357 /  22,761,274
twitter_combined.js    0.26% (slower) icount   +247,818 /  94,875,359
v8_combined.js         0.36% (slower) icount   +332,863 /  92,914,476
yui-min.js             0.16% (slower) icount    +16,795 /  10,574,145
zimbra_combined.js     0.29% (slower) icount +1,547,724 / 537,160,070
Average percent: 0.33%

Nick, this is diffed against qparent and definitely builds on my machine; let me know if you have any other issues.
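The shape of the data structure being described might look something like this (an illustrative toy, not the patch's actual LinMap; SmallMap and Threshold are made-up names and std::unordered_map stands in for js::HashMap): entries live in a small inline array until an insertion-count threshold is crossed, after which the contents migrate into a hash table.

```cpp
#include <cstddef>
#include <unordered_map>

// Toy sketch of a "linear map that becomes a hash table": keys and values
// sit in a fixed inline array while the map is small, trading O(n) lookup
// for locality and zero heap traffic; past Threshold insertions, the
// contents migrate into a real hash table.
template <class K, class V, size_t Threshold>
class SmallMap {
    struct Entry { K key; V value; };
    Entry linear[Threshold];
    size_t count;
    bool usingTable;
    std::unordered_map<K, V> table;

  public:
    SmallMap() : count(0), usingTable(false) {}

    V *lookup(const K &k) {
        if (usingTable) {
            typename std::unordered_map<K, V>::iterator it = table.find(k);
            return it == table.end() ? nullptr : &it->second;
        }
        for (size_t i = 0; i < count; ++i)   // linear scan: cheap for small N
            if (linear[i].key == k)
                return &linear[i].value;
        return nullptr;
    }

    void put(const K &k, const V &v) {
        if (V *existing = lookup(k)) {
            *existing = v;
            return;
        }
        if (!usingTable && count == Threshold) {
            for (size_t i = 0; i < count; ++i)  // migrate to the hash table
                table[linear[i].key] = linear[i].value;
            usingTable = true;
        }
        if (usingTable) {
            table[k] = v;
        } else {
            linear[count].key = k;
            linear[count].value = v;
            ++count;
        }
    }

    bool isLinear() const { return !usingTable; }
};
```

Since most parser scopes hold only a handful of bindings, most maps never leave the linear state, which is where the locality win comes from.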
Attachment #535742 - Attachment is obsolete: true
Comment 18•10 years ago (Assignee)
A few additional things I forgot to note:

1) I'm working through some reftest failures.

2) This approach uses tagged unions for everything to match the existing map interface and to avoid creating dual interfaces (i.e. LinMap::asLin() and LinMap::asMap() based on LinMap::isLin()) -- that would make it a bit harder to use, as after every mutating operation you would have to re-query which interface you should be using. I'm not totally sure how good modern compilers are at tracking bit-level constness (i.e. inlining and culling out obvious tagged-pointer relationships due to bit-const propagation), and a tagged-union-heavy approach like this may not be as great on compilers that don't have that capability. I think I remember a presentation that said it was implemented in GCC 4.5, but I would have to check, look it up for MSVC, and test perf on OS X, which is stuck on the old version.
Comment 19•10 years ago (Assignee)
Fixed reftests, but I break a bunch of strict aliasing constraints. Tough to type-pun stuff with constructors (like the Map::Ptr types).
Attachment #536154 - Attachment is obsolete: true
Comment 20•10 years ago (Assignee)
Quarter million cycles faster on gmail_combined.js according to callgrind.
Attachment #536179 - Attachment is obsolete: true
Comment 21•10 years ago (Assignee)
Comparing 32b opt --disable-methodjit --disable-tracejit, running prog: "compile(snarf('$filename'))", comparison off revision 4e334541e521, I get the following results:

280slides_combined.js  -0.78% (faster) icount   -128,673 /  16,417,487
MochiKit.js            -1.04% (faster) icount   -689,039 /  66,163,309
ball_pool_combined.js  -0.82% (faster) icount   -561,317 /  68,663,748
dojo.js                -1.00% (faster) icount   -339,026 /  33,804,941
effectgames_galaxy.js  -0.96% (faster) icount   -642,047 /  66,643,380
ga.js                  -1.09% (faster) icount   -166,734 /  15,338,031
gmail_combined.js      -1.87% (faster) icount -6,994,013 / 374,530,208
gravity_combined.js    -1.22% (faster) icount -1,298,662 / 106,731,963
jquery.min.js          -0.83% (faster) icount   -202,778 /  24,287,716
jsgb_combined.js       -0.65% (faster) icount   -293,013 /  45,003,228
pipio_combined.js      -1.28% (faster) icount -4,023,711 / 314,805,480
processing_twitch.js   -0.36% (faster) icount    -47,304 /  13,202,756
sunspider_combined.js  -0.30% (faster) icount   -172,029 /  57,664,180
tetris_combined.js     -1.20% (faster) icount   -243,529 /  20,280,597
twitter_combined.js    -1.30% (faster) icount -1,083,595 /  83,141,004
v8_combined.js         -0.64% (faster) icount   -543,438 /  84,895,403
yui-min.js             -0.88% (faster) icount    -84,607 /   9,614,643
zimbra_combined.js     -0.76% (faster) icount -3,852,091 / 509,705,928

memcheck comes up clean, except for a free/delete mismatch in regexp stuff (unrelated). There are a few more things to try (e.g. bifurcate the LinMap interface and see how much faster we get without tagged unions) and I'm collecting stats on the usage patterns to see if there's more perf we can get by using different recycling bins.
Attachment #536185 - Attachment is obsolete: true
Comment 22•10 years ago
Good stuff. Maybe also worth a quick check of D1 misses (D1mr, D1mw) and mispredicts (Bcm, Bim) ? Since you can get them at zero hassle from cachegrind, and they're ungood if they happen.
Comment 23•10 years ago
Yeah, in this day and age, with a good benchmark dataset, I'd think the cache miss figures would be more significant than the instruction counts. Isn't that so?
Comment 24•10 years ago (Assignee)
Looks like I1 and D1 misses go up, which is surprising, since the JSAtomList generally uses data strewn around parts of the arena-allocated space.
Attachment #536228 - Attachment is obsolete: true
Comment 25•10 years ago (Assignee)
Branch mispredict rate is down, though, according to branch-sim.
Attachment #536378 - Attachment is obsolete: true
Comment 26•10 years ago
(In reply to comment #23)
> Yeah, in this day and age, with a good benchmark dataset, I'd think the
> cache miss figures would be more significant than the instruction counts.
> Isn't that so?

Not really. You can always find exceptions, but icount usually tracks speed pretty closely, IME.
Comment 27•10 years ago (Assignee)
Lazily initializes the kinds of maps that are mostly dead (i.e. are taken from the pool and returned without any operations being performed on them). 1.5% icount decrease for gmail_combined.js (32b no methodjit no tracejit).
Attachment #536408 - Attachment is obsolete: true
Comment 28•10 years ago (Assignee)
Tried adding chunk allocation, and it brings gmail_combined.js to a 1.7% icount decrease. I think there are a lot of smaller wins like this that we can get with the new approach, but I have to get back to GC work, so I'm going to just try the dual-interface approach before cleaning this patch up for review. For posterity, additional optimization ideas are:

- Bucketized recycling: how many times does a map get reused? Can we anticipate the context in which we'll want to recycle a map instead of something in the linear state?
- No-shrink on maps: removal may be costing us some unnecessary table downsizes.
- Tweaking: chunk allocation size growth, as given in this patch, and the number of inline elements (potentially to be combined with bucketizing).
Comment 29•10 years ago (Assignee)
Forgot to update compileFunctionBody similarly to compileScript, which prevented browser runs from working.
Attachment #536533 - Attachment is obsolete: true
Comment 30•10 years ago (Assignee)
Comment 31•10 years ago (Assignee)
Fixes errors related to strict aliasing violations and eliminates some unnecessary tagged union overhead. I made a thinko WRT the strict aliasing correctness in the previous patches. This one drops the icount for gmail_combined.js by 2.22% (8.7Minst). f?=luke for measure branch! qparent is b36861bd7a01
Attachment #536529 - Attachment is obsolete: true
Attachment #536711 - Attachment is obsolete: true
Attachment #536732 - Attachment is obsolete: true
Attachment #536845 - Flags: feedback?(luke)
Comment 32•10 years ago (Assignee)
Doesn't seem to make much difference when I try it on luke's measure branch, though there are only 13,000 total map allocations this way loading tech crunch. After luke gives his feedback on measure perf I'm gonna button this up. We know it's at least close to parity and have good reason to believe it will be faster. Let's get this sucker in the tree and iterate on it.
Attachment #536845 - Attachment is obsolete: true
Attachment #536845 - Flags: feedback?(luke)
Attachment #537092 - Flags: feedback?(luke)
Comment 33•10 years ago
Comment on attachment 537092 [details] [diff] [review]
WIP: using LinMap and GC-based recycling

I tried the most recent patch out and was able to get low enough variance to measure some distinct changes. (I disabled flash for starters :)

With the previous patch (comment 31), there was a distinct 4ms increase in compilation time on startup over trunk (from 18ms to 22ms) (this after the fastload cache was in effect; without the fastload cache, parse/emit is ~120ms). This patch puts timing exactly back to trunk's 18ms.

(Now, comparing to trunk:) For techcrunch (which, btw, spends 1.3 SECONDS in parse/emit!), I see a fairly distinct ~20ms improvement. For Gmail, there is a fairly distinct ~5ms improvement, and for Zimbra, a ~5-10ms improvement. Nice work!
Attachment #537092 - Flags: feedback?(luke) → feedback+
Comment 34•10 years ago (Assignee)
(In reply to comment #33) Joy! Will clean up and push to try. Review incoming.
Comment 35•10 years ago
Nice, good to see the old C-era fusion of the chained hash table and atom list yield to sweet C++, with wins and science and stuff. Is that TechCrunch compiler benchmark filed yet? That is one to profile for sure. /be
Comment 36•10 years ago
The front page of techcrunch.com is horrific. It creates over 70 compartments and has an absurd number of iframes.
Comment 37•10 years ago
I want to measure this, but the patch doesn't apply cleanly to TM 69771:7e6f3b179644. cdleary, can you update or give me instructions on how to apply? Thanks.
Comment 38•10 years ago (Assignee)
This is the patch I just pushed to try. f?= njn, since he said he wanted to check it out. Should be up for review in a few hours, just gotta do some tidying.
Attachment #537092 - Attachment is obsolete: true
Attachment #538541 - Flags: feedback?(nnethercote)
Comment 39•10 years ago
Thanks! I won't be able to measure these until Tuesday morning Oz time; it's a public holiday here on Monday. Since you and Luke both have measured good improvements I'm happy if you want to land the patch before then. I'll post the results I get, hopefully they'll be positive.
Comment 40•10 years ago (Assignee)
Attachment #538541 - Attachment is obsolete: true
Attachment #538541 - Flags: feedback?(nnethercote)
Attachment #538617 - Flags: review?(luke)
Comment 41•10 years ago
I get this:

---------------------------------------------------------------
|            millions of instructions executed                |
|        total            |  compiled (may overestimate)      |
---------------------------------------------------------------
|   12.874  12.851 (1.002x) | 156034 140882 (1.108x) | 280slides-comb
|   48.727  48.782 (0.999x) | 795333 728796 (1.091x) | ball-pool-comb
|   24.401  24.298 (1.004x) | 377107 337640 (1.117x) | dojo
|   49.296  49.283 (------) | 714948 660491 (1.082x) | effectgames-ga
|   10.802  10.758 (1.004x) | 167568 147764 (1.134x) | ga
|  246.473 245.548 (1.004x) |  4.815  4.256 (1.131x) | gmail-combined
|   74.173  74.046 (1.002x) |  1.246  1.133 (1.100x) | gravity-combin
|   17.728  17.666 (1.004x) | 258524 235208 (1.099x) | jquery-min
|   31.803  31.767 (1.001x) | 496640 455069 (1.091x) | jsgb-combined
|   47.342  47.290 (1.001x) | 751059 693689 (1.083x) | mochikit
|  215.987 215.662 (1.002x) |  3.605  3.278 (1.100x) | pipio-combined
|   10.651  10.636 (1.001x) | 118534 108208 (1.095x) | processing-twi
|   50.606  50.596 (------) | 316631 290078 (1.092x) | sunspider-comb
|   14.379  14.296 (1.006x) | 221512 195119 (1.135x) | tetris-combine
|   57.609  57.432 (1.003x) | 912124 822615 (1.109x) | twitter-combin
|   63.254  63.319 (0.999x) | 908620 838573 (1.084x) | v8-combined
|    7.558   7.520 (1.005x) |  98370  88836 (1.107x) | yui-min
|  356.488 356.868 (0.999x) |  5.682  5.245 (1.083x) | zimbra-combine
-------
| 1340.161 1338.625 (1.001x) | 21.643 19.657 (1.101x) | all

Good enough. BTW, can you put a comment at the top of LinMap.h explaining what a LinMap is? Looking through the code I think it's like a HashMap, except an array is used if it's small enough. But a one-sentence comment would have saved me having to work that out.
Comment 42•10 years ago
Don't let me be the boring-name guy, but LinMap isn't all that exciting. Could we have a name that spells out what this thing is better, without being unbearably long? /be
Comment 43•10 years ago
(In reply to comment #41)
> BTW, can you put a comment at the top of LinMap.h explaining what a LinMap
> is? Looking through the code I think it's like a HashMap, except an array
> is used if it's small enough. But a one-sentence comment would have saved
> me having to work that out.

ChintzyMap!!!
Comment 44•10 years ago (Assignee)
(In reply to comment #41)
> I get this:

What program are you running under valgrind? With |compile(snarf($filename))| I got a few percent improvement on each.
Comment 45•10 years ago
(In reply to comment #44)
> What program are you running under valgrind? With
> |compile(snarf($filename))| I got a few percent improvement on each.

I've been doing |parse(snarf($filename))|. The results in comment 41 are for a 32-bit Linux build. For 64-bit I get this:

---------------------------------------------------------------
|            millions of instructions executed                |
|        total            |  compiled (may overestimate)      |
---------------------------------------------------------------
|   11.645  11.668 (0.998x) |      0      0 (------) | 280slides-comb
|   44.626  44.816 (0.996x) |      0      0 (------) | ball-pool-comb
|   22.128  22.219 (0.996x) |      0      0 (------) | dojo
|   45.072  45.201 (0.997x) |      0      0 (------) | effectgames-ga
|    9.747   9.792 (0.995x) |      0      0 (------) | ga
|  225.913 226.145 (0.999x) |      0      0 (------) | gmail-combined
|   67.782  67.964 (0.997x) |      0      0 (------) | gravity-combin
|   16.071  16.146 (0.995x) |      0      0 (------) | jquery-min
|   28.889  28.978 (0.997x) |      0      0 (------) | jsgb-combined
|   43.263  43.447 (0.996x) |      0      0 (------) | mochikit
|  197.389 198.058 (0.997x) |      0      0 (------) | pipio-combined
|    9.615   9.647 (0.997x) |      0      0 (------) | processing-twi
|   45.759  45.815 (0.999x) |      0      0 (------) | sunspider-comb
|   13.003  13.041 (0.997x) |      0      0 (------) | tetris-combine
|   52.403  52.686 (0.995x) |      0      0 (------) | twitter-combin
|   57.621  57.836 (0.996x) |      0      0 (------) | v8-combined
|    6.809   6.822 (0.998x) |      0      0 (------) | yui-min
|  324.926 326.953 (0.994x) |      0      0 (------) | zimbra-combine
-------
| 1222.671 1227.243 (0.996x) |      0      0 (------) | all

So a net increase in instructions.

Here are the top 10 entries in the function-level diff:

--------------------------------------------------------------------------------
        Ir  file:function
--------------------------------------------------------------------------------
-4,079,961  /home/njn/moz/tmN/js/src/jsatom.cpp:JSAtomList::rawLookup(JSAtom*, JSHashEntry**&)
 3,555,503  /home/njn/moz/tmN/js/src/o64/../mfbt/LinMap.h:js::Parser::primaryExpr(js::TokenKind, int)
 2,130,408  /home/njn/moz/tmN/js/src/jsparse.cpp:_ZL11RecycleTreeP11JSParseNodeP13JSTreeContext.clone.207
-1,814,702  /home/njn/moz/tmN/js/src/jsatom.cpp:JSAtomList::add(js::Parser*, JSAtom*, JSAtomList::AddHow)
-1,485,356  /home/njn/moz/tmN/js/src/jshash.cpp:JS_HashTableRawLookup
 1,028,309  /home/njn/moz/tmN/js/src/o64/../mfbt/LinMap.h:LeaveFunction(JSParseNode*, JSTreeContext*, JSAtom*, unsigned int)
-1,025,610  /home/njn/moz/tmN/js/src/jsparse.cpp:js::Parser::memberExpr(int)
   602,625  /home/njn/moz/tmN/js/src/jshashtable.h:js::detail::HashTable<js::HashMap<JSAtom*, JSDefinition*, js::DefaultHasher<JSAtom*>, js::ContextAllocPolicy>::Entry, js::HashMap<JSAtom*, JSDefinition*, js::DefaultHasher<JSAtom*>, js::ContextAllocPolicy>::MapHashPolicy, js::ContextAllocPolicy>::lookupFo
  -565,149  /home/njn/moz/tmN/js/src/jsatom.cpp:js_alloc_temp_entry(void*, void const*)
   517,378  /build/buildd/eglibc-N.13/malloc/malloc.c:_int_malloc
Comment 46•10 years ago (Assignee)
(In reply to comment #45)
> I've been doing |parse(snarf($filename))|.

Ah, okay! There are a few reasons why parse isn't as cool/applicable as compile here.

- parse builds a full AST by running Parser::statements() at the top level. compile, on the other hand, is able to throw away the parse nodes after every top-level script statement, like when you call into the JSAPI functions that trigger compilation. This gives us real-world opportunities to optimize by reusing resources across top-level statements (which also helps this patch).
- parse, understandably, doesn't invoke the emitter. However, the optimizations made in this patch for the parser data structures are all carrying information from parse time to emit time. The efficiency of the resulting emitter access of those data structures is also pertinent in the calculation. The fact that we do, in a loop in compileScript, parse-then-fold-then-emit means that the data structure usage for these hand-offs can actually be very important.
- Given that the emitter is important: the emitter acts slightly differently in COMPILE_N_GO situations than when COMPILE_N_GO is disabled (tracking global slots to bind slot-access ops to). Some of the maps affected by this patch are used exclusively when COMPILE_N_GO is on in the emitter. With 3ab18e7ea3d5, COMPILE_N_GO is enabled for the compile builtin in the shell.
Comment 47•10 years ago
Here are 32-bit results using compile():

---------------------------------------------------------------
|            millions of instructions executed                |
|        total            |  compiled (may overestimate)      |
---------------------------------------------------------------
|   19.005  18.816 (1.010x) | 273499 245788 (1.113x) | 280slides-comb
|   77.166  76.743 (1.006x) |  1.392  1.288 (1.081x) | ball-pool-comb
|   39.966  39.631 (1.008x) | 648105 595174 (1.089x) | dojo
|   74.536  73.860 (1.009x) |  1.231  1.135 (1.084x) | effectgames-ga
|   18.502  18.330 (1.009x) | 295183 268362 (1.100x) | ga
|  450.337 444.947 (1.012x) |  8.570  7.686 (1.115x) | gmail-combined
|  119.184 118.117 (1.009x) |  2.205  2.034 (1.084x) | gravity-combin
|   28.635  28.446 (1.007x) | 446620 413580 (1.080x) | jquery-min
|   54.861  54.340 (1.010x) | 862931 788815 (1.094x) | jsgb-combined
|   76.033  75.346 (1.009x) |  1.300  1.212 (1.073x) | mochikit
|  381.542 377.491 (1.011x) |  6.417  5.885 (1.090x) | pipio-combined
|   15.480  15.414 (1.004x) | 200344 187000 (1.071x) | processing-twi
|   61.310  61.114 (1.003x) | 546636 507611 (1.077x) | sunspider-comb
|   24.132  23.873 (1.011x) | 390848 354955 (1.101x) | tetris-combine
|  100.654  99.520 (1.011x) |  1.623  1.483 (1.094x) | twitter-combin
|   99.762  99.317 (1.004x) |  1.577  1.465 (1.076x) | v8-combined
|   11.364  11.267 (1.009x) | 160279 147567 (1.086x) | yui-min
|  574.182 571.508 (1.005x) | 10.234  9.494 (1.078x) | zimbra-combine
-------
| 2226.660 2208.090 (1.008x) | 38.377 35.195 (1.090x) | all

Much better! Thanks for the detailed comparison of parse() and compile(); I didn't know about that. I've changed my parsemark installation to use compile(); even though it measures emitting as well as parsing, it more accurately reflects what the browser is doing.
Comment 48•10 years ago
Comment on attachment 538617 [details] [diff] [review]
Extricate with zeal!

Review of attachment 538617 [details] [diff] [review]:
-----------------------------------------------------------------

Great job unraveling all this! In general, looking good so far; here is an incremental batch of comments:

::: js/src/jsapi.cpp
@@ -4558,5 @@
>     else
>         chars = InflateString(cx, bytes, &length);
>     if (!chars)
>         return JS_TRUE;
> -

You probably didn't mean to delete this well-meaning \n.

::: js/src/jshashtable.h
@@ +585,5 @@
> +    /*
> +     * N.B. Should only be called when the interface knows the entry may be
> +     * cleared as a POD type.
> +     */
> +    void turboClear()

I'd rather not add this method. Instead, we should be able to use IsPodType to automatically choose between the loop vs. memset. To do this we'll need to hoist HashTable::Entry and HashMap::Entry out of their respective templates so that we can specialize IsPodType inductively:

  template <class T>
  struct IsPodType<HashTableEntry<T> > {
      static const bool result = IsPodType<T>::result;
  };

  template <class K, class V>
  struct IsPodType<HashMapEntry<K, V> > {
      static const bool result = IsPodType<K>::result && IsPodType<V>::result;
  };

::: js/src/jsprvtd.h
@@ +195,5 @@
> } /* namespace js */
>
> +typedef js::LinMap<JSAtom *, JSDefinition *, 24> JSAtomDefnMap;
> +typedef js::LinMap<JSAtom *, jsatomid, 24> JSAtomIndexMap;
> +typedef js::LinMap<JSAtom *, JSDefnOrHeader, 24> JSAtomDOHMap;

I see a lot of JSAtomDOHMap::Range, ::AddPtr, etc. Perhaps you can have typedefs for AtomDOHRange, AtomDOHAddPtr, etc.?

::: js/src/mfbt/LinMap.h
@@ +44,5 @@
> +#include "jshashtable.h"
> +
> +namespace js {
> +
> +static const uintptr_t UNTAG_PTR_MASK = uintptr_t(-1) - 1;

Could this be a member static const of LinMap?

@@ +48,5 @@
> +static const uintptr_t UNTAG_PTR_MASK = uintptr_t(-1) - 1;
> +
> +/*
> + * If a zero value can be reserved for use by the LinMap as an "invalid" key,
> + * it is usable by the LinMap.
I think you want to strengthen this statement: "An element may only be used in a LinMap if zero is an invalid value (and thus may be used as a tombstone by LinMap)."

@@ +56,5 @@
> + */
> +template <typename T> struct ZeroIsReserved { static const bool result = false; };
> +template <typename T> struct ZeroIsReserved<T *> { static const bool result = true; };
> +
> +template <typename K, typename V, size_t inlineElems>

Could you s/inlineElems/InlineElems/ to indicate compile-time const-ness.

@@ +77,5 @@
> +    InlineElem lin[inlineElems];
> +    WordMap map;
> +    size_t linNext;
> +    size_t linCount;
> +    bool mapInitialized;

You can remove this member and just use map.initialized().

@@ +101,5 @@
> +            return false;
> +        mapInitialized = true;
> +    }
> +
> +    for (InlineElem *it = lin; it != lin + linNext; ++it) {

Is LICM good enough to hoist "lin + linNext"? There are function calls in the body which may make the compiler conservatively recompute every time. To be safe, let's just follow ye olde idiom:

  for (InlineElem *it = lin, *end = lin + linNext; ...

@@ +110,5 @@
> +            return false;
> +    }
> +
> +    JS_ASSERT(map.count() == linCount);
> +    linNext = inlineElems + 1;

JS_ASSERT(!usingMap()); here would be nice.

@@ +123,5 @@
> +    }
> +
> +    ~LinMap() {
> +        if (mapInitialized)
> +            map.finish();

Shouldn't map's destructor handle this?

@@ +150,5 @@
> +        friend class LinMap;
> +
> +        WordMapPtr mapPtr;
> +        InlineElem *linPtr;
> +        bool isLinPtr;

Could we tighten up the invariants in Ptr so that:

  found() === !!linPtr
  isLinPtr === linPtr != 0x1

Then we could remove isLinPtr (which doesn't matter space-wise, but it would allow found() to be branch-less and key() and value() to require one less load / keep one less register live).

@@ +164,5 @@
> +        Ptr(const Ptr &other) : isLinPtr(other.isLinPtr) {
> +            if (isLinPtr)
> +                linPtr = other.linPtr;
> +            else
> +                mapPtr = other.mapPtr;

Default copy ctor should just work, without branching.
::: js/src/parser/AtomData.cpp
@@ +98,5 @@
> +    return true;
> +}
> +
> +JSAtomDeclNode *
> +JSAtomDecls::lastAsNode(JSAtomDOHMap::AddPtr &p)

IIUC, you can just take a "JSDefnOrHeader &" instead of the AddPtr type.

::: js/src/parser/AtomData.h
@@ +38,5 @@
> + *
> + * ***** END LICENSE BLOCK ***** */
> +
> +#ifndef AtomData_h__
> +#define AtomData_h__

I would prefer you merged the AtomData files with the ParseMapPool files since they are all closely associated concepts. ParseMaps{.h, .cpp, -inl.h} makes sense to me. Also, can you strip the JS prefix from JSAtomThingMapPtr, JSDefnOrHeader, JSAtomDecls, JSMultiDeclLookup, and JSAtomDeclsLookup and put them in namespace js?

@@ +49,5 @@
> + * name may reference. If a with block or a function that calls eval encloses
> + * the use, the name may end up referring to something else at runtime.
> + */
> +
> +/* Note: POD-type. */

Seems like a weird place to put this comment. Perhaps merge it with the above comment and explain why this is necessary.

@@ +61,5 @@
> +    void finish(JSContext *cx);
> +
> +    bool hasMap() const { return !!map; }
> +    Map *getMap() { return map; }
> +    void setMap(Map *newMap) { map = newMap; }

JS_ASSERT(!map);

@@ +80,5 @@
> +    if (p) {
> +        p.value() = defn;
> +        return true;
> +    }
> +    return map->add(p, atom, defn);

For consistency, rename to "put" and replace the body with HashMap::put.

::: js/src/parser/ParseMapPool-inl.h
@@ +70,5 @@
> +/* Arbitrary atom map type, that has keys and values of the same kind. */
> +typedef JSAtomIndexMap AtomMapT;
> +
> +static AtomMapT *
> +AsAtomMap(void *ptr)

Seems like this can be a private static helper of ParseMapPool.

::: js/src/parser/ParseMapPool.cpp
@@ +78,5 @@
> +        return NULL;
> +
> +    if (!all.append(map) || !recyclable.reserve(all.length())) {
> +        cx->delete_<AtomMapT>(map);
> +        return NULL;

You could all.reserve/recyclable.reserve above new_ and then infallibleAppend after new_; that way you don't have to call free_ on the error path.
::: js/src/parser/ParseMapPool.h
@@ +49,5 @@
> + * A pool that permits the reuse of the backing storage for the defn, index, or
> + * defn-or-header (multi) maps.
> + *
> + * The pool owns all the maps that are given out, and is responsible for
> + * relinquishing all resources when |destroyAll| is triggered.

s/destroyAll/purgeAll/

@@ +53,5 @@
> + * relinquishing all resources when |destroyAll| is triggered.
> + *
> + * Tables that have changed size upwards are not recycled, because a |remove|
> + * from one such hash table causes the table to change size downwards. However,
> + * Performing frees on the (hot) release path is costly, so we use a delayed

This seems like an impl detail that could just be put in the relevant member function.

@@ +71,5 @@
> +    void recycle(void *map) {
> +        JS_ASSERT(map);
> +#ifdef DEBUG
> +        bool ok = false;
> +        /* Make sure the map is in all but not already in recyclable. */

Put '' or || around 'all'.

@@ +104,5 @@
> +    }
> +
> +    /* Fallibly acquire one of the supported map types from the pool. */
> +    template <typename T>
> +    T *acquire();

To be symmetric with release() and save typing, you could replace the current specialization-based approach with simple inline overloads: acquireAtomIndexMap(), acquireAtomDefnMap(), etc.
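For readers following the review: the LinMap structure under discussion stores its first few entries in an inline array and migrates everything to a real hash map once that array overflows. The sketch below is a simplified, hypothetical rendering of that design; it uses std::unordered_map as a stand-in for js::HashMap, and it omits the tagged Ptr/AddPtr machinery and the zero-key tombstone convention that the real class uses.

```cpp
#include <cstddef>
#include <unordered_map>

// Simplified sketch of the inline-array-then-hashmap idea behind LinMap.
// Not SpiderMonkey's actual implementation.
template <typename K, typename V, size_t InlineElems>
class InlineMap {
    struct Elem { K key; V value; };

    Elem inl[InlineElems];          // inline storage, scanned linearly
    size_t next = 0;                // first unused inline slot
    std::unordered_map<K, V> map;   // stand-in for js::HashMap
    bool usingMap_ = false;

    // Migrate all inline entries into the hash map; done once, on overflow.
    void switchToMap() {
        for (Elem *it = inl, *end = inl + next; it != end; ++it)
            map.emplace(it->key, it->value);
        usingMap_ = true;
    }

  public:
    bool usingMap() const { return usingMap_; }

    void put(const K &key, const V &value) {
        if (usingMap_) { map[key] = value; return; }
        for (Elem *it = inl, *end = inl + next; it != end; ++it) {
            if (it->key == key) { it->value = value; return; }
        }
        if (next < InlineElems) {
            inl[next].key = key;
            inl[next].value = value;
            ++next;
            return;
        }
        switchToMap();               // cold path: inline array is full
        map[key] = value;
    }

    // Returns a pointer to the value, or nullptr if absent.
    V *lookup(const K &key) {
        if (usingMap_) {
            auto p = map.find(key);
            return p == map.end() ? nullptr : &p->second;
        }
        for (Elem *it = inl, *end = inl + next; it != end; ++it) {
            if (it->key == key)
                return &it->value;
        }
        return nullptr;
    }
};
```

The point of the design is that parser maps are usually tiny, so the common case is a handful of linear scans over hot cache lines, and the hash map's allocation and hashing costs are only paid by the rare large map.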
Assignee
Comment 49•10 years ago
This accounts for everything except the turboClear/HashTable::Entry changes, which I'll put in a separate patch once this one finishes getting reviewed for the rest of its contents, if that's okay.
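The entry-hoisting scheme from comment 48 (which replaces turboClear) can be sketched as follows. Everything here is a simplified stand-in: IsPodType mirrors the trait named in the review, and HashTableEntry/HashMapEntry are placeholder structs, not the real jshashtable.h types.

```cpp
// Sketch of inductive IsPodType specialization: an entry type is POD exactly
// when its payload types are POD, letting HashTable::clear pick memset over
// a per-entry loop at compile time. Hypothetical stand-ins, not jsutil/tl code.
template <class T>
struct IsPodType { static const bool result = false; };

template <> struct IsPodType<int>  { static const bool result = true; };
template <> struct IsPodType<char> { static const bool result = true; };
template <class T> struct IsPodType<T *> { static const bool result = true; };

// Entry types hoisted out of their container templates, so they can be
// named in partial specializations below.
template <class T>
struct HashTableEntry { T t; };

template <class K, class V>
struct HashMapEntry { K key; V value; };

// The inductive step: entry POD-ness follows from payload POD-ness.
template <class T>
struct IsPodType<HashTableEntry<T> > {
    static const bool result = IsPodType<T>::result;
};

template <class K, class V>
struct IsPodType<HashMapEntry<K, V> > {
    static const bool result = IsPodType<K>::result && IsPodType<V>::result;
};

struct NotPod { virtual ~NotPod() {} };  // trips the default (false) case
```

With this in place, clear() can branch on IsPodType<Entry>::result at compile time instead of exposing a caller-beware turboClear().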
Attachment #538617 -
Attachment is obsolete: true
Attachment #538617 -
Flags: review?(luke)
Attachment #540149 -
Flags: review?(luke)
Assignee
Comment 50•10 years ago
Thinking about it, I think the parser/ subdir should be renamed to frontend/ so that parser/emitter stuff can live in there. I'll change that when this lands if there are no objections.
Comment 51•10 years ago
Good point, 'parser' had bugged me but I hadn't thought of a name as good as 'frontend'.
Comment 52•10 years ago
Comment on attachment 540149 [details] [diff] [review]
Updated per review comments.

Review of attachment 540149 [details] [diff] [review]:
-----------------------------------------------------------------

I looked through almost everything and I really like the direction this patch has taken things. A lot less ALE mystery and a lot more static-y type-y goodness, so great. Almost done iterating :)

::: js/src/jsatom.cpp
@@ +650,5 @@
> #define TEMP_SIZE_LIMIT JS_BIT(TEMP_SIZE_LIMIT_LOG2)
>
> JS_STATIC_ASSERT(TEMP_SIZE_START >= sizeof(JSHashTable));
>
> +JS_NEVER_INLINE

From IRL: remove profiling remnants and re-inline these to js_InitAtomMap.

@@ +657,4 @@
> {
> +    for (const AtomIndexMap::InlineElem *it = indices->asLin(), *end = indices->linEnd();
> +         it != end; ++it) {
> +        JSAtom *atom = (JSAtom *) it->key;

I think it->key is already a JSAtom* without the cast.

@@ +659,5 @@
> +         it != end; ++it) {
> +        JSAtom *atom = (JSAtom *) it->key;
> +        if (!atom)
> +            continue;
> +        jsatomid index(it->value);

it->value is also a jsatomid, so can just use inline below.

@@ +673,5 @@
> +    typedef AtomIndexMap::WordMap WordMap;
> +    const WordMap &wm = indices->asMap();
> +    for (WordMap::Range r(wm.all()); !r.empty(); r.popFront()) {
> +        JSAtom *atom = (JSAtom *) r.front().key;
> +        jsatomid index(r.front().value);

Ditto.

::: js/src/jsatom.h
@@ -45,5 @@
> #include <stddef.h>
> #include "jsversion.h"
> #include "jsapi.h"
> #include "jsprvtd.h"
> -#include "jshash.h"

Bam!

::: js/src/jscntxt.cpp
@@ +90,5 @@
> #include "jscntxtinlines.h"
> #include "jscompartment.h"
> #include "jsobjinlines.h"
>
> +#include "parser/ParseMaps.h"

Mega nit: I believe the style is to put xyz/Abc.h includes after the js*.h and before the js*inlines.h.
@@ +327,5 @@
>
>     JS_InitArenaPool(&cx->tempPool, "temp", TEMP_POOL_CHUNK_SIZE, sizeof(jsdouble));
>     JS_InitArenaPool(&cx->regExpPool, "regExp", TEMP_POOL_CHUNK_SIZE, sizeof(int));
>
> +    cx->freeMapPool = OffTheBooks::new_<ParseMapPool>(cx);

Perhaps this could be done lazily (in the compiler init)? Also, could 'purge' just call 'delete cx->freeMapPool'? Then, if the JSContext goes unused for a long time (which sounds quite possible since there is currently context-per-outer-global-window), we save memory. If we do this, then it would also make sense to make freeMapPool a private member and have:

  ParseMapPool &freeMapPool() { return *freeMapPool_; }
  bool ensureFreeMapPool();

@@ +1599,5 @@
> /*
>  * Release pool's arenas if the stackPool has existed for longer than the
>  * limit specified by gcEmptyArenaPoolLifespan.
>  */
> +static void

Intentionally un-inlined or profiling remnant? Just asking.

::: js/src/jsemit.cpp
@@ +138,4 @@
> {
> +    roLexdeps.clearMap();
> +    globalMap.clearMap();
> +    upvarIndices.clearMap();

It seems strange to me that all these have an 'init()' method, and it's not the thing we are calling in 'init()'. Perhaps we could rename 'clearMap' to 'init' and rename 'init' to 'ensureMap'? Plus, ensureMap would, as the name implies, handle the already-hasMap case, so you could simplify this "if (!x.hasMap() && !x.init(context()))" pattern that appears in several places to just "if (!x.ensureMap())".

@@ +153,5 @@
> +    if (globalMap.hasMap())
> +        globalMap.finish(cx);
> +
> +    if (upvarIndices.hasMap())
> +        upvarIndices.finish(cx);

We are in a destructor, so my spider senses say that no explicit 'finish' should be necessary. Now, AtomThingMapPtr needs to be union-able, so perhaps we could have a derived OwnedAtomThingMapPtr whose destructor calls finish().

@@ +2256,3 @@
>     return JS_FALSE;
>
> +    (void) index;

Is GCC really warning you here and in the other places where you do this?
::: js/src/jsemit.h
@@ +51,5 @@
> #include "jsprvtd.h"
> #include "jspubtd.h"
>
> +#include "jsatominlines.h"
> +#include "parser/ParseMaps.h"

Ditto the above header nit.

@@ +349,5 @@
>         JS_ASSERT(!inFunction());
>         scopeChain_ = scopeChain;
>     }
>
> +    js::AtomDefnMapPtr lexdeps;    /* unresolved lexical name dependencies */

comment alignment

::: js/src/jsparse.cpp
@@ +958,2 @@
>     js::GlobalScope globalScope(cx, globalObj, &cg);
> +

Since you are touching this code anyway, could this decl be lowered below the 'if' and the resulting conditional+if sequence simplified?

@@ +1611,5 @@
>     if (!dn)
>         return NULL;
>
> +    if (!tc->lexdeps->add(p, atom, dn))
> +        return NULL;

Can anything weird happen between the lookupForAdd and this 'add()'? Also, would it make sense to hoist the 'add()' out of this function into its caller so that there was a visual pairing between lookupForAdd+add? That way MakePlaceholder is just a simple node-creating function. Light suggestion.

@@ +2140,5 @@
>     if (pn->pn_type == TOK_UPVARS) {
> +        AtomDefnMapPtr &upvars = pn->pn_names;
> +        JS_ASSERT(upvars->count() != 0);
> +
> +        for (AtomDefnRange r(upvars->all()); !r.empty(); r.popFront()) {

"r = upvars->all()" works and is a bit prettier (here and 5x below).

::: js/src/jsprvtd.h
@@ +72,5 @@
>
> /* Scalar typedefs. */
> +typedef uint8 jsbytecode;
> +typedef uint8 jssrcnote;
> +typedef uintptr_t jsatomid;

Ideally, we'd have a nice little word-sized struct that gave jsatomid a cute little isX()/toX() discriminated-union interface. Since it looks like you have to touch most places that use jsatomid currently, can you do this? It looks like there are a lot of places that use jsatomid for no valid reason, and once you nix them, there are only 24 uses in jsatom code, so you could pull this out of jsprvtd.h and into jsatom.h.

@@ +179,5 @@
>
> +template <typename K,
> +          typename V,
> +          size_t inlineElems>
> +class LinMap;

Still renaming this, right?
::: js/src/mfbt/LinMap.h
@@ +71,5 @@
> +
> +    InlineElem lin[InlineElems];
> +    WordMap map;
> +    size_t linNext;
> +    size_t linCount;

It seems like, for locality, we'd want 'lin' next to 'linNext' and 'linCount'.

@@ +80,5 @@
> +        JS_STATIC_ASSERT(sizeof(V) == sizeof(uintptr_t));
> +    }
> +
> +    bool usingMap() const {
> +        JS_STATIC_ASSERT(InlineElems + 1 > InlineElems);

I think you can safely assume that no one is using InlineElems = size_t(-1) :)

@@ +81,5 @@
> +    }
> +
> +    bool usingMap() const {
> +        JS_STATIC_ASSERT(InlineElems + 1 > InlineElems);
> +        return linNext > InlineElems && map.initialized();

Is it possible to get linNext > InlineElems and !map.initialized()? Even if the map.init() in switchToMap fails, we won't yet have updated the state, so we'll still have linNext <= InlineElems.

@@ +99,5 @@
> +    for (InlineElem *it = lin, *end = lin + linNext; it != end; ++it) {
> +        if (NULL == it->key)
> +            continue;
> +        WordMapAddPtr p = map.lookupForAdd(it->key);
> +        if (!map.add(p, it->key, it->value))

map.putNew()

@@ +147,5 @@
> +
> +        typedef Ptr ******* ConvertibleToBool;
> +
> +        explicit Ptr(WordMapPtr p) : mapPtr(p), isLinPtr(false) {}
> +        explicit Ptr(InlineElem *ie) : linPtr(ie), isLinPtr(true) {}

Since you are leaving one word uninitialized, I think you should define operator== or make it private+undefined.

@@ +150,5 @@
> +        explicit Ptr(WordMapPtr p) : mapPtr(p), isLinPtr(false) {}
> +        explicit Ptr(InlineElem *ie) : linPtr(ie), isLinPtr(true) {}
> +
> +      public:
> +        Ptr() : linPtr(NULL), isLinPtr(true) {}

AddPtr's default constructor (below) leaves members uninitialized, so for consistency with it and HashTable::Ptr, I would do so here too.

@@ +159,5 @@
> +            return isLinPtr ? bool(linPtr) : mapPtr.found();
> +        }
> +
> +        operator ConvertibleToBool() const {
> +            return found() ? (ConvertibleToBool) 1 : (ConvertibleToBool) 0;

return ConvertibleToBool(found());

@@ +164,5 @@
> +        }
> +
> +        K &key() {
> +            JS_ASSERT(found());
> +            return *reinterpret_cast<K *>(isLinPtr ?
&linPtr->key : &mapPtr->key);

I think this casting here and in value() isn't necessary; linPtr's and mapPtr's types both reflect K and V.

@@ +182,5 @@
> +        InlineElem *linPtr;
> +        InlineElem *linAddPtr;
> +        bool isLinPtr;
> +
> +        AddPtr(InlineElem *ptr, bool found_) : isLinPtr(true) {

Can you s/found_/found/? _ is for member vars. Also, you might consider having AddPtr's fields match the constructor args; seems simpler than having two InlineElem*.

@@ +204,5 @@
> +            return isLinPtr ? bool(linPtr) : mapAddPtr.found();
> +        }
> +
> +        operator ConvertibleToBool() const {
> +            return found() ? ConvertibleToBool(1) : ConvertibleToBool(0);

return ConvertibleToBool(found());

@@ +211,5 @@
> +        V &value() {
> +            JS_ASSERT(found());
> +            if (isLinPtr)
> +                return *((V *) &linPtr->value);
> +            return *((V *) &mapAddPtr->value);

Shouldn't need casts.

@@ +237,5 @@
> +        JS_ASSERT(isMap());
> +        return map;
> +    }
> +
> +    const InlineElem *asLin() const {

Reminder: uses of "Lin" in LinMap should change with the name.

@@ +257,5 @@
> +            if (it->key == key)
> +                return Ptr(it);
> +        }
> +
> +        return Ptr();

With Ptr() leaving undefined members, this would need to change to explicitly pass NULL.

@@ +303,5 @@
> +        return false;
> +
> +    WordMapAddPtr p = map.lookupForAdd(key);
> +    JS_ASSERT(!p);
> +    return map.add(p, key, value);

map.putNew(). Also, you might consider putting switchToMap+map.put in a separate JS_NEVER_INLINE path, since this should be cold code compared to the inline/map paths through add().

@@ +319,5 @@
> +    }
> +    return add(p, key, value);
> +    }
> +
> +    void remove(Ptr &p) {

I think you can just take a "Ptr p". Mutable arg refs are weird.

@@ +325,5 @@
> +        if (p.isLinPtr) {
> +            JS_ASSERT(linCount > 0);
> +            JS_ASSERT(p.linPtr->key != NULL);
> +            p.linPtr->key = NULL;
> +            --linCount;

This seems like a sensible strategy for removal. Did you happen to measure whether there were often cases where, if we shifted elements down to fill removed elements, we would stay linear?
Or is that just a waste of everyone's time?

@@ +344,5 @@
> +
> +    /* The first bits are tagged when iterating over a lin range. */
> +    typedef AlignedStorage<sizeof(WordMapRange)> WordMapRangeBytes;
> +    union {
> +        uintptr_t firstBits;

As previously discussed, inlining and struct decomposition should allow you to un-union-ify mapRange and (cur, end) and just have a simple bool tag. Probably good for perf, since it takes computation (untagging) off the dependent path and puts the extra load on a path to a predictable branch.

::: js/src/parser/ParseMaps-inl.h
@@ +137,5 @@
> +AtomThingMapPtr<Map>::finish(JSContext *cx)
> +{
> +    JS_ASSERT(map);
> +    cx->freeMapPool->release(map);
> +    clearMap();

I would have thought that 'finish()' would take care of the "if (!map)" case to make it simpler for the callers. Sure, it wastes a branch in a few of the cases, but I'm guessing it won't be measurable.

::: js/src/parser/ParseMaps.h
@@ +127,5 @@
> +
> +/*
> + * In an upvars list, defn->resolve() is the outermost definition the
> + * name may reference. If a with block or a function that calls eval encloses
> + * the use, the name may end up referring to something else at runtime.

This comment seems to have lost its parents.

@@ +130,5 @@
> + * name may reference. If a with block or a function that calls eval encloses
> + * the use, the name may end up referring to something else at runtime.
> + */
> +
> +/* Note: POD-type capable of inclusion in (JSParseNode) union. */

I think LLVM is going to bitch about this being a non-POD because technically a POD doesn't have private members. Perhaps turn it into a 'struct', strip 'protected' and 'public', and rename 'map' to 'map_' to make offenders think twice.

@@ +261,5 @@
> +        map->clear();
> +    }
> +
> +    /* Return the definition at the head of the chain for |atom|. */
> +    inline JSDefinition *lookup(JSAtom *atom);

For symmetry, could this be lookupFirst?
(or lookupSingle or whatever makes sense)

@@ +264,5 @@
> +    /* Return the definition at the head of the chain for |atom|. */
> +    inline JSDefinition *lookup(JSAtom *atom);
> +
> +    /* Perform a lookup that can iterate over the definitions associated with |atom|. */
> +    inline void lookupMulti(JSAtom *atom, MultiDeclLookup *lookup);

Nice job using a type here to express this multi-map. Instead of a custom MultiDeclLookup type, perhaps you could define a member Range type that models the Range concept (empty, popFront, front) we have been using in SpiderMonkey?
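The Range concept requested above (empty/popFront/front) over a decl chain could look like the sketch below. Defn and DeclNode are hypothetical stand-ins for JSDefinition and AtomDeclNode; the real MultiDeclRange that later patches landed differs in detail.

```cpp
#include <cassert>

// Placeholder for JSDefinition.
struct Defn { int id; };

// Placeholder for AtomDeclNode: a singly linked chain of definitions, used
// when block scoping gives one atom several live declarations.
struct DeclNode {
    Defn *defn;
    DeclNode *next;
};

// A range over either a single definition or a whole chain, exposing the
// SpiderMonkey Range trio: empty(), front(), popFront().
class MultiDeclRange {
    DeclNode *node;   // null in the single-definition case
    Defn *defn;       // null once the range is exhausted

  public:
    explicit MultiDeclRange(Defn *d) : node(nullptr), defn(d) {}
    explicit MultiDeclRange(DeclNode *n) : node(n), defn(n->defn) {}

    bool empty() const { return !defn; }

    Defn *front() const {
        assert(!empty());
        return defn;
    }

    void popFront() {
        assert(!empty());
        if (!node) {          // single-definition range: one pop exhausts it
            defn = nullptr;
            return;
        }
        node = node->next;
        defn = node ? node->defn : nullptr;
    }
};
```

The payoff of the concept is uniform iteration syntax: `for (MultiDeclRange r = decls.lookupMulti(atom); !r.empty(); r.popFront())` reads the same as every other SpiderMonkey range loop.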
Attachment #540149 -
Flags: review?(luke)
Assignee
Comment 53•10 years ago
I actually realized that the wider jsatomid type was increasing the size of JSScript, so I pinned some things explicitly to uint32. When this patch gets r+ I'll make the hashtable patch to hoist the entry types and eliminate turboClear.
Attachment #540149 -
Attachment is obsolete: true
Attachment #540944 -
Flags: review?(luke)
Assignee
Comment 54•10 years ago
Now realize I also left some unnecessary static assertion constraints in InlineMap that I can now move into the ParseMapPool (i.e. same sizedness of keys/values, POD-ness of the HashMapEntry).
Assignee
Comment 55•10 years ago
Another before-commit fix: there are a few places where we instantiate a parser without a compiler, so active compilation count and ensuring the parseMapPool should be moved into the parser initialization as well.
Comment 56•10 years ago
Comment on attachment 540944 [details] [diff] [review]
Updated x2 per review comments.

Review of attachment 540944 [details] [diff] [review]:
-----------------------------------------------------------------

Mm mm, bring on the type-y goodness; r+ with these nits. Again, great work and thanks for putting up with all these comments :)

::: js/src/frontend/ParseMaps-inl.h
@@ +99,5 @@
> +{
> +    JS_ASSERT(map);
> +    AtomDOHPtr p = map->lookup(atom);
> +    if (!p) {
> +        return MultiDeclRange();

No {}

::: js/src/frontend/ParseMaps.cpp
@@ +64,5 @@
> +
> +    all.~RecyclableMaps();
> +    new (&all) RecyclableMaps();
> +    recyclable.~RecyclableMaps();
> +    new (&recyclable) RecyclableMaps();

This makes me think that a Vector::clearAndFree() is in order. I know before we've used Maybe<>... but clearAndFree() is really what you want to say in this sort of "purge" situation, which has come up a few times now.

@@ +114,5 @@
> +        js_ReportOutOfMemory(cx);
> +        return NULL;
> +    }
> +    new (p) AtomDeclNode(defn);
> +    return p;

return new (p) AtomDeclNode(defn);

@@ +123,5 @@
> +{
> +    AtomDOHAddPtr p = map->lookupForAdd(atom);
> +    AtomDeclNode *node = allocNode(defn);
> +    if (!node)
> +        return false;

This is a strange ordering; perhaps move the lookupForAdd to after the "if (!node)" test?

@@ +141,5 @@
> +    return true;
> +}
> +
> +AtomDeclNode *
> +AtomDecls::lastAsNode(DefnOrHeader &doh)

in/out param wants pass-by-pointer to make it syntactically evident at the callsite.

@@ +164,5 @@
> +{
> +    AtomDOHAddPtr p = map->lookupForAdd(atom);
> +    AtomDeclNode *node = allocNode(defn);
> +    if (!node)
> +        return false;

Again, strange ordering; maybe I'm missing something subtle (which deserves a comment, in that case :).

::: js/src/frontend/ParseMaps.h
@@ +124,5 @@
> +        recycle((void *) map);
> +    }
> +}; /* ParseMapPool */
> +
> +/* Note: POD-type capable of inclusion in (JSParseNode) union.
*/

I would turn this comment into a sentence and also include explicit guidance to avoid this one and use an OwnedAtomThingMapPtr instead.

@@ +135,5 @@
> +
> +    bool ensureMap(JSContext *cx);
> +    void releaseMap(JSContext *cx);
> +
> +    void copyMap(const AtomThingMapPtr &other) { map_ = other.map_; }

It seems like the default operator= would work fine instead of an explicit copyMap().

@@ +174,5 @@
> +        AtomThingMapPtrT::releaseMap(cx);
> +    }
> +};
> +
> +typedef AtomThingMapPtr<AtomIndexMap> AtomIndexMapPtr;

I think this typedef should be moved up a bit to go under AtomThingMapPtr.

@@ +182,5 @@
> +/* Node structure for chaining in AtomDecls. */
> +struct AtomDeclNode
> +{
> +    JSDefinition *defn;
> +    AtomDeclNode *next;

weird alignment

@@ +190,5 @@
> +    {}
> +};
> +
> +/*
> + * Tagged union of a JSDefinition and a AtomDeclNode, for use in

s/a Atom/an Atom/, s/in the/in/

@@ +207,5 @@
> +        u.bits = 0;
> +    }
> +
> +    explicit DefnOrHeader(JSDefinition *defn) {
> +        u.defn = defn;

How about following this with JS_ASSERT(!isHeader()) and following the other ctor overload with JS_ASSERT(isHeader())?

@@ +284,5 @@
> +
> +    /* Add-or-update a known-unique definition for |atom|. */
> +    inline bool addUnique(JSAtom *atom, JSDefinition *defn);
> +    bool addShadow(JSAtom *, JSDefinition *);
> +    bool addHoist(JSAtom *, JSDefinition *);

House style (which I'm not a huge fan of) compels me to request names for these parameters.

@@ +289,5 @@
> +
> +    /* Updating the definition for an entry that is known to exist is infallible. */
> +    void update(JSAtom *atom, JSDefinition *defn) {
> +        JS_ASSERT(map);
> +        AtomDOHMap::AddPtr p = map->lookupForAdd(atom);

I think here you just want 'lookup' and a 'Ptr'.

@@ +331,5 @@
> + * Lookup state tracker for those situations where the caller wants to traverse
> + * multiple definitions associated with a single atom. This occurs due to block
> + * scoping.
> + */
> +class MultiDeclRange

Viewing AtomDecls as a real container, could you s/MultiDeclRange/AtomDecls::Range/? They already friend each other and there aren't too many places to change...

@@ +342,5 @@
> +    explicit MultiDeclRange(JSDefinition *defn) : node(NULL), defn(defn) {}
> +    explicit MultiDeclRange(AtomDeclNode *node) : node(node), defn(node->defn) {}
> +
> +  public:
> +    MultiDeclRange() : node(NULL), defn(NULL) {}

The general pattern has been to either not have a default constructor or (if it's needed for, say, diamond-control-flow initialization) have the default ctor be a no-op. I don't yet see a need for a default ctor here.

@@ +344,5 @@
> +
> +  public:
> +    MultiDeclRange() : node(NULL), defn(NULL) {}
> +
> +    void popFront() {

JS_ASSERT(!empty());

@@ +359,5 @@
> +        return defn;
> +    }
> +
> +    bool empty() const {
> +        return !node && !defn;

I think this can just be:

  JS_ASSERT_IF(!defn, !node);
  return !!defn;

::: js/src/jsatom.cpp
@@ +653,5 @@
>
> JS_STATIC_ASSERT(TEMP_SIZE_START >= sizeof(JSHashTable));
>
> +static void
> +js_InitAtomMap_Lin(JSContext *cx, JSAtomMap *map, AtomIndexMap *indices)

JS_NEVER_INLINE was removed, but you might as well re-inline since both are short and not reused.

::: js/src/jsemit.cpp
@@ +2455,5 @@
> {
> +    if (!globalMap.ensureMap(context()))
> +        return false;
> +    AtomIndexAddPtr p = globalMap->lookupForAdd(atom);
> +    if (p) {

if (AtomIndexAddPtr p = ...) { ...

::: js/src/mfbt/InlineMap.h
@@ +70,5 @@
> +    typedef typename WordMap::Range WordMapRange;
> +
> +    InlineElem lin[InlineElems];
> +    size_t linNext;
> +    size_t linCount;

They are next to each other now, but, if we assume that most InlineMaps are (1) inline and (2) small, then I think we would have better locality by putting linNext/linCount in front of 'lin'. Also, 'lin' doesn't look great in identifiers; can you replace (L|l)in with (I|i)nline?
@@ +98,5 @@
> +    for (InlineElem *it = lin, *end = lin + linNext; it != end; ++it) {
> +        if (NULL == it->key)
> +            continue;
> +        if (!map.putNew(it->key, it->value))
> +            return false;

How about:

  if (it->key && !map.putNew(it->key, it->value))
      return false;

@@ +115,5 @@
> +
> +    return map.putNew(key, value);
> +    }
> +
> +    static const uintptr_t UNTAG_PTR_MASK = uintptr_t(-1) - 1;

This is now dead.

@@ +364,5 @@
> +#endif
> +    }
> +
> +    bool isLinRange() const {
> +        return isLin;

How about one JS_ASSERT_IF(isInline, checkInlineRangeInvariants()) here and remove all the other ones (since they all seem to be dominated by a call to isInlineRange)?
Attachment #540944 -
Flags: review?(luke) → review+
Comment 57•10 years ago
Comment on attachment 540944 [details] [diff] [review]
Updated x2 per review comments.

Some random comments:

>--- /dev/null
>+++ b/js/src/frontend/ParseMaps-inl.h
>+ * The Initial Developer of the Original Code is
>+ * Mozilla Corporation.

the Mozilla Foundation. Also elsewhere.

>--- /dev/null
>+++ b/js/src/frontend/ParseMaps.cpp
>+using namespace js;

Dunno what the expected style in js/ is, but I think namespace js { ... } is clearer.

>--- a/js/src/jsemit.cpp
>+++ b/js/src/jsemit.cpp
>+    jsatomid _;
>+    if (!cg->makeAtomIndex(atom, &_))

(A couple of times.) Does this want a one-argument overload?
Assignee
Comment 58•10 years ago
(In reply to comment #57) > Dunno what the expected style in js/ is I think that's a good reason to avoid drive-by nits. :-/
Assignee
Comment 59•10 years ago
I'm gonna have me a turbodog!
Attachment #541519 -
Attachment is obsolete: true
Attachment #541793 -
Flags: review?(luke)
Comment 60•10 years ago
Comment on attachment 541793 [details] [diff] [review]
Anti-turbo-clear.

r+ with *Entry types put back inline as discussed. Emerging SM namespace style is "using namespace js" at the head of the .cpp, after headers.
Attachment #541793 -
Flags: review?(luke) → review+
Assignee
Comment 61•10 years ago
Looked good on try: http://hg.mozilla.org/tracemonkey/rev/3d646df22a4b
Assignee
Updated•10 years ago
Whiteboard: fixed-in-tracemonkey
Assignee
Comment 62•10 years ago
Followup to quell rumblings in the bowels of the GCC 4.3.3 optimizer: http://hg.mozilla.org/tracemonkey/rev/7208baa27f17
Assignee
Comment 63•10 years ago
cdleary-bot mozilla-central merge info:
http://hg.mozilla.org/mozilla-central/rev/3d646df22a4b
http://hg.mozilla.org/mozilla-central/rev/7208baa27f17
Assignee
Updated•10 years ago
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment 64•10 years ago
Revision 3d646df22a4b caused a build bustage for Win64 (MSVC 2010 SP1) in ParseMaps.cpp:

void
ParseMapPool::checkInvariants()
{
    /*
     * Having all values be of the same size permits us to easily reuse the
     * allocated space for each of the map types.
     */
    JS_STATIC_ASSERT(sizeof(JSDefinition *) == sizeof(jsatomid));
    JS_STATIC_ASSERT(sizeof(JSDefinition *) == sizeof(DefnOrHeader));
    JS_STATIC_ASSERT(sizeof(AtomDefnMap::Entry) == sizeof(AtomIndexMap::Entry));
    JS_STATIC_ASSERT(sizeof(AtomDefnMap::Entry) == sizeof(AtomDOHMap::Entry));
    JS_STATIC_ASSERT(sizeof(AtomMapT::Entry) == sizeof(AtomDOHMap::Entry));
    /* Ensure that the HashTable::clear goes quickly via memset. */
    JS_STATIC_ASSERT(tl::IsPodType<AtomIndexMap::WordMap::Entry>::result);   <===
    JS_STATIC_ASSERT(tl::IsPodType<AtomDOHMap::WordMap::Entry>::result);
    JS_STATIC_ASSERT(tl::IsPodType<AtomDefnMap::WordMap::Entry>::result);
}

p:/mozilla-build/src/js/src/frontend/ParseMaps.cpp(58) : error C2118: negative subscript

Probably lines 59 and 60 are affected as well.
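For context on why MSVC reports "negative subscript" here: JS_STATIC_ASSERT-style macros classically expand to a typedef of an array whose length is 1 when the condition holds and -1 when it fails, so a false condition produces exactly error C2118. The macro below is a simplified, hypothetical expansion, not the real jsutil.h definition.

```cpp
// Pre-C++11 static assertion via array types: a false condition yields a
// negative array bound, which MSVC diagnoses as C2118 "negative subscript"
// and GCC as "size of array is negative".
#define MY_STATIC_ASSERT(cond, name) \
    typedef int static_assert_##name[(cond) ? 1 : -1]

MY_STATIC_ASSERT(sizeof(void *) >= sizeof(int), ptr_holds_int);  // compiles

// Uncommenting the next line fails to compile with C2118 on MSVC:
// MY_STATIC_ASSERT(sizeof(char) == 2, bogus);
```

So the bustage above means MSVC evaluated tl::IsPodType<...>::result to false for those WordMap::Entry types, turning the assertion's array bound negative.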
Assignee
Comment 65•10 years ago
(In reply to comment #64) Thanks for the heads up! I actually fixed that on tracemonkey branch with http://hg.mozilla.org/tracemonkey/rev/cc588cd332a9 , but I should cherry pick that change to mozilla-central as well.
Assignee
Comment 66•10 years ago
Cherry picked: http://hg.mozilla.org/mozilla-central/rev/b9f09ba568b4
Comment 67•10 years ago
Ou, thank you :) Great work, btw ;)