Bug 649576 - Extricate JSHashTable from JSAtomList death grip
Status: RESOLVED FIXED
Whiteboard: fixed-in-tracemonkey
Product: Core
Classification: Components
Component: JavaScript Engine
Version: unspecified
Platform: All / All
Importance: -- enhancement
Target Milestone: ---
Assigned To: Chris Leary [:cdleary] (not checking bugmail)
: Jason Orendorff [:jorendorff]
Mentors:
Duplicates: 515441
Depends on: 667504 687951 755099
Blocks: 647367 667825
Reported: 2011-04-12 23:30 PDT by Chris Leary [:cdleary] (not checking bugmail)
Modified: 2012-05-14 16:42 PDT
CC: 9 users


Attachments
WIP: JSAtomList sans JSHashTable. (62.83 KB, patch)
2011-04-12 23:30 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
cg_diff results (5.73 KB, text/plain)
2011-04-14 17:19 PDT, Nicholas Nethercote [:njn]
no flags Details
WIP: js::HashTable good, JSAtomList bad. (147.61 KB, patch)
2011-05-26 16:44 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
WIP: more performant. (169.81 KB, patch)
2011-05-27 13:53 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
WIP: using LinMap. (181.30 KB, patch)
2011-05-30 13:07 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
WIP: using LinMap. (184.31 KB, patch)
2011-05-30 14:54 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
WIP: using LinMap. (186.15 KB, patch)
2011-05-30 15:28 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
WIP: using LinMap. (201.30 KB, patch)
2011-05-30 21:45 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
Cachegrind diff. (5.60 KB, text/plain)
2011-05-31 12:41 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details
Cachegrind diff. (7.32 KB, text/plain)
2011-05-31 14:10 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details
WIP: using LinMap. (187.88 KB, patch)
2011-05-31 22:43 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
WIP: using LinMap and chunk alloc. (188.52 KB, patch)
2011-05-31 23:40 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
WIP: using LinMap and chunk alloc. (215.81 KB, patch)
2011-06-01 13:07 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
WIP: using LinMap and chunk alloc against Luke's measure branch tip. (215.91 KB, patch)
2011-06-01 13:59 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
WIP: using LinMap and chunk alloc, tested on i686. (186.23 KB, patch)
2011-06-02 02:46 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
WIP: using LinMap and GC-based recycling (184.17 KB, patch)
2011-06-02 23:26 PDT, Chris Leary [:cdleary] (not checking bugmail)
luke: feedback+
Details | Diff | Splinter Review
WIP: using LinMap and GC-based recycling (191.03 KB, patch)
2011-06-10 09:41 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
Extricate with zeal! (181.81 KB, patch)
2011-06-10 15:26 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
Updated per review comments. (175.29 KB, patch)
2011-06-17 15:26 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
Updated x2 per review comments. (184.36 KB, patch)
2011-06-21 18:32 PDT, Chris Leary [:cdleary] (not checking bugmail)
luke: review+
Details | Diff | Splinter Review
WIP: remove turboClear. (11.16 KB, patch)
2011-06-23 15:12 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Splinter Review
Anti-turbo-clear. (12.68 KB, patch)
2011-06-24 14:18 PDT, Chris Leary [:cdleary] (not checking bugmail)
luke: review+
Details | Diff | Splinter Review

Description Chris Leary [:cdleary] (not checking bugmail) 2011-04-12 23:30:56 PDT
Created attachment 525633 [details] [diff] [review]
WIP: JSAtomList sans JSHashTable.

Per bug 647367 comment 2.

On my Mac mini the patch shows no slowdown on parsemark, but I'm increasingly convinced parsemark is useless. I plan to try an njn-style valgrind icount analysis, since that seemed to give much better measurements in his bugs.

Potential improvements on WIP patch:

- Get rid of ALE_NEXT hacks, migrate those uses to sensible AtomList API(s).
- Rip out parse node recycling, clear decls across top level statements, and free the arena to the mark before the current top-level statement.
- Recycle hashtables across top level statements. If the tables get larger than a certain threshold of space wastefulness, I think we should free them immediately.
- Since the common case is a single piece of atom data (instead of a chain), make the hashtable value a tagged union of (single-data | arena-allocd-linked-list).

  This gets a little tricky because we declare |JSAtomListElement *|s in a number of places, so we would have to change all those sites to a value-type smart pointer that performs a re-lookup on a hashtable generation change. The benefit of this scheme is that we get exactly the js::HashTable performance in the common case.
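
For illustration, the "single datum vs. arena-allocated chain" value could be a tagged pointer along these lines (a sketch only; the names DefnOrChain, Definition, and DefnListNode are hypothetical, not the patch's):

  #include <cassert>
  #include <stdint.h>

  struct Definition;        // a single piece of atom data (hypothetical name)
  struct DefnListNode;      // arena-allocated chain node (hypothetical name)

  class DefnOrChain {
      uintptr_t bits;       // low bit set => chain, clear => single definition
    public:
      explicit DefnOrChain(Definition *defn)
        : bits(reinterpret_cast<uintptr_t>(defn)) {}
      explicit DefnOrChain(DefnListNode *head)
        : bits(reinterpret_cast<uintptr_t>(head) | 1) {}

      bool isChain() const { return bits & 1; }

      Definition *asDefn() const {
          assert(!isChain());
          return reinterpret_cast<Definition *>(bits);
      }
      DefnListNode *asChain() const {
          assert(isChain());
          return reinterpret_cast<DefnListNode *>(bits & ~uintptr_t(1));
      }
  };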
Comment 1 Luke Wagner [:luke] 2011-04-12 23:33:14 PDT
woot
Comment 2 Chris Leary [:cdleary] (not checking bugmail) 2011-04-12 23:39:57 PDT
Forgot to mention: the hash tables are context-alloc'd, so that reallocs can happen naturally without a weird exponential amount of useless hashtable space in the arena. The recycling would let us reuse the allocated tables for JSAtomLists in non-root JSTreeContexts. Since we don't really know (or care) when parse nodes die and forfeit their hash table memory, this is a "natural" recycling at the top level statement boundaries where we also know that we can safely pop the arena space.
Comment 3 Nicholas Nethercote [:njn] 2011-04-14 17:19:27 PDT
Created attachment 526162 [details]
cg_diff results

Parsemark, the suite of tests, is great.  parsemark.py, the script, is less great.

I did a Cachegrind run, results aren't good (ignore the "compiled" columns, they're irrelevant here):

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|    12.041    12.322 (0.977x) |    158529    148488 (1.068x) | 280slides-comb
|    45.938    47.021 (0.977x) |    802934    758036 (1.059x) | ball-pool-comb
|    23.495    24.456 (0.961x) |    379459    355498 (1.067x) | dojo
|    43.943    45.193 (0.972x) |    733092    697472 (1.051x) | effectgames-ga
|    11.332    11.770 (0.963x) |    170125    157844 (1.078x) | ga
|   239.103   249.239 (0.959x) |     4.834     4.472 (1.081x) | gmail-combined
|    70.134    71.941 (0.975x) |     1.262     1.186 (1.064x) | gravity-combin
|    17.037    17.668 (0.964x) |    263714    250121 (1.054x) | jquery-min
|    29.269    30.023 (0.975x) |    499957    471808 (1.060x) | jsgb-combined
|    43.855    45.375 (0.967x) |    759237    725382 (1.047x) | mochikit
|   197.685   207.897 (0.951x) |     3.652     3.479 (1.050x) | pipio-combined
|    10.148    10.403 (0.976x) |    122256    115829 (1.055x) | processing-twi
|    35.129    35.588 (0.987x) |    319697    301276 (1.061x) | sunspider-comb
|    14.421    14.965 (0.964x) |    224760    208180 (1.080x) | tetris-combine
|    52.882    55.446 (0.954x) |    926760    875973 (1.058x) | twitter-combin
|    54.672    56.361 (0.970x) |    915359    871601 (1.050x) | v8-combined
|     7.967     8.181 (0.974x) |    100088     94473 (1.059x) | yui-min
|   318.889   328.790 (0.970x) |     5.770     5.494 (1.050x) | zimbra-combine
-------
|  1227.949  1272.649 (0.965x) |    21.895    20.663 (1.060x) | all

I've attached the cg_diff output, which gives function-level changes.  Note that inlining causes some funny results, eg:

-5,628,012  /home/njn/moz/wsN/js/src/optg32/../jsatom.cpp:JSAtomList::rawLookup(JSAtom*, JSHashEntry**&)
 4,658,748  /home/njn/moz/wsN/js/src/optg32/../jshashtable.h:JSAtomList::rawLookup(JSAtom*, JSAtomRemoval*)

Here the counts for rawLookup are split across two files because rawLookup contains one or more calls to functions in jshashtable.h that have been inlined.

Looks like the main problem is that you're calling malloc and free a lot more.

Getting rid of parse node recycling would be cool if it doesn't hurt performance.  Last I checked only about 20% of nodes are recycled on typical code.
Comment 4 Chris Leary [:cdleary] (not checking bugmail) 2011-04-15 13:59:26 PDT
(In reply to comment #3)

Thanks Nick! That's awesome data. I can reduce the allocation overhead in two ways: recycling the tables across top-level statements and arena-allocing fewer nodes. I'll compare the arena stats before and after the patch to figure out which is more likely to fix the allocation overhead.
Comment 5 Jim Blandy :jimb 2011-04-18 17:19:12 PDT
I have a terrible idea about how to avoid generation checks on hash table entry references (on the grounds that branches are expensive):

Have a datatype representing a reference to a table entry; the implementation is just a pointer to the entry. However, these references add themselves to a list owned by the hash table on construction and remove themselves from it on destruction. If we need to resize, we walk the list and fix up the pointers immediately. The list would almost always be short, since entry pointers don't usually live long.

Probably overengineering; just occurred to me.
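
A minimal sketch of the scheme Jim describes, assuming an intrusive singly-linked list of live references (EntryRef, Table, and fixupRefs are hypothetical names, not real SpiderMonkey types):

  #include <cstddef>

  template <typename Entry> class Table;

  // Each live EntryRef links itself into a list owned by the table; when the
  // table reallocates, it walks the list and repoints every outstanding ref.
  template <typename Entry>
  class EntryRef {
      friend class Table<Entry>;
      Table<Entry> *table;
      Entry *entry;
      EntryRef *next;                       // intrusive list of live refs
    public:
      EntryRef(Table<Entry> *t, Entry *e)
        : table(t), entry(e), next(t->refs) {
          t->refs = this;                   // register on construction
      }
      ~EntryRef() {                         // unregister on destruction
          EntryRef **p = &table->refs;
          while (*p != this)
              p = &(*p)->next;
          *p = next;
      }
      Entry *operator->() const { return entry; }
  };

  template <typename Entry>
  class Table {
      friend class EntryRef<Entry>;
      EntryRef<Entry> *refs;
    public:
      Table() : refs(NULL) {}

      // Called after a resize has moved the entries from oldBase to newBase.
      void fixupRefs(Entry *oldBase, Entry *newBase) {
          for (EntryRef<Entry> *r = refs; r; r = r->next)
              r->entry = newBase + (r->entry - oldBase);
      }
  };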
Comment 6 Jim Blandy :jimb 2011-04-18 17:21:06 PDT
You know, it might be interesting to have that "entry reference" type just keep a ref count in the table when compiled with DEBUG, and assert if we ever add when the refcount is non-zero. It may be that we never hold references across an add, or could readily change the code to make that so.
Comment 7 Jim Blandy :jimb 2011-04-18 17:21:58 PDT
(In reply to comment #0)
> - Rip out parse node recycling, clear decls across top level statements, and
> free the arena to the mark before the current top-level statement.
> - Recycle hashtables across top level statements. If the tables get larger than
> a certain threshold of space wastefulness, I think we should free them
> immediately.
> - Since the common case is a single piece of atom data (instead of a chain),
> make the hashtable value a tagged union of (single-data |
> arena-allocd-linked-list).

Shouldn't these each be separate follow-on patches? And perhaps separate bugs? I wouldn't want to review the whole shebang at once.
Comment 8 Chris Leary [:cdleary] (not checking bugmail) 2011-04-18 21:16:59 PDT
(In reply to comment #5, comment #6, and comment #7)

Ideally, we could make it obvious that there are no pointers into table space kept across potential adds and bolster that with assertions to avoid future wrong-doing. There's no overhead associated with that. In a recursive descent parser, though, even with the assertions it's a bit tricky to ensure that it will be used correctly going forward -- calls into sub-productions seem innocent enough but can have unapparent call sequences that lead to adds. I don't think you want to rely on fuzzing to catch newly introduced dangling pointer vulnerabilities, but I may be a bit too far on the fearful-and-defensive side here.

We can also measure the cost of doing nothing versus Jim's solution in comment 5 -- consing up smart-pointer-like structures on the stack seems like it shouldn't add much overhead and can guarantee us safety in the unlikely case.

I usually do each step that creates a working, test-passing build as its own patch, with multiple of these in a single bug to create useful units of work. I.e., if we can't make this beneficial without ripping out recycling, then that should be a part of this bug's patch queue. If not, I totally agree, we should make it a follow-up bug!
Comment 9 Jim Blandy :jimb 2011-04-19 09:27:59 PDT
(In reply to comment #3)
> Getting rid of parse node recycling would be cool if it doesn't hurt
> performance.  Last I checked only about 20% of nodes are recycled on typical
> code.

The reason I'd like to get rid of it is that the node recycling, combined with the decls retained between iterations of the big loop in compileScript, creates a pretty complicated set of invariants that I'm afraid people are going to trip over. When I went in there last fall to clean up an apparently trivial quadratic-time issue, it took me a lot of effort to figure out why things were working; it was much more subtle than I'd expected. See the comments for cleanFunctionList, those in PushNodeChildren, and PrepareNodeForMutation.

The comment above RecycleTree beginning "You must not..." is the kind of thing I hope never to have to write.
Comment 10 Chris Leary [:cdleary] (not checking bugmail) 2011-05-26 16:44:20 PDT
Created attachment 535505 [details] [diff] [review]
WIP: js::HashTable good, JSAtomList bad.
Comment 11 Chris Leary [:cdleary] (not checking bugmail) 2011-05-27 13:53:01 PDT
Created attachment 535742 [details] [diff] [review]
WIP: more performant.

This patch is much closer to performance parity.

MochiKit.js
    100.97% (slower) +696,246 icount
processing_twitch.js
    101.59% (slower) +229,873 icount
jquery.min.js
    101.77% (slower) +478,128 icount
dojo.js
    101.78% (slower) +670,803 icount
jsgb_combined.js
    101.32% (slower) +684,673 icount
280slides_combined.js
    100.48% (slower) +83,094 icount
gravity_combined.js
    100.20% (slower) +222,919 icount
twitter_combined.js
    101.30% (slower) +1,231,109 icount
v8_combined.js
    101.15% (slower) +1,068,429 icount
zimbra_combined.js
    100.65% (slower) +3,505,647 icount
tetris_combined.js
    101.05% (slower) +238,876 icount
effectgames_galaxy.js
    100.62% (slower) +430,116 icount
pipio_combined.js
    101.52% (slower) +5,478,139 icount
gmail_combined.js
    100.26% (slower) +1,078,910 icount
sunspider_combined.js
    100.35% (slower) +196,844 icount
yui-min.js
    101.19% (slower) +125,771 icount
ball_pool_combined.js
    100.55% (slower) +404,643 icount
ga.js
    101.48% (slower) +258,201 icount
Average percent: 101.01%

Here's the valgrind-based breakdown: The good news is that parsing gmail is ~.6Mcycles faster, reflecting the increased performance of js::HashTable. However, we end up using ~2.6Mcycles on memset because of map recycling. 

We allocate ~1500 maps for the larger programs, and we see about a 1Mcycle bump in malloc time, 80% of which can be attributed to the map allocation path. This comes out to 533 cycles per map, which I think is probably a little bit unrealistic. Not sure of the details of valgrind's malloc implementation and accounting -- Nick, any perspective there?

I think I may still have a few performance ideas up my sleeve, so I'll spin on this for another day.
Comment 12 Nicholas Nethercote [:njn] 2011-05-27 19:32:54 PDT
(In reply to comment #11)
> 
> We allocate ~1500 maps for the larger programs, and we see about a 1Mcycle
> bump in malloc time, 80% of which can be attributed to the the map
> allocation path. This comes out to 533 cycles per map, which I think is
> probably a little bit unrealistic. Not sure of the details of valgrind's
> malloc implementation and accounting -- Nick, any perspective there?

If you're running Cachegrind (or Callgrind), Valgrind doesn't replace malloc with its own version.
 
> I think I may still have a few performance ideas up my sleeve, so I'll spin
> on this for another day.

I'll do a measurement run on Monday with my scripts.
Comment 13 Chris Leary [:cdleary] (not checking bugmail) 2011-05-27 23:32:58 PDT
(In reply to comment #12)
> I'll do a measurement run on Monday with my scripts.

Do your scripts do a more complex cycle estimation than just icount? If so, and it doesn't require valgrind-internals knowledge, I'd be interested in a blog entry or quick how-to about how you set it up!

I'm actually trying a different tack that I hope will avoid most of the heavy map-clear overhead. I'm kind of obsessed with making this patch viable, so I should have something soon.
Comment 14 Nicholas Nethercote [:njn] 2011-05-28 00:13:21 PDT
(In reply to comment #13)
> (In reply to comment #12)
> > I'll do a measurement run on Monday with my scripts.
> 
> Do your scripts do a more complex cycle estimation that just icount?

No, but they can give you per-line icount (and cache/branch stats) and also give you function-level diffs of those stats.

> If so,
> and it doesn't require valgrind-internals knowledge, I'd be interested in a
> blog entry or quick how-to about you set it up!

See bug 555106.  I'm not sure how many bells and whistles I've added to my versions of the scripts since that bug was last modified.
Comment 15 Chris Leary [:cdleary] (not checking bugmail) 2011-05-28 03:07:06 PDT
(In reply to comment #14)
> No, but they can give you per-line icount (and cache/branch stats) and also
> give you function-level diffs of those stats.

Ah okay, thanks for the background! I just have a thing that does simple icount differentials for parsemark, since I can't get good execution-time variance out of that suite running it directly. http://hg.mozilla.org/users/cleary_mozilla.com/spidermonkey_helpers/file/tip/vg_diff.py

Despite the coarseness of the measurement, I intend to make it go down! Or, failing at that, at least demonstrate that the real-world performance is improved using luke's instrumentation branch. I'd like to win on all fronts, if possible, though. :-)
Comment 16 Nicholas Nethercote [:njn] 2011-05-29 19:46:51 PDT
I tried measuring this.  First, it wouldn't build with optimization enabled on my Linux box.  I had to move the JSParseMapPool destructor into jsatominlines.h.

Second, it crashed here:

==3959== Invalid read of size 4
==3959==    at 0x80AD3FB: JSParseMapPool::allocate() (jshashtable.h:89)
==3959==    by 0x811E3D2: js::Parser::parse(JSObject*) (jsatom.h:1042)
==3959==    by 0x804B19E: Parse(JSContext*, unsigned int, unsigned long long*) (js.cpp:4486)
==3959==    by 0x8295D19: js::Interpret(JSContext*, js::StackFrame*, unsigned int, js::InterpMode) (jscntxtinlines.h:277)
==3959==    by 0x80C7390: js::RunScript(JSContext*, JSScript*, js::StackFrame*) (jsinterp.cpp:613)
==3959==    by 0x80C765C: js::Execute(JSContext*, JSObject&, JSScript*, js::StackFrame*, unsigned int, js::Value*) (jsinterp.cpp:974)
==3959==    by 0x805AE90: JS_ExecuteScript (jsapi.cpp:4935)
==3959==    by 0x804E22E: Process(JSContext*, JSObject*, char*, int) (js.cpp:452)
==3959==    by 0x804ED81: Shell(JSContext*, int, char**, char**) (js.cpp:975)
==3959==    by 0x804EFD2: main (js.cpp:6051)
==3959==  Address 0x78 is not stack'd, malloc'd or (recently) free'd

I'll wait for a new patch.
Comment 17 Chris Leary [:cdleary] (not checking bugmail) 2011-05-30 13:07:00 PDT
Created attachment 536154 [details] [diff] [review]
WIP: using LinMap.

Okay, this one seems a lot more promising -- linear map that turns itself into a hash table after an insertion count threshold.

I haven't really performance tuned it yet and the Valgrind icount increase is generally under 0.3%. When we take data locality into account I think this will be a clear winner, but I'll look at some stats to see if there are any obvious tweaks I can make to match the usage behavior.

280slides_combined.js
      0.33% (slower) icount      +57,153 / 17,157,259  
MochiKit.js
      0.23% (slower) icount     +162,445 / 71,778,386  
ball_pool_combined.js
      0.40% (slower) icount     +296,220 / 73,141,408  
dojo.js
      0.53% (slower) icount     +199,193 / 37,588,057  
effectgames_galaxy.js
      0.31% (slower) icount     +210,861 / 69,104,363  
ga.js
      0.49% (slower) icount      +85,492 / 17,500,568  
gmail_combined.js
      0.10% (slower) icount     +403,481 / 422,632,513 
gravity_combined.js
      0.29% (slower) icount     +328,324 / 112,934,758 
jquery.min.js
      0.64% (slower) icount     +173,079 / 26,944,963  
jsgb_combined.js
      0.48% (slower) icount     +247,726 / 51,749,415  
pipio_combined.js
      0.21% (slower) icount     +741,187 / 360,758,754 
processing_twitch.js
      0.39% (slower) icount      +57,071 / 14,468,190  
sunspider_combined.js
      0.15% (slower) icount      +81,945 / 55,920,531  
tetris_combined.js
      0.36% (slower) icount      +81,357 / 22,761,274  
twitter_combined.js
      0.26% (slower) icount     +247,818 / 94,875,359  
v8_combined.js
      0.36% (slower) icount     +332,863 / 92,914,476  
yui-min.js
      0.16% (slower) icount      +16,795 / 10,574,145  
zimbra_combined.js
      0.29% (slower) icount   +1,547,724 / 537,160,070 
Average percent:  0.33%

Nick, this is diffed against qparent and definitely builds on my machine, let me know if you have any other issues.
Comment 18 Chris Leary [:cdleary] (not checking bugmail) 2011-05-30 13:25:19 PDT
A few additional things I forgot to note: 1) I'm working through some reftest failures. 2) This approach uses tagged unions for everything to match the existing map interface and to avoid creating dual interfaces (i.e. LinMap::asLin() and LinMap::asMap() based on LinMap::isLin()) -- that would make it a bit harder to use, as after every mutating operation you would have to re-query which interface you should be using.

I'm not totally sure how good modern compilers are at tracking bit-level constness (i.e. inlining and culling out obvious tagged pointer relationships due to bit-const propagation), and a tagged-union-heavy approach like this may not be as great on compilers that don't have that capability. I think I remember a presentation that said it was implemented in GCC 4.5, but I would have to check, look it up for MSVC, and test perf on OS X which is stuck on the old version.
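
For illustration, the "linear map that turns itself into a hash table" idea from comment 17 might look like the following sketch. This is not the patch: std::unordered_map stands in for js::HashMap, a plain bool stands in for the tagged-union discriminant, and the names and promotion details are made up.

  #include <cstddef>
  #include <unordered_map>

  template <typename K, typename V, size_t InlineElems>
  class SmallMap {
      struct Elem { K key; V value; };
      Elem inl[InlineElems];                // scanned linearly while small
      size_t count;
      bool usingMap;                        // real patch: tagged pointers, lazy init
      std::unordered_map<K, V> map;

      void switchToMap() {
          for (size_t i = 0; i < count; i++)
              map[inl[i].key] = inl[i].value;
          usingMap = true;
      }

    public:
      SmallMap() : count(0), usingMap(false) {}

      V *lookup(const K &key) {
          if (usingMap) {
              typename std::unordered_map<K, V>::iterator it = map.find(key);
              return it == map.end() ? NULL : &it->second;
          }
          for (size_t i = 0; i < count; i++) {
              if (inl[i].key == key)
                  return &inl[i].value;
          }
          return NULL;
      }

      void put(const K &key, const V &value) {
          if (V *v = lookup(key)) {         // overwrite an existing entry
              *v = value;
              return;
          }
          if (!usingMap && count == InlineElems)
              switchToMap();                // promotion threshold reached
          if (usingMap) {
              map[key] = value;
          } else {
              inl[count].key = key;
              inl[count].value = value;
              count++;
          }
      }
  };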
Comment 19 Chris Leary [:cdleary] (not checking bugmail) 2011-05-30 14:54:18 PDT
Created attachment 536179 [details] [diff] [review]
WIP: using LinMap.

Fixed reftests, but I break a bunch of strict aliasing constraints. Tough to type-pun stuff with constructors (like the Map::Ptr types).
Comment 20 Chris Leary [:cdleary] (not checking bugmail) 2011-05-30 15:28:32 PDT
Created attachment 536185 [details] [diff] [review]
WIP: using LinMap.

Quarter million cycles faster on gmail_combined.js according to callgrind.
Comment 21 Chris Leary [:cdleary] (not checking bugmail) 2011-05-30 21:45:45 PDT
Created attachment 536228 [details] [diff] [review]
WIP: using LinMap.

Comparing 32b opt --disable-methodjit --disable-tracejit, running prog "compile(snarf('$filename'))", against revision 4e334541e521, I get the following results:

280slides_combined.js
     -0.78% (faster) icount     -128,673 / 16,417,487  
MochiKit.js
     -1.04% (faster) icount     -689,039 / 66,163,309  
ball_pool_combined.js
     -0.82% (faster) icount     -561,317 / 68,663,748  
dojo.js
     -1.00% (faster) icount     -339,026 / 33,804,941  
effectgames_galaxy.js
     -0.96% (faster) icount     -642,047 / 66,643,380  
ga.js
     -1.09% (faster) icount     -166,734 / 15,338,031  
gmail_combined.js
     -1.87% (faster) icount   -6,994,013 / 374,530,208 
gravity_combined.js
     -1.22% (faster) icount   -1,298,662 / 106,731,963 
jquery.min.js
     -0.83% (faster) icount     -202,778 / 24,287,716  
jsgb_combined.js
     -0.65% (faster) icount     -293,013 / 45,003,228  
pipio_combined.js
     -1.28% (faster) icount   -4,023,711 / 314,805,480 
processing_twitch.js
     -0.36% (faster) icount      -47,304 / 13,202,756  
sunspider_combined.js
     -0.30% (faster) icount     -172,029 / 57,664,180  
tetris_combined.js
     -1.20% (faster) icount     -243,529 / 20,280,597  
twitter_combined.js
     -1.30% (faster) icount   -1,083,595 / 83,141,004  
v8_combined.js
     -0.64% (faster) icount     -543,438 / 84,895,403  
yui-min.js
     -0.88% (faster) icount      -84,607 / 9,614,643   
zimbra_combined.js
     -0.76% (faster) icount   -3,852,091 / 509,705,928 

memcheck comes up clean, except for a free/delete mismatch in regexp stuff (unrelated). There are a few more things to try (e.g. bifurcate the LinMap interface and see how much faster we get without tagged unions) and I'm collecting stats on the usage patterns to see if there's more perf we can get by using different recycling bins.
Comment 22 Julian Seward [:jseward] 2011-05-31 05:30:45 PDT
Good stuff.  Maybe also worth a quick check of D1 misses (D1mr, D1mw)
and mispredicts (Bcm, Bim) ?  Since you can get them at zero hassle from
cachegrind, and they're ungood if they happen.
Comment 23 Jim Blandy :jimb 2011-05-31 07:23:00 PDT
Yeah, in this day and age, with a good benchmark dataset, I'd think the cache miss figures would be more significant than the instruction counts. Isn't that so?
Comment 24 Chris Leary [:cdleary] (not checking bugmail) 2011-05-31 12:41:20 PDT
Created attachment 536378 [details]
Cachegrind diff.

Looks like I1 and D1 misses go up, which is surprising, since the JSAtomList is generally using data strewn around parts of the arena allocated space.
Comment 25 Chris Leary [:cdleary] (not checking bugmail) 2011-05-31 14:10:02 PDT
Created attachment 536408 [details]
Cachegrind diff.

Branch mispredict rate is down, though, according to branch-sim.
Comment 26 Nicholas Nethercote [:njn] 2011-05-31 19:16:00 PDT
(In reply to comment #23)
> Yeah, in this day and age, with a good benchmark dataset, I'd think the
> cache miss figures would be more significant than the instruction counts.
> Isn't that so?

Not really.  You can always find exceptions, but icount usually tracks speed pretty closely, IME.
Comment 27 Chris Leary [:cdleary] (not checking bugmail) 2011-05-31 22:43:18 PDT
Created attachment 536529 [details] [diff] [review]
WIP: using LinMap.

Lazily initializes the kinds of maps that are mostly dead (i.e. are taken from the pool and returned without any operations being performed on them).

1.5% icount decrease for gmail_combined.js (32b no methodjit no tracejit).
Comment 28 Chris Leary [:cdleary] (not checking bugmail) 2011-05-31 23:40:56 PDT
Created attachment 536533 [details] [diff] [review]
WIP: using LinMap and chunk alloc.

Tried adding chunk allocation and it brings the gmail_combined.js to 1.7% icount decrease. I think there's a lot of smaller wins like this that we can get with the new approach, but I have to get back to GC work, so I'm going to just try the dual interface approach before cleaning this patch up for review.

For posterity, additional optimization ideas are:
- bucketized recycling: how many times does a map get reused? can we anticipate the context in which we'll want to recycle a map instead of something in the linear state?
- noshrink on maps: removal may be costing us some unnecessary table downsizes
- tweaking: chunk allocation size growth, as given in this patch, and number of inline elements (potentially to be combined with bucketizing)
Comment 29 Chris Leary [:cdleary] (not checking bugmail) 2011-06-01 13:07:14 PDT
Created attachment 536711 [details] [diff] [review]
WIP: using LinMap and chunk alloc.

Forgot to update compileFunctionBody similarly to compileScript, which prevented browser runs from working.
Comment 30 Chris Leary [:cdleary] (not checking bugmail) 2011-06-01 13:59:27 PDT
Created attachment 536732 [details] [diff] [review]
WIP: using LinMap and chunk alloc against Luke's measure branch tip.
Comment 31 Chris Leary [:cdleary] (not checking bugmail) 2011-06-02 02:46:50 PDT
Created attachment 536845 [details] [diff] [review]
WIP: using LinMap and chunk alloc, tested on i686.

Fixes errors related to strict aliasing violations and eliminates some unnecessary tagged union overhead. I made a thinko WRT the strict aliasing correctness in the previous patches. This one drops the icount for gmail_combined.js by 2.22% (8.7Minst).

f?=luke for measure branch! qparent is b36861bd7a01
Comment 32 Chris Leary [:cdleary] (not checking bugmail) 2011-06-02 23:26:42 PDT
Created attachment 537092 [details] [diff] [review]
WIP: using LinMap and GC-based recycling

Doesn't seem to make much difference when I try it on luke's measure branch, though there are only 13,000 total map allocations this way loading tech crunch.

After luke gives his feedback on measure perf I'm gonna button this up. We know it's at least close to parity and have good reason to believe it will be faster. Let's get this sucker in the tree and iterate on it.
Comment 33 Luke Wagner [:luke] 2011-06-03 14:05:45 PDT
Comment on attachment 537092 [details] [diff] [review]
WIP: using LinMap and GC-based recycling

I tried the most recent patch out and was able to get low enough variance to measure some distinct changes.  (I disabled flash for starters :)

With the previous patch (comment 31), there was a distinct 4 ms increase in compilation time on startup over trunk (from 18ms to 22ms) (this after the fastload cache was in effect; without the fastload cache, parse/emit is ~120ms).  This patch puts timing exactly back to trunk 18ms.

(Now, comparing to trunk) For techcrunch (which, btw, spends 1.3 SECONDS in parse/emit!), I see a fairly distinct ~20ms improvement.  For Gmail, there is a fairly distinct ~5ms improvement, and for Zimbra, a ~5-10ms improvement.

Nice work!
Comment 34 Chris Leary [:cdleary] (not checking bugmail) 2011-06-03 14:07:42 PDT
(In reply to comment #33)

Joy! Will clean up and push to try. Review incoming.
Comment 35 Brendan Eich [:brendan] 2011-06-04 10:37:42 PDT
Nice, good to see the old C-era fusion of the chained hash table and atom list yield to sweet C++, with wins and science and stuff.

Is that TechCrunch compiler benchmark filed yet? That is one to profile for sure.

/be
Comment 36 Nicholas Nethercote [:njn] 2011-06-04 14:23:24 PDT
The front page of techcrunch.com is horrific.  It creates over 70 compartments and has an absurd number of iframes.
Comment 37 Nicholas Nethercote [:njn] 2011-06-05 18:31:39 PDT
I want to measure this, but the patch doesn't apply cleanly to TM 69771:7e6f3b179644.  cdleary, can you update or give me instructions on how to apply?  Thanks.
Comment 38 Chris Leary [:cdleary] (not checking bugmail) 2011-06-10 09:41:53 PDT
Created attachment 538541 [details] [diff] [review]
WIP: using LinMap and GC-based recycling

This is the patch I just pushed to try. f?= njn, since he said he wanted to check it out. Should be up for review in a few hours, just gotta do some tidying.
Comment 39 Nicholas Nethercote [:njn] 2011-06-10 09:47:13 PDT
Thanks!  I won't be able to measure these until Tuesday morning Oz time;  it's a public holiday here on Monday.  Since you and Luke both have measured good improvements I'm happy if you want to land the patch before then.  I'll post the results I get, hopefully they'll be positive.
Comment 40 Chris Leary [:cdleary] (not checking bugmail) 2011-06-10 15:26:53 PDT
Created attachment 538617 [details] [diff] [review]
Extricate with zeal!
Comment 41 Nicholas Nethercote [:njn] 2011-06-13 20:47:13 PDT
I get this:

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|    12.874    12.851 (1.002x) |    156034    140882 (1.108x) | 280slides-comb
|    48.727    48.782 (0.999x) |    795333    728796 (1.091x) | ball-pool-comb
|    24.401    24.298 (1.004x) |    377107    337640 (1.117x) | dojo
|    49.296    49.283 (------) |    714948    660491 (1.082x) | effectgames-ga
|    10.802    10.758 (1.004x) |    167568    147764 (1.134x) | ga
|   246.473   245.548 (1.004x) |     4.815     4.256 (1.131x) | gmail-combined
|    74.173    74.046 (1.002x) |     1.246     1.133 (1.100x) | gravity-combin
|    17.728    17.666 (1.004x) |    258524    235208 (1.099x) | jquery-min
|    31.803    31.767 (1.001x) |    496640    455069 (1.091x) | jsgb-combined
|    47.342    47.290 (1.001x) |    751059    693689 (1.083x) | mochikit
|   215.987   215.662 (1.002x) |     3.605     3.278 (1.100x) | pipio-combined
|    10.651    10.636 (1.001x) |    118534    108208 (1.095x) | processing-twi
|    50.606    50.596 (------) |    316631    290078 (1.092x) | sunspider-comb
|    14.379    14.296 (1.006x) |    221512    195119 (1.135x) | tetris-combine
|    57.609    57.432 (1.003x) |    912124    822615 (1.109x) | twitter-combin
|    63.254    63.319 (0.999x) |    908620    838573 (1.084x) | v8-combined
|     7.558     7.520 (1.005x) |     98370     88836 (1.107x) | yui-min
|   356.488   356.868 (0.999x) |     5.682     5.245 (1.083x) | zimbra-combine
-------
|  1340.161  1338.625 (1.001x) |    21.643    19.657 (1.101x) | all

Good enough.


BTW, can you put a comment at the top of LinMap.h explaining what a LinMap is?  Looking through the code I think it's like a HashMap, except an array is used if it's small enough.  But a one-sentence comment would have saved me having to work that out.
Comment 42 Brendan Eich [:brendan] 2011-06-14 00:19:59 PDT
Don't let me be the boring-name guy, but LinMap isn't all that exciting. Could we have a name that spells out what this thing is better, without being unbearably long?

/be
Comment 43 Jim Blandy :jimb 2011-06-14 07:00:10 PDT
(In reply to comment #41)
> BTW, can you put a comment at the top of LinMap.h explaining what a LinMap
> is?  Looking through the code I think it's like a HashMap, except an array
> is used if it's small enough.  But a one-sentence comment would have saved
> me having to work that out.

ChintzyMap!!!
Comment 44 Chris Leary [:cdleary] (not checking bugmail) 2011-06-14 08:40:56 PDT
(In reply to comment #41)
> I get this:

What program are you running under valgrind? With |compile(snarf($filename))| I got a few percent improvement on each.
Comment 45 Nicholas Nethercote [:njn] 2011-06-14 16:38:14 PDT
(In reply to comment #44)
> 
> What program are you running under valgrind? With
> |compile(snarf($filename))| I got a few percent improvement on each.

I've been doing |parse(snarf($filename))|.  The results in comment 41 are for a 32-bit Linux build.  For 64-bit I get this:

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|    11.645    11.668 (0.998x) |         0         0 (------) | 280slides-comb
|    44.626    44.816 (0.996x) |         0         0 (------) | ball-pool-comb
|    22.128    22.219 (0.996x) |         0         0 (------) | dojo
|    45.072    45.201 (0.997x) |         0         0 (------) | effectgames-ga
|     9.747     9.792 (0.995x) |         0         0 (------) | ga
|   225.913   226.145 (0.999x) |         0         0 (------) | gmail-combined
|    67.782    67.964 (0.997x) |         0         0 (------) | gravity-combin
|    16.071    16.146 (0.995x) |         0         0 (------) | jquery-min
|    28.889    28.978 (0.997x) |         0         0 (------) | jsgb-combined
|    43.263    43.447 (0.996x) |         0         0 (------) | mochikit
|   197.389   198.058 (0.997x) |         0         0 (------) | pipio-combined
|     9.615     9.647 (0.997x) |         0         0 (------) | processing-twi
|    45.759    45.815 (0.999x) |         0         0 (------) | sunspider-comb
|    13.003    13.041 (0.997x) |         0         0 (------) | tetris-combine
|    52.403    52.686 (0.995x) |         0         0 (------) | twitter-combin
|    57.621    57.836 (0.996x) |         0         0 (------) | v8-combined
|     6.809     6.822 (0.998x) |         0         0 (------) | yui-min
|   324.926   326.953 (0.994x) |         0         0 (------) | zimbra-combine
-------
|  1222.671  1227.243 (0.996x) |         0         0 (------) | all


So a net increase in instructions.

Here are the top 10 entries in the function-level diff:

--------------------------------------------------------------------------------
        Ir  file:function
--------------------------------------------------------------------------------
-4,079,961  /home/njn/moz/tmN/js/src/jsatom.cpp:JSAtomList::rawLookup(JSAtom*, JSHashEntry**&)
 3,555,503  /home/njn/moz/tmN/js/src/o64/../mfbt/LinMap.h:js::Parser::primaryExpr(js::TokenKind, int)
 2,130,408  /home/njn/moz/tmN/js/src/jsparse.cpp:_ZL11RecycleTreeP11JSParseNodeP13JSTreeContext.clone.207
-1,814,702  /home/njn/moz/tmN/js/src/jsatom.cpp:JSAtomList::add(js::Parser*, JSAtom*, JSAtomList::AddHow)
-1,485,356  /home/njn/moz/tmN/js/src/jshash.cpp:JS_HashTableRawLookup
 1,028,309  /home/njn/moz/tmN/js/src/o64/../mfbt/LinMap.h:LeaveFunction(JSParseNode*, JSTreeContext*, JSAtom*, unsigned int)
-1,025,610  /home/njn/moz/tmN/js/src/jsparse.cpp:js::Parser::memberExpr(int)
   602,625  /home/njn/moz/tmN/js/src/jshashtable.h:js::detail::HashTable<js::HashMap<JSAtom*, JSDefinition*, js::DefaultHasher<JSAtom*>, js::Conte
xtAllocPolicy>::Entry, js::HashMap<JSAtom*, JSDefinition*, js::DefaultHasher<JSAtom*>, js::ContextAllocPolicy>::MapHashPolicy, js::ContextAllocPol
icy>::lookupFo
  -565,149  /home/njn/moz/tmN/js/src/jsatom.cpp:js_alloc_temp_entry(void*, void const*)
   517,378  /build/buildd/eglibc-N.13/malloc/malloc.c:_int_malloc
Comment 46 Chris Leary [:cdleary] (not checking bugmail) 2011-06-14 18:06:44 PDT
(In reply to comment #45)
> I've been doing |parse(snarf($filename))|.

Ah, okay! There are a few reasons why parse isn't as cool/applicable as compile here.

- parse builds a full AST by running Parser::statements() at the top level. compile, on the other hand, is able to throw away the parse nodes after every top-level script statement, like when you call into the JSAPI functions that trigger compilation. This gives us real-world opportunities to optimize by reusing resources across top level statements (which also helps this patch).

- parse, understandably, doesn't invoke the emitter. However, the optimizations made in this patch for the parser data structures are all carrying information from parse time to emit time. The efficiency of the resulting emitter access of those data structures is also pertinent in the calculation. The fact that we do, in a loop in compileScript, parse-then-fold-then-emit, means that the data structure usage for these hand-offs can actually be very important.

- Given that the emitter is important, the emitter acts slightly differently in COMPILE_N_GO situations than when COMPILE_N_GO is disabled (tracking global slots to bind slot-access ops to). Some of the maps affected by this patch are used exclusively when COMPILE_N_GO is on in the emitter. With 3ab18e7ea3d5 COMPILE_N_GO is enabled for the compile builtin in the shell.
Comment 47 Nicholas Nethercote [:njn] 2011-06-14 18:26:52 PDT
Here are 32-bit results using compile():

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|    19.005    18.816 (1.010x) |    273499    245788 (1.113x) | 280slides-comb
|    77.166    76.743 (1.006x) |     1.392     1.288 (1.081x) | ball-pool-comb
|    39.966    39.631 (1.008x) |    648105    595174 (1.089x) | dojo
|    74.536    73.860 (1.009x) |     1.231     1.135 (1.084x) | effectgames-ga
|    18.502    18.330 (1.009x) |    295183    268362 (1.100x) | ga
|   450.337   444.947 (1.012x) |     8.570     7.686 (1.115x) | gmail-combined
|   119.184   118.117 (1.009x) |     2.205     2.034 (1.084x) | gravity-combin
|    28.635    28.446 (1.007x) |    446620    413580 (1.080x) | jquery-min
|    54.861    54.340 (1.010x) |    862931    788815 (1.094x) | jsgb-combined
|    76.033    75.346 (1.009x) |     1.300     1.212 (1.073x) | mochikit
|   381.542   377.491 (1.011x) |     6.417     5.885 (1.090x) | pipio-combined
|    15.480    15.414 (1.004x) |    200344    187000 (1.071x) | processing-twi
|    61.310    61.114 (1.003x) |    546636    507611 (1.077x) | sunspider-comb
|    24.132    23.873 (1.011x) |    390848    354955 (1.101x) | tetris-combine
|   100.654    99.520 (1.011x) |     1.623     1.483 (1.094x) | twitter-combin
|    99.762    99.317 (1.004x) |     1.577     1.465 (1.076x) | v8-combined
|    11.364    11.267 (1.009x) |    160279    147567 (1.086x) | yui-min
|   574.182   571.508 (1.005x) |    10.234     9.494 (1.078x) | zimbra-combine
-------
|  2226.660  2208.090 (1.008x) |    38.377    35.195 (1.090x) | all


Much better!  Thanks for the detailed comparison of parse() and compile(), I didn't know about that.  I've changed my parsemark installation to use compile();  even though it measures emitting as well as parsing, it more accurately reflects what the browser is doing.
Comment 48 Luke Wagner [:luke] 2011-06-15 09:53:29 PDT
Comment on attachment 538617 [details] [diff] [review]
Extricate with zeal!

Review of attachment 538617 [details] [diff] [review]:
-----------------------------------------------------------------

Great job unraveling all this!

In general, looking good so far, here is an incremental batch of comments:

::: js/src/jsapi.cpp
@@ -4558,5 @@
>      else
>          chars = InflateString(cx, bytes, &length);
>      if (!chars)
>          return JS_TRUE;
> -

You probably didn't mean to delete this well-meaning \n.

::: js/src/jshashtable.h
@@ +585,5 @@
> +    /* 
> +     * N.B. Should only be called when the interface knows the entry may be
> +     * cleared as a POD type.
> +     */
> +    void turboClear()

I'd rather not add this method.  Instead, we should be able to use IsPodType to automatically choose between the loop vs. memset.  To do this we'll need to hoist HashTable::Entry and HashMap::Entry out of their respective templates so that we can specialize IsPodType inductively:

  template <class T>
  struct IsPodType<HashTableEntry<T> > {
    static const bool result = IsPodType<T>::result;
  };

  template <class K, class V>
  struct IsPodType<HashMapEntry<K, V> > {
    static const bool result = IsPodType<K>::result && IsPodType<V>::result;
  };
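
A sketch of how clear() could then pick between the two strategies at compile time (illustrative only; the trait plumbing is approximate, not the actual jshashtable.h code):

  #include <cstddef>
  #include <cstring>
  #include <new>

  template <bool IsPod>
  struct ClearEntries;

  template <>
  struct ClearEntries<true> {
      template <typename Entry>
      static void clear(Entry *table, size_t capacity) {
          memset(table, 0, capacity * sizeof(Entry));    // the "turboClear" path
      }
  };

  template <>
  struct ClearEntries<false> {
      template <typename Entry>
      static void clear(Entry *table, size_t capacity) {
          for (Entry *e = table, *end = table + capacity; e != end; ++e) {
              e->~Entry();                               // run destructors
              new (e) Entry();                           // reset in place
          }
      }
  };

  // In HashTable::clear(), roughly:
  //   ClearEntries<IsPodType<Entry>::result>::clear(table, capacity);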

::: js/src/jsprvtd.h
@@ +195,5 @@
>  } /* namespace js */
>  
> +typedef js::LinMap<JSAtom *, JSDefinition *, 24> JSAtomDefnMap;
> +typedef js::LinMap<JSAtom *, jsatomid, 24> JSAtomIndexMap;
> +typedef js::LinMap<JSAtom *, JSDefnOrHeader, 24> JSAtomDOHMap;

I see a lot of JSAtomDOHMap::Range, ::AddPtr, etc.  Perhaps you can have typedefs for AtomDOHRange, AtomDOHAddPtr, etc.?
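
For example (a sketch of the suggestion):

  typedef JSAtomDOHMap::Ptr     AtomDOHPtr;
  typedef JSAtomDOHMap::AddPtr  AtomDOHAddPtr;
  typedef JSAtomDOHMap::Range   AtomDOHRange;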

::: js/src/mfbt/LinMap.h
@@ +44,5 @@
> +#include "jshashtable.h"
> +
> +namespace js {
> +
> +static const uintptr_t UNTAG_PTR_MASK = uintptr_t(-1) - 1;

Could this be a member static const of LinMap?

@@ +48,5 @@
> +static const uintptr_t UNTAG_PTR_MASK = uintptr_t(-1) - 1;
> +
> +/* 
> + * If a zero value can be reserved for use by the LinMap as an "invalid" key,
> + * it is usable by the LinMap.

I think you want to strengthen this statement: "An element may only be used in a LinMap if zero is an invalid value (and thus may be used as a tombstone by LinMap)."

@@ +56,5 @@
> + */
> +template <typename T> struct ZeroIsReserved         { static const bool result = false; };
> +template <typename T> struct ZeroIsReserved<T *>    { static const bool result = true; };
> +
> +template <typename K, typename V, size_t inlineElems>

Could you s/inlineElements/InlineElems/ to indicate compile-time const-ness.

@@ +77,5 @@
> +    InlineElem      lin[inlineElems];
> +    WordMap         map;
> +    size_t          linNext;
> +    size_t          linCount;
> +    bool            mapInitialized;

You can remove this member and just use map.initialized().

@@ +101,5 @@
> +                return false;
> +            mapInitialized = true;
> +        }
> +
> +        for (InlineElem *it = lin; it != lin + linNext; ++it) {

Is LICM good enough to hoist "lin + linNext"?  There are function calls in the body which may make the compiler conservatively recompute every time.  To be safe, let's just follow ye olde idiom:

  for (InlineElement *it = lin, *end = lin + linNext; ...

@@ +110,5 @@
> +                return false;
> +        }
> +
> +        JS_ASSERT(map.count() == linCount);
> +        linNext = inlineElems + 1;

JS_ASSERT(!usingMap()); here would be nice.

@@ +123,5 @@
> +    }
> +
> +    ~LinMap() {
> +        if (mapInitialized)
> +            map.finish();

Shouldn't map's destructor handle this?

@@ +150,5 @@
> +        friend class LinMap;
> +
> +        WordMapPtr  mapPtr;
> +        InlineElem  *linPtr;
> +        bool        isLinPtr;

Could we tighten up the invariants in Ptr so that:
  found()  === !!linPtr
  isLinPtr === linPtr != 0x1
Then we could remove isLinPtr (which doesn't matter space-wise, but it would allow found() to be branch-less and key() and value() to require one less load / keep one less register live.
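
Under those invariants the Ptr accessors might look like this standalone sketch (illustrative only; InlineElem and MapEntry are stand-ins for the real entry types):

  #include <cassert>
  #include <cstddef>
  #include <stdint.h>

  struct InlineElem { uintptr_t key; uintptr_t value; };
  struct MapEntry   { uintptr_t key; uintptr_t value; };

  // linPtr doubles as the discriminant, so found() needs no branch:
  //   NULL -> not found;  0x1 -> found in the hash map;  else -> inline element
  class Ptr {
      MapEntry   *mapPtr;
      InlineElem *linPtr;

      static InlineElem *inMap() { return reinterpret_cast<InlineElem *>(0x1); }
    public:
      Ptr() : mapPtr(NULL), linPtr(NULL) {}
      explicit Ptr(InlineElem *e) : mapPtr(NULL), linPtr(e) {}
      explicit Ptr(MapEntry *m)   : mapPtr(m),    linPtr(inMap()) {}

      bool found() const    { return !!linPtr; }
      bool isLinPtr() const { return linPtr != inMap(); }

      uintptr_t &key() {
          assert(found());
          return isLinPtr() ? linPtr->key : mapPtr->key;
      }
  };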

@@ +164,5 @@
> +        Ptr(const Ptr &other) : isLinPtr(other.isLinPtr) {
> +            if (isLinPtr)
> +                linPtr = other.linPtr;
> +            else
> +                mapPtr = other.mapPtr;

Default copy ctor should just work, without branching.

::: js/src/parser/AtomData.cpp
@@ +98,5 @@
> +    return true;
> +}
> +
> +JSAtomDeclNode *
> +JSAtomDecls::lastAsNode(JSAtomDOHMap::AddPtr &p)

IIUC, you can just take a "JSDefnOrHeader &" instead of the AddPtr type.

::: js/src/parser/AtomData.h
@@ +38,5 @@
> + *
> + * ***** END LICENSE BLOCK ***** */
> +
> +#ifndef AtomData_h__
> +#define AtomData_h__

I would prefer you merged the AtomData files with the ParseMapPool files since they are all closely associated concepts.  ParseMaps{.h, .cpp, -inl.h} makes sense to me.

Also, can you strip the JS prefix from JSAtomThingMapPtr, JSDefnOrHeader, JSAtomDecls, JSMultiDeclLookup, and JSAtomDeclsLookup and put them in namespace js?

@@ +49,5 @@
> + * name may reference. If a with block or a function that calls eval encloses
> + * the use, the name may end up referring to something else at runtime.
> + */
> +
> +/* Note: POD-type. */

Seems like a weird place to put this comment.  Perhaps merge with above comment and explain why this is necessary.

@@ +61,5 @@
> +    void finish(JSContext *cx);
> +
> +    bool hasMap() const { return !!map; }
> +    Map *getMap() { return map; }
> +    void setMap(Map *newMap) { map = newMap; }

JS_ASSERT(!map);

@@ +80,5 @@
> +        if (p) {
> +            p.value() = defn;
> +            return true;
> +        }
> +        return map->add(p, atom, defn);

For consistency, rename to "put" and replace the body with HashMap::put.

::: js/src/parser/ParseMapPool-inl.h
@@ +70,5 @@
> +/* Arbitrary atom map type, that has keys and values of the same kind. */
> +typedef JSAtomIndexMap AtomMapT;
> +
> +static AtomMapT *
> +AsAtomMap(void *ptr)

Seems like this can be a private static helper of ParseMapPool.

::: js/src/parser/ParseMapPool.cpp
@@ +78,5 @@
> +        return NULL;
> +
> +    if (!all.append(map) || !recyclable.reserve(all.length())) {
> +        cx->delete_<AtomMapT>(map);
> +        return NULL;

You could all.reserve/recyclable.reserve above new_ and then infallibleAppend after new_; that way you don't have to call free_ on the error path.
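
Roughly the shape being suggested, as a hypothetical sketch (the new_ constructor arguments are a guess, not the actual patch):

  // Reserving up front makes the append after new_ infallible, so no explicit
  // delete_ is needed on an error path.
  if (!all.reserve(all.length() + 1) || !recyclable.reserve(all.length() + 1))
      return NULL;

  AtomMapT *map = cx->new_<AtomMapT>(cx);
  if (!map)
      return NULL;

  all.infallibleAppend(map);    // cannot fail: capacity reserved above
  return map;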

::: js/src/parser/ParseMapPool.h
@@ +49,5 @@
> + * A pool that permits the reuse of the backing storage for the defn, index, or
> + * defn-or-header (multi) maps.
> + *
> + * The pool owns all the maps that are given out, and is responsible for
> + * relinquishing all resources when |destroyAll| is triggered.

s/destroyAll/purgeAll/

@@ +53,5 @@
> + * relinquishing all resources when |destroyAll| is triggered.
> + *
> + * Tables that have changed size upwards are not recycled, because a |remove|
> + * from one such hash table causes the table to change size downwards. However,
> + * Performing frees on the (hot) release path is costly, so we use a delayed

This seems like an impl detail that could just be put in the relevant member function.

@@ +71,5 @@
> +    void recycle(void *map) {
> +        JS_ASSERT(map);
> +#ifdef DEBUG
> +        bool ok = false;
> +        /* Make sure the map is in all but not already in recyclable. */

Put '' or || around 'all'.

@@ +104,5 @@
> +    }
> +
> +    /* Fallibly aquire one of the supported map types from the pool. */
> +    template <typename T>
> +    T *acquire();

To be symmetric with release() and save typing, you could replace the current specialization-based approach with simple inline overloads acquireAtomIndexMap(), acquireAtomDefnMap(), etc.
Comment 49 Chris Leary [:cdleary] (not checking bugmail) 2011-06-17 15:26:18 PDT
Created attachment 540149 [details] [diff] [review]
Updated per review comments.

This accounts for everything except the turboClear/HashTable::Entry changes, which I'll put in a separate patch once this one finishes getting reviewed for the rest of its contents, if that's okay.
Comment 50 Chris Leary [:cdleary] (not checking bugmail) 2011-06-20 13:24:37 PDT
Thinking about it, I think the parser/ subdir should be renamed to frontend/ so that parser/emitter stuff can live in there. I'll change that when this lands if there are no objections.
Comment 51 Luke Wagner [:luke] 2011-06-20 13:34:37 PDT
Good point, 'parser' had bugged me but I hadn't thought of a name as good as 'frontend'.
Comment 52 Luke Wagner [:luke] 2011-06-20 16:19:22 PDT
Comment on attachment 540149 [details] [diff] [review]
Updated per review comments.

Review of attachment 540149 [details] [diff] [review]:
-----------------------------------------------------------------

I looked through almost everything and I really like the direction this patch has taken things.  A lot less ALE mystery and a lot more static-y type-y goodness, so great.  Almost done iterating :)

::: js/src/jsatom.cpp
@@ +650,5 @@
>  #define TEMP_SIZE_LIMIT         JS_BIT(TEMP_SIZE_LIMIT_LOG2)
>  
>  JS_STATIC_ASSERT(TEMP_SIZE_START >= sizeof(JSHashTable));
>  
> +JS_NEVER_INLINE

From IRL: remove profiling remnants and re-inline these to js_InitAtomMap.

@@ +657,4 @@
>  {
> +    for (const AtomIndexMap::InlineElem *it = indices->asLin(), *end = indices->linEnd();
> +         it != end; ++it) {
> +        JSAtom *atom = (JSAtom *) it->key;

I think it->key is already a JSAtom* without the cast.

@@ +659,5 @@
> +         it != end; ++it) {
> +        JSAtom *atom = (JSAtom *) it->key;
> +        if (!atom)
> +            continue;
> +        jsatomid index(it->value);

it->value is also a jsatomid, so can just use inline below.

@@ +673,5 @@
> +    typedef AtomIndexMap::WordMap WordMap;
> +    const WordMap &wm = indices->asMap();
> +    for (WordMap::Range r(wm.all()); !r.empty(); r.popFront()) {
> +        JSAtom *atom = (JSAtom *) r.front().key;
> +        jsatomid index(r.front().value);

Ditto.

::: js/src/jsatom.h
@@ -45,5 @@
>  #include <stddef.h>
>  #include "jsversion.h"
>  #include "jsapi.h"
>  #include "jsprvtd.h"
> -#include "jshash.h"

Bam!

::: js/src/jscntxt.cpp
@@ +90,5 @@
>  #include "jscntxtinlines.h"
>  #include "jscompartment.h"
>  #include "jsobjinlines.h"
>  
> +#include "parser/ParseMaps.h"

Mega nit: I believe the style is to put xyz/Abc.h includes after the js*.h and before the js*inlines.h.

@@ +327,5 @@
>  
>      JS_InitArenaPool(&cx->tempPool, "temp", TEMP_POOL_CHUNK_SIZE, sizeof(jsdouble));
>      JS_InitArenaPool(&cx->regExpPool, "regExp", TEMP_POOL_CHUNK_SIZE, sizeof(int));
>  
> +    cx->freeMapPool = OffTheBooks::new_<ParseMapPool>(cx);

Perhaps this could be done lazily (in the compiler init)? Also, 'purge' could just call 'delete cx->freeMapPool'; if the JSContext goes unused for a long time (which sounds quite possible, since there is currently a context per outer global window), then we save memory. If we do this, then it would also make sense to make freeMapPool a private member and have:

  ParseMapPool &freeMapPool() { return *freeMapPool_; }
  bool ensureFreeMapPool();

@@ +1599,5 @@
>  /*
>   * Release pool's arenas if the stackPool has existed for longer than the
>   * limit specified by gcEmptyArenaPoolLifespan.
>   */
> +static void

Intentionally un-inlined or profiling remnant?  Just asking.

::: js/src/jsemit.cpp
@@ +138,4 @@
>  {
> +    roLexdeps.clearMap();
> +    globalMap.clearMap();
> +    upvarIndices.clearMap();

It seems strange to me that all these have an 'init()' method, and it's not the thing we are calling in 'init()'.  Perhaps we could rename 'clearMap' to 'init' and rename 'init' to 'ensureMap'?  Plus, ensureMap would, as the name implies, handle the already-hasMap case, so you could simplify this "if (!x.hasMap() && !x.init(context()))" pattern that appears in several places to just "if (!x.ensureMap())".
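
The suggested helper, roughly (hypothetical sketch):

  bool ensureMap(JSContext *cx) {
      return hasMap() || init(cx);   // init = today's fallible map allocation
  }

  // so that call sites shrink from
  //   if (!x.hasMap() && !x.init(context()))
  //       return false;
  // to
  //   if (!x.ensureMap(context()))
  //       return false;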

@@ +153,5 @@
> +    if (globalMap.hasMap())
> +        globalMap.finish(cx);
> +
> +    if (upvarIndices.hasMap())
> +        upvarIndices.finish(cx);

We are in a destructor so my spider senses say that no explicit 'finish' should be necessary.  Now, AtomThingMapPtr needs to be union-able, so perhaps we could have a derived OwnedAtomThingMapPtr whose destructor calls finish().

@@ +2256,3 @@
>                      return JS_FALSE;
>  
> +                (void) index;

Is GCC really warning you here and other places where you do this?

::: js/src/jsemit.h
@@ +51,5 @@
>  #include "jsprvtd.h"
>  #include "jspubtd.h"
>  
> +#include "jsatominlines.h"
> +#include "parser/ParseMaps.h"

Ditto above header nit.

@@ +349,5 @@
>          JS_ASSERT(!inFunction());
>          scopeChain_ = scopeChain;
>      }
>  
> +    js::AtomDefnMapPtr lexdeps;       /* unresolved lexical name dependencies */

comment alignment

::: js/src/jsparse.cpp
@@ +958,2 @@
>      js::GlobalScope globalScope(cx, globalObj, &cg);
> +

Since you are touching this code anyway, could this decl be lowered below the 'if' and then the resulting conditional+if sequence simplified?

@@ +1611,5 @@
>      if (!dn)
>          return NULL;
>  
> +    if (!tc->lexdeps->add(p, atom, dn))
> +        return NULL;

Can anything weird happen between the lookupForAdd and this 'add()' ?  Also, would it make sense to hoist the 'add()' out of this function into its caller so that there was a visual pairing between lookupForAdd+add?  That way MakePlaceholder is just a simple node-creating function.  Light suggestion.

@@ +2140,5 @@
>          if (pn->pn_type == TOK_UPVARS) {
> +            AtomDefnMapPtr &upvars = pn->pn_names;
> +            JS_ASSERT(upvars->count() != 0);
> +
> +            for (AtomDefnRange r(upvars->all()); !r.empty(); r.popFront()) {

"r = upvars->all()" works and it a bit prettier (here and x5 below).

::: js/src/jsprvtd.h
@@ +72,5 @@
>  
>  /* Scalar typedefs. */
> +typedef uint8       jsbytecode;
> +typedef uint8       jssrcnote;
> +typedef uintptr_t   jsatomid;

Ideally, we'd have a nice little word-sized struct that gave jsatomid a cute little isX()/toX() discriminated-union interface.  Since it looks like you have to touch most places that use jsatomid currently, can you do this?  It looks like there are a lot of places that use jsatomid for no valid reason, and once you nix them, then there are only 24 uses in jsatom code so you could pull this out of jsprvtd.h and into jsatom.h.

@@ +179,5 @@
>  
> +template <typename K,
> +          typename V,
> +          size_t inlineElems>
> +class LinMap;

Still renaming this, right?

::: js/src/mfbt/LinMap.h
@@ +71,5 @@
> +
> +    InlineElem      lin[InlineElems];
> +    WordMap         map;
> +    size_t          linNext;
> +    size_t          linCount;

It seems like, for locality, we'd want 'lin' next to 'linNext' and 'linCount'.

@@ +80,5 @@
> +        JS_STATIC_ASSERT(sizeof(V) == sizeof(uintptr_t));
> +    }
> +
> +    bool usingMap() const {
> +        JS_STATIC_ASSERT(InlineElems + 1 > InlineElems);

I think you can safely assume that no one is using InlineElems = size_t(-1) :)

@@ +81,5 @@
> +    }
> +
> +    bool usingMap() const {
> +        JS_STATIC_ASSERT(InlineElems + 1 > InlineElems);
> +        return linNext > InlineElems && map.initialized();

Is it possible to get linNext > InlineElems and !map.initialized()?  Even if the map.init() in switchToMap fails, we won't yet have updated the state, so we'll still have linNext <= InlineElems.

@@ +99,5 @@
> +        for (InlineElem *it = lin, *end = lin + linNext; it != end; ++it) {
> +            if (NULL == it->key)
> +                continue;
> +            WordMapAddPtr p = map.lookupForAdd(it->key);
> +            if (!map.add(p, it->key, it->value))

map.putNew()

@@ +147,5 @@
> +
> +        typedef Ptr ******* ConvertibleToBool;
> +
> +        explicit Ptr(WordMapPtr p) : mapPtr(p), isLinPtr(false) {}
> +        explicit Ptr(InlineElem *ie) : linPtr(ie), isLinPtr(true) {}

Since you are leaving one word uninitialized, I think you should define operator== or make it private+undefined.

@@ +150,5 @@
> +        explicit Ptr(WordMapPtr p) : mapPtr(p), isLinPtr(false) {}
> +        explicit Ptr(InlineElem *ie) : linPtr(ie), isLinPtr(true) {}
> +
> +      public:
> +        Ptr() : linPtr(NULL), isLinPtr(true) {}

AddPtr's default constructor (below) leaves members uninitialized, so for consistency with it and HashTable::Ptr, I would do so here too.

@@ +159,5 @@
> +            return isLinPtr ? bool(linPtr) : mapPtr.found();
> +        }
> +
> +        operator ConvertibleToBool() const {
> +            return found() ? (ConvertibleToBool) 1 : (ConvertibleToBool) 0;

return ConvertibleToBool(found());

@@ +164,5 @@
> +        }
> +
> +        K &key() {
> +            JS_ASSERT(found());
> +            return *reinterpret_cast<K *>(isLinPtr ? &linPtr->key : &mapPtr->key);

I think this casting here and in value() isn't necessary; linPtr and mapPtr's types both reflect K and V.

@@ +182,5 @@
> +        InlineElem      *linPtr;
> +        InlineElem      *linAddPtr;
> +        bool            isLinPtr;
> +
> +        AddPtr(InlineElem *ptr, bool found_) : isLinPtr(true) {

Can you s/found_/found/?  _ is for member vars.  Also, you might consider having AddPtr's fields match the constructor args; seems simpler than having two InlineElem*.

@@ +204,5 @@
> +            return isLinPtr ? bool(linPtr) : mapAddPtr.found();
> +        }
> +
> +        operator ConvertibleToBool() const {
> +            return found() ? ConvertibleToBool(1) : ConvertibleToBool(0);

return ConvertibleToBool(found());

@@ +211,5 @@
> +        V &value() {
> +            JS_ASSERT(found());
> +            if (isLinPtr)
> +                return *((V *) &linPtr->value);
> +            return *((V *) &mapAddPtr->value);

Shouldn't need casts.

@@ +237,5 @@
> +        JS_ASSERT(isMap());
> +        return map;
> +    }
> +
> +    const InlineElem *asLin() const {

Reminder: uses of "Lin" in LinMap should change with the name.

@@ +257,5 @@
> +            if (it->key == key)
> +                return Ptr(it);
> +        }
> +
> +        return Ptr();

With Ptr() leaving undefined members, would need to change this to explicitly pass NULL.

@@ +303,5 @@
> +                return false;
> +
> +            WordMapAddPtr p = map.lookupForAdd(key);
> +            JS_ASSERT(!p);
> +            return map.add(p, key, value);

map.putNew().  Also, you might consider putting switchToMap+map.put in a separate JS_NEVER_INLINE path since this should be cold code compared to the inline/map paths through add().
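Something like (sketch; the helper name is made up):

  JS_NEVER_INLINE bool switchAndAdd(const K &key, const V &value) {
      if (!switchToMap())
          return false;
      return map.putNew(key, value);
  }

so the hot inline-array and map paths through add() stay small, and the grow-and-rehash work sits behind a call to cold code.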

@@ +319,5 @@
> +        }
> +        return add(p, key, value);
> +    }
> +
> +    void remove(Ptr &p) {

I think you can just take a "Ptr p".  Mutable arg refs are weird.

@@ +325,5 @@
> +        if (p.isLinPtr) {
> +            JS_ASSERT(linCount > 0);
> +            JS_ASSERT(p.linPtr->key != NULL);
> +            p.linPtr->key = NULL;
> +            --linCount;

This seems like a sensible strategy for removal.  Did you happen to measure whether there were often cases where, if we shifted elements down to fill removed slots, we would stay linear?  Or is that just a waste of everyone's time?

@@ +344,5 @@
> +
> +        /* The first bits are tagged when iterating over a lin range. */
> +        typedef AlignedStorage<sizeof(WordMapRange)> WordMapRangeBytes;
> +        union {
> +            uintptr_t               firstBits;

As previously discussed, inlining and struct decomposition should allow you to un-union-ify mapRange and (cur,end) and just have a simple bool tag.  Probably good for perf since it takes computation (untagging) off the dependent path and puts the extra load on a path to a predictable branch.
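I.e., roughly (field names assumed):

  class Range
  {
      WordMapRange    mapRange;   /* used only when !isInline */
      InlineElem      *cur;
      InlineElem      *end;
      bool            isInline;
  };

No pointer-bit untagging on front()/popFront(); the only cost is the extra (cold, predictable) branch on isInline.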

::: js/src/parser/ParseMaps-inl.h
@@ +137,5 @@
> +AtomThingMapPtr<Map>::finish(JSContext *cx)
> +{
> +    JS_ASSERT(map);
> +    cx->freeMapPool->release(map);
> +    clearMap();

I would have thought that 'finish()' would take care of the "if (!map)" case to make it simpler for the callers.  Sure it wastes a branch in a few of the cases, but I'm guessing it won't be measurable.

::: js/src/parser/ParseMaps.h
@@ +127,5 @@
> +
> +/*
> + * In an upvars list, defn->resolve() is the outermost definition the
> + * name may reference. If a with block or a function that calls eval encloses
> + * the use, the name may end up referring to something else at runtime.

This comment seems to have lost its parents.

@@ +130,5 @@
> + * name may reference. If a with block or a function that calls eval encloses
> + * the use, the name may end up referring to something else at runtime.
> + */
> +
> +/* Note: POD-type capable of inclusion in (JSParseNode) union. */

I think LLVM is going to bitch about this being a non-POD b/c technically a POD doesn't have private members.  Perhaps turn it into a 'struct', strip 'protected' and 'public', and rename 'map' to 'map_' to make offenders think twice.

@@ +261,5 @@
> +        map->clear();
> +    }
> +
> +    /* Return the definition at the head of the chain for |atom|. */
> +    inline JSDefinition *lookup(JSAtom *atom);

For symmetry, could this be lookupFirst?  (or lookupSingle or whatever makes sense)

@@ +264,5 @@
> +    /* Return the definition at the head of the chain for |atom|. */
> +    inline JSDefinition *lookup(JSAtom *atom);
> +
> +    /* Perform a lookup that can iterate over the definitions associated with |atom|. */
> +    inline void lookupMulti(JSAtom *atom, MultiDeclLookup *lookup);

Nice job using a type here to express this multi-map.  Instead of a custom MultiDeclLookup type, perhaps you could define a member Range type that models the Range concept (empty, popFront, front) we have been using in SpiderMonkey?
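Then call sites read like the other ranges in this patch (names assumed; 'decls' is just a stand-in):

  for (AtomDecls::Range r = decls.lookupMulti(atom); !r.empty(); r.popFront()) {
      JSDefinition *defn = r.front();
      /* ... */
  }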
Comment 53 Chris Leary [:cdleary] (not checking bugmail) 2011-06-21 18:32:31 PDT
Created attachment 540944 [details] [diff] [review]
Updated x2 per review comments.

I actually realized that the wider jsatomid type was increasing the size of JSScript, so I pinned some things explicitly to uint32. When this patch gets r+ I'll make the hashtable patch to hoist the entry types and eliminate turboClear.
Comment 54 Chris Leary [:cdleary] (not checking bugmail) 2011-06-23 15:12:04 PDT
Created attachment 541519 [details] [diff] [review]
WIP: remove turboClear.

Now I realize I also left some unnecessary static assertion constraints in InlineMap that I can move into the ParseMapPool (i.e. keys/values being the same size, POD-ness of the HashMapEntry).
Comment 55 Chris Leary [:cdleary] (not checking bugmail) 2011-06-23 15:19:36 PDT
Another before-commit fix: there are a few places where we instantiate a parser without a compiler, so bumping the active compilation count and ensuring the parseMapPool exists should be moved into parser initialization as well.
Comment 56 Luke Wagner [:luke] 2011-06-23 16:35:48 PDT
Comment on attachment 540944 [details] [diff] [review]
Updated x2 per review comments.

Review of attachment 540944 [details] [diff] [review]:
-----------------------------------------------------------------

Mm mm, bring on the type-y goodness; r+ with these nits.  Again, great work and thanks for putting up with all these comments :)

::: js/src/frontend/ParseMaps-inl.h
@@ +99,5 @@
> +{
> +    JS_ASSERT(map);
> +    AtomDOHPtr p = map->lookup(atom);
> +    if (!p) {
> +        return MultiDeclRange();

No {}

::: js/src/frontend/ParseMaps.cpp
@@ +64,5 @@
> +
> +    all.~RecyclableMaps();
> +    new (&all) RecyclableMaps();
> +    recyclable.~RecyclableMaps();
> +    new (&recyclable) RecyclableMaps();

This makes me think that a Vector::clearAndFree() is in order.  I know we've used Maybe<> for this before... but clearAndFree() is really what you want to say in this sort of "purge" situation, which has come up a few times now.
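With that helper (i.e. clear() plus releasing the heap storage), the purge above would just read:

  all.clearAndFree();
  recyclable.clearAndFree();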

@@ +114,5 @@
> +        js_ReportOutOfMemory(cx);
> +        return NULL;
> +    }
> +    new (p) AtomDeclNode(defn);
> +    return p;

return new (p) AtomDeclNode(defn);

@@ +123,5 @@
> +{
> +    AtomDOHAddPtr p = map->lookupForAdd(atom);
> +    AtomDeclNode *node = allocNode(defn);
> +    if (!node)
> +        return false;

This is a strange ordering; perhaps move the lookupForAdd to after the "if (!node)" test?

@@ +141,5 @@
> +    return true;
> +}
> +
> +AtomDeclNode *
> +AtomDecls::lastAsNode(DefnOrHeader &doh)

An in/out param wants pass-by-pointer to make it syntactically evident at the callsite.

@@ +164,5 @@
> +{
> +    AtomDOHAddPtr p = map->lookupForAdd(atom);
> +    AtomDeclNode *node = allocNode(defn);
> +    if (!node)
> +        return false;

Again, strange ordering; maybe I'm missing something subtle (which deserves a comment, in that case :).

::: js/src/frontend/ParseMaps.h
@@ +124,5 @@
> +        recycle((void *) map);
> +    }
> +}; /* ParseMapPool */
> +
> +/* Note: POD-type capable of inclusion in (JSParseNode) union. */

I would turn this comment into a sentence and also include explicit guidance to avoid this one and use an OwnedAtomThingMapPtr instead.

@@ +135,5 @@
> +
> +    bool ensureMap(JSContext *cx);
> +    void releaseMap(JSContext *cx);
> +
> +    void copyMap(const AtomThingMapPtr &other) { map_ = other.map_; }

It seems like default operator= would work fine instead of explicit copyMap().

@@ +174,5 @@
> +        AtomThingMapPtrT::releaseMap(cx);
> +    }
> +};
> +
> +typedef AtomThingMapPtr<AtomIndexMap> AtomIndexMapPtr;

I think this typedef should be moved up a bit to go under AtomThingMapPtr.

@@ +182,5 @@
> +/* Node structure for chaining in AtomDecls. */
> +struct AtomDeclNode
> +{
> +    JSDefinition    *defn;
> +    AtomDeclNode  *next;

weird alignment

@@ +190,5 @@
> +    {}
> +};
> +
> +/*
> + * Tagged union of a JSDefinition and a AtomDeclNode, for use in the

s/a Atom/an Atom/, s/in the/in/

@@ +207,5 @@
> +        u.bits = 0;
> +    }
> +
> +    explicit DefnOrHeader(JSDefinition *defn) {
> +        u.defn = defn;

How about following this with JS_ASSERT(!isHeader()) and following the other ctor overload with JS_ASSERT(isHeader());

@@ +284,5 @@
> +
> +    /* Add-or-update a known-unique definition for |atom|. */
> +    inline bool addUnique(JSAtom *atom, JSDefinition *defn);
> +    bool addShadow(JSAtom *, JSDefinition *);
> +    bool addHoist(JSAtom *, JSDefinition *);

House style (which I'm not a huge fan of) compels me to request names for these parameters.

@@ +289,5 @@
> +
> +    /* Updating the definition for an entry that is known to exist is infallible. */
> +    void update(JSAtom *atom, JSDefinition *defn) {
> +        JS_ASSERT(map);
> +        AtomDOHMap::AddPtr p = map->lookupForAdd(atom);

I think here you just want 'lookup' and use a 'Ptr'.

@@ +331,5 @@
> + * Lookup state tracker for those situations where the caller wants to traverse
> + * multiple definitions associated with a single atom. This occurs due to block
> + * scoping.
> + */
> +class MultiDeclRange

Viewing AtomDecls as a real container, could you s/MultiDeclRange/AtomDecls::Range/?  They already friend each other and there aren't too many places to change...

@@ +342,5 @@
> +    explicit MultiDeclRange(JSDefinition *defn) : node(NULL), defn(defn) {}
> +    explicit MultiDeclRange(AtomDeclNode *node) : node(node), defn(node->defn) {}
> +
> +  public:
> +    MultiDeclRange() : node(NULL), defn(NULL) {}

The general pattern has been to either not have a default constructor or (if it's needed for, say, diamond-control-flow initialization) have the default ctor be a no-op.  I don't yet see a need for a default ctor here.

@@ +344,5 @@
> +
> +  public:
> +    MultiDeclRange() : node(NULL), defn(NULL) {}
> +
> +    void popFront() {

JS_ASSERT(!empty());

@@ +359,5 @@
> +        return defn;
> +    }
> +
> +    bool empty() const {
> +        return !node && !defn;

I think this can just be:
  JS_ASSERT_IF(!defn, !node);
  return !!defn;

::: js/src/jsatom.cpp
@@ +653,5 @@
>  
>  JS_STATIC_ASSERT(TEMP_SIZE_START >= sizeof(JSHashTable));
>  
> +static void
> +js_InitAtomMap_Lin(JSContext *cx, JSAtomMap *map, AtomIndexMap *indices)

JS_NEVER_INLINE was removed, but might as well re-inline since both are short and not reused.

::: js/src/jsemit.cpp
@@ +2455,5 @@
>  {
> +    if (!globalMap.ensureMap(context()))
> +        return false;
> +    AtomIndexAddPtr p = globalMap->lookupForAdd(atom);
> +    if (p) {

if (AtomIndexAddPtr p = ...) {
 ...

::: js/src/mfbt/InlineMap.h
@@ +70,5 @@
> +    typedef typename WordMap::Range     WordMapRange;
> +
> +    InlineElem      lin[InlineElems];
> +    size_t          linNext;
> +    size_t          linCount;

They are next to each other now, but, if we assume that most InlineMaps are (1) inline and (2) small, then I think we would have better locality by putting linNext/linCount in front of 'lin'.  Also, 'lin' doesn't look great in identifiers; can you replace (L|l)in with (I|i)nline?

@@ +98,5 @@
> +        for (InlineElem *it = lin, *end = lin + linNext; it != end; ++it) {
> +            if (NULL == it->key)
> +                continue;
> +            if (!map.putNew(it->key, it->value))
> +                return false;

How about:

  if (it->key && !map.putNew(it->key, it->value))
    return false;

@@ +115,5 @@
> +
> +        return map.putNew(key, value);
> +    }
> +
> +    static const uintptr_t UNTAG_PTR_MASK = uintptr_t(-1) - 1;

this is now dead

@@ +364,5 @@
> +#endif
> +        }
> +
> +        bool isLinRange() const {
> +            return isLin;

How about one JS_ASSERT_IF(isInline, checkInlineRangeInvariants()) here and remove all the other ones (since they all seem to be dominated by a call to isInlineRange).
Comment 57 :Ms2ger (⌚ UTC+1/+2) 2011-06-24 10:49:10 PDT
Comment on attachment 540944 [details] [diff] [review]
Updated x2 per review comments.

Some random comments:

>--- /dev/null
>+++ b/js/src/frontend/ParseMaps-inl.h
>+ * The Initial Developer of the Original Code is
>+ * Mozilla Corporation.

the Mozilla Foundation.

Also elsewhere.

>--- /dev/null
>+++ b/js/src/frontend/ParseMaps.cpp

>+using namespace js;

Dunno what the expected style in js/ is, but I think namespace js { ... } is clearer.

>--- a/js/src/jsemit.cpp
>+++ b/js/src/jsemit.cpp

>+                jsatomid _;
>+                if (!cg->makeAtomIndex(atom, &_))

(A couple of times.) Does this want a one-argument overload?
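I.e. something like (hypothetical; not in the patch):

  bool makeAtomIndex(JSAtom *atom) {
      jsatomid unused;
      return makeAtomIndex(atom, &unused);
  }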
Comment 58 Chris Leary [:cdleary] (not checking bugmail) 2011-06-24 11:01:17 PDT
(In reply to comment #57)
> Dunno what the expected style in js/ is

I think that's a good reason to avoid drive-by nits. :-/
Comment 59 Chris Leary [:cdleary] (not checking bugmail) 2011-06-24 14:18:04 PDT
Created attachment 541793 [details] [diff] [review]
Anti-turbo-clear.

I'm gonna have me a turbodog!
Comment 60 Luke Wagner [:luke] 2011-06-24 15:21:01 PDT
Comment on attachment 541793 [details] [diff] [review]
Anti-turbo-clear.

r+ with *Entry types put back inline as discussed.

Emerging SM namespace style is "using namespace js" at head of .cpp after headers.
Comment 61 Chris Leary [:cdleary] (not checking bugmail) 2011-06-25 16:03:02 PDT
Looked good on try: http://hg.mozilla.org/tracemonkey/rev/3d646df22a4b
Comment 62 Chris Leary [:cdleary] (not checking bugmail) 2011-06-25 17:24:16 PDT
Followup to quell rumblings in the bowels of the GCC 4.3.3 optimizer: http://hg.mozilla.org/tracemonkey/rev/7208baa27f17
Comment 63 Chris Leary [:cdleary] (not checking bugmail) 2011-06-27 11:40:59 PDT
cdleary-bot mozilla-central merge info:
http://hg.mozilla.org/mozilla-central/rev/3d646df22a4b
http://hg.mozilla.org/mozilla-central/rev/7208baa27f17
Comment 64 Victor K. 2011-06-28 03:25:49 PDT
revision 3d646df22a4b caused a build bustage for Win64(MSVC 2010 SP1):

ParseMaps.cpp

void
ParseMapPool::checkInvariants()
{
    /*
     * Having all values be of the same size permits us to easily reuse the
     * allocated space for each of the map types.
     */
    JS_STATIC_ASSERT(sizeof(JSDefinition *) == sizeof(jsatomid));
    JS_STATIC_ASSERT(sizeof(JSDefinition *) == sizeof(DefnOrHeader));
    JS_STATIC_ASSERT(sizeof(AtomDefnMap::Entry) == sizeof(AtomIndexMap::Entry));
    JS_STATIC_ASSERT(sizeof(AtomDefnMap::Entry) == sizeof(AtomDOHMap::Entry));
    JS_STATIC_ASSERT(sizeof(AtomMapT::Entry) == sizeof(AtomDOHMap::Entry));
    /* Ensure that the HasTable::clear goes quickly via memset. */
    JS_STATIC_ASSERT(tl::IsPodType<AtomIndexMap::WordMap::Entry>::result); <===
    JS_STATIC_ASSERT(tl::IsPodType<AtomDOHMap::WordMap::Entry>::result);
    JS_STATIC_ASSERT(tl::IsPodType<AtomDefnMap::WordMap::Entry>::result);
}

p:/mozilla-build/src/js/src/frontend/ParseMaps.cpp(58) : error C2118: negative subscript

Probably lines 59 and 60 should be affected as well.
Comment 65 Chris Leary [:cdleary] (not checking bugmail) 2011-06-28 14:06:20 PDT
(In reply to comment #64)

Thanks for the heads up! I actually fixed that on the tracemonkey branch with http://hg.mozilla.org/tracemonkey/rev/cc588cd332a9 , but I should cherry-pick that change to mozilla-central as well.
Comment 66 Chris Leary [:cdleary] (not checking bugmail) 2011-06-28 14:12:11 PDT
Cherry picked: http://hg.mozilla.org/mozilla-central/rev/b9f09ba568b4
Comment 67 Victor K. 2011-06-28 23:29:07 PDT
Oh, thank you :)
Great work, btw ;)
Comment 68 Jim Blandy :jimb 2012-05-08 17:33:21 PDT
*** Bug 515441 has been marked as a duplicate of this bug. ***
