Last Comment Bug 684039 - Land jsarena rewrite for realsies
: Land jsarena rewrite for realsies
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: x86_64 Linux
: -- normal with 1 vote (vote)
: mozilla10
Assigned To: Chris Leary [:cdleary] (not checking bugmail)
:
Mentors:
: 521328 564060 675150 (view as bug list)
Depends on: 690933 691132 691143 691898 693928 699668 937132
Blocks: 689368 689362
  Show dependency treegraph
 
Reported: 2011-09-01 14:12 PDT by Chris Leary [:cdleary] (not checking bugmail)
Modified: 2013-11-13 10:21 PST (History)
16 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
WIP: arena rewrite. (147.98 KB, patch)
2011-09-01 14:12 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Review
Valgrind diff results. (7.37 KB, text/plain)
2011-09-01 14:25 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details
Valgrind diff results w/ no secondary chunks. (7.32 KB, text/plain)
2011-09-02 17:00 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details
WIP: arena rewrite. (154.03 KB, patch)
2011-09-02 18:26 PDT, Chris Leary [:cdleary] (not checking bugmail)
n.nethercote: feedback-
Details | Diff | Review
WIP: arena rewrite. (142.84 KB, patch)
2011-09-07 14:39 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Review
Valgrind diff. (8.99 KB, text/plain)
2011-09-07 14:55 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details
Arena rewrite (143.84 KB, patch)
2011-09-21 10:31 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Review
No pool in nsJSEnvironment. (7.56 KB, patch)
2011-09-22 15:12 PDT, Chris Leary [:cdleary] (not checking bugmail)
mrbkap: review+
Details | Diff | Review
Arena rewrite (146.64 KB, patch)
2011-09-23 16:26 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Review
Arena rewrite (146.84 KB, patch)
2011-09-23 16:29 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Review
Arena rewrite. (150.99 KB, patch)
2011-09-26 14:45 PDT, Chris Leary [:cdleary] (not checking bugmail)
luke: review+
Details | Diff | Review
Arena rewrite. (151.84 KB, patch)
2011-09-26 15:51 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details | Diff | Review
Arena comparison for leak test. (187.15 KB, image/png)
2011-09-29 19:57 PDT, Chris Leary [:cdleary] (not checking bugmail)
no flags Details

Description Chris Leary [:cdleary] (not checking bugmail) 2011-09-01 14:12:02 PDT
Created attachment 557637 [details] [diff] [review]
WIP: arena rewrite.

This patch rewrites all of JSArenaPool and has nice results on Valgrind-based parsemark icount differentials. (Soon to be attached.) Just need to check whether the analysis pool and temp pool are easily consolidated into the compartment and need to make it use the proper OffTheBooks accounting instead of js_malloc.
Comment 1 Chris Leary [:cdleary] (not checking bugmail) 2011-09-01 14:25:08 PDT
Created attachment 557644 [details]
Valgrind diff results.
Comment 2 Boris Zbarsky [:bz] 2011-09-01 14:34:16 PDT
Is this something that would makes sense to do for PLArenaPool too (as in, does it make sense for this to go in mfbt/)?
Comment 3 Chris Leary [:cdleary] (not checking bugmail) 2011-09-01 14:39:05 PDT
(In reply to Boris Zbarsky (:bz) from comment #2)
> Is this something that would makes sense to do for PLArenaPool too (as in,
> does it make sense for this to go in mfbt/)?

This relies on us clearing the unused parts of the pool on JS GC to avoid thrashing chunk allocs/frees, which was one of the major annoyances with the prior implementation. Would you have a mechanism to do similar from mfbt/ space?
Comment 4 Chris Leary [:cdleary] (not checking bugmail) 2011-09-01 14:41:04 PDT
Brian says that TI data persists until GC, when the analysis data and dependent mjit code get purged.

This means consolidating is not an option, because parsing wants to treat the arenas like a LIFO stack -- parsing could try to pop data from an analysis that it triggered, which would cause it to be clobbered.

I'll just move the tempPool to the compartment.
Comment 5 Boris Zbarsky [:bz] 2011-09-01 14:50:53 PDT
Chris, I don't think there's anything in mfbt to do that at the moment.  So let's not worry about generalizing this for now.
Comment 6 Chris Leary [:cdleary] (not checking bugmail) 2011-09-01 15:01:11 PDT
Brian also pointed out that the "large primary chunk" assumption may not be able to hold up if a lot of compartments spring up and parse a little bit in a short period of time -- you end up with ([large primary chunk size] x number of parsing compartments). I'll cut down the primary chunk size and see what happens -- maybe a large primary chunk size will have to wait for single-threaded JSRuntime. Unless we can stick it in JSThreadData? I'll have a chat with luke.
Comment 7 Nicholas Nethercote [:njn] 2011-09-01 16:03:25 PDT
Can you summarize the changes and where the improvements come from?
Comment 8 Nicholas Nethercote [:njn] 2011-09-01 16:05:51 PDT
Also, I was planning to re-land the parts of bug 675150 that fix the clownshoes over-allocations (but not the parts that greatly increase various arena sizes).  But maybe I shouldn't -- does your patch guarantee that all allocations done by the arena pool are powers-of-two?
Comment 9 Chris Leary [:cdleary] (not checking bugmail) 2011-09-01 17:06:02 PDT
(In reply to Nicholas Nethercote [:njn] from comment #8)
> But maybe I shouldn't -- does your patch guarantee that all
> allocations done by the arena pool are powers-of-two?

Yes. The header for the secondary chunks is subtracted off internally.

Testing the JSThreadData based temp pool now, should have a new patch soon.
Comment 10 Chris Leary [:cdleary] (not checking bugmail) 2011-09-01 17:13:44 PDT
(In reply to Nicholas Nethercote [:njn] from comment #7)
> Can you summarize the changes and where the improvements come from?

Avoiding thrashing seems to be the key. Callgrind shows that, with the same chunk sizing, my impl only makes 3,125 calls to malloc, whereas the prior impl made 14,377. The larger chunk sizing probably also emphasizes the speed of the primary chunk bump allocation path. I'll post a more complete description of the design with the next patch.
Comment 11 Nicholas Nethercote [:njn] 2011-09-01 17:42:09 PDT
(In reply to Chris Leary [:cdleary] from comment #10)
> I'll post a more complete
> description of the design with the next patch.

Great.  I had a quick look but I haven't yet grokked the difference between primary and secondary chunks.
Comment 12 Nicholas Nethercote [:njn] 2011-09-01 17:53:56 PDT
(In reply to Chris Leary [:cdleary] from comment #10)
> 
> Avoiding thrashing seems to be the key. Callgrind shows that, with the same
> chunk sizing, my impl only makes 3,125 calls to malloc, whereas the prior
> impl made 14,377.

With the same chunk sizing you make many fewer calls to malloc?  I'm puzzled!
Comment 13 Chris Leary [:cdleary] (not checking bugmail) 2011-09-02 17:00:46 PDT
Created attachment 557992 [details]
Valgrind diff results w/ no secondary chunks.

I had some kind of strange stale build issue with the last Valgrind results, which makes sense, because that many frees/mallocs (mentioned in the above comment) shouldn't account for that much perf improvement.

Attached is the best perf improvement I can get using pure bump allocation and a huge single primary chunk. 1.15% icount improvement on gmail_combined in parsemark. New patch and design details to follow.
Comment 14 Chris Leary [:cdleary] (not checking bugmail) 2011-09-02 18:26:05 PDT
Created attachment 558011 [details] [diff] [review]
WIP: arena rewrite.

I put a comment at the top of BumpArena that explains the lifecycle and reasons behind the |freeUnused| style deallocation policy. I'll probably rebase this on top of Nick's code/notePool changes.

Nick, let me know if this is unclear.
Comment 15 Nicholas Nethercote [:njn] 2011-09-04 23:22:01 PDT
Comment on attachment 558011 [details] [diff] [review]
WIP: arena rewrite.

Review of attachment 558011 [details] [diff] [review]:
-----------------------------------------------------------------

I've only looked as jsarena.h so far.  The main thing so far is that I'm really surprised by the primary/secondary split;  it introduces a bunch of complexity and slowness and I can't see the benefits of it.  Some specifics about this:

- The primary/secondary split seems to be based on the assumption that the majority of arena allocations can be satisfied out of the first chunk.  My experience is that this is not true, especially for tempPool which is the hottest arena and can contain hundreds or thousands of chunks.  (Oh, I see you've increased the size of the tempPool's primary chunk to 4MB.  Well, that would make things fit in the primary chunk a lot more, but 4MB is excessive.)

- I tried to do some measurements, but the patch doesn't compile in the browser.  First, BumpArena::realloc clashes with the realloc macro from mozalloc.h.  Second, I'm getting compile errors because jsdebug.c (which is C) ends up including jsarena.h (which is C++) through this sequence:

                 from ../../dist/include/jsarena.h:43:0,
                 from ../../dist/include/jsopcode.h:49,
                 from ../../dist/include/jsdbgapi.h:48,
                 from /home/njn/moz/mi6/js/jsd/jsdebug.h:58,
                 from /home/njn/moz/mi6/js/jsd/jsd.h:81,
                 from /home/njn/moz/mi6/js/jsd/jsdebug.c:42:

- It puts a conditional check in the fast path.  It may well be highly predictable but no check would be better.

- Once we're out of primary mode, the allocation path looks like this:  alloc() calls allocSlow() which calls allocSecondary() which calls tryAlloc() which does a bump allocation that is almost identical to the bump allocation done by alloc() when in primary mode.  Even if inlining avoids a lot of the overhead there it still seems rather strange and unnecessary.

Does the primary/secondary split really help?  Maybe I'm missing something, but my gut is telling me it can be removed, leaving the code with a structure more like the existing jsarena.cpp code -- i.e. only one kind of chunk and no modes -- and simpler as a result.

BTW, here are my parsemark measurements for a 64-bit shell build, compiling them with compile(snarf(<file>)):

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|    16.395    16.359 (1.002x) |         0         0 (------) | 280slides-comb
|    67.108    67.223 (0.998x) |         0         0 (------) | ball-pool-comb
|    33.234    33.239 (------) |         0         0 (------) | dojo
|    65.634    65.527 (1.002x) |         0         0 (------) | effectgames-ga
|    14.703    14.672 (1.002x) |         0         0 (------) | ga
|   370.869   374.150 (0.991x) |         0         0 (------) | gmail-combined
|   103.846   104.483 (0.994x) |         0         0 (------) | gravity-combin
|    23.833    23.836 (------) |         0         0 (------) | jquery-min
|    44.015    43.657 (1.008x) |         0         0 (------) | jsgb-combined
|    64.602    64.435 (1.003x) |         0         0 (------) | mochikit
|   311.495   314.466 (0.991x) |         0         0 (------) | pipio-combined
|    13.016    12.939 (1.006x) |         0         0 (------) | processing-twi
|    54.694    54.670 (------) |         0         0 (------) | sunspider-comb
|    19.639    19.592 (1.002x) |         0         0 (------) | tetris-combine
|    81.252    81.117 (1.002x) |         0         0 (------) | twitter-combin
|    84.036    83.518 (1.006x) |         0         0 (------) | v8-combined
|     9.387     9.349 (1.004x) |         0         0 (------) | yui-min
|   501.731   507.744 (0.988x) |         0         0 (------) | zimbra-combine
-------
|  1879.498  1890.985 (0.994x) |         0         0 (------) | all

So it's slightly worse.  For a 32-bit shell build the regression is less, being 0.998x.

Good things about this patch:

- The modern C++ style is obviously vastly better than the horrible old-fashioned C in jsarena.cpp.

- I can believe that not deallocating chunks immediately upon release() avoids thrashing, though some measurements would be nice.

Other things:

- I think the speed of realloc isn't that important, once codePool/notePool are removed it's only used in jsopcode.cpp for decompiling/disassembling, and doesn't seem to be called that often.

::: js/src/jsarena.h
@@ +55,2 @@
>  {
> +    char *result = (char *) ((uintptr_t(orig) + (align - 1)) & -align);

Using unary minus on an unsigned type feels grotty;  how about |~(align - 1)| instead?

@@ +155,5 @@
> + * - When a release occurs in secondary chunk space, we turn the secondary
> + *   chunk containing the mark into |latest|. This means that...
> + * - Secondary chunks may exist beyond |latest|. Secondary chunks may also
> + *   exist when in |primaryMode|. This avoids thrashing malloc/frees for
> + *   secondary chunk space, which a problem for the historical |JSArenaPool|

Nit: "which *was* a problem"

@@ +156,5 @@
> + *   chunk containing the mark into |latest|. This means that...
> + * - Secondary chunks may exist beyond |latest|. Secondary chunks may also
> + *   exist when in |primaryMode|. This avoids thrashing malloc/frees for
> + *   secondary chunk space, which a problem for the historical |JSArenaPool|
> + *   implementation.

How do you know this was a problem?

@@ +158,5 @@
> + *   exist when in |primaryMode|. This avoids thrashing malloc/frees for
> + *   secondary chunk space, which a problem for the historical |JSArenaPool|
> + *   implementation.
> + * - Because we keep secondary chunks around (to avoid thrashing), the
> + *   BumpArena relies on the GC to call |freeUnused|.

Relies on the GC?  That's... surprising.  Looking at the rest of the patch, the only place where freeUnused() is called is in ThreadData::purge(), for tempPool.  What about the other arenas?  It feels like this comment is totally wrong or the design needs some work or something.

@@ +160,5 @@
> + *   implementation.
> + * - Because we keep secondary chunks around (to avoid thrashing), the
> + *   BumpArena relies on the GC to call |freeUnused|.
> + * - |freeUnused| frees down to the starting state, if possible. If there
> + *   are outstanding marks we consider all the space used. If there are no

The sentence containing "consider" is ambiguous, I don't understand it.  Please rephrase?

@@ +173,5 @@
> +        tl::FloorLog2<sAlign>::result == tl::CeilingLog2<sAlign>::result
> +    >::result _;
> +
> +    /* Primary (hot) chunk data. */
> +    uintN       primaryMode;

Surely this should be a bool?  And I like "is" prefixes on bools, i.e. |isPrimaryMode| would be a better name.

@@ +310,5 @@
> +
> +    /* Called when we're not in bump mode. */
> +    JS_NEVER_INLINE
> +    void *allocSlow(size_t n) {
> +        if (!primaryChunk) {

This case only occurs if the arena is empty, is that right?  If so, a JS_ASSERT(empty()) would be helpful here.

@@ +352,5 @@
> +        primaryChunk(NULL), primaryChunkSize(primaryChunkSize), markCount(0) {
> +        JS_ASSERT(RoundUpPow2(primaryChunkSize) == primaryChunkSize);
> +    }
> +
> +    void steal(BumpArena *other) {

This needs a comment, as do all other public methods that lack one.
Comment 16 Chris Leary [:cdleary] (not checking bugmail) 2011-09-06 15:43:53 PDT
(In reply to Nicholas Nethercote [:njn] from comment #15)
> (Oh, I see
> you've increased the size of the tempPool's primary chunk to 4MB.  Well,
> that would make things fit in the primary chunk a lot more, but 4MB is
> excessive.)

The idea was to make basically one arena in the thread-data temp pool (for the main thread) and satisfy *all* requests out of it except for the small percentage of very large scripts (like things that GWT makes). gmail_combined.js actually overflows 4MB, but it's not clear to me whether that's the result of artificially combining all the sources into one file.

I wish I could spare more time to look into it, but I have to move on, so I'm just going to revert back to the all-secondary-chunks approach.

> - I tried to do some measurements, but the patch doesn't compile in the
> browser.

Sorry about that, will fix and push to try before next review.

> - It puts a conditional check in the fast path.  It may well be highly
> predictable but no check would be better.

The basic issue is that you can't satisfy requests in non-LIFO order. But, now that you mention it, you *could* set |bump| to |limit| when you leave |primaryMode| in order to guarantee you don't allocate out of the primary chunk later on. Screws up your "used" count a bit, but for good cause!

> - Once we're out of primary mode, the allocation path looks like this: 
> alloc() calls allocSlow() which calls allocSecondary() which calls
> tryAlloc() which does a bump allocation that is almost identical to the bump
> allocation done by alloc() when in primary mode.  Even if inlining avoids a
> lot of the overhead there it still seems rather strange and unnecessary.

I like small methods. :-) |allocSlow| was shared for the "you've never entered primary mode" and the "you're not in primary mode anymore" cases where you still wanted to allocate some memory.

> Does the primary/secondary split really help?  Maybe I'm missing something,
> but my gut is telling me it can be removed, leaving the code with a
> structure more like the existing jsarena.cpp code -- i.e. only one kind of
> chunk and no modes -- and simpler as a result.

Done! I tried to construct a microbenchmark to point out a large difference and failed: I now realize the crux of the reason that the primary chunk matters so little -- we don't clear the tempPool across top level statements in |compileScript|, because we mix persistent and recycled parse node data structures together in the pool. That's a pretty big one to fix if we want to reduce arena allocation space -- maybe we can get one of the memshrinkers to look at fixing that? I think bug 651448 might get it some of the way there.

> BTW, here are my parsemark measurements for a 64-bit shell build, compiling
> them with compile(snarf(<file>)):

I got significantly different results. I'll post my new results in a more readable format -- if we still disagree, we should discuss our test-running methodologies to figure out where we differ.

> - I think the speed of realloc isn't that important, once codePool/notePool
> are removed it's only used in jsopcode.cpp for decompiling/disassembling,
> and doesn't seem to be called that often.

Yeah, I'm very happy about that! So long as we avoid memcpying all the time (and permit the simple grow case), I think we're good.

> Using unary minus on an unsigned type feels grotty;  how about |~(align -
> 1)| instead?

Sounds good.

> Nit: "which *was* a problem"

Good catch.

> > + *   exist when in |primaryMode|. This avoids thrashing malloc/frees for
> > + *   secondary chunk space, which a problem for the historical |JSArenaPool|
> > + *   implementation.
> 
> How do you know this was a problem?

Trail stemming from one of my earliest bugs: bug 558971 comment 24.

> Relies on the GC?  That's... surprising.  Looking at the rest of the patch,
> the only place where freeUnused() is called is in ThreadData::purge(), for
> tempPool.  What about the other arenas?  It feels like this comment is
> totally wrong or the design needs some work or something.

The other ones are RAII-destructed or explicitly call |freeAll|. The analysis pool already has its own purging mechanism in |JSCompartment::sweep|. I should extend the comment to include a description of those, I suppose.

> @@ +173,5 @@
> > +        tl::FloorLog2<sAlign>::result == tl::CeilingLog2<sAlign>::result
> > +    >::result _;
> > +
> > +    /* Primary (hot) chunk data. */
> > +    uintN       primaryMode;
> 
> Surely this should be a bool?  And I like "is" prefixes on bools, i.e.
> |isPrimaryMode| would be a better name.

I wanted to make sure it was word sized so I knew exactly the fast path that would be generated by the compiler. I thought we typically reserved |is*| for methods, since |is| is like the boolean action word, but I'm not sure. This is gone in the new patch anyway.

New patch to follow, need to get full browser build going.
Comment 17 Chris Leary [:cdleary] (not checking bugmail) 2011-09-07 14:39:08 PDT
Created attachment 558964 [details] [diff] [review]
WIP: arena rewrite.

Seems okay for full browser build, pushed to try.
Comment 18 Chris Leary [:cdleary] (not checking bugmail) 2011-09-07 14:55:55 PDT
Created attachment 558972 [details]
Valgrind diff.

Looks like ~.5% icount increase on average. The parsemark execution time script doesn't show a slowdown.
Comment 19 Chris Leary [:cdleary] (not checking bugmail) 2011-09-07 15:02:57 PDT
Try push: http://hg.mozilla.org/try/rev/7dee8fb85282
Comment 20 Jim Blandy :jimb 2011-09-07 17:22:25 PDT
It's kind of weird to have BumpArena<8> named QuadArena. What's "quad" about it?

Would "WordArena" be better?
Comment 21 Jim Blandy :jimb 2011-09-07 17:23:21 PDT
Oh, you're thinking of the Intel "Quadword" terminology. That stuff is horrible, especially since JS uses "word" to mean something pointer-sized. Arena64 would be much better.
Comment 22 Chris Leary [:cdleary] (not checking bugmail) 2011-09-07 17:29:45 PDT
(In reply to Jim Blandy :jimb from comment #21)
> Oh, you're thinking of the Intel "Quadword" terminology.

Haha yeah, quad aligned. 8B is also double or js::Value alignment. Arena64 sounds pretty reasonable. We'll make it something different, ByteArena and QuadArena just rolled off the tongue.
Comment 23 Chris Leary [:cdleary] (not checking bugmail) 2011-09-15 10:05:22 PDT
*** Bug 564060 has been marked as a duplicate of this bug. ***
Comment 24 Chris Leary [:cdleary] (not checking bugmail) 2011-09-21 10:31:32 PDT
Created attachment 561511 [details] [diff] [review]
Arena rewrite

Looks good on try.
Comment 25 Luke Wagner [:luke] 2011-09-22 10:36:39 PDT
Comment on attachment 561511 [details] [diff] [review]
Arena rewrite

Review of attachment 561511 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsJSEnvironment.cpp
@@ +220,5 @@
>  
>  class nsAutoPoolRelease {
>  public:
> +  nsAutoPoolRelease(js::QuadArena *p, void *m) : mPool(p), mMark(p->mark()) {}
> +  ~nsAutoPoolRelease() { mPool->release(mMark); }

It's crazy that nsJSEnvironment is using js internal arenas.  These are very slow paths so the arena-ness isn't really winning them here.  Can you switch them to use nsAutoTArray (which has inline storage, like js::Vector) instead?  The surrounding code is gross, but the usage of the the arena itself is pretty simple.

@@ +2420,5 @@
>      argCount = 1; // the nsISupports which is not an array
>    }
>  
> +  void *mark = JS_THREAD_DATA(mContext)->tempPool.mark();
> +  jsval *argv = JS_THREAD_DATA(mContext)->tempPool.newArrayUnsafe<jsval>(argCount);

I made the same nit to dmandelin earlier: "unsafe"/"safe" makes the reader worried: what about this is unsafe?  What is the invariant or expectation being broken?  In his case, we switched "safe" for "nullable"; here, I would switch "Unsafe" for "Unintialized".

@@ +2425,2 @@
>    NS_ENSURE_TRUE(argv, NS_ERROR_OUT_OF_MEMORY);
> +  memset(argv, 0, argCount * sizeof(jsval));  /* initialize so GC-able */

MakeValueRangeGCSafe please.

::: js/src/frontend/ParseMaps.cpp
@@ +125,5 @@
>  
>  AtomDeclNode *
>  AtomDecls::allocNode(JSDefinition *defn)
>  {
> +    AtomDeclNode *p = JS_THREAD_DATA(cx)->tempPool.new_<AtomDeclNode>(defn);

How about adding helpers to the JSContext (analogous to JSContext::stack()) for getting cx->compartment->pool and JS_THREAD_DATA(cx)->tempPool?

::: js/src/jsanalyze.cpp
@@ +275,5 @@
>  ScriptAnalysis::analyzeBytecode(JSContext *cx)
>  {
>      JS_ASSERT(cx->compartment->activeAnalysis);
>      JS_ASSERT(!ranBytecode());
> +    QuadArena &pool = cx->compartment->pool;

Previously, "pool" was a "pool" of arenas, not of the allocated memory so "pool" was an internal concept.  "Arena" is also name-encumbered since we already have arenas in GC and the meaning is "fixed-size hunks o' memory".  It seems best, if this code is going to live on and be pretty, that remove all uses of "pool" and "arena" in this code.

So, what to use?  Clearly "stack" sounds nice, but we already have vm/Stack.h, js::StackSpace, etc, so it would be nice to avoid confusion there.  The well-known thing we are implementing here is obstacks, but that is confusing since "objects" don't go on this stack.  How about "lifo alloc", as in, js::LifoAlloc, cx->compartment->typeLifoAlloc, JS_THREAD_DATA(cx)->tempLifoAlloc, AutoLifoAlloc, jslifoalloc.h?

On the subject of jslifoalloc.h.  I wonder if we shouldn't start a new js/src/ds directory where we can put Vector.h, HashTable.h, and now LifoAlloc.h.

::: js/src/jsarena.h
@@ -41,5 @@
>  #define jsarena_h___
> -/*
> - * Lifetime-based fast allocation, inspired by much prior art, including
> - * "Fast Allocation and Deallocation of Memory Based on Object Lifetimes"
> - * David R. Hanson, Software -- Practice and Experience, Vol. 20(1).

Can we keep the comment?

@@ +58,5 @@
>  }
>  
> +/* Header for a chunk of memory wrangled by the BumpArena. */
> +template <size_t sAlign>
> +class BumpChunk

IIUC, everywhere uses either sAlign = 1 or 8, but the only sites using 1 are random decompiler/dump stuff.  Can you just monomorphize everything so that sAlign is always 8?  Then we can stop mentioning "Byte" and "Quad" at all and just have (8-byte aligned) "lifo alloc".

@@ +182,5 @@
> +     *
> +     * Side effect: if retval is non-null, |first| and |latest| are initialized
> +     * appropriately.
> +     */
> +    BumpChunk *getOrCreateChunk(size_t n) {

Once un-templatized, could you move this and the other cold allocation paths to a .cpp file?

@@ +387,5 @@
> +    /* Doesn't perform construction; useful for lazily-initialized POD types. */
> +    template <typename T>
> +    JS_ALWAYS_INLINE
> +    T *newStruct() {
> +        return static_cast<T *>(alloc(sizeof(T)));

Can this be newPod?

@@ +392,5 @@
> +    }
> +
> +    template <typename T>
> +    JS_ALWAYS_INLINE
> +    T *new_() {

Can all these be generated by JS_DELARE_NEW_METHODS?

::: js/src/jsinfer.cpp
@@ +3251,5 @@
>      unsigned defCount = GetDefCount(script, offset);
>      if (ExtendedDef(pc))
>          defCount++;
>  
> +    TypeSet *pushed = cx->compartment->pool.newArrayUnsafe<TypeSet>(defCount);

Similar "unsafe" point.

::: js/src/jsopcode.cpp
@@ +700,5 @@
>      base = sp->base;
> +    if (!base)
> +        base = static_cast<char *>(sp->pool->alloc(nb));
> +    else
> +        base = static_cast<char *>(sp->pool->realloc_(base, sp->size, nb));

Can you use newArrayUnitialized instead of cast?

::: js/src/jstl.h
@@ +172,5 @@
> +template <> struct IsPodType<js::analyze::LifetimeVariable> { static const bool result = true; };
> +template <> struct IsPodType<js::analyze::LoopAnalysis>     { static const bool result = true; };
> +template <> struct IsPodType<js::analyze::SlotValue>      { static const bool result = true; };
> +template <> struct IsPodType<js::analyze::SSAValue>         { static const bool result = true; };
> +template <> struct IsPodType<js::analyze::SSAUseChain>      { static const bool result = true; };

It seems like these specializations should go right after the definitions of these types, not in the general header.

::: mfbt/Util.h
@@ +182,5 @@
>      DebugOnly(const T&) {}
>      DebugOnly& operator=(const T&) { return *this; }
>      void operator++(int) {}
>      void operator--(int) {}
> +    bool operator<(const T&) { return false; }

Should this also be on the defined(DEBUG) path?
Comment 26 Chris Leary [:cdleary] (not checking bugmail) 2011-09-22 15:12:22 PDT
Created attachment 561901 [details] [diff] [review]
No pool in nsJSEnvironment.

Looks okay on xpcshelltests so far, pushing to try now.
Comment 27 Chris Leary [:cdleary] (not checking bugmail) 2011-09-23 16:26:20 PDT
Created attachment 562185 [details] [diff] [review]
Arena rewrite

I think this addresses everything from comment 25, except for the DebugOnly -- shouldn't that be okay by virtue of the:

    operator T&() { return value; }
    operator const T&() const { return value; }

If not, I'll fix before commit, but it seems to work fine in opt.
Comment 28 Chris Leary [:cdleary] (not checking bugmail) 2011-09-23 16:29:44 PDT
Created attachment 562186 [details] [diff] [review]
Arena rewrite

Sorry, was the wrong version.
Comment 29 Chris Leary [:cdleary] (not checking bugmail) 2011-09-23 16:35:39 PDT
I also filed bug 688891 to remove the use of "pool" in the Sprinter, so I didn't bother renaming those legacy uses.
Comment 30 Luke Wagner [:luke] 2011-09-26 10:35:13 PDT
Comment on attachment 562186 [details] [diff] [review]
Arena rewrite

Review of attachment 562186 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great; big improvement to the codebase.  About to r+, but I'd like to see what you think of the idea in the comment.

::: js/src/jsarena.h
@@ +58,5 @@
> +namespace detail {
> +
> +JS_ALWAYS_INLINE
> +char *
> +AlignPtr(void *orig, size_t align)

Can unparameterize and inline LIFO_ALLOC_ALIGN now.

More generally: if we have a single static alignment and are always allocating in units of that alignment, then we wouldn't need the AlignPtr at all.  Now, the interface takes an allocation size in bytes, so this would require rounding up the allocation size and we've gained nothing.  However, it seems like we often know the allocation size statically (as its the type arg to new_).  Perhaps you could have newArray and JS_DECLARE_NEW_METHODS both call a allocWithRoundedSize that skips the rounding?

@@ +199,5 @@
> +
> +    /* Steal allocated chunks from |other|. */
> +    void steal(LifoAlloc *other) {
> +        JS_ASSERT(!other->markCount);
> +        PodCopy((char *) this, (char *) other, sizeof(*this));

Once you strip the type, you might as well just use memcpy :)

::: js/src/jsapi.cpp
@@ +98,5 @@
>  #include "assembler/wtf/Platform.h"
>  
>  #include "vm/Stack-inl.h"
>  #include "vm/String-inl.h"
> +#include "ds/LifoAlloc.h"

This little paragraph is for -inl.h files.  I canonical order is, I believe (1) js*.h, (2) */*.h (3) js*inlines.h (4) */*-inl.h with newlines to separate.  (That means assembler/wtf/Platform.h is also misplaced.)

::: mfbt/Util.h
@@ +182,5 @@
>      DebugOnly(const T&) {}
>      DebugOnly& operator=(const T&) { return *this; }
>      void operator++(int) {}
>      void operator--(int) {}
> +    bool operator<(const T&) { return false; }

You explained why its valid to only have this in the !defined(DEBUG) branch, but it would still be pleasant, just for symmetry, to put operator< in the defined(DEBUG) branch (unless it causes an ambiguity?  I hope not...).
Comment 31 Luke Wagner [:luke] 2011-09-26 13:59:35 PDT
Comment on attachment 562186 [details] [diff] [review]
Arena rewrite

Review of attachment 562186 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsarena.h
@@ +222,5 @@
> +
> +        if (!getOrCreateChunk(n))
> +            return NULL;
> +
> +        JS_ASSERT(latest && latest->canAlloc(n));

Don't need to assert either conjunct, allocInfallible will take care of it.

@@ +232,5 @@
> +        void *mem = alloc(sizeof(T) * count);
> +        if (!mem)
> +            return NULL;
> +        JS_STATIC_ASSERT(tl::IsPodType<T>::result);
> +        return (T *) mem;

Shouldn't this be returning void*?  Otherwise, what's the point of having newArrayUninitialized?

@@ +251,5 @@
> +        return latest ? latest->mark() : NULL;
> +    }
> +
> +    void release(void *mark) {
> +        markCount--;

I don't see any calls to BumpChunk::delete_ on the release path...

Also, ds/LifoAlloc.h isn't included in the patch.

@@ +271,5 @@
> +            if (container->contains(mark))
> +                break;
> +            JS_ASSERT(container != latest);
> +            container = container->next();
> +        }

I would have assumed that links pointed the other direction.  For release(), one would assume that 'mark' is, in the common case, either in latest or latest->prev.  It seems like this, other than 'used', this is the only real use of 'next'; can we just switch it to 'prev'?

@@ +311,5 @@
> +
> +  public:
> +    explicit LifoAllocScope(LifoAlloc *lifoAlloc
> +                            JS_GUARD_OBJECT_NOTIFIER_PARAM)
> +      : lifoAlloc(lifoAlloc), mark(lifoAlloc->mark()), shouldRelease(true) {

I find it nice when the effect (the call to mark()) is in the ctor body, not hidden in the initializer list, but that's a matter of personal style.
Comment 32 Chris Leary [:cdleary] (not checking bugmail) 2011-09-26 14:45:58 PDT
Created attachment 562541 [details] [diff] [review]
Arena rewrite.

Include forgotten .cpp file.
Comment 33 Luke Wagner [:luke] 2011-09-26 15:40:39 PDT
Comment on attachment 562541 [details] [diff] [review]
Arena rewrite.

Review of attachment 562541 [details] [diff] [review]:
-----------------------------------------------------------------

Again, nice work.

::: js/src/ds/LifoAlloc.cpp
@@ +90,5 @@
> +        while (latest->next()) {
> +            latest = latest->next();
> +            latest->resetBump(); /* This was an unused BumpChunk on the chain. */
> +            if (latest->canAlloc(n)) {
> +                //fprintf(stderr, "Found an existing secondary to satisfy size %u\n", unsigned(n));

rm fprintf.

@@ +108,5 @@
> +        return NULL;
> +    if (!first) {
> +        latest = first = newChunk;
> +    } else {
> +        JS_ASSERT(latest);

can haz && !latest->next() ?
Comment 34 Chris Leary [:cdleary] (not checking bugmail) 2011-09-26 15:51:33 PDT
Created attachment 562563 [details] [diff] [review]
Arena rewrite.

Carries luke's r+, needed a small build tweak to get the header exported to XPConnect.
Comment 35 Chris Leary [:cdleary] (not checking bugmail) 2011-09-26 16:51:37 PDT
Grr, burned the tree. Not similar enough to my previously green try runs, apparently.
Comment 36 Chris Leary [:cdleary] (not checking bugmail) 2011-09-28 02:05:56 PDT
Got it green on try again, pushed to inbound.
Comment 37 Chris Leary [:cdleary] (not checking bugmail) 2011-09-28 13:16:25 PDT
This regressed MaxHeap in the Trace Malloc talos graph by 7.65% on x64 -- I just figured out how to run that locally, so I'm investigating now.
Comment 38 Michael Wu [:mwu] 2011-09-29 01:24:43 PDT
https://hg.mozilla.org/mozilla-central/rev/d709c25c0e1f
Comment 39 Michael Wu [:mwu] 2011-09-29 01:25:25 PDT
Uh, and also https://hg.mozilla.org/mozilla-central/rev/4d10127fd106
Comment 40 :Ms2ger 2011-09-29 03:09:28 PDT
Comment on attachment 562563 [details] [diff] [review]
Arena rewrite.

>diff --git a/js/src/ds/LifoAlloc.cpp b/js/src/ds/LifoAlloc.cpp
>new file mode 100644
>--- /dev/null
>+++ b/js/src/ds/LifoAlloc.cpp
>@@ -0,0 +1,149 @@
>+#include "LifoAlloc.h"

Missing license boilerplate. Fix ASAP.

>diff --git a/js/src/jsarena.h b/js/src/ds/LifoAlloc.h
>rename from js/src/jsarena.h
>rename to js/src/ds/LifoAlloc.h
>--- a/js/src/jsarena.h
>+++ b/js/src/ds/LifoAlloc.h
>  * The Initial Developer of the Original Code is
>- * Netscape Communications Corporation.
>- * Portions created by the Initial Developer are Copyright (C) 1998
>- * the Initial Developer. All Rights Reserved.
>+ *   the Mozilla Corporation.

This is not true. The ID is the Mozilla *Foundation*. Also, removing "Portions created by..." is not allowed. Fix ASAP as well.

>--- a/js/src/jsapi.cpp
>+++ b/js/src/jsapi.cpp
>@@ -95,16 +94,17 @@
> #include "jsobjinlines.h"
> #include "jsscopeinlines.h"
> #include "jsregexpinlines.h"
> #include "jsscriptinlines.h"
> #include "assembler/wtf/Platform.h"
> 
> #include "vm/Stack-inl.h"
> #include "vm/String-inl.h"
>+#include "ds/LifoAlloc.h"

This should be before the inlines.
Comment 41 Jason Orendorff [:jorendorff] 2011-09-29 11:44:54 PDT
> Missing license boilerplate. Fix ASAP.

Please be as polite in bugzilla as you would be anywhere else. Terse imperatives don't come across as nice. Thanks for the comments, though.
Comment 42 Chris Leary [:cdleary] (not checking bugmail) 2011-09-29 11:58:47 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/b16c327f7878
Comment 43 Chris Leary [:cdleary] (not checking bugmail) 2011-09-29 19:57:19 PDT
Created attachment 563639 [details]
Arena comparison for leak test.

So, as it turns out, the increase is expected behavior. The temp pool hits two spikes, one around .2s and another around .4s. We avoid thrashing for that second spike, but at the expense of holding the memory from .4s to about 1.5s, during which the type information is also growing, leading to a higher peak usage.

To summarize, we avoid thrashing alloc/frees in exchange for a potential short-term max heap size increase. Perhaps we should make a followup to get this to play into the GC heuristics (if they're sane at this point)? I'll at least CC and ping Bill.
Comment 44 Chris Leary [:cdleary] (not checking bugmail) 2011-09-29 19:58:31 PDT
Whoops, the upper figure has the title "temp" when it should be "allocs-ng.dat" -- it represents the new LifoAlloc trends on the same workload.
Comment 45 Nicholas Nethercote [:njn] 2011-10-16 21:44:16 PDT
*** Bug 675150 has been marked as a duplicate of this bug. ***
Comment 46 Nicholas Nethercote [:njn] 2011-10-20 20:21:56 PDT
I just found that js/src/yarr/pcre/pcre_exec.cpp has various remaining references to JSArenaPool.  Is that file dead?
Comment 47 Ed Morley [:emorley] 2011-10-21 09:48:27 PDT
(In reply to Nicholas Nethercote [:njn] from comment #46)
> I just found that js/src/yarr/pcre/pcre_exec.cpp has various remaining
> references to JSArenaPool.  Is that file dead?

I believe it's still being used by the OS X PowerPC + OpenBSD community maintained ports (and any others that don't have YARR JIT support), as of bug 684559.
Comment 48 Cameron Kaiser [:spectre] 2011-10-21 20:06:07 PDT
Yes, we're using it still. I'm trying to decide whether to fix this to match, or give up and go back to the slower YARR interpreter.
Comment 49 Cameron Kaiser [:spectre] 2011-10-21 20:22:37 PDT
(however, it looks pretty straightforward to convert)
Comment 50 Chris Leary [:cdleary] (not checking bugmail) 2011-11-16 16:24:15 PST
*** Bug 521328 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.