Closed Bug 684039 Opened 13 years ago Closed 13 years ago

Land jsarena rewrite for realsies

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla10

People

(Reporter: cdleary, Assigned: cdleary)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 10 obsolete files)

7.56 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
151.84 KB, patch
Details | Diff | Splinter Review
187.15 KB, image/png
Details
Attached patch WIP: arena rewrite. (obsolete) — Splinter Review
This patch rewrites all of JSArenaPool and has nice results on Valgrind-based parsemark icount differentials. (Soon to be attached.) Just need to check whether the analysis pool and temp pool are easily consolidated into the compartment and need to make it use the proper OffTheBooks accounting instead of js_malloc.
Attached file Valgrind diff results. (obsolete) —
Is this something that would makes sense to do for PLArenaPool too (as in, does it make sense for this to go in mfbt/)?
(In reply to Boris Zbarsky (:bz) from comment #2)
> Is this something that would makes sense to do for PLArenaPool too (as in,
> does it make sense for this to go in mfbt/)?

This relies on us clearing the unused parts of the pool on JS GC to avoid thrashing chunk allocs/frees, which was one of the major annoyances with the prior implementation. Would you have a mechanism to do similar from mfbt/ space?
Brian says that TI data persists until GC, when the analysis data and dependent mjit code get purged.

This means consolidating is not an option, because parsing wants to treat the arenas like a LIFO stack -- parsing could try to pop data from an analysis that it triggered, which would cause it to be clobbered.

I'll just move the tempPool to the compartment.
Chris, I don't think there's anything in mfbt to do that at the moment.  So let's not worry about generalizing this for now.
Brian also pointed out that the "large primary chunk" assumption may not be able to hold up if a lot of compartments spring up and parse a little bit in a short period of time -- you end up with ([large primary chunk size] x number of parsing compartments). I'll cut down the primary chunk size and see what happens -- maybe a large primary chunk size will have to wait for single-threaded JSRuntime. Unless we can stick it in JSThreadData? I'll have a chat with luke.
Can you summarize the changes and where the improvements come from?
Also, I was planning to re-land the parts of bug 675150 that fix the clownshoes over-allocations (but not the parts that greatly increase various arena sizes).  But maybe I shouldn't -- does your patch guarantee that all allocations done by the arena pool are powers-of-two?
(In reply to Nicholas Nethercote [:njn] from comment #8)
> But maybe I shouldn't -- does your patch guarantee that all
> allocations done by the arena pool are powers-of-two?

Yes. The header for the secondary chunks is subtracted off internally.

Testing the JSThreadData based temp pool now, should have a new patch soon.
(In reply to Nicholas Nethercote [:njn] from comment #7)
> Can you summarize the changes and where the improvements come from?

Avoiding thrashing seems to be the key. Callgrind shows that, with the same chunk sizing, my impl only makes 3,125 calls to malloc, whereas the prior impl made 14,377. The larger chunk sizing probably also emphasizes the speed of the primary chunk bump allocation path. I'll post a more complete description of the design with the next patch.
(In reply to Chris Leary [:cdleary] from comment #10)
> I'll post a more complete
> description of the design with the next patch.

Great.  I had a quick look but I haven't yet grokked the difference between primary and secondary chunks.
(In reply to Chris Leary [:cdleary] from comment #10)
> 
> Avoiding thrashing seems to be the key. Callgrind shows that, with the same
> chunk sizing, my impl only makes 3,125 calls to malloc, whereas the prior
> impl made 14,377.

With the same chunk sizing you make many fewer calls to malloc?  I'm puzzled!
I had some kind of strange stale build issue with the last Valgrind results, which makes sense, because that many frees/mallocs (mentioned in the above comment) shouldn't account for that much perf improvement.

Attached is the best perf improvement I can get using pure bump allocation and a huge single primary chunk. 1.15% icount improvement on gmail_combined in parsemark. New patch and design details to follow.
Attachment #557644 - Attachment is obsolete: true
Attached patch WIP: arena rewrite. (obsolete) — Splinter Review
I put a comment at the top of BumpArena that explains the lifecycle and reasons behind the |freeUnused| style deallocation policy. I'll probably rebase this on top of Nick's code/notePool changes.

Nick, let me know if this is unclear.
Attachment #557637 - Attachment is obsolete: true
Attachment #558011 - Flags: feedback?(nnethercote)
Comment on attachment 558011 [details] [diff] [review]
WIP: arena rewrite.

Review of attachment 558011 [details] [diff] [review]:
-----------------------------------------------------------------

I've only looked as jsarena.h so far.  The main thing so far is that I'm really surprised by the primary/secondary split;  it introduces a bunch of complexity and slowness and I can't see the benefits of it.  Some specifics about this:

- The primary/secondary split seems to be based on the assumption that the majority of arena allocations can be satisfied out of the first chunk.  My experience is that this is not true, especially for tempPool which is the hottest arena and can contain hundreds or thousands of chunks.  (Oh, I see you've increased the size of the tempPool's primary chunk to 4MB.  Well, that would make things fit in the primary chunk a lot more, but 4MB is excessive.)

- I tried to do some measurements, but the patch doesn't compile in the browser.  First, BumpArena::realloc clashes with the realloc macro from mozalloc.h.  Second, I'm getting compile errors because jsdebug.c (which is C) ends up including jsarena.h (which is C++) through this sequence:

                 from ../../dist/include/jsarena.h:43:0,
                 from ../../dist/include/jsopcode.h:49,
                 from ../../dist/include/jsdbgapi.h:48,
                 from /home/njn/moz/mi6/js/jsd/jsdebug.h:58,
                 from /home/njn/moz/mi6/js/jsd/jsd.h:81,
                 from /home/njn/moz/mi6/js/jsd/jsdebug.c:42:

- It puts a conditional check in the fast path.  It may well be highly predictable but no check would be better.

- Once we're out of primary mode, the allocation path looks like this:  alloc() calls allocSlow() which calls allocSecondary() which calls tryAlloc() which does a bump allocation that is almost identical to the bump allocation done by alloc() when in primary mode.  Even if inlining avoids a lot of the overhead there it still seems rather strange and unnecessary.

Does the primary/secondary split really help?  Maybe I'm missing something, but my gut is telling me it can be removed, leaving the code with a structure more like the existing jsarena.cpp code -- i.e. only one kind of chunk and no modes -- and simpler as a result.

BTW, here are my parsemark measurements for a 64-bit shell build, compiling them with compile(snarf(<file>)):

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|    16.395    16.359 (1.002x) |         0         0 (------) | 280slides-comb
|    67.108    67.223 (0.998x) |         0         0 (------) | ball-pool-comb
|    33.234    33.239 (------) |         0         0 (------) | dojo
|    65.634    65.527 (1.002x) |         0         0 (------) | effectgames-ga
|    14.703    14.672 (1.002x) |         0         0 (------) | ga
|   370.869   374.150 (0.991x) |         0         0 (------) | gmail-combined
|   103.846   104.483 (0.994x) |         0         0 (------) | gravity-combin
|    23.833    23.836 (------) |         0         0 (------) | jquery-min
|    44.015    43.657 (1.008x) |         0         0 (------) | jsgb-combined
|    64.602    64.435 (1.003x) |         0         0 (------) | mochikit
|   311.495   314.466 (0.991x) |         0         0 (------) | pipio-combined
|    13.016    12.939 (1.006x) |         0         0 (------) | processing-twi
|    54.694    54.670 (------) |         0         0 (------) | sunspider-comb
|    19.639    19.592 (1.002x) |         0         0 (------) | tetris-combine
|    81.252    81.117 (1.002x) |         0         0 (------) | twitter-combin
|    84.036    83.518 (1.006x) |         0         0 (------) | v8-combined
|     9.387     9.349 (1.004x) |         0         0 (------) | yui-min
|   501.731   507.744 (0.988x) |         0         0 (------) | zimbra-combine
-------
|  1879.498  1890.985 (0.994x) |         0         0 (------) | all

So it's slightly worse.  For a 32-bit shell build the regression is less, being 0.998x.

Good things about this patch:

- The modern C++ style is obviously vastly better than the horrible old-fashioned C in jsarena.cpp.

- I can believe that not deallocating chunks immediately upon release() avoids thrashing, though some measurements would be nice.

Other things:

- I think the speed of realloc isn't that important, once codePool/notePool are removed it's only used in jsopcode.cpp for decompiling/disassembling, and doesn't seem to be called that often.

::: js/src/jsarena.h
@@ +55,2 @@
>  {
> +    char *result = (char *) ((uintptr_t(orig) + (align - 1)) & -align);

Using unary minus on an unsigned type feels grotty;  how about |~(align - 1)| instead?

@@ +155,5 @@
> + * - When a release occurs in secondary chunk space, we turn the secondary
> + *   chunk containing the mark into |latest|. This means that...
> + * - Secondary chunks may exist beyond |latest|. Secondary chunks may also
> + *   exist when in |primaryMode|. This avoids thrashing malloc/frees for
> + *   secondary chunk space, which a problem for the historical |JSArenaPool|

Nit: "which *was* a problem"

@@ +156,5 @@
> + *   chunk containing the mark into |latest|. This means that...
> + * - Secondary chunks may exist beyond |latest|. Secondary chunks may also
> + *   exist when in |primaryMode|. This avoids thrashing malloc/frees for
> + *   secondary chunk space, which a problem for the historical |JSArenaPool|
> + *   implementation.

How do you know this was a problem?

@@ +158,5 @@
> + *   exist when in |primaryMode|. This avoids thrashing malloc/frees for
> + *   secondary chunk space, which a problem for the historical |JSArenaPool|
> + *   implementation.
> + * - Because we keep secondary chunks around (to avoid thrashing), the
> + *   BumpArena relies on the GC to call |freeUnused|.

Relies on the GC?  That's... surprising.  Looking at the rest of the patch, the only place where freeUnused() is called is in ThreadData::purge(), for tempPool.  What about the other arenas?  It feels like this comment is totally wrong or the design needs some work or something.

@@ +160,5 @@
> + *   implementation.
> + * - Because we keep secondary chunks around (to avoid thrashing), the
> + *   BumpArena relies on the GC to call |freeUnused|.
> + * - |freeUnused| frees down to the starting state, if possible. If there
> + *   are outstanding marks we consider all the space used. If there are no

The sentence containing "consider" is ambiguous, I don't understand it.  Please rephrase?

@@ +173,5 @@
> +        tl::FloorLog2<sAlign>::result == tl::CeilingLog2<sAlign>::result
> +    >::result _;
> +
> +    /* Primary (hot) chunk data. */
> +    uintN       primaryMode;

Surely this should be a bool?  And I like "is" prefixes on bools, i.e. |isPrimaryMode| would be a better name.

@@ +310,5 @@
> +
> +    /* Called when we're not in bump mode. */
> +    JS_NEVER_INLINE
> +    void *allocSlow(size_t n) {
> +        if (!primaryChunk) {

This case only occurs if the arena is empty, is that right?  If so, a JS_ASSERT(empty()) would be helpful here.

@@ +352,5 @@
> +        primaryChunk(NULL), primaryChunkSize(primaryChunkSize), markCount(0) {
> +        JS_ASSERT(RoundUpPow2(primaryChunkSize) == primaryChunkSize);
> +    }
> +
> +    void steal(BumpArena *other) {

This needs a comment, as do all other public methods that lack one.
Attachment #558011 - Flags: feedback?(nnethercote) → feedback-
(In reply to Nicholas Nethercote [:njn] from comment #15)
> (Oh, I see
> you've increased the size of the tempPool's primary chunk to 4MB.  Well,
> that would make things fit in the primary chunk a lot more, but 4MB is
> excessive.)

The idea was to make basically one arena in the thread-data temp pool (for the main thread) and satisfy *all* requests out of it except for the small percentage of very large scripts (like things that GWT makes). gmail_combined.js actually overflows 4MB, but it's not clear to me whether that's the result of artificially combining all the sources into one file.

I wish I could spare more time to look into it, but I have to move on, so I'm just going to revert back to the all-secondary-chunks approach.

> - I tried to do some measurements, but the patch doesn't compile in the
> browser.

Sorry about that, will fix and push to try before next review.

> - It puts a conditional check in the fast path.  It may well be highly
> predictable but no check would be better.

The basic issue is that you can't satisfy requests in non-LIFO order. But, now that you mention it, you *could* set |bump| to |limit| when you leave |primaryMode| in order to guarantee you don't allocate out of the primary chunk later on. Screws up your "used" count a bit, but for good cause!

> - Once we're out of primary mode, the allocation path looks like this: 
> alloc() calls allocSlow() which calls allocSecondary() which calls
> tryAlloc() which does a bump allocation that is almost identical to the bump
> allocation done by alloc() when in primary mode.  Even if inlining avoids a
> lot of the overhead there it still seems rather strange and unnecessary.

I like small methods. :-) |allocSlow| was shared for the "you've never entered primary mode" and the "you're not in primary mode anymore" cases where you still wanted to allocate some memory.

> Does the primary/secondary split really help?  Maybe I'm missing something,
> but my gut is telling me it can be removed, leaving the code with a
> structure more like the existing jsarena.cpp code -- i.e. only one kind of
> chunk and no modes -- and simpler as a result.

Done! I tried to construct a microbenchmark to point out a large difference and failed: I now realize the crux of the reason that the primary chunk matters so little -- we don't clear the tempPool across top level statements in |compileScript|, because we mix persistent and recycled parse node data structures together in the pool. That's a pretty big one to fix if we want to reduce arena allocation space -- maybe we can get one of the memshrinkers to look at fixing that? I think bug 651448 might get it some of the way there.

> BTW, here are my parsemark measurements for a 64-bit shell build, compiling
> them with compile(snarf(<file>)):

I got significantly different results. I'll post my new results in a more readable format -- if we still disagree, we should discuss our test-running methodologies to figure out where we differ.

> - I think the speed of realloc isn't that important, once codePool/notePool
> are removed it's only used in jsopcode.cpp for decompiling/disassembling,
> and doesn't seem to be called that often.

Yeah, I'm very happy about that! So long as we avoid memcpying all the time (and permit the simple grow case), I think we're good.

> Using unary minus on an unsigned type feels grotty;  how about |~(align -
> 1)| instead?

Sounds good.

> Nit: "which *was* a problem"

Good catch.

> > + *   exist when in |primaryMode|. This avoids thrashing malloc/frees for
> > + *   secondary chunk space, which a problem for the historical |JSArenaPool|
> > + *   implementation.
> 
> How do you know this was a problem?

Trail stemming from one of my earliest bugs: bug 558971 comment 24.

> Relies on the GC?  That's... surprising.  Looking at the rest of the patch,
> the only place where freeUnused() is called is in ThreadData::purge(), for
> tempPool.  What about the other arenas?  It feels like this comment is
> totally wrong or the design needs some work or something.

The other ones are RAII-destructed or explicitly call |freeAll|. The analysis pool already has its own purging mechanism in |JSCompartment::sweep|. I should extend the comment to include a description of those, I suppose.

> @@ +173,5 @@
> > +        tl::FloorLog2<sAlign>::result == tl::CeilingLog2<sAlign>::result
> > +    >::result _;
> > +
> > +    /* Primary (hot) chunk data. */
> > +    uintN       primaryMode;
> 
> Surely this should be a bool?  And I like "is" prefixes on bools, i.e.
> |isPrimaryMode| would be a better name.

I wanted to make sure it was word sized so I knew exactly the fast path that would be generated by the compiler. I thought we typically reserved |is*| for methods, since |is| is like the boolean action word, but I'm not sure. This is gone in the new patch anyway.

New patch to follow, need to get full browser build going.
Attached patch WIP: arena rewrite. (obsolete) — Splinter Review
Seems okay for full browser build, pushed to try.
Attachment #558011 - Attachment is obsolete: true
Attached file Valgrind diff. (obsolete) —
Looks like ~.5% icount increase on average. The parsemark execution time script doesn't show a slowdown.
Attachment #557992 - Attachment is obsolete: true
It's kind of weird to have BumpArena<8> named QuadArena. What's "quad" about it?

Would "WordArena" be better?
Oh, you're thinking of the Intel "Quadword" terminology. That stuff is horrible, especially since JS uses "word" to mean something pointer-sized. Arena64 would be much better.
(In reply to Jim Blandy :jimb from comment #21)
> Oh, you're thinking of the Intel "Quadword" terminology.

Haha yeah, quad aligned. 8B is also double or js::Value alignment. Arena64 sounds pretty reasonable. We'll make it something different, ByteArena and QuadArena just rolled off the tongue.
Attached patch Arena rewrite (obsolete) — Splinter Review
Looks good on try.
Attachment #558964 - Attachment is obsolete: true
Attachment #558972 - Attachment is obsolete: true
Attachment #561511 - Flags: review?(luke)
Comment on attachment 561511 [details] [diff] [review]
Arena rewrite

Review of attachment 561511 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsJSEnvironment.cpp
@@ +220,5 @@
>  
>  class nsAutoPoolRelease {
>  public:
> +  nsAutoPoolRelease(js::QuadArena *p, void *m) : mPool(p), mMark(p->mark()) {}
> +  ~nsAutoPoolRelease() { mPool->release(mMark); }

It's crazy that nsJSEnvironment is using js internal arenas.  These are very slow paths so the arena-ness isn't really winning them here.  Can you switch them to use nsAutoTArray (which has inline storage, like js::Vector) instead?  The surrounding code is gross, but the usage of the the arena itself is pretty simple.

@@ +2420,5 @@
>      argCount = 1; // the nsISupports which is not an array
>    }
>  
> +  void *mark = JS_THREAD_DATA(mContext)->tempPool.mark();
> +  jsval *argv = JS_THREAD_DATA(mContext)->tempPool.newArrayUnsafe<jsval>(argCount);

I made the same nit to dmandelin earlier: "unsafe"/"safe" makes the reader worried: what about this is unsafe?  What is the invariant or expectation being broken?  In his case, we switched "safe" for "nullable"; here, I would switch "Unsafe" for "Unintialized".

@@ +2425,2 @@
>    NS_ENSURE_TRUE(argv, NS_ERROR_OUT_OF_MEMORY);
> +  memset(argv, 0, argCount * sizeof(jsval));  /* initialize so GC-able */

MakeValueRangeGCSafe please.

::: js/src/frontend/ParseMaps.cpp
@@ +125,5 @@
>  
>  AtomDeclNode *
>  AtomDecls::allocNode(JSDefinition *defn)
>  {
> +    AtomDeclNode *p = JS_THREAD_DATA(cx)->tempPool.new_<AtomDeclNode>(defn);

How about adding helpers to the JSContext (analogous to JSContext::stack()) for getting cx->compartment->pool and JS_THREAD_DATA(cx)->tempPool?

::: js/src/jsanalyze.cpp
@@ +275,5 @@
>  ScriptAnalysis::analyzeBytecode(JSContext *cx)
>  {
>      JS_ASSERT(cx->compartment->activeAnalysis);
>      JS_ASSERT(!ranBytecode());
> +    QuadArena &pool = cx->compartment->pool;

Previously, "pool" was a "pool" of arenas, not of the allocated memory so "pool" was an internal concept.  "Arena" is also name-encumbered since we already have arenas in GC and the meaning is "fixed-size hunks o' memory".  It seems best, if this code is going to live on and be pretty, that remove all uses of "pool" and "arena" in this code.

So, what to use?  Clearly "stack" sounds nice, but we already have vm/Stack.h, js::StackSpace, etc, so it would be nice to avoid confusion there.  The well-known thing we are implementing here is obstacks, but that is confusing since "objects" don't go on this stack.  How about "lifo alloc", as in, js::LifoAlloc, cx->compartment->typeLifoAlloc, JS_THREAD_DATA(cx)->tempLifoAlloc, AutoLifoAlloc, jslifoalloc.h?

On the subject of jslifoalloc.h.  I wonder if we shouldn't start a new js/src/ds directory where we can put Vector.h, HashTable.h, and now LifoAlloc.h.

::: js/src/jsarena.h
@@ -41,5 @@
>  #define jsarena_h___
> -/*
> - * Lifetime-based fast allocation, inspired by much prior art, including
> - * "Fast Allocation and Deallocation of Memory Based on Object Lifetimes"
> - * David R. Hanson, Software -- Practice and Experience, Vol. 20(1).

Can we keep the comment?

@@ +58,5 @@
>  }
>  
> +/* Header for a chunk of memory wrangled by the BumpArena. */
> +template <size_t sAlign>
> +class BumpChunk

IIUC, everywhere uses either sAlign = 1 or 8, but the only sites using 1 are random decompiler/dump stuff.  Can you just monomorphize everything so that sAlign is always 8?  Then we can stop mentioning "Byte" and "Quad" at all and just have (8-byte aligned) "lifo alloc".

@@ +182,5 @@
> +     *
> +     * Side effect: if retval is non-null, |first| and |latest| are initialized
> +     * appropriately.
> +     */
> +    BumpChunk *getOrCreateChunk(size_t n) {

Once un-templatized, could you move this and the other cold allocation paths to a .cpp file?

@@ +387,5 @@
> +    /* Doesn't perform construction; useful for lazily-initialized POD types. */
> +    template <typename T>
> +    JS_ALWAYS_INLINE
> +    T *newStruct() {
> +        return static_cast<T *>(alloc(sizeof(T)));

Can this be newPod?

@@ +392,5 @@
> +    }
> +
> +    template <typename T>
> +    JS_ALWAYS_INLINE
> +    T *new_() {

Can all these be generated by JS_DELARE_NEW_METHODS?

::: js/src/jsinfer.cpp
@@ +3251,5 @@
>      unsigned defCount = GetDefCount(script, offset);
>      if (ExtendedDef(pc))
>          defCount++;
>  
> +    TypeSet *pushed = cx->compartment->pool.newArrayUnsafe<TypeSet>(defCount);

Similar "unsafe" point.

::: js/src/jsopcode.cpp
@@ +700,5 @@
>      base = sp->base;
> +    if (!base)
> +        base = static_cast<char *>(sp->pool->alloc(nb));
> +    else
> +        base = static_cast<char *>(sp->pool->realloc_(base, sp->size, nb));

Can you use newArrayUnitialized instead of cast?

::: js/src/jstl.h
@@ +172,5 @@
> +template <> struct IsPodType<js::analyze::LifetimeVariable> { static const bool result = true; };
> +template <> struct IsPodType<js::analyze::LoopAnalysis>     { static const bool result = true; };
> +template <> struct IsPodType<js::analyze::SlotValue>      { static const bool result = true; };
> +template <> struct IsPodType<js::analyze::SSAValue>         { static const bool result = true; };
> +template <> struct IsPodType<js::analyze::SSAUseChain>      { static const bool result = true; };

It seems like these specializations should go right after the definitions of these types, not in the general header.

::: mfbt/Util.h
@@ +182,5 @@
>      DebugOnly(const T&) {}
>      DebugOnly& operator=(const T&) { return *this; }
>      void operator++(int) {}
>      void operator--(int) {}
> +    bool operator<(const T&) { return false; }

Should this also be on the defined(DEBUG) path?
Attachment #561511 - Flags: review?(luke)
Looks okay on xpcshelltests so far, pushing to try now.
Attachment #561901 - Flags: review?(mrbkap)
Attachment #561901 - Flags: review?(mrbkap) → review+
Attached patch Arena rewrite (obsolete) — Splinter Review
I think this addresses everything from comment 25, except for the DebugOnly -- shouldn't that be okay by virtue of the:

    operator T&() { return value; }
    operator const T&() const { return value; }

If not, I'll fix before commit, but it seems to work fine in opt.
Attachment #561511 - Attachment is obsolete: true
Attachment #562185 - Flags: review?(luke)
Attached patch Arena rewrite (obsolete) — Splinter Review
Sorry, was the wrong version.
Attachment #562185 - Attachment is obsolete: true
Attachment #562185 - Flags: review?(luke)
Attachment #562186 - Flags: review?(luke)
I also filed bug 688891 to remove the use of "pool" in the Sprinter, so I didn't bother renaming those legacy uses.
Comment on attachment 562186 [details] [diff] [review]
Arena rewrite

Review of attachment 562186 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great; big improvement to the codebase.  About to r+, but I'd like to see what you think of the idea in the comment.

::: js/src/jsarena.h
@@ +58,5 @@
> +namespace detail {
> +
> +JS_ALWAYS_INLINE
> +char *
> +AlignPtr(void *orig, size_t align)

Can unparameterize and inline LIFO_ALLOC_ALIGN now.

More generally: if we have a single static alignment and are always allocating in units of that alignment, then we wouldn't need the AlignPtr at all.  Now, the interface takes an allocation size in bytes, so this would require rounding up the allocation size and we've gained nothing.  However, it seems like we often know the allocation size statically (as its the type arg to new_).  Perhaps you could have newArray and JS_DECLARE_NEW_METHODS both call a allocWithRoundedSize that skips the rounding?

@@ +199,5 @@
> +
> +    /* Steal allocated chunks from |other|. */
> +    void steal(LifoAlloc *other) {
> +        JS_ASSERT(!other->markCount);
> +        PodCopy((char *) this, (char *) other, sizeof(*this));

Once you strip the type, you might as well just use memcpy :)

::: js/src/jsapi.cpp
@@ +98,5 @@
>  #include "assembler/wtf/Platform.h"
>  
>  #include "vm/Stack-inl.h"
>  #include "vm/String-inl.h"
> +#include "ds/LifoAlloc.h"

This little paragraph is for -inl.h files.  I canonical order is, I believe (1) js*.h, (2) */*.h (3) js*inlines.h (4) */*-inl.h with newlines to separate.  (That means assembler/wtf/Platform.h is also misplaced.)

::: mfbt/Util.h
@@ +182,5 @@
>      DebugOnly(const T&) {}
>      DebugOnly& operator=(const T&) { return *this; }
>      void operator++(int) {}
>      void operator--(int) {}
> +    bool operator<(const T&) { return false; }

You explained why its valid to only have this in the !defined(DEBUG) branch, but it would still be pleasant, just for symmetry, to put operator< in the defined(DEBUG) branch (unless it causes an ambiguity?  I hope not...).
Comment on attachment 562186 [details] [diff] [review]
Arena rewrite

Review of attachment 562186 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsarena.h
@@ +222,5 @@
> +
> +        if (!getOrCreateChunk(n))
> +            return NULL;
> +
> +        JS_ASSERT(latest && latest->canAlloc(n));

Don't need to assert either conjunct, allocInfallible will take care of it.

@@ +232,5 @@
> +        void *mem = alloc(sizeof(T) * count);
> +        if (!mem)
> +            return NULL;
> +        JS_STATIC_ASSERT(tl::IsPodType<T>::result);
> +        return (T *) mem;

Shouldn't this be returning void*?  Otherwise, what's the point of having newArrayUninitialized?

@@ +251,5 @@
> +        return latest ? latest->mark() : NULL;
> +    }
> +
> +    void release(void *mark) {
> +        markCount--;

I don't see any calls to BumpChunk::delete_ on the release path...

Also, ds/LifoAlloc.h isn't included in the patch.

@@ +271,5 @@
> +            if (container->contains(mark))
> +                break;
> +            JS_ASSERT(container != latest);
> +            container = container->next();
> +        }

I would have assumed that links pointed the other direction.  For release(), one would assume that 'mark' is, in the common case, either in latest or latest->prev.  It seems like this, other than 'used', this is the only real use of 'next'; can we just switch it to 'prev'?

@@ +311,5 @@
> +
> +  public:
> +    explicit LifoAllocScope(LifoAlloc *lifoAlloc
> +                            JS_GUARD_OBJECT_NOTIFIER_PARAM)
> +      : lifoAlloc(lifoAlloc), mark(lifoAlloc->mark()), shouldRelease(true) {

I find it nice when the effect (the call to mark()) is in the ctor body, not hidden in the initializer list, but that's a matter of personal style.
Attachment #562186 - Flags: review?(luke)
Attached patch Arena rewrite. (obsolete) — Splinter Review
Include forgotten .cpp file.
Attachment #562186 - Attachment is obsolete: true
Attachment #562541 - Flags: review?(luke)
Comment on attachment 562541 [details] [diff] [review]
Arena rewrite.

Review of attachment 562541 [details] [diff] [review]:
-----------------------------------------------------------------

Again, nice work.

::: js/src/ds/LifoAlloc.cpp
@@ +90,5 @@
> +        while (latest->next()) {
> +            latest = latest->next();
> +            latest->resetBump(); /* This was an unused BumpChunk on the chain. */
> +            if (latest->canAlloc(n)) {
> +                //fprintf(stderr, "Found an existing secondary to satisfy size %u\n", unsigned(n));

rm fprintf.

@@ +108,5 @@
> +        return NULL;
> +    if (!first) {
> +        latest = first = newChunk;
> +    } else {
> +        JS_ASSERT(latest);

can haz && !latest->next() ?
Attachment #562541 - Flags: review?(luke) → review+
Attached patch Arena rewrite.Splinter Review
Carries luke's r+, needed a small build tweak to get the header exported to XPConnect.
Attachment #562541 - Attachment is obsolete: true
Blocks: 689362
Whiteboard: [inbound]
Grr, burned the tree. Not similar enough to my previously green try runs, apparently.
Whiteboard: [inbound]
Got it green on try again, pushed to inbound.
Whiteboard: [inbound]
This regressed MaxHeap in the Trace Malloc talos graph by 7.65% on x64 -- I just figured out how to run that locally, so I'm investigating now.
https://hg.mozilla.org/mozilla-central/rev/d709c25c0e1f
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
Comment on attachment 562563 [details] [diff] [review]
Arena rewrite.

>diff --git a/js/src/ds/LifoAlloc.cpp b/js/src/ds/LifoAlloc.cpp
>new file mode 100644
>--- /dev/null
>+++ b/js/src/ds/LifoAlloc.cpp
>@@ -0,0 +1,149 @@
>+#include "LifoAlloc.h"

Missing license boilerplate. Fix ASAP.

>diff --git a/js/src/jsarena.h b/js/src/ds/LifoAlloc.h
>rename from js/src/jsarena.h
>rename to js/src/ds/LifoAlloc.h
>--- a/js/src/jsarena.h
>+++ b/js/src/ds/LifoAlloc.h
>  * The Initial Developer of the Original Code is
>- * Netscape Communications Corporation.
>- * Portions created by the Initial Developer are Copyright (C) 1998
>- * the Initial Developer. All Rights Reserved.
>+ *   the Mozilla Corporation.

This is not true. The ID is the Mozilla *Foundation*. Also, removing "Portions created by..." is not allowed. Fix ASAP as well.

>--- a/js/src/jsapi.cpp
>+++ b/js/src/jsapi.cpp
>@@ -95,16 +94,17 @@
> #include "jsobjinlines.h"
> #include "jsscopeinlines.h"
> #include "jsregexpinlines.h"
> #include "jsscriptinlines.h"
> #include "assembler/wtf/Platform.h"
> 
> #include "vm/Stack-inl.h"
> #include "vm/String-inl.h"
>+#include "ds/LifoAlloc.h"

This should be before the inlines.
Target Milestone: --- → mozilla10
> Missing license boilerplate. Fix ASAP.

Please be as polite in bugzilla as you would be anywhere else. Terse imperatives don't come across as nice. Thanks for the comments, though.
So, as it turns out, the increase is expected behavior. The temp pool hits two spikes, one around .2s and another around .4s. We avoid thrashing for that second spike, but at the expense of holding the memory from .4s to about 1.5s, during which the type information is also growing, leading to a higher peak usage.

To summarize, we avoid thrashing alloc/frees in exchange for a potential short-term max heap size increase. Perhaps we should make a followup to get this to play into the GC heuristics (if they're sane at this point)? I'll at least CC and ping Bill.
Whoops, the upper figure has the title "temp" when it should be "allocs-ng.dat" -- it represents the new LifoAlloc trends on the same workload.
Depends on: 691132
Depends on: 691143
Depends on: 691898
Depends on: 690933
Depends on: 693928
I just found that js/src/yarr/pcre/pcre_exec.cpp has various remaining references to JSArenaPool.  Is that file dead?
(In reply to Nicholas Nethercote [:njn] from comment #46)
> I just found that js/src/yarr/pcre/pcre_exec.cpp has various remaining
> references to JSArenaPool.  Is that file dead?

I believe it's still being used by the OS X PowerPC + OpenBSD community maintained ports (and any others that don't have YARR JIT support), as of bug 684559.
Yes, we're using it still. I'm trying to decide whether to fix this to match, or give up and go back to the slower YARR interpreter.
(however, it looks pretty straightforward to convert)
Depends on: 699668
Depends on: 937132
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: