Closed Bug 943156 Opened 6 years ago Closed 3 years ago

Convert plarena.h macros to template functions

Categories

(Core :: MFBT, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: emk, Assigned: erahm)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 3 obsolete files)

As an added bonus, we could make more files unifiable.
I would like to work on this bug could someone elaborate about this?
Flags: needinfo?(VYV03354)
Flags: needinfo?(nfroyd)
I don't have context on what Kimura-san had in mind when this bug was filed.  My guess is that it's essentially a replacement for plarena.h, but template functions, e.g.

  template<typename T>
  T*
  Arena::Allocate()

are used instead of the macros that plarena.h provide.  If we could make whatever we wrote up conform to the AllocPolicy interface:

http://dxr.mozilla.org/mozilla-central/source/mfbt/AllocPolicy.h#24

that would be a nice bonus.  Kimura-san can confirm whether I am correct or not.
Flags: needinfo?(nfroyd)
Nathan is right.
Flags: needinfo?(VYV03354)
Attached patch Add a templated ArenaAllocator (obsolete) — Splinter Review
This adds an arena allocator that can be used as a drop-in replacement for
NSPR's PLArena. Example usage for defining an 8-byte aligned allocator that
uses a 4K arena size:

> mozilla::ArenaAllocator<4096,8> a;
> void* memory = a.Allocate(200);

I limited the scope to just what we need in Gecko, so no AllocPolicy and none
of the other cruft in plarena.

Nathan what do you think, is this the right approach?
Attachment #8851181 - Flags: feedback?(nfroyd)
Assignee: nobody → erahm
Status: NEW → ASSIGNED
Comment on attachment 8851181 [details] [diff] [review]
Add a templated ArenaAllocator

Review of attachment 8851181 [details] [diff] [review]:
-----------------------------------------------------------------

Even with the tests on the first cut, nice.

js/src/ds/LifoAlloc.h has code that may be worth looking at and/or stealing.  It's probably overkill for replacing plarena.h, though.

Not that much to argue about here, just a few comments below.

::: xpcom/ds/ArenaAllocator.h
@@ +85,5 @@
> +     */
> +    uintptr_t tail;
> +  };
> +
> +  struct Arena : public LinkedListElement<Arena> {

I know this makes for less code, but do areans really need to be in a doubly-linked list?  I think a singly-linked list can work, with an extra pointer to the last element of the list.

I think this should be called ArenaChunk or somesuch.

@@ +104,5 @@
> +
> +  /**
> +   * Allocates an arena of the given size and initializes its header.
> +   */
> +  Arena* AllocateArena(size_t aSize, size_t aOffset)

Is the `aOffset` argument really needed, since we pass in the same value everywhere?

@@ +110,5 @@
> +    MOZ_ASSERT(aOffset < aSize);
> +
> +    void* p = malloc(aSize);
> +    if (!p) {
> +      return nullptr;

Are we making things fallible by default to match the behavior of plarena.h?  Maybe we should consider making things infallible by default and providing a fallible interface if really needed...

@@ +141,5 @@
> +    // Check if any of the arenas have enough room.
> +    // TODO(ER): I think this loop can go. Basically there should never be
> +    // anything after current because we always insert new arenas at the end.
> +    // And if we don't need the loop we could probably just use an nsTArray
> +    // and get rid of the memory overhead of the linked list.

nsTArray would require expensive copying when allocating amounts larger than the arena size, though, assuming you insert those at the front.

@@ +152,5 @@
> +
> +      arena = arena->getNext();
> +    }
> +
> +    // We need to allocate a default sized arena.

MOZ_ASSERT(!arena);

::: xpcom/tests/gtest/TestArenaAllocator.cpp
@@ +34,5 @@
> +  void* x = a.Allocate(8);
> +  void* y = a.Allocate(8);
> +
> +  // Given 8-byte aligment, and an 8-byte aligned request we expect the allocations
> +  // each other exactly.

"we expect the allocations each other exactly"?  I think that's missing a part of speech or two.

@@ +48,5 @@
> +  // We expect 7 bytes of padding to have been added.
> +  EXPECT_EQ(uintptr_t(x) + 8 * 2, uintptr_t(y));
> +
> +  // Make sure aligment is correct for 1-8.
> +  for (int i = 1; i < 9; i++) {

Maybe this loop could be made to serve the 8 and 9 byte cases above as well, with some careful programming?

@@ +71,5 @@
> +
> +  for (int i = 0; i < 50; i++) {
> +    void* x = a.Allocate(i);
> +    // All the allocations should be aligned properly.
> +    EXPECT_EQ(uintptr_t(x) % 4, uintptr_t(0));

This test and the AllocateAlignment test seem to be testing the same thing, albeit with different alignments?

@@ +75,5 @@
> +    EXPECT_EQ(uintptr_t(x) % 4, uintptr_t(0));
> +  }
> +}
> +
> +TEST(ArenaAllocator, AllocateAcrossArenas)

I think a more descriptive name for this test is AllocateInDifferentChunks (?).
Attachment #8851181 - Flags: feedback?(nfroyd) → feedback+
Attached patch Add a templated ArenaAllocator (obsolete) — Splinter Review
I think this covers most of the comments. Signficant changes:

  - Using 'chunk' terminology over arena
  - Switched over to using a singly linked list
  - Added infallible Allocation function
  - Looked at LifoAlloc, lifted a few ideas:
    - Assert that the alignment is a power of two
    - Make sure the offset pointer is properly aligned
  - Tests are expanded / reworded
  - Added SizeOf function

Also see the follow up perf test for details on why some choices were made.
Attachment #8852206 - Flags: review?(nfroyd)
Attachment #8851181 - Attachment is obsolete: true
These tests were used to tune the performance of ArenaAllocator. This helped
show:
  - Singly linked list was faster than nsTArray
  - *Not* using an in-place new when allocating ArenaChunk was faster than
    in-place new with a constructor
  - Where MOZ_ALWAYS_INLINE had impact: Allocate(fallible) and AllocateInternal
  - Using NotNull had no perf impact
  - ArenaAllocator has about the same perf as PLArenaPool for the fallible case

Example numbers on my Ubuntu 16.04 desktop with an opt build using clang 4.0:

> ./mach gtest 'ArenaAllocator.*Perf'
> ...
> [ RUN      ] ArenaAllocator.Perf
> [       OK ] ArenaAllocator.Perf (258 ms)
> [ RUN      ] ArenaAllocator.PlPerf
> [       OK ] ArenaAllocator.PlPerf (252 ms)

Nathan, to test out locally you'd just need to qimport this patch and run
the command noted above.
Attachment #8852207 - Flags: feedback?(nfroyd)
Blocks: 1351732
This adds a |Clear| function to |ArenaAllocator|. This helps in switching over
usages of plarena that expect to be able to clear all allocations before the
owner of the arena is destroyed.

I can fold this into the first part if you'd like, thought it might be easier
to review separately.
Attachment #8852613 - Flags: review?(nfroyd)
Blocks: 1351801
Blocks: 1351804
Comment on attachment 8852207 [details] [diff] [review]
Do not land: Perf test used to tune ArenaAllocator

Review of attachment 8852207 [details] [diff] [review]:
-----------------------------------------------------------------

WFM.  Thanks for doing the performance measurements.
Attachment #8852207 - Flags: feedback?(nfroyd) → feedback+
Blocks: 1351904
Comment on attachment 8852206 [details] [diff] [review]
Add a templated ArenaAllocator

Review of attachment 8852206 [details] [diff] [review]:
-----------------------------------------------------------------

A few more questions.

::: xpcom/ds/ArenaAllocator.h
@@ +40,5 @@
> +class ArenaAllocator
> +{
> +public:
> +  ArenaAllocator()
> +    : mHead()

Invoking a no-arg constructor on an object that has an implicitly-defined constructor always sets off alarm bells, because I can never remember what the rules are as to what gets initialized and what doesn't.  Maybe you should go ahead and provide ArenaChunk with constructors--no-arg and multi-arg variants, so you can use placement new?  (I don't know why non-placement new would be any faster...)

@@ +41,5 @@
> +{
> +public:
> +  ArenaAllocator()
> +    : mHead()
> +    , mCurrent(WrapNotNull(&mHead))

This makes me a little nervous, but assuming we'll never allocate anything into mHead, I think this is OK.

@@ +146,5 @@
> +  };
> +
> +  static uintptr_t AlignPtr(void* p)
> +  {
> +    return (uintptr_t(p) + (Alignment - 1)) & (~Alignment + 1);

This function is identical to AlignedSize, except for the argument type.  Maybe we want to unify them--just one function taking a uintptr_t or something?

@@ +180,5 @@
> +
> +  MOZ_ALWAYS_INLINE void* InternalAllocate(size_t aSize)
> +  {
> +    static const size_t kMaxArenaCapacity =
> +        ArenaSize - AlignedSize(sizeof(ArenaChunk));

Should probably static_assert(ArenaSize > AlignedSize(sizeof(ArenaChunk))) somewhere, so this doesn't unexpectedly wrap on us.

@@ +187,5 @@
> +      return mCurrent->Allocate(aSize);
> +    }
> +
> +    ArenaChunk* arena = AllocateChunk(std::max(kMaxArenaCapacity, aSize));
> +    return arena ? arena->Allocate(aSize) : nullptr;

The allocation/chunk strategy in the previous patch seemed better: when we have to allocate an oversized area, we move it out of the way so we never try to allocate from it again.  With this strategy, we'll allocate an oversized area, hand out the memory, and then on the next allocation request, we'll try (unsuccessfully) to allocate from the full chunk, so we'll have to allocate another chunk.  Whereas before, we might have been able to satisfy the request from an already-existing chunk.  Why the change?

::: xpcom/tests/gtest/TestArenaAllocator.cpp
@@ +201,5 @@
> +  ArenaAllocator<kArenaSize> a;
> +
> +  // Excluding *this we expect an empty arena allocator to have no overhead.
> +  size_t sz = a.SizeOfExcludingThis(TestSizeOf);
> +  EXPECT_EQ(sz, size_t(0));

Nice.
Attachment #8852206 - Flags: review?(nfroyd) → feedback+
Comment on attachment 8852613 [details] [diff] [review]
Part 2: Add ability to clear an ArenaAllocator

Review of attachment 8852613 [details] [diff] [review]:
-----------------------------------------------------------------

WFM.  Your call on whether you want to fold this into part 1 or not.
Attachment #8852613 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #12)
> Comment on attachment 8852206 [details] [diff] [review]
> Add a templated ArenaAllocator
> 
> Review of attachment 8852206 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> A few more questions.
> 
> ::: xpcom/ds/ArenaAllocator.h
> @@ +40,5 @@
> > +class ArenaAllocator
> > +{
> > +public:
> > +  ArenaAllocator()
> > +    : mHead()
> 
> Invoking a no-arg constructor on an object that has an implicitly-defined
> constructor always sets off alarm bells, because I can never remember what
> the rules are as to what gets initialized and what doesn't.  Maybe you
> should go ahead and provide ArenaChunk with constructors--no-arg and
> multi-arg variants, so you can use placement new?  (I don't know why
> non-placement new would be any faster...)

Sure.  I'm not sure either why the perf was better, I re-ran with placement new and it seems fine. I think just inlining those two functions improved things more than any of the other small tweaks I initially tried.

> @@ +41,5 @@
> > +{
> > +public:
> > +  ArenaAllocator()
> > +    : mHead()
> > +    , mCurrent(WrapNotNull(&mHead))
> 
> This makes me a little nervous, but assuming we'll never allocate anything
> into mHead, I think this is OK.

I'm getting rid of WrapNotNull so that we can use constexpr, this will become nullptr.

> @@ +146,5 @@
> > +  };
> > +
> > +  static uintptr_t AlignPtr(void* p)
> > +  {
> > +    return (uintptr_t(p) + (Alignment - 1)) & (~Alignment + 1);
> 
> This function is identical to AlignedSize, except for the argument type. 
> Maybe we want to unify them--just one function taking a uintptr_t or
> something?

I'll switch to just AlignedSize (still taking size_t, it's used elsewhere).

> @@ +180,5 @@
> > +
> > +  MOZ_ALWAYS_INLINE void* InternalAllocate(size_t aSize)
> > +  {
> > +    static const size_t kMaxArenaCapacity =
> > +        ArenaSize - AlignedSize(sizeof(ArenaChunk));
> 
> Should probably static_assert(ArenaSize > AlignedSize(sizeof(ArenaChunk)))
> somewhere, so this doesn't unexpectedly wrap on us.

Sure. I'd be pretty surprised if someone declared a 16-byte arena!

> @@ +187,5 @@
> > +      return mCurrent->Allocate(aSize);
> > +    }
> > +
> > +    ArenaChunk* arena = AllocateChunk(std::max(kMaxArenaCapacity, aSize));
> > +    return arena ? arena->Allocate(aSize) : nullptr;
> 
> The allocation/chunk strategy in the previous patch seemed better: when we
> have to allocate an oversized area, we move it out of the way so we never
> try to allocate from it again.  With this strategy, we'll allocate an
> oversized area, hand out the memory, and then on the next allocation
> request, we'll try (unsuccessfully) to allocate from the full chunk, so
> we'll have to allocate another chunk.  Whereas before, we might have been
> able to satisfy the request from an already-existing chunk.  Why the change?

Again, a supposed perf improvement. I think I had the logic in |InternalAllocate| originally. Adding it to |AllocateChunk| doesn't seem to affect perf, I'll update.
MozReview-Commit-ID: 26l0xjVmnFW
This is a rollup of all the updates:
  - constexpr constructors
  - in-place new for ArenaChunk, appropriate ctors added
  - |mCurrent| logic is restored
  - static assert added for arena size smaller than header size
  - removed duplicated alignment logic

Plus a few platform specific test fixes:
  - Windows had a -Werror for assigning a literal value to a char
  - ASAN SizeOf test failed. ASAN is intercepting mallocs so we can't predict
    the exact alloction sizes. Instead we just loosen the restrictions so
    that we test if the size is growing as expected.
Attachment #8853120 - Flags: review?(nfroyd)
Attachment #8852206 - Attachment is obsolete: true
Comment on attachment 8852613 [details] [diff] [review]
Part 2: Add ability to clear an ArenaAllocator

Rolled this up into the parent patch.
Attachment #8852613 - Attachment is obsolete: true
Attachment #8853120 - Flags: review?(nfroyd) → review+
https://hg.mozilla.org/mozilla-central/rev/f09969186032
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.