Closed Bug 528338 Opened 15 years ago Closed 14 years ago

[Speed] Tune the paths into and through GC::Alloc, GC::Free, et al

Categories

(Tamarin Graveyard :: Garbage Collection (mmGC), defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Future

People

(Reporter: lhansen, Assigned: lhansen)

References

Details

Attachments

(18 obsolete files)

Turns out there's quite a bit that can be done to reduce the computation on the paths into and through GC::Alloc: tests can be coalesced or lifted off the common paths; methods can be in-lined; lookup tables can replace decision trees; compiler problems can be worked around; accounting calls can be merged.
Priority: -- → P2
Target Milestone: --- → flash10.1
Simple adjustments: make sure functions that should be inlined really are inlined, and specialize 'new' a couple of places so that AllocExtra is only called when there's anything extra to allocate - that path has one extra overflow check on it.
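A minimal sketch of that specialization (hypothetical names and stub bodies; only the shape of the split is taken from the comment above) — the call site tests the extra-byte count once, so the overflow check runs only when there really is something extra:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative stand-in for the allocator; counters replace real work.
struct GC {
    int allocs = 0, extraAllocs = 0;

    void* Alloc(size_t size) { ++allocs; return nullptr; }  // stub

    // Extra-bytes path, carrying the extra overflow check mentioned above.
    void* AllocExtra(size_t size, size_t extra) {
        assert(size <= SIZE_MAX - extra);  // the overflow check
        ++extraAllocs;
        return Alloc(size + extra);
    }
};

// Specialized 'new' helper: route through AllocExtra only when there
// is actually something extra to allocate.
inline void* New(GC& gc, size_t size, size_t extra) {
    return extra == 0 ? gc.Alloc(size) : gc.AllocExtra(size, extra);
}
```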
Attachment #412462 - Flags: review?(treilly)
Attachment #412462 - Flags: review?(treilly) → review+
Comment on attachment 412462 [details] [diff] [review]
Cleanup inlining & specialize new

redux changeset:   3123:e7bc2f5c27ca
Attachment #412462 - Attachment is obsolete: true
Attached patch Streamline GC::Alloc (obsolete) — Splinter Review
Comment on attachment 412831 [details] [diff] [review]
Streamline GC::Alloc

This is more or less a wash on the Tamarin benchmarks, even with the AllocDouble optimization disabled.  It has shown some speedups in the Flash Player (benchmark results to appear).  Inspection of the generated code shows it to be considerably shorter and less branchy, though, so I expect it will benefit some platforms more than my development machine.
Attached patch Streamline GC::Free (obsolete) — Splinter Review
Minor changes: move one test in under another, group the flags together in the GC object to allow caching to work better, and add a bunch of comments about remaining work and where the cost goes.
Summary: [Speed] Tune the paths into and through GC::Alloc → [Speed] Tune the paths into and through GC::Alloc, GC::Free, et al
+#ifdef MMGC_MEMORY_INFO
+		item = GetUserPointer(item);
+#endif

Is a verbose way of saying:

+		item = GetUserPointer(item);
(In reply to comment #6)
> +#ifdef MMGC_MEMORY_INFO
> +        item = GetUserPointer(item);
> +#endif
> 
> Is a verbose way of saying:
> 
> +        item = GetUserPointer(item);

I know, but I was in an über-pedantic mood and wanted to be able to see exactly what work would be performed.  Will clean it up tho'.
Attached patch Streamline GC::Free, v2 (obsolete) — Splinter Review
One more tweak: the C++ spec states that calling 'delete' on a NULL pointer has no effect.  Ergo the null pointer check at the beginning of Free is redundant, if Free is called from delete.  So introduce a new function FreeNotNull called from 'delete', then make Free() an inline wrapper that checks for NULL and calls FreeNotNull as appropriate.

Also fixed some warnings and a Debug-mode compile problem.
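The Free/FreeNotNull split might look like this (a sketch: names follow the comment, the bodies are placeholder counters, not MMgc's actual code):

```cpp
// Hypothetical sketch of the refactoring described above.
struct GC {
    int freeCalls = 0;  // stand-in for the real deallocation work

    // Out-of-line worker: may assume its argument is non-null, since
    // 'delete' on a null pointer is already required to be a no-op.
    void FreeNotNull(void* item) {
        ++freeCalls;    // real code would return 'item' to its block
    }

    // Inline wrapper: callers that cannot prove non-nullness use this,
    // while 'delete' goes straight to FreeNotNull.
    inline void Free(void* item) {
        if (item != nullptr)
            FreeNotNull(item);
    }
};
```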
Attachment #412844 - Attachment is obsolete: true
Cleaned up and tested more extensively.
Attachment #412831 - Attachment is obsolete: true
Attachment #413082 - Attachment is obsolete: true
Attachment #413590 - Flags: review?(treilly)
This is preliminary, but apart from hanging the sampler/Callback test, it works fine.

The gist is that we layer a quick list on top of the blocks in each instance of GCAlloc.  This removes most decision points from both Alloc and Free, effectively batches many expensive operations, and requires fewer expensive operations overall.  A consistent global state with empty quick lists is reestablished as needed.  See comment at the head of GCAlloc.cpp in the patch for more info.

This patch needs a couple of things still:

- proper performance numbers
- more testing
- a mechanism to prevent the quick lists from growing without bound, since
  that would increase memory consumption.  I have some ideas, not fully baked
  yet.
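The quick-list idea can be sketched as follows (illustrative names only; the real patch stores more state and has slow-path fallbacks): freed objects from all of an allocator's blocks are threaded onto one list, so the common Alloc and Free become a pop and a push.

```cpp
// Minimal sketch of a per-allocator "quick list": the next-pointer is
// stored in the free object's first word.
struct QuickListAllocator {
    void* head = nullptr;  // first free object, or null

    // Fast-path free: push onto the quick list.
    void Free(void* item) {
        *reinterpret_cast<void**>(item) = head;
        head = item;
    }

    // Fast-path alloc: pop the head; a null return sends the caller to
    // the slow path (sweep a block, ask the heap for a new one).
    void* Alloc() {
        void* item = head;
        if (item != nullptr)
            head = *reinterpret_cast<void**>(item);
        return item;
    }
};
```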
Together these two patches drop the time spent in MMgc by 2% on the AS2 GCBench benchmark.  (This is with MMGC_HOOKS turned on; I need to re-measure with that disabled, and if the difference is significant look into tuning the hook paths some more.  For example, there are two tests on the alloc path for whether hooks are enabled, but one ought to be enough.)
We can take the hook checks out if !MMGC_MEMORY_PROFILER and !DEBUG no?
This is the test currently in MMgc.h, should it be refined?

#if defined DEBUGGER || defined MMGC_MEMORY_PROFILER || defined MMGC_MEMORY_INFO
    #ifndef MMGC_HOOKS
        #define MMGC_HOOKS
    #endif
#endif
(In reply to comment #11)
> Together these two patches drop the time spent in MMgc by 2% on the AS2 GCBench
> benchmark.

2 percentage points out of 18 for MMgc = 11% speedup in MMgc for this benchmark.
(In reply to comment #13)
> This is the test currently in MMgc.h, should it be refined?
> 
> #if defined DEBUGGER || defined MMGC_MEMORY_PROFILER || defined
> MMGC_MEMORY_INFO
>     #ifndef MMGC_HOOKS
>         #define MMGC_HOOKS
>     #endif
> #endif

Looks right to me
Here are some more ideas.

- Merge hooks more (alluded to above) and control more precisely when the
  hook check is needed (ditto)

- Have a soft cache of blocks that have already been initialized for a given
  allocator.  The idea is that once an allocator frees a block it should keep
  it but notify GCHeap that it has it - the block is in a consistent state
  and re-initialization should be particularly cheap.  If GCHeap needs a block
  it can take it from somebody's soft cache instead of allocating from the
  heap.  A little complication creeps in when GCHeap needs more than one
  consecutive block: we'd like to be able to perform coalescing of free blocks
  across the allocators' free lists.  Anyway, I mention this because 
  CreateChunk is showing up in the profile and it would be interesting to see
  if we could make it go away again.
Here's yet another idea:

Observe that GC::Alloc performs a double dispatch: it dispatches on size and on the flag bits.  In most cases both are known statically, because the calls to Alloc come mostly from the 'new' operator, which inserts those flags and knows the size.  Ergo the 'new' operator may be able to specialize the call.

Specializing by flags is dead easy, we just provide allocator methods for the flag combinations we care about.  Specializing by size is easy once we've specialized by flag, because then we already know which allocator pool to choose from - it's just a simple table lookup.  The code is small and can be inlined everywhere.

A quick test shows that this works: String::createDynamic invokes 'new (gc) String(...)', this is open-coded by the compiler as a direct call to GCAlloc::Alloc, taking GC::Alloc entirely out of the loop (modulo running of hooks and so on, which we'd have to figure out how to deal with).
This is not something finished, just a general idea of what it would look like.  We still need to figure out how to run hooks, and trigger the collector somehow.  Both add to the inline path.  In the existing case of AllocDouble we don't specialize if MMGC_HOOKS is enabled; this is a possible solution but not ideal.
Combines the first streamlining patch with an elaboration of the specialized allocation patch; this one also specializes AllocExtra by means of a fast-path overflow check.

(This probably makes the GCAlloc::Alloc/Free patch no longer apply cleanly.)
Attachment #413590 - Attachment is obsolete: true
Attachment #414037 - Attachment is obsolete: true
Attachment #414319 - Flags: review?(treilly)
Attachment #413590 - Flags: review?(treilly)
Like the previous patch, but fixes a bug in debug accounting.
Attachment #414319 - Attachment is obsolete: true
Attachment #414520 - Flags: review?(treilly)
Attachment #414319 - Flags: review?(treilly)
This is clean and has all facilities.  Again, see major block comment in GCAlloc.cpp for an explanation of how the quick lists work.
Attachment #413591 - Attachment is obsolete: true
Attachment #414708 - Flags: review?(treilly)
Attached patch Bug fixes for the two patches (obsolete) — Splinter Review
Some bugs I found while mopping up, these fixes apply to both patches.  I can fold them in if you think it's unworkable to review multiple patches like this.

- Inline specialized allocators did not always go to the right allocator pool
- Inline specialized allocators did not compute size classes correctly
- DRC pinning can modify free objects, which messes up the storage of the
  index of an object within its block in the second word of the free object.
  Worked around this by documenting the issue and masking the bit on load.
- Removed one redundant division in GCAlloc::CreateChunk.  Fairly strange that
  it was not constant-folded but it showed up hot in a profiler run (could be
  profiler noise).  Anyway we don't need it.
Attachment #414741 - Flags: review?(treilly)
Comment on attachment 414520 [details] [diff] [review]
Streamline GC::Alloc and GC::Free, and specialize - v2

AllocPtrZero/AllocPtrZeroFinalized/AllocRCObject all use containsPointersRCAllocs, that's not right is it?
Nevermind, I see patch 3 fixes it.
Attachment #414520 - Flags: review?(treilly) → review+
Attachment #414708 - Flags: review?(treilly) → review+
Attachment #414741 - Flags: review?(treilly) → review+
Will integrate into TR after Beta 2 branch / integration.
Whiteboard: Has patch
Edwin notes elsewhere about how the JIT code is distributed (this is TR): 

"With an AS3-symbol-instrumented shark run, I get 6.5% self time in Point(), 5.9% in allocate(), 1.3% in Object(), negligible everywhere else.  If I flatten, it all attributes to allocate(), except for a few stray samples attributed to Point but without allocate on the stack.  (i blame shark for fuzzy noise like that)"

Comparatively that's a lot of time in allocate(), but if it has to call finddef_cache, then call newInstance, then call the constructor, I can see it,
depending on how well those calls are done.

Ed again: "[Calls] are not superb, so the relative time in Point vs Allocate doesn't surprise me.  Improvements to finddef, the as3 jit ABI, and general jit code quality should uniformly help."
Comment on attachment 414520 [details] [diff] [review]
Streamline GC::Alloc and GC::Free, and specialize - v2

redux changesets 3245:206804842ea6 through 3248:8f8a3b392128
Attachment #414520 - Attachment is obsolete: true
Comment on attachment 414708 [details] [diff] [review]
Streamline GCAlloc::Alloc and GCAlloc::Free, v2

redux changesets 3245:206804842ea6 through 3248:8f8a3b392128
Attachment #414708 - Attachment is obsolete: true
Comment on attachment 414741 [details] [diff] [review]
Bug fixes for the two patches

redux changesets 3245:206804842ea6 through 3248:8f8a3b392128
Attachment #414741 - Attachment is obsolete: true
Whiteboard: Has patch
I have one more fix coming, an optimization to the GC::FreeNotNull path.  Details currently available in attachment #415359 [details] [diff] [review] and in comments on bug #531250.
Patches were backed out due to regression, will re-land shortly.
Whiteboard: Has patch
Certainly one bug in the change is that the patch assumes that every object is always two words on 64-bit systems.  That turns out not to be true, of course: short non-pointer-containing objects (strings) are still allocated in 8 bytes, one word.  Thus the trick of storing the object index in the second word of objects on the free list only works on 32-bit systems.

I found another bug which may or may not have any influence on the result; RCObject::Stick is pretty much completely wrong.  It should 'or' in the sticky bit, not set composite to sticky, and if the object is in the ZCT it needs to be removed from the ZCT.  I'll file a separate bug for that.
Yet another bug shows up in debug mode: the collector may now be triggered while no userptr to the object is on the stack or in a register, and the newly allocated object may be reaped as a consequence.
How, is this a GetUserPointer/GetRealPointer issue?
(In reply to comment #34)
> How, is this a GetUserPointer/GetRealPointer issue?

In the patch, SignalAllocWork has moved into GCAlloc::Alloc.  However the 'item' that's returned from GCAlloc::Alloc is a RealPtr and a UserPtr may not exist at that point (it hasn't been computed yet - the caller computes it).  Since SignalAllocWork can trigger a collection, only the RealPtr may be on the stack or in a register when the GC runs.  But the GC does not work on RealPtrs, it needs a UserPtr.  Ergo the GC will reclaim the newly allocated 'item' - in debug builds.
Depends on: 533957, 503304
I consider this reviewed already; it's logged here because it integrates the bugfixes into the patch and also has a refinement necessary to avoid regressing the sampling code (see bug #533954).
Depends on: 533954
No longer depends on: 533957
Tested, passes the buildbot, rebased to 3306, ready to land once the tree opens.
Attachment #416913 - Attachment is obsolete: true
This comes from bug #531250, applies on top of attachment #417102 [details] [diff] [review], and is relative to a work tree I have - not clean yet, but passes regression testing on my system.
Comment on attachment 417102 [details] [diff] [review]
Streamline GC::Alloc and GC::Free, and specialize - v4

redux changeset:   3320:05f8ad8e3e2c
Attachment #417102 - Attachment is obsolete: true
Make GCAlloc::Free and GCLargeAlloc::Free virtual functions; subclass the allocators from a common abstract base; move logic into the two Free functions; make GC::FreeNotNull a simple inlinable wrapper.  We get speedup from the moved complexity - the old FreeNotNull was not being compiled well by GCC - and from the inlining.
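In outline (class and method names mirror the comment; the bodies are placeholder counters): FreeNotNull no longer decides small-vs-large itself, it makes one virtual call through the common base, which is small enough to inline.

```cpp
// Sketch of the virtual-dispatch refactoring described above.
struct GCAllocBase {
    virtual ~GCAllocBase() {}
    virtual void Free(void* item) = 0;  // all free logic lives here
};

struct GCAlloc : GCAllocBase {
    int freed = 0;
    void Free(void*) override { ++freed; }  // small-object path
};

struct GCLargeAlloc : GCAllocBase {
    int freed = 0;
    void Free(void*) override { ++freed; }  // large-object path
};

// Inlinable wrapper: look up the owning allocator (here passed in;
// really derived from the block header) and dispatch once.
inline void FreeNotNull(GCAllocBase* owner, void* item) {
    owner->Free(item);
}
```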
Attachment #417116 - Attachment is obsolete: true
Attachment #417515 - Flags: review?(treilly)
This has been scaled back a little from what we had before, because the method of communicating the object index from CreateChunk/Free/Sweep to Alloc does not work well on 64-bit systems (objects can be one word long) and more importantly, interacts poorly with the invariants on the 'composite' field in RCObject.

(This patch includes some very light cleanup of RCObject and documentation of those invariants, all in the name of making things clearer.  I may or may not factor that out before this patch lands.)
Comment on attachment 417515 [details] [diff] [review]
Speed up GC::Free by rearranging and refactoring (clean)

What's with the padding in GCBlockHeader?
(In reply to comment #42)
> (From update of attachment 417515 [details] [diff] [review])
> What's with the padding in GCBlockHeader?

You mean in GCLargeAlloc::LargeBlock?  I meant to document that.  It's necessary to get 8-byte alignment for large-object allocation.  GCBlockHeader increased from three entries to four, and with only one in LargeBlock we need to pad.
Comment on attachment 417515 [details] [diff] [review]
Speed up GC::Free by rearranging and refactoring (clean)


might as well cache m_itemsPerBlock in a local instead of just GCAlloc no?

speeding up Free involves adding a virtual method call, really?  virtual call can't be faster than the math based large alloc check can it?

> GCAssert(m_gc->collecting == false || m_gc->marking == true)
I can't get my head around this, could both be false?	

Another goto bites the dust, yeah!

explicit Collecting check in ~StringObject seems unnecessary
The quicklist work changes object behavior enough that bug #506013 starts showing up as failures in the mmgc selftests.
Depends on: 506013
(In reply to comment #44)
> (From update of attachment 417515 [details] [diff] [review])
> 
> might as well cache m_itemsPerBlock in a local instead of just GCAlloc no?

Good point, will fix this.

> speeding up Free involves adding a virtual method call, really?  virtual call
> can't be faster than the math based large alloc check can it?

As I alluded to in comment #40, there are several reasons why it is faster:

  - The out-of-line FreeNotNull was not compiled well (by GCC at least) because
    of all the special cases in the function; the prologue/epilogue ended up
    being fairly complicated for what is essentially a simple trampoline in
    many cases
  - Complexity moves into the underlying Free where it can be shared with
    the complexity that was already there, so we gain a bit that way
  - You pretty much get perfect branch prediction everywhere when
    FreeNotNull is inlined and makes a virtual call to the underlying Free,
    since typical sites that free objects only free objects of one size.
    In the case where we have a test-and-call, branch prediction is going to
    be less perfect, plus there are two branches to predict.
  - With no decision tree in it FreeNotNull is now so small that it makes
    sense to inline it so we save one function call/return too

As reported in bug #531250, on a synthetic benchmark meant to focus on alloc/dealloc cost, we got a 7% speedup from this change alone.

> > GCAssert(m_gc->collecting == false || m_gc->marking == true)
> I can't get my head around this, could both be false?

Yes.  The point of this test is to ensure that collecting can only be true if marking is true; when that's the case, a test of the former only needs to be performed if the latter is true.  This shaves a little bit off the deallocation path when the marker's not running.
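The exploited invariant can be sketched like so (field names as in the comment; the surrounding code is illustrative): because collecting implies marking, the deallocation path nests the collecting test under the marking test and skips both when the marker is idle.

```cpp
#include <cassert>

struct GCState {
    bool marking = false;
    bool collecting = false;
};

inline bool NeedsCollectorHandling(const GCState& gc) {
    // Invariant: collecting == true implies marking == true.
    assert(gc.collecting == false || gc.marking == true);
    if (gc.marking)             // rare: marker is running
        return gc.collecting;   // only now is collecting worth testing
    return false;               // common fast path: a single test
}
```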

> Another goto bites the dust, yeah!

I stared at that code for a while and then ... "that's a while loop!"

> explicit Collecting check in ~StringObject seems unnecessary

Will remove it.
Attachment #417515 - Flags: review?(treilly) → review+
Comment on attachment 417515 [details] [diff] [review]
Speed up GC::Free by rearranging and refactoring (clean)

(In reply to comment #46)
> (In reply to comment #44)
> > (From update of attachment 417515 [details] [diff] [review] [details])
> > 
> > might as well cache m_itemsPerBlock in a local instead of just GCAlloc no?
> 
> Good point, will fix this.

Changed my mind, it's all debug code, and if we cache this we should also cache m_itemSize and b->items.  (The reason alloc is in a local anyway was just to avoid eye bleed from the cast that was required everywhere.)

redux changeset:   3346:d4334f9c401c
Attachment #417515 - Attachment is obsolete: true
Attached patch Quick lists for GCAlloc (obsolete) — Splinter Review
Optimizes the paths through GCAlloc::Alloc and GCAlloc::Free by moving away from the current situation, where we have either free space at the end of a block or an object on an on-block free list, to a situation where we have one per-allocator list of free objects from various blocks (a "quick list").  The effect of the current situation is akin to eager coalescing with the costs that come with that; with quick lists, we move toward lazy coalescing and try to create fast paths that are hit most of the time.

See documentation block at the beginning of GCAlloc.cpp for more information.
Attachment #417520 - Attachment is obsolete: true
Attachment #421531 - Flags: review?(treilly)
There's probably a way to get rid of the if (size <= kLargestAlloc) by adding the large alloc to the allocs table and doing a little smarter size->index calculation.
(In reply to comment #49)
> There's probably a way to get rid of the if (size <= kLargestAlloc) by adding
> the large alloc to the allocs table and doing a little smarter size->index
> calculation.

Possibly.

There might also be ways of avoiding the constant checking for the quick list budget by batching that.

I will consider these as follow-on items though; won't try to do them in this patch.
Attachment #421531 - Flags: review?(treilly) → review+
tamarin-redux changeset: 3559:9e00648eb776 (merge from tamarin-argo).

Remaining issues deferred.
Priority: P2 → --
Whiteboard: Has patch
Target Milestone: flash10.1 → Future
(In reply to comment #48)
> Created an attachment (id=421531) [details]
> Quick lists for GCAlloc

The quick list patch has a bug with -Dgreedy which manifests itself with a stack overflow or an infinite loop, depending on the compiler's ability to perform tail call elimination.  This patch fixes that.

tamarin-argo 3498:30e38e285ea0 (self-reviewed)
Attachment #421531 - Attachment is obsolete: true
Attachment #422518 - Attachment is obsolete: true
Due to a snafu, the quicklist patch landed in tamarin-redux-argo changeset: 3561:a74a4f33061a, not the previously mentioned change sets.
Large-alloc follow-on item: bug #557898.
Quicklist budget follow-on item: bug #557901.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED