introduce code allocator and eliminate Assembler dependency on Fragmento

VERIFIED FIXED

Status

Tamarin
Baseline JIT (CodegenLIR)
VERIFIED FIXED
9 years ago
9 years ago

People

(Reporter: Edwin Smith, Assigned: Edwin Smith)

Tracking

(Blocks: 1 bug)

Details

(Whiteboard: j)

Attachments

(7 obsolete attachments)

(Assignee)

Description

9 years ago
To support keeping some methods/traces and freeing others, nanojit needs a code allocator that has more granularity than currently.

- Assembler doesn't know how much code it will need on allocation
- once done, Assembler can free unused portions of memory

code should be allocated & free'd in small pieces as needed (no rounding up to a 4K page size for teensy bits of code), but also support memory management operations:
 
- set/clear RWX bits on code
- flush icache on newly created or patched code
(Assignee)

Comment 1

9 years ago
Created attachment 375667 [details] [diff] [review]
new code allocator

Creates & uses new CodeAlloc class for code allocation.

Code is managed in a series of linked lists; linked list manipulation
is encapsulated in the CodeAlloc class, as is interfacing with the
underlying OS for allocating & modifying code pages.

Tamarin will use a mark-sweep algorithm to ensure active code blocks
are kept in memory during a flush, so the CodeAlloc class supports an
api classifying pointers during the stack scan; this api isn't used yet
and the details of walking the stack are meant to be vm specific
(not in nanojit).

CodeAlloc's api reflects the usage pattern already in place; Assembler
requsts an arbitrary block of memory, puts code into it, then releases
the remaining memory back to the allocator for later use.  (alloc takes
no size argument, but free needs to specify an address range).

In addition, in-use blocks of code memory are kept in lists which are
manipulated by CodeAlloc; the head of the list might live in nanojit
(during assembly), or in a VM-defined object (method or trace).

Assembler no longer retains code pointers inbetween sessions, so now it
can be short-lived.  This patch makes it short-lived in Tamarin;
only CodeAlloc instances are retained over long periods.  Fragmento
is no longer used by Tamarin.  Fragment's are used since the Assembler
still depends on them, but that dependency is small now and can be
removed in a future patch. (this patch is full).

Limitations & TODO's:
 - pointer classification is a brute force search.  if this isn't fast
   enough then the VM could need an exact way to traverse jit stack frames,
   or else CodeAlloc could add structures to speed the search.
 - no coalescing of adjacent blocks
 - Assembler must reject blocks that are too small. We could add a min-size
   argument to alloc, making it more like malloc().
 - only x86 backend is updating, others need work.
 - arm backend will require a more sophisticated addRemaining() api, since
   it must retain two parts of each block and free the middle part.
Assignee: nobody → edwsmith
Attachment #375667 - Flags: review?(lhansen)
(Assignee)

Updated

9 years ago
Attachment #375667 - Flags: review?(lhansen)

Comment 2

9 years ago
Can you post CodeAlloc?  -- I believe its missing from the patch.

Comment 3

9 years ago
The overall direction is good; nicely separating concerns.  

Is anyone using the tree version of LIR.compile() ? 

I'm wondering how we're going to manage the fragmentation that is implicit in this approach.  

One abstraction which would centralize the control, would be for the CodeAlloc to return CodeBlocks (which would contain address,length,start,end). address and length are fixed for the block while start,end are used to denote which part of the block contains code and are mutable by Assembler.  (I think you're calling these entities CodeList)

A call to CodeAlloc.tidy(CodeBlockList*) - once compilation is complete, might result in trimming the block(s) (i.e shrink length, adjust address, all based on how aggressively we want to create/manage the holes in mem.  Its similar to what you have but hides the decision/policy inside of CodeAlloc.
(Assignee)

Comment 4

9 years ago
I beleive tracemonkey uses LIR.compile(), but its a good candidate to move into tracemonkey since it assumes a lot about how tree fragments ought to be compiled.

An open TODO is for CodeAlloc to perform coalescing of adjacent blocks in the same CodeList, and the necessary control is already centralized in CodeAlloc.cpp.  each node in a CodeList has start/end/size.  clients of the allocator cannot see inside CodeList entries and only know the start/end bounds of single blocks, so if we need sentinels at the ends of blocks for effecient coalescing, it should be a local change in CodeAlloc.

An earlier iteration of the code had CodeBlock and CodeList, and CodeList was simply an opaque wrapper around the pointer to the first node, ie. class CodeList { CodeBlock* list }.  instances of CodeList were declared opaquely as "CodeList myList".  this was problematic because of include file ordering dependencies, so I eliminated CodeBlock, and clients of the allocator use "CodeList* mylist" (opaquely) now instead of "CodeList mylist".  CodeList* can be forward-declared.  If its confusing, we could rename CodeList to CodeBlock and change nothing else.

> CodeAlloc.tidy(...)

see CodeAlloc::addRemainder()

assembler tells CodeAlloc the areas it's using for sure, and its up to CodeAlloc to split up blocks and free the rest.  any policy decisions already can be localized in CodeAlloc, I think.

Another TODO is to fix this for ARM, which adds constants from [codeStart, _nSlot] and adds code from [_nIns, codeEnd].  addRemainder() assumes a block is only split into two parts (free, used).  ARM needs a 3-way split: (used, free, used)
(Assignee)

Comment 5

9 years ago
Created attachment 376159 [details] [diff] [review]
patch 1: add CodeAlloc, use it on x86, dont use Fragmento
Attachment #375667 - Attachment is obsolete: true
Attachment #376159 - Flags: review?(lhansen)
(Assignee)

Updated

9 years ago
Attachment #376159 - Flags: review?(rreitmai)
(Assignee)

Comment 6

9 years ago
Created attachment 376161 [details] [diff] [review]
updates for ARM-winmo, x64, PPC, PPC64, but not sparc or arm-linux

Tweaks the addRemaining() api to take the original alloc range (start,end) and a range for a hole in the middle (holeStart, holeEnd).  Inside addRemaining, we handle the cases of a middle hole, left-justified hole, but not a right-justified hole since we never hit that case.  (assert is there).

this is needed because on ARM we generate constant pools and code at the same time, into the same code block (growing towards each other).  On all other cpu's we don't generate the constants; so code just fills up the block backwards.

arm linux untested -- uses the _clear_cache() api provided by gcc.  N/A for x86/x64 cpu's since they have cache interlocks.

sparc untested, in progress.  (FIXME's will draw attention by breaking compile).
Attachment #376161 - Flags: review?(rreitmai)

Comment 7

9 years ago
Comment on attachment 376159 [details] [diff] [review]
patch 1: add CodeAlloc, use it on x86, dont use Fragmento

Since CodeAlloc does not use the GC argument except for its heap member, it would be cleaner for the caller to pass the heap member - makes it clear the codealloc is not tied to a GC.

CodeAlloc::flush_icache is no worse than what we have but needs to be factored into a porting API eventually.  Ditto mark_executable obviously.

Any reason why those two are underscore_named and not camelCase like all the others?

I assume classifyPtr is used for a conservative scan; should not be hard to speed this up.
Attachment #376159 - Flags: review?(lhansen) → review+
(Assignee)

Comment 8

9 years ago
(In reply to comment #7)
> (From update of attachment 376159 [details] [diff] [review])

> Since CodeAlloc does not use the GC argument except for its heap member, it
> would be cleaner for the caller to pass the heap member - makes it clear the
> codealloc is not tied to a GC.

Good point, but the ctor does use GC*, to initialize the heapblocks member list.

> CodeAlloc::flush_icache is no worse than what we have but needs to be factored
> into a porting API eventually.  Ditto mark_executable obviously.

agreed.

> Any reason why those two are underscore_named and not camelCase like all the
> others?

because I can't decide which I like better?  I'll fix for consistency.

> I assume classifyPtr is used for a conservative scan; should not be hard to
> speed this up.

Correct on both counts.  Once a pointer into active code is found, the client actually has another O(N) search to do, to find the MethodInfo that owns the code.  If CodeList stored a client pointer (maybe as void*) then that search would disapear.
Whiteboard: j
(Assignee)

Updated

9 years ago
Blocks: 491850

Comment 9

9 years ago
Comment on attachment 376159 [details] [diff] [review]
patch 1: add CodeAlloc, use it on x86, dont use Fragmento

Hard to tell if outputXXX functions changed.  I don't think they did.

Previously pages would be marked r/w until we completed 
compilation then they would be marked r/x. 

This patch appears to mark the pages r/w/x upon allocation 
and never changes them.  

Also we should move the mark_executable() method into VMPI.

I'm also guessing that we'll need to add some amount of monitoring
into the new code, similar to dumpFragStats.

Marking + with the caveat that we'll resolve these issues prior 
to landing in TC.
Attachment #376159 - Flags: review?(rreitmai) → review+

Updated

9 years ago
Attachment #376161 - Flags: review?(rreitmai) → review+

Comment 10

9 years ago
Comment on attachment 376161 [details] [diff] [review]
updates for ARM-winmo, x64, PPC, PPC64, but not sparc or arm-linux

addRemainder - nanojit.h has a bunch of macros, alignTo/alignUp, etc that make
reading alignment calcs much easier.

ok i give -  what's special about 0x1d4818 ?

didn't check too deeply , but are we guaranteeing
that post-underrunProtect() PC_OFFSET_FROM(_nSlot,_nIns-1) <= -0x1000 
for all cases. 

Maybe an assert in there might be good.  Also, may want
to remove any of the old comments that refer to 'page'
that are now irrelevant.
(Assignee)

Comment 11

9 years ago
(In reply to comment #9)
> (From update of attachment 376159 [details] [diff] [review])
> Hard to tell if outputXXX functions changed.  I don't think they did.

correct

> Previously pages would be marked r/w until we completed 
> compilation then they would be marked r/x. 

previously in MIR, perhaps, but not nanojit.  Agree this is still required for shipping.

> Also we should move the mark_executable() method into VMPI.

already planned.

> I'm also guessing that we'll need to add some amount of monitoring
> into the new code, similar to dumpFragStats.

In the code manager, yes.  The allocator already has api's
for getting heap-allocated memory size and free-mem size,
as well as measuring the size of any CodeList (for in-use mem).

> Marking + with the caveat that we'll resolve these issues prior 
> to landing in TC.

Since TC already marks pages RWX, and never changes them, and doesn't have VMPI, I don't think doing these things blocks landing in TC.  However they do need to be done soon.
(Assignee)

Comment 12

9 years ago
(In reply to comment #10)
> (From update of attachment 376161 [details] [diff] [review])
> addRemainder - nanojit.h has a bunch of macros, alignTo/alignUp, etc that make
> reading alignment calcs much easier.

i'll investigate before landing.

> ok i give -  what's special about 0x1d4818 ?

debugging cruft, will remove.

> didn't check too deeply , but are we guaranteeing
> that post-underrunProtect() PC_OFFSET_FROM(_nSlot,_nIns-1) <= -0x1000 
> for all cases. 

i'll fix this.  it's not required, even on ARM, as long as
the allocator gives us 4K at a time.  if the allocator becomes
capable of giving out larger code blocks, we will want to restrict
it on arm because of how we generate constant pools.

> Maybe an assert in there might be good.  Also, may want
> to remove any of the old comments that refer to 'page'
> that are now irrelevant.

will do.
(Assignee)

Comment 13

9 years ago
both patches landed as http://hg.mozilla.org/tamarin-redux/rev/8f05ae1c7a08
(Assignee)

Updated

9 years ago
Attachment #376159 - Attachment is obsolete: true
(Assignee)

Updated

9 years ago
Attachment #376161 - Attachment is obsolete: true

Comment 14

9 years ago
Comment on attachment 376159 [details] [diff] [review]
patch 1: add CodeAlloc, use it on x86, dont use Fragmento

Maybe the CodeAlloc::free memset should be debug only?
(Assignee)

Comment 15

9 years ago
(In reply to comment #14)
> (From update of attachment 376159 [details] [diff] [review])
> Maybe the CodeAlloc::free memset should be debug only?

correct, will fix

Update: Fragmentaton is a problem under stress testing where methods
are aggressively freed to create intentional churn.  patch to coalesce free
blocks will be coming shortly.
(Assignee)

Comment 16

9 years ago
Created attachment 378848 [details] [diff] [review]
free block coalescing to minimize fragmentation

Adds free block coalescing to the allocator.

The CodeList (block header) structure now contains references to its adjacent neighbors as well as a next pointer for building lists of blocks:

    block --higher-> block
          <-lower--- 

lower is null at the low-end and higher is null at the high end.  Each page allocated from GCHeap has a small block at the end which is always in use and is a teriminator of the linked list of blocks.

The heapblocks array is now a linked list using the terminal blocks of each heap allocation.

This patch additionally adds code to identify whole free blocks in CodeAlloc.sweep(), mark them back to RW-no-X, and give them back to GCHeap.  It's up to the VM to call sweep() at appropriate times.  Changing page protections is slow and this approach is measurably faster than freeing empty pages immediately upon discovering them in free().

CodeAlloc.freelist is no longer used; each block has a status (free or in-use) and we locate free blocks by searching the heapblocks list.  No attempt has been made to make allocation fast since it hasn't shown up as a hotspot.

Also, the block header (CodeList) is bigger that necessary, in particular the isFree bit could be stuffed into one of the pointer fields.  Since we don't allocate more than 4K at a time, we could also combine lower and higher/end into two int16 size fields (allowing up to 64k per block on 32bit systems).  This would make CodeList be 2 words instead of 4.

also:
* commented out debugging memset
* CodeAlloc is constructed with GCHeap* now since it no longer has
  an internal array which requires GC* (previous review)

I'm willing to split this patch up if it's too hard to review.
Attachment #378848 - Flags: superreview?(lhansen)
Attachment #378848 - Flags: review?(rreitmai)
(Assignee)

Updated

9 years ago
Attachment #378848 - Flags: superreview?(lhansen)
Attachment #378848 - Flags: review?(rreitmai)
(Assignee)

Comment 17

9 years ago
Comment on attachment 378848 [details] [diff] [review]
free block coalescing to minimize fragmentation

cancelling review, will post smaller patch.  this one has a couple problems.

(clobbered solaris fixes, not tested on ARM yet)
(Assignee)

Comment 18

9 years ago
Created attachment 378865 [details] [diff] [review]
finish using VMPI for page protection, remove old makeExecutable() code.
Attachment #378848 - Attachment is obsolete: true
Attachment #378865 - Flags: superreview?(lhansen)
(Assignee)

Comment 19

9 years ago
Created attachment 378867 [details] [diff] [review]
Coalesce free blocks
Attachment #378867 - Flags: review?
(Assignee)

Updated

9 years ago
Attachment #378867 - Flags: superreview?(lhansen)
Attachment #378867 - Flags: review?(rreitmai)
Attachment #378867 - Flags: review?
(Assignee)

Comment 20

9 years ago
Created attachment 378869 [details] [diff] [review]
mark pages RW and give back to gcheap when CodeAlloc.sweep() is called.

final of 3 patches, rebased, replacing earlier patch bomb
Attachment #378869 - Flags: superreview?(lhansen)

Updated

9 years ago
Attachment #378865 - Flags: superreview?(lhansen) → superreview+

Updated

9 years ago
Attachment #378867 - Flags: superreview?(lhansen) → superreview+

Updated

9 years ago
Attachment #378869 - Flags: superreview?(lhansen) → superreview+
(Assignee)

Updated

9 years ago
Attachment #378865 - Attachment is obsolete: true
(Assignee)

Updated

9 years ago
Attachment #378867 - Attachment is obsolete: true
Attachment #378867 - Flags: review?(rreitmai)
(Assignee)

Comment 22

9 years ago
Comment on attachment 378869 [details] [diff] [review]
mark pages RW and give back to gcheap when CodeAlloc.sweep() is called.

http://hg.mozilla.org/tamarin-redux/rev/e1cef75a226c
Attachment #378869 - Attachment is obsolete: true
(Assignee)

Updated

9 years ago
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED

Updated

9 years ago
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.