Closed Bug 545295 Opened 10 years ago Closed 10 years ago

OS requests for code memory should be distinct from heap memory ones (re: CodeAlloc::allocCodeChunk)

Categories

(Tamarin Graveyard :: Virtual Machine, defect, P2)


Tracking

(Not tracked)

VERIFIED FIXED
flash10.1

People

(Reporter: rreitmai, Assigned: lhansen)

References

Details

(Whiteboard: Has patch, Symbian, fixed-in-nanojit, fixed-in-tracemonkey)

Attachments

(7 files, 4 obsolete files)

3.46 KB, patch
edwsmith
: review+
Details | Diff | Splinter Review
1.67 KB, patch
treilly
: review+
Details | Diff | Splinter Review
3.14 KB, patch
treilly
: review+
Details | Diff | Splinter Review
4.86 KB, patch
edwsmith
: review+
skekki
: feedback+
Details | Diff | Splinter Review
1.26 KB, patch
Details | Diff | Splinter Review
13.12 KB, patch
edwsmith
: review+
Details | Diff | Splinter Review
1.49 KB, patch
njn
: review+
Details | Diff | Splinter Review
Currently CodeAlloc::allocCodeChunk is written with the assumption that the GCHeap is able to manage code memory via the AllocCodeMemory call.

Within GCHeap this call currently calls Alloc(), which obscures the fact that this is a code memory allocation; heap and code memory are thus assumed to be an interchangeable shared resource.

This is not always true; some OSs, e.g. Symbian, require distinct up-front calls for code memory.

Two possible solutions to this issue are:

(1) re-route the request at the allocCodeChunk() call, similar to what is done with flushICache() and its multitude of ifdefs.

(2) promote GCHeapAllocCodeMemory and FreeCodeMemory to first-class citizens and migrate the implementation out of the .h file.  Make GCHeap aware of code memory and make the appropriate calls for this 'type'.
Blocks: 536223
What's the underlying deal here? Are code and heap memory somehow forced to come from different virtual memory regions or something?  Is heap memory not allowed to have execute permissions?
On Symbian there are two separate OS calls for allocating non-exec memory and exec memory. The exec bit cannot be changed after the memory has been allocated. In addition, there is a method for flushing the I cache just before execution, like on other platforms.

In the patch that I tried to push to Tamarin I use RChunk::CreateDisconnectedLocal for creating a Symbian OS chunk/heap for allocating non-exec memory and RChunk::CreateLocalCode for exec-memory. Later on I use User::IMB_Range to flush the I cache.

Based on my conversation with Symbian OS employees this is the only way to allocate exec memory on Symbian.

In my patch I created new VMPI methods called VMPI_AllocCodeMemory and VMPI_FreeCodeMemory. Regular non-exec memory allocation goes through the existing VMPI virtual memory APIs.
Flags: flashplayer-qrb+
Priority: -- → P2
Whiteboard: Symbian
Target Milestone: --- → flash10.1
I'm not tied to any particular solution.  What are the requirements?

 - need to allocate code memory separately on some platforms
 - would like to use the standard GCHeap to allocate code memory on other
   platforms, to reduce platform complexity
 - need to be able to mprotect or similar on platforms that support that
 - need to do proper accounting of the code memory: the fact that we have it,
   and that it is code memory.  Exact requirements tbd.

This suggests a broad VMPI_allocCodeMemory / protectCodeMemory API that is called directly from the JIT - not via GCHeap - and which is allowed to call back into GCHeap on platforms where special memory is not needed.  I'm not sure where the accounting code should go but there are two paths, for when VMPI_allocCodeMemory gets it from GCHeap and when it doesn't.

Ie, I think this is Rick's solution (1) above, more or less.
A slight tweak to that plan would be to have VMPI_allocCodeMemory / protectCodeMemory never use gcheap to allocate memory, but always use this new accounting api.  This gives us one code path for accounting for code memory, and lets code memory be allocated and aligned on system page size boundaries without running into MMgc's 4K page size requirement.
Agreed, I think adding VMPI_allocCodeMemory / VMPI_freeCodeMemory and not using the GCHeap at all makes sense.  

CodeAlloc is in fact already managing the memory, so having it directly call the OS (via VMPI_) seems more natural.

While we're at it we should probably also hoist the flushICache calls behind a VMPI_ wrapper.

I don't believe that this will cause any issues for tracemonkey, but we may want to confirm before going off in this direction.
(In reply to comment #4)
> A slight tweak to that plan would be to have VMPI_allocCodeMemory /
> protectCodeMemory never use gcheap to allocate memory, but always use this new
> accounting api.  This gives us one code path for accounting for code memory,
> and lets code memory be allocated and aligned on system page size boundaries
> without running into MMgc's 4K page size requirement.

Actually this is not a tweak but a substantial change, since it forces every platform to go and reimplement block management whether it wants to or not - it can't use GCHeap for that.  Also it adds one more allocator to the set that competes for blocks and will likely further fragment the block space in the common case.  GCHeap at least has the option of choosing block placement such that fragmentation of the heap is statistically alleviated; with another allocator that kind of coordination will be harder (probably won't exist).

(I'm not saying we can't do it that way, just that there are significant downsides to doing it that way that we should consider.)
Also, going directly to the OS for each 64k chunk will greatly increase the burden on the OS's VM implementation (for a 32mb JIT cache that's 512 distinct regions).  Using GCHeap avoids this.  Although in practice the OS has to track these pages separately once we make them r-x and they get interleaved with the normal rw- regions.   A separate GCHeap instance seems like the best of both worlds: all the r-x pages can potentially be merged by the OS.
(In reply to comment #7)
> Also going directly to the OS for each 64k chunk will greatly increase the
> burden on the OS's vm implementation (for a 32mb JIT cache that's 512 distinct
> regions).  Using GCHeap avoids this.  Although in practice the OS has to track
> these pages separate once we make the pages r-x and these get interleaved with
> the normal rw- regions.   A separate GCHeap instance seems like the best of
> both worlds.  All the r-x pages can potentially be merged by the OS.

A separate GCHeap instance seems like a huge amount of complexity for what ought to be a simple problem to solve, IMO.
Inventing a new block manager, or extending CodeAlloc to deal with more than 64k chunks, seems just as complex to me, if not more so.   Definitely the simplest choice is to not sweat the block management/fragmentation issues and go straight to the OS each time, although increasing the default allocation size from 64k might be in order if we did that.
On ARM platforms each allocCodeChunk call allocates only 4k. There is the following comment in CodeAlloc.cpp: "ARM requires single-page allocations, due to the constant pool that lives on each page that must be reachable by a 4kb pcrel load."
If CodeAlloc can't be easily made to make large allocations and still have a cpool on every page that would seem to be a strong argument for having a block manager in play.
I'm not sure I totally follow; CodeAlloc is already managing the memory.   So in fact we already have two memory managers, GCHeap and CodeAlloc. Or is CodeAlloc not as complete a memory manager as I am assuming?
CodeAlloc gets its memory from GCHeap, so it's got a block manager between it and the OS.  Having all the memory come from GCHeap was simple for accounting purposes; any change away from that will have to "account" for it ;-)   It also probably helped from a fragmentation perspective and from an OS resource consumption perspective (fewer VM regions for it to manage), but those aren't dealbreakers from what we've seen.
When allocating executable memory on Symbian, virtual allocation cannot be used. Because of this, having a separate GCHeap instance may not make sense. It will not reduce fragmentation because it can only call VMPI_allocAlignedMemory, which allows the OS to return memory from any address. And it can allocate only 4k at a time.

Personally I don't know where this 4k limitation comes from. Maybe someone could verify whether it applies to all ARM platforms, etc.
Target Milestone: flash10.1 → flash10.1.1
Pinging about this bug. How can we bring this to a conclusion? This bug is blocking another one, namely checking the Symbian JIT changes into Tamarin.
I'll take ownership and try to make something happen by the end of the week.
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Depends on: 553444
Attached patch New porting interfaces (VMPI.h) (obsolete) — Splinter Review
Removes VMPI_setPageProtection; its API is not ideal for what we're trying to do and we're not currently using the generality it offers.

Adds VMPI_allocateCodeMemory, VMPI_freeCodeMemory, and VMPI_makeCodeMemoryExecutable, all with quite restrictive semantics.  Notably, though, the alloc/free API is allowed to use GCHeap memory on platforms where that makes sense.
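For illustration, a minimal POSIX-flavored sketch of the shape these three calls might take (the actual Posix implementation is in the attached patch; the mmap/mprotect details below are an assumption, not the patch itself):

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <stddef.h>

// Sketch: allocate page-aligned RW memory; flip to RX only after the
// JIT has emitted code into it.
void* VMPI_allocateCodeMemory(size_t nbytes)
{
    void* p = mmap(NULL, nbytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}

void VMPI_freeCodeMemory(void* address, size_t nbytes)
{
    munmap(address, nbytes);
}

bool VMPI_makeCodeMemoryExecutable(void* address, size_t nbytes)
{
    // Restrictive semantics: the range must be one we allocated above.
    return mprotect(address, nbytes, PROT_READ | PROT_EXEC) == 0;
}
```

The restrictive two-phase protocol (write first, then make executable) is what lets a Symbian port back these with RChunk::CreateLocalCode, and lets other platforms back them with GCHeap.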
Attachment #434658 - Flags: review?(edwsmith)
Attachment #434658 - Flags: feedback?(skekki)
There are two minor drive-by changes:

- the pinning of the SPARC page size to 4KB has been removed, for the time
  being, to facilitate testing - we probably want to make that change
  permanent

- one call in Windows platform code that went directly to GetPlatformInfo for
  the page size has been replaced by a call to VMPI_getVMPageSize; this makes
  testing easier (because the latter can simulate a different page size)
Attachment #434659 - Flags: review?(edwsmith)
Attachment #434659 - Flags: feedback?(skekki)
Comment on attachment 434659 [details] [diff] [review]
Implementations of those porting interfaces for Posix, Windows; stubs for Symbian

I should note that these changes require the aligned-allocation patch from bug #555444.
Whiteboard: Symbian → Has patch, Symbian
Based on the patches, VMPI_allocateCodeMemory should return memory aligned to page boundaries.

I doubt we can guarantee VMPI_allocateCodeMemory never returns null on Symbian. What should we do when the OS runs out of memory?
(In reply to comment #22)
> Based on the patches VMPI_allocateCodeMemory should return memory aligned by
> page boundaries.
> 
> I doubt we can guarantee VMPI_allocateCodeMemory never returns null on Symbian.
> What should we do when the OS runs out of memory?

Good point.  At that stage you would want to either (a) abort or (b) invoke the general out-of-memory handling machinery in GCHeap, in the hope that a collection or selective instance shutdown will free up some memory, and then retry the allocation.  If the allocation still fails you must abort by invoking some generic instance shutdown mechanism in GCHeap; we should discuss whether we have enough APIs for that.  But you can't return NULL.

I could change it so that you could return NULL, but the JIT would just do the same thing; it has no provision for failing to allocate memory.  On most platforms the VMPI level will just call GCHeap here, and GCHeap will take care of invoking the low-memory machinery, so it seems reasonable to place the same burden on the Symbian VMPI level by providing suitable APIs.
Blocks: 551051
Blocks a 10.1 bug, upgrading.
Target Milestone: flash10.1.1 → flash10.1
Comment on attachment 434658 [details] [diff] [review]
New porting interfaces (VMPI.h)

What's there looks good.  Presumably the JIT should call VMPI_getVMPageSize() to ensure it doesn't use allocations smaller than the system page size.

the ARM JIT backend will break if the page size is >4K.
Attachment #434658 - Flags: review?(edwsmith) → review+
Comment on attachment 434659 [details] [diff] [review]
Implementations of those porting interfaces for Posix, Windows; stubs for Symbian

substance:
* the Windows code for VMPI_makeCodeMemoryExecutable() uses VirtualQuery to adjust the region size/length before marking (because GCHeap might have asked for a larger chunk and we're just getting part of it).

It looks like all it's doing is coalescing regions, but it is critical that it not go outside the bounds of what's passed in, or else we'd make non-code memory RX, or adjacent code memory RW (at granularities larger than a page).  The logic looks right to me, in that it won't go outside [address, nbytes).  But if nbytes > mbi.regionSize, we'll end up using markSize = nbytes, i.e. pass in a size larger than mbi.regionSize.  Will VirtualProtect allow that?

style:

* factor and re-use the invariant-checking code?  
* if VMPI_abort is compiled out with #ifdef DEBUG, then the whole block could be ifdef'd.  If not, then fine as-is.
Comment on attachment 434659 [details] [diff] [review]
Implementations of those porting interfaces for Posix, Windows; stubs for Symbian

R+ presuming there are no gotchas in the Windows impl.
Attachment #434659 - Flags: review?(edwsmith) → review+
Comment on attachment 434670 [details] [diff] [review]
Remove existing GCHeap code allocation APIs, introduce new accounting APIs

GetTotalCodeSize() looks dead for now.  I'm guessing that's because GCHeap is still managing pressure itself when VMPI_alloc/freeCodeMemory() call GCHeap->Alloc/Free().
Attachment #434670 - Flags: review?(edwsmith) → review+
Comment on attachment 434671 [details] [diff] [review]
Changes to CodegenLIR and nanojit to use new porting interfaces

Needs an ARM-only assert that bytesPerPage >= 4096, since the JIT requires that pages aren't bigger than 4K.
Attachment #434671 - Flags: review?(edwsmith)
Attachment #434671 - Flags: review+
Attachment #434671 - Flags: feedback?
Attachment #434671 - Flags: feedback?
I believe the amount of code memory allocated each time is currently hard-coded in the AVM. In the new API doc the size can change between allocations. Any comments on that? This will affect what kind of implementation and APIs I will use on the Symbian platform (is the allocation size a constant 4k, or can it change between calls?).
The ARM JIT backend always allocates 4K at a time right now, but in the future we want to change that (allocate bigger chunks, less often), hopefully without having to make significant changes in the Symbian VMPI_allocCodeMemory implementation.
Attachment #434658 - Flags: feedback?(skekki) → feedback+
Attachment #434659 - Flags: feedback?(skekki) → feedback+
(In reply to comment #19)
> (From update of attachment 434659 [details] [diff] [review])
> I should note that these changes require the aligned-allocation patch from bug
> #555444.

Actually bug #553444.
(In reply to comment #26)
> style:
> 
> * factor and re-use the invariant-checking code?  

I'm disinclined to do that; the point here is that the invariants are inherently platform-specific, even if that leads to duplication across platforms.  Maybe you're thinking within each platform?  Could do that; it didn't seem worth the bother.

> * if VMPI_abort is compiled out with #ifdef DEBUG, then the whole block could
> be ifdef'd.  if not, then fine-as is.

It's crucial that the code /will/ abort in release builds if the assumptions are violated, so I sure hope not.  Nothing suggests otherwise at the present time; VMPI_abort maps directly to ::abort.
Another problem is how to stay within heap soft and hard limits if the VMPI layer does not go through GCHeap.  The simple fix would perhaps be for GCHeap::SignalCodeMemoryAllocation(..., false) to be called before any allocation attempt is made in the VMPI layer, and to try to make space available if hard/soft limits are about to be broken.

The documentation would need to be specific about that.

Also, the documentation needs to state that if the allocation from the OS fails for any reason then the VMPI layer needs to invoke OOM handling in GCHeap.  The no-brainer solution appears to be to make GCHeap::SystemOOMEvent generally available so that it can be used in this situation.
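Putting the two preceding points together, the allocation discipline being proposed could be sketched like this. The hook names echo GCHeap::SignalCodeMemoryAllocation and GCHeap::SystemOOMEvent from the comments above, but the signatures and wiring here are hypothetical, for illustration only:

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>

// Hypothetical sketch of the proposed discipline: account for the memory
// before allocating, invoke OOM handling once on failure, abort if the
// retry still fails (the JIT has no provision for a NULL result).
void* allocCodeMemoryWithOOMHandling(
    size_t nbytes,
    const std::function<void(size_t)>& signalAllocation,  // accounting hook
    const std::function<void(size_t)>& systemOOMEvent,    // OOM-handling hook
    const std::function<void*(size_t)>& osAlloc)          // raw OS allocation
{
    signalAllocation(nbytes);          // account up front, per the comment above
    void* p = osAlloc(nbytes);
    if (p == nullptr) {
        systemOOMEvent(nbytes);        // ask GCHeap to free something up
        p = osAlloc(nbytes);           // retry once
        if (p == nullptr)
            abort();                   // cannot return NULL to the JIT
    }
    return p;
}
```

A Symbian implementation would plug its exec-memory allocator in as osAlloc; platforms that route through GCHeap would do the retry inside GCHeap instead.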
See justification in previous comment; we want native code memory allocators to be able to call this in the event of allocation failure.
Attachment #435480 - Flags: review?(treilly)
CodegenLIR won't be using the existing code-allocation APIs, so remove them.

Add APIs for signaling code allocation and deallocation from the new VMPI layer.  These APIs perform accounting, factoring external code memory into the total heap size (see update to GetTotalHeapSize).  The allocation API also tries to make space available if necessary, by interacting with the OOM machinery.  I'm particularly interested in a review of the latter bit.

GetTotalCodeSize() is currently unused but the cost of leaving it in seemed slight, so I did.
Attachment #435488 - Flags: review?(treilly)
Attachment #434670 - Attachment is obsolete: true
Same API, but comments updated to clarify that accounting APIs must be used (and that they take care of interacting with memory monitoring) and clarifying how to signal allocation failure.
Attachment #434658 - Attachment is obsolete: true
Attachment #435490 - Flags: review?(edwsmith)
Attachment #435490 - Flags: feedback?(skekki)
Extracted from attachment #434659 [details] [diff] [review], without changes.  I consider these reviewed.
(In reply to comment #26)
> substance:
> * the windows code for VMPI_makeCodeMemoryExecutable() uses VirtualQuery to
> adjust the region size/length before marking. (because GCHeap might have asked
> for a larger chunk and we're just getting part of it).  
> 
> It looks like all its doing is coalescing regions but it is critical that it
> not go outside the bounds of what's passed in, or else we'd make non-code
> memory RX, or adjacent code memory RW.  (at granularities larger than a
> page).  The logic looks right to me, in that it won't go outside
> [address,nbytes).  But, if nbytes > mbi.regionSize, we'll end up using
> markSize = nbytes, i.e.
> pass in a size larger than mbi.regionSize.  will VirtualProtect allow that?

If nbytes > regionSize then marksize = regionSize; otherwise marksize = nbytes.  Looks right to me.  Details?

What's going on here is this.  The code was written to handle the case where the protection request spans memory allocated by more than one call to VirtualAlloc; this can happen if the JIT merges code memory allocations (it should not) or if GCHeap merges regions (it won't, on Windows, if it obeys the VMPI call that determines whether it should or not).  So the code may be more general than it needs to be.
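As a portable restatement of the clamping logic under discussion (this is not the Windows patch itself: the region-query callback stands in for VirtualQuery, and the recorded sizes stand in for the VirtualProtect calls):

```cpp
#include <cstddef>
#include <vector>

// Walk the span [0, nbytes) region by region, protecting
// min(bytes remaining, region size) at each step so the request never
// spills past its own bounds -- the "min" idiom discussed above.
std::vector<size_t> protectSpan(size_t nbytes,
                                size_t (*queryRegionSize)(size_t offset))
{
    std::vector<size_t> marked;   // sizes that would be passed to VirtualProtect
    size_t offset = 0;
    while (offset < nbytes) {
        size_t remaining = nbytes - offset;
        size_t regionSize = queryRegionSize(offset);
        size_t markSize = remaining < regionSize ? remaining : regionSize;
        marked.push_back(markSize);
        offset += markSize;
    }
    return marked;
}
```

With 4K regions and a 10000-byte request this marks 4096, 4096, then 1808 bytes: each step is clamped, and the final step never extends past the requested span.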
Same as before, except that the drive-by fixes were factored out and the ordering of the SignalCodeMemoryAllocation / Alloc calls was changed, as required by the new VMPI spec.
Attachment #434659 - Attachment is obsolete: true
Attachment #435493 - Flags: review?(edwsmith)
Attachment #435480 - Flags: review?(treilly) → review+
Attachment #435488 - Flags: review?(treilly) → review+
Attachment #435490 - Flags: review?(edwsmith) → review+
Attachment #435493 - Flags: review?(edwsmith) → review+
Comment on attachment 435493 [details] [diff] [review]
Implementations of those porting interfaces for Posix, Windows; stubs for Symbian

(In reply to comment #40)
> If nbytes > regionSize then marksize = regionSize, otherwise, marksize =
> nbytes.  Looks right to me.  Details?

Ah, right.  I mis-parsed the code as the "max" idiom, but it's "min"; reversing the comparison would have helped me.  Your call if you want to make any change there.

only skimmed this version, nothing obviously wrong.
Attachment #435490 - Flags: feedback?(skekki) → feedback+
I think that's consensus then.  Will run it all through the sandbox and land as soon as I can.
tamarin-redux-argo changeset:   3895:cd9b23eebc4c
tamarin-redux changeset:   4244:5d3bb61c42ab
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Reopening because of some bugs not found during inspection and prior testing.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Bug fixes (obsolete) — Splinter Review
Two problems: the use of '-' does not play well with size_t, so expressions were reordered; and one use of the 'block' variable instead of the correct 'firstBlock'.
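The size_t issue is the classic unsigned-underflow pitfall. A hypothetical illustration (not the actual patched expression; the names here are made up):

```cpp
#include <cstddef>

// Does a request fit in a block after its header?  The buggy form
//     request <= blockSize - headerSize
// underflows when headerSize > blockSize: the subtraction wraps to a huge
// unsigned value and the comparison spuriously succeeds.  Reordering so the
// subtraction can never go negative avoids the wraparound.
bool fitsAfterHeader(size_t blockSize, size_t headerSize, size_t request)
{
    return headerSize <= blockSize && request <= blockSize - headerSize;
}
```

This is why "reorder expressions" fixes the bug: the comparison guards the subtraction rather than trusting its result.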
Attachment #436074 - Flags: review?(treilly)
Comment on attachment 436074 [details] [diff] [review]
Bug fixes

Preemptively pushed: tamarin-redux-argo changeset:   3896:1b909e8fd9b9, keeping bug open waiting for review.  Will forward merge to TR.
Depends on: 556275
Comment on attachment 436074 [details] [diff] [review]
Bug fixes

New patch coming.
Attachment #436074 - Attachment is obsolete: true
Attachment #436074 - Flags: review?(treilly)
Fait accompli.
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Attachment #437029 - Flags: review?(nnethercote) → review+
pushed nanojit changes
http://hg.mozilla.org/projects/nanojit-central/rev/f1734da5dae9

I goofed in the (irrevocable) change notes; it should read r=nnethercote+
This broke the NJ tinderbox: http://tinderbox.mozilla.org/showbuilds.cgi?tree=Nanojit

The problem is that VMPI_getVMPageSize() is defined in TR, but not in NJ or TM.  So I added it:

http://hg.mozilla.org/projects/nanojit-central/rev/54edfaf7f895

This seemed like the right fix but I wasn't certain, please speak up if there's a better way to do it.
I'm batting zero; I should have included this in the n-c checkin.  VMPI_getVMPageSize() should return whatever getpagesize() (or the OS-specific equivalent) returns: typically 4k, sometimes 8k or other values.  In TR it returns a cached value computed at static-init time:

e.g. on linux, it gets initialized here:
http://hg.mozilla.org/tamarin-redux/file/a6eae040f68f/VMPI/MMgcPortUnix.cpp#l75

and VMPI_getVMPageSize() is just a wrapper:
http://hg.mozilla.org/tamarin-redux/file/a6eae040f68f/VMPI/MMgcPortUnix.cpp#l85
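A minimal sketch of that pattern for NJ/TM (lazy initialization here instead of TR's static-init, purely to keep the example self-contained):

```cpp
#include <unistd.h>
#include <cstddef>

// Cache the OS page size once and serve it from a trivial wrapper,
// mirroring the TR code linked above.
static size_t pagesize = 0;

static void InitPageSize()
{
    // sysconf(_SC_PAGESIZE) is the POSIX equivalent of getpagesize().
    pagesize = (size_t)sysconf(_SC_PAGESIZE);
}

size_t VMPI_getVMPageSize()
{
    if (pagesize == 0)   // lazy init in this sketch; TR does it at static-init
        InitPageSize();
    return pagesize;
}
```

The point of caching is that callers (e.g. the JIT sizing its allocations) can treat the value as a cheap constant rather than a syscall.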
(In reply to comment #53)
> VMPI_getVMPageSize() should return whatever getpagesize() (or the os-specific
> equivalent) returns.

Can you change what I wrote to something better?  Thanks.
http://hg.mozilla.org/tracemonkey/rev/ff2465307763
http://hg.mozilla.org/tracemonkey/rev/3f8001aa203a
Whiteboard: Has patch, Symbian → Has patch, Symbian, fixed-in-nanojit, fixed-in-tracemonkey
QA Contact: vm → dschaffe
Is there a testcase we can use to regression test and mark this bug verified?  Could a selftest testcase be written?
Dan Schaffer: no test case; this is really a work item.
Marking as verified.
Status: RESOLVED → VERIFIED