Closed Bug 587727 Opened 9 years ago Closed 9 years ago

nanojit: Reduce paging traffic from calls to CodeAlloc.markExec()

Categories

(Tamarin Graveyard :: Virtual Machine, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rreitmai, Assigned: rreitmai)

References

Details

(Whiteboard: fixed-in-nanojit,fixed-in-tracemonkey,fixed-in-tamarin)

Attachments

(3 files, 3 obsolete files)

Attached patch fix (obsolete) — Splinter Review
When CodegenLIR is torn-down the LirHelper dtor() is invoked which calls LirHelper.cleanup().   

The CodegenLIR dtor contains an explicit call to LirHelper.cleanup() which results in the method getting invoked twice.
Attachment #466366 - Flags: review?(edwsmith)
Attachment #466366 - Attachment is patch: true
Attachment #466366 - Attachment mime type: application/octet-stream → text/plain
is this safe even along the verifyException path?
Comment on attachment 466366 [details] [diff] [review]
fix

double check me, but by my reading, the call to LirHelper::cleanup() is required when the CodeWriter pipeline is destroyed via CodeWriter::cleanup() when handling a verify exception.  Maybe we could use a flag to skip the extra call to CodeAlloc::markAllExec(), in the normal cleanup case.
Attachment #466366 - Flags: review?(edwsmith) → review-
Good catch!  

We could use a flag as you suggest, but maybe we should go a bit deeper and attack markAllExec().   

It's iterating over all heapblocks looking for ones marked -x which could be expensive.  E.g. ARM where bytesPerAlloc is a single 4KB page.

We could add another pointer to CodeList for tracking only those blocks that have been passed out via alloc() and then limit the transition to +x on only those blocks.   Although free() might complicate this approach.
Yep, it reads the isExec bit on each page, and only marks pages RX when we already know they are RW.  This will only be the case for pages touched during jitting of a single method.  So, we'll only make a few mprotect calls().

But, as a CodeAlloc gets larger, the # of pages touched in a single jit session remains flat, this gets more and more expensive (we're just reading from the page header, but we will read from each page header, of ever page under management).  If we keep a list pages marked W during assembly, then mark them back to X when we're done, this list should not be owned by CodeAlloc since its transient.
(In reply to comment #4)
> this list should not be owned by CodeAlloc since its transient.

Not sure I completely agree.  CodeAlloc (indirectly via CodeList) is already responsible for checking and setting the permission bits on the blocks.

I can see how we might want to allocate the data structure outside of the CodeList/CodeAlloc, but the manipulation and maintenance of this structure should surely be in CodeList/CodeAlloc.
I do agree with that.  At the end of compiling a function, we have a list of all the code blocks in the function.  (assm->codeList).  can we build a CodeAlloc that takes that list and markes the required pages as RX?  would it miss any pages?
(In reply to comment #6)
> can we build a CodeAlloc that ...
oops. insert              ^API
Attached patch v2 (obsolete) — Splinter Review
Consolidated patch including Tamarin and NJ changes - if +ve review I will break apart into 2 (maybe 3) patches prior to landing.

This patch does some deeper surgery which results in less work during the page bit transition from RW- to R-X, when heapblocks is large.  

endAssembly() now has a call to markExec() which uses the list of blocks comprising the codeList as the input for marking all the affected chunks.

A few other misc items:

(1) asserts to add/removeBlock() to enforce that they don't modify terminator blocks.
(2) Consolidated tear-down code for the assembler into a few helper functions
(3) Assembler codeList is now a private member and clients no longer need to explicitly manage the CodeAlloc object in the case of errors.  This logic is now contained within endAssembly() which calls cleanupAfterError().
(4) cleanupAfterError() was left as a public member to allow clients to free up the code resources even after the assembler completed without an error.  E.g. for debugging purposes.  The code blocks used will be returned to the free list and all code memory will be restored to R-X.

I have not yet investigated what these changes mean for TM, but will do shortly and ask NJN to review.
Assignee: nobody → rreitmai
Attachment #466366 - Attachment is obsolete: true
Attachment #471618 - Flags: review?(edwsmith)
Attachment #471618 - Attachment is patch: true
Attachment #471618 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 471618 [details] [diff] [review]
v2

R- only because I think the patches should be split, and this could induce enough churn to be worth a separate review.  I didn't find any bugs.  Since part of this patch is an optimization, it should be separate and it should have some kind of supporting data.

cleanupAfterError() needs a comment explaining what it does, and when it should be called by clients of Assembler, and when it's not necessary to call it.

Similar for clearNInsPtrs().  if it's intended to be called separately from reset(), then what are the conditions for calling it.  if the only callers are in Assembler, should it be private?

Looks like now, we use assm->codeList to only mark modified chunks RX, instead of sweeping through all chunks (cool), after Assembler succeeds.  I can't think of a reason why we would need markAllExec() for the error case, and markExec() for the success case.  Couldn't we use markExec() for both cases?  The two partially-full blocks (codeStart-codeEnd, and exitStart-exitEnd), could be marked RX explicitly, or added to codeList just like the success case, then the codeList is freed+RX-ified in one loop.

then the only reason to loop over all blocks would be in a validation function (assert they're all RX).
Attachment #471618 - Flags: review?(edwsmith) → review-
Attached patch v3 - cleanup patch (obsolete) — Splinter Review
re-work of patch, including simon since his background compiler work is probably impacted slightly.
Attachment #471618 - Attachment is obsolete: true
Attachment #480341 - Flags: review?(edwsmith)
Attachment #480341 - Flags: feedback?(siwilkin)
Attachment #480341 - Flags: superreview?(edwsmith)
Attachment #480341 - Flags: review?(wmaddox)
Attachment #480341 - Flags: review?(edwsmith)
Besides the optimization (i.e. no longer iterate over entire heapblock list), the tear-down sequence for assembler has somewhat changed.

LirWriter dtor() no longer triggers page permissions bits to switch over to R-X.  The ownership now resides with endAssembly(), the client no longer needs to call freeAll() in the case of errors.

Also of note, in the error case, the code falls back to calling markAllExec(), i.e. iterating through the entire code alloc heap, setting permissions R-X.
Attachment #480807 - Flags: review?(nnethercote)
Attachment #480807 - Flags: review?(edwsmith)
Blocks: 601724
Attachment #480341 - Flags: review?(nnethercote)
Summary: CodegenLIR cleanup() redundantly calls LirHelper::cleanup() → nanojit: Reduce paging traffic from calls to CodeAlloc.markExec()
Comment on attachment 480341 [details] [diff] [review]
v3 - cleanup patch

>+    // make an entire chunk executable
>+    void CodeAlloc::markChunkExec(CodeList* term) {
>+        NanoAssert(term->terminator == NULL);
>+        if (!term->isExec) {
>+            term->isExec = true;
>+            markCodeChunkExec(firstBlock(term), bytesPerAlloc);
>         }
>     }

Nit: Is 'hb' a better name than 'term'?

The patch seems fine, it's all non-functional changes AFAICT.
Attachment #480341 - Flags: review?(nnethercote) → review+
Comment on attachment 480807 [details] [diff] [review]
no longer iterate over all heapblocks

I don't understand this patch, despite talking to Rick about it on IRC today.  Rick, can you give a step-back-and-explain-from-the-start explanation?  I don't want to proceed with exec-marking changes without understanding them very carefully.  Thanks.
(In reply to comment #13)
> Comment on attachment 480807 [details] [diff] [review]
>
> ... without understanding them very carefully.  Thanks.

As it should be!

When Assembler calls CodeAlloc.alloc() , a contiguous region of memory is given to Assembler in the form of (start,end) addresses.  As a side-effect of this activity, codeAlloc calls out to makeCodeChunkWrite(), marking the entire chunk (64KB on osx and windows) read-writable (RW), even if only a portion of the chunk is handed over.

Assembler fills the memory backwards, starting from end address.  If more memory is required another call to alloc() is performed and a new start,end address pair is returned.  Rinse, repeat until endAssembly() is called.

endAssembly() calls codeAlloc.addRemainder() which places any unused memory back on the free list (aka availblocks).

Now we've got code written into one of more contiguous regions of memory and the 64KB chunks that the region(s) reside in, are marked RW.  We need to flip the permission bits of those chunks to RX.

In Tamarin, when the LirHelper dtor() fires, codeAlloc.markAllExec() is called, which then iterates over *all* the 64KB chunks the codeAlloc maintains and marks the pages RX for those chunks that are RW.

The 2nd patch does a couple of things, it moves the call which makes memory RX out of the LirHelper.dtor() and into endAssembly().  Which is just good general hygiene, since the management of the permission bits really doesn't belong in the nanojit client.

The *optimization* part comes in the form of not iterating over *all* chunks (markAllExec) but instead only looking at the chunks used by Assembler (markExec).

The TM code is not calling markAllExec() and so doesn't need to change at all, but with the addition of codeAlloc.markExec() in endAssembly() there could be a small hit to compile time.
Attachment #480341 - Flags: review?(wmaddox)
(In reply to comment #14)
> 
> The TM code is not calling markAllExec() and so doesn't need to change at all,
> but with the addition of codeAlloc.markExec() in endAssembly() there could be a
> small hit to compile time.

I was wondering how TM worked without calling markAllExec();  it seems we just allocate code chunks as RWX and markCodeChunk{Write,Exec} are no-ops.  Hmm.

Can someone give me a brief explanation of the difference between chunks and blocks, including how they relate to page sizes?
Comment on attachment 480807 [details] [diff] [review]
no longer iterate over all heapblocks

I measured SunSpider, The perf hit on TM is truly negligible.
Attachment #480807 - Flags: review?(nnethercote) → review+
(In reply to comment #15)
> Can someone give me a brief explanation of the difference between chunks and
> blocks, including how they relate to page sizes?

note to self: explain this better in CodeAlloc.h.

A chunk is a large page-aligned region of memory that is a multiple of the system page size, and is the unit of memory managed by the SPI methods with CodeChunk in their name.  ({alloc|free}CodeChunk(), markCodeChunk{Exec,Write}).  Chunks are always bytesPerAlloc in size, although the SPI doesn't require it.

Blocks are arbitrary sized regions of code, not necessarily page aligned.  A block can span a page boundary, but not a chunk boundary.  Single instances of CodeList track single blocks.  Blocks are linked together into lists; typically a single compiled piece of code (method in TR, fragment in TM) is one or more blocks.  CodeBlock might be a better name for class CodeList.

In TM, the minimum number of blocks per trace fragment is probably two, since mainline code and exit code are generated into separate blocks.
(In reply to comment #17)
> 
> note to self: explain this better in CodeAlloc.h.

Yes please! :)

> A chunk is a large page-aligned region of memory that is a multiple of the
> system page size, and is the unit of memory managed by the SPI methods with

What is SPI?

> Blocks are arbitrary sized regions of code, not necessarily page aligned.  A
> block can span a page boundary, but not a chunk boundary.

A block must fit within a chunk, presumably?

> Single instances of CodeList track single blocks.

Er, surely a CodeList tracks multiple blocks?
(In reply to comment #18)

> What is SPI?

Service Provider Interface

> > Blocks are arbitrary sized regions of code, not necessarily page aligned.  A
> > block can span a page boundary, but not a chunk boundary.
> 
> A block must fit within a chunk, presumably?

Correct.  The allocator does not coalesce chunks it gets from the host VM via chunkAlloc, so blocks necessarily fit inside chunks.

> Er, surely a CodeList tracks multiple blocks?

Each instance of a CodeList represents a single block.  It is a linked list
node, so conceptually it is a dual of the remaining list or a single node.

In hindsight I dont like this name much, I prefer to think of the "list"
as the pointer that points to the first block, e.g. Assembler::codeList,
and each node as a single block.  CodeBlock is more clear.
Comment on attachment 480341 [details] [diff] [review]
v3 - cleanup patch

Could you clarify this comment:

  /** protect an entire chunk */
  void markChunkExec(CodeList* term);

The comment in C++ code is clearer, but doesn't articulate the connection
between 'term' and 'chunk':

  // make an entire chunk executable
  void CodeAlloc::markChunkExec(CodeList* term) {

I think it means the chunk tracked by "term"?  All the other "chunk"
functions use the term CodeChunk, I think this one should too.  R+ with that fixed.
Attachment #480341 - Flags: superreview?(edwsmith) → superreview+
http://hg.mozilla.org/tracemonkey/rev/4e4ef0a6abe7
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey
Comment on attachment 483153 [details] [diff] [review]
Clarify block and chunk terminology and roles of CodeList and CodeAlloc


>+     * CodeList is a single block of blocks of code.  

"block of blocks"?

I still find this highly confusing.  Surely it's not that hard.  A block is
contiguous, right?  Is a CodeList instance just a wrapper around a block
that allows the block to be put into a linked list?  Is there an actual C++
type that corresponds to "block", or is that somehow what CodeList is?
Attachment #483153 - Flags: review?(nnethercote) → review-
(In reply to comment #24)
> Comment on attachment 483153 [details] [diff] [review]
> Clarify block and chunk terminology and roles of CodeList and CodeAlloc
> 
> 
> >+     * CodeList is a single block of blocks of code.  
> 
> "block of blocks"?

sigh, my bad. it should read:

  * CodeList is a single block of code.

> I still find this highly confusing.  Surely it's not that hard.  A block is
> contiguous, right?

Yeah, its not that hard.  our lines are getting crossed somehow.  I think you're just getting confused by the name CodeList -- CodeBlock would be clearer, but I didn't want to muddy things up by renaming classes in the middle of trying to explain what they are.  (but, i'm happy to rename, clarity is king).

> Is a CodeList instance just a wrapper around a block
> that allows the block to be put into a linked list?

Yep, its just the node type.  the actual code is next to the CodeList in memory:

    [CodeList fields][code]
    ^
    CodeList* pointers point here

> Is there an actual C++
> type that corresponds to "block", or is that somehow what CodeList is?

Thats what CodeList is.
There are a few more CodeAlloc patches (all awaiting review) besides the ones above, that I'd like to land prior to any mass change.

I'm happy to do the rename once we've cleared these out.  Also in the interest of reducing the amount of rebasing needed, I'd like to manage the landing of Ed' comment patch, once its sorted out.
Attachment #480341 - Flags: feedback?(siwilkin)
(In reply to comment #25)
> 
>     [CodeList fields][code]
>     ^
>     CodeList* pointers point here
> 
> > Is there an actual C++
> > type that corresponds to "block", or is that somehow what CodeList is?
> 
> Thats what CodeList is.

Right, I think that's what I've been missing.  "CodeBlock" is much better (how about "Block"?)(In reply to comment #26)

> There are a few more CodeAlloc patches (all awaiting review) besides the ones
> above, that I'd like to land prior to any mass change.
> 
> I'm happy to do the rename once we've cleared these out.  Also in the interest
> of reducing the amount of rebasing needed, I'd like to manage the landing of
> Ed' comment patch, once its sorted out.

Ok, perhaps that should be a separate bug?
Sounds fine with me -- Rick, you can just take this over and land the improved comments and rename when it is convenient.
Will do.
Comment on attachment 480807 [details] [diff] [review]
no longer iterate over all heapblocks

The patch is hard to review because it was generated (apparently) with hg export instead of hg diff.  in BZ, filenames don't show up in the side by side diff.

Could you post before/after performance numbers?  This bug is an attempt to improve performance, so the difference should be measureable.
Attachment #480807 - Flags: review?(edwsmith) → review+
http://hg.mozilla.org/mozilla-central/rev/4e4ef0a6abe7
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
re-opening ... some of the patches have not landed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #30)
> Comment on attachment 480807 [details] [diff] [review]
> no longer iterate over all heapblocks
> 
> The patch is hard to review because it was generated (apparently) with hg
> export instead of hg diff.  in BZ, filenames don't show up in the side by side
> diff.

<brokenrecord>
You don't need to switch from hg-export to hg-diff.  You just need to turn on git-style diff formatting if you're going to use hg-export.

More discussion at:
https://zerowing.corp.adobe.com/display/FlashPlayer/asteam+mercurial+best+practices

and also at:
https://developer.mozilla.org/en/Installing_Mercurial
https://developer.mozilla.org/en/Mercurial_FAQ
(though I don't think this particular Bugzilla issue is mentioned there; its just implicit in the fact that the Installing_Mercurial page mandates the use of git-style diff formatting.
</brokenrecord>
(In reply to comment #33)
> (In reply to comment #30)
> You don't need to switch from hg-export to hg-diff.  You just need to turn on
> git-style diff formatting if you're going to use hg-export.

Right, this is the only patch that has this issue; after Nick/Ed pointed it out I switched settings.
Attachment #480341 - Attachment is obsolete: true
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tarmain
Blocks: 609517
pushed comment changes http://hg.mozilla.org/projects/nanojit-central/rev/3b0fb708c5c9

comment 11 discusses more extension changes which lead to the opening of bug 609517
rreitmai http://hg.mozilla.org/tamarin-redux/rev/aaf6b36c00b1
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tarmain → fixed-in-nanojit,fixed-in-tracemonkey,fixed-in-tamarin
http://hg.mozilla.org/mozilla-central/rev/0e57e014bc91
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.