Closed Bug 1004831 Opened 10 years ago Closed 10 years ago

Implement native to bytecode mapping for Ion

Categories

(Core :: JavaScript Engine: JIT, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla34

People

(Reporter: djvj, Assigned: djvj)

References

Details

Attachments

(6 files, 32 obsolete files)

67.62 KB, patch
djvj
: review+
Details | Diff | Splinter Review
60.47 KB, patch
djvj
: review+
Details | Diff | Splinter Review
9.91 KB, patch
djvj
: review+
Details | Diff | Splinter Review
27.82 KB, patch
djvj
: review+
Details | Diff | Splinter Review
46.21 KB, patch
djvj
: review+
Details | Diff | Splinter Review
8.41 KB, patch
h4writer
: review+
Details | Diff | Splinter Review
We need tables that map native code addresses within jitcode to the sequence of (JSScript *, jsbytecode *) pairs active at that site in the frame, so that we can eventually remove the pseudostack usage by jitcode.

Some foundation for this work went in as the fix for bug 994957: the addition of an InlineScriptTree and BytecodeSite to track the stack of inlined scripts and their call locations at a given IR instruction.

The next step is to accumulate these into a table during compile time, and then encode them into a compactly represented map.
This patch adds PcOffset MIR and LIR instructions that track the BytecodeSite of each bytecode op that gets compiled by Ion.  During code generation, these instructions don't emit any code, but they add an entry to a vector in CodeGeneratorShared that records the correspondence between native code and bytecodes.
Attached patch add-compact-map.patch (obsolete) — Splinter Review
This adds the compact representation (JitcodeMap.h and JitcodeMap.cpp), and instruments the codegenerator to generate those maps.

There are some remaining issues with figuring out proper BytecodeSites for OOL code.
Assignee: nobody → kvijayan
Attached patch add-compact-map.patch (obsolete) — Splinter Review
Added a simple debug-only verifier of the generated data (just reading out the compact data and ensuring that it is sane), which revealed a lot of little bugs, which were fixed.

Now with verifier, jit-tests pass.  Still need to handle mappings for OOL code properly.
Attachment #8416241 - Attachment is obsolete: true
Attached patch fix-ool-site-tracking.patch (obsolete) — Splinter Review
This patch fixes up generation of OOL code (not ICs yet, just OOL code emitted in the mainline jitcode) so that we track it properly by the bytecode that generated it.  Most of this is just ensuring that MIR instructions get forwarded correctly to the place where we do addOutOfLineInstruction, where we pull out the bytecodeSite for the instruction and attach it to the OOL structure.

Later on, when emitting OOL code, we add table entries in the nativeToBytecodeMap before emitting each individual OOL site.
Attached patch add-global-table.patch (obsolete) — Splinter Review
More work on the native => pc mapping.  This patch adds a global table containing a sorted array of JitcodeGlobalEntry values.
Attached patch add-lookup-methods.patch (obsolete) — Splinter Review
Fixed up the binary search methods on global table.  Added lookup methods to allow for full lookup across the global table, the region table, and delta-runs.
Modifies table generation so that it happens if either profiling or DEBUG is enabled.

Changes CodeGenerator::link to register the native => bytecode maps for each piece of generated Ion jitcode into the JitcodeGlobalTable.

Currently this patch causes a lot of test failures, because we never de-register these entries and it hits an assert at the end of the script that checks that the jitcode global table is empty on destruction.

Disabling that assert causes tests to run fine.
Blocks: 104110
Blocks: 1004110
No longer blocks: 104110
Some updates on this:

I've hooked in the native=>pc mapping code into InlineFrameIterator for verification purposes only.  This revealed a couple of issues that need to be addressed.

1. Bytecode sites returned from InlineFrameIterator do not have to correspond to the specific op that the call is returning from.  Instead, if they are idempotent calls, the bytecode PC returned by InlineFrameIterator will correspond to the PC of the resume point for the instruction, not the instruction itself.

2. I think code-motion is moving some calls into different bytecode regions, causing the verifier to complain.  I'm not sure that MPCOffset is actually necessary.  Most MIR instructions implicitly carry a bytecode site, and almost all LIR instructions carry their corresponding MIR.  These can be used directly to record native => bytecode region sites, instead of relying on MPcOffset which may actually yield an incorrect reflection of the native => bytecode correspondence.
Well, I think this is finally working.  I'm leaving MPcOffset in, but its entries are augmented with entries for individual MIR operations.  The generated table is de-duplicated and normalized before compaction, so this is not an issue - just a bit more compile-time memory usage.  We can remove MPcOffset later on if it's found to be truly unnecessary.

Rebasing to tip and posting fresh patches.  I think this is ready for review, but probably after merging the patches.
This patch adds the PcOffset IR instructions, adds a vector of BytecodeSites to the code generator, and instruments code generation to record the places where bytecodes generated native code.
Attachment #8416239 - Attachment is obsolete: true
Attached patch 2-add-compact-map.patch (obsolete) — Splinter Review
This builds on patch 1.  We use the recorded vector of BytecodeSites to generate a compact representation of the mapping.  Some basic verification code double-checks the generated tables.
Attachment #8416738 - Attachment is obsolete: true
Attached patch 3-fix-ool-site-tracking.patch (obsolete) — Splinter Review
This patch fixes up our code-generation for OOLs so that all OOL code generated within Ion has a proper BytecodeSite associated with it, and that this is registered with the vector of BytecodeSites recorded in the CodeGenerator.
Attachment #8418354 - Attachment is obsolete: true
Attached patch 4-add-global-table.patch (obsolete) — Splinter Review
Adds a global table (sorted array, binary searched) of JitcodeGlobalEntry values, hanging off of JitRuntime, to allow for a global lookup of any return address.
Attachment #8430960 - Attachment is obsolete: true
Attached patch 5-add-lookup-methods.patch (obsolete) — Splinter Review
Adds lookup methods to enable retrieval of bytecode sites from native return addresses.
Attachment #8430961 - Attachment is obsolete: true
Attached patch 6-register-native-mappings.patch (obsolete) — Splinter Review
Modifies jitcode generation and destruction to update the global table of native=>bytecode maps.
Attachment #8431797 - Attachment is obsolete: true
Adds some verification code to double-check the registrations above, by instrumenting JitFrameIterator to cross-check the native=>bytecode mappings against the InlineFrameIterator report for the same frame.

Some fudging is necessary here because the InlineFrameIterator reports resume points, not exact bytecodes associated with the native code.
Comment on attachment 8441672 [details] [diff] [review]
1-track-pc-offsets-during-compile.patch

I would suggest taking a look at CodeGenerator-shared.{h,cpp} first.  It adds a vector of native to bytecode site mappings to the code generator object.

There's a new opcode (MPcOffset/LPcOffset) added, which doesn't generate any code, but adds entries to the vectors.  Codegen also adds entries for every IR instruction that has a valid BytecodeSite, which should add more specificity for operations that move via GVN/LICM.

Some close attention to the logic in "addNativeToBytecodeEntry" in CodeGeneratorShared would be useful.  It tries to keep the vector of entries normalized, incrementally at each call.
Attachment #8441672 - Flags: review?(jdemooij)
Comment on attachment 8441674 [details] [diff] [review]
2-add-compact-map.patch

Review of attachment 8441674 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/JitcodeMap.h
@@ +332,5 @@
> + *            respectively.
> + *
> + *      List<NativeAndBytecodeDelta> deltaRun;
> + *          - The rest of the entry is a deltaRun that stores a series of variable-length
> + *            encoded NativeAndBytecodeDelta datums.

NOTE: Add a diagram of what the variable-length run sections look like.
Comment on attachment 8441674 [details] [diff] [review]
2-add-compact-map.patch

This is the packed mapping code.  This just contains the generation of the packed map.  Lookup happens in a later patch.

Starting with js/src/jit/JitcodeMap.{h,cpp} is probably best.

We use a vector of Native=>Bytecode maps in the CodeGenerator, to write out a compact map.  I've made a note for one place I feel needs more documentation, but let me know if you feel there are places where more docs and descriptions would help understandability.
Attachment #8441674 - Flags: review?(luke)
Comment on attachment 8441675 [details] [diff] [review]
3-fix-ool-site-tracking.patch

Review of attachment 8441675 [details] [diff] [review]:
-----------------------------------------------------------------

Most of this patch is just making sure that the appropriate MIR instruction gets associated with all the OOL generators that are run after the main body codegen.
Attachment #8441675 - Flags: review?(jdemooij)
Comment on attachment 8441675 [details] [diff] [review]
3-fix-ool-site-tracking.patch

Review of attachment 8441675 [details] [diff] [review]:
-----------------------------------------------------------------

Switching review to Hannes instead to get some spread on the review.
Attachment #8441675 - Flags: review?(jdemooij) → review?(hv1989)
Comment on attachment 8441677 [details] [diff] [review]
4-add-global-table.patch

Review of attachment 8441677 [details] [diff] [review]:
-----------------------------------------------------------------

The main addition here is JitcodeGlobalTable, which is the table of all JitcodeGlobalEntry entries for all mapped native code.  There's also some fixups to the definition of SizedScriptList.
Attachment #8441677 - Flags: review?(luke)
Comment on attachment 8441672 [details] [diff] [review]
1-track-pc-offsets-during-compile.patch

Review of attachment 8441672 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay. Looks good, r=me with comments below addressed.

::: js/src/jit/IonBuilder.cpp
@@ +1280,5 @@
>          }
>  #endif
>  
> +        if (instrumentedProfiling())
> +            current->add(MPcOffset::New(alloc()));

As discussed yesterday, I think this could result in too much compilation time/memory overhead for huge scripts. Also because many (most?) bytecode ops don't result in any MIR/codegen. It'd be nice if we could remove MPcOffset, or else only do this for "interesting" ops (we could check if the previous MIR instruction was also a MPcOffset and update it in that case, or have a whitelist/blacklist of ops).

::: js/src/jit/shared/CodeGenerator-shared.cpp
@@ +88,5 @@
>      JSScript *topScript = sps_.getPushed();
>      for (size_t i = 0; i < outOfLineCode_.length(); i++) {
> +        // KVKV: FIXME - this is adding bytecode sites with null tree/pc.
> +        //if (!addNativeToBytecodeEntry(outOfLineCode_[i]->bytecodeSite()))
> +        //    return false;

Fix the comment/commented out code.

@@ +125,5 @@
> +
> +    } else if (current) {
> +        // Otherwise, if a current LIR instruction is active, use its MIR to get
> +        // the site.
> +        code->setBytecodeSite(current->mir()->trackedSite());

Nit: s/instruction/block to make the comment match the code.

@@ +163,5 @@
> +        // If the new entry is for the same inlineScriptTree and same bytecodeOffset, but the
> +        // nativeOffset has changed, do nothing.  The same site just generated some more code.
> +        if (lastEntry.tree == tree && lastEntry.pc == pc) {
> +            IonSpew(IonSpew_Profiling, " => In-place update [%d-%d]",
> +                    int(lastEntry.nativeOffset.offset()), int(nativeOffset));

Nit: if you use %u you can drop the int() casts.

@@ +168,5 @@
> +            return true;
> +        }
> +
> +        // If the new entry is for the same native offset, then update the previous entry with the
> +        // new bytecode site, since the previous bytecode site did not generate any native code.

Nit: comments should fit within 80 columns

::: js/src/jit/shared/CodeGenerator-shared.h
@@ +568,1 @@
>      }

Pre-existing nit: these methods can be const.
Attachment #8441672 - Flags: review?(jdemooij) → review+
Comment on attachment 8441675 [details] [diff] [review]
3-fix-ool-site-tracking.patch

Review of attachment 8441675 [details] [diff] [review]:
-----------------------------------------------------------------

That was one boring patch :P.
(f? is for the ASM question)

::: js/src/jit/arm/CodeGenerator-arm.cpp
@@ +226,2 @@
>      OutOfLineBailout *ool = new(alloc()) OutOfLineBailout(snapshot, masm.framePushed());
> +    if (!addOutOfLineCode(ool, BytecodeSite(tree, tree->script()->code())))

Can you add the same comment like in x86 about the bailouts and what we track?

::: js/src/jit/shared/CodeGenerator-shared.cpp
@@ +95,5 @@
>  {
>      JSScript *topScript = sps_.getPushed();
>      for (size_t i = 0; i < outOfLineCode_.length(); i++) {
> +        // Add native => bytecode mapping entries for OOL sites.
> +        // Not enabled on ASM yet since asm doesn't contain bytecode mappings.

@Luke: Do we use the term ASM in code/comments nowadays? Or do we still try to keep that to a minimum?
Attachment #8441675 - Flags: review?(hv1989)
Attachment #8441675 - Flags: review+
Attachment #8441675 - Flags: feedback?(luke)
(In reply to Hannes Verschore [:h4writer] from comment #24)
> @Luke: Do we use the term ASM in code/comments nowadays? Or do we still try
> to keep that to a minimum?

Yes, we do, except it's spelled "asm.js".  It's only "Odin" that I keep out of the codebase.
Attachment #8441675 - Flags: feedback?(luke)
I've been thinking about the global table implementation.  One aspect of it that has really been troubling me is the sorted-vector store with binary search lookup.

The log(n) lookup is fine. The bigger issue is one that Nicolas brought up: the linear-time insert/delete.  We'll need to insert into the table for every new piece of main-line ion jitcode, every new piece of baseline jitcode, and every new piece of ion IC stubcode.. and to delete when each of these are destroyed, and |n| here scales with the sum of ion, ion ic, and baseline jitcodes.

The "obvious" next solution is a balanced tree of some sort, but then the code complexity is high.  Also, all of these are still log(N) lookup time.

I had this thought today: We can use a hash-table which operates on fixed-size address ranges, and just insert multiple hashtable entries for jitcode larger than the fixed address range size.

e.g. if we assume 4K for the size of the address range, then for jitcode located at memory range [A-B], there would be an entry in the hashtable for every 4K-aligned memory page that the range intersected.  Lookup would just align the query address to 4k and do a regular hashtable lookup.

The current js hashtable structures can be used; lookup is constant time on average, and insertion and deletion are linear in the size of the jitcode being inserted.  It's also hard to execute collision attacks (there's a small, fixed limit to the number of independent jitcodes that can all intersect a single page, even if they're packed end to end).

I think it's probably worth it to scrap the sorted-array implementation and do the multiplexed hashtable implementation.

Thoughts, luke, nicolas?
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(luke)
I'm not sure if the hashtable strategy is simpler than using a balanced binary tree.  For one thing, multiple pieces of code can be in the same 4K region, so the value of the map will need to be a bucket, which adds some code complexity in all operations.  Another problem is that an asm.js module's code range can be like 20mb which would be 5,000 entries in the hash table.

Note, we already have a generic splay tree in mfbt (Brian added it for the backtracking allocator), so it should be pretty easy to just plug in.
Flags: needinfo?(luke)
One clarification about the multi-hashing: we wouldn't need buckets per entry for multiple jitcodes per entry.  Their hashes would collide, but we'd just store them open-addressed.  So:

  Hash(queryPtr) = HashInt(uintptr_t(queryPtr) & ~uintptr_t(0xFFF))

and

  Equal(queryPtr, entry) = entry.value.contains(queryPtr)

would be sufficient.


I hadn't considered the large asm.js modules.  Trees are indeed far better for those.  However, I still dislike the overhead of trees for the non-asm.js common case: there will be many small entries in the range of 32-1024 bytes each, and trees work less well for those.

That said, if there's a sitting splay tree implementation.. that's a pretty strong argument for just going with trees and see if it works (which it probably should).  Thanks for the info.. I think I'll go the splay tree route.
(In reply to Kannan Vijayan [:djvj] from comment #28)
> just store them open-addressed.  So:
>   Hash(queryPtr) = HashInt(uintptr_t(queryPtr) & ~uintptr_t(0xFFF))
>   Equal(queryPtr, entry) = entry.value.contains(queryPtr)
> would be sufficient.

Ooh, neat, that really leverages the Entry/Lookup type split.

> However, I still dislike the overhead of trees for the
> non-asm.js common case: there will be many small entries in the range of
> 32-1024 bytes each, and trees work less well for those.

I thought (1) baseline stubs were per-Runtime, so a small finite number, (2) Ion stubs were dynamic, but so much less code gets Ion-compiled that this would still be <100,000.  Also, while a binary tree has two words of overhead per entry, a HashMap has 1 (entryHash) AND a HashMap is roughly ~50% over-capacity, so it might even use more space.  If each Splay entry was malloc'd, there'd be additional malloc overhead, but mfbt/SplayTree leaves allocation up to the user, so you could pool allocate.
From what I understand you want to solve the insertion/removal issue.  I do not think hashtables or splay trees have an API for adding/removing a bunch of values at once, and we want to iterate as little as possible.

It seems that the real abstraction you are looking for is memory ranges instead.  The 4K blocks are just a simple way to compare pointers, but ranges can be compared as easily with underflow/overflow.

  x is in [a, b] range is  size_t(x - a) <= size_t(b - a)

The way I think of it is a 3-way mapping: one for contiguous memory blocks, one for contiguous {non-,}GC-able ranges within a contiguous buffer, and a last one for mapping ranges to PCs.  The reason is simple: every time we allocate new pages, you want to insert data in the first mapping.  Each time you allocate a script's code or an IC, you want to register it in the second level.  And each time we generate code, we want to map it inside the third mapping.  This leads us to the following complexity (splay trees on 3 levels):

  log(ExecAllocEntries) + log((ICEntries + FunctionEntries) or AsmEntry) + log(CodeSize / ??)

As luke mentioned, asm.js might have really large blocks (hashtable for asm?), but I guess we can avoid walking the first 2 levels by caching the range lookup for asm, as we might be likely to remain in asm function calls.  Note: in the case of asm, the second level might just be one entry.

The advantage of such a model is that we reduce the size of the structure where we might have to remove immutable chunks of data.  The 3rd level can be made constant, while the 2nd level indexes logical blocks.  Adding/removing becomes a matter of changing logical blocks.
Flags: needinfo?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #30)
> The advantage of such model is that we reduce the size of the structure
> where we might have to remove immutable chunks of data.  The 3rd level can
> be made constant, while the 2sd level index logical blocks.  Adding/Removing
> becomes a matter of changing logical blocks.

Also, as we are likely removing all ICs at once, it might be faster to just create a new mapping of the non-GC logical blocks (Baseline / Ion code) than to remove the GC'd ones.
(In reply to Jan de Mooij [:jandem] from comment #23)
> As discussed yesterday, I think this could result in too much compilation
> time/memory overhead for huge scripts. Also because many (most?) bytecode
> ops don't result in any MIR/codegen. It'd be nice if we could remove
> MPcOffset, or else only do this for "interesting" ops (we could check if the
> previous MIR instruction was also a MPcOffset and update it in that case, or
> have a whitelist/blacklist of ops).

I just tested this by disabling the generation of MPcOffset MIR instructions, and the verification tests pass just fine.  I'll leave the instruction code around and just disable the generation of them for now, and remove it all once we know for sure that we won't be needing it.

> ::: js/src/jit/shared/CodeGenerator-shared.cpp
> @@ +88,5 @@
> >      JSScript *topScript = sps_.getPushed();
> >      for (size_t i = 0; i < outOfLineCode_.length(); i++) {
> > +        // KVKV: FIXME - this is adding bytecode sites with null tree/pc.
> > +        //if (!addNativeToBytecodeEntry(outOfLineCode_[i]->bytecodeSite()))
> > +        //    return false;
> 
> Fix the comment/commented out code.

This is actually fixed in a later patch (OOL site tracking fixes) that will be merged with this patch before pushing.


Other comments addressed.
Ok, I fixed up the review comments, changed the global table to use a splay tree instead of a sorted array, and merged some related patches to arrive at a 4-patch changeset.
This patch merges the previous track-pc-offsets and fix-ool-site-tracking patches into one.  Forwarding r+ from jandem and hannes' reviews on those individual patches.
Attachment #8441672 - Attachment is obsolete: true
Attachment #8441675 - Attachment is obsolete: true
Attachment #8450237 - Flags: review+
Attached patch 2-add-compact-map.patch (obsolete) — Splinter Review
This patch merges the previous 'add-compact-map' and 'add-global-table' patches.  The only thing not included in here is the lookup routines for extracting a bytecode site from the maps.  I thought that logic would be better to review in isolation.

The add-global-table patch has also been changed to use a SplayTree instead of a sorted array.
Attachment #8441674 - Attachment is obsolete: true
Attachment #8441677 - Attachment is obsolete: true
Attachment #8441674 - Flags: review?(luke)
Attachment #8441677 - Flags: review?(luke)
Attachment #8450239 - Flags: review?(luke)
Attached patch 3-add-lookup-methods.patch (obsolete) — Splinter Review
Updated add-lookup-methods patch.
Attachment #8441679 - Attachment is obsolete: true
Attached patch 4-register-native-mappings.patch (obsolete) — Splinter Review
I rolled the register-native-mappings and the verify-native-mappings-in-jit-frame-iterator patches into one.
Attachment #8441680 - Attachment is obsolete: true
Attachment #8441682 - Attachment is obsolete: true
This adds support for tracking baseline jitcode using the native=>bytecode table.

Originally, I had planned on making baseline jitcode use the same compact delta-run encoding that Ion uses.. except that the Baseline jitcode would reflect a trivial case of the general format (single script, no inlining).

While implementing this, I noticed/remembered that we already keep a sorted table of native/PC mappings on BaselineScript which we use to look up return addresses.. so we don't need the compact mapping and can just save that space entirely.
New patch that adds mapping entries for both Baseline and IonCache stubs.  It also moves de-registration of entries into the destroy handler for JitCodes.  JitCodes now keep track of whether or not they have a corresponding map entry (via a bit flag that eats into a bit of padding space), and de-register themselves on destruction.

The entries for both Baseline and IonCache stub jitcode are trivial.  The BaselineScript already stores a sorted (pc, nativeAddress) array in its pcMapping - and both increase monotonically, since baseline's codegen guarantees that the sequence of bytecode ops maps directly to the sequence of native code regions generated.

For IonCache ICs, the op associated with each stub will be precisely the op associated with the rejoin address of the IC in the mainline jitcode.  So table entries for IonCache stubs just store the rejoin address, and they forward all of their queries to the rejoin address.  This effectively means that every query for an address in an IonCache stub will do two splay tree lookups, but it avoids any additional memory overhead.

This patch also extends the debug-only verification done by IonFrameIterator to apply to baseline frames.

Passes jit-tests.  Gonna push to try soon and check.
Attachment #8458846 - Attachment is obsolete: true
Comment on attachment 8450239 [details] [diff] [review]
2-add-compact-map.patch

Review of attachment 8450239 [details] [diff] [review]:
-----------------------------------------------------------------

Overall, this all looks reasonable, but as I got into it I realized you should probably have a core Ion hacker review these patches instead since they'll be the ones who deal with this code in practice; Odin would only trivially integrate at the JitcodeGlobalEntry level.  Also, I'm unfamiliar with some of the Ion compilation things this patch touches.

I did have a few high-level nits/suggestions as I was scanning over the patch, though:

::: js/src/jit/CompactBuffer.h
@@ +153,5 @@
> +        // Must be at 4-byte boundary
> +        JS_ASSERT(length() % sizeof(uint32_t) == 0);
> +
> +        for (unsigned i = 0; i < 4; i++)
> +            writeByte(0);

Can you just writeFixedUint32(value) after the assert?

::: js/src/jit/JitcodeMap.h
@@ +19,5 @@
> + * The Ion jitcode map implements tables to allow mapping from addresses in ion jitcode
> + * to the list of (JSScript *, jsbytecode *) pairs that are implicitly active in the frame at
> + * that point in the native code.
> + *
> + * To represent this informaiton efficiently, a multi-level table is used.

information

@@ +22,5 @@
> + *
> + * To represent this informaiton efficiently, a multi-level table is used.
> + *
> + * At the top level, a global table with fixed-size entries stores an array of
> + * entries describing the mapping for each individual IonCode scirpt generated by

script

@@ +31,5 @@
> + *      +------------------------------------------------+
> + *      | Entry 0                                        |
> + *      |                                                |
> + *      |   void *        nativeStartAddr                |
> + *      |   void *        nativeEndAddr                  |

I'm not sure this half-page diagram is really necessary, since it basically just illustrates an array.  "a sorted array of address ranges that allows binary lookup" basically captures it all.

@@ +70,5 @@
> + *      |   ScriptList *  scriptList                     |
> + *      |   RegionTable * regionTable                    |
> + *      +------------------------------------------------+
> + *
> + * The following entry structure is used for trampolines for Ion and Baseline:

I'm not sure this part of the comment is necessary; it basically restates the struct definitions and thus will only become out of date over time.  Rather, it makes sense to talk about the qualitative differences.

::: js/src/jit/shared/CodeGenerator-shared.cpp
@@ +508,5 @@
>      }
>  }
>  
> +void
> +CodeGeneratorShared::fixupNativeToBytecodeTableOffsets()

Given only one caller and a relatively simple body, it seems like you could inline this into the caller.

::: js/src/jit/shared/CodeGenerator-shared.h
@@ +111,5 @@
> +    uint32_t nativeToBytecodeTableOffset_;
> +    uint32_t nativeToBytecodeNumRegions_;
> +
> +    JSScript **nativeToBytecodeScriptList_;
> +    uint32_t nativeToBytecodeScriptListLength_;

Rather than sticking this in CodeGeneratorShared, perhaps you could create a NativeToBytecodeMapBuilder that held this state and provided the methods below, and then embed an instance of the builder in CodeGenerator-shared?  That would allow all the logic pertaining to this map to be in JitcodeMap.h.
Attachment #8450239 - Flags: review?(luke)
(In reply to Luke Wagner [:luke] from comment #40)
> Overall, this all looks reasonable, but as I got into it I realized you
> should probably have a core Ion hacker review these patches instead since
> they'll be the ones who deal with this code in practice; Odin would only
> trivially integrate at the JitcodeGlobalEntry level.  Also, I'm unfamiliar
> with some of the Ion compilation things this patch touches.

Sure, I'll point them to the just the ion-interfacing portions of this patch.

> Can you just writeFixedUint32(value) after the assert?

Yeah, I can replace the loop with a writeFixedUint32, but the secondary negative-index write is necessary, since writeFixedUint32 only writes in little-endian.


> I'm not sure this half-page diagram is really necessary, since it basically
> just illustrates an array.  "a sorted array of address ranges that allows
> binary lookup" basically captures it all.

Agreed.  The massive top-level comment has been removed.  It wasn't even correct after I changed the top-level table to use a splay tree.

> Given only one caller and a relatively simple body, it seems like you could
> inline this into the caller.

Done.

> Rather than sticking this in CodeGeneratorShared, perhaps you could create a
> NativeToBytecodeMapBuilder that hild this state and provided the methods
> below and then embed an instance of the buidler in CodeGenerator-shared? 
> That would allow all the logic pertaining to this map to be in JitcodeMap.h.

I feel less comfortable with this.  The code here is very Ion-specific.  Only Ion has nontrivial tables with varying inlining depths across multiple JSScripts.  I think this logic belongs in the CodeGen file, but it may be worthwhile to factor it out into a separate object.
Attachment #8450239 - Flags: review?(hv1989)
Attachment #8450237 - Attachment is obsolete: true
Attachment #8464919 - Flags: review+
Attached patch 2-add-compact-map.patch (obsolete) — Splinter Review
Attachment #8450239 - Attachment is obsolete: true
Attachment #8450239 - Flags: review?(hv1989)
Attachment #8464922 - Flags: review?(hv1989)
Attached patch 3-add-lookup-methods.patch (obsolete) — Splinter Review
Attachment #8450240 - Attachment is obsolete: true
Attached patch 4-register-native-mappings.patch (obsolete) — Splinter Review
Attachment #8450241 - Attachment is obsolete: true
Attachment #8459185 - Attachment is obsolete: true
(In reply to Kannan Vijayan [:djvj] from comment #41)
> I feel less comfortable with this.  The code here is very Ion-specific. 
> Only Ion has nontrivial tables with varying inlining depths across multiple
> JSScripts.  I think this logic belongs in the CodeGen file, but it may be
> worthwhile to factor it out into a separate object.

Well, the point wasn't so much to reuse it for anything, just to encapsulate the process of building and generating this map.  But maybe it's overkill; up to you.
Comment on attachment 8464922 [details] [diff] [review]
2-add-compact-map.patch

Review of attachment 8464922 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. I have three performance questions, but they don't block the r+. I'm merely pinpointing possible improvements if the profiling seems to be too slow...

(Do I also need to review the other patches? A bit confused since you didn't set the r? flag.)

::: js/src/jit/JitcodeMap.cpp
@@ +370,5 @@
> +            uint32_t scriptIdx = 0;
> +            for (; scriptIdx < scriptListSize; scriptIdx++) {
> +                if (scriptList[scriptIdx] == curTree->script())
> +                    break;
> +            }

This also seems very inefficient. Having to find the index for multiple scripts, multiple times...
Can't we annotate the tree with the id already in createNativeToBytecodeScriptList, when creating that list?

::: js/src/jit/JitcodeMap.h
@@ +407,5 @@
> + * holds a sequence of variable-sized runs.  The table in the tail section serves to
> + * locate the variable-length encoded structures in the head section.
> + *
> + * The entryOffsets in the table indicate the bytes offset to subtract from the regionTable
> + * pointer to arrive at the encoded region in the payload.

I'm a bit thrown off with this comment and the picture.
In the picture the Region 1 entryOffset is 0. Which means "regionTable pointer - 0" = "regionTable pointer". Which sounds wrong to me. Also the fact that in the picture the first Region payload is the topmost region, I assume the 0 is totally wrong and should be size(payload) right?

@@ +423,5 @@
> + * pairs for the run.
> + *
> + *      VarUint32 nativeOffset;
> + *          - The offset from nativeStartAddr in the global table entry at which this
> + *            the jitcode for this region starts.

fix comment

@@ +440,5 @@
> + */
> +class JitcodeRegionEntry
> +{
> +  private:
> +    static const unsigned MAX_RUN_LENGTH = 100;

So IIUC if we have an address and want to know the stack/pc, we can binary search the regions.
After which we need to run over all entries in the delta list in order (since they are Variable-length encoded).
This value decides how big they can be, right? Since we want profiling to take as little time as possible: is this low enough? Do we have time to iterate 100 of such values for every profile snapshot?
Or do we just save the address and only compute the stack/pc when profiling is done?
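To make the cost the reviewer is asking about concrete, here is a toy model (not the actual SpiderMonkey encoding) of resolving a native offset within one delta run: the lookup sums (nativeDelta, pcDelta) pairs until the target offset is passed, so the worst case per lookup is one linear scan of MAX_RUN_LENGTH entries after the binary search.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy delta-run model: each run holds up to MAX_RUN_LENGTH delta pairs.
// Resolving a native offset walks the run, accumulating deltas until the
// next step would pass the target.
struct Delta { uint32_t nativeDelta; int32_t pcDelta; };

uint32_t FindPcOffset(const std::vector<Delta> &run,
                      uint32_t startNative, uint32_t startPc,
                      uint32_t targetNative)
{
    uint32_t curNative = startNative;
    uint32_t curPc = startPc;
    for (const Delta &d : run) {
        // Stop before the entry whose native range begins past the target.
        if (curNative + d.nativeDelta > targetNative)
            break;
        curNative += d.nativeDelta;
        curPc = uint32_t(int32_t(curPc) + d.pcDelta);
    }
    return curPc;
}
```

With a 100-entry cap this is a small, cache-friendly scan, which is why deferring it until after sampling (as discussed below in the thread) makes the per-sample cost a non-issue.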

::: js/src/jit/shared/CodeGenerator-shared.cpp
@@ +524,5 @@
> +        }
> +        if (!found) {
> +            if (!scriptList.append(tree->script()))
> +                return false;
> +        }

Oh this is a bit unfortunate. If we had a boolean free on JSScript, we could make this O(n) instead of the current O(n^2).

@@ +647,5 @@
> +    JS_ASSERT(mainTable->regionOffset(0) == nativeToBytecodeTableOffset_);
> +
> +    // Verify each region.
> +    for (uint32_t i = 0; i < mainTable->numRegions(); i++) {
> +        // back-offset must point into the payload region preceding the table, not before it.

Start sentence with a capital letter.

@@ +651,5 @@
> +        // back-offset must point into the payload region preceding the table, not before it.
> +        JS_ASSERT(mainTable->regionOffset(i) <= nativeToBytecodeTableOffset_);
> +
> +        // back-offset must point to a later area in the payload region than previous
> +        // back-offset.  This means that back-offsets decrease monotonically.

Start sentence with a capital letter.
Attachment #8464922 - Flags: review?(hv1989) → review+
Attachment #8464923 - Flags: review?(hv1989)
Attachment #8464925 - Flags: review?(hv1989)
Comment on attachment 8464926 [details] [diff] [review]
5-add-baseline-and-ioncache-entries.patch

Review of attachment 8464926 [details] [diff] [review]:
-----------------------------------------------------------------

Just a heads up: the entries for Baseline jitcode and IonCache stubs are almost trivial.

Baseline already maintains a sorted array of bytecode/native mappings used by bailouts to compute the return address into baseline.  So baseline just needs a top-level table entry, and the second-level table is already present on the BaselineScript itself.

IonCache stubs map to the same inline stack that corresponds to their return point into the mainline jitcode.  So this too needs only a top-level entry that identifies the rejoin address for the IC.  The lookup for IonCache stubs is a double look up: look up the top-level entry for the IonCache, get the rejoinAddr, and then look up the rejoinAddr.
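The double lookup described above can be sketched with simplified stand-in types (this is illustrative only, not the actual JitcodeGlobalTable API): the IonCache stub's entry records just the rejoin address back into mainline jitcode, so resolving a stub address is a redirect followed by an ordinary range lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical two-level table: main entries map [start, end) jitcode ranges
// to script/pc info; IonCache entries map stub ranges to a rejoin address.
struct MainEntry { uint32_t scriptId; uint32_t pcOffset; };

struct ToyTable {
    std::map<uintptr_t, std::pair<uintptr_t, MainEntry>> main_;      // start -> (end, entry)
    std::map<uintptr_t, std::pair<uintptr_t, uintptr_t>> ionCache_;  // start -> (end, rejoinAddr)

    bool lookupMain(uintptr_t addr, MainEntry *out) const {
        auto it = main_.upper_bound(addr);
        if (it == main_.begin())
            return false;
        --it;
        if (addr >= it->second.first)
            return false;
        *out = it->second.second;
        return true;
    }

    bool lookup(uintptr_t addr, MainEntry *out) const {
        // Step 1: if the address falls in an IonCache stub, redirect to the
        // rejoin address recorded for that stub.
        auto it = ionCache_.upper_bound(addr);
        if (it != ionCache_.begin()) {
            auto prev = it;
            --prev;
            if (addr < prev->second.first)
                return lookupMain(prev->second.second, out);
        }
        // Step 2: otherwise treat it as mainline jitcode.
        return lookupMain(addr, out);
    }
};
```

A stub address thus resolves to exactly the same inline stack as its rejoin point, without duplicating any region tables for the stub itself.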
Attachment #8464926 - Flags: review?(hv1989)
(In reply to Hannes Verschore [:h4writer] from comment #48)
> ::: js/src/jit/JitcodeMap.cpp
> @@ +370,5 @@
> > +            uint32_t scriptIdx = 0;
> > +            for (; scriptIdx < scriptListSize; scriptIdx++) {
> > +                if (scriptList[scriptIdx] == curTree->script())
> > +                    break;
> > +            }
> 
> This also seems very inefficient. Having to find the index for multiple
> scripts, multiple times...
> Can't we annotate the tree with the id already in
> createNativeToBytecodeScriptList, when creating that list?

Good eye.  Yeah, this approach has high time complexity, but I'm hoping it's not too bad.  Annotating the tree is more tricky than it seems: the tree could contain multiple occurrences of the same JSScript, inlined at different points, and we want to uniqify all of those.

An alternative I had considered was this: instead of storing a scriptList in the top-level table, we store an encoded version of the entire tree at the top-level table.  Then, instead of each Run storing a sequence of (scriptIndex, pcOffset) entries, they'll store a single offset into the encoded script tree to identify the entire inlining stack at that point.

That approach should reduce memory usage too, since the number of runs is at least equal to the size of the inlineScriptTree, so duplicating the stack info at each run is less efficient than encoding the entire tree and sharing it between all runs.
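The tree-sharing alternative can be sketched as follows (illustrative only, not a shipped format): flatten the inline script tree once per compilation, with each node recording its script and the index of its caller node; a run then stores a single node index, and the full inline stack is recovered by chasing parent links. Note this naturally handles the same script being inlined at multiple points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Flattened inline script tree: each node names a script and its caller node.
struct TreeNode {
    uint32_t scriptId;
    int32_t callerIndex;  // -1 for the outermost script
};

// Recover the caller-first inline stack for the node a run points at.
std::vector<uint32_t> StackAtNode(const std::vector<TreeNode> &tree, int32_t index)
{
    std::vector<uint32_t> stack;
    for (int32_t i = index; i >= 0; i = tree[i].callerIndex)
        stack.push_back(tree[i].scriptId);
    // The walk produced innermost-first; reverse to get caller-first order.
    std::reverse(stack.begin(), stack.end());
    return stack;
}
```

Every run sharing a suffix of its inline stack shares the corresponding tree nodes, which is where the memory saving comes from.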

> I'm a bit thrown off with this comment and the picture.
> In the picture the Region 1 entryOffset is 0. Which means "regionTable
> pointer - 0" = "regionTable pointer". Which sounds wrong to me. Also the
> fact that in the picture the first Region payload is the topmost region, I
> assume the 0 is totally wrong and should be size(payload) right?

Yes, fixed.

> @@ +423,5 @@
> > + * pairs for the run.
> > + *
> > + *      VarUint32 nativeOffset;
> > + *          - The offset from nativeStartAddr in the global table entry at which this
> > + *            the jitcode for this region starts.
> 
> fix comment

Not sure what needs fixing here.  The nativeOffset does store the positive offset that the run starts at.

> So IIUC if we have an address and want to know the stack/pc, we can binary
> search the regions.
> After which we need to run over all entries in the delta list in order
> (since they are Variable-length encoded).
> This value decides how big they can be, right. Since we want profiling to
> take as less time as possible. Is this low enough? Do we have time to
> iterate 100 of such values for every profile snapshot?
> Or do we just save the address and only compute the stack/pc when profiling
> is done?

Yeah, the latter.  The sampler will just capture return addresses.  When it is done capturing and about to send the traces to the profiler frontend, it will tree-ify the traces, and do a single translation before sending stuff over.

This means that a large number of lookups will be made redundant and eliminated, since a large number of traces will share most of their prefix with other traces.
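The prefix-sharing effect can be shown with a small sketch (names are illustrative; Translate() stands in for the real table lookup): memoizing translations by return address means each distinct address is resolved once, no matter how many captured traces contain it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Sketch of translate-after-capture: the sampler stores raw return addresses;
// at send time, a memo table ensures each distinct address does the expensive
// table lookup only once, even though traces share long prefixes.
struct Translator {
    std::unordered_map<uintptr_t, std::string> memo;
    uint32_t rawLookups = 0;

    std::string Translate(uintptr_t addr) {
        auto it = memo.find(addr);
        if (it != memo.end())
            return it->second;
        rawLookups++;  // the expensive table lookup would happen here
        std::string loc = "addr:" + std::to_string(addr);
        memo.emplace(addr, loc);
        return loc;
    }
};
```

For three traces of depth three sharing a two-frame prefix, nine sampled frames need only four raw lookups.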

> 
> ::: js/src/jit/shared/CodeGenerator-shared.cpp
> @@ +524,5 @@
> > +        }
> > +        if (!found) {
> > +            if (!scriptList.append(tree->script()))
> > +                return false;
> > +        }
> 
> Oh this is a bit unfortunate. If we had a boolean free on JSScript, we could
> make this O(n) instead of the current O(n^2).

This too, can be fixed by the approach above: encoding the entire inlineScriptTree directly instead of assembling a unique list of scripts.

I actually think that's the better approach, but I didn't think of it until after I had implemented this.  However, this works, and if it ends up being too slow, changing it to the better scheme is reasonably straightforward.
(In reply to Kannan Vijayan [:djvj] from comment #50)
> (In reply to Hannes Verschore [:h4writer] from comment #48)
> > ::: js/src/jit/JitcodeMap.cpp
> > @@ +370,5 @@
> > > +            uint32_t scriptIdx = 0;
> > > +            for (; scriptIdx < scriptListSize; scriptIdx++) {
> > > +                if (scriptList[scriptIdx] == curTree->script())
> > > +                    break;
> > > +            }
> > 
> > This also seems very inefficient. Having to find the index for multiple
> > scripts, multiple times...
> > Can't we annotate the tree with the id already in
> > createNativeToBytecodeScriptList, when creating that list?
> 
> Good eye.  Yeah, this approach has high time complexity, but I'm hoping it's
> not too bad.  Annotating the tree is more tricky than it seems: the tree
> could contain multiple occurrences of the same JSScript, inlined at
> different points, and we want to uniqify all of those.
> 
> An alternative I had considered was this: instead of storing a scriptList in
> the top-level table, we store an encoded version of the entire tree at the
> top-level table.  Then, instead of each Run storing a sequence of
> (scriptIndex, pcOffset) entries, they'll store a single offset into the
> encoded script tree to identify the entire inlining stack at that point.
> 
> That approach should reduce memory usage too, since the number of runs is at
> least equal to the size of the inlineScriptTree.. so duplicating the stack
> info at each run is less efficient than encoding the entire tree and sharing
> it between all runs.

Can you open a follow-up bug for this. To measure this and see if we need to do this?
Is indeed not high priority for the "first version".

> 
> > @@ +423,5 @@
> > > + * pairs for the run.
> > > + *
> > > + *      VarUint32 nativeOffset;
> > > + *          - The offset from nativeStartAddr in the global table entry at which this
> > > + *            the jitcode for this region starts.
> > 
> > fix comment
> 
> Not sure what needs fixing here.  The nativeOffset does store the positive
> offset that the run starts at.

Sorry I should have been a bit more clear. There is a grammatical error in there, right? "this the"
Comment on attachment 8464922 [details] [diff] [review]
2-add-compact-map.patch

Review of attachment 8464922 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/JitcodeMap.h
@@ +36,5 @@
> +{
> +  public:
> +    enum Kind {
> +        INVALID = 0,
> +        Main,

Only noticing this now. When doing the next review, I noticed the Main/Kind thing is not that clear.
Can we change Kind to JitCodeKind? and kind_ to jitCodeKind_?

This will contain:
Main,
OOL,
Trampoline,
... 
right?
Comment on attachment 8464922 [details] [diff] [review]
2-add-compact-map.patch

Review of attachment 8464922 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/JitcodeMap.h
@@ +428,5 @@
> + *
> + *      Uint8_t scriptDepth;
> + *          - The depth of inlined scripts for this region.
> + *
> + *      List<2 * scriptDepth, VarUint32> inlineScriptPcStack;

s/List<2 * scriptDepth, VarUint32>/List<VarUint32>

I forgot to mention: the 2*scriptDepth in the List description really confused me. It is just fine to have it in the description.
Comment on attachment 8464923 [details] [diff] [review]
3-add-lookup-methods.patch

Review of attachment 8464923 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/JitcodeMap.cpp
@@ +9,5 @@
>  #include "js/Vector.h"
>  
>  namespace js {
>  namespace jit {
> + 

Nit: remove extra newline/space

@@ +14,5 @@
> +
> +bool
> +JitcodeGlobalEntry::MainEntry::lookupInlineCallSite(void *ptr, BytecodeLocationVector &results,
> +                                                    uint32_t *depth) const
> +{

What about something like: callStackAtPtr
And some explanation describing that this returns the script/pc and inlined script/pc of a specific ptr.

@@ +36,5 @@
> +        // from the delta-run encodings.
> +        if (first) {
> +            pcOffset = region.findPcOffset(ptrOffset, pcOffset);
> +            first = false;
> +        }

This seems really annoying. Can't we abstract ScriptPcIterator to do this, so code iterating the script/pc pairs doesn't need to worry about it?

@@ +486,5 @@
>      }
>  
>      deltaRun_ = reader.currentPosition();
>  }
> + 

Nit: remove whitespace
Attachment #8464923 - Flags: review?(hv1989)
Attached patch 6-fix-debugmode-osr-issues.patch (obsolete) — Splinter Review
There were some crashes in jit-tests caused by BaselineDebugOSR stuff, where the JitScript held by a BaselineScript is switched out from underneath it.  The JitFrameIterator is used in this code, and the verification method was doing lookups in the middle of this critical section.

Also, in some corner cases, we may not find an entry for a given native address.  The verification code is changed to fallibly do a lookup and return prematurely (with success) if the lookup fails.
Comment on attachment 8464925 [details] [diff] [review]
4-register-native-mappings.patch

Review of attachment 8464925 [details] [diff] [review]:
-----------------------------------------------------------------

All "enableNativeToBytecodeMap" should be isNativeToBytecodeMapEnabled or nativeToBytecodeMapEnabled. See comment later-on.

::: js/src/jit/CodeGenerator.cpp
@@ +6772,5 @@
>          return false;
>      }
>  
>      // Encode native to bytecode map if profiling is enabled.
> +    if (cx->runtime()->jitRuntime()->enableNativeToBytecodeMap(cx->runtime())) {

Shouldn't this be?
if (isNativeToBytecodeMapEnabled()).

So using the check that looks at MIRGenerator to know if it was enabled/disabled.

@@ +6796,5 @@
> +            entry.destroy();
> +            return false;
> +        }
> +    }
> +        

Nit: remove whitespace

@@ +6802,5 @@
>      ionScript->setSkipArgCheckEntryOffset(getSkipArgCheckEntryOffset());
>  
>      // If SPS is enabled, mark IonScript as having been instrumented with SPS
>      if (sps_.enabled())
>          ionScript->setHasSPSInstrumentation();

This will get removed at the end of everything, right?

::: js/src/jit/IonFrames.cpp
@@ +2236,5 @@
>  
> +#ifdef DEBUG
> +bool
> +JitFrameIterator::verifyReturnAddressUsingNativeToBytecodeMap()
> +{

Good to have these checks, thanks!

@@ +2266,5 @@
> +    uint32_t depth = UINT32_MAX;
> +    if (!entry.lookupInlineCallSite(returnAddressToFp_, location, &depth))
> +        return false;
> +    JS_ASSERT(depth != UINT32_MAX);
> +    JS_ASSERT(!location.empty());

&& location.size() == depth

::: js/src/jit/JitCompartment.h
@@ +394,5 @@
> +        JS_ASSERT(hasJitcodeGlobalTable());
> +        return jitcodeGlobalTable_;
> +    }
> +
> +    bool enableNativeToBytecodeMap(JSRuntime *rt) {

isNativeToBytecodeMapEnabled() or nativeToBytecodeMapEnabled()

The current name would suggest this function enables it, instead of checks if it is enabled.

::: js/src/jit/MIRGenerator.h
@@ +86,5 @@
> +    bool enableNativeToBytecodeMap() {
> +        if (compilingAsmJS())
> +            return false;
> +#ifdef DEBUG
> +        return true;

Is it the intention to have this always on on debug?
Attachment #8464925 - Flags: review?(hv1989)
Comment on attachment 8464926 [details] [diff] [review]
5-add-baseline-and-ioncache-entries.patch

Review of attachment 8464926 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/BaselineCompiler.cpp
@@ +247,5 @@
>  
> +    // Register a native=>bytecode mapping entry for this script if needed.
> +    if (cx->runtime()->jitRuntime()->enableNativeToBytecodeMap(cx->runtime())) {
> +        IonSpew(IonSpew_Profiling, "Added JitcodeGlobalEntry for baseline script %s:%d (%p)",
> +                    script->filename(), script->lineno(), baselineScript);

Shouldn't this be at the bottom? (Nit-picking, we haven't added it yet. We can still fail. "return Method_Error". So we should only print this message at the bottom of this conditional. Where we definitely added the info.)

@@ +255,5 @@
> +        JitcodeGlobalTable *globalTable = cx->runtime()->jitRuntime()->getJitcodeGlobalTable();
> +        if (!globalTable->addEntry(entry))
> +            return Method_Error;
> +
> +        // Mark the jitcode as having a bytecode map.

s/as having/to have/

But feel free to prove me wrong. English is not my first language.

@@ +258,5 @@
> +
> +        // Mark the jitcode as having a bytecode map.
> +        code->setHasBytecodeMap();
> +    }
> + 

Nit: remove whitespace.

::: js/src/jit/BaselineJIT.cpp
@@ +691,5 @@
>  
>  jsbytecode *
>  BaselineScript::pcForReturnOffset(JSScript *script, uint32_t nativeOffset)
>  {
> +    return pcForNativeOffset(script, nativeOffset, true);

add "/* isReturn = */ true"

@@ +706,5 @@
> +
> +jsbytecode *
> +BaselineScript::pcForNativeOffset(JSScript *script, uint32_t nativeOffset)
> +{
> +    return pcForNativeOffset(script, nativeOffset, false);

add "/* isReturn = */ false"

@@ +710,5 @@
> +    return pcForNativeOffset(script, nativeOffset, false);
> +}
> +
> +jsbytecode *
> +BaselineScript::pcForNativeOffset(JSScript *script, uint32_t nativeOffset, bool isReturn)

Since we have both nativeOffset and curNativeOffset, what about renaming "nativeOffset"?
Something like:

searchingOffset / subjectOffset / soughtOffset / ...

possibly there is a better one that I can't think of.

@@ +739,5 @@
> +    JS_ASSERT_IF(isReturn, nativeOffset >= curNativeOffset);
> +
> +    // In the raw native-lookup case, the native code address can occur
> +    // before the start of ops.  Associate those with bytecode offset 0.
> +    if (!isReturn && (curNativeOffset > nativeOffset))

Just a preference, but can you switch "curNativeOffset" and "nativeOffset" to:
nativeOffset < curNativeOffset

@@ +749,5 @@
>          uint8_t b = reader.readByte();
>          if (b & 0x80)
>              curNativeOffset += reader.readUnsigned();
>  
> +        // Return addresses associate with 

Remove trailing whitespace and complete comment.

::: js/src/jit/CodeGenerator.cpp
@@ +6802,2 @@
>      }
>          

Remove pre-existing whitespace.

::: js/src/jit/IonCaches.cpp
@@ +429,5 @@
>  #endif
>  
>      attachStub(masm, attacher, code);
>  
> +    // Add entry to native=>bytecode mapping for this stub if needed.

Nit: add spaces around =>

::: js/src/jit/IonCode.h
@@ +48,5 @@
>      uint8_t kind_ : 3;                // JSC::CodeKind, for the memory reporters.
>      bool invalidated_ : 1;            // Whether the code object has been invalidated.
>                                        // This is necessary to prevent GC tracing.
> +    bool hasBytecodeMap_ : 1;         // Whether the code object has been registered with
> +                                      // native=>bytecode mapping tables.

Nit: add spaces around =>

::: js/src/jit/IonFrames.cpp
@@ +2239,5 @@
>  JitFrameIterator::verifyReturnAddressUsingNativeToBytecodeMap()
>  {
>      JS_ASSERT(returnAddressToFp_ != nullptr);
>  
>      // Only handle Ion frames for now.

Update comment.
Attachment #8464926 - Flags: review?(hv1989)
I didn't r+ yet, to give myself another opportunity to look at them, now knowing/understanding all the patches as a whole. I should be able to review them faster next time you put them up for r?.
Nice work!
(In reply to Hannes Verschore [:h4writer] from comment #52)
> Comment on attachment 8464922 [details] [diff] [review]
> 2-add-compact-map.patch
> 
> Review of attachment 8464922 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/JitcodeMap.h
> @@ +36,5 @@
> > +{
> > +  public:
> > +    enum Kind {
> > +        INVALID = 0,
> > +        Main,
> 
> Only noticing this now. When doing the next review, I noticed the Main/Kind
> thing is not that clear.
> Can we change Kind to JitCodeKind? and kind_ to jitCodeKind_?
> 
> This will contain:
> Main,
> OOL,
> Trampoline,
> ... 
> right?

I would prefer to keep this.  JitcodeMap::JitcodeKind seems rather redundant to me.  Also, it's not really the jitcode kind, it's the entry kind.  Yes, the entries all correspond to different kinds of jitcode, but really the kind is describing the entries, not jitcode.  For example, the Query kind introduced later doesn't have any jitcode associated with it, just an address.
(In reply to Kannan Vijayan [:djvj] from comment #59)
> I would prefer to keep this.  JitcodeMap::JitcodeKind seems rather redundant
> to me.  Also, it's not really the jitcode kind, it's the entry kind.  Yes,
> the entries all correspond to different kinds of jitcode, but really the
> kind is describing the entries, not jitcode.  For example, the Query kind
> introduced later doesn't have any jitcode associated with it, just an
> address.

djvj explained on irc that Main will eventually get split into Ion and Baseline. So this goes away. So no need to adjust in this patch yet.
(In reply to Hannes Verschore [:h4writer] from comment #56)
> This will get removed at the end of everything, right?

That's the hope and intent.

> Is it the intention to have this always on on debug?

Yes.
Attachment #8464919 - Attachment is obsolete: true
Attachment #8468650 - Flags: review+
Attached patch 2-add-compact-map.patch (obsolete) — Splinter Review
Attachment #8464922 - Attachment is obsolete: true
Attachment #8468651 - Flags: review?(hv1989)
Attached patch 3-add-lookup-methods.patch (obsolete) — Splinter Review
Attachment #8464923 - Attachment is obsolete: true
Attachment #8468652 - Flags: review?(hv1989)
Attached patch 4-register-native-mappings.patch (obsolete) — Splinter Review
Attachment #8464925 - Attachment is obsolete: true
Attachment #8468653 - Flags: review?(hv1989)
Attachment #8464926 - Attachment is obsolete: true
Attachment #8468655 - Flags: review?(hv1989)
Attachment #8466405 - Attachment is obsolete: true
Attachment #8468657 - Flags: review?(hv1989)
Comment on attachment 8468651 [details] [diff] [review]
2-add-compact-map.patch

Review of attachment 8468651 [details] [diff] [review]:
-----------------------------------------------------------------

Some early issues seen testing on the ARM. Shall continue testing.

::: js/src/jit/shared/CodeGenerator-shared.cpp
@@ +579,5 @@
> +    for (unsigned i = 0; i < nativeToBytecodeList_.length(); i++) {
> +        NativeToBytecode &entry = nativeToBytecodeList_[i];
> +
> +        // Fixup code offsets.
> +        entry.nativeOffset = CodeOffsetLabel(masm.actualOffset(entry.nativeOffset.offset()));

Looks good. This does appear to be the appropriate use of actualOffset().

@@ +657,5 @@
> +
> +        JitcodeRegionEntry entry = mainTable->regionEntry(i);
> +
> +        // Ensure native code offset for region falls within jitcode.
> +        JS_ASSERT(entry.nativeOffset() < code->instructionsSize());

There appears to be an off-by-one issue here. Changing the test to <= works around the issue. See below too.

@@ +695,5 @@
> +            curNativeOffset += nativeDelta;
> +            curPcOffset = uint32_t(int32_t(curPcOffset) + pcDelta);
> +
> +            // Ensure that nativeOffset still falls within jitcode after delta.
> +            JS_ASSERT(curNativeOffset < code->instructionsSize());

There is an off-by-one error here. The last curNativeOffset can equal the instructionsSize() and this is causing this test to fail. This appears to just be an issue with the test, not the curNativeOffset, and changing this test to <= works around this.
Comment on attachment 8468651 [details] [diff] [review]
2-add-compact-map.patch

Review of attachment 8468651 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/JitcodeMap.cpp
@@ +41,5 @@
> +}
> +
> +/* static */ int
> +JitcodeGlobalEntry::compare(const JitcodeGlobalEntry &ent1, const JitcodeGlobalEntry &ent2)
> +{

What about restructuring this to:

> if (ent1.isQuery() || ent2.isQuery()) {
>     // Both parts of compare cannot be a query.
>     JS_ASSERT(!ent1.isQuery() || !ent2.isQuery());
> 
>     JitcodeGlobalEntry *query = ent1.isQuery() ? &ent1 : &ent2;
>     JitcodeGlobalEntry *other = ent1.isQuery() ? &ent2 : &ent1;
> 
>     bool result = ComparePointers(query->queryEntry().addr(), other->nativeStartAddr());
>     return ent1.isQuery() ? result : -result;
> }
> JS_ASSERT(...);
> return ComparePointers(ent1.nativeStartAddr(), ent2.nativeStartAddr());

1) to avoid duplicate code
2) ComparePointers already returns -1/1/0. No need for extra checks.
3) shorter

@@ +77,5 @@
> +
> +bool
> +JitcodeGlobalTable::lookup(void *ptr, JitcodeGlobalEntry *result)
> +{
> +    JS_ASSERT(result != nullptr);

Nit: JS_ASSERT(result);

@@ +88,5 @@
> +void
> +JitcodeGlobalTable::lookupInfallible(void *ptr, JitcodeGlobalEntry *result)
> +{
> +    bool success = lookup(ptr, result);
> +    JS_ASSERT(success);

Nit: I think in this case we use:
DebugOnly<bool> success = ...

@@ +184,5 @@
> +        return;
> +    }
> +
> +    // Should never get here.
> +    JS_ASSERT(false);

MOZ_CRASH("reason");

Is currently the way to do this.

::: js/src/jit/JitcodeMap.h
@@ +56,5 @@
> +        }
> +
> +        void init(Kind kind, void *nativeStartAddr, void *nativeEndAddr) {
> +            JS_ASSERT(nativeStartAddr != nullptr);
> +            JS_ASSERT(nativeEndAddr != nullptr);

JS_ASSERT(nativeStartAddr);
JS_ASSERT(nativeEndAddr);

@@ +111,5 @@
> +                  JSScript *script, JitcodeMainTable *regionTable)
> +        {
> +            JS_ASSERT((uintptr_t(script) & LowMask) == 0);
> +            JS_ASSERT(script != nullptr);
> +            JS_ASSERT(regionTable != nullptr);

JS_ASSERT(scripts);
JS_ASSERT(regionTable);

@@ +124,5 @@
> +            JS_ASSERT((uintptr_t(scripts) & LowMask) == 0);
> +            JS_ASSERT(numScripts >= 1);
> +            JS_ASSERT(numScripts <= 6);
> +            JS_ASSERT(scripts != nullptr);
> +            JS_ASSERT(regionTable != nullptr);

JS_ASSERT(scripts);
JS_ASSERT(regionTable);

@@ +136,5 @@
> +        {
> +            JS_ASSERT((uintptr_t(scripts) & LowMask) == 0);
> +            JS_ASSERT(scripts->size > 6);
> +            JS_ASSERT(scripts != nullptr);
> +            JS_ASSERT(regionTable != nullptr);

JS_ASSERT(scripts);
JS_ASSERT(regionTable);
Attachment #8468651 - Flags: review?(hv1989) → review+
Comment on attachment 8468652 [details] [diff] [review]
3-add-lookup-methods.patch

Review of attachment 8468652 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/JitcodeMap.cpp
@@ +535,5 @@
> +        // If nothing found, assume it falls within last region.
> +        return regions - 1;
> +    }
> +
> +    // For larger ones, binary search the region table.

I think this might be easier to comprehend if you used the following code.
There is one variable less, and for me this is closer to the example binary searches we had in school. (Feel free to disagree.)

> uint32_t start_i = 0
> uint32_t end_i = regions
> 
> while(start_i != end_i) {
>    uint32_t mid = (start_i + end_i) / 2
> 
>    if ()
>       start_i = mid + 1
>    else
>       end_i = mid
> }
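Completing that sketch under the assumption that the elided condition tests whether the mid region starts at or before the sought offset (the loop then finds the first region starting past the target, and the answer is the one before it):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Two-pointer binary search over sorted region start offsets, returning the
// index of the last region whose start is <= target. Assumes regionStarts is
// non-empty and the first region starts at or before any queried offset.
uint32_t FindRegion(const std::vector<uint32_t> &regionStarts, uint32_t target)
{
    uint32_t start_i = 0;
    uint32_t end_i = uint32_t(regionStarts.size());
    while (start_i != end_i) {
        uint32_t mid = (start_i + end_i) / 2;
        if (regionStarts[mid] <= target)
            start_i = mid + 1;
        else
            end_i = mid;
    }
    // start_i is now the first region starting after target.
    return start_i == 0 ? 0 : start_i - 1;
}
```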
Attachment #8468652 - Flags: review?(hv1989) → review+
Comment on attachment 8468651 [details] [diff] [review]
2-add-compact-map.patch

Review of attachment 8468651 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/JitcodeMap.cpp
@@ +41,5 @@
> +}
> +
> +/* static */ int
> +JitcodeGlobalEntry::compare(const JitcodeGlobalEntry &ent1, const JitcodeGlobalEntry &ent2)
> +{

Oh I just notice I oversimplified this a bit.

> bool result = ComparePointers(query->queryEntry().addr(), other->nativeStartAddr());
> return ent1.isQuery() ? result : -result;

should be:

> if (ComparePointers(query->queryEntry().addr(), other->nativeStartAddr()) < 0)
>     return ent1.isQuery() ? -1 : 1;
>  
> if (ComparePointers(query->queryEntry().addr(), other->nativeEndAddr()) > 0)
>     return ent1.isQuery() ? 1 : -1;
> 
> return 0;
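Putting the corrected suggestion together as a self-contained comparator, with simplified stand-in types for JitcodeGlobalEntry: a query entry (a bare address) compares equal to any real entry whose [start, end) range contains it, so a binary search over the sorted table lands on the covering entry.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for JitcodeGlobalEntry: a query carries only an address
// in `start`; a real entry carries a [start, end) jitcode range.
struct Entry {
    bool isQuery;
    uintptr_t start;
    uintptr_t end;
};

int Compare(const Entry &e1, const Entry &e2)
{
    if (e1.isQuery || e2.isQuery) {
        // Both sides of a compare cannot be queries.
        assert(!e1.isQuery || !e2.isQuery);
        const Entry &query = e1.isQuery ? e1 : e2;
        const Entry &other = e1.isQuery ? e2 : e1;

        if (query.start < other.start)
            return e1.isQuery ? -1 : 1;
        if (query.start >= other.end)
            return e1.isQuery ? 1 : -1;
        return 0;  // The queried address falls inside the entry's range.
    }
    // Two real entries: order by start address.
    if (e1.start < e2.start)
        return -1;
    return e1.start > e2.start ? 1 : 0;
}
```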
Comment on attachment 8468653 [details] [diff] [review]
4-register-native-mappings.patch

Review of attachment 8468653 [details] [diff] [review]:
-----------------------------------------------------------------

One major thing: when failing, we need to clean everything up during linking, which doesn't happen atm!
Otherwise good. So r+ with that addressed.

::: js/src/jit/CodeGenerator.cpp
@@ +6778,2 @@
>          if (!generateCompactNativeToBytecodeMap(cx, code))
>              return false;

Just dawned on me. But we cannot just return false here. You'll need to free ionScript and invalidate the recompileInfo. Like happens a few lines earlier.

BTW this issue is also present in the current code:
http://dxr.mozilla.org/mozilla-central/source/js/src/jit/CodeGenerator.cpp#6738
Can you open a new bug for this?

@@ +6785,5 @@
> +        JitcodeGlobalEntry::MainEntry entry;
> +        if (!mainTable->makeMainEntry(cx, code, nativeToBytecodeScriptListLength_,
> +                                      nativeToBytecodeScriptList_, entry))
> +        {
> +            return false;

ditto

@@ +6793,5 @@
> +        JitcodeGlobalTable *globalTable = cx->runtime()->jitRuntime()->getJitcodeGlobalTable();
> +        if (!globalTable->addEntry(entry)) {
> +            // Memory may have been allocated for the entry.
> +            entry.destroy();
> +            return false;

ditto

@@ +6796,5 @@
> +            entry.destroy();
> +            return false;
> +        }
> +    }
> +        

Nit: remove whitespace.

::: js/src/jit/Ion.cpp
@@ +1181,5 @@
>  void
>  IonScript::Destroy(FreeOp *fop, IonScript *script)
>  {
> +    /* By this point, the jitocde in |script| cannot be on stack.  Remove its entry from
> +     * the native => bytecode mapping table, if needed. */

Nit: all comments here use:
// something
// even more

Can you do the same just for consistency?

::: js/src/jit/MIRGenerator.h
@@ +79,5 @@
>          if (!instrumentedProfilingIsCached_) {
>              instrumentedProfiling_ = GetIonContext()->runtime->spsProfiler().enabled();
>              instrumentedProfilingIsCached_ = true;
>          }
>          return instrumentedProfiling_;

Nice!
Attachment #8468653 - Flags: review?(hv1989) → review+
Comment on attachment 8468655 [details] [diff] [review]
5-add-baseline-and-ioncache-entries.patch

Review of attachment 8468655 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/BaselineCompiler.cpp
@@ +244,5 @@
>  
>      if (script->compartment()->debugMode())
>          baselineScript->setDebugMode();
>  
> +    // Register a native=>bytecode mapping entry for this script if needed.

Nit: spaces before and after =>

@@ +258,5 @@
> +
> +        // Mark the jitcode as having a bytecode map.
> +        code->setHasBytecodeMap();
> +    }
> + 

Nit: remove whitespace.

::: js/src/jit/BaselineJIT.cpp
@@ +749,5 @@
>          uint8_t b = reader.readByte();
>          if (b & 0x80)
>              curNativeOffset += reader.readUnsigned();
>  
> +        // Return addresses associate with 

Nit: remove whitespace

::: js/src/jit/IonCaches.cpp
@@ +429,5 @@
>  #endif
>  
>      attachStub(masm, attacher, code);
>  
> +    // Add entry to native=>bytecode mapping for this stub if needed.

Nit: add whitespace before and after =>

::: js/src/jit/IonFrames.cpp
@@ +2245,1 @@
>          return true;

Nit: remove the unneeded () around the "a != b"
Attachment #8468655 - Flags: review?(hv1989) → review+
Comment on attachment 8468657 [details] [diff] [review]
6-fix-debugmode-osr-recompile.patch

Review of attachment 8468657 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/BaselineDebugModeOSR.cpp
@@ -12,2 @@
>  #include "jit/PerfSpewer.h"
> -

This newline needs to stay if you don't want "make check-style" errors

::: js/src/vm/Runtime.h
@@ +1019,5 @@
> +    }
> +    void enableProfilerSampling() {
> +        suppressProfilerSampling = false;
> +    }
> +    

Nit: Remove whiteline + newline

::: js/src/vm/SPSProfiler.h
@@ +223,5 @@
> +/*
> + * This class is used to suppress profiler sampling during
> + * critical sections where stack state is not valid.
> + */
> +class AutoSuppressProfilerSampling

You'll need to add the MOZ hints to make sure the compiler can't optimize it away and that we get errors when the class is used without a name:
Like with this AutoXXX class:
http://dxr.mozilla.org/mozilla-central/source/js/src/vm/TraceLogging.h?from=vm/TraceLogging.h#651
Attachment #8468657 - Flags: review?(hv1989)
Forwarding review from previous patch.
Attachment #8468650 - Attachment is obsolete: true
Attachment #8472349 - Flags: review+
Forwarding review from previous patch.
Attachment #8468651 - Attachment is obsolete: true
Attachment #8472350 - Flags: review+
Forwarding review from previous patch.
Attachment #8468652 - Attachment is obsolete: true
Attachment #8472351 - Flags: review+
Forwarding review from previous patch.
Attachment #8468653 - Attachment is obsolete: true
Attachment #8472352 - Flags: review+
Forwarding review from previous patch.
Attachment #8468655 - Attachment is obsolete: true
Attachment #8472353 - Flags: review+
Updated and review comments addressed.  The try run with all patches applied looks good: https://tbpl.mozilla.org/?tree=Try&rev=c176c2a4f3f6

So close I can taste it ;)
Attachment #8468657 - Attachment is obsolete: true
Attachment #8472355 - Flags: review?(hv1989)
Comment on attachment 8472355 [details] [diff] [review]
6-fix-debugmode-osr-recompile.patch

Review of attachment 8472355 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome work!
Attachment #8472355 - Flags: review?(hv1989) → review+
I'm seeing a ton of PcOffset instructions in the shell in debug builds, even without profiling.

Can we stop enabling this always in debug builds? It could hide bugs because there's a big difference between the MIR in debug and opt builds now. It'd be good to have similar MIR graphs in debug + opt builds.

Also it seems the following concern wasn't addressed (having 4 MPcOffset instructions without anything between them is very easy to avoid):

(In reply to Jan de Mooij [:jandem] from comment #23)
> As discussed yesterday, I think this could result in too much compilation
> time/memory overhead for huge scripts. Also because many (most?) bytecode
> ops don't result in any MIR/codegen. It'd be nice if we could remove
> MPcOffset, or else only do this for "interesting" ops (we could check if the
> previous MIR instruction was also a MPcOffset and update it in that case, or
> have a whitelist/blacklist of ops).

[Codegen] instruction Goto
[Codegen] instruction Label
[Codegen] instruction PcOffset
[Codegen] instruction PcOffset
[Codegen] instruction LoadSlotT
[Codegen] instruction PcOffset
[Codegen] instruction PcOffset
[Codegen] instruction PcOffset
[Codegen] instruction PcOffset
[Codegen] instruction AddI:OverflowCheck
[Codegen] instruction PcOffset
[Codegen] instruction PcOffset
[Codegen] instruction PcOffset
Flags: needinfo?(kvijayan)
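The coalescing jandem suggests in comment 23 (update the trailing marker when the previous instruction was also a MPcOffset) can be sketched as follows. The names are hypothetical simplifications; the real code would carry a BytecodeSite (script + pc) rather than a raw offset:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a PcOffset marker in the instruction stream.
struct PcOffsetMarker { uint32_t pcOffset; };

// When emitting a marker for a new bytecode op, overwrite the trailing
// marker in place if no real instruction was emitted since the last one,
// instead of appending a duplicate.  This collapses the runs of four or
// five back-to-back PcOffset instructions shown in the codegen log above.
void emitPcOffset(std::vector<PcOffsetMarker>& graph,
                  bool emittedCodeSinceLastMarker,
                  uint32_t pcOffset)
{
    if (!graph.empty() && !emittedCodeSinceLastMarker) {
        graph.back().pcOffset = pcOffset;  // collapse marker runs
        return;
    }
    graph.push_back(PcOffsetMarker{pcOffset});
}
```

With this, a run of bytecode ops that generate no MIR between them leaves a single marker carrying the most recent pc offset.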
Thanks for bringing this to my attention, Jan.  I was so focused on getting this landed that I forgot to remove this aspect.  The PcOffset MIR should not even be necessary at this point, as the instructions themselves carry their site information, so I'll create a patch to remove this pronto.
Flags: needinfo?(kvijayan)
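The direction described above (dropping PcOffset entirely because each instruction already carries its site) might look roughly like this sketch, with hypothetical names; the real table maps native offsets to (JSScript *, jsbytecode *) pairs via BytecodeSite:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simplified table entry.
struct NativeToBytecodeEntry {
    uint32_t nativeOffset;  // offset into the generated jitcode
    uint32_t pcOffset;      // stand-in for the BytecodeSite
};

// Called once per LIR instruction during code generation.  Consecutive
// instructions belonging to the same bytecode op share one table entry,
// so no PcOffset marker instructions need to exist in the MIR graph.
void recordInstructionSite(std::vector<NativeToBytecodeEntry>& table,
                           uint32_t nativeOffset, uint32_t pcOffset)
{
    if (!table.empty() && table.back().pcOffset == pcOffset)
        return;  // still generating code for the same bytecode op
    table.push_back(NativeToBytecodeEntry{nativeOffset, pcOffset});
}
```

This keeps debug and opt MIR graphs identical, addressing the concern in comment 83.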
Depends on: 1054340
I rebased my tree yesterday to find that a bunch of jstests (shell-only jstests, I should note) seemed to be failing due to timeouts.  From shell scrollback to one such (partial) batch of failures:

[jwalden@find-waldo-now src]$ python tests/jstests.py dbg/js/src/js ecma_5/Object/15.2.3.6
[ 2| 0| 0| 0]   4% =>                                                 | 150.0s
TIMEOUT - ecma_5/Object/15.2.3.6-dictionary-redefinition-10-of-32.js
[ 2| 1| 1| 0]   6% ==>                                                | 150.0s
TIMEOUT - ecma_5/Object/15.2.3.6-dictionary-redefinition-18-of-32.js
[ 2| 2| 2| 0]   8% ===>                                               | 150.0s
TIMEOUT - ecma_5/Object/15.2.3.6-dictionary-redefinition-23-of-32.js
[ 2| 3| 3| 0]  10% ====>                                              | 150.0s
TIMEOUT - ecma_5/Object/15.2.3.6-middle-redefinition-4-of-8.js
[ 2| 4| 4| 0]  12% =====>                                             | 150.0s
TIMEOUT - ecma_5/Object/15.2.3.6-dictionary-redefinition-14-of-32.js
[ 2| 5| 5| 0]  14% ======>                                            | 150.0s
TIMEOUT - ecma_5/Object/15.2.3.6-dictionary-redefinition-22-of-32.js
[ 2| 6| 6| 0]  16% =======>                                           | 150.0s
TIMEOUT - ecma_5/Object/15.2.3.6-redefinition-2-of-4.js
[ 4| 7| 7| 0]  22% ==========>                                        | 174.2s
TIMEOUT - ecma_5/Object/15.2.3.6-dictionary-redefinition-01-of-32.js
[ 6| 8| 8| 0]  29% =============>                                     | 300.0s
TIMEOUT - ecma_5/Object/15.2.3.6-middle-redefinition-8-of-8.js
[ 6| 9| 9| 0]  31% ==============>                                    | 300.0s
TIMEOUT - ecma_5/Object/15.2.3.6-dictionary-redefinition-13-of-32.js
[ 6|10|10| 0]  33% ================>                                  | 300.1s
TIMEOUT - ecma_5/Object/15.2.3.6-dictionary-redefinition-30-of-32.js
[ 6|11|11| 0]  35% =================>                                 | 300.1s
TIMEOUT - ecma_5/Object/15.2.3.6-redefinition-3-of-4.js
[ 6|12|12| 0]  37% ==================>                                | 300.6s
TIMEOUT - ecma_5/Object/15.2.3.6-middle-redefinition-1-of-8.js
[ 6|13|13| 0]  39% ===================>                               | 305.5s
TIMEOUT - ecma_5/Object/15.2.3.6-dictionary-redefinition-15-of-32.js
[ 6|14|14| 0]  41% ====================>                              | 307.6s
TIMEOUTS
    ecma_5/Object/15.2.3.6-dictionary-redefinition-10-of-32.js
    ecma_5/Object/15.2.3.6-dictionary-redefinition-18-of-32.js
    ecma_5/Object/15.2.3.6-dictionary-redefinition-23-of-32.js
    ecma_5/Object/15.2.3.6-middle-redefinition-4-of-8.js
    ecma_5/Object/15.2.3.6-dictionary-redefinition-14-of-32.js
    ecma_5/Object/15.2.3.6-dictionary-redefinition-22-of-32.js
    ecma_5/Object/15.2.3.6-redefinition-2-of-4.js
    ecma_5/Object/15.2.3.6-dictionary-redefinition-01-of-32.js
    ecma_5/Object/15.2.3.6-middle-redefinition-8-of-8.js
    ecma_5/Object/15.2.3.6-dictionary-redefinition-13-of-32.js
    ecma_5/Object/15.2.3.6-dictionary-redefinition-30-of-32.js
    ecma_5/Object/15.2.3.6-redefinition-3-of-4.js
    ecma_5/Object/15.2.3.6-middle-redefinition-1-of-8.js
    ecma_5/Object/15.2.3.6-dictionary-redefinition-15-of-32.js
FAIL (partial run -- interrupted by user)

These tests have almost all consistently passed without timeouts with regular debug, no-optimize builds on my system.  (The exception: ecma_5/Object/15.2.3.6-dictionary-redefinition-*-of-*.js have timed out semi-consistently for me, but I bumped it from x-of-8 to x-of-32 shortly before this bug's changes landed.  No timeouts with the 32-way split at the time.)

This bug's changes -- or rather, one of these three (I had to skip 55f95c3b5dbe because of "Assertion failure: success, at /home/jwalden/moz/slots/js/src/jit/JitcodeMap.cpp:107", and I had to skip 0cc8cbeb849e because of "Assertion failure: success, at /home/jwalden/moz/slots/js/src/jit/JitcodeMap.cpp:143") -- are the cause of the significant increase in timeouts:

Due to skipped revisions, the first bad revision could be any of:
changeset:   199243:55f95c3b5dbe
user:        Kannan Vijayan <kvijayan@mozilla.com>
date:        Wed Aug 13 11:59:55 2014 -0400
summary:     Bug 1004831 - Part 4 - Register native to bytecode mappings when new IonCode is generated. r=h4writer

changeset:   199244:0cc8cbeb849e
user:        Kannan Vijayan <kvijayan@mozilla.com>
date:        Wed Aug 13 11:59:56 2014 -0400
summary:     Bug 1004831 - Part 5 - Add mapping entries for baseline jitcode and ion IC jitcode. r=h4writer

changeset:   199245:410a7457f588
user:        Kannan Vijayan <kvijayan@mozilla.com>
date:        Wed Aug 13 11:59:58 2014 -0400
summary:     Bug 1004831 - Part 6 - Fixups to ensure that entries get rejigged curretly during BaselineDebugModeOSR. r=h4writer

Is this related to comment 83 and bug 1054340, or should I file another bug on it?
Depends on: 1055152
Depends on: 1167027
No longer depends on: 1167027