Closed Bug 1057082 Opened 5 years ago Closed 5 years ago

Add JitFrame support to ProfilingFrameIterator

Categories

(Core :: JavaScript Engine: JIT, defect)

x86_64
Windows 8.1
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla38

People

(Reporter: djvj, Assigned: djvj)

References

(Blocks 1 open bug)

Details

Attachments

(12 files, 42 obsolete files)

13.91 KB, patch
luke
: review+
Details | Diff | Splinter Review
3.44 KB, patch
luke
: review+
Details | Diff | Splinter Review
84.77 KB, patch
Details | Diff | Splinter Review
110.95 KB, patch
jandem
: review+
Details | Diff | Splinter Review
13.38 KB, patch
jandem
: review+
Details | Diff | Splinter Review
129.13 KB, patch
jandem
: review+
Details | Diff | Splinter Review
14.22 KB, patch
jandem
: review+
Details | Diff | Splinter Review
14.79 KB, patch
jandem
: review+
Details | Diff | Splinter Review
61.84 KB, patch
jandem
: review+
Details | Diff | Splinter Review
102.89 KB, patch
BenWa
: review+
Details | Diff | Splinter Review
2.33 KB, patch
vporof
: review+
Details | Diff | Splinter Review
5.17 KB, patch
nbp
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
Luke added a ProfilingFrameIterator to help the profiler walk the AsmJS stack when profiling.  This needs to be extended to also be able to walk regular JitFrames contained within JitActivations.
Is the goal to only use the pc-lookup for the last pushed frame, and use the descriptor otherwise, as the BailoutFrameIterator works?  Or is the goal to replace the frameSize of the descriptor?
Well, the script/pc mapping stuff is already done.  This is to get the stack of return-addresses within an activation, which can be done pretty simply by just walking the descriptor word chains.
This adds a notion of a "profiling" activation, which the profiler is able to touch, establishes a linked list of these profiling activations, and cleans them up (by separating the notion of a profiling activation from that of an AsmJS activation).
Comment on attachment 8477622 [details] [diff] [review]
01-add-profiling-activation-linked-list.patch

Review of attachment 8477622 [details] [diff] [review]:
-----------------------------------------------------------------

Try run looks good: https://tbpl.mozilla.org/?tree=Try&rev=d0326f7706ec

I think this can land before further work.
Attachment #8477622 - Flags: review?(luke)
Whiteboard: [leave open]
Comment on attachment 8477622 [details] [diff] [review]
01-add-profiling-activation-linked-list.patch

Review of attachment 8477622 [details] [diff] [review]:
-----------------------------------------------------------------

Great!  A few questions:

::: js/src/asmjs/AsmJSLink.cpp
@@ +456,5 @@
>          AsmJSActivation activation(cx, module);
>          JitActivation jitActivation(cx, /* firstFrameIsConstructing = */ false, /* active */ false);
>  
> +        // Link the activation into the list of profiling activations.
> +        activation.registerProfilingActivation();

Could you do this in the AsmJSActivation constructor or, even better, do the register/unregister in the Activation() ctor/dtor (if isProfilingActivation)?

::: js/src/vm/Runtime.h
@@ +570,5 @@
> +    /*
> +     * Points to the most recent profiling activation running on the
> +     * thread.  Protected by rt->interruptLock.
> +     */
> +    js::Activation *profilingActivation_;

I'd add 'volatile' just to be maximally cautious.

::: js/src/vm/Stack.cpp
@@ +1583,5 @@
>      cx->mainThread().asmJSActivationStack_ = this;
>  
>      (void) errorRejoinSP_;  // squelch GCC warning
> +
> +    // Register this activation with the set of provilin

leftovers

@@ +1624,5 @@
> +Activation::registerProfilingActivation()
> +{
> +    JS_ASSERT(isProfilingActivation());
> +    JSRuntime::AutoLockForInterrupt lock(cx_->asJSContext()->runtime());
> +    cx_->perThreadData->profilingActivation_ = this;

How about making profilingActivation_ a mozilla::Atomic with sequential consistency?  The unregister case, it seems like, shouldn't need the "if (cx->perThreadData->profilingActivation_ == this)".

@@ +1628,5 @@
> +    cx_->perThreadData->profilingActivation_ = this;
> +}
> +
> +void
> +Activation::deregisterProfilingActivation()

I think "unregister" is more common.

@@ +1690,5 @@
> +    if (!done()) {
> +        JS_ASSERT(activation_->isProfilingActivation());
> +
> +        JS_ASSERT(activation_->isAsmJS());
> +        asmJSIter().~AsmJSProfilingFrameIterator();

iteratorDestroy()?

::: js/src/vm/Stack.h
@@ +1151,5 @@
>      }
>  
> +    inline bool isProfilingActivation() const;
> +    void registerProfilingActivation();
> +    void deregisterProfilingActivation();

The "Activation" suffix seems unnecessarily verbose since these are fields of Activation.
(In reply to Luke Wagner [:luke] from comment #5)
> ::: js/src/asmjs/AsmJSLink.cpp
> @@ +456,5 @@
> >          AsmJSActivation activation(cx, module);
> >          JitActivation jitActivation(cx, /* firstFrameIsConstructing = */ false, /* active */ false);
> >  
> > +        // Link the activation into the list of profiling activations.
> > +        activation.registerProfilingActivation();
> 
> Could you do this in the AsmJSActivation constructor or, even better, do the
> register/unregister in the Activation() ctor/dtor (if isProfilingActivation)?


I've moved this as requested, to the constructor for Activation, guarded by 'isProfiling()'.

Now, the reason I went the original route was concerns with the activation not being fully initialized by the time it got linked into the list.  I was thinking that each site where an activation was initialized would call 'registerProfiling()' after it knew that all relevant state on the activation was properly initialized, and thus reduce the surface area for issues relating to signal-handlers accessing the activations before they are in a well-known state.
Updated patch.
Attachment #8477622 - Attachment is obsolete: true
Attachment #8477622 - Flags: review?(luke)
Attachment #8478421 - Flags: review?(luke)
Comment on attachment 8478421 [details] [diff] [review]
01-add-profiling-activation-linked-list.patch

Review of attachment 8478421 [details] [diff] [review]:
-----------------------------------------------------------------

Cool, thanks!
Attachment #8478421 - Flags: review?(luke) → review+
Ok, so I've been poking around this problem for a while now, trying to get a good handle on how it should be done.

The most reasonable route seems to be an approach that luke suggested: keep some instrumentation that, when Ion frames are entered or exited, updates a per-thread global with a pointer to the current frame.

This will duplicate some of the issues around instrumenting the jitcode for pseudostack operations, but it should be a lot simpler, because instead of keeping two stacks in sync whenever we implicitly or explicitly push/pop frames in weird corner-case paths, we just need to make sure that whatever frame we return into, that frame's stack address is written into the global variable.
This fixes a bug in the jitcode JitcodeGlobalEntry relating to comparing QueryEntries against regular entries.  Also adds some helper methods to JitcodeGlobalTable to inspect the splay tree.
Attached patch WIP-add-instrumentation.patch (obsolete) — Splinter Review
This is a very wip patch to add instrumentation to Ion and Baseline entries and exits to keep track of a "last frame pointer".  It's very ugly and will need cleanup and splitting most likely before landing.  But I'm just trying to get to something that works at this point.
This seems to fix of the last debugger-related issues in jit-tests.  PJS tests still fail, but that's because I'm hardcoding the global variable into the codegeneration, and forgetting to disable profiling instrumentation for PJS.  At some later point we can enable profiling for PJS too by fixing the code to store to per-thread data instead of a global var.

Next step is to fix up this patch so it's not nearly as ugly, and is written with tight assembly to test out performance.
Attachment #8484593 - Attachment is obsolete: true
Attached patch 02-add-global-fields.patch (obsolete) — Splinter Review
Split out from the WIP patch, this adds global fields jitActivation, prevJitActivation, and lastProfilingFrame.
Comment on attachment 8484592 [details] [diff] [review]
fix-jitcode-map.patch

This fixes some bugs in the JitcodeGlobalEntry code when comparing queries against entries.  Also adds some debug methods to JitcodeGlobalTable to scan through the entire thing.
Attachment #8484592 - Flags: review?(luke)
This adds x64-assembler-specific methods to generate code for entering and leaving ion frames.  Unlike the WIP patch, this patch splits out the proilerExitFrame instrumentation into a separate stub that can be jumped into.
Attachment #8484592 - Flags: review?(luke) → review+
Landed jitcode map fix (after earlier backout of land):  https://hg.mozilla.org/integration/mozilla-inbound/rev/8d7b20425420
I've been diligently plodding along with the implementation for this.  This patch set is going to be annoyingly large.  I've split it up, but there are still some biggies in that split set.  Saving my work because a HD corruption at this point would be catastrophic :)
Attached patch 02-add-global-fields.patch (obsolete) — Splinter Review
Attachment #8485981 - Attachment is obsolete: true
Attachment #8486011 - Attachment is obsolete: true
Attached patch 05-add-assembler-helpers.patch (obsolete) — Splinter Review
Attached patch 07-integrate-with-profiler.patch (obsolete) — Splinter Review
Attached patch 08-add-jit-stack-walking.patch (obsolete) — Splinter Review
Attached patch 02-add-global-fields.patch (obsolete) — Splinter Review
Latest set of patches.  This runs in browser without crashing, and actually profiles stuff without using the pseudostack for jitcode.

Hasn't been tested through try yet.  Going to start testing the existing stuff via try, landing some of the prelim preparatory ptatches, and getting arch-coverage for non-x64.

I also need to confirm that performance overhead is not high for some reason (I expect it to be about the same as before).

Arch-specific stuff not implemented for ARM, X86, or MIPS.

All that said, I'm getting a non-crashy browser when running this, and jit-tests are passing (except parallel-JS stuff.  I need to find the places where PJS workers are being profiled and turn that off)
Attachment #8493395 - Attachment is obsolete: true
Attachment #8493396 - Attachment is obsolete: true
Attachment #8493397 - Attachment is obsolete: true
Attached patch 05-add-assembler-helpers.patch (obsolete) — Splinter Review
Attachment #8493398 - Attachment is obsolete: true
Attachment #8493399 - Attachment is obsolete: true
Attached patch 07-integrate-with-profiler.patch (obsolete) — Splinter Review
Attachment #8493400 - Attachment is obsolete: true
Attachment #8493401 - Attachment is obsolete: true
Just did a perf run with octane.  Measured stock browser score, stock browser with pseudostack profiling, modified browser score, and modified browser with stackwalk profiling.  Each trial was re-run 5 times and averaged.

Stock browser - avg 22136
With pseudostack profiling - avg 19623

Modifed browser - avg 21840
With new-style profiling - avg 20007

Overall, this seems reasonable.  The check of the stock vs modified browser scores with profiling disabled is just a sanity check.  The only thing changing is some compile-time behaviour, and codegen is not affected, so scores should not really change.  The modified browser has a slightly lower octane score, but it seems within noise levels (1.3%).

Comparing the two scores with profiling, the results also seem within noise levels.  The modified browser's profiling score is slightly higher (by 1.9%) than the stock browser's profiling scores.

Overall, I'm somewhat satisfied that this approach is not introducing any real regressions.  There's a small chance that different archs (esp. ARM) might buck the trend, but no real reason to suspect.  Also, the upside of finally getting inline frames back is a big deal.
Comment on attachment 8497764 [details] [diff] [review]
02-add-global-fields.patch

Try run for this looks good: https://tbpl.mozilla.org/?tree=Try&rev=db7acc59c8b2

We can review and lands this independently of other stuff.  The core change here is the following:

Adds jitActivation, lastProfilingFrame, and lastProfilingCallSite fields to JSRuntime.
Attachment #8497764 - Flags: review?(jdemooij)
(In reply to Kannan Vijayan [:djvj] from comment #34)
> Adds jitActivation, lastProfilingFrame, and lastProfilingCallSite fields to
> JSRuntime.

Why are there new jitActivation and prevJitActivation pointers? There is already a linked list of activations, it'd be nice if we could use that and just skip non-JIT activations.
(In reply to Jan de Mooij [:jandem] from comment #35)
> (In reply to Kannan Vijayan [:djvj] from comment #34)
> > Adds jitActivation, lastProfilingFrame, and lastProfilingCallSite fields to
> > JSRuntime.
> 
> Why are there new jitActivation and prevJitActivation pointers? There is
> already a linked list of activations, it'd be nice if we could use that and
> just skip non-JIT activations.

This is required for the exit-script instrumentation, which comes later.  The instrumentation when exiting a script is invoked immediately before return, when the stack looks like:

  [..., JitFrameDescriptor, ReturnAddr] <-- top of stack

From here, the instrumentation needs to update |perThreadData->lastProfilingActivation| to point to the address of the |lastProfilingFrame|, which is the previous jit frame that is being profiled.

If the instrumentation gets invoked for the first frame (i.e. prevFrameType is JitFrame_Entry), then it needs to get the current JitActivation and retrieve the prevJitTop from that to be able to walk up to the previous jit frame.

Now, I considered the possibility that it would always be the cast that the top-most activation when this code was invoked, would be a JitActivation.. but I wasn't sure where all the exit-frame instrumentation would be invoked (it's invoked in some strange places, like debug-trap-returns, and exception-based returns).

Basically, those two fields are there to provide a well-known place which always holds the latest JitActivation, so I can get at without subtle second-guessing, or potentially introducing bugs where there is some strange corner case where the top Activation is NOT a JitActivation.

To remove them, I'd have to go through all the places where it's accessed and prove to myself that the top Activation is a JitActivation.

Alternatively, I could change the exit-frame instrumentation to loop on |prev|, looking for the latest active JitActivation, but that seemed like it might hit performance.

A field in PerThreadData and one in JitActivation doesn't seem like too much a price to pay for the certainty and work it saved, so I went that route.
Are JSRuntime::{lastProfilingFrame,lastProfilingCallSite} analogous to AsmJSActivation::{fp_,exitReason_}?  One reason I put these in the AsmJSActivation is that it allowed the exit code to simply clobber these values (w/o saving the previous value) since each activation had its own copy.  I also found this simplified things a bit when iterating over activations (each activation simply had its own pointers).
They are roughly analagous.  I don't have to save the previous value, though.  I always clobber these on the runtime.  Also, keeping it on the runtime saves an extra indirection when loading/storing to these fields.  Odin has the benefit of not having to worry about saving frames for asm => asm calls, but the Ion and Baseline instrumentation has to save it for every (Ion|Baseline) => (Ion|Baseline) call, so avoiding the indirection is helpful.
(In reply to Jan de Mooij [:jandem] from comment #35)
> (In reply to Kannan Vijayan [:djvj] from comment #34)
> > Adds jitActivation, lastProfilingFrame, and lastProfilingCallSite fields to
> > JSRuntime.
> 
> Why are there new jitActivation and prevJitActivation pointers? There is
> already a linked list of activations, it'd be nice if we could use that and
> just skip non-JIT activations.

Maybe because we are only walking JitActivations during GCs?
(In reply to Kannan Vijayan [:djvj] from comment #38)
> They are roughly analagous.  I don't have to save the previous value,
> though.  I always clobber these on the runtime.

For lastProfilingFrame: since that is the 'virtual fp', don't you have to save/restore it at every call/return?

> Odin has the benefit of not having to worry about saving frames for asm =>
> asm calls, but the Ion and Baseline instrumentation has to save it for every
> (Ion|Baseline) => (Ion|Baseline) call, so avoiding the indirection is
> helpful.

I don't understand this distinction: Odin pushes/pops AsmJSActivation::fp_ on every call/return (for the reason mentioned above).
(In reply to Nicolas B. Pierron [:nbp] from comment #39)
> > Why are there new jitActivation and prevJitActivation pointers? There is
> > already a linked list of activations, it'd be nice if we could use that and
> > just skip non-JIT activations.
> 
> Maybe because we are only walking JitActivations during GCs?

The reason for a separate linked list is that this linked list has to be async-safe since it can be traversed at any pc (from a signal handler or from another thread via SuspendThread).  Eventually, when all activations can be profiled, all will be async-safe so we won't need a separate linked list; so you can view this as temporary.
(In reply to Luke Wagner [:luke] from comment #40)
> (In reply to Kannan Vijayan [:djvj] from comment #38)
> > They are roughly analagous.  I don't have to save the previous value,
> > though.  I always clobber these on the runtime.
> 
> For lastProfilingFrame: since that is the 'virtual fp', don't you have to
> save/restore it at every call/return?
> 
> > Odin has the benefit of not having to worry about saving frames for asm =>
> > asm calls, but the Ion and Baseline instrumentation has to save it for every
> > (Ion|Baseline) => (Ion|Baseline) call, so avoiding the indirection is
> > helpful.
> 
> I don't understand this distinction: Odin pushes/pops AsmJSActivation::fp_
> on every call/return (for the reason mentioned above).

Ah in that case there is little distinction between the lastProfilingFrame and your AsmJSActivation::fp_ field, then.  We _could_ have it as a field on JitActivation, and store it there.  But I'm not sure what benefit that provides.  There are logically global fields, and I already don't have any issue with having to save/restore previous activation's values.  JitActivations already keep track of the |prevJitTop| for the previous JitActivation, which is effectively a frame pointer to start walking from.  The |lastProfilingFrame| and |lastProfilingCallSite| exist because for the "topmost" jit activation (which may be actively executing, or have called-out to C++), jitTop is not a reliable field to use during sampling.


The analagous implementation to what you've done in AsmJS, is if we take |jitTop|, and move it into the current JitActivation (instead of it being runtime-global), and then eliminate |prevJitTop| from JitActivation (since |jitActivation->prevJitTop()| would then be the same as |jitActivation()->prevJitActivation()->jitTop()|.

That still doesn't eliminate the need for |lastProfilingFrame| and |lastProfilingCallSite|, though.  Now, we _could_ store lastProilingFrame and lastProfilingCallSite on JitActivation, but as I said before, I don't see the benefit of that.  These are global "top of the jit stack" pointers.  They don't need to be duplicated across every JitActivation.
(In reply to Kannan Vijayan [:djvj] from comment #42)
> That still doesn't eliminate the need for |lastProfilingFrame| and
> |lastProfilingCallSite|, though.  Now, we _could_ store lastProilingFrame
> and lastProfilingCallSite on JitActivation, but as I said before, I don't
> see the benefit of that.  These are global "top of the jit stack" pointers. 
> They don't need to be duplicated across every JitActivation.

IIUC, JitActivation would need to save copies of the soon-to-be-clobbered-by-JIT-code lastProfiling(Frame|CallSite), right?

So how does this exactly work when jitcode A calls jitcode B via C++?  It seems like the only way to avoid dangling references (as jitcode B is returning to jitcode A) is for jitcode B to restore lastProfiling(Frame|CallSite) before returning to C++.  Is that right?

From this perspective, I think storing these global fields in the activation is mostly just a minor simplification at the cost of an extra indirection in profiling prologue/epilogues.  For asm.js, there was the additional goal of removing absolute references to the JSRuntime (for the eventual goal of sharing jit code between workers that are running the same asm.js code).  But I guess we could achieve this in a number of ways.
No, we don't need to save copies of lastProfilingFrame and lastProfilingCallSite.  The way these fit together is as follows.

Your second paragraph has the right idea, almost.  All jitcode _always_ updates |lastProfilingFrame| and |lastProfilingCallSite| when exiting frame (when profiling is turned on, of course).

So even if Jitcode A calls jitcode B directly, jitcode B's instrumentation around the return point will ensure that the lastProfilingCallFrame is updated to point to A's frame and lastProfilingCallSite points to the returnAddr into the A's jitcode (it skips past any intermediate helper frames like Rectifier frames and BaselineStub frames to do this).  For a good perspective on how it does that, see the logic in |::profilerExitFrame()| and |::profileExitFrametail()| here:
https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1057082&attachment=8497765
Wow, that is a monster stub.  I know it involves changing the stack layout, but saving lastProfilingFrame still seems preferable to me.  It seems like the space could be statically reserved (even when profiling isn't enabled) so that you'd get full testing via normal Ion execution.

For lastProfilingCallsite, there isn't any need to save of course: I assume you clear lastProfilingCallSite right after returning to JIT code (that is, its resting state is null)?
Yeah, that's a bit of code in the stub.  On the other hand, A lot of it represent branched execution paths.  All the likely hot paths only execute the small cases in that stub.  For example (Ion => Ion) just takes the first branch and it's done.  Ion => Baseline takes the second branch, skips past the BaselineStub frame, and is done.

I've ordered the branches to check for the most likely cases first.

About |lastProfilingCallSite|, that field is used in other places too.  In a later patch, I instrument call/return sites within jitcode to set |lastProfilingCallSite| before calling out.  It doesn't actually need to be cleared on return.  It can hang free, because it doesn't matter if it has "bad" values, since it's used speculatively.

So, when we obtain the last-frame during sampling, I use the following method:

1. Get the lastProfilingFrame (which is always accurate)
2. Get the jitcode range for the callee of the lastProfilingFrame
3. Check if the native pc is within that jitcode range.  If so, we are good to go.
4. Check if lastProfilingCallSite is within that jitcode range.  If so, use that instead of the native pc.
5. If neither are contained within the jitcode, pop the top frame and work from there.

Since we _always_ set lastProfilingCallSite before leaving jitcode (either via a call, or a return), we can be sure that either state.pc points into the last profiling frame, or if it doesn't, then lastProfilingCallSite will point into the frame.  There are a few corner cases where this doesn't apply (sample within interrupt, for example), in which case both checks will fail and we'll just pop the top frame for those cases.
Also, despite the size of that exit instrumentation stub, one of the things I really like about it is that it is a completely "stateless" solution.  It doesn't save any values, it doesn't need to worry about values syncing up.  It just leverages knowledge of the stack format to walk up the stack, no saving/restoring necessary.

I think the added simplicity of that approach is even more reason to go that route.
Yes, it leverages the fact that Ion already effectively saves the caller fp (just encoded as an offset not a pointer) and it is attractive to not save fp, effectively, twice in the frame.  Ideally, Ion could just save the caller's fp directly (instead of the offset), but that would be slower in any execution mode.

Aren't you worried about people changing the frame layout (or adding frame types) and that breaking the stub?
We haven't had the need to change frame layout for a while.  Saving the prevFp along with this patch will make an already-large patchset even larger and more complicated, and itself represents a change to the frame layout that will affect a lot of other code - many other places will need to be fixed up to adjust to that change.

Personally, I think going that route falls into the trap of letting the perfect hinder the good.  The initial goal of this patchset is: get the backend change in, without regressing performance, in a short timeframe, and enabling inline frame profiling.  That goal alone leads to a pretty large changeset.  I don't think it's wise to expand that scope.

Note that saving prevFP is not trivial.  We'll have to shuttle the frame pointer through stubs like the rectifier, and through C++/JS re-entries, along with all of the other changes required to make Ion work with the new frame layout.

Once the new profiling backend is in, we can sit down and start thinking about prioritizing and landing any subsequent changes.  Not only that, it'll allow work to occur in parallel.  Currently, a lot of the devtools stuff we want to do is blocked on this profiler backend change.  I want to get this in as quickly as possible.

I want reviews for everything to be well underway by the end of this month, and for everything to have landed and took (no backouts) by end of November.  Scope creep jeopardizes that goal, IMHO.
Attached patch 02-add-global-fields.patch (obsolete) — Splinter Review
Attachment #8497764 - Attachment is obsolete: true
Attachment #8497764 - Flags: review?(jdemooij)
Attachment #8497765 - Attachment is obsolete: true
Attachment #8497766 - Attachment is obsolete: true
Attached patch 05-add-assembler-helpers.patch (obsolete) — Splinter Review
Attachment #8497767 - Attachment is obsolete: true
Attachment #8497768 - Attachment is obsolete: true
Attached patch 07-integrate-with-profiler.patch (obsolete) — Splinter Review
Attachment #8497769 - Attachment is obsolete: true
This latest set of patches fixes more bugs in profiling.  Several crashes, some asserts, and issues with dropped last frames when profiling.

Only remaining issue is asserts hitting with PJS.  I need to disable profiling on PJS code before this can be landed.
Attached patch 02-add-global-fields.patch (obsolete) — Splinter Review
Attachment #8508961 - Attachment is obsolete: true
Attachment #8508962 - Attachment is obsolete: true
Attachment #8508963 - Attachment is obsolete: true
Attached patch 05-add-assembler-helpers.patch (obsolete) — Splinter Review
Attachment #8508964 - Attachment is obsolete: true
Attachment #8508965 - Attachment is obsolete: true
Attached patch 07-integrate-with-profiler.patch (obsolete) — Splinter Review
Attachment #8508967 - Attachment is obsolete: true
Well, I ran into a problem (raised by the single-stepping profiling tester), that will require some minor structural changes to the patch.

Basically, Luke's earlier suggestion of keeping |lastProfilingFrame_| and |lastProfilingCallSite_| on the activation instead of as globals hanging off of JSRuntime.. is the right way to go.

Mainly, there is a race condition when we're adding a new activation (which gets set as the current global profilingActivation), but the lastProfilingFrame_ still refers to a frame on the previous activation.  This can lead to inconsistent stacks (not a security issue, but a correctness issue) in some corner case samples (immediately after a new JitActivation has been installed, but before that activation's first frame has been entered).

I can't see a way of solving this without moving the fields into the activation.  The fields would start off as NULL (indicating that the activation is fresh, and that no frame has been entered), and this allows the profiler to discard that top activation.

This should mainly involve some minor changes to the entry/exit instrumentation.  The remaining issues (turning off profiling for PJS, various ARM issues) have been fixed.  Crossing fingers and hoping this is the last issue to resolve.
Ok, finished entry/exit instrumentation changes to use activation instead.  Found and squashed a bunch of bugs.  Down to fixing two tests: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=48b60da42603

Both test failures are in xpcshell/tests/toolkit/devtools/server/tests/unit/...

Reduced and reorganized the patchset as well.
Attached patch 02-add-global-fields.patch (obsolete) — Splinter Review
Attachment #8509680 - Attachment is obsolete: true
Attachment #8509681 - Attachment is obsolete: true
Attachment #8509683 - Attachment is obsolete: true
Attached patch 05-add-assembler-helpers.patch (obsolete) — Splinter Review
Attachment #8509684 - Attachment is obsolete: true
Attachment #8509685 - Attachment is obsolete: true
Attached patch 07-integrate-with-profiler.patch (obsolete) — Splinter Review
Attachment #8509686 - Attachment is obsolete: true
Attached patch 08-test-fixes.patch (obsolete) — Splinter Review
Attachment #8535129 - Flags: review?(jdemooij)
Attachment #8535130 - Flags: review?(jdemooij)
Attachment #8535132 - Flags: review?(jdemooij)
Attached patch 02-add-global-fields.patch (obsolete) — Splinter Review
Attachment #8535129 - Attachment is obsolete: true
Attachment #8535129 - Flags: review?(jdemooij)
Attachment #8540781 - Flags: review?(jdemooij)
Attachment #8535130 - Attachment is obsolete: true
Attachment #8535130 - Flags: review?(jdemooij)
Attachment #8540782 - Flags: review?(jdemooij)
Attachment #8535132 - Attachment is obsolete: true
Attachment #8535132 - Flags: review?(jdemooij)
Attachment #8540783 - Flags: review?(jdemooij)
Attachment #8535133 - Attachment is obsolete: true
Attachment #8540784 - Flags: review?(jdemooij)
Attachment #8535135 - Attachment is obsolete: true
Attachment #8540785 - Flags: review?(jdemooij)
Attached patch 07-integrate-with-profiler.patch (obsolete) — Splinter Review
Attachment #8535136 - Attachment is obsolete: true
Attachment #8540786 - Flags: review?(jdemooij)
Attachment #8535137 - Attachment is obsolete: true
Attachment #8540787 - Flags: review?(jdemooij)
Comment on attachment 8540781 [details] [diff] [review]
02-add-global-fields.patch

Review of attachment 8540781 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, just not sure about the signal issue. Does asm.js profiling have the same issue? If so Luke might know more.

::: js/src/vm/Stack.h
@@ +1278,5 @@
> +    // When profiling is enabled, these fields will be updated to reflect the
> +    // last pushed frame for this activation, and if that frame has been
> +    // left for a call, the native code site of the call.
> +    void       * volatile lastProfilingFrame_;
> +    void       * volatile lastProfilingCallSite_;

Do we emit a single machine instruction to write this value, even on ARM/MIPS? If we can't guarantee that (might be hard for C++ code?) we probably need to make these Atomic. Also not sure if it's safe if the write *is* a single instruction...
Attachment #8540781 - Flags: review?(jdemooij) → feedback+
Comment on attachment 8540782 [details] [diff] [review]
03-profiler-exit-and-enter-frame-codegen.patch

Review of attachment 8540782 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, also great comments. Some suggestions below.

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +5021,5 @@
> +MacroAssemblerARMCompat::profilerExitFrameTail(Register scratch1, Register scratch2,
> +                                               Register scratch3, Register scratch4)
> +{
> +    /*
> +     * The code generated below expects that the current stack pointer points to

Nit: use //-comments and max 80 characters per line for comments.

@@ +5044,5 @@
> +     *
> +     * So this jitcode is responsible for "walking up" the jit stack, finding the previous
> +     * Ion or Baseline JS frame, and storing its address into the global per-thread variable.
> +     *
> +     * There are a fixed number of different path types taht can lead to the current frame,

Nit: that (typo)

@@ +5088,5 @@
> +        Label checkOk;
> +        branchPtr(Assembler::Equal, StackPointer, scratch1, &checkOk);
> +        movePtr(ImmPtr("Mismatch between stored lastProfilingFrame and current stack pointer."),
> +                r5);
> +        breakpoint();

Nit: can use assumeUnreachable("Mismatch between...")

Might have to move this code into Trampoline-arm.cpp for this to work, because assumeUnreachable is in a subclass of MacroAssemblerARMCompat. I think that'd be nice also for consistency with most of the other trampolines in Trampoline-*.cpp

@@ +5116,5 @@
> +    branch32(Equal, scratch2, Imm32(JitFrame_Rectifier), &handle_Rectifier);
> +    branch32(Equal, scratch2, Imm32(JitFrame_Entry), &handle_Entry);
> +
> +    movePtr(ImmPtr("Invalid caller frame type when exiting from Ion frame."), r8);
> +    breakpoint();

Same here.

::: js/src/jit/x64/MacroAssembler-x64.cpp
@@ +553,5 @@
> +void
> +MacroAssemblerX64::profilerExitFrameTail(Register scratch1, Register scratch2, Register scratch3,
> +                                         Register scratch4)
> +{
> +    /*

How hard would it be to share this code between the different architectures? Are there any significant differences?

Having less platform-dependent code is also nice for the ARM64 and MIPS people. I think eventually we should try to do the same for handleFailureWithHandlerTail() and others.

@@ +675,5 @@
> +        storePtr(scratch2, lastProfilingCallSite);
> +
> +        // Store return frame in lastProfilingFrame.
> +        // scratch2 := StackPointer + Descriptor.size*1 + IonJSFrameLayout::Size();
> +        lea(Operand(StackPointer, scratch1, TimesOne, IonJSFrameLayout::Size()), scratch2);

This could be computeEffectiveAddress if we share the code.
Attachment #8540782 - Flags: review?(jdemooij)
Comment on attachment 8540783 [details] [diff] [review]
04-profiler-lastprofilingframe-usage.patch

Review of attachment 8540783 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/Bailouts.cpp
@@ +88,5 @@
>      // increment the reference counter for each activation that appear on the
>      // stack. As the bailed frame is one of them, we have to decrement it now.
>      if (iter.ionScript()->invalidated())
>          iter.ionScript()->decrementInvalidationCount(cx->runtime()->defaultFreeOp());
> + 

Nit: some trailing whitespace on this line. Nice comment btw!

@@ +181,5 @@
>          JitSpew(JitSpew_IonInvalidate, "   new  ra %p", (void *) frame->returnAddress());
>      }
>  
>      iter.ionScript()->decrementInvalidationCount(cx->runtime()->defaultFreeOp());
> + 

Same here.

::: js/src/jit/BaselineCompiler.cpp
@@ +186,5 @@
>          BaselineScript::New(script, prologueOffset_.offset(),
>                              epilogueOffset_.offset(),
>                              spsPushToggleOffset_.offset(),
> +                            profilerEnterFrameToggleOffset_.offset(),
> +                            profilerExitFrameToggleOffset_.offset(),

We should fix those up (for platforms with pools like ARM). See

  ...
  postDebugPrologueOffset_.fixup(&masm);

above.

@@ +3640,5 @@
> +    {
> +        Register scratchReg = scratch2;
> +        Label skip;
> +        masm.load32(AbsoluteAddress(cx->runtime()->spsProfiler.addressOfEnabled()), scratchReg);
> +        masm.branchTest32(Assembler::Zero, scratchReg, scratchReg, &skip);

Nit: this is shorter and probably more efficient:

masm.branch32(Assembler::Equal, AbsoluteAddress(..), Imm32(0), &skip);

@@ +3642,5 @@
> +        Label skip;
> +        masm.load32(AbsoluteAddress(cx->runtime()->spsProfiler.addressOfEnabled()), scratchReg);
> +        masm.branchTest32(Assembler::Zero, scratchReg, scratchReg, &skip);
> +        masm.loadPtr(AbsoluteAddress(cx->mainThread().addressOfProfilingActivation()), scratchReg);
> +        // TODO: ensure that the activation is a JitActivation

We should either do this or remove the comment if it's not an issue.

::: js/src/jit/BaselineCompiler.h
@@ +261,5 @@
>      bool emitSPSPush();
>      void emitSPSPop();
>  
> +    bool emitProfilerEnterFrame();
> +    bool emitProfilerExitFrame();

Nit: these two always return true so we can s/bool/void/

::: js/src/jit/BaselineIC.cpp
@@ +1078,5 @@
> +    // the frame currently being OSR-ed
> +    {
> +        Label checkOk;
> +        masm.load32(AbsoluteAddress(cx->runtime()->spsProfiler.addressOfEnabled()), scratchReg);
> +        masm.branchTest32(Assembler::Zero, scratchReg, scratchReg, &checkOk);

Nit: can use branch32 here too

@@ +1083,5 @@
> +        masm.loadPtr(AbsoluteAddress((void*)&cx->mainThread().jitActivation), scratchReg);
> +        masm.loadPtr(Address(scratchReg, JitActivation::offsetOfLastProfilingFrame()), scratchReg);
> +        masm.branchPtr(Assembler::Equal, scratchReg, BaselineStackReg, &checkOk);
> +        masm.movePtr(ImmPtr("Baseline OSR lastProfilingFrame mismatch."), scratchReg);
> +        masm.breakpoint();

masm.assumeUnreachable("Baseline OSR lastProfilingFrame mismatch.");

::: js/src/jit/JitCompartment.h
@@ +426,1 @@
>          return rt->spsProfiler.enabled();

Nit: Maybe JitRuntime should have a "JSRuntime *" pointer so callers don't have to pass one explicitly? Just a thought; either way is fine with me.

::: js/src/jit/JitFrames.cpp
@@ +752,5 @@
>  {
>      JSContext *cx = GetJSContextFromJitCode();
>      TraceLogger *logger = TraceLoggerForMainThread(cx->runtime());
>  
> +    AutoResetLastProfilerFrameOnReturnFromException profFrameReset(cx, rfe);

Nice :)

::: js/src/jit/arm/Trampoline-arm.cpp
@@ +291,5 @@
> +        {
> +            Label skipProfilingInstrumentation;
> +            Register realFramePtr = numStackValues;
> +            masm.load32(AbsoluteAddress(cx->runtime()->spsProfiler.addressOfEnabled()), scratch);
> +            masm.branchTest32(Assembler::Zero, scratch, scratch, &skipProfilingInstrumentation);

Nit: branch32 instead of load32/branchTest32

@@ +1005,5 @@
> +    // is set to the correct caller frame.
> +    {
> +        Label skipProfilingInstrumentation;
> +        masm.load32(AbsoluteAddress(cx->runtime()->spsProfiler.addressOfEnabled()), r5);
> +        masm.branchTest32(Assembler::Zero, r5, r5, &skipProfilingInstrumentation);

Same here.

::: js/src/jit/x64/MacroAssembler-x64.cpp
@@ +454,5 @@
> +    {
> +        Label skipProfilingInstrumentation;
> +        // Test if profiler enabled.
> +        load32(AbsoluteAddress(GetJitContext()->runtime->spsProfiler().addressOfEnabled()), rdi);
> +        branchTest32(Assembler::Zero, rdi, rdi, &skipProfilingInstrumentation);

Nit: branch32 instead of load32 + branchTest32

::: js/src/jit/x64/Trampoline-x64.cpp
@@ +829,5 @@
> +    // is set to the correct caller frame.
> +    {
> +        Label skipProfilingInstrumentation;
> +        masm.load32(AbsoluteAddress(cx->runtime()->spsProfiler.addressOfEnabled()), rdi);
> +        masm.branchTest32(Assembler::Zero, rdi, rdi, &skipProfilingInstrumentation);

Nit: branch32 instead of load32 + branchTest32

::: js/src/jit/x86/MacroAssembler-x86.cpp
@@ +434,5 @@
> +    {
> +        Label skipProfilingInstrumentation;
> +        // Test if profiler enabled.
> +        load32(AbsoluteAddress(GetJitContext()->runtime->spsProfiler().addressOfEnabled()), edi);
> +        branchTest32(Assembler::Zero, edi, edi, &skipProfilingInstrumentation);

Nit: branch32 instead of load32 + branchTest32

::: js/src/jit/x86/Trampoline-x86.cpp
@@ +232,5 @@
> +        {
> +            Label skipProfilingInstrumentation;
> +            Register realFramePtr = numStackValues;
> +            masm.load32(AbsoluteAddress(cx->runtime()->spsProfiler.addressOfEnabled()), scratch);
> +            masm.branchTest32(Assembler::Zero, scratch, scratch, &skipProfilingInstrumentation);

Nit: branch32 instead of load32 + branchTest32

::: js/src/vm/SPSProfiler.cpp
@@ +425,5 @@
> +    jit::FrameType prevType = exitFrame->prevType();
> +
> +    uint8_t *prev = exitFramePtr + (jit::ExitFrameLayout::Size() + prevSize);
> +    //fprintf(stderr, "KVKV followExitFrame prev=%p prevSize=%x prevType=%d\n",
> +    //        prev, (int)prevSize, (int)prevType);

Remove comment.
Attachment #8540783 - Flags: review?(jdemooij) → review+
Comment on attachment 8540784 [details] [diff] [review]
05-add-assembler-helpers.patch

Review of attachment 8540784 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comment below addressed.

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +5346,5 @@
>      }
>  }
> +
> +void
> +MacroAssemblerARMCompat::profilerPreCallImpl()

These two methods are the same for each platform, right? If so let's move them to jit/MacroAssembler.h
Attachment #8540784 - Flags: review?(jdemooij) → review+
Comment on attachment 8540785 [details] [diff] [review]
06-replace-sps-instrumentation-with-exitaddr-instrumentation.patch

Review of attachment 8540785 [details] [diff] [review]:
-----------------------------------------------------------------

djvj++, would review again! \o/

::: js/src/jit/BaselineFrame.cpp
@@ -181,1 @@
>      JSContext *cx = GetJSContextFromJitCode();

We can now move this line inside the "if (fp->isDebuggee())" block below.

::: js/src/vm/Stack.cpp
@@ +1332,5 @@
>  bool
>  AbstractFramePtr::hasPushedSPSFrame() const
>  {
>      if (isInterpreterFrame())
>          return asInterpreterFrame()->hasPushedSPSFrame();

Just curious, the interpreter implementation will stay the same?
Attachment #8540785 - Flags: review?(jdemooij) → review+
Attachment #8540787 - Flags: review?(jdemooij) → review+
(Will review part 7 tomorrow.)
Comment on attachment 8540786 [details] [diff] [review]
07-integrate-with-profiler.patch

Review of attachment 8540786 [details] [diff] [review]:
-----------------------------------------------------------------

Looks OK but a few things I don't fully understand. Somebody else should review the tools/profiler/* changes.

::: js/public/ProfilingFrameIterator.h
@@ +101,5 @@
> +        FrameKind kind;
> +        void* stackAddress;
> +        void* returnAddress;
> +        void* activation;
> +        const char* label;

Nit: * should go right before the name for these 4 members.

::: js/src/jit/BaselineCompiler.cpp
@@ +261,5 @@
> +
> +        // Generate profiling string.
> +        char *str = JitcodeGlobalEntry::createScriptString(cx, script);
> +        if (!str)
> +            return Method_Error;

createScriptString does quite a lot of work and memory allocations. If the entry also has the JSScript, can't we generate this string lazily when the profiler is enabled? Or walk the stack at that point and generate the strings there?

::: js/src/jit/CodeGenerator.cpp
@@ +7561,5 @@
> +        // Mark the jitcode as having a bytecode map.
> +        code->setHasBytecodeMap();
> +    } else {
> +        // Add a dumy jitcodeGlobalTable entry.
> +        JitcodeGlobalEntry::DummyEntry entry;

Why do we need the dummy entries?

::: js/src/jit/Ion.cpp
@@ +779,5 @@
>  
>      // Buffer can be freed at any time hereafter. Catch use-after-free bugs.
>      // Don't do this if the Ion code is protected, as the signal handler will
>      // deadlock trying to reacquire the interrupt lock.
> +    if (rt->jitRuntime())

While you're here, can you remove the "if (rt->jitRuntime())" and the last two lines of the comment? There used to be a check for is-the-code-protected here but that was removed recently and this condition makes no sense anymore (it's always true).

::: js/src/jit/JitcodeMap.cpp
@@ +209,5 @@
>  
>      // query ptr < entry
>      return flip * -1;
>  }
> + 

Nit: trailing whitespace.

@@ +244,5 @@
> +    // Calculate lineno length
> +    bool hasLineno = false;
> +    size_t linenoLength = 0;
> +    char linenoStr[15];
> +    if (hasName || script->lineno() > 1) {

Why is a function that starts at the first line not treated the same as a function that starts at the second line?

@@ +268,5 @@
> +        fullLength = filenameLength;
> +    }
> +
> +    // Allocate string.
> +    char *str = (char *) js_malloc(sizeof(char) * (fullLength + 1));

cx->pod_malloc<char>(..);

@@ +271,5 @@
> +    // Allocate string.
> +    char *str = (char *) js_malloc(sizeof(char) * (fullLength + 1));
> +    if (!str) {
> +        if (nameStr)
> +            js_free(nameStr);

We should use UniquePtr to avoid potential memory leaks if we forget to free.

@@ +737,5 @@
> +
> +struct AutoFreeProfilingStrings {
> +    ProfilingStringVector &profilingStrings_;
> +    bool keep_;
> +    AutoFreeProfilingStrings(ProfilingStringVector &vec)

Should be `explicit` or the static analysis build will fail nowadays.

::: js/src/vm/SPSProfiler.h
@@ +275,5 @@
>  };
>  
>  /*
> + * This class is used in the interpreter to bound regions where the baseline JIT
> + * being entered via OSR.  It markes the current top pseudostack entry as

Nit: marks (typo)

::: js/src/vm/Stack.cpp
@@ +1841,5 @@
> +        return 1;
> +    }
> +
> +    MOZ_ASSERT(isJit());
> +    // Look up the nativeToBytecode mapping entry for the a

Nit: fix comment.
Attachment #8540786 - Flags: review?(jdemooij)
Attached patch 02-add-global-fields.patch (obsolete) — Splinter Review
Attachment #8544683 - Flags: review?(jdemooij)
Updated patch to use Atomics.
Attachment #8540781 - Attachment is obsolete: true
Attachment #8544683 - Attachment is obsolete: true
Attachment #8544683 - Flags: review?(jdemooij)
Attachment #8544685 - Flags: review?(jdemooij)
(In reply to Jan de Mooij [:jandem] from comment #80)
> Review of attachment 8540782 [details] [diff] [review]:
> ----------------------------------------------------------------- 
> How hard would it be to share this code between the different architectures?
> Are there any significant differences?

It's easy enough to share them - the difficulty is in getting optimal times when sharing them.  On ARM and MIPS, directly using the 3-operand instructions allows us to shave off crucial instructions here and there.

I agree that it would be really nice to share them, since the logic itself is the same across all platforms.  On the other hand, I think this is one of those rare cases where the marginal perf advantages justifies differing implementations.  We really want to be able to claim that turning on profiling affects runtime perf as minimally as possible.
(In reply to Kannan Vijayan [:djvj] from comment #88)
> It's easy enough to share them - the difficulty is in getting optimal times
> when sharing them.  On ARM and MIPS, directly using the 3-operand
> instructions allows us to shave off crucial instructions here and there.

Can we add a 3-operand add32() to the macro-assemblers that uses a move on x86/x64 and just an add on ARM?

Also, the rectifier case requires most code, but that one is actually not very common. It seems like we should be able to share that and use an #ifdef or platform-specific method to squeeze out the last cycles on the hot paths...
(In reply to Jan de Mooij [:jandem] from comment #89)
> (In reply to Kannan Vijayan [:djvj] from comment #88)
> > It's easy enough to share them - the difficulty is in getting optimal times
> > when sharing them.  On ARM and MIPS, directly using the 3-operand
> > instructions allows us to shave off crucial instructions here and there.
> 
> Can we add a 3-operand add32() to the macro-assemblers that uses a move on
> x86/x64 and just an add on ARM?

We need a bit more than that.  For example, in the instructions to extract the type tag, on x86 we have:

    movePtr(scratch1, scratch2);
    rshiftPtr(Imm32(FRAMESIZE_SHIFT), scratch1);
    and32(Imm32((1 << FRAMETYPE_BITS) - 1), scratch2);

on ARM we have:

    ma_and(Imm32((1 << FRAMESIZE_SHIFT) - 1), scratch1, scratch2);
    rshiftPtr(Imm32(FRAMESIZE_SHIFT), scratch1);

a single move, sure.. but I guess I could just factor that particular instruction out into a "extractFrameDescriptorInfo()" method and use that.

You know what.. I think I'll try to unify these and share the code.  Also, moving this implementation to the high-level MacroAssembler opens up "assumeUnreachable" and other friends without having to move into Trampoline-*.cpp
Ok, after discussion this with Jan, we agreed to mark the stub unification work as a follow-up bug.  Mostly it's due to the fact that while it's a nice to have, it's a nice fixed size bug and not a correctness issue, and mostly a cleanliness issue.

Jan has convinced me that this is the better way to go, but in terms of scheduling, I'd rather land the existing stuff (and avoid the extra testing iteration for this change - on top of any rebasing, etc.), and unblock shu and jump onto his work, and come back to this after we're sure we can meet our timeline goals (which involves making progress on some more-difficult-to-predict work).
Assignee: nobody → kvijayan
Depends on: 1119344
Blocks: 1119344
No longer depends on: 1119344
(In reply to Jan de Mooij [:jandem] from comment #83)
> Comment on attachment 8540785 [details] [diff] [review]
> Just curious, the interpreter implementation will stay the same?

Yeah, the interpreter will still use the pseudostack.  Entry and exit from frames is a lot more sensible in the interpreter.  That particular usage has never produced a bug, so I don't want to mess with it.
(In reply to Kannan Vijayan [:djvj] from comment #92)
IIUC, even with the interpreter still using psuedo frames, we can remove the StackEntry::ASMJS flag and the associated gross hacks in TableTicker.cpp/Stack.cpp
  http://hg.mozilla.org/mozilla-central/file/d480b3542cc2/tools/profiler/TableTicker.cpp#l522
  http://hg.mozilla.org/mozilla-central/file/d480b3542cc2/js/src/vm/Stack.cpp#l1594
since we will now have an isCpp pseudo frame pushed (by js::Interpret's prologue) before new isJs pseudo frames are pushed.  Is that right?  That will be nice.  (Even nicer if we can remove isJs frames altogether, but not blocking this bug and should be relatively easy given the existing abstractions in place, I'd think.)
Attachment #8540786 - Flags: review?(bgirard)
(In reply to Luke Wagner [:luke] from comment #93)
> (In reply to Kannan Vijayan [:djvj] from comment #92)
> IIUC, even with the interpreter still using psuedo frames, we can remove the
> StackEntry::ASMJS flag and the associated gross hacks in
> TableTicker.cpp/Stack.cpp

Yeah.  The patchset DOES remove the ASMJS flag, but it adds a new OSR flag.  The merge code in table ticker is cleaned up a lot, and is way more readable now.  I think we should be able to simplify it even further over time, but for now I'm pretty happy with the outcome.

I completely intend to do some housekeeping on this code after we complete the jit coach.  It'll be nice pick-up work to wind down after this project.

> since we will now have an isCpp pseudo frame pushed (by js::Interpret's
> prologue) before new isJs pseudo frames are pushed.  Is that right?  That
> will be nice.  (Even nicer if we can remove isJs frames altogether, but not
> blocking this bug and should be relatively easy given the existing
> abstractions in place, I'd think.)

Yeah.  It'd be really nice to teach the stackwalker to walk interpreter stacks.  I think it's doable.
Ah, great, I see that now in patch 7.  With that patch, AsmJSActivation::profiler_ is now dead so can you remove it and the if-block that assigns to it in the ctor (along with the "NB" comment)?
(In reply to Jan de Mooij [:jandem] from comment #85)
> Comment on attachment 8540786 [details] [diff] [review]
> 07-integrate-with-profiler.patch
> 
> Review of attachment 8540786 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks OK but a few things I don't fully understand. Somebody else should
> review the tools/profiler/* changes.

Nod.  Sent r? to BenWa.


> createScriptString does quite a lot of work and memory allocations. If the
> entry also has the JSScript, can't we generate this string lazily when the
> profiler is enabled? Or walk the stack at that point and generate the
> strings there?

Theoretically possible to do this, but it involves changes all the way up the stack - because the APIs for enabling profiling (several layers up, well into geckoland) assume infallibility.

I like the generate-strings-on-profiler-enable (it should be possible, as you mentioned, to scan the tree and generate the strings then).

As it stands, the overhead of this is not that high, given that we're only doing it when we compile baseline scripts, and there is a lot of memory allocation and (relatively) heavyweight stuff happening during baseline compilation anyway.


> Why do we need the dummy entries?

Strictly speaking, we don't.. it's just a really good idea.  This is to handle the case when we enable profiling with Ion frames on stack.  The Ion jitcode will be invalidated, sure, but the invalidated frames are still on stack, and they'll still be walked by the ProfilingFrameIterator.

I was hitting an ASSERT in there that basically checked that EVERY return address mapped to something legit in the JitcodeMap.  We could leave the dummy entries out, but it would require removing that ASSERT, and it's a very, very useful sanity-checking ASSERT.

In addition, we only do this in association with compiling an Ion script, so the marignal space and memory overhead of a dummy entry in the tree is pretty small.  Also given that only a very small fraction of the code gets Ion compiled in the first place, it seemed reasonable to just go ahead and add dummy entries to enable the ASSERT to stay in place.

> @@ +244,5 @@
> > +    // Calculate lineno length
> > +    bool hasLineno = false;
> > +    size_t linenoLength = 0;
> > +    char linenoStr[15];
> > +    if (hasName || script->lineno() > 1) {
> 
> Why is a function that starts at the first line not treated the same as a
> function that starts at the second line?

It's kind of a hacky way to distinguish top-level scripts from other stuff.  For top-levels, we just want the filename, for others (anonymous functions, evals, some other special cases), we want to show linenos.  Is there a better way of doing this?
(In reply to Kannan Vijayan [:djvj] from comment #96)
> It's kind of a hacky way to distinguish top-level scripts from other stuff. 
> For top-levels, we just want the filename, for others (anonymous functions,
> evals, some other special cases), we want to show linenos.  Is there a
> better way of doing this?

script->functionNonDelazifying() || script->isForEval() should give you all functions and evals. That way at least those have line numbers... If it doesn't cover all scripts we can always add the lineno > 1 back.
Comment on attachment 8540786 [details] [diff] [review]
07-integrate-with-profiler.patch

Review of attachment 8540786 [details] [diff] [review]:
-----------------------------------------------------------------

r- but just because I want to see how the tick masking is adjusted.

::: tools/profiler/TableTicker.cpp
@@ +390,5 @@
>  static
>  void addPseudoEntry(volatile StackEntry &entry, ThreadProfile &aProfile,
>                      PseudoStack *stack, void *lastpc)
>  {
>    // Pseudo-frames with the ASMJS flag are just annotations and should not be

This comment doesn't seem to reflect your change. Please adjust it. If it does please make it clearer.

@@ +478,5 @@
>  #endif
>  
>      JS::ProfilingFrameIterator jsIter(pseudoStack->mRuntime, registerState);
> +    for (; jsCount < mozilla::ArrayLength(jsFrames) && !jsIter.done(); ++jsIter) {
> +      uint32_t extracted = jsIter.extractStack(jsFrames, jsCount, JS_NUM_FRAMES);

any reason why these don't also use ArrayLength?

@@ +515,5 @@
> +      if (pseudoFrame.isCpp())
> +        lastPseudoCppStackAddr = (uint8_t *) pseudoFrame.stackAddress();
> +
> +      // Skip any pseudo-stack JS frames which are marked isOSR
> +      if (pseudoFrame.isJs() && pseudoFrame.isOSR()) {

OSR isn't defined from the content of the patch. It would be nice to not have internal JS acronym leak into the profiler. I have no idea what OSR is and it makes it hard for the reader of this function to understand the logic. Can we make sure that the branch here is well documented for someone without JS background? Perhaps renaming the function wouldn't be bad, I'll leave that one up to you.

@@ +714,5 @@
>  
>  void TableTicker::Tick(TickSample* sample)
>  {
> +  // Don't allow for ticks to happen within other ticks.
> +  if (!mTickIsActive.compareExchange(false, true))

I'd rather that we not lose the entire sample because it's going to create dark matter in the profile. If you can't unwind the JS stack during this period that's fine, please only disable that. This way the profile wont have dark matter during this interval.

::: tools/profiler/TableTicker.h
@@ +246,5 @@
>  #if defined(XP_WIN)
>    IntelPowerGadget* mIntelPowerGadget;
>  #endif
> +
> +  mozilla::Atomic<bool> mTickIsActive;

// We can enter the tick function twice if a thread is capturing a debugging backtrace/SyncProfile synchronous for debugging and we interrupt that for profiling.
Attachment #8540786 - Flags: review?(bgirard) → review-
Comment on attachment 8544685 [details] [diff] [review]
02-add-global-fields.patch

Review of attachment 8544685 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #8544685 - Flags: review?(jdemooij) → review+
Attachment #8540782 - Attachment is obsolete: true
Attachment #8547786 - Flags: review?(jdemooij)
Comment on attachment 8547786 [details] [diff] [review]
03-profiler-exit-and-enter-frame-codegen.patch

Review of attachment 8547786 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry, forgot to fixup uses of breakpoint() to assumeUnreachable() still.
Attachment #8547786 - Flags: review?(jdemooij)
Attachment #8547786 - Attachment is obsolete: true
Attachment #8548444 - Flags: review?(jdemooij)
Attachment #8540786 - Attachment is obsolete: true
Attachment #8548445 - Flags: review?(bgirard)
Attachment #8548445 - Flags: review?(bgirard) → review+
Comment on attachment 8548444 [details] [diff] [review]
03-profiler-exit-and-enter-frame-codegen.patch

Review of attachment 8548444 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +5028,5 @@
> +{
> +    AbsoluteAddress activation(GetJitContext()->runtime->addressOfProfilingActivation());
> +#ifdef DEBUG
> +    // Confirm that the activation we're setting lastProfilingFrame on is a JitActivation.
> +#endif

Nit: either add the code or remove the #ifdef, same for the other archs.
Attachment #8548444 - Flags: review?(jdemooij) → review+
All backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/6ec139c64468 for ASAN failures.
Flags: needinfo?(kvijayan)
fyi, this appeared to have caused a 3% tp5 main RSS regression on linux32/64:
http://alertmanager.allizom.org:8080/alerts.html?rev=99213cacd671&showAll=1&testIndex=0&platIndex=0

when it was backed out the regression went away, please verify this is expected prior to landing again, it will save some concerns when the bug does land again :)
(In reply to Joel Maher (:jmaher) from comment #106)
> fyi, this appeared to have caused a 3% tp5 main RSS regression on linux32/64:
> http://alertmanager.allizom.org:8080/alerts.
> html?rev=99213cacd671&showAll=1&testIndex=0&platIndex=0
> 
> when it was backed out the regression went away, please verify this is
> expected prior to landing again, it will save some concerns when the bug
> does land again :)

This was most likely caused by the leaks that led to backout.  They have been fixed.
Flags: needinfo?(kvijayan)
Depends on: 1122886
Comment on attachment 8548445 [details] [diff] [review]
07-integrate-with-profiler.patch

Review of attachment 8548445 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/public/ProfilingStack.h
@@ +73,3 @@
>      };
>  
>      // Keep these in sync with browser/devtools/profiler/utils/global.js

And yet these were not kept in sync with browser/devtools/profiler/utils/global.js
:)
Re: comment 110, care to file a followup or do you want to fix that here?
Flags: needinfo?(kvijayan)
Can this be closed now?
Yes(In reply to Victor Porof [:vporof][:vp] from comment #111)
> Re: comment 110, care to file a followup or do you want to fix that here?

Let's fix this here.
Flags: needinfo?(kvijayan)
(In reply to :Ehsan Akhgari [Away: 1/29-2/20] (not reading bugmail, needinfo? me!) from comment #112)
> Can this be closed now?

Not yet.. one last fixup to land.  I'm taking off the [leave open] so it gets closed after this landing.
Attachment #8551497 - Flags: review?(vporof)
Whiteboard: [leave open]
Attachment #8551497 - Flags: review?(vporof) → review+
Comment on attachment 8551497 [details] [diff] [review]
fix-profile-entry-category-mappings.patch

Review of attachment 8551497 [details] [diff] [review]:
-----------------------------------------------------------------

Actually,

::: browser/devtools/profiler/utils/global.js
@@ +46,5 @@
>  
>  // Human-readable "other" category bitmask. Older Geckos don't have all the
>  // necessary instrumentation in the sampling profiler backend for creating
>  // a categories graph, in which case we default to the "other" category.
>  const CATEGORY_OTHER = 8;

Need to update CATEGORY_OTHER and CATEGORY_JIT as well.
Attachment #8551497 - Flags: review+ → review-
Attachment #8551497 - Attachment is obsolete: true
Attachment #8551507 - Flags: review?(vporof)
Attachment #8551507 - Flags: review?(vporof) → review+
Summary: Add JitFrame support to ProflingFrameIterator → Add JitFrame support to ProfilingFrameIterator
Depends on: 1124036
Comment on attachment 8548444 [details] [diff] [review]
03-profiler-exit-and-enter-frame-codegen.patch

Review of attachment 8548444 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/mips/Trampoline-mips.cpp
@@ +1128,5 @@
> +
> +        // Store return frame in lastProfilingFrame.
> +        // scratch2 := StackPointer + Descriptor.size*1 + JitFrameLayout::Size();
> +        masm.ma_add(scratch2, StackPointer, scratch1);
> +        masm.ma_add(scratch2, scratch2, Imm32(JitFrameLayout::Size()));

ma_add is not a public method of MacroAssemblerMIPS.
Comment on attachment 8548444 [details] [diff] [review]
03-profiler-exit-and-enter-frame-codegen.patch

Review of attachment 8548444 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/mips/Trampoline-mips.cpp
@@ +1009,5 @@
> +
> +    Register scratch1 = t0;
> +    Register scratch2 = t1;
> +    Register scratch3 = t2;
> +    Register scratch3 = t3;

Can you make the follow-up fixes to keep MIPS compiling?
Comment on attachment 8548444 [details] [diff] [review]
03-profiler-exit-and-enter-frame-codegen.patch

Review of attachment 8548444 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x86/Trampoline-x86.cpp
@@ +894,5 @@
> +    // was entered by some caller jit-frame further down the stack).
> +    //
> +    // So this jitcode is responsible for "walking up" the jit stack, finding
> +    // the previous Ion or Baseline JS frame, and storing its address and the
> +    // return address into the appropriate fields on the current jitActivation.

I do not understand why this code is written done in assembly, where most of the logic already exists in C++, is there any advantage to have JitCode for each platform instead of having a C++ function which is called from each platform and perform the same job using the JitFrameIterator?
Attachment #8553125 - Flags: review?(nicolas.b.pierron)
Attachment #8553125 - Flags: review?(nicolas.b.pierron) → review+
(In reply to Nicolas B. Pierron [:nbp] from comment #123)
> I do not understand why this code is written done in assembly, where most of
> the logic already exists in C++, is there any advantage to have JitCode for
> each platform instead of having a C++ function which is called from each
> platform and perform the same job using the JitFrameIterator?

Speed.  This instrumentation is executed every time we return from a jit frame.  Calling into C++ at every one of these points is heavyweight (e.g. on x86 have to pass in the jitframe ptr to walk back from, which means a stack push for the arg, a stack push for the call return addr, then whatever spills the C++ code does for non-volatile regs, and then the return back).

Compared to a simple tail-recursive jump, this is far cheaper.  It's important because profiling instrumentation needs to disturb the execution as little as possible.  Hence the use of all the arch-specific 3-operand instructions where possible, etc.

In this scenario, in the common high performance case (returning from a jit frame to an Ion caller), the exit instrumentation adds the following overhead:

jump (absolute), load, load, move, shift, bit-and, branch, load, store, lea, store.. or:

1 well-predicted absolute jump
1 well-predicted conditional branch
3 loads, 2 stores, 3 ALU ops.

which is pretty tight.  This would be hard to accomplish with a call into C++.
Attachment #8553125 - Flags: checkin?(kvijayan)
https://hg.mozilla.org/mozilla-central/rev/11540adce3b6
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Depends on: 1125304
Depends on: 1124070
Blocks: 1129510
No longer blocks: 1129510
Blocks: 1130910
Blocks: 1131295
Depends on: 1132265
Depends on: 1134515
You need to log in before you can comment on or make changes to this bug.