Closed Bug 846363 Opened 11 years ago Closed 11 years ago

BaselineCompiler: Fix SPS profiler integration

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jandem, Assigned: djvj)

References

Details

(Whiteboard: [leave open])

Attachments

(2 files, 2 obsolete files)

Tinderbox builds are very stable, but as soon as I install/enable the profiler add-on the browser crashes after a few seconds.

I hope we can just reuse the profiler code from Ion.
Assignee: general → kvijayan
Started on this earlier today, but I want to document some thoughts while I'm at it.

There are a few options for the approach here, and just for completeness, here they are:

1. Invalidate and recompile when SPS is turned on.  This is the Ion route.
  Pros:
    JITcode size stays small when not profiling.
    No performance impact when profiling not turned on.
  Cons:
    We'll need to implement OSI for Baseline.
    OSI goes against Baseline's intention of "never needing to invalidate".

2. Emit profiling code in all JIT scripts, guard with toggle-able jumps.  The pros and cons here are basically the inverse of the pros/cons for (1).

I'd like to go with option (2), because it's just simpler, and I started down that path today, but I'm having second thoughts.

The perf issue doesn't bother me - Baseline is not there to be a speed demon, just be a passable fallback when Ion fails.

The size issue does bother me.  If we're going for full SPS support, we're talking about memory bloat at the following locations:

a) Toggled code in the function prologue to push an SPS entry.  This is going to be about 32 bytes or so in the best case.

b) Toggled code in the function epilogue to pop the SPS entry.  This is going to be another 32 bytes or so.

c) For every call entering and leaving the script, two pieces of toggled code before and after the call for emitting the leave/reenter SPS updates.  The code for this will be shared between all scripts, but it will require an extra 4 bytes in every site-specific stub that does this (to store the PC to update SPS's tables).

d) Benoit has indicated that he'd like to see line-number support enabled as well.  This would mean an additional 32-bytes or so for every logical line in the source script.

The 32-byte rough estimate is based on the most space efficient way I can come up of doing this: having the toggled call sites not inline the SPS updates, but instead call out to an IC.  All IC entries share a single fallback stub (the stubcode can call the fallback stub function passing the return-addr into the script, so that the stub can look up the actual IC entry it was entered from).

I don't think the amount of extra memory is _that_ high, but it's nontrivial, and it'll be unused in 99.9% of use cases.

I'm going to step back and run some measurements tomorrow on how much code we actually compile with baseline, and how much extra memory we can expect to use if we go this route.
Attached patch Integrate SPS Profiler (obsolete) — Splinter Review
Notes about implementation:

1. Pushing of SPS frame is implemented as an non-OP IC at the beginning of the jitcode, guarded by a toggled jump.  After pushing an SPS frame, the Baseline frame is marked with a flag bit that indicates that an SPS frame has been pushed.

2. Popping of SPS frame is not guarded by a toggled jump, and is inline (the jitcode to do it is smaller than the size of the jitcode required to jump into an IC, so might as well inline it).  It's guarded by a runtime check of the Baseline Frame flag bit set when the SPS frame was pushed.

This is because SPS may be turned on/off after entering a baseline frame, but before returning from it.  In any case, popping an SPS entry must only be done if and only if it was pushed on entry.  There's no point in having a static, toggled jump around this logic - it HAS to be runtime checked.

3. For any optimized stub that may leave/re-enter the frame via a call to another function, a runtime check is performed to determine whether to update the SPS profiler entry before and after the call.  To allow for this, the optimized stubs are given a new pcOffset field, as well as changed to use their 'extra_' field as a runtime indicator of whether profiling is enabled.

The runtime check on performing the update involves checking both the 'extra_' field as well as checking the profiler-pushed flag on the BaselineFrame.

4. IC fallback handler functions that may leave/enter the current script use an RAII-helper class "AutoSPSProfilerUpdate" to make profiler updates before and after the call.
Attachment #726711 - Flags: review?(jdemooij)
Couple of spot fixes made after I posted the patch for review:

AutoSPSProfilerUpdate was not taking into account the BaselineFrame's sps flag when doing updates.  That's been fixed.

After updating to tip, script iteration changed to use zones instead of compartments.  That has also been fixed.
Comment on attachment 726711 [details] [diff] [review]
Integrate SPS Profiler

Canceling this review.  My try run showed some oranges.  Re-running the try with debug builds and tracking them down.  There will probably be more fixups to the patch, so might as well ask for review after those are done.
Attachment #726711 - Flags: review?(jdemooij)
Attached patch SPS Integration, try 2. (obsolete) — Splinter Review
Turned out codegen changes around x86 call stubs were clobbering ArgumentsRectifierReg when we had arguments underflow, leading to oranges.

Things look relatively green now.  See earlier patch posting for notes.
Attachment #726711 - Attachment is obsolete: true
Attachment #726999 - Flags: review?(jdemooij)
Comment on attachment 726999 [details] [diff] [review]
SPS Integration, try 2.

Review of attachment 726999 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great! Resetting review flag since there will be some changes, as discussed on IRC.

::: js/src/ion/BaselineCompiler.h
@@ +192,5 @@
>      }
>      bool emitNonOpIC(ICStub *stub) {
>          return emitIC(stub, false);
>      }
> +    

Micro-nit: trailing whitespace.

::: js/src/ion/BaselineFrame.h
@@ +58,5 @@
>          // Frame has hookData_ set.
> +        HAS_HOOK_DATA    = 1 << 7,
> +
> +        // Frame has profiler entry pushed.
> +        PROFILER_PUSHED  = 1 << 8

Nit: maybe HAS_PUSHED_SPS_FRAME to match StackFrame?

@@ +299,5 @@
>          flags_ |= HAS_HOOK_DATA;
>      }
>  
> +    bool profilerPushed() const {
> +        return flags_ | PROFILER_PUSHED;

flags_ & PROFILER_PUSHED

::: js/src/ion/BaselineIC.cpp
@@ +960,5 @@
> +    // the profile string.
> +    JS_ASSERT_IF(icEntry->firstStub() != stub,
> +                 icEntry->firstStub()->isProfiler_PushFunction() &&
> +                 icEntry->firstStub()->next() == stub);
> +    for (ICStubIterator iter = stub->beginChain(); !iter.atEnd(); iter++) {

Can we use stub->unlinkStubsWithKind(Profiler_PushFunction) here?

::: js/src/ion/BaselineJIT.h
@@ +108,5 @@
>      // Native code offset right before the scope chain is initialized.
>      uint32_t prologueOffset_;
>  
> +    // The offsets for the toggledJump instructions for SPS update ICs.
> +    mozilla::DebugOnly<bool> spsOn_;

An empty class will have sizeof >= 1, so this will increase sizeof(BaselineScript) in non-debug builds too. To avoid this you can use the flags_ field, or #ifdef DEBUG.
Attachment #726999 - Flags: review?(jdemooij)
Comments addressed, and changes made as per our IRC discussion.  Also, contrary to my earlier belief, we can eliminate the IC stub traversal altogether when we enable profiling.  In all stubs we just directly check the spsProfiler.enabled_ field (after checking the frame's flag, of course).

Also, as we discussed, the sps entry's initial idx field is set to 0 (not -1) on push.  And we only update the idx when we call out to another function.. we don't reset it on return.

Additionally, I realized that the old patch was not correctly setting the SPS frame flags when leaving Ion and entering baseline via bailout, when the profiler was enabled.  This new patch addresses that issue in much the same way the bailout path from Ion to the interpreter addresses it.

This should mimic interpreter behaviour almost exactly.  The only difference is that the optimized stubs for GetProp_CallScripted and SetProp_CallScripted also update the PC.  I figure it does no harm and since it's already there (and will probably be required once we get fuller PC tracking support in the profiler), might as well leave it in.

Try run on linux 32 and 64-bit (opt and debug) seems green: https://tbpl.mozilla.org/?tree=Try&rev=6d27533f6477

Pending try on remaining archs:
https://tbpl.mozilla.org/?tree=Try&rev=9e2e4e7f63cf
Attachment #726999 - Attachment is obsolete: true
Attachment #728020 - Flags: review?(jdemooij)
Comment on attachment 728020 [details] [diff] [review]
SPS Profiler Integration try 3

Review of attachment 728020 [details] [diff] [review]:
-----------------------------------------------------------------

Great work, r=me with comments below addressed. Can you make sure this doesn't crash on some common websites with the profiling add-on installed, and shows the correct call stack for an infinite loop with Ion disabled?

Does the patch have any effect on SS? Probably not since calls are heavy and it's a single branch (that's never taken most of the time), but would be good to measure.

::: js/src/ion/BaselineBailouts.cpp
@@ +649,5 @@
>      if (!iter.moreFrames()) {
>  
> +        // If bailing out due to argument check, make sure that the HAS_PUSHED_SPS_FRAME
> +        // flag is not set on the final frame.
> +        if (iter.bailoutKind() == Bailout_ArgumentCheck) {

We can also resume into the prologue if pcOffset == 0 && !resumeAfter, so should this be "scopeChain == NULL" to match the condition below, where we call baselineScript->prologueEntryAddr()?

::: js/src/ion/BaselineFrame.h
@@ +299,5 @@
>          flags_ |= HAS_HOOK_DATA;
>      }
>  
> +    bool hasPushedSPSFrame() const {
> +        return flags_ | HAS_PUSHED_SPS_FRAME;

Use & instead of |.

::: js/src/ion/BaselineIC.cpp
@@ +958,5 @@
> +    Register scratch2 = R1.scratchReg();
> +
> +    // Check if profiling is enabled.
> +    uint32_t *enabledAddr = cx->runtime->spsProfiler.addressOfEnabled();
> +    masm.branch32(Assembler::Equal, AbsoluteAddress(enabledAddr), Imm32(0), &failure);

When the profiler is disabled, won't that toggle the jump in the main script so that we never end up here? If it is somehow possible to get here, it will fail the JS_ASSERT(profiler->enabled()); in DoProfilerFallback. Maybe we can make these two lines "#ifdef DEBUG"?

::: js/src/ion/BaselineJIT.cpp
@@ +43,4 @@
>      flags_(0)
> +{
> +#ifdef DEBUG
> +    spsOn_ = false;

Nit: use spsOn_(false) in the initializer list.
Attachment #728020 - Flags: review?(jdemooij) → review+
(In reply to Jan de Mooij [:jandem] from comment #8)
> Great work, r=me with comments below addressed. Can you make sure this
> doesn't crash on some common websites with the profiling add-on installed,
> and shows the correct call stack for an infinite loop with Ion disabled?

I didn't test with an infinite loop, but definitely one that should have overflowed the profiler buffer, and it seemed to display the correct stack for the first hundred or so frames.

Also did some browsing on the top sites on alexa top 500 and the browser held up fine.  With the SPS profiler add-on full-UI.

> Does the patch have any effect on SS? Probably not since calls are heavy and
> it's a single branch (that's never taken most of the time), but would be
> good to measure.

Tested, no effect.

Other comments addressed and pushed:
https://hg.mozilla.org/projects/ionmonkey/rev/0eaefffce290

Leaving open for some final
Whiteboard: [leave open]
Found a minor issue that was left unaddressed by the last patch:

If bailing out from Ion to baseline because of an argument check failure, then the Ion code would not have hit the MFunctionBoundary before it bails out.  However, the bailout procedure resumes execution within the baseline jit in the prologue of the inlined function.

Since Ion doesn't automatically set the SPS frame pcIdx to 0 on entry to a function (it only sets it before entering a callee, and resets it after leaving).. baseline finds itself in the inner function, pushing a new SPS entry, when the parent entry's pcIdx is invalid.

This can be fixed by detecting these situations (i.e. if (bailoutKind == ArgumentCheck && isInlinedCallee && spsEnabled), and updating the top SPS entry with (caller, callerPC).


Additionally, sometimes we bail out into baseline's prologue, but NOT because of an argument check failure.  In these cases, Ion would have pushed an entry for the innermost frame, but not set the pcIdx on it yet.  In this case, we have to pop the spsEntry at the top of the stack and let the baseline jit's prologue re-push it.

Basically the logic here is:

if (bailingOutIntoBaselinePrologue && bailoutKind != ArgumentCheck), then spsProfiler.pop()
Attachment #728370 - Flags: review?(jdemooij)
Comment on attachment 728370 [details] [diff] [review]
Fix for issues described in comment 10

Review of attachment 728370 [details] [diff] [review]:
-----------------------------------------------------------------

Good find.
Attachment #728370 - Flags: review?(jdemooij) → review+
Pushed:
https://hg.mozilla.org/projects/ionmonkey/rev/f035cd0ee56e

Watching tbpl before closing.
How is tbpl looking?
Seems to be all good :)  Thanks for the poke.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: