Baldr: always enable async stack-walking and saving TLS in the frame

RESOLVED FIXED in Firefox 55

Status

()

defect
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: luke, Assigned: luke)

Tracking

unspecified
mozilla55
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox55 fixed)

Details

Attachments

(13 attachments, 12 obsolete attachments)

98.17 KB, patch
luke
: review+
Details | Diff | Splinter Review
18.75 KB, patch
luke
: review+
Details | Diff | Splinter Review
38.99 KB, patch
luke
: review+
Details | Diff | Splinter Review
22.01 KB, patch
luke
: review+
Details | Diff | Splinter Review
6.07 KB, patch
luke
: review+
Details | Diff | Splinter Review
8.41 KB, patch
luke
: review+
Details | Diff | Splinter Review
28.72 KB, patch
lth
: review+
Details | Diff | Splinter Review
10.37 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
15.79 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
3.00 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
49.35 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
4.18 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
254.60 KB, patch
Details | Diff | Splinter Review
This set of patches makes the changes to:
 - always maintain a virtual frame pointer (instead of only in profiling mode)
 - always save the TLS* in the frame (instead of only in baseline+debugging mode)

This allows:
 - removing a bunch of complexity that supports the profiling-mode toggling
 - a pretty major reduction in code and metadata size from removing the above
 - marking live instances during GC instead of the current hack which is to keep all instances alive in a compartment when there is *any* live wasm in a compartment
 - always being able to asynchronously walk the stack which I think will be necessary for wasm threads

The downside is the overhead of saving fp/tls in the prologue and popping fp in the epilogue.  With the current patch queue, which makes the change, but doesn't optimize the prologue at all, the overhead 5.9% slower on bullet and 4.3% slower on box2d, and mostly negligible in the other embenchen workloads.  However, there are a few additional patches I'd like to write (probably before landing) to remove 1 indirection make the prologue/epilogue shorter.
Posted patch maintain-fp (obsolete) — Splinter Review
This patch rips out the "normal" function entry by making the "profiling" entry the only entry.  This removes a bunch of metadata and patching.  Note: the new CallFarJump in Assembler-shared.h is just the new name for CallThunk which can now be MacroAssembler-scoped instead of needing to live in Metadata.
Attachment #8831213 - Flags: review?(bbouvier)
Posted patch change-iter-fp (obsolete) — Splinter Review
This patch changes the implementation of FrameIter so that FrameIter::fp_ always refers to the "current" frame, not the child frame of the current frame.  This both simplifies FrameIter and avoids what I think is a pre-existing bug exposed by the next patch wherein FrameIter::fp_ becomes invalid even though the frame logically pointed at by FrameIter stays valid.
Attachment #8831215 - Flags: review?(ydelendik)
Posted patch save-tls-reg (obsolete) — Splinter Review
This patch saves the TLS register in the wasm::Frame which is fairly simple.
Attachment #8831224 - Flags: review?(bbouvier)
Comment on attachment 8831215 [details] [diff] [review]
change-iter-fp

Review of attachment 8831215 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #8831215 - Flags: review?(ydelendik) → review+
Posted patch kill-baseline-tls (obsolete) — Splinter Review
This patch removes TLS from the baseline which also interacts with wasm debug frames.  The patch also moves the ever-shrinking wasm::DebugFrame into WasmTypes.h, right under wasm::Frame and adds comments.
Attachment #8831241 - Flags: review?(ydelendik)
Posted patch kill-ion-tls (obsolete) — Splinter Review
Ion compilation can now be much simplified.
Attachment #8831243 - Flags: review?(bbouvier)
Posted patch kill-stub-tls (obsolete) — Splinter Review
Stubs can also avoid having to preserve the TLS reg themselves since it's already present in the wasm::Frame.
Attachment #8831245 - Flags: review?(bbouvier)
Posted patch rm-movwt-limit (obsolete) — Splinter Review
The previous patches remove SymbolicAddress from the prologue which in turn removes the only reason why we disabled wasm when movw/t weren't present.  This exposed a bug in ARM regalloc for the callout case of atomics.  The reason I'm using fixed allocations is because of some weird internal backtracking allocator constraint on at-start + fixed + calls:
  http://searchfox.org/mozilla-central/source/js/src/jit/BacktrackingAllocator.cpp#670
Attachment #8831248 - Flags: review?(bbouvier)
Posted patch rm-movwt-limit (obsolete) — Splinter Review
It looks like another bug just recently crept in that was hidden by the movw/t requirement: one has to flush() constants before attempting to patch them.
Attachment #8831248 - Attachment is obsolete: true
Attachment #8831248 - Flags: review?(bbouvier)
Attachment #8831296 - Flags: review?(bbouvier)
Comment on attachment 8831241 [details] [diff] [review]
kill-baseline-tls

Review of attachment 8831241 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good with the changes below.

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +7554,1 @@
>      if (!localInfo_.resize(locals_.length() + 1))

`+ 1` was for tlsSlot_ -- we can remove it now

::: js/src/wasm/WasmTypes.h
@@ +1523,5 @@
> +            bool    prevUpToDate_ : 1;
> +            bool    hasCachedSavedFrame_ : 1;
> +        };
> +        void*   reserved1_;
> +    };

After removal of tlsData_, I think this will not be 8-byte aligned on 32-bit platforms. Can you wrap funcIndex_ above the same way as for flags?

    union
    {
          uint32_t funcIndex_;
          void* reserved0_;
    };
Attachment #8831241 - Flags: review?(ydelendik) → review+
(In reply to Luke Wagner [:luke] from comment #3)
> Created attachment 8831224 [details] [diff] [review]
> save-tls-reg
> 
> This patch saves the TLS register in the wasm::Frame which is fairly simple.

We're going to need something more general than loadWasmTlsRegFromFrame() because the baseline compiler will want to load this value into a scratch register, see bug 1335068.  Perhaps it can take an optional destination register argument.
Comment on attachment 8831213 [details] [diff] [review]
maintain-fp

Review of attachment 8831213 [details] [diff] [review]:
-----------------------------------------------------------------

Cool work, thanks!

::: js/src/wasm/WasmCode.cpp
@@ +800,5 @@
> +const char*
> +Code::profilingLabel(uint32_t funcIndex) const
> +{
> +    if (funcIndex >= profilingLabels_.length() || !profilingLabels_[funcIndex])
> +        return "";

Should we be consistent with above code and return "?" here too?

::: js/src/wasm/WasmCompartment.cpp
@@ +67,4 @@
>  }
>  
>  bool
>  Compartment::registerInstance(JSContext* cx, HandleWasmInstanceObject instanceObj)

The comment there might need an update, since registering isn't really used for knowing the profiling state anymore: http://searchfox.org/mozilla-central/source/js/src/wasm/WasmModule.cpp#933-936

::: js/src/wasm/WasmFrameIterator.cpp
@@ +192,5 @@
>  DebugFrame*
>  FrameIterator::debugFrame() const
>  {
>      MOZ_ASSERT(!done() && debugEnabled());
> +    return (DebugFrame*)((uint8_t*)CallerFPFromFP(fp_) - DebugFrame::offsetOfFrame());

Could we keep the static_cast?

@@ +549,4 @@
>      uint8_t* fp = activation.fp();
> +    uint8_t* pc = (uint8_t*)state.pc;
> +    void** sp = (void**)state.sp;
> +    const CodeRange* codeRange = code_->lookupRange(pc);

nit: pre-existing, but how about a \n between this line and the above one?
Attachment #8831213 - Flags: review?(bbouvier) → review+
Comment on attachment 8831224 [details] [diff] [review]
save-tls-reg

Review of attachment 8831224 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you for the patch.

Yury, can you look at the change in WasmDebugFrame.h, please? I have no clue why there's reserved1 in the first place, nor what the assertion checks.

::: js/src/jit/x64/MacroAssembler-x64.cpp
@@ +470,5 @@
>          stackForCall += ComputeByteAlignment(stackForCall + sizeof(intptr_t),
>                                               ABIStackAlignment);
>      } else {
> +        uint32_t alignmentAtPrologue = callFromWasm ? sizeof(wasm::Frame) : 0;
> +        stackForCall += ComputeByteAlignment(stackForCall + framePushed() + alignmentAtPrologue,

Good catch!

::: js/src/wasm/WasmFrameIterator.cpp
@@ +585,5 @@
>              AssertMatchesCallSite(*activation_, callerPC_, callerFP_);
> +        } else if (offsetFromEntry < PushedTLS) {
> +            // The return address and TLS have been pushed on the stack but fp
> +            // still points to the caller's fp.
> +            callerPC_ = sp[1];

Having these constants scattered in the code doesn't help maintainability. And the Frame itself is not even defined in this file, but in WasmTypes.h. That being said, moving these constants (0 and 1) to WasmTypes (as well as the (number-of-ptrs - 1) that's used in both GenerateCallable{Pro,Epi}logue) doesn't sound any better...

@@ +595,5 @@
>              callerPC_ = ReturnAddressFromFP(sp);
> +            callerFP_ = fp;
> +            AssertMatchesCallSite(*activation_, callerPC_, callerFP_);
> +        } else if (offsetInModule == codeRange->ret()) {
> +            // fp has been restored to teh caller's frame and both the TLS and

nit: teh => the
Attachment #8831224 - Flags: review?(ydelendik)
Attachment #8831224 - Flags: review?(bbouvier)
Attachment #8831224 - Flags: review+
Comment on attachment 8831243 [details] [diff] [review]
kill-ion-tls

Review of attachment 8831243 [details] [diff] [review]:
-----------------------------------------------------------------

Sweet.

::: js/src/jit/shared/CodeGenerator-shared.cpp
@@ +1505,4 @@
>      switch (callee.which()) {
>        case wasm::CalleeDesc::Func:
>          masm.call(desc, callee.funcIndex());
> +        reloadRegs = false;

It's weird to reset reloadRegs to its initialization value. I like that it is in each case, plus a static analyzer would figure out that one branch doesn't set reloadRegs. Maybe we can remove the first initialization of reloadRegs?
Attachment #8831243 - Flags: review?(bbouvier) → review+
Comment on attachment 8831245 [details] [diff] [review]
kill-stub-tls

Review of attachment 8831245 [details] [diff] [review]:
-----------------------------------------------------------------

Yep.
Attachment #8831245 - Flags: review?(bbouvier) → review+
Comment on attachment 8831296 [details] [diff] [review]
rm-movwt-limit

Review of attachment 8831296 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: js/src/jit/arm/Lowering-arm.cpp
@@ +882,5 @@
> +                                                      useFixedAtStart(ins->oldValue(), IntArgReg3),
> +                                                      useFixedAtStart(ins->newValue(), CallTempReg0),
> +                                                      useFixedAtStart(ins->tls(), WasmTlsReg),
> +                                                      tempFixed(IntArgReg0),
> +                                                      tempFixed(IntArgReg1));

Why were these changes necessary? I assume it hadn't been tested and was due to new constraints in the backtracking allocator?
Attachment #8831296 - Flags: review?(bbouvier) → review+
Comment on attachment 8831224 [details] [diff] [review]
save-tls-reg

Review of attachment 8831224 [details] [diff] [review]:
-----------------------------------------------------------------

Provided feedback, r? depends on answers for the questions below.

::: js/src/wasm/WasmDebugFrame.h
@@ -41,5 @@
>      };
>  
>      // The fields below are initialized by the baseline compiler.
>      uint32_t    funcIndex_;
> -    uint32_t    reserved0_;

The reserved0_ worked together with reserved1_. Windows compiler is trying to re-align data structures on 64bit, so it was adding artificial padding after funcIndex_ because the union below had `void* reserved1_`. Is reserved1_ uint32_t now (which required change at WasmBaselineCompiler to initialize it)?

@@ +100,5 @@
>  static_assert(DebugFrame::offsetOfResults() == 0, "results shall be at offset 0");
>  static_assert(DebugFrame::offsetOfTlsData() + sizeof(TlsData*) == DebugFrame::offsetOfFrame(),
>                "TLS pointer must be a field just before the wasm frame");
> +static_assert(sizeof(DebugFrame) % 8 == 0,
> +              "DebugFrame is 8-bytes aligned for AbstractFramePtr");

AFIAK sizeof(Frame) shall be %8 == 0. DebugFrame::offsetOfFrame() was just ensuring that Windows compiler did not insert artificial padding.

::: js/src/wasm/WasmTypes.h
@@ -1479,5 @@
>      // address is pushed by the first instruction of the prologue).
>      void* returnAddress;
>  };
>  
> -static_assert(sizeof(Frame) == 2 * sizeof(void*), "?!");

Will AbstractFramePtr fail to store the lower type/kind bits if sizeof(Frame) % 8 == 0 on 32bit systems?
Blocks: 1338217
Any ETA on this?  It blocks sharing of code which in turn blocks tiering.
Flags: needinfo?(luke)
I started getting back to this last Friday; should be finishing this week, perhaps in the next day or two.
Flags: needinfo?(luke)
Oops, missed the reviews and questions above earlier before falling into the black hole.

(In reply to Benjamin Bouvier [:bbouvier] from comment #16)
> Why were these changes necessary? I assume it hadn't been tested and was due
> to new constraints in the backtracking allocator?

The backtracking constraints are old; I think these were essentially never being compiled for one reason or another.

(In reply to Yury Delendik (:yury) from comment #17)
Good points, the rebased patch has better static asserts and uses an #if JS_BITS_PER_WORD == 32 to conditionally insert padding which is probably a bit more explicit.
Posted patch maintain-fpSplinter Review
Rebased with comments addressed; carrying forward r+
Attachment #8847425 - Flags: review+
Attachment #8831213 - Attachment is obsolete: true
Comment on attachment 8831215 [details] [diff] [review]
change-iter-fp

(Already landed in bug 1342497.)
Attachment #8831215 - Attachment is obsolete: true
Posted patch save-tls-regSplinter Review
Rebased with comments addressed; carrying forward r+
Attachment #8831224 - Attachment is obsolete: true
Attachment #8831224 - Flags: review?(ydelendik)
Attachment #8847426 - Flags: review+
Rebased with comments addressed; carrying forward r+
Attachment #8831241 - Attachment is obsolete: true
Attachment #8847427 - Flags: review+
Posted patch kill-ion-tlsSplinter Review
Rebased with comments addressed; carrying forward r+
Attachment #8831243 - Attachment is obsolete: true
Attachment #8847428 - Flags: review+
Posted patch kill-stub-tlsSplinter Review
Rebased with comments addressed; carrying forward r+
Attachment #8831245 - Attachment is obsolete: true
Attachment #8847429 - Flags: review+
Rebased with comments addressed; carrying forward r+
Attachment #8831296 - Attachment is obsolete: true
Attachment #8847430 - Flags: review+
This (new) patch removes SymbolicAddress::ContextPtr and its remaining use in the interrupt stub.  This requires shuffling around the interrupt stub to use C++'s TLS to grab the resumePC (instead of doing it from stub code).  In the process, found some simplifications, esp for ARM.
Attachment #8847431 - Flags: review?(lhansen)
Comment on attachment 8847431 [details] [diff] [review]
rm-symbolic-cx

Review of attachment 8847431 [details] [diff] [review]:
-----------------------------------------------------------------

Nice.
Attachment #8847431 - Flags: review?(lhansen) → review+
I'd hoped I could optimize the prologues to both have less indirection and not be necessary for leaf functions, but neither seem to be the case:
 - with the multi-threaded JSContext plans, there's no good singleton memory location for the per-thread virtual fp; ZoneGroups can (eventually) move between JSContexts and multiple ZoneGroups can be active on a single JSContext, so without introducing a complex stack-walking scheme, you need 3 indirections (tls->&ZoneGroup::ownerContext->cx->fp)
 - leaf functions still have non-constant sp-to-fp offsets so we need a frame pointer to unwind at an arbitrary pc

I'm thinking now that perhaps the overhead of simply maintaining a physical fp will be less than these expensive prologues/epilogues, especially if we weight ARM/x64 more heavily than x86.  I'm going to try that out and measure the difference.
Posted patch rm-gc-hack (obsolete) — Splinter Review
Oops, I forgot to post this patch earlier: it removes the GC hack which keeps everything alive if anything is alive.
Attachment #8848574 - Flags: review?(bbouvier)
Initial timing experiments with embenchen show x64 performance is essentially unchanged when reserving the FrameRegister so for x64, it would definitely be a win to use a physical fp instead of the virtual fp.

For x86, the overhead of claiming the register so far is comparable or less for most workloads (box2d, bullet); however in one case (fannkuch), it was 9.8% worse.  Amusingly, reserving fp is a 5% *speedup* on zlib on x86... there's something super-twitchy about regalloc on zlib which we've seen before.

It'd be nice to get some numbers for ARM to check that it's closer to x64 than x86, but it's seeming to me like a good idea to just reserve fp.
Just tested on my Android LG G3 d855 device, and I see no sensible differences in box2d, bullet, fasta, skinning, fannkuch and zlib. The biggest one is fannkuch with a 3% slowdown, but that could just be noise.
One bug I hit trying to reserve FramePointer is that some code buried inside CodeGenerator-x86.cpp was doing it's own custom register logic that ended up clobbering FrameRegister.  The reason for this (instead of just using a temp() register) is that same bizarre backtracking assertion I hit in the previous patch in the rm-movwt-limit patch.  So the fix in this patch does the same thing: gives everything a fixed register (which shouldn't matter; we're on a super-slow call path that is spilling everything anyways).
Attachment #8848668 - Flags: review?(bbouvier)
Posted patch physical-fp (WIP) (obsolete) — Splinter Review
Here's a mostly-working patch; passes all tests for x86/x64, just need to fiddle with ARM which doesn't have a FramePointer atm.

So with this patch, the prologue/epilogue are now super-short: push;push;mov and  pop;pop;ret.  Overall (with another patch yet to be posted), we're saving about 10% total code size having shrunk the prologue/epilogue considerably.
Comment on attachment 8848668 [details] [diff] [review]
change-x86-codegen

Review of attachment 8848668 [details] [diff] [review]:
-----------------------------------------------------------------

Nice find.
Attachment #8848668 - Flags: review?(bbouvier) → review+
Posted patch rm-gc-hack (obsolete) — Splinter Review
Updated to include fix for corner case in WasmHandleThrow() where the unwinding that happens during throw causes live frames not to be marked.
Attachment #8848574 - Attachment is obsolete: true
Attachment #8848574 - Flags: review?(bbouvier)
Attachment #8849395 - Flags: review?(bbouvier)
Posted patch rm-gc-hack (obsolete) — Splinter Review
(Now with test case!)
Attachment #8849395 - Attachment is obsolete: true
Attachment #8849395 - Flags: review?(bbouvier)
Attachment #8849397 - Flags: review?(bbouvier)
Posted patch rm-gc-hackSplinter Review
Oops, forgot to actually run all the tests which found thinko.
Attachment #8849397 - Attachment is obsolete: true
Attachment #8849397 - Flags: review?(bbouvier)
Attachment #8849398 - Flags: review?(bbouvier)
This patch changes FramePointer to use the standard ARM fp (r11).  This allows, e.g., the gecko profiler code to to be changed minimally since it already harvests r11 as fp.  HeapReg was already using r11 so it's moved to r10 (which used to be GlobalReg before bug 1335068).
Attachment #8849425 - Flags: review?(bbouvier)
Posted patch physical-fpSplinter Review
Here's the finished patch which preserves the fp reg and uses that for unwinding.  Mostly pretty simple; a few notes:
 - The TLS reg no longer needs to be loaded before the epilogue: the epilogue already needs to perform a pop operation into a register, so it does pop(WasmTlsReg).
 - WasmActivation::fp is renamed to exitFP and only set by exit prologues/epilogues.  It's purpose is to capture the fp register before it is lost (for stack unwinding purposes) by a call to C++.
 - The 'native call' frames disappear in asm.js profiling since now, if you sample outside wasm code and WasmActivation::exitFP wasn't set by an exit, you can't unwind.  The stub thunks in bug 1340219 could be used to save an exitFP here.
Attachment #8848670 - Attachment is obsolete: true
Attachment #8849427 - Flags: review?(bbouvier)
Posted patch shrink-codeSplinter Review
Here's the last patch, which is an optimization.  Turns out the save-tls-reg patch was a huge codesize regression (9mb on EpicZenGarden) because it added 4-6 bytes to the out-of-line trap path of every load/store of which there are millions.  This patch pushes the tls reload into the FarJumpIsland (of which there is ~1 for every kind of trap).

Overall, the patch queue is now an 11.1% size reduction on EpicZenGarden (from 66.7mb to 59.3mb), mostly from reducing all the prologue/epilogue gunk.
Attachment #8849440 - Flags: review?(bbouvier)
Comment on attachment 8849398 [details] [diff] [review]
rm-gc-hack

Review of attachment 8849398 [details] [diff] [review]:
-----------------------------------------------------------------

Interesting edge case (in WasmTypes) and test case. Thanks!

::: js/src/wasm/WasmTypes.cpp
@@ +191,5 @@
> +
> +    // Live wasm code on the stack is kept alive (in wasm::TraceActivations) by
> +    // marking the instance of every wasm::Frame found by FrameIterator.
> +    // However, as explained above, we're popping frames while iterating which
> +    // means that a GC during this loop could collect the code of frames whose 

nit: trailing whitespace
Attachment #8849398 - Flags: review?(bbouvier) → review+
Comment on attachment 8849425 [details] [diff] [review]
change-arm-frame-ptr

Review of attachment 8849425 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense.

::: js/src/jit/RegisterAllocator.h
@@ +289,5 @@
>              allRegisters_.take(AnyRegister(HeapLenReg));
>  #endif
>          } else {
> +#if defined(JS_CODEGEN_X86) || defined(JS_CODEGEN_X64) || defined(JS_CODEGEN_ARM64)
> +            if (mir->instrumentedProfiling())

I kinda preferred the previous check (and think it could be generalized to the above, for HeapReg/HeapLenReg), since it makes implementing a new architecture easier, but we're getting away from this with cretonne anyways.
Attachment #8849425 - Flags: review?(bbouvier) → review+
Comment on attachment 8849427 [details] [diff] [review]
physical-fp

Review of attachment 8849427 [details] [diff] [review]:
-----------------------------------------------------------------

This makes sense and the code is much simpler and closer to what native would do. Thanks!

::: js/src/wasm/WasmFrameIterator.cpp
@@ +358,5 @@
>  
>      if (reason != ExitReason::None) {
> +        Register scratch = ABINonArgReturnReg0;
> +        masm.loadWasmActivationFromTls(scratch);
> +        masm.storePtr(ImmWord(0), Address(scratch, WasmActivation::offsetOfExitFP()));

Don't we need to reset the exit reason here too? If so, bonus points for a test case!

::: js/src/wasm/WasmStubs.cpp
@@ +721,3 @@
>      masm.loadWasmTlsRegFromFrame();
> +    masm.moveStackPtrTo(FramePointer);
> +    masm.addPtr(Imm32(masm.framePushed()), FramePointer);

Wouldn't it be simpler if we just pushed FP before the call, so we could pop it here? (we're still before the error handling bits where paths diverge)

@@ +1126,5 @@
>  
>  #ifdef DEBUG
>      // We are about to pop all frames in this WasmActivation. Checking if fp is
>      // set to null to maintain the invariant that fp is either null or pointing
>      // to a valid frame.

Might be worth updating the comment since we now have two different FPs (register vs exitfp which is referred to, here). There's another comment above (before the call to HandleThrow) referring to exitfp where it reads "fp" at the moment.

@@ +1129,5 @@
>      // set to null to maintain the invariant that fp is either null or pointing
>      // to a valid frame.
>      Label ok;
> +    Address exitFP(act, WasmActivation::offsetOfExitFP());
> +    masm.branchPtr(Assembler::Equal, exitFP, ImmWord(0), &ok);

Could we generalize the debug checks and explicit the invariants somewhere (in a function refactoring these checks happening in several places, maybe)?

If I've understood correctly:
- exitFP is only set in exits, null otherwise
- exitReason may be non-none on exits, none otherwise
Attachment #8849427 - Flags: review?(bbouvier) → review+
Comment on attachment 8849440 [details] [diff] [review]
shrink-code

Review of attachment 8849440 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, thanks.

::: js/src/wasm/WasmGenerator.cpp
@@ -365,5 @@
>              masm_.patchCall(callerOffset, p->value());
>              break;
>            }
>            case CallSiteDesc::TrapExit: {
> -            if (maybeTrapExits) {

IIUC, you're removing this code block because the TLS register wouldn't be reloaded if |calleeOffset - callerOffset| < JumpRange(). This doesn't matter much because we're on a trap path, so it can be slow anyway. Is that right? If so, the maybeTrapExits argument might now be dead and could be removed from this function's signature.
Attachment #8849440 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #45)
Great points, done!

> > +        masm.loadWasmActivationFromTls(scratch);
> > +        masm.storePtr(ImmWord(0), Address(scratch, WasmActivation::offsetOfExitFP()));
> 
> Don't we need to reset the exit reason here too? If so, bonus points for a
> test case!

Technically we don't, since exitReason is only observed when exitFP is non-null, but probably it's a nice invariant to keep exitReason at None and then assert this in ~WasmActivation() (as we do for exitFP).

> >      masm.loadWasmTlsRegFromFrame();
> > +    masm.moveStackPtrTo(FramePointer);
> > +    masm.addPtr(Imm32(masm.framePushed()), FramePointer);
> 
> Wouldn't it be simpler if we just pushed FP before the call, so we could pop
> it here? (we're still before the error handling bits where paths diverge)

Unfortunately, that would require messing with the complicated computation of jitFramePushed so that the call was aligned on all archs.
(In reply to Benjamin Bouvier [:bbouvier] from comment #47)
Correct!
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4701cf47fa90
Baldr: always enable profiling prologue (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/36a11b6821fa
Baldr: save TLS reg in Frame (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/3859ce4ab1f5
Baldr: remove baseline's explicit TLS-saving (r=yury)
https://hg.mozilla.org/integration/mozilla-inbound/rev/5ad92cd2418b
Baldr: remove ion's explicit TLS-saving (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/ae5ca99b1322
Baldr: remove stubs' explicit TLS-saving (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/2af1e0581be2
Baldr: remove movw/movt requirement for wasm on ARM (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/60f7f03a4a6c
Baldr: remove the wasm::Compartment GC hack (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/caf34785ed27
Baldr: remove SymbolicAddress::ContextPtr (r=lth)
https://hg.mozilla.org/integration/mozilla-inbound/rev/e5b32e0346dc
Baldr: remove hacky register allocation from i64 div/mod (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/42d309089d33
Baldr: set ARM's FrameRegister (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/39cc35b68133
Baldr: maintain fp register instead a virtual fp (r=bbouvier)
https://hg.mozilla.org/integration/mozilla-inbound/rev/6630355407bf
Baldr: move reload of TLS out of bloaty out-of-line paths (r=bbouvier)
Do you plan to remove lookupInstanceDeprecated and the instance comparison machinery as part of this bug or should I undertake that as part of bug 1338217?
Flags: needinfo?(luke)
Asked you for feedback on some tenative work on bug 1338217 instead.
Flags: needinfo?(luke)
Depends on: 1350988
Depends on: 1350744
Depends on: 1351660
Depends on: 1353352
Depends on: 1355699
Sorry I don't follow the change. But I'm having trouble with sparc build:

 9:43.18 /scratch/firefox/js/src/jit/none/MacroAssembler-none.h:51:72: error: converting to 'js::jit::Register' from initializer list would use explicit constructor 'constexpr js::jit::Register::Register(js::jit::Register::Encoding)'
 9:43.18  static constexpr Register WasmIonExitTlsReg = { Registers::invalid_reg };

Is '=' as I you have added into MacroAssembler-none.h really approprite? Following seems to help me (to get little bit further):

--- a/js/src/jit/none/MacroAssembler-none.h     Thu Jun 22 20:44:50 2017 -0700
+++ b/js/src/jit/none/MacroAssembler-none.h     Fri Jun 23 12:32:05 2017 +0000
@@ -48,7 +48,7 @@

 static constexpr Register WasmIonExitRegReturnData { Registers::invalid_reg };
 static constexpr Register WasmIonExitRegReturnType { Registers::invalid_reg };
-static constexpr Register WasmIonExitTlsReg = { Registers::invalid_reg };
+static constexpr Register WasmIonExitTlsReg { Registers::invalid_reg };
 static constexpr Register WasmIonExitRegD0 { Registers::invalid_reg };
 static constexpr Register WasmIonExitRegD1 { Registers::invalid_reg };
 static constexpr Register WasmIonExitRegD2 { Registers::invalid_reg };
Flags: needinfo?(luke)
(In reply to Petr Sumbera from comment #54)
> Sorry I don't follow the change. But I'm having trouble with sparc build:
> 
>  9:43.18 /scratch/firefox/js/src/jit/none/MacroAssembler-none.h:51:72:
> error: converting to 'js::jit::Register' from initializer list would use
> explicit constructor 'constexpr
> js::jit::Register::Register(js::jit::Register::Encoding)'
>  9:43.18  static constexpr Register WasmIonExitTlsReg = {
> Registers::invalid_reg };
> 
> Is '=' as I you have added into MacroAssembler-none.h really approprite?
> Following seems to help me (to get little bit further):
> 
> --- a/js/src/jit/none/MacroAssembler-none.h     Thu Jun 22 20:44:50 2017
> -0700
> +++ b/js/src/jit/none/MacroAssembler-none.h     Fri Jun 23 12:32:05 2017
> +0000
> @@ -48,7 +48,7 @@
> 
>  static constexpr Register WasmIonExitRegReturnData { Registers::invalid_reg
> };
>  static constexpr Register WasmIonExitRegReturnType { Registers::invalid_reg
> };
> -static constexpr Register WasmIonExitTlsReg = { Registers::invalid_reg };
> +static constexpr Register WasmIonExitTlsReg { Registers::invalid_reg };
>  static constexpr Register WasmIonExitRegD0 { Registers::invalid_reg };
>  static constexpr Register WasmIonExitRegD1 { Registers::invalid_reg };
>  static constexpr Register WasmIonExitRegD2 { Registers::invalid_reg };

Oops, looking at the lines around I think you're obviously right. Thanks for noticing this! Can you open another bug to make this change, please? If there's an attached patch, that would be even better :) Cheers!
Flags: needinfo?(luke)
You need to log in before you can comment on or make changes to this bug.