Closed Bug 1279312 Opened 8 years ago Closed 8 years ago

Baldr: Use a hidden argument register for a TLS pointer

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox50 --- fixed

People

(Reporter: jolesen, Assigned: jolesen)

References

Details

Attachments

(6 files, 3 obsolete files)

WebAssembly modules need a per-thread per-module TLS data structure that can hold information like:

- The current stack limit.
- Pointer to global variables.
- Heap size.
- Heap base.
- Wasm TLS variables for the module.

This allows us to run multiple instances of the same module without having to specialize the generated code to each instance.

The TLS pointer can be passed to compiled wasm functions in a designated ABI register. If the register allocator known that this register is not clobbered by a function call, it is not necessary to reserve the register. It would be permitted to spill the TLS pointer between function calls, making more registers available for computing stuff.
When enabled with CompileArgs::useTLSPointerArgument, WebAssembly functions
expect to be passed a hidden argument in WasmTLSPointerReg which is a pointer to
a TLSData struct.

Temporarily allocate a TLSData instance in the wasm::Module itself. When wasm
supports multithreading, we will need to allocate a TLSData instance per thread
per module.

This patch generates code to pass the TLS pointer to WebAssembly functions,
preserving it through intra-module calls. The pointer is not used for anything
yet, and the the TLS pointer register is not currently preserved across function
calls.
Attachment #8761776 - Flags: feedback?(luke)
When enabled with CompileArgs::useTlsStackLimit, get the stack limit from
TlsData::stackLimit instead of SymbolicAddress::StackLimit. Since the TLS
pointer register is available at every function prologue, the over-recursion
check is the same cost as using the statically linked address.

Initialize the TlsData stack limit during Module::staticallyLink. We probably
need to update this limit if the context's stack limit is modified later.
Attachment #8761777 - Flags: feedback?(luke)
Luke, this is just a prototype. It's passing a TLS pointer around in the right register, but none of the register allocator optimizations are there yet.

Note that we're making a copy of the stack limit in staticallyLink. If the stack limit is changes later, the wasm modules won't notice. Do we need a hook to update all the copies of the stack limit? I want to avoid a double indirection in every function prologue, hence all the copies.
Assignee: nobody → jolesen
Comment on attachment 8761776 [details] [diff] [review]
Pass a TLS pointer hidden argument to WebAssembly functions.

Review of attachment 8761776 [details] [diff] [review]:
-----------------------------------------------------------------

Agreed with the strategy here, looks good.

::: js/src/asmjs/WasmStubs.cpp
@@ +136,5 @@
> +    // This is the third argument in the EntryFuncPtr prototype.
> +    if (usesTlsPointer) {
> +#if defined(JS_CODEGEN_X86)
> +        masm.loadPtr(
> +          Address(masm.getStackPointer(), EntryFrameSize + masm.framePushed() + 2 * sizeof(void*)),

Pre-existing, b/c we already do this for argv, but we really should be using ABIArgIter (like below) to determine where and how to load TLS reg from.

Also, it might be good to move this loading of the TLS reg to right before the call so that we don't have to depend on TLSReg being disjoint from ABINonArgReturnReg(1,2) (which are used as scratch registers before the call).

::: js/src/asmjs/WasmTypes.h
@@ +774,5 @@
>      bool useSignalHandlersForInterrupt;
>  
> +    // Do WebAssembly functions take a hidden TLS pointer argument in
> +    // `WasmTlsPointerReg`?
> +    bool useTlsPointerArgument;

Totally fine to keep this in the patch for testing but I think by the time we'd land it we'd simply have it always on.  Or is there a platform dependency that might disable it?  I can't think of one.
Attachment #8761776 - Flags: feedback?(luke) → feedback+
Comment on attachment 8761777 [details] [diff] [review]
Use TLS pointer for the stack limit check.

Review of attachment 8761777 [details] [diff] [review]:
-----------------------------------------------------------------

Cool, makes sense.

::: js/src/asmjs/WasmTypes.h
@@ +779,5 @@
>  
> +    // Should we use the stackLimit in TlsData when checking for over-recursion,
> +    // or the global AddressOf(StackLimit)?
> +    // This requires useTlsPointerArgument to be true.
> +    bool useTlsStackLimit;

Same basic point of, for landing, let's have it always on.

::: js/src/jit/CodeGenerator.cpp
@@ +8824,5 @@
> +        if (useTlsStackLimit) {
> +            // Get the per-thread stack limit from the `TlsData` struct pointed
> +            // to by the `WasmTlsPointerReg` hidden argument register.
> +            masm.branchPtr(Assembler::AboveOrEqual,
> +                           Address(WasmTlsPointerReg, offsetof(wasm::TlsData, stackLimit)),

We're also depending here on WasmTlsPointerReg being disjoint from the regs which are clobbered inside GenerateFunctionPrologue (ABINonArgReg(0,1)).  In general, we try to informally capture these constraints by putting the necessarily-disjoint registers in Assembler-*.h.  See, e.g., WasmTableCall*Reg.  I'm thinking maybe the best way is to add WasmTlsPointerReg as a sibling to ABINonArgReg(0,1) since that's the constraint here.

Also, perhaps WasmTlsReg would be a shorter-yet-still-accurate name?
Attachment #8761777 - Flags: feedback?(luke) → feedback+
(In reply to Luke Wagner [:luke] from comment #4)
> Comment on attachment 8761776 [details] [diff] [review]
> Pass a TLS pointer hidden argument to WebAssembly functions.

> ::: js/src/asmjs/WasmStubs.cpp
> @@ +136,5 @@
> > +    // This is the third argument in the EntryFuncPtr prototype.
> > +    if (usesTlsPointer) {
> > +#if defined(JS_CODEGEN_X86)
> > +        masm.loadPtr(
> > +          Address(masm.getStackPointer(), EntryFrameSize + masm.framePushed() + 2 * sizeof(void*)),
> 
> Pre-existing, b/c we already do this for argv, but we really should be using
> ABIArgIter (like below) to determine where and how to load TLS reg from.

That would be nice, yes.
 
> Also, it might be good to move this loading of the TLS reg to right before
> the call so that we don't have to depend on TLSReg being disjoint from
> ABINonArgReturnReg(1,2) (which are used as scratch registers before the
> call).

Since the TLS pointer is arriving in an argument register, we have to capture it before setting up arguments for the call. IntArgReg2 would be clobbered otherwise.
 
> ::: js/src/asmjs/WasmTypes.h
> @@ +774,5 @@
> >      bool useSignalHandlersForInterrupt;
> >  
> > +    // Do WebAssembly functions take a hidden TLS pointer argument in
> > +    // `WasmTlsPointerReg`?
> > +    bool useTlsPointerArgument;
> 
> Totally fine to keep this in the patch for testing but I think by the time
> we'd land it we'd simply have it always on.  Or is there a platform
> dependency that might disable it?  I can't think of one.

No, it seems to work an all platforms. I thought you wanted a dynamic switch for this? Anyway, it's useful for performance testing.
WebAssembly functions now expect to be passed a hidden argument in WasmTlsReg
which is a pointer to a TlsData struct.

Temporarily allocate a TlsData instance in the wasm::Instance itself. When wasm
supports multithreading, we will need to allocate a TlsData instance per thread
per module instance.

This patch generates code to pass the TLS pointer to WebAssembly functions,
preserving it through intra-module calls. The pointer is not used for anything
yet, and the the TLS pointer register is not currently preserved across function
calls.
Attachment #8771112 - Flags: review?(luke)
Attachment #8761776 - Attachment is obsolete: true
Get the stack limit from TlsData::stackLimit instead of
SymbolicAddress::StackLimit. Since the TLS pointer register is available at
every function prologue, the over-recursion check is the same cost as using the
statically linked address.
Attachment #8771113 - Flags: review?(luke)
Attachment #8761777 - Attachment is obsolete: true
WebAssembly functions take a TLS pointer argument and now ensure that the
WasmTlsReg register has the same value when they return.

This is not yet exploited by the register allocator which still thinks that all
registers are clobbered by function calls.
Attachment #8771115 - Flags: review?(bbouvier)
- Add a virtual isCallPreserved() method to LNode which allows a call
  instruction to indicate that it preserves the values of some registers. Use
  this hook in BacktrackingAllocator when processing a call instruction.

- Add a preservesTlsReg() property to MAsmJSCall and use this to implement the
  LAsmJSCall::isCallPreserved() method.

- Mark intra-module WebAssembly calls as preserving the TLS pointer register.

This change allows the backtracking register allocator to leave the TLS pointer
register alone in small functions that don't need it for something else. There
are probably more improvements to be done if we need to split the live range of
the TLS pointer register. For example, BacktrackingAllocator::splitAcrossCalls()
will still split that live range at all calls.
Attachment #8771116 - Flags: review?(bhackett1024)
Treat WasmTlsReg as a pinned register in code generated by the baseline compiler.
This ensures that the register is not clobbered by any Wasm functions.

Use the stack limit stored in TlsData instead of the StackLimit symbolic address
in baseline prologues.
Attachment #8771117 - Flags: review?(lhansen)
This symbolic address is not used any longer. It has been replaced by the
stackLimit entry in the TlsData struct.
Attachment #8771118 - Flags: review?(luke)
Attachment #8771116 - Flags: review?(bhackett1024) → review+
Comment on attachment 8771117 [details] [diff] [review]
Use WasmTlsReg in baseline Wasm compiler.

Review of attachment 8771117 [details] [diff] [review]:
-----------------------------------------------------------------

Could you add a little prose to the block at the top of the file that defines ScratchRegX86 to note that in addition to not being one of the WasmTableCall registers, ScratchRegX86 should also not be WasmTlsReg?

::: js/src/asmjs/WasmBaselineCompile.cpp
@@ +6279,5 @@
>  #endif
> +    // In the baseline compiler we treat WasmTlsReg as a pinned register.
> +    // In the optimizing compiler it is not pinned, except as a fixed hidden
> +    // argument register.
> +    availGPR_.take(WasmTlsReg);

So this might be OK for right now but once we add 64-bit support on x86 it may actually cause us to be register-starved since we'll need two registers for every operand.  Right now there are six registers available on x86 (after the scratch register was taken out), giving us three pairs to play with.  If dropping down to five causes a lot of turbulence in the code generation it will probably be better to save on entry and restore on call-outs.  That said, it may be that scratchRegX86 can be part of a scratch-register pair and that we only need five registers to have three pairs, so we'll see.

(Hannes is working on 64-bit support.)
Attachment #8771117 - Flags: review?(lhansen) → review+
Comment on attachment 8771115 [details] [diff] [review]
Preserve TLS pointer register across calls.

Review of attachment 8771115 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, but I'm not too sure to understand why this is needed, if the register allocator knows that it must not clobber the TLS register? Won't this force reloads of the TLS register at the end of intra-function calls? Or is it because as calls will split live ranges, then the TLS register can be dead after a call till the end of a function?

function g() {}
function f() {
  // TLS alive
  g(); // TLS dead
  // MAsmJSVoidReturn in f() makes TLS live again?
}

::: js/src/jit/Lowering.cpp
@@ +4136,5 @@
>          MOZ_CRASH("Unexpected asm.js return type");
> +
> +    // Preserve the TLS pointer we were passed in `WasmTlsReg`.
> +    MDefinition* tlsPtr = ins->getOperand(1);
> +    if (tlsPtr->type() == MIRType::Pointer)

In which cases can't it be a pointer? It seems to be it could either be not defined (tlsPtr == nullptr) or the type be MIRType::Pointer.

@@ +4149,5 @@
> +    auto* lir = new(alloc()) LAsmJSVoidReturn;
> +
> +    // Preserve the TLS pointer we were passed in `WasmTlsReg`.
> +    MDefinition* tlsPtr = ins->getOperand(0);
> +    if (tlsPtr->type() == MIRType::Pointer)

ditto
Attachment #8771115 - Flags: review?(bbouvier) → review+
(In reply to Lars T Hansen [:lth] from comment #13)
> Comment on attachment 8771117 [details] [diff] [review]
> Use WasmTlsReg in baseline Wasm compiler.
> 
> Review of attachment 8771117 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Could you add a little prose to the block at the top of the file that
> defines ScratchRegX86 to note that in addition to not being one of the
> WasmTableCall registers, ScratchRegX86 should also not be WasmTlsReg?

Good idea, will do.

> ::: js/src/asmjs/WasmBaselineCompile.cpp
> @@ +6279,5 @@
> >  #endif
> > +    // In the baseline compiler we treat WasmTlsReg as a pinned register.
> > +    // In the optimizing compiler it is not pinned, except as a fixed hidden
> > +    // argument register.
> > +    availGPR_.take(WasmTlsReg);
> 
> So this might be OK for right now but once we add 64-bit support on x86 it
> may actually cause us to be register-starved since we'll need two registers
> for every operand.  Right now there are six registers available on x86
> (after the scratch register was taken out), giving us three pairs to play
> with.  If dropping down to five causes a lot of turbulence in the code
> generation it will probably be better to save on entry and restore on
> call-outs.  That said, it may be that scratchRegX86 can be part of a
> scratch-register pair and that we only need five registers to have three
> pairs, so we'll see.

Maybe the TLS pointer could be treated as a special kind of local on x86?
(In reply to Benjamin Bouvier [:bbouvier] from comment #14)
> Comment on attachment 8771115 [details] [diff] [review]
> Preserve TLS pointer register across calls.
> 
> Review of attachment 8771115 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks good, but I'm not too sure to understand why this is needed, if
> the register allocator knows that it must not clobber the TLS register?
> Won't this force reloads of the TLS register at the end of intra-function
> calls? Or is it because as calls will split live ranges, then the TLS
> register can be dead after a call till the end of a function?

The idea is to keep the TLS pointer in an SSA value between calls instead of pinning the register everywhere. In code with dense calls, this generates the same machine code as pinning the register once the register allocator understands that the TLS register is preserved by function calls. It will simply choose WasmTlsReg for that live range.

In computational code with high register pressure and no calls, however, the register allocator can split the range of the TLS pointer SSA value and spill it to the stack. This frees up the register where it's needed for something else.

So yes, if the body of a function has high register pressure, the TLS register will be used for computations and there will be a restore of the TLS pointer before returning. But if the register pressure is low, no spills or even move instructions are needed since the TLS pointer is already in the right register for returning.

> > +    if (tlsPtr->type() == MIRType::Pointer)
> 
> In which cases can't it be a pointer? It seems to be it could either be not
> defined (tlsPtr == nullptr) or the type be MIRType::Pointer.

Good catch. This code was left over from an earlier version of the patch where the use of a TLS pointer was controlled by a flag. I'll just delete the 'if'.
(In reply to Jakob Stoklund Olesen [:jolesen] from comment #15)
> (In reply to Lars T Hansen [:lth] from comment #13)
> > Comment on attachment 8771117 [details] [diff] [review]
> > Use WasmTlsReg in baseline Wasm compiler.
> > 
> > Review of attachment 8771117 [details] [diff] [review]:
> > -----------------------------------------------------------------
> 
> Maybe the TLS pointer could be treated as a special kind of local on x86?

Yes, I think so too.  The medium-term plan for the baseline compiler is to cache locals in registers in basic blocks, and to "pre-cache" (ie not eagerly spill) incoming register arguments, and that would work just fine for the TLS register.
Priority: -- → P1
Attachment #8771113 - Flags: review?(luke) → review+
Attachment #8771118 - Flags: review?(luke) → review+
Comment on attachment 8771112 [details] [diff] [review]
Pass a TLS pointer hidden argument to WebAssembly functions.

Review of attachment 8771112 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/asmjs/WasmInstance.cpp
@@ +422,5 @@
> +void
> +Instance::updateStackLimit(JSContext* cx)
> +{
> +    // Capture the stack limit for cx's thread.
> +    tlsData_.stackLimit = *(void**)cx->stackLimitAddressForJitCode(StackForUntrustedScript);

Looking at Gecko again, I think it's fine to set this once, in the Instance() constructor, passing a JSContext* arg.

::: js/src/asmjs/WasmIonCompile.cpp
@@ +1784,5 @@
>      const FuncImportGenDesc& funcImport = f.mg().funcImports[funcImportIndex];
>      const Sig& sig = *funcImport.sig;
>  
>      FunctionCompiler::CallArgs args(f, lineOrBytecode);
> +    if (!EmitCallArgs(f, sig, false, &args))

Hmm, I'd like to understand why the TLS pointer isn't passed for indirect calls.  Currently, in trunk, call_indirect always stays intra-module.  Even when inter-module support is added, my understanding is that we'll want the TLS reg live in WasmTlsReg at the callsite so, right before hand, we can push it, load the callee's TLS reg, load the callee's HeapReg/GlobalReg (hopefully not for much longer...), do the call, then after pop the TLS reg (preserving the same call invariant as normal calls) and restore HeapReg/GlobalReg.

I also don't understand how, if we don't pass the TLS reg, we don't crash: the callee is expecting WasmTlsReg to be live on entry, right?

::: js/src/asmjs/WasmStubs.cpp
@@ +132,5 @@
>      if (usesHeap)
>          masm.loadAsmJSHeapRegisterFromGlobalData();
>  
> +// Put the per-thread, per-module TLS pointer into WasmTlsReg.
> +// This is the third argument in the ExportFuncPtr prototype.

nit: comment needs indent
Comment on attachment 8771112 [details] [diff] [review]
Pass a TLS pointer hidden argument to WebAssembly functions.

Review of attachment 8771112 [details] [diff] [review]:
-----------------------------------------------------------------

Oops, I was misreading "EmitCallImport" as "EmitCallIndirect", hence the massive confusion.  Yes, this makes great sense, fantastic patch, thanks!
Attachment #8771112 - Flags: review?(luke) → review+
Tests are failing js\src\jit-test\tests\asm.js\testFunctionPtr.js on Windows XP 32-bit with baseline.
I can reproduce the x86 baseline crash of asm.js\testFunctionPtr.js on macOS. A baseline-generated prolog is crashing in the stack limit check because %esi (the WasmTlsReg on x86) is 0.

The problem, probably: The baseline compiler's emitCallImport() does not save the pinned WasmTlsReg, and the called imported function is not guaranteed to do so either. While %esi is a callee-saved register in the C calling convention, JIT code does not preserve it.

So we need a mechanism for allocating a stack slot for the TLS pointer so it can be saved during import calls. I may as well bite the bullet and treat it as a special kind of local as discussed above.
Jakob, can't you use the mechanism that is already there that is used for saving HeapReg and GlobalReg on other architectures?
Allocate an additional slot in localInfo_ and use it to save the incoming TLS
pointer. When setting up arguments for a function call, get the TLS pointer from
that local slot. Also preserve the TLS pointer register by reloading it before
returning.

This makes the Baseline ABI compatible with the Ion ABI.
Attachment #8773445 - Flags: review?(lhansen)
Attachment #8771117 - Attachment is obsolete: true
Lars: The new baseline patch simply stores the TLS pointer in a local slot and reloads it when needed. I didn't allocate a slot in the stk_ stack to avoid interfering with wasm semantics. I think if we want better register allocation for the TLS pointer, we should push it onto stk_ before anything else, and leave it there. I guess this is very similar to what you wanted to do for the normal incoming register arguments.

The TLS pointer should be able to participate in these future register allocation optimizations. Additionally, sync() could be taught that intra-module calls don't clobber WasmTlsReg.
Blocks: 1288944
Comment on attachment 8773445 [details] [diff] [review]
Use WasmTlsReg in baseline Wasm compiler.

Moving review to Luke since Lars is busy this week.
Attachment #8773445 - Flags: review?(lhansen) → review?(luke)
Comment on attachment 8773445 [details] [diff] [review]
Use WasmTlsReg in baseline Wasm compiler.

Review of attachment 8773445 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/asmjs/WasmBaselineCompile.cpp
@@ +6656,5 @@
>  
>      const ValTypeVector& args = func_.sig().args();
>  
> +    // localInfo_ contains an entry for every local in locals_, followed by
> +    // entries for speical locals. Currently the only special local is the TLS

s/speical/special/
Attachment #8773445 - Flags: review?(luke) → review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: