Closed Bug 1335068 Opened 7 years ago Closed 7 years ago

Wasm: Split the global allocation out of the CodeSegment

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: lth, Assigned: lth)

References

Details

Attachments

(1 file, 1 obsolete file)

Currently the data for global variables is allocated at the end of the CodeSegment representing the module (see WasmCode.h).  For tiering, we'll need to share the global data between what amounts to two independently compiled versions of the same module, and it can only be allocated at the end of at most one of those CodeSegments, so we might as well lift the restriction altogether.

This will likely require us to abandon rip-relative global variable accesses on x64.  So we need to decide whether to fall back on patching, as we have for x86, or a globals register, as we have for ARM, or to indirect via Tls (requiring a double load).  The decision affects both Baldr and Rabaldr.

Speculative: Rip-relative accessing might continue to work if we can force all code segments that need to reach a global data segment to be allocated in memory that is sufficiently close to the global data segment's memory.

(P1 because it blocks tiering, which is P1.)
Agreed this is what we have to do.

How about hanging the global data off the end of TlsData to avoid the indirection and the patching?  This codegen path should be ISA-agnostic and so you could use it for all platforms and thus easily kill x86 patching and ARM's GlobalReg, both of which were already goals.
Note to self:

Actually, "patching as for x86" also requires a modest distance from the patch point to the code segment; the x86 patching really just emulates RIP-relative loading but gets away with it by being a 32-bit architecture with 32-bit offsets.  The x64 still only has 32-bit offsets.

(In reply to Luke Wagner [:luke] from comment #1)

> How about hanging the global data off the end of TlsData to avoid the
> indirection and the patching?  This codegen path should be ISA-agnostic and
> so you could use it for all platforms and thus easily kill x86 patching and
> ARM's GlobalReg, both of which were already goals.

You mean allocating the global data at the end of the Tls (enlarging the Tls by the size of the global data)?  I agree this would work now and it's easy to do and pretty sweet, but presumably it will no longer work once we start sharing code across multiple threads, each with its own Tls.  Thoughts?
Blocks: 1277562
No longer blocks: 1277652
Regarding loading from the Tls via a double load:  I assume Ion allows us to express a load whose value will not change.  In that case, splitting the double load for a getglobal, or the load+store for a setglobal, into two MIR nodes should at least make it possible to let Ion keep the globals pointer in a register for stretches of code, avoiding reloads in tight code.  Not a panacea, to be sure, but an improvement.
Agreed that, when we have cross-thread-shared globals, we'll need the double-indirection you mention but that the second indirection should be imminently optimizable by Ion.
This appears to work and is fairly clean, but there are a couple of items to discuss (see next) and thus asking for feedback only.

(1) With a little bit of work it is now probably possible to merge the Baldr
    codegens for wasm getglobal and setglobal.  I'm inclined to do this work
    as a follow-up bug but it could be folded in here.  (The work is mostly
    in creating and exposing cross-platform abstractions; eg, load64 is not
    defined on x64 but is just a simple alias for existing functionality;
    eg, SIMD access is platform-specific but need not be for these simple
    cases.)

(2) I had to replace the code that loads NaN values for asm.js oob accesses
    to floating-point arrays.  Previously these loads were via the GlobalReg,
    which is now gone.  They could clearly go via the TlsReg, but then the
    TlsReg becomes an input to every asm.js heap load and store instruction,
    which sounds potentially worse.  To fix that I could break the asm.js
    load and store code up so that code higher up could decide whether to
    rematerialize the Tls register or not.  And so on.  Discuss.

(3) Mips compilation was already broken, but since I have not done anything for
    Mips in this patch it is now thoroughly broken.  Fixing (1) will alleviate
    that somewhat, assuming I do the Mips changes in (1)...
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Attachment #8831993 - Flags: feedback?(luke)
Comment on attachment 8831993 [details] [diff] [review]
bug1335068-move-wasm-globals.patch

Review of attachment 8831993 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, this is looking great!  Nice to see all that old global stuff go.  On your discussion points:
 1. I'd certainly like to see the codegen paths merged; whether it's a follow-up or folded-in is up to you.
 2. I think you made the right choice for the asm.js ARM nans; on first glance, the ma_vimm logic seems like it'll just be swapping a load from GlobalReg for a PC-relative load from a constant pool and so I'd expect this is the lowest-overhead solution.
 3. I often try to give a courtesy compile (with --enable-simulator=mips) but if it's already terribly broken, then it seems fine to land with best effort.  (Don't forget 'make check-masm' though :)

::: js/src/wasm/WasmCode.h
@@ +613,5 @@
>  
>  class Code
>  {
> +    const UniqueCodeSegment   segment_;
> +    const SharedGlobalSegment globals_;

Although wasm::Code and wasm::Instance are currently 1:1, the goal is for wasm::Code to be shared between a wasm::Module and all its wasm::Instances.  In this configuration, the GlobalSegment would be per-Instance.  Given that, I think perhaps GlobalSegment belongs in WasmInstance.h.  Also, I was thinking that it could be held by UniquePtr and that the GlobalSegment could own the raw TlsData allocation (replacing the current TlsData pointer).
Attachment #8831993 - Flags: feedback?(luke) → feedback+
This addresses all your comments on the previous iteration, and additionally incorporates the commoning of the wasm global load/store code (hence the patch is quite large).  It also builds on Mips32/64 if my patches on bug 1329650 are applied first - it was easy, since all that code had moved out of the Mips layer to the common layer.
Attachment #8831993 - Attachment is obsolete: true
Attachment #8832486 - Flags: review?(luke)
I see there are some minor addenda here: a commented-out line should be removed; an indentation change in class Code should be undone; and most importantly the definitions of NaN64GlobalDataOffset and NaN32GlobalDataOffset should be removed, which will necessitate a change to the MIPS code like the one I did for ARM.
Comment on attachment 8832486 [details] [diff] [review]
bug1335068-move-wasm-globals-v2.patch

Review of attachment 8832486 [details] [diff] [review]:
-----------------------------------------------------------------

Delightful!

::: js/src/wasm/WasmCode.h
@@ +80,1 @@
>      uint32_t codeLength() const { return codeLength_; }

nit: perhaps rename method and field to just 'length' now?

::: js/src/wasm/WasmInstance.cpp
@@ +863,5 @@
> +
> +/* static */ UniqueGlobalSegment
> +GlobalSegment::create(const LinkData& linkData)
> +{
> +    uint32_t globalDataLength = linkData.globalDataLength;

nit: could you perhaps take globalDataLength as the parameter instead?

@@ +872,5 @@
> +    if (!gs)
> +        return nullptr;
> +
> +    TlsData* tlsData =reinterpret_cast<TlsData*>(js_calloc(offsetof(TlsData, globalArea) +
> +                                                           globalDataLength, 1));

nit: space after '='; also I think the second arg to js_calloc isn't necessary

@@ +893,5 @@
> +    js_free(tlsData_);
> +}
> +
> +size_t
> +GlobalSegment::addSizeOfMisc(MallocSizeOf mallocSizeOf) const

nit: since this method isn't adding to anything, could it be named sizeOfMisc()?

::: js/src/wasm/WasmInstance.h
@@ +26,5 @@
>  namespace js {
>  namespace wasm {
>  
> +class GlobalSegment;
> +typedef UniquePtr<GlobalSegment> UniqueGlobalSegment;

nit: the pattern in Wasm*.h files is to put these Vector<T>/UniquePtr<T> right after (separated by \n) T's declaration (e.g. UniqueInstance below)

::: js/src/wasm/WasmModule.cpp
@@ +902,5 @@
>      auto codeSegment = CodeSegment::create(cx, code_, linkData_, *metadata_, memory);
>      if (!codeSegment)
>          return false;
>  
> +    auto globalSegment = GlobalSegment::create(linkData_);

need null check
Attachment #8832486 - Flags: review?(luke) → review+
https://hg.mozilla.org/mozilla-central/rev/a250e0834223
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: