Closed Bug 1422043 Opened 2 years ago Closed 2 years ago

wasm: Lazy entry stub generation for function tables

Categories

(Core :: JavaScript Engine: JIT, defect, P3)

Tracking

RESOLVED FIXED
mozilla60
Tracking Status
firefox59 --- wontfix
firefox60 blocking fixed

People

(Reporter: bbouvier, Assigned: bbouvier)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 21 obsolete files)

28.26 KB, patch
luke
: review+
Details | Diff | Splinter Review
3.76 KB, patch
luke
: review+
Details | Diff | Splinter Review
63.15 KB, patch
luke
: review+
Details | Diff | Splinter Review
17.05 KB, patch
luke
: review+
Details | Diff | Splinter Review
7.10 KB, patch
luke
: review+
Details | Diff | Splinter Review
61.22 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
At the moment, whenever a function is exported (explicitly via export, implicitly via table export), an interpreter entry is generated; after bug 1319203 lands, a jit entry will be generated too.

Each imported function gets its interpreter exit and its jit exit (used when it's called frequently enough).

We should investigate generating the entry/exit stubs lazily, at least for the jit ones, which might never get used; we could extend this to the interpreter ones as well.

It might also be nice to use a per-signature stub (per instance, or per process), to reduce the number of entries and exits.
See also bug 1399624 (which might be subsumed by this bug) on the size wins.
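To make the proposal concrete, here is a rough C++ sketch of the lazy path (names are illustrative stand-ins, not the actual SpiderMonkey API; std::unordered_map stands in for the real lookup structure):

    #include <cstdint>
    #include <unordered_map>

    // Illustrative stand-in; the real types live in js/src/wasm/.
    struct LazyStubs {
        std::unordered_map<uint32_t, void*> compiled;
        void* createOne(uint32_t funcIndex);  // compiles and registers a stub
        void* lookup(uint32_t funcIndex) {
            auto it = compiled.find(funcIndex);
            return it == compiled.end() ? nullptr : it->second;
        }
    };

    // Lazy (proposed): generate the entry stub the first time the function
    // is actually called from the outside, instead of eagerly for all
    // (possibly tens of thousands of) exported functions at compile time.
    void* getInterpEntry(LazyStubs& stubs, uint32_t funcIndex) {
        if (void* stub = stubs.lookup(funcIndex))
            return stub;                    // already compiled
        return stubs.createOne(funcIndex);  // compile on first use
    }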
Priority: -- → P3
We decided that the size cost of the new jit->wasm stubs had to be accompanied by a patch in this bug.

Scoping the issue a bit better here: after discussing with Luke and looking at some data (in AngryBots, 284 of 19069 exported functions are explicitly exported, the rest being exported through function tables; in ZenGarden, 499 of 50367), we concluded it would be enough to lazily compile entry stubs only for functions exported via tables (since getting/setting table entries happens from C++ anyway).
Assignee: nobody → bbouvier
Blocks: 1319203
Status: NEW → ASSIGNED
Summary: wasm: Lazy entry/exit stub generation → wasm: Lazy entry stub generation for function tables
Btw, with stub-generation lazy, I'm guessing there isn't much win to be had from the JitEntryOol and JitEntryException shared stubs and, rather, some complexity (linking to them from the lazy stubs), so I'm assuming this bug's patches will remove them?
Indeed, that needs to be considered (I figured this out during coding too :)).
Attached patch 1.split.patch (obsolete) — Splinter Review
(WIPpy)
Attached patch 2.wip.patch (obsolete) — Splinter Review
(wippy)
Attached patch 1.refactor.patch (obsolete) — Splinter Review
Attachment #8945147 - Attachment is obsolete: true
Attached patch 2.lazystubs.patch (obsolete) — Splinter Review
TODO:
- handle tiering
- probably some work with sync/async stack iteration
- clean up
Attachment #8945148 - Attachment is obsolete: true
Attached patch 1.refactor.patch (obsolete) — Splinter Review
(rebased)
Attachment #8945856 - Attachment is obsolete: true
Attached patch 2.lazy.patch (obsolete) — Splinter Review
(rebased)
Attachment #8945857 - Attachment is obsolete: true
Attachment #8948747 - Attachment is obsolete: true
Attachment #8949483 - Flags: review?(luke)
Attached patch 2.lazystubs.patch (obsolete) — Splinter Review
still a WIP.
Attachment #8948748 - Attachment is obsolete: true
Gave this some thought:

- there can be 2 threads writing lazy entries (the main thread requesting a lazy JSFunction exported in a table + the background thread finishing tier2 compilation), so we need locking here.
- there can be one thread writing a lazy stub (the background thread finishing compilation for a function that has been lazily exported) while the main thread is reading it (calling into the lazy JSFunction exported before), so we need locking here too.
- there are 2 writers and one reader, but at any time there can be either 2 writers, or 1 writer/1 reader.

So I *think* one possible implementation (and the one I'll do) is to have two mutexes, one for writers, and one for readers. When creating a new stub, we'll need to take the reader mutex too.

Other thoughts about the needs of a LazyStubs structure in wasm::Code:

- needs to perform stub lookup from a function index fast, to 1) know whether we have already compiled a given function's stub, and 2) get the interpreter entry address from a function index.
- needs to perform code range lookup from any PC fast, for range lookup happening in wasm::Code::lookupRange.

So we might need two vectors: one ordered by function index, containing structs that hold the stub address, etc.; another ordered by LazyCodeSegment's base address, containing a struct of { LazyCodeSegment, index in first vector }. This way we can perform binary search in both of them and have both lookups be fast.
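A minimal C++ sketch of that layout (names hypothetical; std::vector stands in for the real js::Vector types):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct LazyCodeSegment { uint8_t* base; size_t length; };

    // Vector #1: sorted by function index; answers "is there a stub for
    // this function?" and "what is its interpreter entry address?".
    struct LazyFuncStub { uint32_t funcIndex; uint8_t* interpEntry; };

    // Vector #2: sorted by segment base address; stubIndex points back
    // into vector #1 so a PC lookup can recover the function.
    struct SegmentEntry { LazyCodeSegment segment; size_t stubIndex; };

    struct LazyStubs {
        std::vector<LazyFuncStub> byFuncIndex;
        std::vector<SegmentEntry> byBase;

        const LazyFuncStub* lookupFunc(uint32_t funcIndex) const {
            auto it = std::lower_bound(
                byFuncIndex.begin(), byFuncIndex.end(), funcIndex,
                [](const LazyFuncStub& s, uint32_t i) { return s.funcIndex < i; });
            return it != byFuncIndex.end() && it->funcIndex == funcIndex
                   ? &*it : nullptr;
        }

        const SegmentEntry* lookupPC(const uint8_t* pc) const {
            auto it = std::upper_bound(
                byBase.begin(), byBase.end(), pc,
                [](const uint8_t* p, const SegmentEntry& e) { return p < e.segment.base; });
            if (it == byBase.begin())
                return nullptr;
            --it;
            return pc < it->segment.base + it->segment.length ? &*it : nullptr;
        }
    };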

Luke, do these two ideas sound good to you, or do you see anything better we could do? (the current WIP patch has racy lazy stubs writing + inefficient lazy code segment storage, but it can help understanding the requirements here)
Flags: needinfo?(luke)
Attached patch 2.wip.patch (obsolete) — Splinter Review
(rebased)
Attachment #8949484 - Attachment is obsolete: true
Comment on attachment 8949483 [details] [diff] [review]
1.refactoring.patch

Review of attachment 8949483 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for splitting into a separate patch!  How about following through the distinction you've already established in this patch of "module segment" vs. "lazy stubs segment" and rename CodeSegment to ModuleSegment and then rename CodeSegmentBase to CodeSegment?  Then later, I assume, there is a LazyStubsSegment.  That's all mechanical, so rs=me if you agree.

::: js/src/wasm/WasmCode.h
@@ +52,5 @@
>  
>  typedef RefPtr<ShareableBytes> MutableBytes;
>  typedef RefPtr<const ShareableBytes> SharedBytes;
>  
> +// Executable code must be deallocated specially.

nit: to match the top-level .h style, could you put a \n after this comment?

@@ +62,5 @@
> +};
> +
> +using UniqueCodeBytes = UniquePtr<uint8_t, FreeCode>;
> +
> +class CodeSegment;

Could you add a comment before this (with \n before and after) explaining what a CodeSegmentBase is and specifically mention the two subtypes of CodeSegmentBase?

@@ +96,5 @@
> +    ~CodeSegmentBase();
> +
> +    bool isLazyStubs() const { return kind_ == Kind::LazyStubs; }
> +    bool isModule() const { return kind_ == Kind::Module; }
> +    const CodeSegment* ofModule() const { MOZ_ASSERT(isModule()); return (CodeSegment*) this; }

nit: we usually name these asserted-down-casts with "as".  "of" made me expect this was a cast like Foo::of(bar) which is a cast from Bar to Foo.
Attachment #8949483 - Flags: review?(luke) → review+
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/50145ed82626
Split CodeSegment into two classes for future usage; r=luke
Keywords: leave-open
Discussed with Luke:

- we can remove the lookup(funcIndex) -> pc by using LookupCodeSegment (and introduce a new outparam CodeRange when we need it).
- in tier2 background compilation, grouping entry stubs in the same LazyCodeSegment makes sense.
- having two locks (one for tier1 lazy stubs, one for tier2 lazy stubs) helps solve one race condition: 1. tier2 is finishing, so it recompiles all the stubs already lazily generated for tier1; 2. tier1 requests a new lazy stub that tier2 doesn't know about; in this case, the tier1 jit entry would always be used (and the stub lookup of the best-tier interpreter entry in the C++ WasmCall would fail). See the lock-ordering sketch after this list.
- we want to assert that once we've started tier2 compilation of stubs, we're not generating any new tier1 stubs.
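A rough sketch of that locking discipline (std::mutex stands in for the real ordered js::Mutex; names are hypothetical):

    #include <cstdint>
    #include <mutex>

    std::mutex tier1LazyStubsLock;
    std::mutex tier2LazyStubsLock;

    // Background thread finishing tier2: taking the tier1 lazy stubs lock
    // first blocks any new tier1 stub from being created mid-recompilation;
    // everything already generated for tier1 is then regenerated at tier2.
    void recompileLazyStubsAtTier2() {
        std::lock_guard<std::mutex> t1(tier1LazyStubsLock);
        std::lock_guard<std::mutex> t2(tier2LazyStubsLock);
        // ... regenerate all lazily created tier1 stubs at tier2 ...
    }

    // The main thread requesting a new stub only takes the lock of the
    // current best tier; whichever side takes tier1LazyStubsLock first
    // wins the race described in the third bullet above.
    void createLazyStubAtTier1(uint32_t funcIndex) {
        std::lock_guard<std::mutex> t1(tier1LazyStubsLock);
        // ... generate the tier1 stub for funcIndex ...
    }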
Flags: needinfo?(luke)
Simple optimization we discussed.
Attachment #8949800 - Attachment is obsolete: true
Attachment #8950646 - Flags: review?(luke)
Attached patch 3.refactoring.patch (obsolete) — Splinter Review
(refactoring WIP, to put all the data that belongs to a given tier into one place, CodeTier)
Comment on attachment 8950646 [details] [diff] [review]
2.pointer-arith.patch

Review of attachment 8950646 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome, thanks!
Attachment #8950646 - Flags: review?(luke) → review+
Attached patch 3.refactor-tier.patch (obsolete) — Splinter Review
Attachment #8950648 - Attachment is obsolete: true
Comment on attachment 8951017 [details] [diff] [review]
3.refactor-tier.patch

Review of attachment 8951017 [details] [diff] [review]:
-----------------------------------------------------------------

Except for the LinkData part that I can easily revert, I think it's pretty ready for review.

::: js/src/wasm/WasmCode.h
@@ +460,5 @@
> +// LinkData is built incrementally by ModuleGenerator and then stored immutably
> +// in Module. LinkData is distinct from Metadata in that LinkData is owned and
> +// destroyed by the Module since it is not needed after instantiation; Metadata
> +// is needed at runtime.
> +// TODO check comment^

While the logic of putting LinkDataTier into CodeTier was appealing, it's actually a clear memory regression in the case where a Module is freed while its Instances aren't, because now we'd keep the LinkData in CodeTier thus in Code (owned by Instance). Should I revert this change?
Attachment #8951017 - Flags: review?(luke)
Comment on attachment 8951017 [details] [diff] [review]
3.refactor-tier.patch

Review of attachment 8951017 [details] [diff] [review]:
-----------------------------------------------------------------

Ahhhh, this change feels really nice.  Thanks!

::: js/src/wasm/WasmCode.cpp
@@ +551,4 @@
>  {
> +#define cp(x) \
> +    if (!x.appendAll(src.x)) \
> +        return false

nit: I'm not sure the brevity win is worth the general grossness of using a macro.  Some functions are just a big sequence of "if (!...)\nreturn false" and I think that's ok.

@@ +797,5 @@
> +
> +const uint8_t*
> +CodeTier::deserialize(const uint8_t* cursor, const SharedBytes& bytecode, Metadata& metadata)
> +{
> +    metadata_ = Move(js::MakeUnique<MetadataTier>(Tier::Serialized));

nit: Move() shouldn't be necessary here and below because an rval (temporary value, incl the return value of a function) is implicitly convertible to &&.

@@ +899,3 @@
>  {
> +    MOZ_RELEASE_ASSERT(!hasTier2());
> +    MOZ_RELEASE_ASSERT(tier2->tier() == Tier::Ion && tier1_->tier() != Tier::Ion);

nit: preexisting: I think we can be more specific and say tier() == Tier::Baseline

::: js/src/wasm/WasmCode.h
@@ +246,5 @@
>          return pod.interpEntryOffset_;
>      }
>  
> +    bool clone(const FuncExport& src) {
> +        PodCopy(&pod, &src.pod, 1);

nit: PodAssign() (here and below)

@@ +460,5 @@
> +// LinkData is built incrementally by ModuleGenerator and then stored immutably
> +// in Module. LinkData is distinct from Metadata in that LinkData is owned and
> +// destroyed by the Module since it is not needed after instantiation; Metadata
> +// is needed at runtime.
> +// TODO check comment^

Yes, agreed.

@@ +535,5 @@
> +// built during module generation and then immutably stored in a Code.
> +
> +class CodeTier
> +{
> +    Tier tier_;

nit: can you 'const' and align member fields like MetadataTier?

::: js/src/wasm/WasmGenerator.cpp
@@ +1003,5 @@
> +    if (!codeTier)
> +        return nullptr;
> +
> +    metadataTier_ = nullptr;
> +    linkDataTier_ = nullptr;

nit: after Move(), this shouldn't be necessary.  For bug-catching properties, Move() of a UniquePtr ends up nulling out the source.

::: js/src/wasm/WasmProcess.cpp
@@ +234,5 @@
>  const Code*
>  wasm::LookupCode(const void* pc)
>  {
>      const CodeSegment* found = LookupCodeSegment(pc);
> +    return found ? &found->codeTier().code() : nullptr;

nit: how about re-adding a "code()" helper so this and all the changes in WasmSignalHandlers.cpp/WasmFrameIter.cpp aren't necessary?
Attachment #8951017 - Flags: review?(luke) → review+
Attached patch 3.refactor-tier-updated.patch (obsolete) — Splinter Review
Updated patch; I had to add a new method LinkData::initCode(), because now we use SharedCode in LinkData to know whether tier2 has been committed or not; and setting it up is deferred (because LinkData is created at the same time the Generator is, but the Code is created much later in Generator::finish).
Attachment #8951218 - Flags: review+
Re-posting: we can even remove the need for LinkData::initCode and a few other functions by:

1. having callers check Code::hasTier2 or not
2. having the Generator own LinkDataTier during construction and then pass it to the Module ctor by Move (no more "takeLinkData" for tier2)
3. deserialize using LinkDataTier::deserialize and not LinkData::deserialize, so the same idiom described in 2 can be applied
4. arguably, LinkData::serializedSize wasn't counting the tier2 data size if !hasTier2(), which was wrong in my opinion; the data is there and does take memory, even if it's unused at the moment.

Luke, can you please check that nothing wrong showed up in between? Thanks!
Attachment #8951017 - Attachment is obsolete: true
Attachment #8951218 - Attachment is obsolete: true
Attachment #8951304 - Flags: review?(luke)
Attachment #8951305 - Flags: review?(luke)
Comment on attachment 8951304 [details] [diff] [review]
3.refactor-tier.patch

Review of attachment 8951304 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: js/src/wasm/WasmModule.cpp
@@ +154,5 @@
>  
>  const uint8_t*
>  LinkData::deserialize(const uint8_t* cursor)
>  {
> +    MOZ_CRASH("use LinkDataTier::deserialize instead");

nit: for regularity with the overall deserialize() pattern, could this method stay unmodified and have Module keep calling it like normal?  The method will need to create the MakeUnique<LinkDataTier>(), but deserialize() is fallible since dynamic allocation is generally necessary in deserialize().

@@ +162,5 @@
>  LinkData::sizeOfExcludingThis(MallocSizeOf mallocSizeOf) const
>  {
>      size_t sum = 0;
> +    if (linkData1_)
> +        sum += linkData1_->sizeOfExcludingThis(mallocSizeOf);

nit: iiuc, no 'if' needed for linkData1_
Attachment #8951304 - Flags: review?(luke) → review+
Comment on attachment 8951305 [details] [diff] [review]
4.refactor-funccoderange.patch

Review of attachment 8951305 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent!

::: js/src/vm/Stack.cpp
@@ +1775,1 @@
>              return false;

nit: could even fold into a single "if (!wasm::LookupCode...)"
Attachment #8951305 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #28)
> Comment on attachment 8951304 [details] [diff] [review]
> 3.refactor-tier.patch
> 
> Review of attachment 8951304 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks!
> 
> ::: js/src/wasm/WasmModule.cpp
> @@ +154,5 @@
> >  
> >  const uint8_t*
> >  LinkData::deserialize(const uint8_t* cursor)
> >  {
> > +    MOZ_CRASH("use LinkDataTier::deserialize instead");
> 
> nit: for regularity with the overall deserialize() pattern, could this
> method stay unmodified and have Module keep calling it like normal?  The
> method will need to create the MakeUnique<LinkDataTier>(), but deserialize()
> is fallible since dynamic allocation is generally necessary in deserialize().

Can't really in the current state, because now the Module ctor takes a UniqueLinkData*Tier* parameter that gets Moved. I could add a Module ctor that takes LinkData&& instead for this purpose, but it sounded overkill.
Actually nevermind: I can get around it.
Abusing ExitReason for the interpreter entry and avoiding one LookupCode in ProfilingFrameIterator::operator++, as discussed on irc.
Attachment #8951350 - Flags: review?(luke)
Comment on attachment 8951350 [details] [diff] [review]
5.interpentry-no-lookup.patch

Review of attachment 8951350 [details] [diff] [review]:
-----------------------------------------------------------------

Great!
Attachment #8951350 - Flags: review?(luke) → review+
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bdeee319fd1a
Use pointer arithmetic to determine a wasm function index; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/12219bfe0748
Put all tiered data into one new data structure wasm::CodeTier; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/b0db89ec8e97
Repurpose Code::lookupRange to only target function code ranges; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/a274eb9c8f1f
Don't use LookupCode for the interpreter entry in wasm profiling iteration; r=luke
Attached patch wip.patch (obsolete) — Splinter Review
Attached patch lazy-stub-generation.patch (obsolete) — Splinter Review
(for self-review purposes)
Attachment #8951654 - Attachment is obsolete: true
Attached patch lazy-stub-generation.patch (obsolete) — Splinter Review
Attachment #8952756 - Attachment is obsolete: true
Attached patch lazy-stub-generation.patch (obsolete) — Splinter Review
Now with comments! And ready for review, I think. I've struggled a bit to make a nice API for LazyStubTier, so there's room for improvement there.
Attachment #8952757 - Attachment is obsolete: true
Attachment #8952765 - Flags: review?(luke)
Some benchmarking of entry stubs memory consumption and entry stubs compilation time for AngryBots and ZenGarden.

* Before jit entries landed (i.e. before bug 1319203):
- AngryBots:
  - entry stubs compilation time: 6.80ms
  - memory: 1.82 MB
- ZenGarden:
  - entry stubs compilation time: 19ms
  - memory: 4.69 MB

* After jit entries landed, with always eager stubs generation:
- AngryBots:
  - time: 28ms
  - memory: 12.19 MB
- ZenGarden:
  - time: 91ms
  - memory: 24.2 MB

* With jit entries and lazy stub generation:
- AngryBots:
  - time: 0.77ms
  - memory: 262 KB
- ZenGarden:
  - time: 1.17ms
  - memory: 362 KB

Some interesting stats: when running the Tanks/Godot programs in a browser, with special spew indicating when lazy stubs have been generated, I see only three of them in the worst case, for both programs, consuming up to 1.5 KB maximum. So, considering this data, we could probably lower the default size of a lazy stub segment.
Attached patch lazy-stub-generation.patch (obsolete) — Splinter Review
Now with static analysis builds working!
Attachment #8952765 - Attachment is obsolete: true
Attachment #8952765 - Flags: review?(luke)
Attachment #8952801 - Flags: review?(luke)
Attached patch lazy-stub-generation.patch (obsolete) — Splinter Review
(sorry for the noise) Slight simplification in LazyStubTier to make the code less hairy and more readable.
Attachment #8952801 - Attachment is obsolete: true
Attachment #8952801 - Flags: review?(luke)
Attachment #8953022 - Flags: review?(luke)
Attached patch testing-tier2.patch (obsolete) — Splinter Review
An additional patch to test two-tiered compilation, with the following primitives:

- js shell only hasTier2CompilationSupport function (false if we don't enable baseline/ion, or we're debugging, or don't have threads)
- a new setJitCompilerOption option for always requesting tier 2 compilation, if we have support
- js shell only hasTier2CompilationCompleted(module) function, returning true if the module has reached tier2 compilation.

To make this actually useful, I needed to add a fake delay of 1 second when compiling with the new jit compiler option, otherwise the predicate hasTier2CompilationCompleted would return true on the first call. This is a bit messy, but this introduces testing support, which is better than no testing support, in my opinion.

Also testing it in the test introduced in the previous patch, where it would have helped catch issues during development.
Attachment #8953455 - Flags: review?(luke)
Attachment #8953455 - Flags: feedback?(lhansen)
Comment on attachment 8953455 [details] [diff] [review]
testing-tier2.patch

Review of attachment 8953455 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmGenerator.cpp
@@ +1061,5 @@
>          return false;
>  
> +    if (MOZ_UNLIKELY(JitOptions.wasmAlwaysRequestTier2)) {
> +        // Introduce an artificial delay when testing wasmAlwaysRequestTier2,
> +        // since we want to exerce both tier1 and tier2 code in this case.

"exercise"

@@ +1063,5 @@
> +    if (MOZ_UNLIKELY(JitOptions.wasmAlwaysRequestTier2)) {
> +        // Introduce an artificial delay when testing wasmAlwaysRequestTier2,
> +        // since we want to exerce both tier1 and tier2 code in this case.
> +        auto wakeup = TimeStamp::Now() + TimeDuration::FromSeconds(1.0);
> +        while (TimeStamp::Now() < wakeup) {}

If we want this for shell-only we have watchdog timers we can use instead, see eg sleep() in the shell.  Not sure what we can do for Firefox, maybe ask fitzgen or njfroyd.
Attachment #8953455 - Flags: feedback?(lhansen) → feedback+
Depends on: 1441554
Comment on attachment 8953022 [details] [diff] [review]
lazy-stub-generation.patch

Review of attachment 8953022 [details] [diff] [review]:
-----------------------------------------------------------------

Fantastic patch; the complexity of the problem being solved here and the relative simplicity and typiness of the code is impressive :)

I found only one non-nit, involving oom failure, that I'd like to see again in the new version of the patch.  I can re-review very quickly with that fixed so we make the code freeze.

::: js/src/vm/MutexIDs.h
@@ +29,5 @@
>                                        \
>    _(WasmInitBuiltinThunks,       450) \
> +  _(WasmLazyStubsTier1,          450) \
> +                                      \
> +  _(WasmLazyStubsTier2,          475) \

nit: to show the close relationship, how about 451, putting the \n before Tier1 and removing the \n between Tier1 and Tier2?

Also: could these be 500/501?  I was wondering which of the 500-level locks are taken and wondering if, based on the other comment, we could use the lock order to assert WasmModuleTieringLock isn't held while these two are.

::: js/src/wasm/WasmCode.cpp
@@ +567,5 @@
> +
> +    bytes_ = Move(codeBytes);
> +    length_ = length;
> +
> +    ExecutableAllocator::cacheFlush(bytes_.get(), RoundupCodeLength(length_));

I think this cacheFlush() is unnecessary since no code has been written yet.

@@ +606,5 @@
> +    return ret;
> +}
> +
> +bool
> +LazyStubSegment::appendFuncCodeRanges(CodeRange&& interpRange, CodeRange&& jitRange,

nit: CodeRange is a POD type so passing 'const CodeRange&' should be fine here

@@ +651,5 @@
> +        return uintptr_t(stubs_[index]->base());
> +    }
> +};
> +
> +static const unsigned LAZY_STUB_LIFO_DEFAULT_CHUNK_SIZE = 8 * 1024;

nit: could you move this def down to right before createMany()?

@@ +663,5 @@
> +
> +    MOZ_ALWAYS_FALSE(BinarySearch(ProjectBase(stubs_), 0, stubs_.length(),
> +                                  uintptr_t(newSeg->base()), stubIndex));
> +    if (!stubs_.insert(stubs_.begin() + *stubIndex, Move(newSeg)))
> +        return nullptr;

As best I can tell, we don't take advantage of the sorted property of stubs_ anywhere.  If that's right, could you remove ProjectBase and the BinarySearch()/insert()?

@@ +667,5 @@
> +        return nullptr;
> +    return stubs_[*stubIndex].get();
> +}
> +
> +static constexpr size_t DEFAULT_LAZY_STUB_SEGMENT_SIZE = 64 * 1024;

How about using ExecutableCodePageSize instead and removing this constant?

@@ +713,5 @@
> +    size_t codeLength = LazyStubSegment::AlignBytesNeeded(masm.bytesNeeded());
> +
> +    LazyStubSegment* segment = nullptr;
> +    if (funcExportIndices.length() > 1) {
> +        segment = createNewSegment(codeTier, codeLength, stubIndex);

Why is this length()>1 special case needed?  It seems like the 'else' case would handle the general case nicely.  In fact, under the hood, AllocateCodeBytes() will round up 'codeLength' to 64kb *anyhow*, which means that the length()>1 case will be wasting 64kb - codeLength, whereas the 'else' clause will be able to reuse the 64kb for future lazy stubs.

If that's all correct, and given my above comment about stubs_ not needing to be sorted, I think you could inline createNewSegment() into its one callsite, which would simplify things a tad.

@@ +756,5 @@
> +
> +        size_t exportIndex;
> +        MOZ_ALWAYS_FALSE(BinarySearch(ProjectLazyFuncIndex(exports_), 0, exports_.length(),
> +                                      fe.funcIndex(), &exportIndex));
> +        if (!exports_.insert(exports_.begin() + exportIndex, Move(lazyExport)))

I'm worried about the failure case where we oom in the middle of the loop; it might leave us in an inconsistent state where exports are registered but the code ultimately hasn't been initialized.

I was thinking: maybe most of this logic could be moved into a single LazyStubSegment::create() that does all of this (removing the need for, e.g., appendFuncCodeRanges()), so that, at the end, we had a fully-initialized LazyStubSegment that was ready to go and the update of exports_ could be done in one go (using reserve() to ensure no oom-in-the-middle).
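Concretely, the all-or-nothing exports_ update could look something like this (a sketch; the helper name is made up):

    #include "mozilla/Vector.h"

    // After a successful reserve(), infallibleAppend() cannot fail, so an
    // oom can no longer leave 'dst' half-updated.
    template <typename T>
    static bool
    AppendAllOrNothing(mozilla::Vector<T>& dst, const mozilla::Vector<T>& src)
    {
        if (!dst.reserve(dst.length() + src.length()))
            return false;  // dst is unchanged on failure
        for (const T& t : src)
            dst.infallibleAppend(t);
        return true;
    }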

@@ +834,5 @@
> +    MOZ_ALWAYS_TRUE(BinarySearch(ProjectLazyFuncIndex(exports_), 0, exports_.length(), funcIndex,
> +                    &match));
> +    const LazyFuncExport& fe = exports_[match];
> +    const LazyStubSegment& stub = *stubs_[fe.lazyStubIndex];
> +    return stub.base() + stub.codeRanges()[fe.codeRangeIndex].begin();

nit: can you rename this field to interpCodeRangeIndex so the name matches throughout?

::: js/src/wasm/WasmCode.h
@@ +228,5 @@
>      } pod;
>  
>    public:
>      FuncExport() = default;
> +    explicit FuncExport(Sig&& sig, uint32_t funcIndex, bool isExplicit)

nit: maybe name parameter hasEagerStubs?

@@ +261,5 @@
>          return pod.codeRangeIndex_;
>      }
>      uint32_t interpEntryOffset() const {
>          MOZ_ASSERT(pod.interpEntryOffset_ != UINT32_MAX);
> +        MOZ_ASSERT(hasEagerStubs());

nit: can you rename to eagerInterpEntryOffset()?

@@ +510,5 @@
> +
> +class LazyStubSegment : public CodeSegment
> +{
> +    CodeRangeVector codeRanges_;
> +    size_t finger_;

nit: I see the name is used in Ion, but what do you think about more descriptive term like "usedBytes_"?

@@ +512,5 @@
> +{
> +    CodeRangeVector codeRanges_;
> +    size_t finger_;
> +
> +    static constexpr size_t MPROTECT_BOUNDARY = 4 * 1024;

nit: how about MPROTECT_PAGE_SIZE?

@@ +531,5 @@
> +    const CodeRange* lookupRange(const void* pc) const;
> +
> +    void addSizeOfMisc(MallocSizeOf mallocSizeOf, size_t* code, size_t* data) const;
> +  private:
> +    bool initialize(UniqueCodeBytes codeBytes, size_t length);

nit: could this be above the 'public:'?

@@ +544,5 @@
> +
> +struct LazyFuncExport
> +{
> +    size_t funcIndex;
> +    size_t lazyStubIndex;

nit: based on following comment, could this be lazyStubSegmentIndex_?

@@ +564,5 @@
> +// stubs for tier2 at any time.
> +
> +class LazyStubTier
> +{
> +    LazyStubSegmentVector stubs_;

nit: how about stubSegments_ (in a few cases I was confused reading the code because 'stubs_' made me think individual stubs and this is one level higher)

@@ +566,5 @@
> +class LazyStubTier
> +{
> +    LazyStubSegmentVector stubs_;
> +    LazyFuncExportVector exports_;
> +    size_t lastStubIndex_;

nit: similarly, lastStubSegmentIndex_?

@@ +588,5 @@
> +    // commitJitEntries() is actually called, after the Code owner has
> +    // committed tier2.
> +    bool createTier2(const Uint32Vector& funcExportIndices, const CodeTier& codeTier,
> +                     Maybe<size_t>* stubIndex);
> +    void commitJitEntries(const Maybe<size_t>& stubIndex, const Code& code);

nit: similarly, stubSegmentIndex (here and in the impls)

@@ +596,5 @@
> +  private:
> +    LazyStubSegment* createNewSegment(const CodeTier& codeTier, size_t codeLength,
> +                                      size_t* stubIndex);
> +    bool createMany(const Uint32Vector& funcExportIndices, const CodeTier& codeTier,
> +                    size_t* stubIndex);

nit: could these be above 'public'?

@@ +648,5 @@
>          MOZ_ASSERT(!code_);
>          code_ = code;
>      }
>  
> +    const ExclusiveData<LazyStubTier>& lazyStubs() const { return lazyStubs_; }

nit: to maintain relative order, could you put this between tier() and metadata() below (no \n's before or after)?

::: js/src/wasm/WasmGenerator.cpp
@@ +288,5 @@
>      // deduplicated, so use an intermediate vector to sort and de-duplicate.
>  
> +    static_assert((uint64_t(MaxFuncs) << 1) < uint64_t(UINT32_MAX), "bit packing won't work");
> +
> +    class ExportFuncIndex {

Wow, the future has arrived; local classes in templates!
nit: how about calling it ExportedFunc since the struct contains more than just a function index.

::: js/src/wasm/WasmJS.cpp
@@ +1149,5 @@
> +    // If the best tier is Baseline, there could be a background compilation
> +    // happening at the same time.
> +    // - Either we take the tier1 lock before the background compilation takes it,
> +    //   then we generate the lazy stub for tier1. When background threads
> +    //   hold on tier1, it will see it has a lazy stub and will recompile it

nit: could you expand the explanation to cover how the background tier2-generator threads grab the tier locks in a particular order?

@@ +1167,5 @@
> +    const CodeTier& codeTier = instance.code(tier);
> +    if (tier == prevTier)
> +        return stubs->createOne(funcExportIndex, codeTier);
> +
> +    auto stubs2 = instance.code(tier).lazyStubs().lock();

Can you MOZ_ASSERT(prevTier == Tier::Baseline && tier == Tier::Ion)?

::: js/src/wasm/WasmModule.cpp
@@ +246,3 @@
>  Module::finishTier2(UniqueLinkDataTier linkData2, UniqueCodeTier tier2, ModuleEnvironment* env2)
>  {
> +    {

nit: pre-existing: could we establish some context here by asserting the current codeTier's tier is Tier::Baseline and tier2's tier is Tier::Ion?

@@ +282,5 @@
> +        for (uint32_t i = 0; i < elemSegments_.length(); i++)
> +            elemSegments_[i].setTier2(Move(env2->elemSegments[i].elemCodeRangeIndices(Tier::Ion)));
> +
> +        // Now that all the code and metadata is valid, make tier 2 code
> +        // visible and unblock anyone waiting on it.

Could you extend the comment to explain the ordering dependency: commitTier2 -> commitJitEntries -> notifyCompilationListeners.  In particular, I was wondering why notifyCompilationListeners() couldn't go right after commitTier2() (as it did before) until I read its comment about not holding locks while calling callbacks.

@@ +285,5 @@
> +        // Now that all the code and metadata is valid, make tier 2 code
> +        // visible and unblock anyone waiting on it.
> +
> +        code().commitTier2();
> +        stubs2->commitJitEntries(stub2Index, code());

nit: perhaps rename setJitEntries() to match below?  Because in theory we could do this racily (like the rest of the  setJitEntry() calls) except we need the LazyStubs' lock.

::: js/src/wasm/WasmProcess.cpp
@@ +230,5 @@
>  {
>      if (const CodeSegment* found = processCodeSegmentMap.lookup(pc)) {
> +        if (cr) {
> +            *cr = found->isModule()
> +                  ? found->codeTier().lookupRange(pc)

I was expecting to see 'found->asModule()->...' which made me realize that probably codeTier (the field and method) should be in ModuleSegment.  Technically it makes sense for a LazyCodeSegment to have a CodeTier, but it would be nice if, e.g., the change in this file was forced by the type system.  If the CodeTier of a LazyCodeSegment is necessary, it seems like it could be given a suggestive method name like associatedCodeTier().

::: js/src/wasm/WasmStubs.h
@@ +37,5 @@
>                const FuncExportVector& exports, CompiledCode* code);
>  
> +extern bool
> +GenerateLazyEntryStub(size_t funcExportIndex, const FuncExport& funcExport, void* callee,
> +                      jit::MacroAssembler& masm, CodeRangeVector* codeRanges);

nit: for symmetry with GenerateBuiltinThunk() and all the static Generate* in WasmStubs.cpp, could 'masm' be the first arg?  Also, could it be "Stubs" (plural)?
Attachment #8953022 - Flags: review?(luke)
Thanks for the review!

(In reply to Luke Wagner [:luke] from comment #45)
> ::: js/src/vm/MutexIDs.h
> @@ +29,5 @@
> >                                        \
> >    _(WasmInitBuiltinThunks,       450) \
> > +  _(WasmLazyStubsTier1,          450) \
> > +                                      \
> > +  _(WasmLazyStubsTier2,          475) \
> 
> nit: to show the close relationship, how about 451, putting the \n before
> Tier1 and removing the \n between Tier1 and Tier2?
> 
> Also: could these be 500/501?  I was wondering which of the 500-level locks
> are taken and wondering if, based on the other comment, we could use the
> lock order to assert WasmModuleTieringLock isn't held while these two are.

Grouped them together as 475/476; 500/501 didn't work because of conflicts with other locks.


> @@ +756,5 @@
> > +
> > +        size_t exportIndex;
> > +        MOZ_ALWAYS_FALSE(BinarySearch(ProjectLazyFuncIndex(exports_), 0, exports_.length(),
> > +                                      fe.funcIndex(), &exportIndex));
> > +        if (!exports_.insert(exports_.begin() + exportIndex, Move(lazyExport)))
> 
> I'm worried what happens in the failure case where we oom in the middle of
> the loop; I worry it might leave us in an inconsistent state where exports
> are registered but ultimately the code hasn't been initialized.
> 
> I was thinking: maybe most of this logic could be moved into a single
> LazyStubSegment::create() that does all of this (and removed the need for,
> e.g., appendFuncCodeRanges()), so that, at the end, we had a
> fully-initialized LazyStubSegment that was ready to go and the update of
> exports_ could be done in one go (using reserve() to ensure no
> oom-in-the-middle).

Yes! This simplifies things even more; see LazyStubSegment::addStubs. (I've also sprinkled the code with Vector::reserve() calls)

> ::: js/src/wasm/WasmProcess.cpp
> @@ +230,5 @@
> >  {
> >      if (const CodeSegment* found = processCodeSegmentMap.lookup(pc)) {
> > +        if (cr) {
> > +            *cr = found->isModule()
> > +                  ? found->codeTier().lookupRange(pc)
> 
> I was expecting to see 'found->asModule()->...' which made me realize that
> probably codeTier (the field and method) should be in ModuleSegment. 
> Technically it makes sense for a LazyCodeSegment to have a CodeTier, but it
> would be nice if, e.g., the change in this file was forced by the type
> system.  If the CodeTier of a LazyCodeSegment is necessary, it seems like it
> could be given a suggestive method name like associatedCodeTier().

CodeSegment (parent class of ModuleSegment/LazyStubSegment) needs to have the codeTier for wasm::StartUnwinding. We could add ModuleSegment::lookupRange which would just relay to codeTier().lookupRange, if you prefer? (really I thought at some point that the codeRanges should be in the ModuleSegment itself, not the CodeTier)
Attached patch lazy-stubs.patch (obsolete) — Splinter Review
Attachment #8953022 - Attachment is obsolete: true
Attachment #8955256 - Flags: review?(luke)
(In reply to Benjamin Bouvier [:bbouvier] from comment #46)
> CodeSegment (parent class of ModuleSegment/LazyStubSegment) needs to have
> the codeTier for wasm::StartUnwinding. We could add
> ModuleSegment::lookupRange which would just relay to codeTier().lookupRange,
> if you prefer? (really I thought at some point that the codeRanges should be
> in the ModuleSegment itself, not the CodeTier)

One or both of those sound great.
Comment on attachment 8955256 [details] [diff] [review]
lazy-stubs.patch

Review of attachment 8955256 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great, thanks!

::: js/src/wasm/WasmCode.cpp
@@ +589,5 @@
> +    auto segment = js::MakeUnique<LazyStubSegment>();
> +    if (!segment || !segment->initialize(Move(codeBytes), length))
> +        return nullptr;
> +
> +    segment->initCodeTier(&codeTier);

nit: could codeTier be passed to the ctor of LazyStubSegment?

::: js/src/wasm/WasmCode.h
@@ +608,5 @@
> +    const Tier                  tier_;
> +    const Code*                 code_;
> +
> +    // Lazy stubs, not serialized.
> +    ExclusiveData<LazyStubTier> lazyStubs_;

nit: could you put lazyStubs_ after metadata_/segment_?

::: js/src/wasm/WasmJS.cpp
@@ +1149,5 @@
> +    // already completed and has been committed, so there's no risk of race
> +    // conditions here.
> +    //
> +    // If the best tier is Baseline, there could be a background compilation
> +    // happening at the same time. The background compilation will try to lock

nit: s/try to//

@@ +1150,5 @@
> +    // conditions here.
> +    //
> +    // If the best tier is Baseline, there could be a background compilation
> +    // happening at the same time. The background compilation will try to lock
> +    // the first tier lazy stubs first as a hint to stop generating them, then

nit: s/as a hint to stop generating them/to stop new Baseline stubs from being generated/

@@ +1153,5 @@
> +    // happening at the same time. The background compilation will try to lock
> +    // the first tier lazy stubs first as a hint to stop generating them, then
> +    // the second tier stubs to generate them.
> +    //
> +    // - either we take the tier1 lock before the background compilation gets

nit: here and below could you say "tier1 lazy stubs lock" so it's quite explicit which lock we're talking about (since there are others associated with tiering).

@@ +1155,5 @@
> +    // the second tier stubs to generate them.
> +    //
> +    // - either we take the tier1 lock before the background compilation gets
> +    //   it, then we generate the lazy stub for tier1. When the background
> +    //   thread holds on tier1, it will see it has a lazy stub and will

s/holds on tier1/gets the tier1 lazy stub lock/

::: js/src/wasm/WasmModule.cpp
@@ +291,2 @@
>  
> +        // Now tier2 is validated and we can update jump tables entries.

nit: s/validated/committed/ s/./to start making tier2 live./

@@ +291,4 @@
>  
> +        // Now tier2 is validated and we can update jump tables entries.
> +        // Because lazy stubs are protected by a lock and
> +        // notifyCompilationListeners should be called without any lock hold,

s/hold/held/

::: js/src/wasm/WasmStubs.cpp
@@ +462,5 @@
>  
>      GenerateJitEntryLoadTls(masm, frameSize);
>  
>      if (fe.sig().hasI64ArgOrRet()) {
> +        CallSymbolicAddress(masm, !!funcPtr, SymbolicAddress::ReportInt64JSCall);

nit: Here and at the other callsite, fe.hasEagerStubs() might be a bit more direct?

@@ +633,5 @@
>      masm.assertStackAlignment(WasmStackAlignment);
> +    if (funcPtr)
> +        masm.call(*funcPtr);
> +    else
> +        masm.call(CallSiteDesc(CallSiteDesc::Func), fe.funcIndex());

nit: can you factor out a CallFuncExport() to share with GenerateInterpEntry() and put it next to CallSymbolicAddress()?  This function would take the FuncExport and could assert fe.hasEagerStubs() == !!funcPtr.

@@ +1717,5 @@
>  }
>  
>  bool
> +wasm::GenerateLazyEntryStubs(MacroAssembler& masm, size_t funcExportIndex, const FuncExport& fe,
> +                             void* calleePtr, CodeRangeVector* codeRanges)

This function is almost identical to the body of the eager-export for loop in GenerateStubs().  Could you refactor this function a bit so that GenerateStubs() can call it too (removing "Lazy" from this function's name)?  Thus, this function would take a Maybe<ImmPtr> like the others and assert that fe.hasEagerStubs() == !!calleePtr.
Attachment #8955256 - Flags: review?(luke) → review+
Comment on attachment 8953455 [details] [diff] [review]
testing-tier2.patch

Review of attachment 8953455 [details] [diff] [review]:
-----------------------------------------------------------------

So I'm trying to reconcile this with --test-wasm-await-tier2 and args.testTiering.  So currently, --test-wasm-await-tier2 does two things:
 1. sets CompileArgs.testTiering and testTiering is the bool that says "enable tiering if at all possible"
 2. makes instantiation wait for tier2 to complete (ensuring we always test the ion-after-baseline path)

In particular, cx->options().testWasmAwaitTier2() is tested twice, once to set CompileArgs.testTiering (for #1) and once to decide to wait (for #2).

It seems to me that we should:
 1. rename this new flag to be specific to making tier-2 take longer, say "testWasmDelayTier2"
 2. have CompileArgs.testTiering = testWasmDelayTier2 || testWasmAwaitTier2 (so that no WasmCompile.cpp changes are necessary)
 3. put testWasmDelayTier2 and testWasmAwaitTier2 next to each other (in ContextOptions, I assume, but anywhere as long as it's symmetric)

I'm also open to considering other configurations; just trying to keep our testing flags cohesive.

::: js/src/wasm/WasmGenerator.cpp
@@ +1063,5 @@
> +    if (MOZ_UNLIKELY(JitOptions.wasmAlwaysRequestTier2)) {
> +        // Introduce an artificial delay when testing wasmAlwaysRequestTier2,
> +        // since we want to exerce both tier1 and tier2 code in this case.
> +        auto wakeup = TimeStamp::Now() + TimeDuration::FromSeconds(1.0);
> +        while (TimeStamp::Now() < wakeup) {}

In js.cpp, we use std::this_thread::sleep_for(std::chrono::milliseconds(...))) from <chrono> without issue for a few releases now.

Also, 1 second is a relative eternity, maybe something less?
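For reference, the shell-style delay amounts to this (a sketch; 50ms is an arbitrary placeholder, not a value taken from the shell code):

    #include <chrono>
    #include <thread>

    // Replaces the busy-wait on TimeStamp::Now(): actually yields the
    // thread instead of burning a core for the whole delay.
    std::this_thread::sleep_for(std::chrono::milliseconds(50));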
Attachment #8953455 - Flags: review?(luke)
Blocks: 1442656
Attached patch lazy-stubs.patch (obsolete) — Splinter Review
Carrying forward r+. Ready in case we'd like to merge despite the code freeze (as discussed with release managers in email threads).
Attachment #8953455 - Attachment is obsolete: true
Attachment #8955256 - Attachment is obsolete: true
Attachment #8955532 - Flags: review+
Attached patch lazy-stubs.patch (obsolete) — Splinter Review
(rebased)

Gary, Christian, could you fuzz this patch, please? We'd like to land it quickly despite the code freeze since it fixes a serious wasm memory regression, and it would raise our confidence in its quality if we could fuzz it a bit beforehand. Thank you in advance!

Patch applies cleanly onto mozilla-inbound d85679eb4275.
Attachment #8955532 - Attachment is obsolete: true
Attachment #8955534 - Flags: review+
Attachment #8955534 - Flags: feedback?(nth10sd)
Attachment #8955534 - Flags: feedback?(choller)
Attached patch lazy-stubs.patchSplinter Review
Carrying forward r+. A round of try builds showed unused variables in non-debug builds, so this patch fixes that with more DebugOnly.

Christian, Gary, please see comment 52 for fuzzing request (same base). If you've started fuzzing with previous patch, no need to update to that one. Thanks!
Attachment #8955534 - Attachment is obsolete: true
Attachment #8955534 - Flags: feedback?(nth10sd)
Attachment #8955534 - Flags: feedback?(choller)
Attachment #8955619 - Flags: review+
Attachment #8955619 - Flags: feedback?(nth10sd)
Attachment #8955619 - Flags: feedback?(choller)
I'd like to track this as a blocking fix needed to ship 60. Please nominate the patches for uplift to Beta60 after they've been stabilized in Nightly for a few days (2 to 4).
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cd1f5af2f749
wasm: Lazy entry stub generation for function tables (r=luke)
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Comment on attachment 8955619 [details] [diff] [review]
lazy-stubs.patch

Clearing f? since this has landed on inbound.
Attachment #8955619 - Flags: feedback?(nth10sd)
Attachment #8955619 - Flags: feedback?(choller)
Depends on: 1450795