Refactoring of call_indirect optimization patch
Categories
(Core :: JavaScript: WebAssembly, enhancement, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox96 | --- | fixed |
People
(Reporter: lth, Assigned: lth)
References
Details
Attachments
(15 files, 3 obsolete files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
32.15 KB,
patch
|
Details | Diff | Splinter Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
This bug will hold a series of patches that are the refactored and occasionally tweaked pieces of Dmitry's patch on bug 1639153. The purpose of this refactoring is to make possible a very detailed review, in order to look for a rare intermittent, see bug 1639153 comment 157 (!) and earlier.
I'm breaking this out as a separate bug since bug 1639153 already is very hard to follow.
Assignee | ||
Comment 1•3 years ago
|
||
Confidence: high
Trivial / obviously correct changes that set us up for further work,
mostly renaming. The only functional change here is that the
compilers now tag their calls as "Import" or "Indirect" instead of
"Dynamic", but nothing is done with this information in this patch.
Assignee | ||
Comment 2•3 years ago
|
||
Confidence: high
Existing binary searches are rewritten using lambdas, only bugs in
mfbt will make this incorrect I think.
Depends on D131620
Assignee | ||
Comment 3•3 years ago
|
||
Confidence: high
InsertIntoSortedContainer is not used by the patch, it is dead code
and can be commented out for now (and removed later).
Depends on D131621
Assignee | ||
Comment 4•3 years ago
|
||
Confidence: high
This looks right to me but there's an uncheckable(?) assumption that
makes the use of this a little scary. The function is not used in
this patch but later code will use it.
Depends on D131622
Assignee | ||
Comment 5•3 years ago
|
||
Confidence: medium, "no obvious bugs".
There are too many occasions when conditions have to be inserted at
various points in the system and not enough abstraction (probably).
Not a problem introduced by this patch; preexisting.
I've changed the comments quite a bit here, changed a function name to
be clearer, and added an assertion in wasmCaller().
Depends on D131623
Assignee | ||
Comment 6•3 years ago
|
||
Confidence: high
This is just the lowest level of code generation for the new stubs, no
driver logic or anything like that. The code seems fairly
straightforward.
Depends on D131624
Assignee | ||
Comment 7•3 years ago
|
||
Confidence: medium/high
This implements lookupIndirectStubRange, which is used by
Table::getFuncRef.
There is a potential problem here, in that lookupIndirectStubRange
iterates across all tiers and returns the first range found; in tiered
compilation it may return tier1 code even if tier2 code is available.
This could be a performance problem. If the caller has expectations
about the tier there could also be a logic problem.
This seems to be used only by Table::getFuncRef so that bug is
probably not the bug we're looking for.
It's possible we can move this patch backward in the queue later on.
Depends on D131625
Assignee | ||
Comment 8•3 years ago
|
||
Confidence: high
This implements the sorted indirect stubs vector and lookup in it.
This patch has a fix for a bug: the IndirectStubComparator will return
the wrong ordering if there are two equal funcIndex values (a common
occurrence in test cases as all modules will have funcIndex values
starting at zero) and the tls pointers are so far apart that their
difference will not be correctly represented by a signed integer.
This has been fixed.
(Another bug here was that uintptr_t was used to represent the pointer
values; intptr_t would have been more sensible.)
Depends on D131626
Assignee | ||
Comment 9•3 years ago
|
||
Confidence: high
This implements the IndirectStubTarget logic and the low-level driver
for indirect stub creation, notably createManyIndirectStubs.
There is a minor bug here, in that the less-than predicate used to
sort the candidate stubs before uniq'ing them is not compatible with
the equality predicate used to unique them: the former compares
funcIndex and tls value; the latter also compares the call entry
value. I see no reason why the call entry value should not match here
so I have turned the check on their equality into an assertion of
their equality that triggers if the funcIndex and tls match.
Depends on D131627
Assignee | ||
Comment 10•3 years ago
|
||
Confidence: high
checkedCallEntryForWasmImportedFunction is unused.
Depends on D131628
Assignee | ||
Comment 11•3 years ago
|
||
Confidence: high
Instance::ensureIndirectStubs and its callees. See comments on a
previous patch about the discrepancy in predicate meaning between
RemoveDuplicates (in this patch) and IndirectStubTarget equality (in
that patch).
Depends on D131629
Assignee | ||
Comment 12•3 years ago
|
||
These are just helper functions, currently unused, to be used by the main patch.
Both functions would originally read bestTier(). This is unlikely to be
correct, as in some cases the caller would also read the tier, and should
pass it in to these functions.
Depends on D131630
Assignee | ||
Comment 13•3 years ago
|
||
Confidence: high?
Note that this patch cannot be tested by itself, it requires part 2.
Change the calling conventions so that indirect calls assume
same-module, while import calls still assume cross-module. Since we
no longer load the tls on an indirect call, the code pointer must be
checked for null.
Frame iteration must account for the interstitial frames, too - this
is tricky.
Depends on D131631
Assignee | ||
Comment 14•3 years ago
|
||
Confidence: mediumish
This contains a change to how tiers are handled during table.grow: the
tier is read exactly once in Instance::tableGrow and then propagated
through fillFuncRef into createIndirectStub, so as to ensure that the
tier is always the same. As it was, the tier would be read twice and
there was a chance that two different values would be seen and that
the call to fillFuncRef would OOM since it might see a higher tier and
need to allocate.
There's still probably a bug here in getFuncRef, where
lookupIndirectStubRange may find the stub in the wrong tier, since it
iterates over the tiers and chooses the stable tier before the best
tier. This needs to be fixed. I left a comment in the code.
Depends on D131632
Assignee | ||
Comment 15•3 years ago
|
||
200 try runs later the intermittent is not reproducing with this patch stack.
The only important change in this stack relative to the original, with respect to the tests that did fail before, is the one in D131627, where the IndirectStubComparator could return the wrong result for tls values that were far away. I have no good explanation yet for how that error could lead to some table element not being properly initialized, but at this time it's the only plausible explanation for failing to repro.
Of course, it could be that the existing code is not sensitive to the ordering bug per se but that the change to the sort order led to different heap dynamics (because some table is now smaller or larger because duplicates are correctly filtered), and that we've just covered up some other bug....
There were some other very modest changes in this stack. In particular, the bestTier values is being propagated down the call stack in a couple of places to avoid being read multiple times. But this is in code that is not exercised by the failing tests (I'm fairly sure) and should not matter.
The best choice right now might be to continue to run more tests but in parallel work on rectifying the remaining problems in this patch stack, with an eye towards landing later in the week if no more failures arise in the mean time.
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 16•3 years ago
|
||
First, restructure the Instance methods for table operations so that
they have the same structure: they dispatch on the table
representation type. There's no reason for these methods to all look
different.
Second, document the reason for the ensureIndirectStub call in
Instance::tableGrow, as it is subtle.
Third, rename createIndirectStub as ensureAndGetIndirectStub since
that is the function it has.
Fourth, document that fillFuncRef may OOM but that the OOM will be at a
safe spot in the code, before any effects are visible.
Depends on D131633
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 17•3 years ago
|
||
The stack is now pretty clean except for a potential rooting issue which I will address tomorrow. I have resolved a number of open issues, several of which were just confusion on my part. The two real bugs fixed in this stack so far are related to the different semantics for the ordering predicates for the indirect stubs tables, and the fact that the best tier was read twice in table.grow and there could be a race condition. The interdiff between this stack and Dmitry's original patch (modulo my "addendum" patch for cleaning up table code) is still quite small and manageable.
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 18•3 years ago
|
||
Clean stack. On try for broad testing. Interdiffs coming, and then we'll need to separately review the addendum, which I could land on a different bug.
Assignee | ||
Comment 19•3 years ago
|
||
This is the interdiff between the original patch and the refactored stack without the addendum patch, both based on mozilla-central ba4d4963c38ba7a68e481d39b5b1a3698e5098d9 from Monday night UTC. The interdiff is fairly large - 750 lines - but once comments and trivial changes are removed it's down to about 250 lines, many of the remaining ones are also obvious changes.
For detailed information, see the commit messages on the individual patches in the queue, or ask. The three patches marked BUGFIX are the most interesting ones, but small changes appeared in some of the others too.
Dmitry, maybe you can take a look and we can discuss what to do next.
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Assignee | ||
Comment 20•2 years ago
•
|
||
A final tweak here: in the "Switchover part 2" patch there was a change (imported from Dmitry's original) where getFuncRef would now check elem.tls for null instead of elem.code, to test for a null entry. This change was spurious, it's related to a future fix (bug 1709578 and bug 1742936) and is not needed now, and indeed my draft for bug 1709578 would revert this. So I've reverted here.
Comment 21•2 years ago
|
||
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/314070fed10e Renamings and other prep work. r=dbezhetskov https://hg.mozilla.org/integration/autoland/rev/0c87681b908f IsJSExportedFunction. r=dbezhetskov https://hg.mozilla.org/integration/autoland/rev/f6396c5b1e3a Logic/comments for frame layout and frame tracing. r=dbezhetskov https://hg.mozilla.org/integration/autoland/rev/68ccfb67dd27 Code generation for indirect stubs. r=dbezhetskov https://hg.mozilla.org/integration/autoland/rev/3ab7006d1269 Indirect stub code range lookup. r=dbezhetskov https://hg.mozilla.org/integration/autoland/rev/c5652456ad08 Indirect stub storage and lookup - BUGFIX. r=dbezhetskov https://hg.mozilla.org/integration/autoland/rev/98f4dece320b Low-level driver for indirect stub creation. r=dbezhetskov https://hg.mozilla.org/integration/autoland/rev/2347af2773c0 Instance-level creation of indirect stubs. r=dbezhetskov https://hg.mozilla.org/integration/autoland/rev/2e185b11cc85 Support code for ref.func and table-of-func - BUGFIX. r=dbezhetskov https://hg.mozilla.org/integration/autoland/rev/324375982e5d Switchover part 1: calls and frames. r=dbezhetskov https://hg.mozilla.org/integration/autoland/rev/a7c7ce73c7f9 Switchover part 2: tables and funcref - BUGFIX. r=dbezhetskov https://hg.mozilla.org/integration/autoland/rev/2200d1f5c956 Addendum: Clean up table code and some other things. r=dbezhetskov
Assignee | ||
Comment 22•2 years ago
|
||
There's an interesting bug in the "switchover part 1" bug. The null check checks only the low 32 bits of the pointer. This is a pre-existing problem, but it probably did not matter before because it's extremely unlikely that a tls is allocated on a 4GB boundary. It is much more likely that a code chunk is, and so intermittently the new null check may fail spuriously. The easy fix should be to test the whole pointer.
Comment 23•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/314070fed10e
https://hg.mozilla.org/mozilla-central/rev/0c87681b908f
https://hg.mozilla.org/mozilla-central/rev/f6396c5b1e3a
https://hg.mozilla.org/mozilla-central/rev/68ccfb67dd27
https://hg.mozilla.org/mozilla-central/rev/3ab7006d1269
https://hg.mozilla.org/mozilla-central/rev/c5652456ad08
https://hg.mozilla.org/mozilla-central/rev/98f4dece320b
https://hg.mozilla.org/mozilla-central/rev/2347af2773c0
https://hg.mozilla.org/mozilla-central/rev/2e185b11cc85
https://hg.mozilla.org/mozilla-central/rev/324375982e5d
https://hg.mozilla.org/mozilla-central/rev/a7c7ce73c7f9
https://hg.mozilla.org/mozilla-central/rev/2200d1f5c956
Assignee | ||
Comment 24•2 years ago
|
||
Let's keep this open - there's another patch coming to deal with the intermittent, at a minimum.
Updated•2 years ago
|
Assignee | ||
Comment 25•2 years ago
|
||
Re comment 22 and the likelihood that a piece of code is allocated on a 4GB boundary. I instrumented a build to log the allocations of the executable memory area, and ran jit-tests on the failing platform again. The log has about 5500 lines. None of the executable areas start at a 4GB boundary, but that's not what we'd expect - we'd expect it to start a little before that boundary, so that some code can be allocated before the wasm test comes in and needs memory for its thunks. Thus the executable address we're looking for looks something like /ff[ef].....$/
. This is fairly common: there are eight of these in the log:
[task 2021-11-26T18:02:42.100Z] Executable memory at f8ffef0000
[task 2021-11-26T18:02:58.890Z] Executable memory at 196ffea0000
[task 2021-11-26T18:03:49.042Z] Executable memory at 26ffffd0000
[task 2021-11-26T18:04:50.195Z] Executable memory at 2cffeb0000
[task 2021-11-26T18:06:07.229Z] Executable memory at b7ffe60000
[task 2021-11-26T18:07:04.860Z] Executable memory at 20effe00000
[task 2021-11-26T18:10:22.502Z] Executable memory at 377fff10000
[task 2021-11-26T18:10:31.409Z] Executable memory at 7cffe30000
For a sufficient number of runs, and assuming a fairly random distribution of these patterns (with the last four digits always zero), it's not hard to believe that we'd occasionally hit the 4GB boundary.
There are several sources of randomness here that makes testing very hard. The executable area is placed on a randomish address. In addition, when allocating executable memory in the executable area, pages are skipped at random to create holes in the area and further confuse code placement.
One further fact here is that in a failing test it has always been the first indirect function in a table that would trip the null test. This further supports the hypothesis that code allocation on a 4GB boundary is a problem, because code pages are allocated on page boundaries and only the first function allocated in a new code allocation could have an address of the problematic kind.
(In other news, Dmitry reports that when he reverts my bugfix for indirect stub sorting in the patches above, the test failures come back in a big way. So that may be a separate bug.)
Assignee | ||
Comment 26•2 years ago
|
||
Also re comment 22, that bug has been there for over five years: https://bugzilla.mozilla.org/page.cgi?id=splinter.html&ignore=&bug=1284156&attachment=8789927.
Assignee | ||
Comment 27•2 years ago
|
||
Fixes an old problem where the null test only tests the low 32 bits of
the pointer.
Comment 28•2 years ago
|
||
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f93e604de2df Null test must test the whole pointer. r=dbezhetskov
Comment 29•2 years ago
|
||
bugherder |
Assignee | ||
Comment 30•2 years ago
|
||
I confused myself for a while worrying about who keeps the stubs code alive
if an instance creates a stub for an imported function in a table that it
then exports; the stub-creating instance could subsequently die without the
table dying. But since the stub-creating instance's tls is stored in the table
with the stub, this is fine. Just clarifying this.
Assignee | ||
Comment 31•2 years ago
|
||
This is fixed, apart from the pending review on the improved comment.
Assignee | ||
Updated•2 years ago
|
Comment 32•2 years ago
|
||
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4f1a80f41637 Improve a comment. r=dbezhetskov DONTBUILD
Comment 33•2 years ago
|
||
bugherder |
Assignee | ||
Comment 34•2 years ago
|
||
Backed out by bug 1753061.
Description
•