Closed Bug 1742053 Opened 3 years ago Closed 2 years ago

Refactoring of call_indirect optimization patch

Categories

(Core :: JavaScript: WebAssembly, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
96 Branch
Tracking Status
firefox96 --- fixed

People

(Reporter: lth, Assigned: lth)

References

Details

Attachments

(15 files, 3 obsolete files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
32.15 KB, patch
Details | Diff | Splinter Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

This bug will hold a series of patches that are the refactored and occasionally tweaked pieces of Dmitry's patch on bug 1639153. The purpose of this refactoring is to make possible a very detailed review, in order to look for a rare intermittent, see bug 1639153 comment 157 (!) and earlier.

I'm breaking this out as a separate bug since bug 1639153 already is very hard to follow.

Confidence: high

Trivial / obviously correct changes that set us up for further work,
mostly renaming. The only functional change here is that the
compilers now tag their calls as "Import" or "Indirect" instead of
"Dynamic", but nothing is done with this information in this patch.

Confidence: high

Existing binary searches are rewritten using lambdas, only bugs in
mfbt will make this incorrect I think.

Depends on D131620

Attached file WIP: Bug 1742053 - Dead code (obsolete) —

Confidence: high

InsertIntoSortedContainer is not used by the patch, it is dead code
and can be commented out for now (and removed later).

Depends on D131621

Confidence: high

This looks right to me but there's an uncheckable(?) assumption that
makes the use of this a little scary. The function is not used in
this patch but later code will use it.

Depends on D131622

Confidence: medium, "no obvious bugs".

There are too many occasions when conditions have to be inserted at
various points in the system and not enough abstraction (probably).
Not a problem introduced by this patch; preexisting.

I've changed the comments quite a bit here, changed a function name to
be clearer, and added an assertion in wasmCaller().

Depends on D131623

Confidence: high

This is just the lowest level of code generation for the new stubs, no
driver logic or anything like that. The code seems fairly
straightforward.

Depends on D131624

Confidence: medium/high

This implements lookupIndirectStubRange, which is used by
Table::getFuncRef.

There is a potential problem here, in that lookupIndirectStubRange
iterates across all tiers and returns the first range found; in tiered
compilation it may return tier1 code even if tier2 code is available.
This could be a performance problem. If the caller has expectations
about the tier there could also be a logic problem.

This seems to be used only by Table::getFuncRef so that bug is
probably not the bug we're looking for.

It's possible we can move this patch backward in the queue later on.

Depends on D131625

Confidence: high

This implements the sorted indirect stubs vector and lookup in it.

This patch has a fix for a bug: the IndirectStubComparator will return
the wrong ordering if there are two equal funcIndex values (a common
occurrence in test cases as all modules will have funcIndex values
starting at zero) and the tls pointers are so far apart that their
difference will not be correctly represented by a signed integer.
This has been fixed.

(Another bug here was that uintptr_t was used to represent the pointer
values; intptr_t would have been more sensible.)

Depends on D131626

Confidence: high

This implements the IndirectStubTarget logic and the low-level driver
for indirect stub creation, notably createManyIndirectStubs.

There is a minor bug here, in that the less-than predicate used to
sort the candidate stubs before uniq'ing them is not compatible with
the equality predicate used to unique them: the former compares
funcIndex and tls value; the latter also compares the call entry
value. I see no reason why the call entry value should not match here
so I have turned the check on their equality into an assertion of
their equality that triggers if the funcIndex and tls match.

Depends on D131627

Attached file WIP: Bug 1742053 - More dead code (obsolete) —

Confidence: high

checkedCallEntryForWasmImportedFunction is unused.

Depends on D131628

Confidence: high

Instance::ensureIndirectStubs and its callees. See comments on a
previous patch about the discrepancy in predicate meaning between
RemoveDuplicates (in this patch) and IndirectStubTarget equality (in
that patch).

Depends on D131629

These are just helper functions, currently unused, to be used by the main patch.
Both functions would originally read bestTier(). This is unlikely to be
correct, as in some cases the caller would also read the tier, and should
pass it in to these functions.

Depends on D131630

Confidence: high?

Note that this patch cannot be tested by itself, it requires part 2.

Change the calling conventions so that indirect calls assume
same-module, while import calls still assume cross-module. Since we
no longer load the tls on an indirect call, the code pointer must be
checked for null.

Frame iteration must account for the interstitial frames, too - this
is tricky.

Depends on D131631

Confidence: mediumish

This contains a change to how tiers are handled during table.grow: the
tier is read exactly once in Instance::tableGrow and then propagated
through fillFuncRef into createIndirectStub, so as to ensure that the
tier is always the same. As it was, the tier would be read twice and
there was a chance that two different values would be seen and that
the call to fillFuncRef would OOM since it might see a higher tier and
need to allocate.

There's still probably a bug here in getFuncRef, where
lookupIndirectStubRange may find the stub in the wrong tier, since it
iterates over the tiers and chooses the stable tier before the best
tier. This needs to be fixed. I left a comment in the code.

Depends on D131632

200 try runs later the intermittent is not reproducing with this patch stack.

The only important change in this stack relative to the original, with respect to the tests that did fail before, is the one in D131627, where the IndirectStubComparator could return the wrong result for tls values that were far away. I have no good explanation yet for how that error could lead to some table element not being properly initialized, but at this time it's the only plausible explanation for failing to repro.

Of course, it could be that the existing code is not sensitive to the ordering bug per se but that the change to the sort order led to different heap dynamics (because some table is now smaller or larger because duplicates are correctly filtered), and that we've just covered up some other bug....

There were some other very modest changes in this stack. In particular, the bestTier values is being propagated down the call stack in a couple of places to avoid being read multiple times. But this is in code that is not exercised by the failing tests (I'm fairly sure) and should not matter.

The best choice right now might be to continue to run more tests but in parallel work on rectifying the remaining problems in this patch stack, with an eye towards landing later in the week if no more failures arise in the mean time.

Attachment #9251579 - Attachment description: WIP: Bug 1742053 - Indirect stub code range lookup - BENIGN BUG → WIP: Bug 1742053 - Indirect stub code range lookup
Attachment #9251581 - Attachment description: WIP: Bug 1742053 - Low-level driver for indirect stub creation - POSSIBLE BUG → WIP: Bug 1742053 - Low-level driver for indirect stub creation - MINOR BUGFIX
Attachment #9251583 - Attachment description: WIP: Bug 1742053 - Instance-level creation of indirect stubs → WIP: Bug 1742053 - Instance-level creation of indirect stubs - OPEN ROOTING ISSUE
Attachment #9251584 - Attachment description: WIP: Bug 1742053 - Support code for ref.func and table-of-func - POSSIBLE BUGFIX → WIP: Bug 1742053 - Support code for ref.func and table-of-func - TWEAKED
Attachment #9251586 - Attachment description: WIP: Bug 1742053 - Switchover part 2: tables and funcref - BUG, BUGFIX? → WIP: Bug 1742053 - Switchover part 2: tables and funcref - TWEAKED

First, restructure the Instance methods for table operations so that
they have the same structure: they dispatch on the table
representation type. There's no reason for these methods to all look
different.

Second, document the reason for the ensureIndirectStub call in
Instance::tableGrow, as it is subtle.

Third, rename createIndirectStub as ensureAndGetIndirectStub since
that is the function it has.

Fourth, document that fillFuncRef may OOM but that the OOM will be at a
safe spot in the code, before any effects are visible.

Depends on D131633

Attachment #9251574 - Attachment is obsolete: true
Attachment #9251582 - Attachment is obsolete: true
Attachment #9251573 - Attachment is obsolete: true

The stack is now pretty clean except for a potential rooting issue which I will address tomorrow. I have resolved a number of open issues, several of which were just confusion on my part. The two real bugs fixed in this stack so far are related to the different semantics for the ordering predicates for the indirect stubs tables, and the fact that the best tier was read twice in table.grow and there could be a race condition. The interdiff between this stack and Dmitry's original patch (modulo my "addendum" patch for cleaning up table code) is still quite small and manageable.

Attachment #9251581 - Attachment description: WIP: Bug 1742053 - Low-level driver for indirect stub creation - MINOR BUGFIX → WIP: Bug 1742053 - Low-level driver for indirect stub creation
Attachment #9251583 - Attachment description: WIP: Bug 1742053 - Instance-level creation of indirect stubs - OPEN ROOTING ISSUE → WIP: Bug 1742053 - Instance-level creation of indirect stubs
Attachment #9251584 - Attachment description: WIP: Bug 1742053 - Support code for ref.func and table-of-func - TWEAKED → WIP: Bug 1742053 - Support code for ref.func and table-of-func - BUGFIX
Attachment #9251586 - Attachment description: WIP: Bug 1742053 - Switchover part 2: tables and funcref - TWEAKED → WIP: Bug 1742053 - Switchover part 2: tables and funcref - BUGFIX

Clean stack. On try for broad testing. Interdiffs coming, and then we'll need to separately review the addendum, which I could land on a different bug.

Attached patch interdiff.diffSplinter Review

This is the interdiff between the original patch and the refactored stack without the addendum patch, both based on mozilla-central ba4d4963c38ba7a68e481d39b5b1a3698e5098d9 from Monday night UTC. The interdiff is fairly large - 750 lines - but once comments and trivial changes are removed it's down to about 250 lines, many of the remaining ones are also obvious changes.

For detailed information, see the commit messages on the individual patches in the queue, or ask. The three patches marked BUGFIX are the most interesting ones, but small changes appeared in some of the others too.

Dmitry, maybe you can take a look and we can discuss what to do next.

Flags: needinfo?(dbezhetskov)
Blocks: 1742779
Attachment #9251572 - Attachment description: WIP: Bug 1742053 - Renamings and other prep work → Bug 1742053 - Renamings and other prep work. r?dbezhetskov
Attachment #9251576 - Attachment description: WIP: Bug 1742053 - IsJSExportedFunction → Bug 1742053 - IsJSExportedFunction. r?dbezhetskov
Attachment #9251577 - Attachment description: WIP: Bug 1742053 - Logic/comments for frame layout and frame tracing → Bug 1742053 - Logic/comments for frame layout and frame tracing. r?dbezhetskov
Attachment #9251578 - Attachment description: WIP: Bug 1742053 - Code generation for indirect stubs → Bug 1742053 - Code generation for indirect stubs. r?dbezhetskov
Attachment #9251579 - Attachment description: WIP: Bug 1742053 - Indirect stub code range lookup → Bug 1742053 - Indirect stub code range lookup. r?dbezhetskov
Attachment #9251580 - Attachment description: WIP: Bug 1742053 - Indirect stub storage and lookup - BUGFIX → Bug 1742053 - Indirect stub storage and lookup - BUGFIX. r?dbezhetskov
Attachment #9251581 - Attachment description: WIP: Bug 1742053 - Low-level driver for indirect stub creation → Bug 1742053 - Low-level driver for indirect stub creation. r?dbezhetskov
Attachment #9251583 - Attachment description: WIP: Bug 1742053 - Instance-level creation of indirect stubs → Bug 1742053 - Instance-level creation of indirect stubs. r?dbezhetskov
Attachment #9251584 - Attachment description: WIP: Bug 1742053 - Support code for ref.func and table-of-func - BUGFIX → Bug 1742053 - Support code for ref.func and table-of-func - BUGFIX. r?dbezhetskov
Attachment #9251585 - Attachment description: WIP: Bug 1742053 - Switchover part 1: calls and frames → Bug 1742053 - Switchover part 1: calls and frames. r?dbezhetskov
Attachment #9251586 - Attachment description: WIP: Bug 1742053 - Switchover part 2: tables and funcref - BUGFIX → Bug 1742053 - Switchover part 2: tables and funcref - BUGFIX. r?dbezhetskov
Attachment #9252153 - Attachment description: WIP: Bug 1742053 - Addendum: Clean up table code and some other things → Bug 1742053 - Addendum: Clean up table code and some other things. r?dbezhetskov

A final tweak here: in the "Switchover part 2" patch there was a change (imported from Dmitry's original) where getFuncRef would now check elem.tls for null instead of elem.code, to test for a null entry. This change was spurious, it's related to a future fix (bug 1709578 and bug 1742936) and is not needed now, and indeed my draft for bug 1709578 would revert this. So I've reverted here.

Flags: needinfo?(dbezhetskov)
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/314070fed10e
Renamings and other prep work. r=dbezhetskov
https://hg.mozilla.org/integration/autoland/rev/0c87681b908f
IsJSExportedFunction. r=dbezhetskov
https://hg.mozilla.org/integration/autoland/rev/f6396c5b1e3a
Logic/comments for frame layout and frame tracing. r=dbezhetskov
https://hg.mozilla.org/integration/autoland/rev/68ccfb67dd27
Code generation for indirect stubs. r=dbezhetskov
https://hg.mozilla.org/integration/autoland/rev/3ab7006d1269
Indirect stub code range lookup. r=dbezhetskov
https://hg.mozilla.org/integration/autoland/rev/c5652456ad08
Indirect stub storage and lookup - BUGFIX. r=dbezhetskov
https://hg.mozilla.org/integration/autoland/rev/98f4dece320b
Low-level driver for indirect stub creation. r=dbezhetskov
https://hg.mozilla.org/integration/autoland/rev/2347af2773c0
Instance-level creation of indirect stubs. r=dbezhetskov
https://hg.mozilla.org/integration/autoland/rev/2e185b11cc85
Support code for ref.func and table-of-func - BUGFIX. r=dbezhetskov
https://hg.mozilla.org/integration/autoland/rev/324375982e5d
Switchover part 1: calls and frames. r=dbezhetskov
https://hg.mozilla.org/integration/autoland/rev/a7c7ce73c7f9
Switchover part 2: tables and funcref - BUGFIX. r=dbezhetskov
https://hg.mozilla.org/integration/autoland/rev/2200d1f5c956
Addendum: Clean up table code and some other things. r=dbezhetskov

There's an interesting bug in the "switchover part 1" bug. The null check checks only the low 32 bits of the pointer. This is a pre-existing problem, but it probably did not matter before because it's extremely unlikely that a tls is allocated on a 4GB boundary. It is much more likely that a code chunk is, and so intermittently the new null check may fail spuriously. The easy fix should be to test the whole pointer.

Let's keep this open - there's another patch coming to deal with the intermittent, at a minimum.

Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
Regressions: 1743087

Re comment 22 and the likelihood that a piece of code is allocated on a 4GB boundary. I instrumented a build to log the allocations of the executable memory area, and ran jit-tests on the failing platform again. The log has about 5500 lines. None of the executable areas start at a 4GB boundary, but that's not what we'd expect - we'd expect it to start a little before that boundary, so that some code can be allocated before the wasm test comes in and needs memory for its thunks. Thus the executable address we're looking for looks something like /ff[ef].....$/. This is fairly common: there are eight of these in the log:

[task 2021-11-26T18:02:42.100Z] Executable memory at f8ffef0000
[task 2021-11-26T18:02:58.890Z] Executable memory at 196ffea0000
[task 2021-11-26T18:03:49.042Z] Executable memory at 26ffffd0000
[task 2021-11-26T18:04:50.195Z] Executable memory at 2cffeb0000
[task 2021-11-26T18:06:07.229Z] Executable memory at b7ffe60000
[task 2021-11-26T18:07:04.860Z] Executable memory at 20effe00000
[task 2021-11-26T18:10:22.502Z] Executable memory at 377fff10000
[task 2021-11-26T18:10:31.409Z] Executable memory at 7cffe30000

For a sufficient number of runs, and assuming a fairly random distribution of these patterns (with the last four digits always zero), it's not hard to believe that we'd occasionally hit the 4GB boundary.

There are several sources of randomness here that makes testing very hard. The executable area is placed on a randomish address. In addition, when allocating executable memory in the executable area, pages are skipped at random to create holes in the area and further confuse code placement.

One further fact here is that in a failing test it has always been the first indirect function in a table that would trip the null test. This further supports the hypothesis that code allocation on a 4GB boundary is a problem, because code pages are allocated on page boundaries and only the first function allocated in a new code allocation could have an address of the problematic kind.

(In other news, Dmitry reports that when he reverts my bugfix for indirect stub sorting in the patches above, the test failures come back in a big way. So that may be a separate bug.)

Fixes an old problem where the null test only tests the low 32 bits of
the pointer.

Regressions: 1743269
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f93e604de2df
Null test must test the whole pointer. r=dbezhetskov
Regressions: 1743607

I confused myself for a while worrying about who keeps the stubs code alive
if an instance creates a stub for an imported function in a table that it
then exports; the stub-creating instance could subsequently die without the
table dying. But since the stub-creating instance's tls is stored in the table
with the stub, this is fine. Just clarifying this.

This is fixed, apart from the pending review on the improved comment.

Keywords: leave-open
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4f1a80f41637
Improve a comment.  r=dbezhetskov DONTBUILD
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Regressions: 1744711
Regressions: 1745170
Regressions: 1752870
Regressions: 1750930
See Also: → 1753061
Regressions: 1744943

Backed out by bug 1753061.

Regressions: 1715459
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: