Open Bug 1639153 Opened 3 months ago Updated 5 days ago

Optimize indirect calls

Categories

(Core :: JavaScript: WebAssembly, enhancement, P3)


Tracking


ASSIGNED
Tracking Status
firefox78 --- wontfix
firefox79 --- wontfix

People

(Reporter: dbezhetskov, Assigned: dbezhetskov, NeedInfo)

References

Details

(Keywords: leave-open)

Attachments

(13 files, 2 obsolete files)

No description provided.
Assignee: nobody → dbezhetskov
Attachment #9150086 - Attachment description: Bug 1639153 - change call_indirect signature check to happen after frame is pushed → Bug 1639153 - change call_indirect signature check to happen after frame is pushed r=lth,wingo

I'm a bit confused about putting 2 extra words in each FunctionTableElem; I had been thinking the goal state is to get tables down to a single word. What's the benefit of performing the call_indirect signature check after pushing the frame? If the signature check happens before pushing a frame, the trampoline can first check the signature, then push the frame, then call the tail entry (which would be after the signature check). Then there are only two entries per function.

Also, nit: could we rename "tail entry" to "trampoline entry"? "Tail" keeps making me think "tail call entry".

Hi, I understand your confusion.
Our final target is still to have two entries: the table entry and the trampoline entry.

Final goal:
call_indirect -> call table_entry -> Trampoline stub -> target function

Trampoline stub (a toy sketch follows below):
  • check the signature
  • check for instance equality
  • switch to the callee instance
  • call the trampoline_entry
  • switch back to the caller instance
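
A small toy model of those steps in C++, just to make the control flow concrete. Everything here (CalleeSlot, currentTls, trampolineEntry, trapBadSignature) is an invented name for illustration; the real stub will be emitted as machine code via the MacroAssembler:

  // Toy model of the trampoline stub; none of these names are the real
  // SpiderMonkey API.
  #include <cstdint>

  struct TlsData;                 // per-instance data, opaque here
  struct CalleeSlot {
    uint32_t funcTypeId;          // callee's expected signature id
    TlsData* tls;                 // callee's instance data
    int (*trampolineEntry)(int);  // entry that expects the frame already pushed
  };

  extern TlsData* currentTls;     // stands in for WasmTlsReg
  [[noreturn]] void trapBadSignature();

  int indirectCallTrampoline(const CalleeSlot& callee, uint32_t callerSigId,
                             int arg) {
    if (callee.funcTypeId != callerSigId) {    // 1. check the signature
      trapBadSignature();
    }
    TlsData* callerTls = currentTls;           // 2. check instance equality
    if (callee.tls != callerTls) {
      currentTls = callee.tls;                 // 3. switch to the callee instance
    }
    int result = callee.trampolineEntry(arg);  // 4. call the trampoline_entry
    currentTls = callerTls;                    // 5. switch back to the caller
    return result;
  }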

However, it's hard to achieve this in one commit, and even if I managed to create such a big commit (I tried to), it would take a very long time because I don't know what could go wrong, so Andy and I decided to split this work into a series of extremely short patches. The plan is maintained in the "Steps to get there" section of the abi-2020 doc (https://docs.google.com/document/d/1oi6ROZJuE-hBb21Tgq-XcUegPktecF52gADzBJcYL34).

Long story short:
I will inline the trampoline stub right into call_indirect and then adapt stack maps and frame iteration.
Next, I'm going to finally remove Frame::tls.
Then we move the inlined trampoline into the stub. At that point we don't need to care about the runtime side because it's already handled in the previous steps.

Because I'm going to inline the trampoline stub on the caller side, we need to move the signature check to after the frame is pushed; we can't do the check on the caller side, and this is why I moved it.

WDYT?

Flags: needinfo?(luke)

An additional question about the final design and the current signature check optimization:
AFAIU a thunk (aka trampoline, aka decorator) is a fixed-size stub with some associated data, and it will be generated independently from the function.

So, we can't bake a cmp with an immediate into it, as we do now.
We can only load funcTypeId from memory and then do a check, right?

To add on -- we will eventually need to be able to perform signature checks with a frame already on the stack, in the tail-call case. So it's more consistent if the frame is always present when signature-check traps are raised.

To expand also on the number of function entry points -- right now only the function is in a position to check its signature. So we need a "call entry" (target of a call instruction) and a "tail entry" (target of a jump), and for each we need a checked and unchecked variant. Before Dmitry's work there are two entries (checked and unchecked call). Dmitry's current patch adds checked and unchecked tail entries.

In the future it would be possible to put the signature in the table element associating code pointer, JSFunction*, and TlsData, and arrange to only check signatures in the trampoline -- if that's what we want to do. (After all, anyone that calls directly already knows that the callee has the right type.) That would allow us to reduce to only a call entry and a tail entry. Not sure whether that's a good optimization or not, though.
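
Something like this hypothetical table element (not the current wasm::FunctionTableElem, just an illustration of "signature in the table"):

  // Hypothetical layout; field names are made up for illustration.
  #include <cstdint>

  struct TlsData;
  struct JSFunction;

  struct TableElemWithSig {
    void* code;           // the callee's single canonical entry point
    JSFunction* fun;      // JS identity, when the element escapes as a funcref
    TlsData* tls;         // callee instance data for the trampoline to switch to
    uint32_t funcTypeId;  // expected signature, checked only in the trampoline
  };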

WDYT Luke?

Apologies for the battery of comments -- a tail entry is indeed that: an entry for a tail call. Whether it's a tail call resulting from a WebAssembly tail-call proposal (something we're not working on) or the result of a jump from a trampoline is a detail -- the salient point is that the fixed part of the frame is already on the stack.

Haha another small apology -- Luke you may be wondering, why implement tail entries when we are not working on tail calls?

The answer is in the design doc:

As an optimization, when control enters an indirect entry trampoline, the trampoline can check if the incoming WasmTlsReg is actually same-instance, and in that case can jump to the type-checking tail entry. In that case there will be no trampoline frame and the callee will return directly to the caller.

So in that case, the trampoline can tail-call the callee, by jumping to the callee's tail entry. But because the trampoline can't currently check the callee's signature, we target the checked tail entry.

AFAIU the unchecked tail entry would only ever be invoked by a direct tail call from the tail calls proposal, so I guess that bit isn't strictly necessary.

Thanks for all the responses and info; understanding that the goal here is to break up a larger patch into smaller independent patches makes total sense. Would it make sense to hold off on landing the patches until the final state is better than the current state? I have a light concern that we get stuck for a while at an in-between state.

Also, "tail call entry" now makes sense; I like the symmetry between trampolines and tail calls :)

The 4 entry points also make sense. Instead of recording dynamic offsets for all of them, would it be possible (now or later) to have static offsets, such that from a pointer to any one entry point you can compute the others? The reason I ask is that I'm thinking ahead toward the function-references proposal, where you have one function pointer (a funcref or (ref (func ...))) that can be called with a fixed or dynamic signature, with a normal or tail call; so it seems like we'd want function references to point to one canonical entry point and have all the other entry points be static offsets computed at the callsite.
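
A sketch of what callsite-side static offsets could look like, assuming made-up offset values and names (only the scheme matters here):

  #include <cstdint>

  // Per-build constants; the actual values would be fixed by the prologue
  // layout, these numbers are invented.
  constexpr uintptr_t CheckedCallEntryOffset = 0;
  constexpr uintptr_t UncheckedCallEntryOffset = 16;
  constexpr uintptr_t CheckedTailEntryOffset = 32;
  constexpr uintptr_t UncheckedTailEntryOffset = 48;

  // A callsite that only holds the canonical funcref pointer can compute
  // whichever entry it needs without loading per-function metadata.
  inline void* entryAt(void* canonicalEntry, uintptr_t offset) {
    return static_cast<uint8_t*>(canonicalEntry) + offset;
  }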

(In reply to Dmitry Bezhetskov from comment #4)

An additional question about the final design and the current signature check optimization:
AFAIU a thunk (aka trampoline, aka decorator) is a fixed-size stub with some associated data, and it will be generated independently from the function.

So, we can't bake a cmp with an immediate into it, as we do now.
We can only load funcTypeId from memory and then do a check, right?

Given a wasm function signature (and no other context), you can statically classify it as having either an immediate representation (where the signature is literally baked into the immediate, which means the immediate is invariant over executions and thus can be (de)serialized without patching) or an indirect representation. Thus, if the signature is classified as immediate, then the trampoline can indeed bake in the cmp-with-immediate. (And if the signature is classified as indirect, then the trampoline can load it from the callee TlsData.)
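
To make the distinction concrete, here is a toy version of the two check shapes; FuncTypeIdDescSketch and its fields are invented names, and the real check is emitted as machine code rather than run as C++:

  #include <cstdint>

  struct TlsData {
    uintptr_t indirectFuncTypeIds[64];  // illustrative per-instance signature words
  };

  struct FuncTypeIdDescSketch {
    bool isImmediate;    // small signatures encode directly into an immediate
    uint32_t immediate;  // valid when isImmediate
    uint32_t tlsSlot;    // valid otherwise: where to load the expected id from
  };

  [[noreturn]] void trapBadSignature();

  void checkSignature(const FuncTypeIdDescSketch& sig, TlsData* calleeTls,
                      uintptr_t callerSigWord) {
    if (sig.isImmediate) {
      // Compare against a constant baked into the code; invariant across
      // executions, so it (de)serializes without patching.
      if (callerSigWord != sig.immediate) trapBadSignature();
    } else {
      // Load the expected id from the callee's TlsData and compare.
      if (callerSigWord != calleeTls->indirectFuncTypeIds[sig.tlsSlot]) {
        trapBadSignature();
      }
    }
  }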

Flags: needinfo?(luke)

Good morning! Good questions, Luke :) Point by point:

  1. About patch landing order. Completely agreed that we need to avoid shipping regressions. We operate a bit on instinct here as to what is a regression and what isn't; my impression was that in the planned change series at https://docs.google.com/document/d/1oi6ROZJuE-hBb21Tgq-XcUegPktecF52gADzBJcYL34/edit#heading=h.22ki0jodkf2h, no change would regress in a significant way.

I understand your concern to also be about leaving the tree in an awkward state for a long time, but I think Dmitry is on a roll at this point and when the inline trampoline code lands -- which is the next patch or so -- then we should start to see net performance improvements.

If that were all, I would be inclined to agree with you to hold off until a series can land, but on our side there has been the risk of making patches too big to review, or patches that don't do the right thing, or patches that simply need some style fixes. I think the pressure of getting something good enough to incrementally land has been a real positive thing for progress on this feature and so weighing everything, I would continue to land small steps toward the goal -- as long as we are all generally on board about the goal and the general plan for getting there.

  2. Static offsets would be great! Right now I think that might not be possible, because it depends on whether the function has an immediate signature check or not. But yes, I think you are right that we can get there.

  3. Immediate signature checks: I think we were under the impression that in the final design, trampoline code would not be signature-dependent -- and so if you were to consider moving signature checks to trampolines, you would probably make all checks load the expected signature from the storage associated with the trampoline instead of having immediate comparisons. I was also unsure whether immediate vs load-from-memory is important here -- the branch would surely predict perfectly, and it would "just" be another couple of µops in the pipeline per indirect call. Dunno. WDYT here? As I said, we have to operate a bit on instinct and ours could be miscalibrated.

Thanks for the explanations!

  1. Makes sense, sgtm
  2. I was thinking that differences of that nature could be covered up by nop'ing up to the desired static offset. But agreed that we don't need to do this now.
  3. Agreed that immediate vs load-from-memory is probably not a huge deal, but just in the interest of not giving it up unnecessarily: how would the trampoline be signature-independent? My operating assumption has been that trampolines are statically specialized to the callee (so that they can use a direct call, and also bake in the callee's instance pointer as an immediate), and thus they could know the callee's expected signature as an immediate.

Getting back to whether the signature check happens before or after pushing the frame: it makes sense that, for tail calls, the callee frame will have already been pushed, so this should be allowed. However, I think doing so for all call entries will cause extra prologue code to be generated for the two checked entries per function (since they must push their own frame and then jump over the frame-pushing of the unchecked entry). If the signature can be checked before pushing the frame, then the checked entries can check and then fall through to the frame-pushing of the unchecked entries. IIRC from my measurements, b/c of the large number of small functions in Unity-type apps, prologue-generated machine code is a significant fraction of overall code size. Also icache locality would be somewhat improved. But I forget: are there any concrete problems with checking the signature before pushing the frame?

Are there any concrete problems with checking the signature before pushing the frame?

There are two nits and one serious problem.

  1. Nit: With tail calls, you have to deal with the signature check trap being raised either with or without the frame.
  2. Nit: If overhead is a problem and tail calls will come, then we should find a solution that allows tail calls without overhead.
  3. Problem: If a function is called from a trampoline, it needs to not set FP and instead use the FP set by the trampoline.

I think this last issue is the motivator for having checked call entries and checked tail entries. WDYT, Luke?

Luke, what is the plan for import calls? They use Frame::tls too; I suppose we are going to handle call_import in the same manner as call_indirect? That means we are going to call a thunk to switch instances.
WDYT?

Flags: needinfo?(luke)

Yeah, good question. I wasn't thinking about import calls b/c they are usually just simplified versions of indirect calls in that you statically know you're calling out of the instance. Thus, if it was easy, it would make sense to statically (at the callsite) do something instead of using a trampoline. That being said, I don't see an easy way to do that in this case so yeah, probably best just to call a trampoline just like indirect calls.

Flags: needinfo?(luke)

There are still a few remaining uses of masm.loadWasmTlsRegFromFrame():

  1. CodeGenerator::visitWasmCall
    Here we reload tls because Ion assumes that WasmTlsReg is non-volatile.
    After implementing trampolines and using them for call_import and call_indirect, we can safely remove this.
  2. MacroAssembler::callWithABI
    AFAIK we set tls here only for wasm BuiltinThunks. A BuiltinThunk uses WasmTlsReg only in its prologue/epilogue, so when we delete Frame::tls this will become obsolete and we can remove it then.
Pushed by malexandru@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/578160fb0ac3
change call_indirect signature check to happen after frame is pushed r=lth,wingo

Backed out changeset 578160fb0ac3 (bug 1639153) for WasmFrameIter.cpp related bustage

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=build&fromchange=578160fb0ac35c82a12139b9de86ecf2d791928a&tochange=dac0f3345b5984106adcf22d20e2bceb4a8fb4e6&selectedTaskRun=NWgXbXLXTBa5zNMyhbNQyA-0

Backout link: https://hg.mozilla.org/integration/autoland/rev/dac0f3345b5984106adcf22d20e2bceb4a8fb4e6

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305426059&repo=autoland&lineNumber=5524

[task 2020-06-08T06:29:26.656Z] In file included from Unified_cpp_js_src_wasm1.cpp:11:0:
[task 2020-06-08T06:29:26.656Z] /builds/worker/workspace/build/src/js/src/wasm/WasmFrameIter.cpp: In function 'void js::wasm::GenerateFunctionPrologue(js::jit::MacroAssembler&, const js::wasm::FuncTypeIdDesc&, const mozilla::Maybe<unsigned int>&, js::wasm::FuncOffsets*)':
[task 2020-06-08T06:29:26.656Z] /builds/worker/workspace/build/src/js/src/wasm/WasmFrameIter.cpp:581:3: error: static assertion failed: code aligned
[task 2020-06-08T06:29:26.656Z]    static_assert(CheckedTailEntryOffset % CodeAlignment == 0, "code aligned");
[task 2020-06-08T06:29:26.656Z]    ^~~~~~~~~~~~~
[task 2020-06-08T06:29:30.442Z] /builds/worker/workspace/build/src/config/rules.mk:746: recipe for target 'Unified_cpp_js_src_wasm1.o' failed
[task 2020-06-08T06:29:30.442Z] make[3]: *** [Unified_cpp_js_src_wasm1.o] Error 1
...
Flags: needinfo?(dbezhetskov)
Pushed by btara@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a4d1e2c3ba9e
change call_indirect signature check to happen after frame is pushed r=lth,wingo
Backout by malexandru@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9bd3381b2697
Backed out changeset a4d1e2c3ba9e for causing bustages in WasmFrameIter.cpp
Pushed by apavel@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4cf40cf33b0a
change call_indirect signature check to happen after frame is pushed r=lth,wingo
Pushed by rmaries@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f1beae5af856
change call_indirect signature check to happen after frame is pushed r=lth,wingo

The issues with moving the signature check are resolved and the changes have now landed.
The problem was in the --disable-jit configuration; I messed up MacroAssembler-none.h.

Flags: needinfo?(dbezhetskov)
Status: UNCONFIRMED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
Keywords: leave-open
Status: RESOLVED → UNCONFIRMED
Resolution: FIXED → ---
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Target Milestone: mozilla79 → ---
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/251dbf12af47
Remove using Frame::tls in Jit import exit stub r=lth,wingo
Pushed by abutkovits@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f82a44c95497
Refactor wasm::Frame to reduce number of gnarly casts. r=lth,wingo
Pushed by ncsoregi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b620533c6d63
Refactor wasm::Frame to reduce number of gnarly casts. r=lth,wingo
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/caa8552f4f52
Refactor wasm::Frame to reduce number of gnarly casts. r=lth,wingo

Fixed and pushed the updated changes.

Flags: needinfo?(dbezhetskov)
Attachment #9157867 - Attachment description: Bug 1639153 - interpreter entry pushes trampoline frame to preserve tls for runtime r=lth → Bug 1639153 - Interpreter entry pushes trampoline frame to preserve tls for runtime r=lth

(In reply to Pulsebot from comment #33)

Pushed by ncsoregi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b620533c6d63
Refactor wasm::Frame to reduce number of gnarly casts. r=lth,wingo

Hi, it seems this change caused a build failure on the MIPS platform. Is it OK if I submit the fix patch to this bug, or should I file a new bug? Thanks!

Hu Zhao, I think it is OK to submit patches here if the number of patches is small.
BTW, I'm surprised that we support the MIPS platform at all.

(In reply to Dmitry Bezhetskov from comment #40)

BTW, I'm surprised that we support the MIPS platform at all.

We do have MIPS support, but it's a tier-3 platform, supported not by Mozilla but by external contributors (most of them at Loongson). There are MIPS simulators in the tree in the same way as for arm/arm64, and sometimes we try to keep the MIPS code working by running them, but usually we just change things as we want and wait for the MIPS support team to catch up.

Pushed by ncsoregi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5078030eaeaf
[MIPS] Fix a recent refactoring to wasm::Frame on mips. r=lth
Attachment #9157867 - Attachment is obsolete: true
Attached file call_indirect_ubench.js (obsolete) —

Microbenchmark for call_indirect. On my Xeon workstation, I get these numbers with yesterday's mozilla-central (JS shell, configure with --disable-debug --enable-optimize --enable-release, run with --wasm-compiler=ion):

external: 388.2451171875
internal: 333.218017578125

This suggests that in the code that has landed so far, indirect calls to module-internal functions are about 14% faster than indirect calls to imported functions.

Updated to add direct calls as a comparison. These come in at roughly 187 ms, i.e., at about 56% of the module-internal indirect calls.

Attachment #9159321 - Attachment is obsolete: true

On my AMD machine it is:

branches/default/tip Fri Jun 26:

external: 256.566162109375
internal: 254.14599609375
direct: 186.510986328125

branches/default/tip Fri Jun 26 + additional trampoline frame:

external: 420.4189453125
internal: 279.97607421875
direct: 186.052978515625

Direct calls aren't affected.
Internal: 9% slowdown. It's because we add an additional "if" there; before, we just loaded/reloaded tls from the frame.
External: 64% slowdown. Pushing the additional trampoline frame + the "if" + an additional call/return into/from the trampoline.

If we didn't push Frame::tls we would save:
2 pushes for internal calls, one for the function's frame and one for the trampoline
3 pushes for external calls

See Also: → 1649109

Luke, if we took the same approach with trampolines for import calls, it would mean they become slower than they are now. As you say, there isn't an easy way to work around this for import calls, because we need to somehow signal to frame iteration that the instance has changed.

Actually, I can say that there will be the same slowdown as for the cross-instance case of call_indirect. It is about a 50% slowdown compared to the previous version, where we just did WasmTlsReg <- callee's tls and called callee.code.
The current plan is good, but it also has consequences; let me show the trade-offs of the current plan:

  • very fast direct calls, actually no overhead at all
  • a little bit more code, about 4 bytes per function for nops
  • import calls and cross-instance call_indirect become slow: we add constant overhead for this, an additional call/return + pushing the trampoline's data
  • the trampoline doesn't need to copy stack args because we address them from FP.

I don't have the whole picture of wasm in SpiderMonkey, but I think you do. So, are these trade-offs OK for us or not? I'm asking because I saw the Web IDL bindings proposal and it looks like we are going to use import calls to call WebIDL.

There is another trade-off we can make: we can fuse your original idea with the current plan.
Let me describe this a little bit more:
Your original idea was to add slots after the stack arguments and then track tls via CallSiteDesc.
I've investigated this earlier and I can say that it is doable if the number of slots is equal to 2 (I have an abandoned PR for this):
one slot for calleeTls and one for callerTls. We need to preserve them only on possibly-cross-instance calls.
The current plan with addressing from FP and trampolines lets us avoid copying arguments when we call a trampoline.
So, with this fusion we have (layout sketched below):

  • a slight slowdown for all calls, including direct calls, because we always allocate two slots after the stack args, even if there are none.
  • import calls don't need to call a trampoline because the runtime already knows about the transition.
  • the amount of memory used per wasm function is increased by 2 slots for each call.
  • trampolines already have slots for callerTls and calleeTls, so they only need to prepare a frame for the callee and jump into it.
  • the trampoline doesn't need to copy stack args because we address them from FP.
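
A rough picture of the proposed layout as a struct; field order and names are illustrative only (compare the wasm::FrameWithTls that shows up in the later patches):

  struct TlsData;

  struct Frame {           // the fixed part every wasm call pushes today
    void* callerFP;        // saved frame pointer
    void* returnAddress;   // pushed by the call instruction
  };

  struct FrameWithTlsSketch {
    // ...outgoing stack arguments, if any, sit adjacent to these slots...
    TlsData* callerTls;    // written only at possibly-cross-instance callsites
    TlsData* calleeTls;    // ditto; lets frame iteration recover the instance
    Frame frame;           // the ordinary frame follows the two slots
  };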

You may ask what the actual slowdown for direct calls is, because they are very frequent in wasm demos.
I've added the two slots for all calls and measured on the benchmark (provided by Lars Hansen). There we call a function with no stack arguments, which is the most frequent case for zengarden and other benchmarks.

I've run this test many times and calculated the average. The test machine is my HP laptop with Ubuntu 18.04 and an Intel Core i7 on board.
The base revision is 5c9a4697057bd5ef8e15e6223e049674107ca433 (Wed Jul 1, Bug 1147091 - Add StableSort to nsTArray_Impl. r=froydnj, Simon Giesecke <sgiesecke@mozilla.com>). Flags: --disable-debug --enable-optimize --enable-release, run with --wasm-compiler=ion.

with two additional slots:
external: 267.17
internal: 254.44
direct: 140.26

reference:
external: 269.44
internal: 260.42
direct: 145.84

It looks strange, but this is what we have. I compared many times, and it looks like there is no overhead for any of the calls, and even some improvement.
The amount of memory is increased, of course; it is about 2 slots for each call (depending on the alignment).

So, WDYT about these two options?
Is the first one OK, i.e. the constant overhead doesn't scare us,
or do we increase each call by two slots and get cheap import calls and cheap trampolines?

Flags: needinfo?(luke)
Flags: needinfo?(lth)
Flags: needinfo?(lth) → needinfo?(lhansen)

One possible disadvantage of the second approach is that for a same-instance call_indirect we still need to preserve tls, because the runtime can't distinguish between an import call and a call_indirect (they both have kind == dynamic), but it's just two store instructions.

I think slowing down cross-instance indirect and import calls is not good, so anything we can do to mitigate that is worth thinking hard about.

I don't think "adding two extra slots" should cost anything in the no-stack-args case if we do things right. Ion creates the stack frame with enough space for all outgoing arguments on function entry, so there should be no reason to make any adjustment of the SP just prior to the call - the adjustment should be made on function entry when we create the local frame. I thought that there might be an added small cost in functions that make zero-stack-arg calls and that themselves do not currently need to allocate any stack memory, like int f(int x) { return g(x)+1 }, but currently we're allocating a frame (8 bytes unused memory) for those already to ensure proper stack alignment in the callee, so there would be no regression there either.

I'm a little more worried about code size, frankly, because programs are becoming large quickly. In the case of my function f above I see 14 bytes of NOPs on x64 currently:

00000002  55                        push %rbp
00000003  48 8b ec                  mov %rsp, %rbp
00000006  0f 1f 00                  nopl %eax, (%rax)
00000009  0f 1f 80 00 00 00 00      nopl %eax, (%rax)
00000010  41 83 fa 23               cmp $0x23, %r10d
00000014  0f 84 0c 00 00 00         jz 0x0000000000000026
0000001A  0f 0b                     ud2
0000001C  0f 1f 40 00               nopl %eax, (%rax)
00000020  41 56                     push %r14
00000022  55                        push %rbp
00000023  48 8b ec                  mov %rsp, %rbp
00000026  48 83 ec 08               sub $0x08, %rsp
...
Flags: needinfo?(lhansen)

Just to check: when you say that all calls would have 2 extra slots (for callerTLS/calleeTLS), are those slots always allocated, but only initialized in the cross-instance cases by the trampolines? If so, then I agree it should be practically free so that seems like a good idea.

Flags: needinfo?(luke)

(In reply to Luke Wagner [:luke] from comment #51)

Just to check: when you say that all calls would have 2 extra slots (for callerTLS/calleeTLS), are those slots always allocated, but only initialized in the cross-instance cases by the trampolines? If so, then I agree it should be practically free so that seems like a good idea.

We need to fill these slots for indirect calls. Import calls will always store tls, and calls via trampolines will fill them according to the trampoline logic.
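
As a hedged sketch, the caller-side store into those slots would amount to something like this (the struct and helper are invented; the real code goes through the MacroAssembler with platform-specific stack addressing):

  struct TlsData;

  struct PreCallTlsSlots {
    // ...outgoing stack arguments are laid out before this point...
    TlsData* callerTls;
    TlsData* calleeTls;
  };

  // Conceptually runs just before the call instruction at any possibly
  // cross-instance callsite (indirect and import calls).
  void fillTlsSlots(PreCallTlsSlots* slots, TlsData* callerTls,
                    TlsData* calleeTls) {
    // Stored at a fixed offset from the callee's FP so frame iteration and
    // the trampoline can find the effective tls without Frame::tls.
    slots->callerTls = callerTls;
    slots->calleeTls = calleeTls;
  }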

We are going to remove Frame::tls and support trampolines for indirect calls, so we need to get rid of all uses of Frame::tls.
In this and the follow-up patches I will iteratively remove all dependencies on Frame::tls and eventually remove it.

In this patch I changed the wasm ABI to allocate two stack slots after the stack args, to preserve the caller's and callee's tls'es in the near future.

This is a follow-up patch for removing Frame::tls.
Now we preserve the caller's and callee's tls'es for all possibly-cross-instance calls in the previously allocated ABI slots.
We will use them in the near future to untie the C++ runtime from Frame::tls and to reload WasmTlsReg after a cross-instance call.

I'm afraid I'm about to go on vacation for a week, and probably at this point in time Lars would be the better reviewer. I think he's also on vacation for the next two weeks, though, so I'm sorry about the delay.

Luke, thanks for the notice.
If you could review the patches after your vacation it would be awesome. If you can't, it's OK, /sigh/, I'll wait for Lars. In the latter case I'll switch the reviewer from you to Lars after he returns from vacation.

BTW, the patches are very simple and extremely short.

This is the third part of the series of patches toward a Frame without a tls pointer.
Here we preserve the initial tls in all entry stubs and then use it to find the proper tls for a given frame.

To find the TlsData* for a specific frame we start from an entry stub's tls
and then track tls through all possible cross-instance calls. This logic
is implemented in the GetNearestEffectiveTls procedure.

Then, we use this new procedure to make signal handling free of Frame::tls.
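
A hedged sketch of that walk; the helper names are hypothetical and the real GetNearestEffectiveTls in WasmFrameIter.cpp differs in its details:

  struct TlsData;
  struct Frame;

  // If this frame was created by a possibly-cross-instance call, return the
  // callee tls its caller stored in the reserved slots; otherwise nullptr.
  TlsData* storedCalleeTls(const Frame* frame);
  bool isEntryStubFrame(const Frame* frame);    // frame pushed by an entry stub
  TlsData* entryStubTls(const Frame* frame);    // tls preserved by that stub
  const Frame* callerFrame(const Frame* frame);

  TlsData* nearestEffectiveTlsSketch(const Frame* frame) {
    for (const Frame* f = frame; f; f = callerFrame(f)) {
      // The nearest callsite that could have switched instances decides the
      // effective tls for everything it called.
      if (TlsData* tls = storedCalleeTls(f)) return tls;
      // Failing that, the entry stub's preserved tls is the answer.
      if (isEntryStubFrame(f)) return entryStubTls(f);
    }
    return nullptr;  // unreachable for a well-formed wasm activation
  }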

Here we replace the use of Frame::tls in frame iteration with GetNearestEffectiveTls.
We also keep the current tls in the frame iteration object so that we don't have to call GetNearestEffectiveTls every time.

Here we remove remaining uses of Frame::tls. There are many places where
we use it: in debug frames, in profiling frames, in jit activation, etc.
All these places require short fixes to use our new scheme for getting
tls, so I gathered them together.

Let's look at the following code, and let's assume the wasm compiler is Ion:

call foo
call bar

Ion can clobber tls inside foo and then just proceed into bar with the clobbered tls.
It will crash if bar uses tls, and at compile time we don't know whether bar will use tls or not.

This works today because we restore tls each time we return from a function, thanks to the current frame structure.
But now that we want to get rid of Frame::tls, we should guarantee that Ion doesn't clobber tls inside a function.
In fact we force Ion to restore tls iff it clobbers it, and that is a rare case.

Baseline doesn't need such logic because it has a private stack slot for tls.
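
Stated as code, the rule is roughly the following; this is a policy sketch with invented names, not Ion's actual register allocation or codegen:

  struct MacroAssemblerSketch {
    void loadPtrFromStackSlot(int destReg, int slotOffset);
  };

  constexpr int kWasmTlsReg = 14;  // stands in for the real WasmTlsReg

  // With Frame::tls gone there is nothing to reload tls from after a call,
  // so WasmTlsReg must behave as preserved across wasm calls. Only the rare
  // function whose body clobbers it pays: it spills tls to a private slot in
  // its prologue and reloads it here before the register is needed again.
  void restoreTlsIfClobbered(MacroAssemblerSketch& masm, bool bodyClobbersTls,
                             int tlsSpillSlotOffset) {
    if (bodyClobbersTls) {
      masm.loadPtrFromStackSlot(kWasmTlsReg, tlsSpillSlotOffset);
    }
  }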

We remove Frame::tls and fix prologue/epilogue. Runtime, Ion and Baseline are ready for this move already.

When I did measurements I discovered that there is a small performance regression at patch 7.
My machine is not representative at all, but it is a little bit mysterious.
At part 6 the performance of direct calls is: 185
At part 7 it is: 193

Apart from this, I haven't seen any major performance regression at patch 7.

I started investigating, compared the generated code, and discovered that an SM debug build (--enable-debug --disable-optimize) generates code that executes faster than an SM release build (--disable-debug --enable-optimize --enable-release).
At part 7:
Debug: 183
Release: 195

Debug code:

   0x2802daadd2b0      push   %rbp
   0x2802daadd2b1      mov    %rsp,%rbp
   0x2802daadd2b4      sub    $0x20,%rsp
   0x2802daadd2b8      cmp    %rsp,0x30(%r14)
   0x2802daadd2bc      jb     0x2802daadd2c4
   0x2802daadd2c2      ud2
   0x2802daadd2c4      mov    %edi,0x1c(%rsp)
   0x2802daadd2c8      xor    %eax,%eax
   0x2802daadd2ca      mov    %eax,0x18(%rsp)
   0x2802daadd2ce      xchg   %ax,%ax
   0x2802daadd2d0      cmpl   $0x0,0x38(%r14)
   0x2802daadd2d5      je     0x2802daadd2dd
   0x2802daadd2db      ud2
   0x2802daadd2dd      mov    0x1c(%rsp),%eax
   0x2802daadd2e1      test   %eax,%eax
   0x2802daadd2e3      je     0x2802daadd311
   0x2802daadd2e9      sub    $0x1,%eax
   0x2802daadd2ec      mov    %eax,0x1c(%rsp)
   0x2802daadd2f0      mov    $0x7,%edi
   0x2802daadd2f5      test   $0xf,%spl
   0x2802daadd2f9      je     0x2802daadd300
   0x2802daadd2ff      int3
   0x2802daadd300      callq  0x2802daadd0a0
   0x2802daadd305      mov    %eax,%eax
   0x2802daadd307      add    0x18(%rsp),%eax
   0x2802daadd30b      mov    %eax,0x18(%rsp)
   0x2802daadd30f      jmp    0x2802daadd2ce
   0x2802daadd311      mov    0x18(%rsp),%eax
   0x2802daadd315      add    $0x20,%rsp
   0x2802daadd319      pop    %rbp
   0x2802daadd31a      retq 

Release code:

    0x1256f09c4290      push   %rbp
    0x1256f09c4291      mov    %rsp,%rbp
    0x1256f09c4294      sub    $0x20,%rsp
    0x1256f09c4298      cmp    %rsp,0x30(%r14)
    0x1256f09c429c      jb     0x1256f09c42a4 
    0x1256f09c42a2      ud2
    0x1256f09c42a4      mov    %edi,0x1c(%rsp)
    0x1256f09c42a8      xor    %eax,%eax
    0x1256f09c42aa      mov    %eax,0x18(%rsp)
    0x1256f09c42ae      xchg   %ax,%ax
    0x1256f09c42b0      cmpl   $0x0,0x38(%r14)
    0x1256f09c42b5      je     0x1256f09c42bd 
    0x1256f09c42bb      ud2
    0x1256f09c42bd      mov    0x1c(%rsp),%eax
    0x1256f09c42c1      test   %eax,%eax
    0x1256f09c42c3      je     0x1256f09c42e6 
    0x1256f09c42c9      sub    $0x1,%eax
    0x1256f09c42cc      mov    %eax,0x1c(%rsp)
    0x1256f09c42d0      mov    $0x7,%edi
    0x1256f09c42d5      callq  0x1256f09c40a0 
    0x1256f09c42da      mov    %eax,%eax
    0x1256f09c42dc      add    0x18(%rsp),%eax
    0x1256f09c42e0      mov    %eax,0x18(%rsp)
    0x1256f09c42e4      jmp    0x1256f09c42ae 
    0x1256f09c42e6      mov    0x18(%rsp),%eax
    0x1256f09c42ea      add    $0x20,%rsp
    0x1256f09c42ee      pop    %rbp           
    0x1256f09c42ef      retq

The only difference that I can see is:

test   $0xf,%spl
je     0x2802daadd300
int3

Looks like inserting this dynamic check speeds up the code.
Looks like an alignment problem.

I'll check the SM behavior on my laptop and compare results.

If I increase the loop body:

  (func (export "run_direct") (param $count i32) (result i32)
    (local $sum i32)
    (block $END
      (loop $LOOP
        (br_if $END (i32.eqz (local.get $count)))
        (local.set $count (i32.sub (local.get $count) (i32.const 1)))
        (local.set $sum
          (i32.add (call $internal (i32.const 7))
                                   (local.get $sum)))
        (local.set $sum
          (i32.add (call $internal (i32.const 7))
                                   (local.get $sum)))
        (br $LOOP)))
    (local.get $sum))

then the difference between the reference and the part-7 patch is very small, 254.25 vs 255, so I think the previous anomaly affects only very small loops.

I also compared the reference and my stack of patches on https://github.com/lars-t-hansen/embenchen in release mode with --wasm-compiler=ion.
The results (lower score is better):

The first column is the score for the reference and the second column is for my stack of patches.
Looks like everything is good!

box2d:              700  691
bullet:             895  903
conditionals:       213  214
copy:               463  465
corrections:        1356 1356
fannkuch:           2334 2257
fasta:              616  616
fib:                639  612
ifs:                197  190
binarytrees:        3235 3234
memops:             979  976
primes:             1451 1454
raybench:           1430 1444
rust-fannkuch:      2189 2192
skinning:           541  539
zlib:               987  988

I also compared external and internal calls with the previous patched benchmark:

Reference:

external: 257.699951171875
internal: 254.968017578125
direct: 258.489990234375

My stack of patches:

external: 258.5400390625
internal: 234.716064453125
direct: 256.7490234375

To sum up, I conclude that direct calls aren't affected (modulo the first test), internal calls became even faster, and external calls aren't affected, so removing Frame::tls is a good independent step, IMHO.

Attachment #9162440 - Attachment description: Bug 1639153 - Part 1: Reserve two slots after stack arguments for the future tls preservation. r=luke → Bug 1639153 - Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth
Attachment #9162453 - Attachment description: Bug 1639153 - Part 2: Preserve callee and caller tls'es. r=luke → Bug 1639153 - Part 2: Preserve callee and caller tls'es. r=lth
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f68854b70d6a
Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth
https://hg.mozilla.org/integration/autoland/rev/296b7f5a0713
Part 2: Preserve callee and caller tls'es. r=lth

Backed out for bustage on MacroAssembler.h

backout: https://hg.mozilla.org/integration/autoland/rev/eeed6471b7255c0d93d96ff2a202567256ac5a30

push: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedTaskRun=Hqmhidc8RPeFuoWEjJPfPA.0&revision=296b7f5a07133f9cbbd28ab20c1468800b0e1d39

failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=312226941&repo=autoland&lineNumber=21791

[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - In file included from /builds/worker/checkouts/gecko/js/src/jit/x86/SharedICRegisters-x86.h:10:
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - /builds/worker/checkouts/gecko/js/src/jit/MacroAssembler.h(4202,5): error: use of undeclared identifier 'increaseStackOffset'
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - increaseStackOffset(wasm::FrameWithTls::sizeWithoutFrame());
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - ^
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - 1 error generated.
[task 2020-08-06T06:29:23.793Z] 06:29:23 ERROR - make[4]: *** [/builds/worker/checkouts/gecko/config/rules.mk:748: Parser.obj] Error 1
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - make[4]: Leaving directory '/builds/worker/workspace/obj-build/js/src/frontend'
[task 2020-08-06T06:29:23.793Z] 06:29:23 ERROR - make[3]: *** [/builds/worker/checkouts/gecko/config/recurse.mk:72: js/src/frontend/target-objects] Error 2
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - make[3]: *** Waiting for unfinished jobs....

Flags: needinfo?(dbezhetskov)
Attachment #9162715 - Attachment description: Bug 1639153 - Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=luke → Bug 1639153 - Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/95d6c42f07fd
Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth
https://hg.mozilla.org/integration/autoland/rev/25931ec43a32
Part 2: Preserve callee and caller tls'es. r=lth
https://hg.mozilla.org/integration/autoland/rev/30af0e8c7956
Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth
Attachment #9162723 - Attachment description: Bug 1639153 - Part 4: Untie frame iteration from Frame::tls. r=luke → Bug 1639153 - Part 4: Untie frame iteration from Frame::tls. r=lth
Attachment #9162736 - Attachment description: Bug 1639153 - Part 5: Remove remaining uses of Frame::tls. r=luke → Bug 1639153 - Part 5: Remove remaining uses of Frame::tls. r=lth
Attachment #9162746 - Attachment description: Bug 1639153 - Part 6: Force Ion to preserve its tls. r=luke → Bug 1639153 - Part 6: Force Ion to preserve its tls. r=lth
Regressions: 1649928

Unverified but plausible regression: bug 1657741.

Regressions: 1657825
Regressions: 1657917