Closed Bug 1482785 Opened 7 years ago Closed 7 months ago

Use a TLS register to access zone and context data

Categories

(Core :: JavaScript Engine: JIT, enhancement, P3)

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox63 --- affected

People

(Reporter: pbone, Unassigned)

Details

Background: In GC allocation routines we access Zone data via absolute addresses. We try to do this as little as possible so that large addresses don't take up too much room in the instruction stream. Absolute addresses also make it impractical to cache compiled code, which has been raised elsewhere (Bug 1479360 Comment 3). Also, in Bug 1473213 we want to access the current JSContext, and off-thread Ion compilation can make this difficult, since the context at compile time can be different from the one at run time. There are probably other situations where we wish to access the JSContext and/or could avoid an absolute address.

Proposal: Always use a TLS register in all compiled code (we can implement this bit by bit). There is already a WasmTlsRegister; I propose removing the "Wasm" part of the name and encouraging its use everywhere. I also propose putting all runtime TLS info in the same structure (something like JSContext or TlsData, but more general). This kind of change will probably be required to implement GC for Wasm.

Question: Lars, is this reasonable/safe? Is this register available to JS code, and can it be safely dereferenced? The comment at https://searchfox.org/mozilla-central/source/js/src/wasm/WasmTypes.h#1878 says this structure is switched when calls cross module boundaries. If you think the idea is reasonable, I'd like to talk about which implementation you think is best: should it be a general structure with a Wasm Module specific field or fields, or, as it is now, a Wasm Module structure that also contains some general fields?
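To make the first layout option in the question concrete, here is a minimal sketch of "a general structure with a Wasm-specific field". All names (RuntimeTlsData, ZoneData, ContextData, WasmInstanceData, bumpAllocCounter, currentContextId) are hypothetical and are not SpiderMonkey types. The two functions stand in for compiled code that receives the TLS pointer in a register and reaches zone and context data through it, so that no absolute address is baked into the code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; not SpiderMonkey types.
struct ZoneData { uint64_t gcBytesAllocated; };
struct ContextData { int id; };
struct WasmInstanceData { void* memoryBase; };

// A general per-thread structure with one Wasm-specific field.
struct RuntimeTlsData {
    ContextData* cx;         // current context, resolved at run time
    ZoneData* zone;          // current zone, resolved at run time
    WasmInstanceData* wasm;  // null when running plain JS (assumption)
};

// Stand-in for a compiled allocation path: the zone is found through the
// TLS pointer instead of through an address embedded in the code.
inline void bumpAllocCounter(RuntimeTlsData* tls, uint64_t nbytes) {
    assert(tls && tls->zone);
    tls->zone->gcBytesAllocated += nbytes;
}

// Stand-in for compiled code that needs the current context.
inline int currentContextId(const RuntimeTlsData* tls) {
    assert(tls && tls->cx);
    return tls->cx->id;
}
```

Because the code only depends on field offsets within the structure, the same code can run against a different context or zone than the one that was current when it was compiled.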
NI for Lars
Flags: needinfo?(lhansen)
No longer blocks: 1473213
We're going to have to ask somebody with a higher pay grade about this (cc'd).

There is a WasmTlsRegister, but it is in principle only live at call boundaries. Internally to a function we'll reuse the register for other things and reload the tls value from the frame when we need it, at which point the tls pointer may end up in a different register than the WasmTlsRegister.

The JS ABI would have to change (on all platforms) to accommodate a TLS register, which will be a bit of work. At this time I don't think the JS ABI even preserves the frame pointer everywhere.

I don't see yet why this would be required to implement GC for wasm: we have frame pointers everywhere and can walk the stack just fine with those, and the GC can probably access the stack maps by using just the PC as a key.

If we're going to rewrite a lot of code anyway to accommodate Cranelift (née Cretonne), it might be an interesting requirement on Cretonne-generated code.

cc'ing some more people.
Flags: needinfo?(sunfish)
Flags: needinfo?(luke)
Flags: needinfo?(lhansen)
Flags: needinfo?(jdemooij)
Flags: needinfo?(bbouvier)
(In reply to Lars T Hansen [:lth] from comment #2)
> The JS ABI would have to change (on all platforms) to accommodate a TLS
> register, which will be a bit of work.

Yeah, this is pretty difficult to do for JS (especially on x86) and I don't think there's a good reason to do this now, with our current JITs.

Caching compiled code is not something we're working on atm, but if we wanted to do this, another option would be to patch the cx/zone pointers after loading code from the cache. This is what we do for certain pointers embedded in wasm code.
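The patch-after-load alternative can be sketched roughly as follows. This is a simplified illustration, not the mechanism wasm actually uses: CachedCode, PatchSite, PatchKind, patchAfterLoad and readImmediate are hypothetical names, and the sketch assumes the cached code carries a side table recording where each pointer-sized immediate lives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical names for illustration; not SpiderMonkey APIs.
enum class PatchKind : uint8_t { Context, Zone };

struct PatchSite {
    size_t offset;   // byte offset of a pointer-sized immediate in the code
    PatchKind kind;  // which run-time pointer belongs at that offset
};

struct CachedCode {
    std::vector<uint8_t> bytes;    // machine code as loaded from the cache
    std::vector<PatchSite> sites;  // side table written at compile time
};

// After loading, overwrite every recorded immediate with the pointer that
// is valid in this process. memcpy avoids unaligned-store problems.
inline void patchAfterLoad(CachedCode& code, const void* cx, const void* zone) {
    for (const PatchSite& site : code.sites) {
        assert(site.offset + sizeof(uintptr_t) <= code.bytes.size());
        const void* target = (site.kind == PatchKind::Context) ? cx : zone;
        uintptr_t value = reinterpret_cast<uintptr_t>(target);
        std::memcpy(code.bytes.data() + site.offset, &value, sizeof(value));
    }
}

// Helper for inspecting a patched immediate.
inline uintptr_t readImmediate(const CachedCode& code, size_t offset) {
    uintptr_t value;
    std::memcpy(&value, code.bytes.data() + offset, sizeof(value));
    return value;
}
```

The trade-off against a TLS register is that the code still contains absolute addresses, so it has to be patched once per load, but the ABI does not need to change.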
Flags: needinfo?(jdemooij)
Wasm doesn't always have a pinned TLS register, but it does maintain fp (which I think all JIT code should do: bug 1426134), and wasm::Frame has a TLS field, so we can always load TLS via fp->tls. The WasmTlsReg scheme is just a regalloc hack that attempts to minimize reloads of this frame field; it's not a proper pinned register.

As a possibly-faster alternative: I think it would be possible to get thread-local access on x86/x64 in C++ and JIT code down to a single %gs/%fs-relative load/store:

- On OSX, WebKit/JSC does this already with certain magic slots:
  https://bug-169483-attachments.webkit.org/attachment.cgi?id=304134
  https://opensource.apple.com/source/Libc/Libc-583/pthreads/pthread_machdep.h
  https://github.com/WebKit/webkit/blob/master/Source/WTF/wtf/FastTLS.h#L43
- On Linux, non-PIC C++ already does this for `thread_local`, so it's a question of how to forcibly/hackily do this for non-PIC code (and get a hold of the constant offset).
- On Windows, there are 64 inline TLS slots in the ThreadEnvironmentBlock (https://en.wikipedia.org/wiki/Win32_Thread_Information_Block) pointed to by %fs, and it seems possible to do a hack where we claim one of these with a static offset in the TEB.

For JIT code, I think fp->tls->x would only be a bit slower than %gs->x (I've noticed that segment-prefixed loads/stores are slower than normal loads/stores, but not drastically), but for C++, esp. on Windows, the current codegen for MOZ_THREAD_LOCAL (last I checked) uses something like 4-6 loads/stores, so this might be a nice general speedup.
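The two access paths compared above can be sketched in portable C++. The Frame and TlsData layouts here are simplified and hypothetical (the real wasm::Frame differs), and whether the `thread_local` access compiles down to a single segment-relative load depends on platform, compiler and TLS model, as noted in the comment.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical layouts; the real wasm::Frame differs.
struct TlsData { uint32_t x; };

struct Frame {
    Frame* callerFP;  // saved frame pointer of the caller
    TlsData* tls;     // per-frame copy of the TLS pointer
};

// Path 1 (fp->tls->x): two dependent loads, first the tls pointer from
// the frame, then the field from the TLS structure.
inline uint32_t loadViaFrame(const Frame* fp) {
    assert(fp && fp->tls);
    return fp->tls->x;
}

// Path 2: the data itself lives in thread-local storage. Depending on the
// TLS model the compiler selects, this can become one segment-relative
// load or a call into the runtime; this sketch does not control which.
thread_local TlsData currentTls = {0};

inline uint32_t loadViaThreadLocal() {
    return currentTls.x;
}
```

Both functions return the same field; the difference is only in how many memory accesses are needed to reach it and whether a frame pointer has to be maintained.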
Flags: needinfo?(luke)
I think Cranelift would be ok with any of the approaches being discussed here. Some of them would take some work, such as x86 encoding work if we use segment registers; however, I don't expect it would be overly complicated.
Flags: needinfo?(sunfish)
Not much to add here. If there's one register to sacrifice on x86, I think it should be FP, and then we can reload a TLS from it if we want to.
Flags: needinfo?(bbouvier)
Priority: -- → P3
Severity: normal → S3
Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → INCOMPLETE