Remove static initializers in intgemm code
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
()
People
(Reporter: yury, Unassigned)
References
Details
New information came in from Lars: per our policy, we need to remove all static initializers/constructors from third-party library. Bug 1754207 fixes time spend in the initializers.
Now we need to move entire initialization to different place/time or eliminate it. For example all CPUID detection code can be moved to static method, which will be call with SpiderMonkey initializers.
![]() |
||
Comment 1•4 years ago
|
||
Well, to be specific, we are generally very careful not to have static initializers. (Zero-initializers are fine.) Instead we prefer init methods that are called from the engine init, or other places, as appropriate; or initializers that are called lazily. What I would like to see is a conversation between the Bergamot team and the module owner (Jan) to make sure that the best solution is found so that we don't risk regressing startup time or other code quality metrics.
Comment 2•4 years ago
|
||
The main multiply function takes a template argument that lets the user decide what to do with the output matrix: https://github.com/kpu/intgemm/blob/fc3a614351ce6e667197307d97f45db5265c96af/intgemm/intgemm.h#L252-L257 . This is part of the point of the library: one can run arbitrary postprocessing like activation functions while the results are still in registers before being written to memory.
At the same time, the multiply function is CPUID-dependent so it should be both templated and a function pointer. Because one can't have a templated variable yet, there's a struct wrapper where each struct has a function pointer that's switched by CPUID: https://github.com/kpu/intgemm/blob/fc3a614351ce6e667197307d97f45db5265c96af/intgemm/intgemm.h#L291-L292 .
I'm not sure how to handle that with an Init function. The Init function will not know which template arguments are being used in other CC files so it can't initialize all of them.
One way around that is to enumerate the template arguments Firefox uses in the cc file and explicitly instantiate the templates along with listing them in the init function. This is possible though that means a bespoke fork with generic functionality gone.
I've also been toying with the idea of a function that would change the pointer on first call.
#include <iostream>
extern void (*Foo)();
void Replacement() {
std::cout << "Replacement" << std::endl;
}
void Initial() {
std::cout << "Initial" << std::endl;
Foo = Replacement;
Replacement();
}
void (*Foo)() = Initial;
int main() {
Foo();
Foo();
}
However, in general the first calls can be concurrent. So the Foo variable would then need to be atomic, right? Is there a way to do this one-time init that's safe for concurrency and doesn't add overhead to the call-by-function-pointer?
Comment 3•4 years ago
|
||
This is an x86-only library, so it should be ok to rely on atomic assignment of aligned pointers (it's not going to read a corrupt pointer and should eventually get the right one). So in theory a stub function that gets CPUID, changes the pointer, and calls the intended function would work without any locking.
Neither the stub function approach nor the explicit instantiation (removing generic templates) would be an improvement in code smell or quality. But it would improve the metric...
![]() |
||
Comment 4•4 years ago
|
||
For that use case it should be fine to use a relaxed atomic for the function pointer - it will have the same performance as a non-atomic but will not tempt the C++ compiler to start looking for races that will allow it to pretend the behavior is undefined.
I'm not dogmatic about this, but if we do introduce nontrivial static initializers then I want us to understand the costs and the options. CC/ni jan, as the module owner.
![]() |
||
Updated•4 years ago
|
Comment 5•4 years ago
|
||
How many (and which) static initializers are left? Are all of them for ChooseCPU? How many times do we end up calling that approximately?
A static initializer is not necessarily bad, especially if it's in third party code we can't modify as easily, but especially for a library we won't use in most processes it would be nice to avoid the startup overhead if possible.
Comment 6•4 years ago
|
||
All of them are ChooseCPU. Empirically, the compiler generates 3 static initializers. One handles the 17 calls in intgemm.cc that a -O3 compiler is smart enough to combine into one function. Two come from function template instantiation, which is one call to ChooseCPU each. For what it's worth, the stuff about Int16 can be cut out because you're not using it anyway.
The thing about std::atomic on the function pointers is then one needs to call load with relaxed. Which would mean all the function pointers publicly exposed become short inline functions that do this load and then call the function pointer. I guess it could be wrapped in a class that has the std::atomic<function pointer> as a private member variable with an operator() to make a fake function pointer. That class would then be statically initialized with something like Initial above. Though since it would not be CPU-dependent static initialization, maybe the compiler would be smart enough to do all the constant inference.
We're on strike https://www.ucu.org.uk/HE-dispute-institutions and arguably applying for the grant extension is higher priority here (the delay in applying for the extension being due to Mozilla).
I recognize that intgemm won't be used in the vast majority of Firefox startups.
![]() |
||
Updated•3 years ago
|
Updated•3 years ago
|
Comment 7•3 years ago
|
||
Serge, will this be fixed by your xsimd replacement?
Updated•3 years ago
|
Description
•