2.8 - 1.76% compiler_metrics num_static_constructors / compiler_metrics num_static_constructors + 9 more (Windows) regression on Fri February 4 2022
Categories
(Core :: JavaScript: WebAssembly, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox97 | --- | unaffected |
firefox98 | --- | wontfix |
firefox99 | --- | fixed |
People
(Reporter: aesanu, Assigned: aaggarwal)
References
(Regression)
Details
(Keywords: perf-alert, regression)
Attachments
(1 file)
Perfherder has detected a build_metrics performance regression from push d4be0d9bbd6885ecaf5dc780195f16de7461455d. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
3% | compiler_metrics num_static_constructors | windows2012-64-shippable | 107.00 -> 110.00 | |
3% | compiler_metrics num_static_constructors | windows2012-64 | 110.00 -> 113.00 | |
3% | compiler_metrics num_static_constructors | windows2012-32-shippable | 114.00 -> 117.00 | |
3% | compiler_metrics num_static_constructors | windows2012-32 | 118.00 -> 121.00 | |
2% | compiler_metrics num_static_constructors | windows2012-64-shippable | instrumented | 132.00 -> 135.00 |
2% | compiler_metrics num_static_constructors | windows2012-32-shippable | instrumented | 141.00 -> 144.00 |
2% | compiler_metrics num_static_constructors | windows2012-64 | 157.00 -> 160.00 | |
2% | compiler_metrics num_static_constructors | windows2012-32 | 159.00 -> 162.00 | |
2% | compiler_metrics num_static_constructors | windows2012-64 | fuzzing | 159.00 -> 162.00 |
2% | compiler_metrics num_static_constructors | windows-mingw32 | 32 clang debug | 169.00 -> 172.00 |
2% | compiler_metrics num_static_constructors | windows-mingw32 | 64 clang debug | 170.00 -> 173.00 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.
For more information on performance sheriffing please see our FAQ.
The push in question contains 3 of the 4 patches that eventually landed in central to close the bug 1746631. These 3 patches caused some failures which were later fixed as the 4th patch and landed in the form of the push 7907a10250a1a42425a1eb4918cc0be6d2e83dfa.
Does the build_metrics performance regression happen even after the 4th patch?
Comment 2•3 years ago
|
||
I don't fully understand this metric and its effect on the performance. (cc'ing author of bug 1464522) Is this regression acceptable?
There is a chance the constructors are coming from third-party intgemm library. Is there a way to tell which new three symbols were added?
Comment 3•3 years ago
|
||
After running dump_syms on windows for jsshell build I can see only two "dynamic initializers" symbols:
FUNC 11cd2c0 6c 0 intgemm::Int8Shift::PrepareBiasImpl<intgemm::callbacks::UnquantizeAndAddBiasAndWrite>::`dynamic initializer for 'run'()
FUNC 11cd9b0 73 0 intgemm::Int8Shift::MultiplyImpl<intgemm::callbacks::UnquantizeAndAddBiasAndWrite>::`dynamic initializer for 'run'()
Let's assume that https://github.com/kpu/intgemm/blob/768aa689dda91f49024e647b5aee23e6b5d08556/intgemm/intgemm.h#L221 is the offender here. Looking at the code of GetCPUID will probably need to replace it with our own, e.g. to remove undesired get-environment-variable logic, console I/O, etc.
let us know your plans within 3 business days, or the offending patch(es) will be backed out
The intgemm is built only for Nightly and there is no urgency to back it out.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 4•3 years ago
|
||
I think it's commenting that there are static constructors that implement the CPUID dispatch by setting function pointers to CPU-dependent functions. With some refactoring this could be reduced to one static constructor by using a struct.
The EnvironmentCPUID() function is only there to support tests that need to emulate different CPU models. It can be removed entirely if you'd like.
diff --git a/intgemm/intgemm.cc b/intgemm/intgemm.cc
index 31370e2..62e090b 100644
--- a/intgemm/intgemm.cc
+++ b/intgemm/intgemm.cc
@@ -14,8 +14,6 @@
#include <stdlib.h>
-#include <iostream>
-
namespace intgemm {
namespace {
@@ -88,29 +86,10 @@ CPUType RealCPUID() {
#endif
}
-CPUType EnvironmentCPUID() {
-#if defined(_MSC_VER)
- char env_override[11];
- size_t len = 0;
- if (getenv_s(&len, env_override, sizeof(env_override), "INTGEMM_CPUID")) return CPUType::AVX512VNNI;
- if (!len) return CPUType::AVX512VNNI;
-#else
- const char *env_override = getenv("INTGEMM_CPUID");
- if (!env_override) return CPUType::AVX512VNNI; /* This will be capped to actual ID */
-#endif
- if (!strcmp(env_override, "AVX512VNNI")) return CPUType::AVX512VNNI;
- if (!strcmp(env_override, "AVX512BW")) return CPUType::AVX512BW;
- if (!strcmp(env_override, "AVX2")) return CPUType::AVX2;
- if (!strcmp(env_override, "SSSE3")) return CPUType::SSSE3;
- if (!strcmp(env_override, "SSE2")) return CPUType::SSE2;
- std::cerr << "Unrecognized INTGEMM_CPUID " << env_override << std::endl;
- return CPUType::AVX512VNNI;
-}
-
} // namespace
CPUType GetCPUID() {
- static const CPUType kLocalCPU = std::min(RealCPUID(), EnvironmentCPUID());
+ static const CPUType kLocalCPU = RealCPUID();
return kLocalCPU;
}
(Though in intgemm master I would want to ifdef this somehow to avoid breaking tests.)
And if the static const CPUType kLocalCPU = RealCPUID();
is a problem then it can be replaced with just return RealCPUID();
at the expense of repeatedly calling the function at load time.
It is true that intgemm::Int8Shift::MultiplyImpl<intgemm::callbacks::UnquantizeAndAddBiasAndWrite>
is dynamically dispatched. This is by design so it can depend on the CPU model. However, if push comes to shove, the template arguments used are finite and could be enumerated in the cc file. I'm not sure that would change the static constructor count, so much as move them around though.
Comment 5•3 years ago
|
||
At a high level, intgemm is a matrix multiply library that compiles multiple implementations for different CPU models. Then, at load time, it runs CPUID and initializes function pointers to go to the correct implementation. This load time mechanism is triggered by a static initializer. This initialization appears to add .000162 s to average load time i.e. it is below the precision of the time
tool.
Can you help me understand the goal/requirements?
If it's loading time, eliminate the environment variable code as instructed above and I don't think you'll be able to measure the difference.
If it's the number of static initializers but one is still allowed, this might be doable by eliminating iostream (though the unsupported CPU error would also need to go and we like crash reporting) and instantiating C++ templates in the cc file rather than the header.
If it's a complete elimination of static initializers, the pointer initialization would move to an Init function. Risk of segfault due to uninitialized function pointers if Init isn't called before the other functions.
Script to reproduce timing:
#!/bin/bash
set -eo pipefail
mkdir repro
cd repro
# Download and compile intgemm
git clone https://github.com/kpu/intgemm
mkdir intgemm/build
pushd intgemm/build
cmake -DINTGEMM_DONT_BUILD_TESTS=ON ..
make -j4
popd
# A minimal program with just main
g++ -O3 -x c++ - <<<"int main() {}" -o without
# A minimal program using intgemm
cat >with.cc <<EOF
#include "intgemm/intgemm.h"
int main() { return (int)intgemm::kCPU; }
EOF
LIBRARY_PATH=intgemm/build g++ -Iintgemm -Iintgemm/build -O3 with.cc -o with -lintgemm
n=1000
# Run both programs 1000 times and time them.
for ((i=0;i<$n;++i)); do
(time ./with) 2>>with.times || true
(time ./without) 2>>without.times
done
#Sum up the seconds. Assumes we won't have minutes.
without_total=$(grep real without.times |cut -d m -f 2 |cut -d s -f 1 |paste -sd+ |bc)
with_total=$(grep real with.times |cut -d m -f 2 |cut -d s -f 1 |paste -sd+ |bc)
without_avg=$(bc <<<"scale=10; $without_total/$n")
with_avg=$(bc <<<"scale=10; $with_total/$n")
echo Average time without intgemm $without_avg second, with intgemm $with_avg second
difference=$(bc <<<$with_avg-$without_avg)
echo Average difference in time $difference second
Script output:
Average time without intgemm .0011670000 second, with intgemm .0013290000 second
Average difference in time .0001620000 second
Comment 6•3 years ago
|
||
Can you help me understand the goal/requirements?
From my understanding, but I can be corrected here, the goal is zero impact on startup performance of Firefox and adding initializers can potentially impact that. It is impossible to proof that by just looking at the number of static constructors -- it needs to be further analyzed.
There is some uncertainty in EnvironmentCPUID
since the performance of getenv
is unpredictable and vary across platforms -- my recommendation to remove that using preprocessor. The RealCPUID operation is still under question, but we can argue it is needed/required here. There will be a change in the amount of dynamic initializers when intgemm is included.
Updated•3 years ago
|
Comment 7•3 years ago
|
||
Set release status flags based on info from the regressing bug 1746631
Updated•3 years ago
|
Comment 8•3 years ago
|
||
Changed iostream to fprintf in intgemm master.
Regarding EnvironmentCPUID hiding behind a preprocessor macro, take a look at: https://github.com/kpu/intgemm/pull/98/ . Have a look at the PR to confirm then I'll merge.
Neither of these will change the number of static constructors metric, but as you say this is metric is more of an indicator to look at startup time.
Comment 9•3 years ago
|
||
EnvironmentCPUID hiding behind a preprocessor macro now in master https://github.com/kpu/intgemm following review by Yury. Since you're not using cmake anyway, there should be no change to your build setup to have getenv off.
Assignee | ||
Comment 10•3 years ago
|
||
I will create a phab updating the intgemm to the latest commit to resolve this issue.
Reporter | ||
Comment 11•3 years ago
|
||
(In reply to Abhishek from comment #1)
The push in question contains 3 of the 4 patches that eventually landed in central to close the bug 1746631. These 3 patches caused some failures which were later fixed as the 4th patch and landed in the form of the push 7907a10250a1a42425a1eb4918cc0be6d2e83dfa.
Does the build_metrics performance regression happen even after the 4th patch?
Looking at this graph the regression remains even after the 4th patch and retriggers for it.
https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&highlightCommonAlerts=0&selected=1689407,1486464709&series=autoland,1689407,1,2&timerange=1209600&zoom=1643928337692,1644295637778,117.23413376247181,121.43626433669351
Assignee | ||
Comment 12•3 years ago
|
||
Updated•3 years ago
|
Assignee | ||
Comment 13•3 years ago
•
|
||
Assignee | ||
Comment 14•3 years ago
•
|
||
Hey Andra. Thanks a lot for confirming the regression after 4th patch.
Changes pulled as https://phabricator.services.mozilla.com/D138462:
- Changed iostream to fprintf
- Disabled using
getenv
by hidingEnvironmentCPUID
behind a preprocessor macro
Comment 15•3 years ago
|
||
Comment 16•3 years ago
|
||
bugherder |
Updated•3 years ago
|
Comment 17•3 years ago
|
||
The patch landed in nightly and beta is affected.
:aaggarwal, is this bug important enough to require an uplift?
If not please set status_beta
to wontfix
.
For more information, please visit auto_nag documentation.
Updated•3 years ago
|
Assignee | ||
Comment 18•3 years ago
|
||
Firefox integer gemm intrinsic feature is currently only restricted to run in Nightly. So, we don't need to uplift it to beta. Thanks @Pascal for setting the flag.
Comment 19•3 years ago
|
||
(In reply to Abhishek from comment #14)
Hey Andra. Thanks a lot for confirming the regression after 4th patch.
Changes pulled as https://phabricator.services.mozilla.com/D138462:
- Changed iostream to fprintf
- Disabled using
getenv
by hidingEnvironmentCPUID
behind a preprocessor macro
Abhishek, the regression still appears to be present in the graph (https://treeherder.mozilla.org/perfherder/graphs?timerange=5184000&series=autoland,1689410,1,2) is this expected?
Updated•3 years ago
|
Assignee | ||
Comment 20•3 years ago
|
||
Hi Dave. I was on PTO. So, I couldn't reply earlier.
The regression is expected for the reasons explained in the comment below:
(In reply to Yury Delendik (:yury) from comment #6)
From my understanding, but I can be corrected here, the goal is zero impact on startup performance of Firefox and adding initializers can potentially impact that. It is impossible to proof that by just looking at the number of static constructors -- it needs to be further analyzed.
There is some uncertainty in
EnvironmentCPUID
since the performance ofgetenv
is unpredictable and vary across platforms -- my recommendation to remove that using preprocessor. The RealCPUID operation is still under question, but we can argue it is needed/required here. There will be a change in the amount of dynamic initializers when intgemm is included.
Please let me know if more information is required here. Thanks :)
Comment 21•3 years ago
|
||
Dave, is comment 20 sufficient justification?
Comment 22•3 years ago
|
||
(Didn't mean to close, sorry.)
Comment 23•3 years ago
|
||
Yes, sorry, all good here.
Description
•