<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Assignee

Comment 1

•

3 years ago

The push in question contains 3 of the 4 patches that eventually landed in central to close the bug 1746631. These 3 patches caused some failures which were later fixed as the 4th patch and landed in the form of the push 7907a10250a1a42425a1eb4918cc0be6d2e83dfa.

Does the build_metrics performance regression happen even after the 4th patch?

Flags: needinfo?(aaggarwal) → needinfo?(aesanu)

Comment 2

•

3 years ago

I don't fully understand this metric and its effect on the performance. (cc'ing author of bug 1464522) Is this regression acceptable?

There is a chance the constructors are coming from third-party intgemm library. Is there a way to tell which new three symbols were added?

Flags: needinfo?(mh+mozilla)

Comment 3

•

3 years ago

After running dump_syms on windows for jsshell build I can see only two "dynamic initializers" symbols:

    FUNC 11cd2c0 6c 0 intgemm::Int8Shift::PrepareBiasImpl<intgemm::callbacks::UnquantizeAndAddBiasAndWrite>::`dynamic initializer for 'run'()
    FUNC 11cd9b0 73 0 intgemm::Int8Shift::MultiplyImpl<intgemm::callbacks::UnquantizeAndAddBiasAndWrite>::`dynamic initializer for 'run'()

Let's assume that https://github.com/kpu/intgemm/blob/768aa689dda91f49024e647b5aee23e6b5d08556/intgemm/intgemm.h#L221 is the offender here. Looking at the code of GetCPUID will probably need to replace it with our own, e.g. to remove undesired get-environment-variable logic, console I/O, etc.

let us know your plans within 3 business days, or the offending patch(es) will be backed out

The intgemm is built only for Nightly and there is no urgency to back it out.

Updated

•

3 years ago

Flags: needinfo?(bugzilla)

Updated

•

3 years ago

Flags: needinfo?(mh+mozilla)

Comment 4

•

3 years ago

I think it's commenting that there are static constructors that implement the CPUID dispatch by setting function pointers to CPU-dependent functions. With some refactoring this could be reduced to one static constructor by using a struct.

The EnvironmentCPUID() function is only there to support tests that need to emulate different CPU models. It can be removed entirely if you'd like.

diff --git a/intgemm/intgemm.cc b/intgemm/intgemm.cc
index 31370e2..62e090b 100644
--- a/intgemm/intgemm.cc
+++ b/intgemm/intgemm.cc
@@ -14,8 +14,6 @@
 
 #include <stdlib.h>
 
-#include <iostream>
-
 namespace intgemm {
 
 namespace {
@@ -88,29 +86,10 @@ CPUType RealCPUID() {
 #endif
 }
 
-CPUType EnvironmentCPUID() {
-#if defined(_MSC_VER)
-  char env_override[11];
-  size_t len = 0;
-  if (getenv_s(&len, env_override, sizeof(env_override), "INTGEMM_CPUID")) return CPUType::AVX512VNNI;
-  if (!len) return CPUType::AVX512VNNI;
-#else
-  const char *env_override = getenv("INTGEMM_CPUID");
-  if (!env_override) return CPUType::AVX512VNNI; /* This will be capped to actual ID */
-#endif
-  if (!strcmp(env_override, "AVX512VNNI")) return CPUType::AVX512VNNI;
-  if (!strcmp(env_override, "AVX512BW")) return CPUType::AVX512BW;
-  if (!strcmp(env_override, "AVX2")) return CPUType::AVX2;
-  if (!strcmp(env_override, "SSSE3")) return CPUType::SSSE3;
-  if (!strcmp(env_override, "SSE2")) return CPUType::SSE2;
-  std::cerr << "Unrecognized INTGEMM_CPUID " << env_override << std::endl;
-  return CPUType::AVX512VNNI;
-}
-
 } // namespace
 
 CPUType GetCPUID() {
-  static const CPUType kLocalCPU = std::min(RealCPUID(), EnvironmentCPUID());
+  static const CPUType kLocalCPU = RealCPUID();
   return kLocalCPU;
 }

(Though in intgemm master I would want to ifdef this somehow to avoid breaking tests.)

And if the static const CPUType kLocalCPU = RealCPUID(); is a problem then it can be replaced with just return RealCPUID(); at the expense of repeatedly calling the function at load time.

It is true that intgemm::Int8Shift::MultiplyImpl<intgemm::callbacks::UnquantizeAndAddBiasAndWrite> is dynamically dispatched. This is by design so it can depend on the CPU model. However, if push comes to shove, the template arguments used are finite and could be enumerated in the cc file. I'm not sure that would change the static constructor count, so much as move them around though.

Flags: needinfo?(bugzilla)

Comment 5

•

3 years ago

At a high level, intgemm is a matrix multiply library that compiles multiple implementations for different CPU models. Then, at load time, it runs CPUID and initializes function pointers to go to the correct implementation. This load time mechanism is triggered by a static initializer. This initialization appears to add .000162 s to average load time i.e. it is below the precision of the time tool.

Can you help me understand the goal/requirements?

If it's loading time, eliminate the environment variable code as instructed above and I don't think you'll be able to measure the difference.

If it's the number of static initializers but one is still allowed, this might be doable by eliminating iostream (though the unsupported CPU error would also need to go and we like crash reporting) and instantiating C++ templates in the cc file rather than the header.

If it's a complete elimination of static initializers, the pointer initialization would move to an Init function. Risk of segfault due to uninitialized function pointers if Init isn't called before the other functions.

Script to reproduce timing:

#!/bin/bash
set -eo pipefail
mkdir repro
cd repro

# Download and compile intgemm
git clone https://github.com/kpu/intgemm
mkdir intgemm/build
pushd intgemm/build
cmake -DINTGEMM_DONT_BUILD_TESTS=ON ..
make -j4
popd

# A minimal program with just main
g++ -O3 -x c++ - <<<"int main() {}" -o without

# A minimal program using intgemm
cat >with.cc <<EOF
#include "intgemm/intgemm.h"
int main() { return (int)intgemm::kCPU; }
EOF
LIBRARY_PATH=intgemm/build g++ -Iintgemm -Iintgemm/build -O3 with.cc -o with -lintgemm

n=1000
# Run both programs 1000 times and time them.
for ((i=0;i<$n;++i)); do
  (time ./with) 2>>with.times || true
  (time ./without) 2>>without.times
done

#Sum up the seconds.  Assumes we won't have minutes.
without_total=$(grep real without.times |cut -d m -f 2 |cut -d s -f 1 |paste -sd+ |bc)
with_total=$(grep real with.times |cut -d m -f 2 |cut -d s -f 1 |paste -sd+ |bc)
without_avg=$(bc <<<"scale=10; $without_total/$n")
with_avg=$(bc <<<"scale=10; $with_total/$n")
echo Average time without intgemm $without_avg second, with intgemm $with_avg second
difference=$(bc <<<$with_avg-$without_avg)
echo Average difference in time $difference second

Script output:

Average time without intgemm .0011670000 second, with intgemm .0013290000 second
Average difference in time .0001620000 second

BugBot [:suhaib / :marco/ :calixte]

Comment 6

•

3 years ago

Can you help me understand the goal/requirements?

From my understanding, but I can be corrected here, the goal is zero impact on startup performance of Firefox and adding initializers can potentially impact that. It is impossible to proof that by just looking at the number of static constructors -- it needs to be further analyzed.

There is some uncertainty in EnvironmentCPUID since the performance of getenv is unpredictable and vary across platforms -- my recommendation to remove that using preprocessor. The RealCPUID operation is still under question, but we can argue it is needed/required here. There will be a change in the amount of dynamic initializers when intgemm is included.

Lars T Hansen [:lth]

Updated

•

3 years ago

Priority: -- → P2

Comment 7

•

3 years ago

Set release status flags based on info from the regressing bug 1746631

status-firefox97: --- → unaffected

status-firefox98: --- → affected

status-firefox-esr91: --- → unaffected

BugBot [:suhaib / :marco/ :calixte]

Updated

•

3 years ago

Has Regression Range: --- → yes

Comment 8

•

3 years ago

Changed iostream to fprintf in intgemm master.

Regarding EnvironmentCPUID hiding behind a preprocessor macro, take a look at: https://github.com/kpu/intgemm/pull/98/ . Have a look at the PR to confirm then I'll merge.

Neither of these will change the number of static constructors metric, but as you say this is metric is more of an indicator to look at startup time.

Flags: needinfo?(ydelendik)

Comment 9

•

3 years ago

EnvironmentCPUID hiding behind a preprocessor macro now in master https://github.com/kpu/intgemm following review by Yury. Since you're not using cmake anyway, there should be no change to your build setup to have getenv off.

Andra Esanu (needinfo me)

Assignee

Comment 10

•

3 years ago

I will create a phab updating the intgemm to the latest commit to resolve this issue.

Reporter

Comment 11

•

3 years ago

(In reply to Abhishek from comment #1)

The push in question contains 3 of the 4 patches that eventually landed in central to close the bug 1746631. These 3 patches caused some failures which were later fixed as the 4th patch and landed in the form of the push 7907a10250a1a42425a1eb4918cc0be6d2e83dfa.

Does the build_metrics performance regression happen even after the 4th patch?

Looking at this graph the regression remains even after the 4th patch and retriggers for it.
https://treeherder.mozilla.org/perfherder/graphs?highlightAlerts=1&highlightChangelogData=1&highlightCommonAlerts=0&selected=1689407,1486464709&series=autoland,1689407,1,2&timerange=1209600&zoom=1643928337692,1644295637778,117.23413376247181,121.43626433669351

Flags: needinfo?(aesanu)

Assignee

Comment 12

•

3 years ago

Attached file Bug 1754207 - Updating the vendor intgemm library r=yury,rhunt — Details

Phabricator Automation

Updated

•

3 years ago

Assignee: nobody → aaggarwal

Status: NEW → ASSIGNED

Assignee

Comment 13

•

3 years ago

•

Edited

https://hg.mozilla.org/mozilla-central/rev/6074d13732e8

Assignee

Comment 14

•

3 years ago

•

Edited

Hey Andra. Thanks a lot for confirming the regression after 4th patch.

Changes pulled as https://phabricator.services.mozilla.com/D138462:

Changed iostream to fprintf
Disabled using getenv by hiding EnvironmentCPUID behind a preprocessor macro

Pulsebot

Comment 15

•

3 years ago

Pushed by ydelendik@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6074d13732e8 Updating the vendor intgemm library r=yury

Marian-Vasile Laza

Comment 16

•

3 years ago

bugherder

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox99: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → 99 Branch

Ryan VanderMeulen [:RyanVM]

Updated

•

3 years ago

Flags: needinfo?(ydelendik)

BugBot [:suhaib / :marco/ :calixte]

Comment 17

•

3 years ago

The patch landed in nightly and beta is affected.
:aaggarwal, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(aaggarwal)

Pascal Chevrel:pascalc

Updated

•

3 years ago

status-firefox98: affected → wontfix

Updated

•

3 years ago

Blocks: 1722102

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Comment 18

•

3 years ago

Firefox integer gemm intrinsic feature is currently only restricted to run in Nightly. So, we don't need to uplift it to beta. Thanks @Pascal for setting the flag.

Flags: needinfo?(aaggarwal)

Comment 19

•

3 years ago

(In reply to Abhishek from comment #14)

Hey Andra. Thanks a lot for confirming the regression after 4th patch.

Changes pulled as https://phabricator.services.mozilla.com/D138462:

Changed iostream to fprintf

Disabled using getenv by hiding EnvironmentCPUID behind a preprocessor macro

Abhishek, the regression still appears to be present in the graph (https://treeherder.mozilla.org/perfherder/graphs?timerange=5184000&series=autoland,1689410,1,2) is this expected?

Flags: needinfo?(aaggarwal)

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---