Bug 1746631 Comment 1

As a sanity check, I tested my implementation of the matrix-multiply intrinsics together with the [intgemm sources](https://bugzilla.mozilla.org/show_bug.cgi?id=1722102) in Firefox Nightly for the Bergamot project, and inference ran successfully on macOS.

Next, I benchmarked the following setups to compare translation speeds:
1. `Wasm Gemm`: Gemm library (intgemm) compiled to wasm
2. `Wormhole`: Gemm library (intgemm) compiled to wasm, but using the wormhole for the 3 most expensive Intel instructions
3. `Native Firefox gemm`: entire Gemm library (intgemm) exported as intrinsics from within Firefox
    1. Both the SSSE3 and AVX2 variants were benchmarked
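For context on what all three setups compute: intgemm is an integer (8-bit and 16-bit) matrix-multiplication library, so the hot kernel is essentially a quantized GEMM. The pure-Python sketch below illustrates that idea only. It assumes symmetric per-tensor quantization with one scale per matrix, which is a simplification and not necessarily intgemm's scheme, and the helpers `quantize`, `int8_gemm` and `quantized_matmul` are hypothetical names, not intgemm's API.

```python
def quantize(matrix, scale):
    """Map floats to the int8 range [-127, 127] using a single per-tensor scale (assumed scheme)."""
    return [[max(-127, min(127, round(x * scale))) for x in row] for row in matrix]


def int8_gemm(a_q, b_q):
    """Multiply two quantized matrices; Python ints stand in for the int32 accumulators used in hardware."""
    inner, cols = len(b_q), len(b_q[0])
    assert all(len(row) == inner for row in a_q), "inner dimensions must match"
    return [[sum(row[k] * b_q[k][j] for k in range(inner)) for j in range(cols)] for row in a_q]


def quantized_matmul(a, b, scale_a, scale_b):
    """Approximate a @ b: quantize both inputs, multiply in integers, then undo both scales."""
    acc = int8_gemm(quantize(a, scale_a), quantize(b, scale_b))
    return [[v / (scale_a * scale_b) for v in row] for row in acc]


print(quantized_matmul([[1.0, 2.0]], [[0.5], [0.25]], 50, 40))  # [[1.0]]
```

The speedups measured below come from executing this kind of integer inner loop with wide SIMD instructions rather than from changing the math.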

All setups use the same translator configuration.

Models used for evaluation: [English -> German](https://github.com/mozilla/firefox-translations-models/tree/main/models/prod/ende), [English -> Spanish](https://github.com/mozilla/firefox-translations-models/tree/main/models/prod/enes)

Length of the text used for translation: ~5000 words

System: MacBook Pro (15-inch, 2017), macOS 11.6.2, 3.1 GHz Quad-Core Intel Core i7 processor, 16 GB 2133 MHz RAM

wps: translation speed measured in words per second
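As a minimal sketch of the metric, assuming "words" means whitespace-separated tokens of the source text and that the timer covers the whole translation (the helper name `words_per_second` is hypothetical, not part of the Bergamot translator):

```python
def words_per_second(source_text: str, elapsed_seconds: float) -> float:
    """Translation speed: whitespace-separated source words per wall-clock second (assumed definition)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return len(source_text.split()) / elapsed_seconds


print(words_per_second("The quick brown fox jumps", 0.5))  # 10.0
```

At the reported speeds, the ~5000-word text takes roughly 53 s with `Wasm Gemm` (5000 / 95) and roughly 9 s with AVX2 (5000 / 560).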

Results for [English -> German](https://github.com/mozilla/firefox-translations-models/tree/main/models/prod/ende):
1. `Wasm Gemm`: 95 wps, [Profiler](https://share.firefox.dev/3p44Ej6)
2. `Wormhole`: 390 wps (+310% vs. Wasm Gemm), [Profiler](https://share.firefox.dev/3dZm0HO)
3. `Native Firefox gemm`
    1. SSSE3: 490 wps (+25% vs. Wormhole, +415% vs. Wasm Gemm), [Profiler](https://share.firefox.dev/3oYFl28)
    2. AVX2: 560 wps (+43% vs. Wormhole, +489% vs. Wasm Gemm), [Profiler](https://share.firefox.dev/3IVHuDt)

Results for [English -> Spanish](https://github.com/mozilla/firefox-translations-models/tree/main/models/prod/enes):
1. `Wasm Gemm`: 105 wps
2. `Wormhole`: 440 wps (+319% vs. Wasm Gemm)
3. `Native Firefox gemm`
    1. SSSE3: 550 wps (+25% vs. Wormhole, +423% vs. Wasm Gemm)
    2. AVX2: 625 wps (+42% vs. Wormhole, +495% vs. Wasm Gemm)
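The relative gains in both result lists follow from the wps figures as (new - baseline) / baseline * 100. The sketch below recomputes them with integer arithmetic. The reported percentages appear to be truncated rather than rounded (for example, 390 vs. 95 wps is a 310.5% gain), so the sketch floors the result. The `percent_gain` helper and the `results` dictionary are illustrative and not part of any benchmark harness.

```python
def percent_gain(new_wps: int, baseline_wps: int) -> int:
    """Speedup of new_wps over baseline_wps in percent, floored to an integer.

    Integer arithmetic avoids floating-point surprises; flooring appears to match
    how the figures in the result lists were reported (all gains here are positive).
    """
    return (new_wps - baseline_wps) * 100 // baseline_wps


# Words-per-second figures copied from the two result lists.
results = {
    "en-de": {"Wasm Gemm": 95, "Wormhole": 390, "SSSE3": 490, "AVX2": 560},
    "en-es": {"Wasm Gemm": 105, "Wormhole": 440, "SSSE3": 550, "AVX2": 625},
}

for pair, wps in results.items():
    for setup in ("Wormhole", "SSSE3", "AVX2"):
        parts = [f"+{percent_gain(wps[setup], wps['Wasm Gemm'])}% vs. Wasm Gemm"]
        if setup != "Wormhole":
            # Native builds are additionally compared against the Wormhole build.
            parts.insert(0, f"+{percent_gain(wps[setup], wps['Wormhole'])}% vs. Wormhole")
        print(f"{pair} {setup}: {wps[setup]} wps ({', '.join(parts)})")
```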
