As a sanity check, I tested my implementation of the matrix-multiply intrinsics, together with the [intgemm sources](https://bugzilla.mozilla.org/show_bug.cgi?id=1722102), in Firefox Nightly for the Bergamot project, and inference ran successfully on macOS.

I also benchmarked the following setups to compare translation speeds:

1. `Wasm Gemm`: the Gemm library (intgemm) compiled to wasm
2. `Wormhole`: the Gemm library (intgemm) compiled to wasm, but using the wormhole for the 3 most expensive Intel instructions
3. `Native Firefox gemm`: the entire Gemm library (intgemm) exported as intrinsics from within Firefox
   1. I was able to benchmark both SSSE3 and AVX2.

All setups use the same translator configuration.

Models used for evaluation: [English -> German](https://github.com/mozilla/firefox-translations-models/tree/main/models/prod/ende), [English -> Spanish](https://github.com/mozilla/firefox-translations-models/tree/main/models/prod/enes)

Length of the text used for translation: ~5000 words

System: MacBook Pro (15-inch, 2017), macOS 11.6.2, 3.1 GHz Quad-Core Intel Core i7, 16 GB 2133 MHz RAM

wps: translation speed measured in words per second

Results for [English -> German](https://github.com/mozilla/firefox-translations-models/tree/main/models/prod/ende):

1. `Wasm Gemm`: 95 wps, [Profiler](https://share.firefox.dev/3p44Ej6)
2. `Wormhole`: 390 wps (+310% over Wasm Gemm), [Profiler](https://share.firefox.dev/3dZm0HO)
3. `Native Firefox gemm`
   1. SSSE3: 490 wps (+25% over Wormhole, +415% over Wasm Gemm), [Profiler](https://share.firefox.dev/3oYFl28)
   2. AVX2: 560 wps (+43% over Wormhole, +489% over Wasm Gemm), [Profiler](https://share.firefox.dev/3IVHuDt)

Results for [English -> Spanish](https://github.com/mozilla/firefox-translations-models/tree/main/models/prod/enes):

1. `Wasm Gemm`: 105 wps
2. `Wormhole`: 440 wps (+319% over Wasm Gemm)
3. `Native Firefox gemm`
   1. SSSE3: 550 wps (+25% over Wormhole, +423% over Wasm Gemm)
   2. AVX2: 625 wps (+43% over Wormhole, +495% over Wasm Gemm)
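The percentages above are relative throughput gains, (new wps / baseline wps - 1) × 100. The short sketch below recomputes them from the rounded wps figures. Python is an assumption here, since the comment itself contains no code. Truncating each gain to a whole percent reproduces every reported value except one: AVX2 over Wormhole for English -> Spanish, where 625 / 440 gives +42% rather than +43%. The reported figure may have been derived from unrounded measurements.

```python
"""Recompute relative speedups from the reported words-per-second (wps) figures."""

# Reported translation speeds in wps, copied from the results above.
RESULTS = {
    "English -> German": {"Wasm Gemm": 95, "Wormhole": 390, "SSSE3": 490, "AVX2": 560},
    "English -> Spanish": {"Wasm Gemm": 105, "Wormhole": 440, "SSSE3": 550, "AVX2": 625},
}

# (faster setup, baseline) pairs, in the order they are compared in the results.
COMPARISONS = [
    ("Wormhole", "Wasm Gemm"),
    ("SSSE3", "Wormhole"),
    ("SSSE3", "Wasm Gemm"),
    ("AVX2", "Wormhole"),
    ("AVX2", "Wasm Gemm"),
]


def speedup_percent(new_wps: int, base_wps: int) -> int:
    """Throughput gain of new_wps over base_wps, truncated to a whole percent.

    Integer floor division avoids float rounding: 390 vs 95 wps gives 310 (from 310.5...).
    """
    return (new_wps * 100) // base_wps - 100


if __name__ == "__main__":
    for pair, wps in RESULTS.items():
        print(pair)
        for setup, base in COMPARISONS:
            gain = speedup_percent(wps[setup], wps[base])
            print(f"  {setup}: {wps[setup]} wps, +{gain}% over {base}")
```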