Closed Bug 1829765 Opened 1 year ago Closed 1 year ago

Enable SIMD support in wasm2c for RLBox sandboxing

Categories: Core :: Security: RLBox (enhancement, P3)

Tracking: RESOLVED FIXED, 117 Branch
Tracking Status: firefox117 --- fixed

People: Reporter: shravanrn, Assigned: wrv

Attachments: 1 file

wasm2c provides SIMD support for wasm programs through the simd-everywhere library. We need to vendor in simd-everywhere to support this.

Assignee: nobody → wrv

Vendor in support for simd-everywhere

Depends on: 1827704

For the record: we already vendor two SIMD abstraction layers: https://github.com/google/highway (as a dependency of JPEG XL) and https://github.com/QuantStack/xsimd (used by Firefox code).

That being said, both of the above use C++ headers while wasm2c requires C headers, so I guess it's OK (?).

Willy, any hint on the expected speedup?

Re highway and xsimd: ah, having multiple libraries for this is unfortunate. I think longer term we can look at whether wasm2c can additionally support one of those SIMD intrinsic libraries. (See update below.) For now, I think it makes sense to stick with simd-everywhere.

Capturing some of the context in this bug thread:
So far, the libraries we have sandboxed with RLBox do not depend on SIMD, so this probably won't change their performance. However, I think we would really like to sandbox libsoundtouch, which does use SIMD. We tried this a few years ago (Bug 1829765) but at the time concluded that the slowdown without SIMD was too significant. I don't recall the exact numbers from that effort, but @bholley may know, so we can needinfo him if we need this data.

I'll let Willy speak to the details of how far he is in the current performance measurement effort.

Oh, correction: I believe Willy did evaluate those libraries too before choosing simd-everywhere. I'll let him speak to the details.

And Bobby pointed me to the previous SoundTouch performance evaluation by Tom Ritter with SIMD disabled in wasm:
https://docs.google.com/document/d/1h4UsI_9bXrMS3jy1dtu83GLwdi7xB3bNYrWh2MdyEes/edit#heading=h.4vy1ev261ixz

(Summary: disabling SIMD in SoundTouch seems to slow things down by 2.5x. We need to see how much of that overhead we can recover.)

Regarding performance, we have some preliminary numbers with libpng showing a 3x speedup, but we're focusing on getting soundtouch numbers now, and will share those results once we have them.

As for the other libraries, we found that simd-everywhere (simde) provides a more direct mapping, because we can call simde-equivalent intrinsics that get ported to the target architecture (e.g., the V128And SIMD128 opcode is simply simde_wasm_v128_and, which simde maps to SSE2, NEON, etc.). Highway and xsimd each have their own APIs, to which we'd have to map the SIMD128 opcodes.

One thing to note is that since Highway provides WASMSIMD as an output architecture, JPEG XL may be worth evaluating for sandboxing with SIMD as well.

Thanks for the numbers! That looks like a good enough motivation for moving forward.

Pushed by nerli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e261c34a0fc4
Enable SIMD support in wasm2c for RLBox sandboxing r=glandium,shravanrn CLOSED TREE

I think this conflicted with some other wasm2c changes we were landing at the same time. I'll coordinate with Willy to update it.

Flags: needinfo?(wrv)

We have some SoundTouch numbers we can now share [1].

We calculated these numbers by following the methodology described in [2,3]:

  1. Open [4] and click 'set rate and play'. This starts playing a sound at 0.25x speed. Each subsequent click on 'set rate and play' increases the speed by 1x, up to 3.25x.
  2. Start profiling with the Firefox Profiler.
  3. Let the audio play completely at each speed, then click 'set rate and play' to go to the next speed.
  4. Once it stops playing, stop profiling.
  5. Use [5] to plot the audio callback values and get statistics.

Note that Native+SIMD is the default in Firefox.

Results:

**Mean time (s)**

|              | No SIMD | SIMD    |
|--------------|---------|---------|
| Native       | 0.03382 | 0.01486 |
| Wasm (RLBox) | 0.04419 | 0.01745 |

**Median time (s)**

|              | No SIMD | SIMD    |
|--------------|---------|---------|
| Native       | 0.03304 | 0.01386 |
| Wasm (RLBox) | 0.04165 | 0.01473 |

The mean times above show that an RLBoxed SoundTouch with SIMD (0.01745 s) gives us a 2.5x speedup compared to an RLBoxed SoundTouch without SIMD (0.04419 s). The previous results compared native+SIMD (0.01486 s) to RLBoxed SoundTouch without SIMD (0.04419 s), an overhead of almost 200%. Compared to native+SIMD (0.01486 s), RLBoxed SoundTouch with SIMD (0.01745 s) has a 17% overhead. The linked Google Doc [1] has more details.

[1] New results: https://docs.google.com/document/d/19LtpeSWA6ghfAEGw1lN8im2QPMc7wL2W7LusXPzfg18/edit
[2] Methodology: https://blog.paul.cx/post/profiling-firefox-real-time-media-workloads/#the-new-solution
[3] Previous results: https://docs.google.com/document/d/1h4UsI_9bXrMS3jy1dtu83GLwdi7xB3bNYrWh2MdyEes/edit
[4] Benchmark website: https://ritter.vg/misc/ff/soundtouch-perf.html
[5] Audio callback plotting plugin: https://github.com/padenot/fx-profiler-audio-cb

Thanks, that sounds great. Paul, you ok with shipping this now?

Flags: needinfo?(padenot)
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/6c00250ffff4
Enable SIMD support in wasm2c for RLBox sandboxing r=glandium,sergesanspaille
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 117 Branch

(In reply to Bobby Holley (:bholley) from comment #12)
> Thanks, that sounds great. Paul, you ok with shipping this now?

Absolutely, thanks everybody for this! It's exciting to be able to use this technology in an environment as demanding as low-latency, real-time digital signal processing.

Thanks for the performance metrics as well, and in particular the document linked above. As anticipated, the standard deviation of the distribution hasn't changed much between the various techniques (native vs. RLBox, scalar vs. SIMD, and variants), and the distributions are normal as well. Those two characteristics are a lot more important than the absolute real-time load.

It's interesting that there are small spikes in the profile (the first view, which plots the load) when wasm2c is used. They must coincide with the speed changes, which shows that there is a non-linear, workload-dependent overhead (in the allocator, maybe?), very much unlike the processing itself, which is stable. Since the important part is the processing (with a static time-stretching factor), I'm not concerned by this.

There are three questions at the bottom of the document, that I can answer here:

> Audio Callback Tracing seems like a new addition, compared to the videos in https://blog.paul.cx/post/profiling-firefox-real-time-media-workloads/. How does this change profiling?

It does not change how the real-time audio callback is profiled. It's the same statistical analysis, applied to the duration of the various stages of decoding audio and video: demuxing, copying the compressed media data, decoding, and sometimes copying the decoded media data (this copy can be very large for high-resolution videos).

> How is libSoundTouch evaluated on other architectures if SoundTouch only has SSE support? Is the real time requirement there looser?

It's not looser: it's real-time on all OSes and architectures. There's no explicit vectorized path on e.g. aarch64, but the main developer of SoundTouch has made some changes and verified that it auto-vectorizes well. I haven't double-checked that this is the case with our compilation options. Other architectures are not tier-1, and I haven't spent any time investigating them.

> libSoundTouch has some lines that say “#pragma omp parallel for”. Could that be impacting performance? Is that being used?

It is not being used. In low-latency real-time digital signal processing, it's hard to use threads because of the deadline uncertainties that this causes. We don't use OpenMP anyway.

Flags: needinfo?(padenot) → needinfo?(wrv)
Regressions: 1851301
Flags: needinfo?(wrv)