Closed Bug 1732201 Opened 3 years ago Closed 2 years ago

Sandbox woff2 in ots using RLBox

Categories

(Core :: Security: RLBox, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
96 Branch
Tracking Status
firefox96 --- fixed

People

(Reporter: deian, Assigned: deian)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 2 obsolete files)

No description provided.

Depends on D126435

Depends on D126435

This patch puts woff2 and brotli in a sandbox. We could expose the brotli functions as callbacks if the memory overhead is worth the trade-off.

Nice!

(In reply to Deian Stefan from comment #3)

This patch puts woff2 and brotli in a sandbox. We could expose the brotli functions as callbacks if the memory overhead is worth the trade-off.

Having them in the same sandbox seems fine to me.

Attachment #9242608 - Attachment description: Bug 1732201 - Part 1: Sandbox woff2 in ots using RLBox → Bug 1732201 - Part 1: Sandbox woff2 in OTS using RLBox

(In reply to Deian Stefan from comment #3)

This patch puts woff2 and brotli in a sandbox. We could expose the brotli functions as callbacks if the memory overhead is worth the trade-off.

Note that brotli is also used by a few other parts of the code: see https://searchfox.org/mozilla-central/search?q=brotli%2Fdecode.h. So as I understand it, the patch here will mean we have two copies of brotli (including its big static dictionary, https://searchfox.org/mozilla-central/source/modules/brotli/common/dictionary.c), one copy compiled to WASM and the other as native code.

That seems unfortunate; if we're going to WASM-ify brotli for woff2 use, we should also use that version for the HTTP compression code (and the layout inspector code) instead of carrying two "compiled" versions of it.

So for woff2 purposes it seems fine to have brotli in the same sandbox, but I think we should ensure that it is also usable for the other modules that want to access brotli-decode functionality.

Depends on D126435

Depends on D126435

Attachment #9242609 - Attachment description: Bug 1732201 - Part 2: Turn on Wasm sandbox support for RLBoxed woff2 → Bug 1732201 - Part 3: Turn on Wasm sandbox support for RLBoxed woff2

New patches expose brotli to the sandboxed code. We can revisit all of this once we put brotli in a sandbox too :)

Try in the works: https://treeherder.mozilla.org/jobs?repo=try&revision=8d764cb0e60d1a893fe42a9575ffc5e38cb13aeb

AFAICT the try failures are unrelated to these set of patches.

See Also: → 1737207

I addressed the remaining comments. One thing that's not super obvious to me is if we want the sandbox to be a thread_local or if we want to create a single sandbox and tear it down when the process exits. Any thoughts?

Flags: needinfo?(jfkthame)

(In reply to Deian Stefan from comment #9)

I addressed the remaining comments. One thing that's not super obvious to me is if we want the sandbox to be a thread_local or if we want to create a single sandbox and tear it down when the process exits. Any thoughts?

Can we do one sandbox per font decoding instance, with a decode failure if we can't reserve the contiguous memory? That's what we ended up settling on for OGG [1], with the idea that we're soon planning to make the heaps growable so the address space overhead will be less of an issue over time.

[1] https://searchfox.org/mozilla-central/rev/f8576fec48d866c5f988baaf1fa8d2f8cce2a82f/dom/media/ogg/OggDemuxer.cpp#156

(In reply to Bobby Holley (:bholley) from comment #10)

Can we do one sandbox per font decoding instance, with a decode failure if we can't reserve the contiguous memory? That's what we ended up settling on for OGG [1], with the idea that we're soon planning to make the heaps growable so the address space overhead will be less of an issue over time.

Done.

Flags: needinfo?(jfkthame)

Looking at https://searchfox.org/mozilla-central/source/config/recurse.mk#217 do we need to add some bits to recurse.mk too now?

Flags: needinfo?(shravanrn)

(In reply to Deian Stefan from comment #12)

Looking at https://searchfox.org/mozilla-central/source/config/recurse.mk#217 do we need to add some bits to recurse.mk too now?

We do. Updated part 1.

Flags: needinfo?(shravanrn)

On windows, we're failing at an invoke here with WASM_RT_TRAP_CALL_INDIRECT_TYPE_MISMATCH errors. Here is an example: https://treeherder.mozilla.org/logviewer?job_id=356511111&repo=try&lineNumber=5931

I got this reproduced locally in a vm, but my ability to use windows is pathetic. @shravan is this something obvious to you? (I'm wondering if it's the same issue we ran into with hunspell but it triggers a bit differently.)

Flags: needinfo?(shravanrn)
Flags: needinfo?(shravanrn)

To see what impact this might have on performance, I did a local build with instrumentation added around the call to SanitizeOpenTypeData that is part of loading a webfont resource. I collected the list of resource sizes and time spent in SanitizeOpenTypeData when visiting the Google Fonts home page, which loads fairly small subsets of a number of fonts, and also the Source Han Serif page on TypeKit, which includes some larger resources.

Results (from a MacBook Pro) are shown at https://docs.google.com/spreadsheets/d/1YvAfNot3QO0BaB3iggUBxrt3GS38jUM5xAKqA5YNblY/edit?usp=sharing, sorted by WOFF2 resource size, which compares the time taken with a current mozilla-central build with the time when the patches here are applied. (In both cases this is using a default Opt -- but not PGO -- build.)

Although the figures vary quite a bit (and would doubtless show variability across multiple runs), it's clear that there is a noticeable overhead as a result of the sandboxing/wasm-translation; the total time taken to decode the 50-plus resources is about 30% more. However, note that for small resources the per-font decoding time is typically less than a millisecond, so the 30% -- or even 50% or 100% -- regression is scarcely going to be noticeable among all the work of loading the page. Larger font resources, in the 100K-plus size range, may show a regression of several milliseconds, but these are resources that (on my connection, at least) have spent dozens or hundreds of ms downloading, so in terms of the overall webfont-loading process, the regression in decoding speed seems pretty minor to me.

Agree that the overhead is probably acceptable. Could you post a profile of a 100x WOFF2 loop so that we can see if there are any obvious hotspots we could optimize further?

Flags: needinfo?(jfkthame)

Updated the sheet at https://docs.google.com/spreadsheets/d/1YvAfNot3QO0BaB3iggUBxrt3GS38jUM5xAKqA5YNblY/edit?usp=sharing to have numbers from a shippable build, and wrapped the WOFF2-decompression in a 100-iteration loop to make it easier to see things.

What's clear from these numbers -- where small resources show a huge regression, percentage-wise (even though it's less than a millisecond per iteration), while larger resources show a much smaller percentage -- is that the overhead of setting up the sandbox each time is quite significant on small resources.

This can also be clearly seen in a profile: https://share.firefox.dev/3BSsEJk is a profile captured while loading https://fonts.google.com, which displays small subsets of a number of fonts. create_sandbox and destroy_sandbox (to a lesser extent) show up as dominating the processing of small font resources.

Given that small font resources are very common on the web, I think we should reconsider the strategy of spinning up a new sandbox for each call. Maybe we could make it thread-local, so that when a given thread is asked to process several resources in succession, it doesn't pay the overhead of a fresh sandbox every time?

Flags: needinfo?(jfkthame)

The downside of minting thread-local sandboxes is that the sandboxes allocate a fair amount of memory (~325k I think), and so it's not good for browser memory usage to just have them sitting around.

One strawman idea would be the following:

  • We have a shared (locked) nsTArray of spare sandboxes
  • When a thread needs a sandbox, it grabs one from the spares. If there are no spares, it mints one.
  • When a thread is done with its sandbox, it pushes it back to the spares.
  • When the array is non-empty and hasn't been touched for 3 seconds, a timer clears the array.

Jonathan, does that seem like it would strike the right balance based on how this stuff gets used?

My 2 cents - either option could work.

I think thread locals can work fine, but I think we would want to the same, destroy sandbox if unused for 3 seconds sort of behavior even on thread locals, so that we don't leave sandboxes hanging around. Specifically, it would have to operate like a a shared_pointer thread_local, but the destructor would need to delay execution by 3 seconds.

Fwiw, we did try the sandbox pool approach in our original paper and it worked quite well when we measured performance of sandboxing zlib, libjpeg and other very commonly used libraries.

I'll let both of you make the final call here.

(In reply to Bobby Holley (:bholley) from comment #20)

One strawman idea would be the following:

  • We have a shared (locked) nsTArray of spare sandboxes
  • When a thread needs a sandbox, it grabs one from the spares. If there are no spares, it mints one.
  • When a thread is done with its sandbox, it pushes it back to the spares.
  • When the array is non-empty and hasn't been touched for 3 seconds, a timer clears the array.

That sounds like it'd be reasonable. Generally, if there's just a single font resource involved in a given page-load or reflow, the extra millisecond or so to set up a sandbox is pretty insignificant; the concern would be when reflow ends up requesting a dozen separate resources, so as long as we get to (largely) re-use sandboxes in that case we should be fine.

Using thread locals seems nice in that it avoids any need for the (lock-protected) array, and the total number of sandboxes automatically matches the number of threads using them (aside: I'm looking into whether bumping up the background-thread pool count may be helpful). But I don't have a strong view here; AFAICS either approach should work pretty much equally well, so I'm happy for the implementer to choose which way to do it. :)

Thread pool support added. Pretty much the strawman implementation, except we don't always clear the whole array.

Try auto: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d7ba732f3f16759a46da4333871beb907885a97f

Depends on: 1740624
Component: Graphics → Security: RLBox
Blocks: 1742916

Comment on attachment 9242609 [details]
Bug 1732201 - Part 3: Turn on Wasm sandbox support for RLBoxed woff2

Revision D126436 was moved to bug 1742916. Setting attachment 9242609 [details] to obsolete.

Attachment #9242609 - Attachment is obsolete: true
Attachment #9245624 - Attachment is obsolete: true
Attachment #9242608 - Attachment description: Bug 1732201 - Part 1: Sandbox woff2 in OTS using RLBox → Bug 1732201 - Sandbox woff2 in OTS using RLBox
Depends on: 1743023
Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d486edc7499b
Sandbox woff2 in OTS using RLBox r=bholley

Backed out for causing web-platform-tests failures on header-totalsfntsize-001.xht

Flags: needinfo?(deian)
Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dcd88b342c8c
Sandbox woff2 in OTS using RLBox r=bholley
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 96 Branch
Flags: needinfo?(deian)
Blocks: 1754147
Blocks: 1758626
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: