1732201 - Sandbox woff2 in ots using RLBox

Attachment #9242608 - Attachment description: Bug 1732201 - Part 1: Sandbox woff2 in ots using RLBox → Bug 1732201 - Part 1: Sandbox woff2 in OTS using RLBox

Jonathan Kew [:jfkthame]

Comment 5

•

4 years ago

(In reply to Deian Stefan from comment #3)

This patch puts woff2 and brotli in a sandbox. We could expose the brotli functions as callbacks if the memory overhead is worth the trade-off.

Note that brotli is also used by a few other parts of the code: see https://searchfox.org/mozilla-central/search?q=brotli%2Fdecode.h. So as I understand it, the patch here will mean we have two copies of brotli (including its big static dictionary, https://searchfox.org/mozilla-central/source/modules/brotli/common/dictionary.c), one copy compiled to WASM and the other as native code.

That seems unfortunate; if we're going to WASM-ify brotli for woff2 use, we should also use that version for the HTTP compression code (and the layout inspector code) instead of carrying two "compiled" versions of it.

So for woff2 purposes it seems fine to have brotli in the same sandbox, but I think we should ensure that it is also usable for the other modules that want to access brotli-decode functionality.

Deian Stefan

Assignee

Comment 6

•

4 years ago

Attached file Bug 1732201 - Part 2: Apply RLBox patches to ots and woff2 (obsolete) — Details

Depends on D126435

Phabricator Automation

Updated

•

4 years ago

Attachment #9242609 - Attachment description: Bug 1732201 - Part 2: Turn on Wasm sandbox support for RLBoxed woff2 → Bug 1732201 - Part 3: Turn on Wasm sandbox support for RLBoxed woff2

Deian Stefan

Assignee

Comment 7

•

4 years ago

New patches expose brotli to the sandboxed code. We can revisit all of this once we put brotli in a sandbox too :)

Try in the works: https://treeherder.mozilla.org/jobs?repo=try&revision=8d764cb0e60d1a893fe42a9575ffc5e38cb13aeb

Deian Stefan

Assignee

Comment 8

•

4 years ago

AFAICT the try failures are unrelated to these set of patches.

Deian Stefan

Assignee

Updated

•

4 years ago

Comment 9

•

4 years ago

I addressed the remaining comments. One thing that's not super obvious to me is if we want the sandbox to be a thread_local or if we want to create a single sandbox and tear it down when the process exits. Any thoughts?

Flags: needinfo?(jfkthame)

Bobby Holley (:bholley)

Comment 10

•

4 years ago

(In reply to Deian Stefan from comment #9)

I addressed the remaining comments. One thing that's not super obvious to me is if we want the sandbox to be a thread_local or if we want to create a single sandbox and tear it down when the process exits. Any thoughts?

Can we do one sandbox per font decoding instance, with a decode failure if we can't reserve the contiguous memory? That's what we ended up settling on for OGG [1], with the idea that we're soon planning to make the heaps growable so the address space overhead will be less of an issue over time.

[1] https://searchfox.org/mozilla-central/rev/f8576fec48d866c5f988baaf1fa8d2f8cce2a82f/dom/media/ogg/OggDemuxer.cpp#156

Deian Stefan

Assignee

Comment 11

•

4 years ago

(In reply to Bobby Holley (:bholley) from comment #10)

Can we do one sandbox per font decoding instance, with a decode failure if we can't reserve the contiguous memory? That's what we ended up settling on for OGG [1], with the idea that we're soon planning to make the heaps growable so the address space overhead will be less of an issue over time.

Done.

Flags: needinfo?(jfkthame)

Deian Stefan

Assignee

Comment 12

•

4 years ago

Looking at https://searchfox.org/mozilla-central/source/config/recurse.mk#217 do we need to add some bits to recurse.mk too now?

Flags: needinfo?(shravanrn)

Deian Stefan

Assignee

Comment 13

•

4 years ago

Kicked off new try: https://treeherder.mozilla.org/jobs?repo=try&revision=f81676be8d2797de4e17237bf5817c76cbaa0eae

Deian Stefan

Assignee

Comment 14

•

4 years ago

(In reply to Deian Stefan from comment #12)

Looking at https://searchfox.org/mozilla-central/source/config/recurse.mk#217 do we need to add some bits to recurse.mk too now?

We do. Updated part 1.

Flags: needinfo?(shravanrn)

Deian Stefan

Assignee

Comment 15

•

4 years ago

On windows, we're failing at an invoke here with WASM_RT_TRAP_CALL_INDIRECT_TYPE_MISMATCH errors. Here is an example: https://treeherder.mozilla.org/logviewer?job_id=356511111&repo=try&lineNumber=5931

I got this reproduced locally in a vm, but my ability to use windows is pathetic. @shravan is this something obvious to you? (I'm wondering if it's the same issue we ran into with hunspell but it triggers a bit differently.)

Flags: needinfo?(shravanrn)

Deian Stefan

Assignee

Comment 16

•

4 years ago

•

Edited

Fixed up windows. Issue was rlbox callback with *_t types. (WIP try: https://treeherder.mozilla.org/jobs?repo=try&revision=c38e83380001fb03144a29b3334d50b67365898a&selectedTaskRun=KRBbrcypR3S45aCAYgu_1A.0) And just kicked off a try fuzzy to be sure:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6c88e8f7dd1a22e80a682d092ec9e7218ccac77

Flags: needinfo?(shravanrn)

Jonathan Kew [:jfkthame]

Comment 17

•

4 years ago

•

Edited

To see what impact this might have on performance, I did a local build with instrumentation added around the call to SanitizeOpenTypeData that is part of loading a webfont resource. I collected the list of resource sizes and time spent in SanitizeOpenTypeData when visiting the Google Fonts home page, which loads fairly small subsets of a number of fonts, and also the Source Han Serif page on TypeKit, which includes some larger resources.

Results (from a MacBook Pro) are shown at https://docs.google.com/spreadsheets/d/1YvAfNot3QO0BaB3iggUBxrt3GS38jUM5xAKqA5YNblY/edit?usp=sharing, sorted by WOFF2 resource size, which compares the time taken with a current mozilla-central build with the time when the patches here are applied. (In both cases this is using a default Opt -- but not PGO -- build.)

Although the figures vary quite a bit (and would doubtless show variability across multiple runs), it's clear that there is a noticeable overhead as a result of the sandboxing/wasm-translation; the total time taken to decode the 50-plus resources is about 30% more. However, note that for small resources the per-font decoding time is typically less than a millisecond, so the 30% -- or even 50% or 100% -- regression is scarcely going to be noticeable among all the work of loading the page. Larger font resources, in the 100K-plus size range, may show a regression of several milliseconds, but these are resources that (on my connection, at least) have spent dozens or hundreds of ms downloading, so in terms of the overall webfont-loading process, the regression in decoding speed seems pretty minor to me.

Bobby Holley (:bholley)

Comment 18

•

4 years ago

Agree that the overhead is probably acceptable. Could you post a profile of a 100x WOFF2 loop so that we can see if there are any obvious hotspots we could optimize further?

Flags: needinfo?(jfkthame)

Jonathan Kew [:jfkthame]

Comment 19

•

4 years ago

Updated the sheet at https://docs.google.com/spreadsheets/d/1YvAfNot3QO0BaB3iggUBxrt3GS38jUM5xAKqA5YNblY/edit?usp=sharing to have numbers from a shippable build, and wrapped the WOFF2-decompression in a 100-iteration loop to make it easier to see things.

What's clear from these numbers -- where small resources show a huge regression, percentage-wise (even though it's less than a millisecond per iteration), while larger resources show a much smaller percentage -- is that the overhead of setting up the sandbox each time is quite significant on small resources.

This can also be clearly seen in a profile: https://share.firefox.dev/3BSsEJk is a profile captured while loading https://fonts.google.com, which displays small subsets of a number of fonts. create_sandbox and destroy_sandbox (to a lesser extent) show up as dominating the processing of small font resources.

Given that small font resources are very common on the web, I think we should reconsider the strategy of spinning up a new sandbox for each call. Maybe we could make it thread-local, so that when a given thread is asked to process several resources in succession, it doesn't pay the overhead of a fresh sandbox every time?

Flags: needinfo?(jfkthame)

Bobby Holley (:bholley)

Comment 20

•

4 years ago

The downside of minting thread-local sandboxes is that the sandboxes allocate a fair amount of memory (~325k I think), and so it's not good for browser memory usage to just have them sitting around.

One strawman idea would be the following:

We have a shared (locked) nsTArray of spare sandboxes
When a thread needs a sandbox, it grabs one from the spares. If there are no spares, it mints one.
When a thread is done with its sandbox, it pushes it back to the spares.
When the array is non-empty and hasn't been touched for 3 seconds, a timer clears the array.

Jonathan, does that seem like it would strike the right balance based on how this stuff gets used?

Shravan Narayan

Comment 21

•

4 years ago

•

Edited

My 2 cents - either option could work.

I think thread locals can work fine, but I think we would want to the same, destroy sandbox if unused for 3 seconds sort of behavior even on thread locals, so that we don't leave sandboxes hanging around. Specifically, it would have to operate like a a shared_pointer thread_local, but the destructor would need to delay execution by 3 seconds.

Fwiw, we did try the sandbox pool approach in our original paper and it worked quite well when we measured performance of sandboxing zlib, libjpeg and other very commonly used libraries.

I'll let both of you make the final call here.

Jonathan Kew [:jfkthame]

Comment 22

•

4 years ago

(In reply to Bobby Holley (:bholley) from comment #20)

One strawman idea would be the following:

We have a shared (locked) nsTArray of spare sandboxes

When a thread needs a sandbox, it grabs one from the spares. If there are no spares, it mints one.

When a thread is done with its sandbox, it pushes it back to the spares.

When the array is non-empty and hasn't been touched for 3 seconds, a timer clears the array.

That sounds like it'd be reasonable. Generally, if there's just a single font resource involved in a given page-load or reflow, the extra millisecond or so to set up a sandbox is pretty insignificant; the concern would be when reflow ends up requesting a dozen separate resources, so as long as we get to (largely) re-use sandboxes in that case we should be fine.

Using thread locals seems nice in that it avoids any need for the (lock-protected) array, and the total number of sandboxes automatically matches the number of threads using them (aside: I'm looking into whether bumping up the background-thread pool count may be helpful). But I don't have a strong view here; AFAICS either approach should work pretty much equally well, so I'm happy for the implementer to choose which way to do it. :)

Deian Stefan

Assignee

Comment 23

•

4 years ago

Thread pool support added. Pretty much the strawman implementation, except we don't always clear the whole array.

Try auto: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d7ba732f3f16759a46da4333871beb907885a97f

Deian Stefan

Assignee

Updated

•

4 years ago

Depends on: 1740624

Deian Stefan

Assignee

Comment 24

•

4 years ago

Kicked off try with the new threadpool: https://treeherder.mozilla.org/jobs?repo=try&revision=7a475c93cad9dd0388e6671ce2696ae3acfdf265

Deian Stefan

Assignee

Updated

•

4 years ago

Component: Graphics → Security: RLBox

Deian Stefan

Assignee

Updated

•

4 years ago

Blocks: 1742916

Phabricator Automation

Comment 25

•

4 years ago

Comment on attachment 9242609 [details]
Bug 1732201 - Part 3: Turn on Wasm sandbox support for RLBoxed woff2

Revision D126436 was moved to bug 1742916. Setting attachment 9242609 [details] to obsolete.

Attachment #9242609 - Attachment is obsolete: true

Phabricator Automation

Updated

•

4 years ago

Attachment #9245624 - Attachment is obsolete: true

Phabricator Automation

Updated

•

4 years ago

Attachment #9242608 - Attachment description: Bug 1732201 - Part 1: Sandbox woff2 in OTS using RLBox → Bug 1732201 - Sandbox woff2 in OTS using RLBox

Deian Stefan

Assignee

Comment 26

•

4 years ago

•

Edited

https://treeherder.mozilla.org/jobs?repo=try&revision=4d43d8c5d461ce6b74f376fbce29cc17873fb192

Deian Stefan

Assignee

Updated

•

4 years ago

Depends on: 1743023

Deian Stefan

Assignee

Comment 27

•

4 years ago

New try auto: https://treeherder.mozilla.org/jobs?repo=try&revision=69bd626a47ded820f007b847b18602510e789972

Pulsebot

Comment 28

•

4 years ago

Pushed by bholley@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d486edc7499b Sandbox woff2 in OTS using RLBox r=bholley

Cristian Tuns

Comment 29

•

4 years ago

Backed out for causing web-platform-tests failures on header-totalsfntsize-001.xht

Backout link
Push with failures
Failure Log
Failure line: TEST-UNEXPECTED-FAIL | /css/WOFF2/header-totalsfntsize-001.xht | assertion count 1 is more than expected 0 assertions

Flags: needinfo?(deian)

Pulsebot

Comment 30

•

4 years ago

Pushed by bholley@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dcd88b342c8c Sandbox woff2 in OTS using RLBox r=bholley

Andreea Pavel [:apavel]

Comment 31

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/dcd88b342c8c

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox96: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 96 Branch

Deian Stefan

Assignee

Updated

•

3 years ago

Flags: needinfo?(deian)

Mike Hommey [:glandium]

Updated

•

3 years ago

Blocks: 1754147

Mike Hommey [:glandium]

Updated

•

3 years ago

Blocks: 1758626

Bug 1732201 - Sandbox woff2 in OTS using RLBox 4 years ago Deian Stefan 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1732201 - Part 3: Turn on Wasm sandbox support for RLBoxed woff2 4 years ago Deian Stefan 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1732201 - Part 2: Apply RLBox patches to ots and woff2 4 years ago Deian Stefan 48 bytes, text/x-phabricator-request		Details \| Review