Closed Bug 1586217 — Opened 2 years ago, Closed 2 years ago

Qt-based threaded WASM applications are not runnable anymore

Categories

(Core :: Javascript: WebAssembly, defect, P1)

71 Branch
defect

Tracking


RESOLVED WORKSFORME

People

(Reporter: nagyimre, Unassigned)

References

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36

Steps to reproduce:

https://www.qt.io/web-assembly-example-mandelbrot

Actual results:

Check webconsole, the following error appears:
TypeError: WebAssembly.Memory cannot be cloned in this context

Expected results:

Application should draw a mandelbrot on the screen.
It also affects other thread-based wasm applications: it happens with multiple of our internal test applications, so we believe this change influences all Qt-based threaded WASM apps.
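For reference, the pattern these threaded demos share can be sketched like this (a hedged, minimal sketch; the worker file name and memory sizes are illustrative):

```javascript
// A threaded wasm app allocates a *shared* WebAssembly.Memory and posts
// it to its worker pool. A shared memory is backed by a
// SharedArrayBuffer, which is exactly what the structured-clone policy
// refuses to serialize on pages that are not cross-origin isolated.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });
console.log(memory.buffer instanceof SharedArrayBuffer); // true

// On a page served without the required headers, the next lines throw
// "TypeError: WebAssembly.Memory cannot be cloned in this context":
// const worker = new Worker("worker.js");
// worker.postMessage(memory);
```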

Component: Untriaged → Javascript: WebAssembly
Product: Firefox → Core

I think this is actually a DOM bug; a permissions object in the DOM code was recently switched from allow-cloning-by-default to disallow-cloning-by-default, so as to be multiprocess friendly. I'm guessing that the permission needs to be switched on again for some case. I'll investigate as soon as I can.

Priority: -- → P1

FWIW, Rust folks are reporting that the same bug breaks their threads demos.

Ah yeah, I was just about to comment! Our demo at https://rustwasm.github.io/wasm-bindgen/exbuild/raytrace-parallel/ receives the same error, and it looks like postMessage no longer works with a WebAssembly.Memory backed by a SharedArrayBuffer. I wasn't sure if this was an intended change (it's all unstable, after all), but I ended up here! In any case, I was mostly just curious, from our end, how to show off a demo; if we need custom headers or something like that, we just want to be sure to document it on our end!

Attached file emscripten_test.zip

I was just about to file a bug as well; the emscripten pthreads tests hit this on Firefox Nightly too.

Looks like there are already a few links with STR, but here's an attached complete testcase in case that's useful.

In Worker::PostMessage() in dom/workers/Worker.cpp:

  JS::CloneDataPolicy clonePolicy;
  if (NS_IsMainThread()) {
    nsGlobalWindowInner* win = nsContentUtils::CallerInnerWindow(aCx);
    if (win && win->CanShareMemory(mWorkerPrivate->AgentClusterId())) {
      clonePolicy.allowSharedMemory();
    }
  } else {
    ...
  }

win is not null, but CanShareMemory() returns false, hence the policy object denies the clone.

... because in CanShareMemory in dom/base/nsGlobalWindowInner.cpp, this test fails:

2401	  if (!StaticPrefs::dom_postMessage_sharedArrayBuffer_withCOOP_COEP()) {

... the immediate cause of which is that this pref is not set in about:config: dom.postMessage.sharedArrayBuffer.withCOOP_COEP, but even with that set we fail, because there is a subsequent test in that same function:

2415	  return bc->Top()->GetOpenerPolicy() ==
2416	         nsILoadInfo::OPENER_POLICY_SAME_ORIGIN_EMBEDDER_POLICY_REQUIRE_CORP;

which fails because

(gdb) p bc->Top()->GetOpenerPolicy()
$5 = (const nsILoadInfo::CrossOriginOpenerPolicy &) @0x7f033ed5536a: nsILoadInfo::OPENER_POLICY_NULL

This makes me think that this is not actually a Firefox bug, but a problem with the test content not sending the correct headers for the document. Firefox will not re-enable SAB for content that does not opt into sandboxing the shared-memory content, so as to avoid side-channel attacks. There's an emerging cross-browser spec for this.

Anne, in a nutshell, can you explain what's required of content providers here?

Blocks: resab
Flags: needinfo?(annevk)

Bug that landed the policy change is here: https://bugzilla.mozilla.org/show_bug.cgi?id=1562663

https://docs.google.com/document/d/1zDlfvfTJ_9e8Jdc8ehuV4zMEu9ySMCiTGMS9y0GU92k/edit has an explainer that's quite detailed. You need to use HTTPS. Top-level documents need these headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

Nested documents and workers only need the second header of those. Nested documents and subresources might need Cross-Origin-Resource-Policy: cross-site (or same-site) if they are cross-origin (or cross-origin-but-same-site).

Also, to play with this in Firefox Nightly you need to set these flags:

  • dom.postMessage.sharedArrayBuffer.withCOOP_COEP
  • javascript.options.shared_memory
  • browser.tabs.remote.useCrossOriginOpenerPolicy
  • browser.tabs.remote.useCrossOriginEmbedderPolicy

(I filed https://github.com/mdn/sprints/issues/2219 on getting this all on MDN.)

Flags: needinfo?(annevk)

For the simple ray tracing app, I did not need to change the last two prefs. All that said, once I changed my http server config file to add this:

<Directory /var/www/sandbox/raytrace>
        Header set Cross-Origin-Opener-Policy same-origin
        Header set Cross-Origin-Embedder-Policy require-corp
</Directory>

and set the two prefs, and served everything over https, I could run the raytracing demo that Alex referenced above.
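As a sanity check after setting the headers and prefs, a page can ask the browser whether isolation actually took effect. A hedged sketch (the helper name is mine; crossOriginIsolated is the standard global that current browsers expose for this):

```javascript
// Reports whether the given global is cross-origin isolated, i.e.
// whether COOP/COEP were honored and SharedArrayBuffer-backed objects
// (including shared WebAssembly.Memory) may be sent via postMessage.
function canUseSharedMemory(globalScope = globalThis) {
  return globalScope.crossOriginIsolated === true;
}

if (!canUseSharedMemory()) {
  console.warn("Not cross-origin isolated; shared wasm memory cannot be cloned here.");
}
```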

Closing this as WORKSFORME because I think it's working as intended...

Status: UNCONFIRMED → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME

We'll work on a better error message for this in bug 1562667. Thanks for reporting this, this was a helpful early warning.

Should we add a "Just force them on in a non-standard way useful for testing" pref?

(In reply to Luke Wagner [:luke] from comment #12)

Should we add a "Just force them on in a non-standard way useful for testing" pref?

IMO yes, because it was a bit hairy to set up Apache to do this (you can't do https with the Python server, I think, and it doesn't handle headers ergonomically either), and this will be a pain point for everyone. It would probably be good to print something in the console every time we rely on that setting, though, to remind the dev that this won't work generally.

File a separate bug on that? Ideally with some notion or precedent as to why it's okay to allow enabling security holes through preferences.

(In reply to Anne (:annevk) from comment #14)

File a separate bug on that? Ideally with some notion or precedent as to why it's okay to allow enabling security holes through preferences.

(/me senses a trap, cleverly disguised as sarcasm.)

Personally I think that features that make development faster and less cumbersome are generally good, provided we can control them; and if I were to stop and think about it I would probably conclude that browsers have too few ways to disable security checks for development. (Though I am not a web developer and my opinion counts for not very much.) Restrictions on loading content that are perfectly sensible in the context of the full web are a good example; that I now have to use a full apache instance with an https certificate to test a little JS is a poster child for the problem IMO.

It comes down to how you control the features, I guess. One could imagine that if any of these security-lowering mechanisms were enabled there would be a dialog on startup (at least) prominently stating that essential security features have been disabled and do you really mean to do this?

I'll let somebody else fall on this sword, though.

Maybe make it a Nightly-only pref? That way websites won't be able to coerce Firefox-stable users into setting the pref, for example, but we (and other power users) can still use it.

Just to speak for the web development side, let me explain why this issue was so dreadful for us: we develop and test applications and recommend FF to our customers; we tell them that it is nice, reliable, and pukes rainbows whenever the customer wishes. We prepare demos, show off, and then one morning this whole stuff just does not work. There is no clue why. Cloning? Context? It does not ring a bell at all. Looking up documentation? Nothing. Google? Nothing. This topic has been ongoing since February 2019, but these changes were not propagated well to the webdev side. Even the Qt devs were blinking their eyes and had no clue what was happening ...

It is great luck that this was caught in Nightly and we could all adapt the necessary fixes and updates to cope with the situation. If we did not test with Nightly (since it is said to be unstable by its nature), this change would have gone unnoticed into the stable release, and all of our customers would have faced a morning where none of their data was reachable anymore.

Although I am thankful for the quick resolution of this case, it also casts fear that something like this could happen again in the future with us unprepared in any way. It is also special - at least for us - since we are using the Qt + WASM framework on FF (and Chrome), so we are using the bleeding edge of both browsers; we cannot wait until changes appear in the stable version.

For all of the above, I would like to ask - or propose - that you please add some key information to fatal errors when security is hardened in a way that requires modification on a 3rd-party layer (e.g. Apache). A bug id or some unique document id would be enough; we could do the research and the adaptation ourselves. I just ask that you not leave us crawling around in the dark.

As for lowering the bar for a single JS app, I would not go for it, even if that is tempting and would be much more convenient. Security starts with routine. If you are forced to set up a secure environment from the first steps, you will understand it and it will become part of your development. This is not tons of extra work, even for newbies, but it does require updated documentation of the changes, as I mentioned above.

So, please add some unique key to the fatal errors, and thank you for the quick work and coverage on this issue.

No trap intended, I'm just unsure. I think it would be great to solve this though.

  1. I don't think we have a "disable secure context requirement" preference at the moment, so to some extent this problem already exists. Only allowing Nightly users to opt out of security is an interesting idea. This is tracked by bug 1409841, roughly.
  2. It's a bit unclear to me what the processing model for this would be. Would we ignore the additional serialization requirement for the SharedArrayBuffer and WebAssembly.Module objects? (Results in a world that is incompatible with the standardized semantics and would not tell you what subresources to update and such. We could maybe still log some things in the console, but at that point there's a lot of additional branching.) Or would we pretend that each document/worker has these headers set in a way that serialization would succeed? (Likely ends up breaking a lot of sites.) Or something more complicated such as per-origin settings?

Of these, it seems there's a reasonable plan for 1, but we'd have to get it prioritized. 2 needs some kind of design input and depending on what we want to convey might be a bit of work, so also would need to be prioritized somehow.

(In reply to Nagy Imre from comment #17)

We prepare demos, show off, and then one morning this whole stuff just does not work. There is no clue why. Cloning? Context? It does not ring a bell at all. Looking up documentation? Nothing. Google? Nothing. This topic has been ongoing since February 2019, but these changes were not propagated well to the webdev side. Even the Qt devs were blinking their eyes and had no clue what was happening ...

I agree this is a problem for you, but let me push back a little. Where is the problem? The feature that is being used here - JS/wasm shared memory - is off by default not just in Release but even in Nightly because it is not safe in its current form (or, in the form it had before we added the checks that made your application not work). Everyone who tests or uses this feature by enabling it in Release has (in principle) an unsafe browser. Chrome has chosen to ship it on-by-default because they have stronger process isolation than FF currently, but as far as I know they will start requiring the same safeguards FF Nightly now implements, with HTTPS and headers to opt-in to content that uses shared memory. If you ship products to customers requiring this preference to be flipped in production browsers, you are basically requiring them to lower their security level. The added risk is very, very tiny, probably at the level of the theoretical, but that is still why this feature is off by default.

We can always do better with error messages, and I apologize for this particular failure and the time that was lost to it. I'll file a bug to have it improved, which should not be hard in this case.

It is great luck that this was caught in Nightly and we could all adapt the necessary fixes and updates to cope with the situation. If we did not test with Nightly (since it is said to be unstable by its nature), this change would have gone unnoticed into the stable release, and all of our customers would have faced a morning where none of their data was reachable anymore.

Your customers could also have been attacked by somebody able to engineer a side-channel attack against their browsers.

Essentially the preference that we've exposed does what you're advocating we should not do -- allow security to be lowered by pref. This was a trade-off because we want the feature to be tested by developers, but by making the pref off-by-default we hoped it would be developers only. That seems not to have worked out for us.

See Also: → 1587007

My 2c:

  • There are prefs to enable various in-development features, and those may well be less secure than the far more hardened defaults.
  • Opening about:config already shows an appropriately-scary dialog for users about the risks.
  • Aside from those general issues, this feature is special as the long-term goal - correct me if I am wrong - is for it to work the way it used to (i.e., like it works on Chrome atm) once Firefox's site isolation is complete.
  • Making it harder to test in-development features has downsides (concretely, this is breaking Emscripten's testing setup for Firefox + pthreads, both on CI and when running locally).

That is wrong; Chrome will align with Firefox, as lth noted above, and this will be a permanent change in "web platform architecture".

Thanks Anne, looks like things have changed dramatically since I last looked at these discussions...

Sounds like this may be somewhat disruptive for developers and users, but I suppose worthwhile for consistency and security. Anyhow, hopefully we can mitigate some of the disruption for users, I'll see what I can do on the Chromium side...

Is this change expected to ride the trains normally, that is, reach beta in some weeks, and stable some weeks after that?

(That will give us an idea of how soon it will affect Emscripten's main Firefox CI, which currently uses beta.)

Bug 1562663 is expected to ride the trains as I understand it.

After more discussion with lth I filed bug 1587394 to add a bypass that will help with non-Release use cases. I don't really have an ETA, but I hope we can add that soon.

I'm glad a setting for the non-Release use cases has been added here. But...

...I'm rather skeptical that adding hard-to-configure static text strings as HTTP header values is a viable tool for solving any fundamental security problem. I skimmed the document and didn't find myself feeling any safer. (I'm reminded somewhat of JSONP, where a tremendous amount of hassle was erected around getting data, yet scripts were still allowed to run, so systems ended up less secure.)

In my case, I'm serving .wasm files off of s3. How am I supposed to set these flags? My understanding is you cannot (correct me if I'm wrong):

https://stackoverflow.com/questions/38949864/how-to-setup-custom-response-header-in-aws-s3

Is there anyone to speak to and lobby about not doing this? :-/ (Or at least holding off until a broader community can audit the question of its necessity, and if it's actually protecting anyone in a way that is proportional to the inconvenience?)

Unfortunately, these headers are necessary to address fundamental security issues related to high-res time and Spectre-like timing attacks. If you read this document (co-written by Firefox and Chrome security/standards folks):
https://docs.google.com/document/d/1zDlfvfTJ_9e8Jdc8ehuV4zMEu9ySMCiTGMS9y0GU92k
You can see the Chrome folks acknowledging the same dangers, implying that they would be adding the same restrictions at some point in time in the future.

Unfortunately, these headers are necessary to address fundamental security issues related to high-res time and Spectre-like timing attacks.

I do see that's what it says. But though I don't want to be too dismissive speaking about an area that is not my specialty (e.g. security), my layman's experience with this pattern of solution is that it usually is just "security theater". Flags and strings with no cryptographic basis aren't very powerful tools. Seems they force people to do things that are obfuscating, and almost certainly less secure than what they are replacing.

(By contrast: let's imagine you told me "we've implemented a way in the browser where your <script> tag supplies a hash along with the library you intend to load...so you can check to make sure the bytes are trusted ones". Then I might feel security was getting tighter in some way that makes sense to me.)

I mentioned JSONP because it's sort of my go-to example when explaining this failure mode. Reading data cross-domains was blocked, when running scripts from remote URLs is not...and the script loading ability can't be turned off for the web to work. Hence the clear and obvious need to exchange data with another domain is jury-rigged as scripts that run code from another domain...just in order to poke the data they wanted in a place they can dig it out.

But a more recent/topical example to what this thread is about: I brought up with @AlonZakai that there is a rule prohibiting web workers from loading cross-origin, even if CORS is enabled for that origin. Yet I need to load threaded libs cross-origin. For today (at least) you can use URL.createObjectURL from a CORS-fetched Blob:

https://github.com/emscripten-core/emscripten/issues/8338

(Or, maybe that's considered a "hack" or a loophole that will be closed...? I don't know--and it seems none of us do. This relates to some of the communication problem hinted at in this thread. 🤐)
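The workaround described above can be sketched as follows (hedged; the helper name and CDN URL are illustrative, and whether this loophole stays open is exactly the open question):

```javascript
// Turn fetched worker source into a same-origin blob: URL, sidestepping
// the rule that Worker scripts must be same-origin even when CORS would
// allow reading the bytes.
function workerUrlFromSource(code) {
  const blob = new Blob([code], { type: "text/javascript" });
  return URL.createObjectURL(blob);
}

// Browser-only usage (illustrative URL):
// const resp = await fetch("https://cdn.example/worker.js", { mode: "cors" });
// const worker = new Worker(workerUrlFromSource(await resp.text()));
```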

Maybe you guys know more than me, I dunno. But I'm still stuck at the "I don't get it" level...and feel a bit like a physicist confronted with a schematic for a big perpetual motion machine and asked to prove why it doesn't work. The burden of proof is on the person to show how they're breaking the laws of physics.

So let me try and be explicit about my needs, here:

I have a WASM library which uses threading/workers I want to host statically. I'm willing to put things on the using page to indicate my comfort with the thing I'm loading (hashes if need be). But I can't necessarily control every header on the statically hosted file. I'm already being asked to do CORS, and in the case of S3 that's doable. These other headers are not, at this time. So I ask you to please find a way to satisfy your security concerns that accepts the validity of this scenario...instead of making me load a multi-megabyte file with URL.createObjectURL.

Please take pity on me...as WASM is a rather large bet for my project 🙈. So I'm hoping this desire is considered worth accommodating! (Or... @NagyImre may agree with the sentiment of: if you want this technology to succeed... "help us, help you.")

If you fetch the Wasm resource using CORS, it doesn't need to set the Cross-Origin-Resource-Policy header. Your document and workers will need the Cross-Origin-Opener-Policy/Cross-Origin-Embedder-Policy headers.

As for JSONP, that is what makes the system less secure. Using JSON you don't need to allow a remote script to execute with the authority of your site. And yes, that requires CORS if the resource is cross-origin, as we don't want sites to be able to read arbitrary JSON resources. I can assure you none of the things you mention is security theater. If they were, browsers would have long since removed them, as keeping all of this working properly is quite the hassle.

As for JSONP, that is what makes the system less secure.

This response suggests you did not understand my post. Maybe it's my mistake (?). But I'm afraid I communicated at the best level I know how, and I can't do better.

So please re-read the bolded section.
