Closed Bug 1532644 Opened 5 years ago Closed 1 year ago

Create a JavaScript validator that can be run by networking

Categories

(Core :: JavaScript Engine, enhancement, P3)


Tracking


RESOLVED FIXED
110 Branch
Tracking Status
firefox110 --- fixed

People

(Reporter: annevk, Assigned: sefeng)

References

(Depends on 1 open bug, Blocks 4 open bugs)

Details

(Whiteboard: [orb:m1], [wptsync upstream])

Attachments

(4 files, 3 obsolete files)

In order to block as many "no-cors" cross-origin responses as possible from entering the content process, we need a JavaScript validator of sorts. It'll need to be standardized eventually.

This is in particular for responses that lack a valid JavaScript MIME type.

It needs to reject JSON.

I think it can also reject JavaScript modules, as those have to be fetched with CORS and are required to have a correct JavaScript MIME type.

I somewhat suspect we only want to run this on the first 1024 bytes of the response, and block if we cannot tell definitively at that point.

Telemetry might be needed to determine potential breakage.
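
A minimal sketch of that decision flow, assuming a hypothetical classifier (none of these names are Gecko APIs):

  #include <cstddef>
  #include <string_view>

  enum class Classification { JavaScript, Json, Indeterminate, Other };
  enum class Verdict { Allow, Block };

  // Placeholder: a real classifier would tokenize/parse the prefix.
  Classification ClassifyPrefix(std::string_view /*prefix*/) {
    return Classification::Indeterminate;
  }

  Verdict ValidateNoCorsResponse(std::string_view body) {
    constexpr size_t kScanLimit = 1024;  // only scan the first 1024 bytes
    switch (ClassifyPrefix(body.substr(0, kScanLimit))) {
      case Classification::JavaScript:
        return Verdict::Allow;            // definitely JS: let it through
      case Classification::Json:          // JSON must be rejected
      case Classification::Other:         // HTML, images, ...
      case Classification::Indeterminate: // can't tell by byte 1024
      default:
        return Verdict::Block;
    }
  }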

We already have JS_Utf8BufferIsCompilableUnit, and it wouldn't be that tricky to add a UTF-16-ready version of it. But that wouldn't reject JSON that consists of an array literal.

It's a longer-term thing, perhaps, but I think we could get to the point where cross-origin JS loads are required to have a JS MIME type, if we did some evangelism, got in a console warning, worked with other browsers on it, etc.

Given https://github.com/whatwg/fetch/issues/870 (has Chrome data) and bug 1333995 I'm not hopeful about MIME type (only) plans. Even the most optimistic safelist allows MIME types we definitely need to filter. If that were to change to only JavaScript MIME types, this would indeed be obsolete though. (We should skip this filter whenever the response carries a JavaScript MIME type.)

I don't think we need a non-ASCII scanner. At least, I suspect UTF-16 JavaScript is extremely rare. Rejecting JSON is important though.

Exposing the whole JavaScript parser to more privileged code (at least networking, if we introduce a network process) feels like a risk to me. Is there some subset of JavaScript that we could restrict this to?

We should have something like our syntax parser, but written in a safe language and maybe even automatically generated, considering that nice error messages etc. don't matter for this case.

Agreed with Tom that, to avoid adding a new attack vector, we don't want to move the whole SM parser to the parent process, and that we'd want something new, written in Rust.

For the task of securing credentialed third-party resources, we only need something with a very high probability of rejecting non-JS, non-JSON resources that are being maliciously pulled into the child process so that they can be attacked (via Spectre or a native exploit). It does have to accept 100.0% of valid JS, though, so we can't be too simplistic, but I bet we could come up with something cheap and easy to parse (and have an automated testing flag that ensures it accepts every .js file the real parser accepts; see the sketch below).
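
A sketch of that automated check (the parser hooks are parameters here, not real Gecko APIs): whatever cheap grammar we come up with may over-accept as much as it likes, but it must never reject a script the real parser accepts.

  #include <functional>
  #include <string>
  #include <string_view>
  #include <vector>

  using Accepts = std::function<bool(std::string_view)>;

  // Returns every corpus entry the real parser accepts but the cheap
  // validator rejects; any such entry is a web-compat bug in the validator.
  std::vector<std::string> FindFalseRejects(
      const std::vector<std::string>& corpus,
      const Accepts& realParserAccepts,
      const Accepts& cheapValidatorAccepts) {
    std::vector<std::string> failures;
    for (const std::string& src : corpus) {
      if (realParserAccepts(src) && !cheapValidatorAccepts(src)) {
        failures.push_back(src);
      }
    }
    return failures;
  }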

Flags: needinfo?(jorendorff)

OK, this would prevent a compromised content process from using <script> to mine an intranet.

  • Are there any other goals I'm missing?

  • If not, does a specific syntax really need to be standardized? Can't vendors perform extra checking whenever and wherever we want, as a security feature, as long as we don't reject standards-compliant JS?

(If web compat permits blocking scripts that start with a [ token, we'd want to standardize that. But otherwise...?)

Flags: needinfo?(jorendorff) → needinfo?(annevk)

The goal is to prevent as many "no-cors" cross-origin bytes from entering the content process as possible, be it via the script, img, audio, or video element, or fetch() in "no-cors" mode. Having a JavaScript validator is part of the solution, but not the whole solution (see the blocking bug).

Given that the blocking happens across various contexts, what exactly we end up blocking will be observable in some manner. Typically differences at that level lead to some kind of exploit. Now, if we also consider the potential ability to read process memory, differences there might end up exposing resources that did not expect to be exposed.

I think we need a variant of https://tc39.github.io/ecma262/#sec-parse-script (no need to support modules). A question is whether we want to scan only the first X bytes (with X likely being 1024, as that's our general scanning limit) or the whole resource. Parsing only a subset would complicate the specification (and likely the implementation), as you need special handling for when you reach the limit.

Flags: needinfo?(annevk)

OK, just so I understand: "content process" is not a normative part of any standard; it must be that we're going to standardize something like "in such-and-such case, the response is blocked, meaning no information about it is exposed to content, even in error messages". And then maybe a non-normative note mentioning process sandboxing. Is that right?

Unicode: I bet non-ASCII characters show up all the time in comments and strings, at least. So options are:

  • Reject non-ASCII bytes and probably break the web
  • Sniff the encoding (nobody wants this, but)
  • Ignoring all headers, enforce UTF-8 strictly
  • Dodge the encoding issue by treating every non-ASCII byte as U+FFFD just for the purpose of this parse

Suppose we do the last option. Then maybe we want something like the following (a code sketch follows the steps):

  1. Assert: sourceBytes is a List of bytes.

  2. Let sourceBytesPrefix be sourceBytes, truncated to length 1024.

  3. Let sourceText be the result of interpreting sourceBytesPrefix as ASCII text, replacing each byte in the range 0-127 with the corresponding Unicode code point, and each other byte with U+FFFD (REPLACEMENT CHARACTER).

  4. If sourceText can be parsed as a Script, and the parse result contains no Early Errors, return true.

  5. If sourceBytes is more than 1024 bytes and sourceText is the prefix of any ECMAScript source text that can be parsed as a Script, producing a parse result that contains no Early Errors, return true; else return false.
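
A direct transcription of those steps into C++ (a sketch only: the two parser hooks stand in for an early-errors-only ECMAScript parser and are not real APIs):

  #include <algorithm>
  #include <cstdint>
  #include <string>
  #include <vector>

  // Placeholders for steps 4 and 5; a real implementation would run an
  // ECMAScript parse that checks Early Errors.
  bool ParsesAsScript(const std::u16string&) { return false; }
  bool IsValidScriptPrefix(const std::u16string&) { return false; }

  bool LooksLikeJavaScript(const std::vector<uint8_t>& sourceBytes) {
    constexpr size_t kLimit = 1024;

    // Steps 2-3: truncate to 1024 bytes and decode as ASCII, mapping each
    // non-ASCII byte to U+FFFD just for the purpose of this parse.
    const size_t n = std::min(sourceBytes.size(), kLimit);
    std::u16string sourceText;
    sourceText.reserve(n);
    for (size_t i = 0; i < n; i++) {
      const uint8_t b = sourceBytes[i];
      sourceText.push_back(b <= 0x7F ? char16_t(b) : u'\uFFFD');
    }

    // Step 4: a complete, early-error-free parse means "valid".
    if (ParsesAsScript(sourceText)) {
      return true;
    }

    // Step 5: a truncated resource passes only if the prefix could still
    // be extended into some valid Script.
    return sourceBytes.size() > kLimit && IsValidScriptPrefix(sourceText);
  }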

For bonus points, convince TC39 to divide ECMAScript's Early Errors into syntactic errors (which can be implemented in a mostly stateless single pass) and semantic errors (which require tracking scopes, bindings, and strict mode); and enforce only the syntactic ones here. It would be nice to automatically generate this thing somehow.

Yeah, the standard would only talk about whether the caller gets a response or a network error, based in part on this algorithm. (At some point we should probably spell out the threat model and mitigations in more detail, perhaps in a note as you suggest, but we're still pretty far from even having the ideal setup for all this in standards.)

That algorithm looks like a great start. The only thing missing, as far as I can tell, is that if the input is also valid JSON, we should still reject it. So if it starts with an object or array literal that extends beyond the 1024-byte mark, we wouldn't consider it valid.
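
An illustrative helper for that JSON carve-out (hypothetical, not part of any patch): combined with "the literal extends past the 1024-byte mark", a {- or [-leading resource would be treated as JSON-like rather than possibly-valid JS.

  #include <cctype>
  #include <string_view>

  // True when the first non-whitespace byte opens an object or array
  // literal, the shape shared by JSON documents.
  bool FirstSignificantByteIsJsonLike(std::string_view prefix) {
    for (char c : prefix) {
      if (std::isspace(static_cast<unsigned char>(c))) {
        continue;
      }
      return c == '{' || c == '[';
    }
    return false;
  }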

Priority: -- → P3
Depends on: 1753424
Assignee: nobody → afarre
Blocks: 1773499
Whiteboard: [orb:m1]
Attachment #9283094 - Attachment description: WIP: Bug 1532644 - Create a JavaScript validator that can be run by networking. → WIP: Bug 1532644 - Part 1: Add partial JavaScript validation to ORB.
Assignee: afarre → kmaglione+bmo
Attachment #9283094 - Attachment is obsolete: true
Attachment #9285075 - Attachment is obsolete: true

The initial plan was to validate the first 1024 bytes and make a decision there; however, this no longer seems viable (or needs more time to investigate). See this github issue.

The high-level idea is that we wait for the JavaScript in-memory cache work to be done (which should happen soon). Then we can parse the file into a stencil in the utility process without needing a JSContext at all. After that, we can encode the stencil into bytes and use shmem to share it with the content process; the content process can then decode it back into a stencil and run it as needed. We'd be the first user of this approach, so expect some issues.
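
A data-flow sketch of that plan, with stand-in types and hypothetical helper names (the real work would use SpiderMonkey's stencil and transcoding APIs; nothing below is an actual Gecko API):

  #include <cstdint>
  #include <vector>

  struct Stencil { std::vector<uint8_t> data; };  // stand-in for JS::Stencil

  // Utility process: parse without a JSContext, then serialize the stencil.
  Stencil ParseToStencil(const std::vector<uint8_t>& source) { return {source}; }
  std::vector<uint8_t> EncodeStencil(const Stencil& s) { return s.data; }

  // Content process: decode the bytes received over shmem, then run.
  Stencil DecodeStencil(const std::vector<uint8_t>& bytes) { return {bytes}; }
  void RunStencil(const Stencil&) { /* instantiate and execute the script */ }

  // Utility-process side: a successful parse doubles as JS validation;
  // the encoded bytes are what travels over shared memory.
  std::vector<uint8_t> ValidateAndEncode(const std::vector<uint8_t>& source) {
    return EncodeStencil(ParseToStencil(source));
  }

  // Content-process side.
  void ConsumeSharedStencil(const std::vector<uint8_t>& shmemBytes) {
    RunStencil(DecodeStencil(shmemBytes));
  }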

Depends on: 1773319
Severity: normal → S3
Assignee: kmaglione+bmo → sefeng
Depends on: 1795955

I wonder whether the following cases need to be allowed, or whether they can be rejected for simplicity:

  • Data that fails to parse as either JSON or JS
  • A file that is only invalid JSON because of comments
  • A file that isn't strictly valid JSON but is otherwise just a literal { unquotedKey: "value" }

Being flexible about this would let us simply repurpose the SpiderMonkey parser (sandboxed in the utility process).

Attachment #9284950 - Attachment is obsolete: true

For things that can be parsed as JavaScript, we need to figure out
whether they are JSON, since we want to block opaque JSON resources
for ORB.

This initial version just checks the first byte of the response and
blocks it if it's a curly bracket.
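
In code, that initial heuristic amounts to something like this (a sketch, not the actual patch):

  #include <string_view>

  // Block a response (already known to parse as JavaScript) when its very
  // first byte is an opening curly bracket, since a top-level {...} is
  // overwhelmingly likely to be JSON.
  bool BlockLikelyJson(std::string_view body) {
    return !body.empty() && body.front() == '{';
  }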

Attachment #9305706 - Attachment description: WIP: Bug 1532644 - Implement the initial version of the Javascript Validator for ORB → Bug 1532644 - Implement the initial version of the Javascript Validator for ORB r=farre!,smaug!
Blocks: 1804086
Blocks: 1804261
Blocks: 1778135
Blocks: 1804638
Attachment #9305706 - Attachment description: Bug 1532644 - Implement the initial version of the Javascript Validator for ORB r=farre!,smaug! → WIP: Bug 1532644 - Implement the initial version of the Javascript Validator for ORB r=farre!,smaug!
Blocks: 1805228
Attachment #9305706 - Attachment description: WIP: Bug 1532644 - Implement the initial version of the Javascript Validator for ORB r=farre!,smaug! → Bug 1532644 - Implement the initial version of the Javascript Validator for ORB r=farre!,smaug!

If we decide to cancel a request, the cancellation needs to happen
before FetchDriver::OnStartRequest is called; otherwise the fetch
will resolve its promise, and for ORB we want it to throw.

Blocks: 1806501
Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f58b82a75c29
Implement the initial version of the Javascript Validator for ORB r=farre,smaug
https://hg.mozilla.org/integration/autoland/rev/e1140838e735
Don't call mNext->OnStartRequest if we are waiting for JS validator to make a decision r=farre,smaug,necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/380433f77fc0
1532644: apply code formatting via Lando

Backed out 3 changesets (Bug 1532644) for causing wpt failures on credentials.sub.html.

Flags: needinfo?(sefeng)

dynamic-imports-credentials.sub.html,
dynamic-imports-credentials-setTimeout.sub.html, and credentials.sub.html
intend to test cookie behavior under various conditions, and ORB
shouldn't be a factor there. Currently they use "{}" as the cookie
value; let's update it to something that isn't JSON to preserve the
intent of the tests.

Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0a736f3ff23c
Implement the initial version of the Javascript Validator for ORB r=farre,smaug
https://hg.mozilla.org/integration/autoland/rev/2943c62bd7a2
Don't call mNext->OnStartRequest if we are waiting for JS validator to make a decision r=farre,smaug,necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/49c1638654d6
Make credentials to use "cookie" rather than "{}" as the cookie r=farre,smaug
https://hg.mozilla.org/integration/autoland/rev/e18eed2287d2
1532644, 1532644: apply code formatting via Lando
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/37691 for changes under testing/web-platform/tests
Whiteboard: [orb:m1] → [orb:m1], [wptsync upstream]
Flags: needinfo?(sefeng)

Backed out for causing wpt failures in /fetch/api/abort/general.any.serviceworker.html

Backout link: https://hg.mozilla.org/integration/autoland/rev/99a46c8cc479f9b3c8e3b072d0e8a0226dfb3ac5

Flags: needinfo?(sefeng)
Upstream PR was closed without merging

One of the subtests (Underlying connection is closed when aborting after
receiving response - no-cors) creates a fetch request with an infinite
response and expects the fetch's promise to resolve while the server is
still responding; however, this behavior is not compatible with ORB.

See https://github.com/annevk/orb/issues/38 for more details.

I am disabling the JS validator for these tests. We could instead change
the ini file to mark them as TIMEOUT, but since the test has
// META: timeout=long, it would take a very long time to time out.

Once https://github.com/annevk/orb/issues/38 is resolved, we can enable
the JS validator for them.

Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b065f9bcfce6
Implement the initial version of the Javascript Validator for ORB r=farre,smaug
https://hg.mozilla.org/integration/autoland/rev/348cc45c4022
Don't call mNext->OnStartRequest if we are waiting for JS validator to make a decision r=farre,smaug,necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/5e6a95776a53
Make credentials to use "cookie" rather than "{}" as the cookie r=farre,smaug
https://hg.mozilla.org/integration/autoland/rev/b477116c838d
Don't use ORB's JS validator for fetch/api/abort/general.any.js r=smaug
https://hg.mozilla.org/integration/autoland/rev/6a31f66a6b44
1532644, 1532644, 1532644: apply code formatting via Lando
Upstream PR merged by moz-wptsync-bot
Flags: needinfo?(sefeng)
Regressions: 1815962
No longer regressions: 1815962
No longer depends on: 1773319
Regressions: 1818465
Whiteboard: [orb:m1], [wptsync upstream] → [orb:m1], [wptsync upstream][sp3]
Whiteboard: [orb:m1], [wptsync upstream][sp3] → [orb:m1], [wptsync upstream]
Regressions: 1825492
Regressions: 1833110